Questions & Answers

Using producer with pipeline

0 votes
asked Jan 8 by amfooladgar (200 points)

I successfully installed PNDA via OpenStack. All the components are green and working with no errors. I want to use the "Cisco XR streaming telemetry" pipeline plugin as the producer, so I added the following config to pipeline.conf:

[kafka]
stage = xport_output
type = kafka
encoding = json
brokers = <broker-ip>:9092
topic = telemetry
datachanneldepth = 1000
# logdata = on

Based on the info in pipeline.log, messages are being sent to and received by PNDA's Kafka:

INFO[2018-01-08 16:00:20.872725] kafka producer configured                     brokers=[XXXX:9092] name=mykafka requiredAcks=0 streamSpec=&{2 <nil>} tag=pipeline topic=telemetry
INFO[2018-01-08 16:00:23.305733] TCP server accepted connection                encap=st keepalive=0s local="XXXX:57500" name=testbed remote="XXXX:42764" tag=pipeline

I also created a "telemetry" topic for this, and based on the Kafka statistics the messages are being received properly.

From this point, though, I am not sure how to configure or check the consumer. Which steps should I follow to build a UI for all the statistics being sent from pipeline to Kafka?

Thanks

1 Answer

0 votes
answered Jan 16 by donaldh (140 points)

There are two parts to your question that I will answer separately:

  • how to get telemetry into PNDA
  • how to do something useful with it once it is in PNDA

Getting Telemetry Into PNDA

I am assuming that when you say "pipeline" you are referring to https://github.com/cisco/bigmuddy-network-telemetry-pipeline, which is successfully forwarding JSON telemetry messages to PNDA's Kafka bus. PNDA expects data to be encapsulated in the AVRO message format and will ignore data that is not AVRO encoded. I'm guessing that if you check the PNDA quarantine directory in HDFS, you will see your data being stored there. The HDFS file browser URL will be something like this:

http://pnda-hadoop-mgr-1:8888/filebrowser/#/user/pnda/PNDA_datasets/quarantine
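
If you would rather check from code than from the file browser, here is a minimal Python sketch that lists the quarantine directory over WebHDFS. It assumes WebHDFS is reachable on the namenode's default HTTP port (50070) and that pnda-hadoop-mgr-1 is your namenode host, so adjust both for your deployment:

import requests

# List the PNDA quarantine directory via the WebHDFS REST API.
# Host and port (50070 is the usual namenode HTTP port) are assumptions.
url = ("http://pnda-hadoop-mgr-1:50070/webhdfs/v1"
       "/user/pnda/PNDA_datasets/quarantine")
resp = requests.get(url, params={"op": "LISTSTATUS", "user.name": "pnda"})
resp.raise_for_status()

for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["length"])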

This blog post gives an overview of getting data into PNDA. It describes the AVRO schema that is used and provides links to a variety of tools that can be used for data ingest:

https://pndablog.com/2017/03/10/getting-data-into-pnda/

The pipeline tool just sends JSON text to Kafka, so another tool needs to perform the AVRO encapsulation. The simplest off-the-shelf solution is to use Logstash to receive messages from Kafka and resend them to Kafka as AVRO encoded messages. Below is an example logstash.conf that receives telemetry messages from Kafka on localhost and sends AVRO encapsulated messages to Kafka on pnda-kafka-0. Note that this requires the pnda-avro codec, which you can get from: https://github.com/pndaproject/logstash-codec-pnda-avro

input {
  # Read the raw JSON telemetry from Kafka (localhost by default)
  kafka {
    topic_id => "telemetry.raw"
  }
}

filter {
  # Add the src and host fields used for the PNDA AVRO envelope
  mutate {
    add_field => {
      "src" => "telemetry"
      "host" => "%{Source}"
    }
  }
  # Nest the telemetry fields under [message] so they can be
  # re-serialized as the payload of the AVRO envelope
  mutate {
    rename => { "[Source]" => "[message][Source]" }
    rename => { "[Telemetry]" => "[message][Telemetry]" }
    rename => { "[Rows]" => "[message][Rows]" }
  }
  # Serialize [message] back to a JSON string
  # (requires the logstash-filter-json_encode plugin)
  json_encode { source => "message" }
}

output {
  # stdout { codec => rubydebug }   # uncomment to debug
  kafka {
    topic_id => "telemetry.avro"
    bootstrap_servers => "pnda-kafka-0:9092"
    value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"
    codec => pnda-avro { schema_uri => "/path/to/pnda.avsc" }
  }
}
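
If you would rather not run Logstash, you can do the AVRO encapsulation yourself in a few lines of Python. This is just a sketch using the fastavro and kafka-python libraries; the inline schema is my recollection of the PNDA schema described in the blog post above, so verify it against the pnda.avsc in your deployment before relying on it:

import io
import time
from fastavro import schemaless_writer
from kafka import KafkaProducer

# The PNDA envelope schema, reproduced from memory; verify against
# your deployment's pnda.avsc.
PNDA_SCHEMA = {
    "namespace": "pnda.entity",
    "type": "record",
    "name": "event",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "src", "type": "string"},
        {"name": "host_ip", "type": "string"},
        {"name": "rawdata", "type": "bytes"},
    ],
}

def encapsulate(json_text, src="telemetry", host_ip="10.0.0.1"):
    """Wrap a raw JSON telemetry message in the PNDA AVRO envelope."""
    buf = io.BytesIO()
    schemaless_writer(buf, PNDA_SCHEMA, {
        "timestamp": int(time.time() * 1000),
        "src": src,
        "host_ip": host_ip,
        "rawdata": json_text.encode("utf-8"),
    })
    return buf.getvalue()

producer = KafkaProducer(bootstrap_servers="pnda-kafka-0:9092")
producer.send("telemetry.avro", encapsulate('{"Source": "testbed"}'))
producer.flush()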

Doing Useful Stuff With Data in PNDA

Once you have your data in PNDA, the next step is working with it. There are sample PNDA applications available at https://github.com/pndaproject/example-applications that serve as good templates for building your own PNDA applications.

You could start by exploring the data in HDFS using a Jupyter notebook. You can launch Jupyter from the PNDA console, and there should be sample notebooks that show how to get started. From a notebook you can prototype your data analytics with access to Python charting tools, and you can also write timeseries data to OpenTSDB.

https://github.com/pndaproject/example-applications/tree/develop/jupyter-notebooks
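
From a notebook, pushing a datapoint to OpenTSDB is just an HTTP call. Here is a minimal sketch against OpenTSDB's /api/put endpoint; the pnda-opentsdb hostname and the metric name are assumptions (4242 is OpenTSDB's default port):

import time
import requests

def put_metric(metric, value, tags, host="pnda-opentsdb", port=4242):
    """Push a single datapoint to OpenTSDB's /api/put endpoint."""
    datapoint = {
        "metric": metric,
        "timestamp": int(time.time()),  # seconds since the epoch
        "value": value,
        "tags": tags,                   # OpenTSDB requires at least one tag
    }
    r = requests.post("http://%s:%d/api/put" % (host, port), json=datapoint)
    r.raise_for_status()                # /api/put returns 204 on success

# Hypothetical metric extracted from a telemetry message:
put_metric("telemetry.interface.rx_bytes", 123456, {"router": "testbed"})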

Alternatively, you could start to build a Spark application, based on the kafka-spark-opentsdb example https://github.com/pndaproject/example-applications/tree/develop/kafka-spark-opentsdb, to transform the telemetry messages into timeseries datasets in OpenTSDB. This would enable you to build Grafana dashboards that display timeseries charts for metrics that you extract from the telemetry messages. Here is a blog post that goes into more detail about transforming incoming data from Kafka into timeseries data in OpenTSDB:

https://pndablog.com/2017/05/24/working-with-time-series-data-in-pnda/
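
To give a feel for the shape of such an application, here is a rough PySpark Streaming sketch in the spirit of the kafka-spark-opentsdb example. For simplicity it reads the raw JSON topic (the real example app decodes the AVRO envelope first), and the metric name, the row field names, and the OpenTSDB host are all assumptions you would replace with your own:

import json
import requests
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="telemetry-to-opentsdb")
ssc = StreamingContext(sc, batchDuration=10)

# Direct stream of (key, value) pairs from the raw telemetry topic
stream = KafkaUtils.createDirectStream(
    ssc, ["telemetry.raw"], {"metadata.broker.list": "pnda-kafka-0:9092"})

def to_datapoints(json_text):
    """Turn one telemetry message into OpenTSDB datapoints."""
    msg = json.loads(json_text)
    ts = msg["Telemetry"]["msg_timestamp"] // 1000  # ms -> seconds
    return [{"metric": "telemetry.rx_bytes",        # hypothetical metric
             "timestamp": ts,
             "value": row["Content"]["rx-bytes"],   # hypothetical field
             "tags": {"node": msg["Source"]}}
            for row in msg["Rows"]]

def post_partition(datapoints):
    for dp in datapoints:
        requests.post("http://pnda-opentsdb:4242/api/put", json=dp)

stream.map(lambda kv: kv[1]) \
      .flatMap(to_datapoints) \
      .foreachRDD(lambda rdd: rdd.foreachPartition(post_partition))

ssc.start()
ssc.awaitTermination()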

Hope this helps.

...