Kafka is the "front door" of PNDA: it ingests high-velocity data streams, distributes data to all interested consumers, and decouples data sources from data-processing applications and platform clients.
It is normally not necessary to create a new producer to start acquiring network data, as a growing number of data plugins have already been integrated with PNDA. Because it is not always obvious which plugin suits which data type, common combinations are summarized in the table at the bottom of this page.
If you have other data sources you want to integrate with PNDA, it is straightforward to write a PNDA producer – see http://pnda.io/pnda-guide/producer/producer.html
PNDA adopts a schema-on-read approach to data processing, so all data directed to the platform is stored in as close to its raw form as possible. The only requirement is that each datum is wrapped in a simple Avro schema that adds the logical and network source of the data, and a timestamp, to the raw payload.
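As a sketch of this encapsulation, the snippet below builds a datum carrying source and timestamp metadata alongside the raw payload. The schema shown is illustrative of the pattern described above; the authoritative field names and types are defined in the Data Preparation Guide, and a real producer would serialize the record with an Avro library (e.g. fastavro) rather than leave it as a Python dict.

```python
import time

# Illustrative Avro schema resembling the PNDA encapsulation: logical
# source, network source, and timestamp wrapped around the raw payload.
# Check the Data Preparation Guide for the authoritative schema.
PNDA_AVRO_SCHEMA = {
    "namespace": "pnda.entity",
    "type": "record",
    "name": "event",
    "fields": [
        {"name": "timestamp", "type": "long"},   # ms since epoch
        {"name": "src", "type": "string"},       # logical data source
        {"name": "host_ip", "type": "string"},   # network source
        {"name": "rawdata", "type": "bytes"},    # untouched raw payload
    ],
}

def wrap_datum(src, host_ip, payload):
    """Attach source and timestamp metadata to a raw payload."""
    return {
        "timestamp": int(time.time() * 1000),
        "src": src,
        "host_ip": host_ip,
        "rawdata": payload,
    }

# Hypothetical syslog datum; the payload bytes are stored as-is,
# in keeping with the schema-on-read approach.
record = wrap_datum("syslog", "192.168.0.10", b"<134>host app: hello")
```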
Kafka stores data in topics; each topic is divided into partitions, and each partition is replicated to avoid data loss. Data is ingested through a "producer", which connects directly to the broker cluster and sends data to one or more well-defined topics. Load balancing is carried out by the broker cluster itself via negotiation with topic partition leaders.
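Conceptually, a producer routes each datum to one of a topic's partitions, commonly by hashing the message key so that all data with the same key lands on the same partition (preserving per-key ordering). The sketch below shows that routing idea only; it is not a real client, and the hash differs from Kafka's (clients use murmur2, md5 here is purely illustrative). A production producer would use a Kafka client library such as kafka-python.

```python
import hashlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Hash the message key and take it modulo the partition count.
    Illustrative only: real Kafka clients use murmur2, not md5."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, so ordering is
# preserved for each source within a topic.
p1 = choose_partition(b"router-42", 8)
p2 = choose_partition(b"router-42", 8)
assert p1 == p2
```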
PNDA is typically deployed with a set of well-defined topics appropriate to the deployment context, each topic carefully configured with a set of replicated partitions in line with the expected ingest and consumption rates. By convention, topics are named according to a hierarchical scheme so that consumers can "whitelist" data of interest and subscribe to multiple topics at once.
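With hierarchically named topics, a consumer can whitelist a whole family of topics with a single pattern, as most Kafka consumer APIs accept a regex subscription. The topic names below are hypothetical examples of such a scheme:

```python
import re

# Hypothetical topics following a hierarchical naming scheme.
topics = ["netflow.raw.dc1", "netflow.raw.dc2", "syslog.raw.dc1"]

# A consumer whitelisting all netflow data in one subscription.
whitelist = re.compile(r"^netflow\..*")
subscribed = [t for t in topics if whitelist.match(t)]
print(subscribed)  # ['netflow.raw.dc1', 'netflow.raw.dc2']
```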
PNDA includes tools for managing topics, partitions and brokers and for monitoring the data flow across them.
Integrators can make use of both the high-level and low-level Kafka APIs. Please refer to our Data Preparation Guide to understand how to encapsulate data in the required Avro schema. We will also provide references in a variety of common implementation languages illustrating how to correctly use the Avro schema in conjunction with the Kafka API.