What is PNDA?

Open source Platform for Network Data Analytics
Aggregates data like logs, metrics and network telemetry
Scales up to consume millions of messages per second
Efficiently distributes data with a publish-and-subscribe model
Processes bulk data in batches, or streaming data in real-time
Manages lifecycle of applications that process and analyze data
Lets you explore data using interactive notebooks

Overview

Scalable analytics platform

Innovation in the big data space is extremely rapid, but combining multiple technologies into an end-to-end solution can be extremely complex and time-consuming. The vision of PNDA is to remove this complexity and allow you to focus on your solution. PNDA brings together a number of open source technologies to provide a simple, scalable big data analytics platform.

Principles and benefits

Today, big data analytics architectures typically consist of a number of discrete solutions integrated in silos with collections of data sources. PNDA offers an innovative approach to collecting, processing and analyzing big data. It has a streamlined data pipeline that makes it easy to surface the right data at the right time.

Decoupling sources from applications

By decoupling data sources from data consumers, you can integrate data sources once, then make data available for any application to process. Platform apps can perform horizontally scalable data processing, while client apps can use one of several structured query interfaces or consume streams directly.
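As a rough illustration of this decoupling, the sketch below uses a hypothetical in-memory `Bus` class as a stand-in for the real message bus. The class, the topic name `logs` and the two consumers are assumptions made for the example, not part of PNDA. The point is only that the source publishes once and never needs to know which applications consume the data.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Bus:
    """Hypothetical in-memory stand-in for a publish-and-subscribe message bus."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None:
        # Any number of applications can register for the same topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: bytes) -> int:
        """Deliver the message to every subscriber of the topic; return the delivery count."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = Bus()

# Consumer 1: keeps every message unchanged.
archive: List[bytes] = []

# Consumer 2: counts error lines. It was added without touching the source.
error_count = {"n": 0}


def count_errors(message: bytes) -> None:
    if b"ERROR" in message:
        error_count["n"] += 1


bus.subscribe("logs", archive.append)
bus.subscribe("logs", count_errors)

# The source is integrated once and simply publishes to the topic.
for line in (b"INFO link up", b"ERROR link down", b"INFO link up"):
    bus.publish("logs", line)
```

Adding a third consumer would need only another `subscribe` call; the publishing loop stays the same.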

Taking a big data approach

PNDA is inspired by modern big data architecture patterns. It stores data in the rawest form possible, for as long as possible, in a resilient, distributed file system. You don't need to force your data into a domain-specific schema, or throw away data that could be valuable for use cases you haven't thought of yet.
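The idea of keeping raw data and applying structure only when it is read can be sketched as follows. The record layout, field names and helper functions are hypothetical and chosen purely for illustration. A Python list stands in for the distributed file system.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical raw store: records are kept exactly as received, as bytes.
raw_store: List[bytes] = [
    b'{"host": "r1", "if": "eth0", "rx_bytes": 1200}',
    b'{"host": "r2", "if": "eth1", "rx_bytes": 800}',
    b'{"host": "r1", "if": "eth1", "rx_bytes": 500}',
]


def rx_bytes_per_host(records: Iterable[bytes]) -> Dict[str, int]:
    """First use case: total received bytes per host, parsed at read time."""
    totals: Dict[str, int] = {}
    for record in records:
        event = json.loads(record)
        totals[event["host"]] = totals.get(event["host"], 0) + event["rx_bytes"]
    return totals


def interfaces_seen(records: Iterable[bytes]) -> List[str]:
    """A later use case over the same raw data, with no re-ingestion needed."""
    return sorted({json.loads(record)["if"] for record in records})


print(rx_bytes_per_host(raw_store))
print(interfaces_seen(raw_store))
```

Because nothing was discarded at ingest time, the second question can be answered from the same stored records as the first.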

Stream and batch processing

PNDA provides the tools to process near real-time streaming data, and to perform in-depth batch analysis on massive datasets. This lets you gain insight into what is happening right now, and keep up with changes in context while determining longer-term trends.

Scale and extend

PNDA is built entirely from scalable, open technologies. You can start small, and then grow a cluster horizontally as demand increases. As innovation in the big data industry moves forward, we are able to bring you the latest advances in performance, security and high availability.

Simplify and accelerate

There are a bewildering number of big data technologies out there, so how do you decide what to use? We've evaluated and chosen the best tools, based on technical capability and community support. PNDA combines them with interactive notebooks and application management to streamline the process of developing data processing applications.

Data ingress

PNDA uses Kafka producers to ingest vast amounts of streaming data. It's pre-configured to ingest data from Logstash and OpenDaylight, or you can adapt our sample code to build your own producers in a variety of languages. Do you need to analyze large amounts of data for a particular event? You can load it using our bulk ingest tool.

Data distribution

PNDA uses Kafka and ZooKeeper for high-velocity data distribution. Kafka consumer applications can consume data directly, or you can create your own toolchain with modular apps that process data, then add it to Hadoop or return it to Kafka.

High-volume batch processing

PNDA leverages Apache Spark for petabyte-scale batch processing and deep historical insight into data. You can write apps in Java or Scala and deploy them to process data in your PNDA cluster.

High-velocity stream processing

With Apache Spark Streaming, PNDA lets you process real-time streaming data the same way you process batch data. You can write your business logic once, and decide later whether to process data in batches or streams.
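The "write once, run as batch or stream" idea can be sketched in plain Python, without using the Spark API. The function names and sample log lines below are hypothetical. In a real application the same pattern would apply to a function passed to batch and streaming transformations.

```python
from typing import Iterable, Iterator, List


def is_error(line: str) -> bool:
    """Business logic, written once and shared by both modes."""
    return "ERROR" in line


def run_batch(lines: List[str]) -> int:
    """Batch mode: process the whole dataset and return one total."""
    return sum(1 for line in lines if is_error(line))


def run_stream(lines: Iterable[str]) -> Iterator[int]:
    """Streaming mode: yield a running error count after each incoming line."""
    count = 0
    for line in lines:
        if is_error(line):
            count += 1
        yield count


history = ["INFO ok", "ERROR disk", "ERROR net", "INFO ok"]

batch_total = run_batch(history)
running = list(run_stream(iter(history)))

print(batch_total)
print(running)
```

Only the drivers `run_batch` and `run_stream` differ; `is_error` is untouched when the processing mode changes.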

Free-form data exploration

To help you better understand your data, PNDA integrates Jupyter Notebook, a web-based application that enables free-form, interactive data exploration. You can load data from the distributed file system, run experiments in batch mode, generate graphs on the fly, and rapidly prototype big data applications.

Structured query over big data

Processed data can be stored in the HDFS distributed file system or the HBase NoSQL database. Impala presents a standard SQL interface to these big data systems, allowing a wide range of business intelligence apps to pull data out of PNDA.

Handling time series

Analytics apps often need to process, store and display huge amounts of time series data. PNDA uses OpenTSDB to store time series data, and Grafana to create rich and engaging dashboards.
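For a sense of what a time series data point looks like, the sketch below builds a JSON payload in the shape accepted by OpenTSDB's HTTP `/api/put` endpoint (`metric`, `timestamp`, `value`, `tags`). The helper `make_datapoint`, the metric name and the tag values are hypothetical, and the example only builds the payload; it does not contact a server.

```python
import json
import time
from typing import Dict, Optional


def make_datapoint(
    metric: str,
    value: float,
    tags: Dict[str, str],
    timestamp: Optional[int] = None,
) -> dict:
    """Build one data point; OpenTSDB expects at least one tag per point."""
    if not tags:
        raise ValueError("at least one tag is required")
    return {
        "metric": metric,
        # Seconds since the Unix epoch; defaults to the current time.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "value": value,
        "tags": tags,
    }


# Hypothetical metric and tag, with a fixed timestamp for reproducibility.
point = make_datapoint("interface.rx_bytes", 1200, {"host": "r1"}, timestamp=1500000000)

# Several points can be sent as a JSON array in a single request body.
payload = json.dumps([point])
print(payload)
```

A dashboard tool can then query the stored series by metric name and filter or group it by tag.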

Presentation

This presentation provides an overview of the PNDA platform.

Console

The PNDA console provides a dashboard for all components in a cluster.

Technologies

PNDA incorporates a number of open source technologies, including: