The PNDA console provides a real-time overview of all the components in a PNDA cluster.
See the following pages for a description of features available on other tabs:
Once you have provisioned a cluster, you can connect to the PNDA console at
clustername is the name of your cluster.
The home page shows health statistics for the components that make up PNDA. Components are grouped into categories such as data distribution, data processing, data storage, and applications.
Components are displayed in green if everything is functioning properly, yellow if there is a warning, or red if there is an error. Some components show additional details on the home page, such as the amount of data in the HDFS file system.
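The colour scheme above amounts to a simple lookup from health state to display colour. The following is a minimal sketch of that mapping; the `Health` enum, its state names, and `colour_for` are hypothetical names chosen for illustration and are not taken from the console's source.

```python
from enum import Enum


class Health(Enum):
    # Hypothetical state names; the console's real identifiers may differ.
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


# Display colour for each state, as described in the text above.
COLOURS = {
    Health.OK: "green",
    Health.WARNING: "yellow",
    Health.ERROR: "red",
}


def colour_for(state: Health) -> str:
    """Return the display colour for a component's health state."""
    return COLOURS[state]


for state in Health:
    print(state.name, "->", colour_for(state))
```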
Components have buttons that perform various functions:
- Click on the (i) icon to open a popup with detailed metrics for a component.
- Click on the (?) icon for contextual help for a component.
- Click on the gear icon to configure a component.
There is also a Help link in the toolbar that explains how to use each page in the console.
When all is well, everything is green. If problems arise, some components show warnings or errors.
If a component is in a warning or error state, the (i) icon is replaced by an exclamation mark (!) icon, which you can click to see more detailed information about the problem.
For example, if the HDFS component is in an error state, the popup shows the causes of the problem and a link to the component's configuration page, so that you can resolve it.
This section describes each of the components on the home page of the console, and the role it plays in PNDA.
Apache Kafka is a high-throughput, distributed, publish-subscribe messaging system.
In PNDA, it is used to collect data ready for processing. It decouples data aggregation (publishers) from data analysis (consumers), allowing any application to consume data present on Kafka.
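The decoupling of publishers from consumers can be illustrated with a toy in-memory log. This is only a conceptual sketch: `TopicLog` and its methods are hypothetical and are not the Kafka client API. It shows that a publisher appends to a named topic without knowing who reads it, and that any number of consumers can read the same data independently, each from its own offset.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a message broker (not the Kafka API)."""

    def __init__(self):
        # topic name -> append-only list of messages
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to the end of a topic."""
        self._topics[topic].append(message)

    def read(self, topic, offset=0):
        """Return all messages on a topic from the given offset onwards."""
        return self._topics[topic][offset:]


log = TopicLog()
log.publish("metrics", {"host": "node-1", "cpu": 0.42})
log.publish("metrics", {"host": "node-2", "cpu": 0.17})

# Two independent consumers read the same topic; the publisher is unaware of both.
dashboard_view = log.read("metrics")
alerting_view = log.read("metrics", offset=1)
```

Reading does not remove messages, so adding another consumer later requires no change to the publisher.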
Apache Zookeeper provides an open source distributed configuration service, synchronization service, and naming registry for large distributed systems.
It is used by Kafka for coordination of its distributed operations, to track leadership and to store topic metadata.
Apache Spark is a framework and engine for distributed, large-scale data processing.
In PNDA, it allows for both batch mode and streaming computation.
Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
In PNDA, batch mode Spark jobs are run on a regular schedule by Oozie.
Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology.
It coordinates running of jobs and their component tasks on a cluster, allocating memory and cores to those tasks.
The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
OpenTSDB is a scalable time series database that lets you store and serve massive amounts of time series data, without losing granularity.
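To make the idea of a time series data point concrete, here is a minimal sketch that builds one as JSON, assuming the metric, timestamp, value, and tags shape accepted by OpenTSDB's HTTP put endpoint. The helper `make_datapoint`, the metric name, and the tag values are hypothetical, and nothing is sent over the network.

```python
import json
import time


def make_datapoint(metric, value, tags, timestamp=None):
    """Build one data point dict: a metric name, a timestamp, a value, and tags."""
    if not tags:
        # Assumption: at least one tag is required per data point.
        raise ValueError("a data point needs at least one tag")
    return {
        "metric": metric,
        # Seconds since the epoch; defaults to the current time.
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "value": value,
        "tags": dict(tags),
    }


# Hypothetical metric name and tag, with a fixed timestamp for reproducibility.
point = make_datapoint("example.cpu.load", 0.42, {"host": "node-1"}, timestamp=1500000000)
body = json.dumps(point)
print(body)
```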
Grafana is a graph and dashboard builder for visualizing time series metrics.
It is pre-configured to use OpenTSDB as its data source. Creating dashboards in Grafana is much easier than in the OpenTSDB user interface.
HBase is a distributed, scalable data store, designed for fast, random access to very large data sets, i.e. millions of columns and billions of rows.
The Hive metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients access to this information via the metastore service API.
HDFS is a fault tolerant and self-healing distributed file system, suited to large-scale data processing workloads.
This section shows an overview of applications that have been deployed, and lets you launch apps. For more information, see the applications page.
The data logger is a service that collects all the data displayed in the console. It has POST APIs for collecting metrics, as well as package and application data.
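As an illustration of submitting a metric over a POST API, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The host name, the `/metrics` path, and the payload fields are assumptions made for this example; consult the data logger's own documentation for the real endpoint and schema.

```python
import json
import urllib.request


def build_metric_request(base_url, metric):
    """Build a JSON POST request carrying one metric; the caller decides when to send it."""
    data = json.dumps(metric).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/metrics",  # hypothetical path
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and payload fields.
req = build_metric_request(
    "http://data-logger.example",
    {"source": "hdfs", "metric": "hdfs.health", "value": "OK"},
)
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen`, which is left out here because the endpoint is illustrative.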
The data manager is a service providing data to the console front-end or other clients. It has REST APIs for retrieving data about metrics, packages and applications, and also a web sockets API for real-time notifications.
The console front-end (often referred to as the Console itself) is a web application providing an overview of all the components in a PNDA cluster.