Cloud platform requirements

PNDA is designed to be deployed on bare metal servers, on OpenStack cloud computing infrastructure, or on Amazon Web Services (AWS). This guide assumes that you are familiar with OpenStack and that you have an environment, in a public or private cloud, in which you can create instances. For AWS, it assumes that you have an AWS account.

OpenStack Requirements

  • PNDA is supported on OpenStack Kilo or later.
  • Instances are created using Ubuntu 14.04 or RHEL 7.
  • PNDA is deployed using the Heat orchestration service, with heat_template_version: 2014-10-16. Alternatively, you can use Salt Cloud.
  • The cluster should be set up with one network and one router, and must be able to provision multiple virtual machines. See below.
  • The cluster must have access to the public Internet for installation of dependencies.

OpenStack Swift

PNDA expects two containers to be present in Swift. These must be created before launching a cluster. You can create them in the OpenStack Horizon console by navigating to Object Store > Containers > Create Container.

Application Packages

Application packages are expected to be found in a pseudo-folder named releases in a Swift container called apps. This will be shared by all PNDA clusters for distributing application packages.

The Swift container and the pseudo-folder path within it can be configured using the pnda.apps_container and pnda.apps_folder settings in the Salt pillar.
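As a rough illustration of how these two settings combine, the sketch below resolves the Swift location of a package from a pillar-like dictionary. The nested `pnda` key layout, the `package_location` helper and the example package name are assumptions made for illustration; only the setting names and the default values apps and releases come from this guide.

```python
# Hypothetical helper, not part of PNDA. It only illustrates how
# pnda.apps_container and pnda.apps_folder combine into a Swift location.

DEFAULT_APPS_CONTAINER = "apps"     # default container named in this guide
DEFAULT_APPS_FOLDER = "releases"    # default pseudo-folder named in this guide


def package_location(pillar, package_name):
    """Return (container, object_name) for a package stored in Swift."""
    pnda = pillar.get("pnda", {})  # assumed pillar layout: {"pnda": {...}}
    container = pnda.get("apps_container", DEFAULT_APPS_CONTAINER)
    folder = pnda.get("apps_folder", DEFAULT_APPS_FOLDER).strip("/")
    # Swift has no real directories: a pseudo-folder is just a name prefix.
    object_name = f"{folder}/{package_name}" if folder else package_name
    return container, object_name


# Defaults apply when the pillar does not override them.
print(package_location({}, "example-app-1.0.tar.gz"))

# Overridden container and folder (both values are made up).
custom = {"pnda": {"apps_container": "team-apps", "apps_folder": "builds/stable"}}
print(package_location(custom, "example-app-1.0.tar.gz"))
```

Because the folder is only a name prefix, no separate step is needed to create it: uploading an object whose name starts with the prefix is enough.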

Data Archive

Data from PNDA data sets can be archived to Swift automatically. A container must be created in Swift for this purpose. By default, PNDA expects a container called archive, but a different name can be configured using the pnda.archive_container setting in the Salt pillar.
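If you prefer to script the creation of both containers instead of using Horizon, something along the following lines may work. This is a sketch under stated assumptions: the `pnda` pillar layout and the helper names are hypothetical, and `connection` is any object with a `put_container(name)` method, which is the shape offered by a python-swiftclient `Connection`. A small stand-in class is used here so that the example runs without a Swift endpoint.

```python
# Sketch only; nothing here is PNDA code. `connection` is any object exposing
# put_container(name), for example a python-swiftclient Connection.


def required_containers(pillar):
    """Return the container names this guide says must exist before launch."""
    pnda = pillar.get("pnda", {})  # assumed pillar layout
    return [
        pnda.get("apps_container", "apps"),        # application packages
        pnda.get("archive_container", "archive"),  # data archive
    ]


def ensure_containers(connection, pillar):
    """Request creation of each required container; return the names used."""
    names = required_containers(pillar)
    for name in names:
        # Swift normally accepts a PUT for a container that already exists,
        # so repeating this call should be harmless.
        connection.put_container(name)
    return names


class RecordingConnection:
    """Stand-in that records requests instead of talking to Swift."""

    def __init__(self):
        self.created = []

    def put_container(self, name):
        self.created.append(name)


fake = RecordingConnection()
# "pnda-archive" is a made-up override of the default archive container name.
print(ensure_containers(fake, {"pnda": {"archive_container": "pnda-archive"}}))
```

Against a real deployment you would pass an authenticated connection in place of `RecordingConnection`; how that connection is built depends on your OpenStack credentials and is not covered here.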


The resource requirements for the default pico and standard flavor PNDA clusters are detailed below. However, you are strongly encouraged to create a PNDA flavor specifically designed for your infrastructure.


Cluster configuration: Pico

Pico flavor is intended for development / learning purposes. It is fully functional, but does not run the core services in high-availability mode and does not provide much storage space or compute resource.

Role              Instance type   Number required  CPUs  Memory  Total Storage  Root Volume Storage  Log Volume Storage
bastion           ec2.t2.medium   1                2     4 GB    20 GB          20 GB                0 GB
saltmaster        ec2.t2.medium   1                2     4 GB    20 GB          20 GB                0 GB
edge              ec2.m3.xlarge   1                4     15 GB   30 GB          20 GB                10 GB
mgr1              ec2.m3.xlarge   1                4     15 GB   30 GB          20 GB                10 GB
datanode          ec2.c4.xlarge   1                4     7.5 GB  65 GB          20 GB                10 GB
kafka             ec2.m3.large    1                2     7.5 GB  30 GB          20 GB                10 GB
-----------------------------------------------------------------------------------------------------------------------
total                             6                18    53 GB   195 GB

The storage per node is allocated as:

  • 10 GB log volume (not present on bastion or saltmaster). This is provision-time configurable.
  • 20 GB operating system partition. This is configured in the templates per-node.
  • 35 GB HDFS (only on datanode). This is configured in the templates for the datanode.
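The per-node figures in the Pico table follow directly from this allocation. The short sketch below recomputes them; the constants are copied from the table and list above, while the function and variable names are made up for the example.

```python
# Figures copied from the Pico table and storage list above (all in GB).
ROOT_GB = 20   # operating system partition
LOG_GB = 10    # log volume
HDFS_GB = 35   # HDFS volume

NO_LOG_VOLUME = {"bastion", "saltmaster"}  # roles without a log volume
HDFS_ROLES = {"datanode"}                  # roles with an HDFS volume

PICO_ROLES = ["bastion", "saltmaster", "edge", "mgr1", "datanode", "kafka"]


def node_storage_gb(role):
    """Total storage for one node of the given role."""
    total = ROOT_GB
    if role not in NO_LOG_VOLUME:
        total += LOG_GB
    if role in HDFS_ROLES:
        total += HDFS_GB
    return total


per_node = {role: node_storage_gb(role) for role in PICO_ROLES}
cluster_total = sum(per_node.values())

print(per_node)       # per-role totals, matching the Total Storage column
print(cluster_total)  # 195, matching the total row
```

Changing the log volume size at provision time changes only `LOG_GB` in this model, which makes it easy to estimate the effect on the cluster total.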

Cluster configuration: Standard

Standard flavor is intended for meaningful PoC and investigations at scale. It runs the core services in high-availability mode and provides reasonable storage space and compute resource.

Role              Instance type   Number required  CPUs  Memory  Total Storage  Root Volume Storage  Log Volume Storage
bastion           ec2.t2.medium   1                2     4 GB    50 GB          50 GB                0 GB
saltmaster        ec2.m3.large    1                2     7.5 GB  50 GB          50 GB                0 GB
edge              ec2.t2.medium   1                2     4 GB    370 GB         250 GB               120 GB
mgr1              ec2.m3.2xlarge  1                8     30 GB   370 GB         250 GB               120 GB
mgr2              ec2.m3.2xlarge  1                8     30 GB   370 GB         250 GB               120 GB
mgr3              ec2.m3.2xlarge  1                8     30 GB   370 GB         250 GB               120 GB
mgr4              ec2.m3.2xlarge  1                8     30 GB   370 GB         250 GB               120 GB
datanode          ec2.m4.2xlarge  3                8     32 GB   1194 GB        50 GB                120 GB
opentsdb          ec2.m3.xlarge   2                4     15 GB   50 GB          50 GB                0 GB
cloudera-manager  ec2.m3.xlarge   1                4     15 GB   170 GB         50 GB                120 GB
jupyter           ec2.m3.large    1                2     7.5 GB  50 GB          50 GB                0 GB
logserver         ec2.m3.large    1                2     7.5 GB  500 GB         500 GB               0 GB
kafka             ec2.m3.xlarge   2                4     15 GB   270 GB         150 GB               120 GB
zookeeper         ec2.m3.large    3                2     7.5 GB  170 GB         50 GB                120 GB
tools             ec2.m3.large    1                2     7.5 GB  50 GB          50 GB                0 GB
-----------------------------------------------------------------------------------------------------------------------
total                             21               94    352 GB  7.3 TB

The storage per node is allocated as:

  • 120 GB log volume (not present on bastion, saltmaster, jupyter, tools, opentsdb or logserver). This is provision-time configurable.
  • 1024 GB HDFS (only on datanode). This is configured in the templates for the datanode.
  • 50-250 GB operating system partition (500 GB on logserver). This is configured in the templates per-node.
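The total row of the Standard table counts each role once per instance, so the per-instance CPU, memory and storage values are multiplied by the number required. The sketch below reproduces that arithmetic from the table values; the variable names are illustrative only.

```python
# (role, number required, CPUs, memory in GB, total storage in GB),
# copied from the Standard table above.
STANDARD = [
    ("bastion", 1, 2, 4, 50),
    ("saltmaster", 1, 2, 7.5, 50),
    ("edge", 1, 2, 4, 370),
    ("mgr1", 1, 8, 30, 370),
    ("mgr2", 1, 8, 30, 370),
    ("mgr3", 1, 8, 30, 370),
    ("mgr4", 1, 8, 30, 370),
    ("datanode", 3, 8, 32, 1194),
    ("opentsdb", 2, 4, 15, 50),
    ("cloudera-manager", 1, 4, 15, 170),
    ("jupyter", 1, 2, 7.5, 50),
    ("logserver", 1, 2, 7.5, 500),
    ("kafka", 2, 4, 15, 270),
    ("zookeeper", 3, 2, 7.5, 170),
    ("tools", 1, 2, 7.5, 50),
]

# Multiply per-instance values by the instance count before summing.
instances = sum(count for _, count, _, _, _ in STANDARD)
cpus = sum(count * c for _, count, c, _, _ in STANDARD)
memory_gb = sum(count * m for _, count, _, m, _ in STANDARD)
storage_gb = sum(count * s for _, count, _, _, s in STANDARD)

totals = (instances, cpus, memory_gb, storage_gb)
print(totals)
# Dividing by 1024 gives the storage figure in the unit used by the total row.
print(round(storage_gb / 1024, 1))
```

The memory sum comes to 351.5 GB, which the table rounds to 352 GB, and 7452 GB of storage corresponds to the 7.3 TB shown when divided by 1024.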

OpenStack Heat vs. Salt Cloud

The primary way of provisioning a PNDA cluster on OpenStack cloud is with OpenStack Heat.

Alternatively, you can use Salt Cloud to provision a cluster. In this case, you must manually create a salt master instance, and then run salt remotely on that instance. Please refer to this guide for details.


Cluster configuration: Pico

Pico flavor is intended for development / learning purposes. It is fully functional, but does not run the core services in high-availability mode and does not provide much storage space or compute resource.

Role              Instance type   Number required  CPUs  Memory  Storage
bastion           t2.medium       1                2     4 GB    20 GB
edge              m3.xlarge       1                4     15 GB   30 GB
mgr1              m3.xlarge       1                4     15 GB   30 GB
datanode          c4.xlarge       1                4     7.5 GB  65 GB
kafka             m3.large        1                2     7.5 GB  30 GB
------------------------------------------------------------------------
total                             5                16    49 GB   175 GB

The storage per node is allocated as:

  • 10 GB log volume (not present on bastion or saltmaster). This is provision-time configurable.
  • 20 GB operating system partition. This is configured in the templates per-node.
  • 35 GB HDFS (only on datanode). This is configured in the templates for the datanode.

Cluster configuration: Standard

Standard flavor is intended for meaningful PoC and investigations at scale. It runs the core services in high-availability mode and provides reasonable storage space and compute resource.

Role              Instance type   Number required  CPUs  Memory  Storage
bastion           t2.medium       1                2     4 GB    50 GB
saltmaster        m3.large        1                2     7.5 GB  50 GB
edge              t2.medium       1                2     4 GB    370 GB
mgr1              m3.2xlarge      1                8     30 GB   370 GB
mgr2              m3.2xlarge      1                8     30 GB   370 GB
mgr3              m3.2xlarge      1                8     30 GB   370 GB
mgr4              m3.2xlarge      1                8     30 GB   370 GB
datanode          m4.2xlarge      3                8     32 GB   1194 GB
opentsdb          m3.xlarge       2                4     15 GB   50 GB
cloudera-manager  m3.xlarge       1                4     15 GB   170 GB
jupyter           m3.large        1                2     7.5 GB  50 GB
logserver         m3.large        1                2     7.5 GB  500 GB
kafka             m3.xlarge       2                4     15 GB   270 GB
zookeeper         m3.large        3                2     7.5 GB  170 GB
tools             m3.large        1                2     7.5 GB  50 GB
------------------------------------------------------------------------
total                             21               94    352 GB  7.3 TB

The storage per node is allocated as:

  • 120 GB log volume (not present on bastion, saltmaster, jupyter, tools or opentsdb). This is provision-time configurable.
  • 1024 GB HDFS (only on datanode). This is configured in the templates for the datanode.
  • 50-250 GB operating system partition. This is configured in the templates per-node.
