Questions & Answers

Store the log data in a database other than HBase

0 votes
asked Apr 27 by mohsenari (120 points)
Hi,

I was wondering whether it is possible to store the log data in a database other than HBase, which is the one mentioned in the documentation. For example, I want to put my data into Elasticsearch.

Thanks

1 Answer

0 votes
answered Apr 28 by James Clarke (1,630 points)
Hi, we have a partially complete initiative to add Elasticsearch to PNDA. A scalable multi-node Elasticsearch cluster can be included with the AWS version of PNDA (see https://github.com/pndaproject/pnda-aws-templates/blob/master/pnda_env_example.yaml#L97). This isn't integrated with the PNDA console or monitoring framework, and it isn't available with the Heat version: the Heat templates haven't had the work to bring up the required infrastructure, although the PNDA Salt install scripts that install ES are done and cross-platform, so they will work on Heat. If you have any development resource and want to add this capability to the Heat templates, or to do any work on the integration with the PNDA console / monitoring framework, that would be very welcome and we could work together on it.
commented Apr 28 by yusong0926 (240 points)
Do you have plans to integrate Cassandra into PNDA?
commented Apr 28 by James Clarke (1,630 points)
Not at this time. We would use HBase in the first instance for applications that require a NoSQL key-value type datastore. You could also run an installation of Cassandra alongside PNDA as a consumer of the data from PNDA. Why did we do that with Elasticsearch, you ask? Because someone really wanted Elasticsearch to be part of PNDA, contributed the code, and we accepted the pull request. Although, as I mentioned, they got distracted halfway through and it isn't finished.
commented Apr 28 by yusong0926 (240 points)
Can you explain a little about the main advantage of running a database inside the platform versus running it as a side process? If I run Cassandra alongside PNDA, can I still use the other data processing inside the platform, like Spark batch and Oozie? I am thinking that if I use Cassandra as storage, the only part of the platform I can use is Kafka, which is no different from running a single Kafka process without PNDA.
commented 3 days ago by joevans (960 points)
We use HBase because it is layered on top of HDFS, so it isn't yet another cluster to manage, scale, and keep redundant and available, with all of the complexity that goes with that. You can use Cassandra alongside PNDA as a store for derived (processed) data and still use all of the other services of PNDA.
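
For the original Elasticsearch question, the same "alongside" pattern could look roughly like the sketch below: a consumer takes already-decoded records and packs them into the newline-delimited body that the Elasticsearch bulk API expects. This is only an illustration, not PNDA code. The function name `to_bulk_body`, the index name `pnda-logs`, and the record fields are all made up for the example, and it assumes the messages have already been read from Kafka and decoded into plain dicts.

```python
import json
from typing import Any, Iterable, Mapping


def to_bulk_body(records: Iterable[Mapping[str, Any]], index_name: str) -> str:
    """Build an Elasticsearch bulk request body (NDJSON) from decoded records.

    Hypothetical helper: each record becomes two lines, an "index" action
    line naming the target index, followed by the document itself.
    """
    lines = []
    for record in records:
        # Action line: tells Elasticsearch to index the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document to store.
        lines.append(json.dumps(dict(record)))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Made-up sample records standing in for decoded log messages.
    sample = [
        {"host": "node-1", "message": "service started"},
        {"host": "node-2", "message": "disk usage high"},
    ]
    print(to_bulk_body(sample, "pnda-logs"), end="")
```

The resulting string would be sent to the cluster's `_bulk` endpoint with content type `application/x-ndjson`. Reading from Kafka and decoding the messages are left out on purpose, since they depend on how your PNDA topics are set up.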
...