Why do I need an Avro schema?

Part of the advantage of PNDA is following the baked-in best practices we have set up, so we would recommend using the PNDA schema. Doing so means a Gobblin MapReduce job automatically archives all data from Kafka into HDFS. The Avro schema defined for PNDA contains just enough information to allow the data to be archived in HDFS as a time series, plus a raw bytes data field that can contain any payload you like. This is an application of schema-on-read: the detailed format of the payload is not constrained, but is interpreted at processing time by an application. So I would put your existing data format inside this PNDA envelope schema.
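To make the envelope idea concrete, here is a minimal sketch of wrapping an arbitrary payload and reading it back. It assumes the envelope is a record with the fields timestamp (long), src (string), host_ip (string) and rawdata (bytes), in that order; please check those names and types against the schema file shipped with your PNDA release. The helper names (wrap_event, unwrap_event) are made up for illustration, and the hand-rolled Avro binary encoding is only there to keep the example dependency-free. In a real producer you would normally use an Avro library together with the schema file.

```python
import json

# ASSUMPTION: envelope field order; confirm against your PNDA schema file.
# Avro writes record fields back to back, in schema order, with no field names.

def encode_long(n: int) -> bytes:
    """Avro long: zigzag-encode, then emit 7 bits per byte, low group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # high bit set means more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_bytes(b: bytes) -> bytes:
    """Avro bytes: length as a long, followed by the raw bytes."""
    return encode_long(len(b)) + b

def encode_string(s: str) -> bytes:
    """Avro string: same as bytes, holding the UTF-8 encoding."""
    return encode_bytes(s.encode("utf-8"))

def wrap_event(payload: bytes, src: str, host_ip: str, timestamp_ms: int) -> bytes:
    """Hypothetical helper: put an opaque payload inside the envelope."""
    return (encode_long(timestamp_ms) + encode_string(src)
            + encode_string(host_ip) + encode_bytes(payload))

def decode_long(buf: bytes, pos: int):
    """Inverse of encode_long; returns (value, next position)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos

def decode_bytes(buf: bytes, pos: int):
    """Inverse of encode_bytes; returns (bytes, next position)."""
    n, pos = decode_long(buf, pos)
    return buf[pos:pos + n], pos + n

def unwrap_event(buf: bytes) -> dict:
    """Hypothetical helper: read the envelope, leaving rawdata uninterpreted."""
    ts, pos = decode_long(buf, 0)
    src, pos = decode_bytes(buf, pos)
    host_ip, pos = decode_bytes(buf, pos)
    raw, pos = decode_bytes(buf, pos)
    return {"timestamp": ts, "src": src.decode("utf-8"),
            "host_ip": host_ip.decode("utf-8"), "rawdata": raw}

if __name__ == "__main__":
    # Your existing format (JSON here, purely as an example) goes in rawdata.
    payload = json.dumps({"sensor": "s1", "value": 21.5}).encode("utf-8")
    event = wrap_event(payload, "my-app", "10.0.0.1", 1500000000000)
    # Schema-on-read: the consuming application decides how to parse rawdata.
    print(json.loads(unwrap_event(event)["rawdata"]))
```

The point of the sketch is that only the consuming application ever parses rawdata; everything upstream of it, including the archiving job, only needs the envelope fields.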

Having said all that, you could disable Gobblin, or blacklist the topics that don't use the PNDA schema, then write the application logic to handle your own schema.
