Cloudera Distribution for Hadoop Review

It let us move business intelligence reporting from a daily refresh to one every two hours.


Valuable Features:

Faster runtime for batch jobs.

Improvements to My Organization:

Business intelligence reporting improved from daily to every two hours. This satisfied the business stakeholders who had previously preferred to draft reports from the transactional systems because those had the latest data.

The issue with drafting reports from transactional systems is that it produces multiple versions of the truth across the enterprise. With the faster turnaround time, business stakeholders are now adopting the BI systems, which are designed to give a cohesive view of the performance metrics important to them.

Room for Improvement:

Full support for all Spark SQL features, support for SparkR, and Hive compatibility for tables saved from DataFrames.

Cloudera CDH5.5.x does not support SparkR. SparkR, which makes Spark usable from R, would be a great addition, since it would enable fast, near-real-time integration of R models with data feeds.

The Spark SQL functionality for saving a DataFrame as a table in Hive produces a table that is not compatible with Hive. A workaround is to create the Hive table first and then insert the data into it.
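
As a rough illustration of that workaround, here is a minimal Python sketch, not the exact code we run. The helper name save_via_hive_table, the temp table name staging_tmp, and the choice of Parquet storage are all assumptions made for the example. It expects a context object with a sql() method (such as a Spark 1.x HiveContext) and a DataFrame with registerTempTable().

```python
def save_via_hive_table(hive_ctx, df, table, columns, temp_name="staging_tmp"):
    """Create the Hive table first, then insert the DataFrame rows into it.

    hive_ctx: any object with a .sql(str) method (e.g. a Spark 1.x HiveContext)
    df:       any object with .registerTempTable(str) (e.g. a Spark 1.x DataFrame)
    columns:  list of (name, hive_type) pairs, in the DataFrame's column order
    temp_name and the Parquet storage format are illustrative assumptions.
    """
    col_defs = ", ".join("`%s` %s" % (name, hive_type) for name, hive_type in columns)
    col_names = ", ".join("`%s`" % name for name, _ in columns)
    statements = [
        # Step 1: let Hive define the table so its metadata is Hive-readable.
        "CREATE TABLE IF NOT EXISTS %s (%s) STORED AS PARQUET" % (table, col_defs),
        # Step 2: load the DataFrame rows with a plain INSERT ... SELECT.
        "INSERT INTO TABLE %s SELECT %s FROM %s" % (table, col_names, temp_name),
    ]
    # Expose the DataFrame to SQL under a temporary name before running the statements.
    df.registerTempTable(temp_name)
    for stmt in statements:
        hive_ctx.sql(stmt)
    return statements
```

With Spark on the cluster, hive_ctx would typically be a HiveContext built from the SparkContext, and the call would look like save_via_hive_table(hive_ctx, df, "bi.sales", [("id", "INT"), ("amount", "DOUBLE")]), where the table and column names are made up for the example.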

Cloudera CDH5.5.x is a great product. Adopting the features that are not currently supported would make it even better, but their absence by no means subtracts from its desirability.


Other Advice:

Do thorough research and ensure that your use cases and scale do not conflict with the system requirements, and that the features that would make a difference to you are supported.

Disclosure: I am a real user, and this review is based on my own experience and opinions.