Cloudera Distribution for Hadoop Review

For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters. But, it has HBase 1.0 stability issues and processing speed needs improvement.


Valuable Features:

  • Cluster rolling restarts 
  • Cluster wide configuration management

Improvements to My Organization:

For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters. 

We are currently running six production clusters totaling 900+ nodes, and are building three more clusters. Knowing that if someone has some custom configuration on a node that they haven’t communicated out, and that I can ignore that configuration and bring that node into line with where we’ve decided to run the cluster, is very beneficial.

Room for Improvement:

HBase 1.0 stability issues and processing speed is a major area for improvement. Right now, our Cloudera 5 clusters run four to seven times slower than our Cloudera 4 clusters using our storm and kafka topologies, which causes real-time processing to be a major challenge.

CM’s API is very limited and difficult when used on multiple clusters in the same CM instance

Use of Solution:

We've used it for approximately two years. We also use Cloudera Manager, which is 6/10.

Deployment Issues:

No issues encountered.

Stability Issues:

Cloudera 5 is currently very unstable. Between two Cloudera 5 clusters, we have an incident at least twice a week due to what are now outstanding bugs.

Scalability Issues:

It's very easy to deploy and scale as large as you want. Once created on the CM management cluster, is difficult to scale up as needed, as you add more clusters to the same CM instance.

Previous Solutions:

No previous solution was used.

Initial Setup:

We were already running one production cluster with approximately 75 nodes when I joined, so I’m not familiar with what was needed to get the initial production cluster up. Once I joined, I assisted in standing up the additional nodes and clusters using our chef automation.

Implementation Team:

In house via chef automation. Chef, or similar systems, makes it much simpler to stand up large scale clusters. That said, I have not used or evaluated vendor team implementation methods.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Add a Comment
Guest

Sign Up with Email