Cloudera Distribution for Hadoop Review

Performs well and the technical support is helpful, but the upgrade process needs to be consolidated

What is our primary use case?

We work with data from the telecom industry. We were using an Oracle system, but our data volume has grown, and we now have a lot of real-time data that has to be transformed before it can be made available for use.

What is most valuable?

The most valuable feature is Impala, the query engine, which is very fast. We have been able to work with one terabyte of data in less than 20 minutes. This speed makes it easy for us to process all of the incoming data in a timely manner.
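As a rough check on what that figure implies, the sketch below computes the minimum sustained scan rate needed to get through one terabyte in 20 minutes. It assumes decimal units (1 TB = 10^12 bytes) and a single pass over the data. The helper name is hypothetical and is not part of Impala.

```python
# Back-of-envelope throughput implied by "one terabyte in under 20 minutes".
# Assumption: decimal units, 1 TB = 10**12 bytes (with 2**40 bytes the rate is ~10% higher).
TERABYTE_BYTES = 10**12
WINDOW_SECONDS = 20 * 60


def min_throughput_mb_per_s(data_bytes: int, seconds: int) -> float:
    """Hypothetical helper: minimum sustained rate, in MB/s (10**6 bytes),
    needed to process data_bytes within the given number of seconds."""
    return data_bytes / seconds / 10**6


rate = min_throughput_mb_per_s(TERABYTE_BYTES, WINDOW_SECONDS)
print(f"{rate:.0f} MB/s")  # prints "833 MB/s"
```

Because the reviewer reports finishing in less than 20 minutes, about 833 MB/s is a lower bound on the effective rate, not a measured value.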

The support is very good.

All of the data is automatically replicated three times, which protects its integrity if a node fails.
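Triple replication has a direct storage cost: every block is kept three times, so raw disk usage is three times the logical data size. The sketch below shows that arithmetic. It assumes a replication factor of 3 (the HDFS default and the setup described here) and ignores compression and filesystem overhead. The function names are hypothetical.

```python
# Storage arithmetic for block replication.
# Assumption: replication factor 3; compression and filesystem overhead are ignored.
REPLICATION_FACTOR = 3


def raw_storage_tb(logical_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Hypothetical helper: raw disk needed when each block is stored `replication` times."""
    return logical_tb * replication


def usable_capacity_tb(raw_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Hypothetical helper: logical data that fits in a given amount of raw disk."""
    return raw_tb / replication


print(raw_storage_tb(1.0))        # prints 3.0
print(usable_capacity_tb(300.0))  # prints 100.0
```

This overhead is one reason data volume matters when sizing a cluster on commodity hardware.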

What needs improvement?

Storage is an area that could be improved, because the block size is limited to a maximum of one gigabyte.

When we upgrade CDH, many components have to be upgraded separately. It would be helpful if they were bundled together.

For how long have I used the solution?

I have been working with the Cloudera Distribution for Hadoop for around two years.

What do I think about the stability of the solution?

It is a stable solution.

What do I think about the scalability of the solution?

The scalability is good, and it runs on commodity hardware. One problem we have right now is that we are moving a lot of data from our Oracle system, so we are paying for storage twice during our transition to big data.

We use a data lake that stores all of the data in our organization. There are more than 25 projects, with between 25 and 30 people on each one, for a total of almost 1,000 people. All of them depend on this solution.

Most of our users are technicians who have problems to solve using the data available to them. A couple of them are data scientists and the remainder are upper management, who do the analysis.

How are customer service and technical support?

The technical support is very good. Whenever we open a ticket, we get support right away.

Which solution did I use previously and why did I switch?

We did use another solution prior to this one but it could not keep up with our increase in data.

What other advice do I have?

The suitability of this solution depends on the size of the data you will be working with. If you are going to work with a huge dataset containing many gigabytes of data, then this is a good solution. For smaller datasets, you should also consider other technologies.

My advice for anybody who is implementing this solution is to take some time to learn it. Beyond that, be sure to contact support if you have any problems because they are very helpful.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.