Cloudera Distribution for Hadoop Overview

Cloudera Distribution for Hadoop is the #2 ranked solution in our list of top Hadoop tools. It is most often compared to Amazon EMR.

What is Cloudera Distribution for Hadoop?
Cloudera Distribution for Hadoop (CDH) is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.
Cloudera Distribution for Hadoop Buyer's Guide

Download the Cloudera Distribution for Hadoop Buyer's Guide including reviews and more. Updated: June 2021

Cloudera Distribution for Hadoop Customers
37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC

Pricing Advice

What users are saying about Cloudera Distribution for Hadoop pricing:
  • "When comparing with Oracle, Sybase, and SQL, it's cheaper. It's not expensive."

TG
BI Manager at Discovery Health
Real User
Open-source solution for intelligent data management and analysis

What is our primary use case?

We make recommendations to clients for using different models of this solution to handle data intelligently.

Pros and Cons

  • "Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
  • "The solution does not support multiple languages very well, which means users need to create workarounds to implement some solutions."

What other advice do I have?

I would rate the product, as it currently stands, eight out of ten. The score is not higher because of the workarounds that we have to do for certain models that do not support multiple programming languages. For example, a single notebook is inflexible if you want to use other programming languages. As for other advice for people considering this solution, I would say take a good look at your business need before you decide on this technology and which solution to choose. Make sure that you are not already able to solve for your…
Zjaen Coetzee
Data Management at BCX
Real User
Leaderboard
Offers big data support for analytical applications but the technical support needs improvement

What is our primary use case?

We primarily use it only for big data support for analytical applications.

Pros and Cons

  • "In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
  • "The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."

What other advice do I have?

I would recommend the solution provided that the business case and the technology have been proven. We have found that if you don't address the right business case, you end up buying a technology that doesn't necessarily solve your business problems. I would rate the solution seven out of ten. The main reason for not rating it higher is that the overall support is not great and we've found some limitations. It wasn't mature when we started. It's getting there. It's getting better. The score of seven is mainly due to the support as well as the…
NavneetKaur
Senior Software Engineer at a tech services company with 10,001+ employees
Real User
Performs well and the technical support is helpful, but the upgrade process needs to be consolidated

What is our primary use case?

We are dealing with data from the telecom industry. We were using an Oracle system but our volume has increased. We now have a lot of real-time data that needs to be transformed so that it can be made available and used.

Pros and Cons

  • "The most valuable feature is Impala, the querying engine, which is very fast."
  • "There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."

What other advice do I have?

The suitability of this solution depends on the size of the data that you are going to be working with. If you are going to be working with a huge dataset that contains many gigabytes of data, then this is a good solution. For smaller datasets, you should also consider other technologies. My advice for anybody who is implementing this solution is to take some time to learn it. Beyond that, be sure to contact support if you have any problems because they are very helpful. I would rate this solution a seven out of ten.
MG
Data engineer at a tech services company with 11-50 employees
Real User
Supports a wide range of tools and has a good support community

What is our primary use case?

Our primary use case for this solution is hosting a large amount of data on our platform and carrying out processing, analysis, and related work there.

Pros and Cons

  • "We also really like the Cloudera community. You can have any question and will have your answer within a few hours."
  • "Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. Having a big data environment is not optional for us."

What other advice do I have?

In terms of advice, I would say to focus on what tools are available on the market. In terms of open source, most companies are delivering open-source technologies and providing support for these tools. Now I have the option to purchase a license for whatever platform for $1. I can deliver it with another small company at a lower cost. If I were the decision-maker, I'd invest in open-source tools. Cloudera and all of these companies are trying to adapt to these big data technologies and open-source tools. Cloudera is trying to put it inside their platform so that we can have a compatible…
RS
AD - Associate Director at a financial services firm with 10,001+ employees
Real User
Top 10
Feature rich and scalable with good support, but there are performance issues and the security could be improved

What is our primary use case?

We are using this solution for storing Big Data in one centralized location.

Pros and Cons

  • "The main advantage is that the storage is less expensive."
  • "Currently, we are using many other tools such as Spark and Blade Job to improve the performance."

What other advice do I have?

I am a part of security and software development. We are currently considering migrating to the cloud, and planning on using Microsoft Azure, mainly for the Big Data component. I would rate this solution a five out of ten.
Doron Sela
DBA team manager at a financial services firm with 1,001-5,000 employees
Real User
Helpful to build infrastructure for advanced analytics and is easy to install

What is our primary use case?

I'm part of the IT team at my company, and our primary use case for this solution is building infrastructure for advanced analytics. We copy data from our data warehouse, which is currently our relational database, to the Cloudera Distribution for Hadoop and then analyze it with Python and machine learning.

Pros and Cons

  • "The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation, and from there on the management is very simple and centralized."
  • "I would like to see an improvement in how the solution helps me to handle the whole cluster."

What other advice do I have?

I had a bad experience connecting the Cloudera Distribution for Hadoop cluster to my other resources in the company, like Active Directory or the firewall. I would like the outside environment to be easier to handle. I would rate this solution eight out of ten because it doesn't cover everything. It is a very complicated solution because it contains a lot of internal tools.
Sumit Chaudhuri
Lead Consultant - Product Development at FIS (http://www.fisglobal.com/)
Consultant
Top 5, Leaderboard
We use this solution to use big data for our analyses

What is our primary use case?

Our core product is an insurance product and the actuarial module is quite complex. So far, SMEs have collected data from various sources into Excel sheets and run the analytics through macros, which is a very crude form of analysis. So we decided to use big data for such analysis.
KG
Associate Manager at a consultancy with 501-1,000 employees
Real User
Top 5, Leaderboard
Easy to install, good technical support, and with a single script we can run jobs within minutes

What is our primary use case?

We use this solution to process data. When using SQL Server, you have to build indexes and fine-tune the data. We import the data that is in the SQL source. With a single script, we are able to run the jobs within minutes, which is an advantage. We are using the Power BI model for the business convention. The performance in Power BI will be reduced if you incorporate more calculations, so those calculations are captured and processed in the Hadoop layer.

What needs improvement?

It could be faster and more user-friendly.

For how long have I used the solution?

I have been using this solution for seven months.

What do I think about the stability of the solution?

It's a stable product. I don't see any performance issues.
