Compare Cloudera Distribution for Hadoop vs. Pentaho Data Integration

Find out what your peers are saying about Apache, Cloudera, IBM and others in Hadoop. Updated: May 2021.
501,151 professionals have used our research since 2012.
Quotes From Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:

Pros

Cloudera Distribution for Hadoop:
"The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation, and from there on the management is very simple and centralized."
"The search function is the most valuable aspect of the solution."
"Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
"In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
"The most valuable feature is Impala, the querying engine, which is very fast."
"We also really like the Cloudera community. You can have any question and will have your answer within a few hours."
"The most valuable feature is Kubernetes."

Pentaho Data Integration:
"The solution has a free-to-use community version."
"The amount of data that it loads and processes is good."
"Pentaho Data Integration is quite simple to learn, and there is a lot of information available online."

Cons

Cloudera Distribution for Hadoop:
"I would like to see an improvement in how the solution helps me to handle the whole cluster."
"The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better."
"The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."
"The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. It's not an option actually for us to have a big data environment."
"The price of this solution could be lowered."

Pentaho Data Integration:
"It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable; it causes the system to sometimes hang. I'm not sure if this is the case for paid tiers."
"I would like to see improvements made for real-time data processing."
"I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as 'extreme data processing', i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking."

Pricing and Cost Advice

Cloudera Distribution for Hadoop:
"When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive."
"The price could be better for the product."

Pentaho Data Integration:
"The price of the regular version is not reasonable and it should be lower."
"Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs."

Questions from the Community
Top Answer: There are better solutions out there that have more features than this one.
Top Answer: Hi Rajneesh, yes here is the feature comparison between the community and enterprise edition :… more »
Top Answer: Depends upon the technologies being used. If you're using Oracle for both OLTP and OLAP then you'll get a lot of value from an Oracle solution. The other question is how up to date do you want your… more »
Top Answer: Pentaho Data Integration is quite simple to learn, and there is a lot of information available online.
Ranking

Cloudera Distribution for Hadoop:
Ranking: 2nd out of 22 in Hadoop
Views: 4,936
Comparisons: 3,316
Reviews: 12
Average Words per Review: 401
Rating: 7.7

Pentaho Data Integration:
Ranking: 16th
Views: 9,781
Comparisons: 7,959
Reviews: 3
Average Words per Review: 652
Rating: 7.7
Also Known As
Pentaho Data Integration: Kettle
Overview
Cloudera Distribution for Hadoop (CDH) is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.

Pentaho Data Integration prepares and blends data to create a complete picture of your business that drives actionable insights. The complete data integration platform delivers accurate, "analytics ready" data to end users from any source. With visual tools to eliminate coding and complexity, Pentaho puts big data and all data sources at the fingertips of business and IT users alike.

Sample Customers
Cloudera Distribution for Hadoop: 37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC
Pentaho Data Integration: 66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
Top Industries

Cloudera Distribution for Hadoop
REVIEWERS
Financial Services Firm: 43%
Computer Software Company: 21%
Marketing Services Firm: 14%
Healthcare Company: 7%
VISITORS READING REVIEWS
Computer Software Company: 28%
Comms Service Provider: 17%
Financial Services Firm: 11%
Government: 6%

Pentaho Data Integration
REVIEWERS
Government: 15%
Comms Service Provider: 15%
Healthcare Company: 15%
Financial Services Firm: 15%
VISITORS READING REVIEWS
Computer Software Company: 27%
Comms Service Provider: 20%
Financial Services Firm: 8%
Government: 7%
Company Size

Cloudera Distribution for Hadoop
REVIEWERS
Small Business: 25%
Midsize Enterprise: 22%
Large Enterprise: 53%

Pentaho Data Integration
REVIEWERS
Small Business: 25%
Midsize Enterprise: 25%
Large Enterprise: 50%

Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 12 reviews, while Pentaho Data Integration is ranked 16th in Data Integration Tools with 3 reviews. Both products are rated 7.6. The top reviewer of Cloudera Distribution for Hadoop writes "Open-source solution for intelligent data management and analysis". The top reviewer of Pentaho Data Integration writes "Free to use, easy to set up, and has a great metadata injection feature". Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, Apache Spark, MongoDB and IBM Netezza Performance Server, whereas Pentaho Data Integration is most compared with Talend Open Studio, SSIS, Informatica PowerCenter, Oracle Data Integrator (ODI) and CloverETL.


We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.