Pentaho Data Integration Review

Needs improvement on the Hadoop and JMS plugins.


Valuable Features:

It allows for rapid prototyping of a wide array of ETL workloads.

Room for Improvement:

Support for common Hadoop utilities could be expanded, for example bulk loading into HBase with composite row keys and out-of-the-box drivers for Impala. A richer interface to Hive would also help: we currently have to go through a raw connection and execute SQL scripts, and some of the syntax is not respected.
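
For readers unfamiliar with the term, a composite row key is a single HBase row key built from several fields. The sketch below is only an illustration in plain Python, not PDI or HBase client code; the field names (`customer_id`, `event_ts_millis`) and the NUL separator are assumptions chosen for the example.

```python
import struct


def composite_row_key(customer_id: str, event_ts_millis: int) -> bytes:
    """Build a hypothetical composite row key: customer ID, NUL, timestamp.

    The timestamp is packed as a big-endian signed 64-bit integer so that,
    for non-negative values, byte-wise ordering matches numeric ordering.
    """
    if event_ts_millis < 0:
        # Negative values would break the byte-wise sort order.
        raise ValueError("event_ts_millis must be non-negative")
    if "\x00" in customer_id:
        # The NUL byte is used as the field separator in this sketch.
        raise ValueError("customer_id must not contain NUL")
    return customer_id.encode("utf-8") + b"\x00" + struct.pack(">q", event_ts_millis)


# Keys for the same customer sort by timestamp when compared as raw bytes.
keys = [composite_row_key("c1", ts) for ts in (256, 1, 2)]
print(sorted(keys) == [composite_row_key("c1", ts) for ts in (1, 2, 256)])
```

A bulk load step that understood keys of this shape would need to know the field order and encodings, which is the kind of support the paragraph above is asking for.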

Version 6 also introduced some new issues that are a bit of an annoyance:

1) Kettle startup: log4j errors.

2) IBM WebSphere MQ Producer: variable substitution for the URL does not work, so you have to hardcode it.

3) shared.xml for DB connections: variable substitution for connection properties does not work, so we have to hardcode things like the Kerberos principal for a Hive/Impala connection.
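
One conceivable way around the hardcoding in items 2 and 3 is to keep a template with `${NAME}` placeholders (the same syntax Kettle uses for variables) and expand it into the real file before Kettle starts. The following is a minimal sketch of that idea and has not been tested against PDI; the function name, the sample fragment, and the `KERBEROS_REALM` variable are all hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Matches ${NAME} placeholders.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_placeholders(template: str, values: dict) -> str:
    """Replace ${NAME} with an XML-escaped value; leave unknown names as-is."""

    def replace(match):
        name = match.group(1)
        if name not in values:
            # Unknown placeholder: keep it untouched so it is easy to spot.
            return match.group(0)
        return escape(str(values[name]))

    return _PLACEHOLDER.sub(replace, template)


# Hypothetical fragment, not the actual shared.xml schema.
fragment = "<attribute>hive/_HOST@${KERBEROS_REALM}</attribute>"
print(expand_placeholders(fragment, {"KERBEROS_REALM": "EXAMPLE.COM"}))
```

The generated file would still contain literal values, so this only moves the hardcoding out of version control; it does not fix the underlying substitution bug.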

Deployment Issues:

We had no issues deploying it.

Scalability Issues:

The robustness of this solution in a production cluster (>30 nodes) remains to be seen.

Which version of this solution are you currently using?

6.1

Disclosure: I am a real user, and this review is based on my own experience and opinions.