Pentaho Data Integration Review

The user-defined class operator is currently very valuable to me.


Valuable Features:

I would say that the user-defined class operator is currently very valuable to me. Beyond that, native connectivity to Hadoop (MapR), analytical databases, and enterprise systems is really important to me these days.

Improvements to My Organization:

I am a researcher in the field of data integration, and I use this tool as a sandbox. Because it is open source, and because forums and support are readily available, my work has been really easy. Also, the reporting and analysis functionality gives me more freedom to run my test cases and examine the results.

Room for Improvement:

I would like to have more languages/scripts supported in user-defined classes. Right now the options are very limited. I know that if I want to do core programming I can always import my own classes/jars, but it would be really nice to have broader programming language support in the UD classes/operator. Besides that, support for different parallel algorithms/skeletons would be great. For example, the tool could suggest which parallel algorithm I should use on a particular operator or set of operators. It would be really cool to have such functionality.

Other Advice:

If you are looking to integrate unstructured or semi-structured datasets with some parallelization, choose this tool. The parallelization supported by Pentaho Data Integration is really nice to have. You choose which activities you want to parallelize, and that's it. You do not have to write parallel code yourself, as the tool does this job for you, which is awesome for a not-so-good programmer such as myself.

Disclosure: I am a real user, and this review is based on my own experience and opinions.