Pentaho Data Integration Review
Easy to use and has a nice GUI. The JSON input step needs to perform better.


Valuable Features

Ease of use, stability, the graphical interface, a small amount of "voodoo" and low cost.

Room for Improvement

There are some steps that should perform better, such as the JSON input step, but because of the tool's flexibility we at inflow work around it by using scripting steps. Of course, it's ideal to use the steps that come with the software, but being able to write your own step is powerful. Also, it would be nice to have the drivers for the data sources shipped with Pentaho Kettle instead of having to look for the right ones on the Internet.
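To give a rough idea of the kind of work a scripted replacement for a JSON input step does, here is a minimal sketch. It is not Kettle's API and not the step we actually use; Python is used only for readability, and the names flatten_record and json_to_rows and the sample fields are hypothetical. It parses a JSON document and flattens each nested object into one flat row with dotted field names, which is the shape a downstream step would expect.

```python
import json


def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into a single flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten_record(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is under the dotted key.
            flat[full_key] = value
    return flat


def json_to_rows(text):
    """Parse a JSON object or array of objects into a list of flat rows."""
    data = json.loads(text)
    if isinstance(data, dict):
        data = [data]  # treat a single object as a one-row input
    return [flatten_record(item) for item in data]


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = '[{"id": 1, "customer": {"name": "Ann", "city": "Haifa"}}]'
    for row in json_to_rows(sample):
        print(row)
```

In a real transformation the same logic would live inside one of the tool's scripting steps, written in whatever language that step supports.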

Deployment Issues

In every project there are issues with the deployment, but we were able to overcome them.

Stability Issues

I think Pentaho Kettle is stable software. If it weren't, we wouldn't use it (because we don't like angry customers).

Scalability Issues

Pentaho Kettle comes with the option to scale out, right out of the box.
And no, we didn't encounter any specific scalability problems.

Customer Service and Technical Support

Customer Service:

We mainly use material written over the years (Pentaho Kettle materials), the forum, Matt Casters' blog, videos, etc. We also try to solve our customers' issues inside the company before contacting customer service. We even developed a full-scale Pentaho Kettle data integration online course and built a website for it.

When we do use customer service, it's very good. There is also a large community around the tool, and people gladly help each other.

Technical Support:

Very good support.

Previous Solutions

Before Pentaho Kettle we used stored procedures, hand-written code and also Informatica. Informatica is a very good tool, but it is not open source, so it is far more costly than Pentaho Kettle. From my perspective I don't see the difference: we can do almost everything with Pentaho Kettle, and if we need a little extra, we are tech guys and we solve it.

Of course, from the customer's perspective, the cheaper the better, so a customer with a smaller budget gets more when using the open source Pentaho Kettle. That holds even with the Pentaho Kettle enterprise edition.

Pricing, Setup Cost and Licensing

Speaking from the vendor's perspective, the data integration part (from data source to the warehouse/target) usually takes at least 60% of the whole initial business intelligence project. It depends on the data sources and their complexity, for example: big data, NoSQL, XML, web services, "weird" files and more.

After the data integration project is "live" it will work fine until someone breaks something (network connectivity, servers, a DBA who changes the data source, or any other change to something the data integration was built upon), but this is true for all data integration software.

The day-to-day costs are very low if there are no new requirements. Luckily for us (as a vendor), once the customer starts and the users get their fancy reports and dashboards, there's no turning back and the requirements pile up. But these are new requirements, not maintenance.

Other Advice

Instead of trying to decide on a specific data integration tool, pick the right vendor partner, not a biased one. They will be able to recommend the set of tools you need according to your requirements and budget.

Business intelligence projects are made up of at least three components:

  1. Data integration tool
  2. Data warehouse tool
  3. Visualization tool

Several software vendors offer all three, but not the best solution for each component. From my experience it's better to combine solutions (unless it is a small project).

For example: Pentaho Kettle for data integration; for the data warehouse, an in-memory/columnar database if it's big data, or otherwise a traditional database (SQL Server, Oracle, even MySQL for smaller projects); and a BI visualization tool like Yellowfin/Tableau/Sisense/etc.

In between, there are dozens of software vendors that may suit the customer's needs.

Of course, if the vendor partner is biased, then suddenly Tableau/Sisense/QlikView/etc. becomes the best data integration tool, or "you don't need a data integration tool at all," although they don't have the right components. (They are very good tools for visualization, but not for "playing" with extracting and transforming complex data.) We work with several vendors, such as Sisense and Yellowfin, which are great tools for the specific solution they were made for.

Disclosure: I am a real user, and this review is based on my own experience and opinions.