Pentaho Review
One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities.

Valuable Features

Pentaho is a suite with five main products: Pentaho Data Integration for ETL, Pentaho Business Analytics Server for results delivery, and the development clients Report Designer, Metadata Editor and Schema Workbench.

Pentaho Data Integration's (PDI, formerly Kettle) features and resources are virtually unbeatable, as it can handle everything from the smallest Excel files to the most complex and demanding data loads. It is able to scale from a single desktop computer to lots of nodes, on premises or in the cloud. Not only is it powerful, it is also easy to use. I have never worked with anything else, such as Informatica's PowerCenter or Microsoft's SSIS, but I have always taken the opportunity to ask those who have. By their accounts, PDI is easier to use and achieves more with less effort than those other products.

Then there is the Pentaho BA Server, built to be the linchpin of BI delivery for enterprises. It is built on a scalable, auditable platform able to deliver everything from dashboards and reports to OLAP and custom-made features. It supports background processing, results bursting by e-mail, load balancing (through the native load balancing features of Java web servers such as Tomcat), integration with corporate directory services such as MS Active Directory and LDAP directories, account management and lots of bells and whistles.

The suite's plugin architecture deserves a special remark: both PDI and the BA Server are built to be easily extended with plugins. There are two plugin marketplaces, one for PDI and one for the BA Server, with a good supply of diverse features. If all those plugins are not enough, there are ways to develop your own plugin, either by coding in Java (mostly for PDI) or, for the BA Server, with point-and-click ease using Sparkl, a BA Server plugin for easy development and packaging of new BA Server plugins (although some knowledge of JavaScript, CSS and HTML is needed).

Any company is able to design and deliver a deep and all-embracing BI strategy with Pentaho. Given its relatively low price when set against comparable competition, the most valuable features are the data integration and the results delivery platform.

Improvements to My Organization

I work for the largest government-owned IT enterprise in Brazil, employing over 10,000 people with yearly earnings in excess of half a billion dollars. Designing and delivering timely BI solutions used to be a bogged-down process because everything involved license costs. With Pentaho we were able to better suit our needs and better serve our customers. We use the Community Edition (CE) for our departmental BI needs, and deliver solid service to our customers using paid licenses. Also, being so complete, Pentaho has enabled a whole new level of experimentation and testing. We can completely evaluate a customer need with CE licenses and then deliver the solution at a price, assembling it over Enterprise Edition (EE) licenses. We need paid support for our customers in order to be able to respond to any outage in a timely manner.

Room for Improvement

Pentaho has a solid foundation and decent user interfaces. They are lacking, however, in the tool space for data exploration and presentation. The recent Data Discovery trend put a lot of strain on suppliers of visual data analysis tools, and Pentaho has chosen to strengthen its data integration features, aiming at the growing Big Data and Hadoop market. The work on visual data exploration tools was then mainly left for the community to tackle.

So, there is room for improvement regarding the graphical interface for data exploration and presentation. Please note that there is no lack of decent tools, only that the tools are not as sharp and as beautiful as QlikView, for instance. Pentaho delivers, no question; it just does not please the eye that much.

Use of Solution

I have been using the whole Pentaho suite for nine years. I have also self-published a book on Pentaho and regularly write for my BI/Pentaho blog.

Deployment Issues

Pentaho is such a young product, evolving fast amid rapid company growth, that things are not always bug free. Every new release comes with its share of new bugs. No upgrade was without concerns, although there was never a risk of losing data. Pentaho is simple to an extreme, and we hardly ever find a nasty dependency hurting our deliveries.

The main deployment problems were with LDAP and Apache integration. Quite some knowledge of web server architecture is needed to give a team a smooth delivery experience.

Stability Issues

We did encounter stability issues. Being a data-intensive application, Pentaho is quite sensitive to RAM limitations. Whenever not enough RAM is allocated for it to work, it progressively slows down to a crawl and then to a halt. Plenty of well-managed disk cache and server clustering alleviate it, though.

Scalability Issues

Pentaho scales very well.

Scaling Pentaho Data Integration is a breeze: just set up the machines, configure the slaves and the master, and that is it. One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities, which might be tricky but is manageable nonetheless. The bottom line is that data integration scalability is limited only by the developers' ingenuity in compartmentalizing the data processing, so that parallelization and remote processing pay off on a cluster.
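
The compartmentalization idea above can be illustrated with a minimal Python sketch. It is not PDI code and uses no Pentaho API: the function names, the sample rows and the local thread pool standing in for remote slave nodes are all assumptions made for illustration. What it tries to show is that once rows are split by a key so that no partition needs data from another, each partition can be processed independently and the partial results merged at the end.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_rows(rows, key, n_partitions):
    """Assign each row to a partition using a stable hash of its key field."""
    partitions = defaultdict(list)
    for row in rows:
        bucket = crc32(str(row[key]).encode("utf-8")) % n_partitions
        partitions[bucket].append(row)
    return partitions


def aggregate(rows):
    """Sum 'amount' per customer inside one partition only."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def run_partitioned(rows, n_partitions=3):
    """Process every partition in parallel, then merge the partial results."""
    partitions = partition_rows(rows, "customer", n_partitions)
    # Hypothetical stand-in: local threads play the role of cluster nodes.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(aggregate, partitions.values()))
    merged = {}
    for partial in partials:
        # A customer lives in exactly one partition, so keys never collide.
        merged.update(partial)
    return merged


# Illustrative sample data, not taken from any real load.
rows = [
    {"customer": "A", "amount": 10.0},
    {"customer": "B", "amount": 5.0},
    {"customer": "A", "amount": 2.5},
    {"customer": "C", "amount": 1.0},
]
totals = run_partitioned(rows)
```

Partitioning by the same key used in the aggregation is what makes the merge trivial here. A job whose steps need rows from several partitions at once would gain much less from being spread over a cluster.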

The Pentaho BA Server also scales well, on a quite standard load balancing scheme. Being a regular, well-behaved Java program, the Pentaho BA Server can be clustered on the Java web server, such as JBoss, or in an Apache/Tomcat multi-server load balancing scheme.

This is not a job for the amateur Pentaho administrator, however. In fact, a Pentaho administrator alone will probably have some difficulty achieving server scaling, and would be better off with help from web server clustering professionals.

Customer Service and Technical Support

Customer Service:

My company has been served only by Pentaho's Brazilian representative, whose people are knockout good guys and gals who deliver at any cost! They have even brought in Pentaho technicians from the USA to assess some of our issues. Only kudos to them. I cannot comment on US or European support, but I have no reason to think less of them.

Technical Support:

Technical support is a mixed bag with Pentaho. As previously stated, it is a young product from a young company. The technical support by means of instruction manuals, fora, wikis and the like is quite good. However, the fast growth has left some gaps in the body of documentation.

For instance, I needed to find out how to enable a certain feature in report design. I was not able to find it in the official help guides, but I found a post about it on the project leader's blog. With the correct terminology I was able to search the International Forum, and there lay the answer I needed. So, overall it is good, but it is still on the road to a complete, centralized, well-managed, gapless body of documentation.

Previous Solutions

In fact we are still using the whole lot: MicroStrategy, Business Objects, and PowerCenter. We have not turned off any of those implementations; it is just that Pentaho has clung all around us like a weed. It is so easy to start using and gives results with so little effort that it is almost impossible to use something else. Most of the time, we offer other options only at the customer's request. Otherwise, left to us, we are most likely to propose using Pentaho.

Initial Setup

Hard answer: it was both straightforward and complex. We got up to delivering results in almost no time. However, a sizeable lot of vicious little details kept resisting us: mostly issues with stability, later traced to RAM limitations, and with user management, tied to LDAP integration. Part of those difficulties stemmed from bugs, too, so it was only a matter of time waiting for Pentaho to fix them.

After that the customer kicked in a lot of small changes and adaptations, true to the "since-we-are-at-it" scope-creep spirit (some rightful, some pure fancy), which had us and Pentaho both scratching our heads. In the end we kind of helped them advance some updates in the Server. And we delivered all that was asked.

Implementation Team

We started with our in-house team, and when things started to get too weird or complicated the vendor team landed. After that first baptism of fire we had a couple of hard-boiled ninjas who were able to firefight anything, and the vendor team was sent back home, with praise.


ROI

No ROI for us. The company I work for has no business approach to BI strategy. All we care about, as a company, is making the customer happy, and that comes at the cost of not being able to turn down some unprofitable projects. So, Pentaho is a good tool, capable of delivering millions of dollars in new, recouped or saved revenue, but we are not aiming for that.

Thinking a bit more, the mere fact that we are able to deliver more, and hence take more orders, might be seen as a return on our investment. Yet I can't put an exact number on it, for even this kind of return is a little unclear.

Pricing, Setup Cost and Licensing

Pentaho is cheap, and becomes cheaper as your team masters it. However, it would be a total waste of good dollars to take my word for it. Try it for free and go look for professional support from Pentaho. You can also try to compare other tools with Pentaho, but keep in mind that, apart from SAS, all other tools compete with only a part of Pentaho. So you must assemble a set of different products to compare fully with it.

Let us say you are going to build a standard dimensional data mart to serve OLAP. Pentaho has a single price tag, which must be matched against MicroStrategy PLUS Informatica PowerCenter to make a correct comparison. Matching a Pentaho license price against only one of them will give wrong results.

The Community Edition, a free version, is not short on features when compared to the Enterprise Edition; it is just a bit uglier.

Other Solutions Considered

Pentaho was a totally unknown product back in 2006-2007. We ran several feature comparison sheets. The biggest and most controversial were against Informatica's PowerCenter and MicroStrategy Intelligent Server. Pentaho matched both to some degree, and there were few things it was not able to deliver back then. But, and this is a rather strong but, most of the time Pentaho had to be tweaked to deliver those items. It was a match, all right, but not a finished product by then.

Since that time the suite has evolved a lot and has become more directly comparable with those same products.

Other Advice

Pentaho has huge potential to deliver quite a lot of BI value. But in these days, when BI is regarded as a simple multidimensional analytics tool, it seems a bit bloated and off the mark. That is because Pentaho is not aimed at being flashy and eye-pleasing for a commonplace report monger (reporting is the farthest you can get from BI and still smell like it), and it requires a bit of strategy to allow for ROI. If you are looking for a tool for an immediate, prompt, beautiful remedy, Pentaho might not be your pick. But if you know what you want to accomplish, go on and try it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
