Pentaho Business Analytics Overview

Pentaho Business Analytics is the #1 ranked solution in our list of top Cloud Operations Analytics tools. It is most often compared to Knowage: Pentaho Business Analytics vs Knowage

What is Pentaho Business Analytics?

Pentaho is an open source business intelligence company that provides a wide range of tools to help its customers better manage their businesses. These tools include data integration software, data mining tools, dashboard applications, online analytical processing options, and more.

Pentaho has two product categories. The first is the standard enterprise version. This is the product that comes directly from Pentaho itself, with all of the benefits, features, and programs that come along with a paid application, such as analysis services, dashboard design, and interactive reporting.

The alternative is an open source version, to which the public is permitted to add and which it may tweak. This solution has its advantages, aside from the fact that it is free, in that there are many more people working on the project to improve its quality and breadth of functionality.

Pentaho Business Analytics is also known as Pentaho, Kettle, Hitachi Pentaho Business Analytics.

Buyer's Guide

Download the Business Intelligence (BI) Tools Buyer's Guide including reviews and more. Updated: October 2021

Pentaho Business Analytics Customers

Cargo 2000 Lufthansa, Marketo, ModCloth, Cardiac Science, Telefonica, ExactTarget, Active Broadband Networks, and Brussels Airport.

Pentaho Business Analytics Video

Archived Pentaho Business Analytics Reviews (more than two years old)

it_user798240
Identity and Access Management Engineer at a financial services firm with 10,001+ employees
Real User
Easy to install, easy to use, the free edition meets our needs

What is most valuable?

Easy-to-use components for creating jobs.

What needs improvement?

  • Logging capability.
  • Version control would be a good addition.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

A lot of the time, jobs get stuck, causing them to lock up and fail to run until we kill them.
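
When a job does hang, one pragmatic workaround is to wrap the run in a small watchdog that stops it after a time budget. Below is a minimal sketch using the Kettle (PDI) Java API, assuming the PDI libraries are on the classpath; the transformation path and the 30-minute budget are hypothetical.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class TransWatchdog {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            // Hypothetical transformation path.
            Trans trans = new Trans(new TransMeta("/etl/nightly_load.ktr"));
            trans.execute(null);

            long deadline = System.currentTimeMillis() + 30 * 60 * 1000; // 30-minute budget
            while (!trans.isFinished() && System.currentTimeMillis() < deadline) {
                Thread.sleep(5000); // poll every five seconds
            }
            if (!trans.isFinished()) {
                trans.stopAll(); // ask all running steps to stop instead of leaving the run hung
            }
        }
    }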

What do I think about the scalability of the solution?

I have not needed to scale this product so far.

How are customer service and technical support?

There is an open community where you can find good, high-level answers for the free edition. I have not used the commercial version, which includes support.

Which solution did I use previously and why did I switch?

This is the first solution I have used and I like it.

How was the initial setup?

Simple, easy to install.

What's my experience with pricing, setup cost, and licensing?

Free and commercial versions are available.

Which other solutions did I evaluate?

We did not evaluate other options as this one is free.

What other advice do I have?

Good for any size organization. There are other products and vendors available to better handle errors and logging, but for us, the free version of Pentaho is good enough to satisfy our needs.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user736950
Technology Director
Real User
Increases productivity and lowers costs, though should improve the construction of its dashboards

Pros and Cons

  • "I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
  • "Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."

What is most valuable?

I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and mature.

How has it helped my organization?

The first eight years, I used this tool in one company. Now, I have some customers who hire me to give them advice. I have a couple of great customers in my country and they are very satisfied because they have increased productivity and lowered costs.

What needs improvement?

Pentaho, at the general level, should make it much easier to build dashboards and to integrate information from different sources without technical user intervention.

For how long have I used the solution?

For 12 years. I have been using Pentaho CE 6.0 and 7.0. Last year, I implemented Pentaho CE 5.0.

What do I think about the stability of the solution?

I am currently trying out Pentaho 7.0 CE to determine whether it has any issues. I have used Pentaho EE for several years without having issues.

What do I think about the scalability of the solution?

No, it is a very mature tool. It can do anything.

How are customer service and technical support?

Really, I don't know about the support of Pentaho EE. As for the support of Pentaho CE, it is bad. Fortunately, I am highly experienced and use it very little.

How was the initial setup?

To start, the first configurations were very difficult. I started with the CE version, without good documentation or support. I spent years teaching myself.

What other advice do I have?

Hire specialized support for Pentaho. If customers want a professional tool and have the money, they should invest in the enterprise version of Pentaho or hire a highly experienced company in their country that specializes in Pentaho.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user505383
Director at a tech consulting company with 51-200 employees
Consultant
Helps us build decision support systems.

What is most valuable?

Here are the valuable features:

  • Open source solution
  • Fully integrated
  • Customizable
  • Extensible with a large community behind it
  • Has a powerful ETL process (Pentaho Data Integration)

How has it helped my organization?

It helps us build decision support systems. The use of MicroStrategy improved the decision-making process through enterprise report automation and corporate information management.

What needs improvement?

I would like to see better self-service analysis and front-end tools.

The "self-service" feature of BI tools lets the end user (normally not a technical one) navigate data freely, in an unstructured way.

This is exactly the opposite of creating and managing a DW, where data and metadata are known and processed beforehand.

Traditional BI tools are normally more complex and more complete at managing a traditional DW, and in recent years other BI tools (Qlik and Tableau, for example) have appeared to fulfill the self-service need.

For how long have I used the solution?

We have used this solution for five years.

What do I think about the stability of the solution?

There have been no stability issues. There were some minor bugs that were fixed by the manufacturer in their regular patches.

What do I think about the scalability of the solution?

There have been no scalability issues. Pentaho is based on industry standards to build scalable solutions. It’s very simple to scale up, horizontally, and vertically.

How are customer service and technical support?

Technical support is good.

How was the initial setup?

The installation had medium complexity. There is good documentation, but you have to follow certain procedures before using it.

What's my experience with pricing, setup cost, and licensing?

This solution has an open source philosophy. There is a community edition without license costs, although it takes some more time to develop.

There is also an enterprise option that allows you to perform certain tasks easily and includes support.

What other advice do I have?

Have a global and corporate design in mind. However, start with a particular area, small and well-defined.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Pentaho Specialist/Free Software Expert at a tech services company with 10,001+ employees
Consultant
One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities.

What is most valuable?

Pentaho is a suite with five main products: Pentaho Data Integration for ETL, the Pentaho Business Analytics Server for results delivery, and the development clients Report Designer, Metadata Editor, and Schema Workbench.

Pentaho Data Integration's (PDI, formerly Kettle) features and resources are virtually unbeatable, as it can handle everything from the smallest Excel files to the most complex and demanding data loads. It is able to scale from a single desktop computer to lots of nodes, on premises or in the cloud. Not only is it powerful, but it is also easy to use. I have never worked with anything else, like Informatica's PowerCenter or Microsoft's SSIS, but I have always taken the opportunity to ask people who have. From what I hear, PDI is easier to use and achieves more with less effort than those other products.
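
For a sense of how lightweight PDI is to drive, here is a minimal sketch that runs a transformation through the Kettle Java API, assuming the PDI libraries are on the classpath; the .ktr path is hypothetical.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class RunTransformation {
        public static void main(String[] args) throws Exception {
            // Initialize the Kettle runtime once per JVM.
            KettleEnvironment.init();

            // Load a transformation designed in Spoon (hypothetical path).
            TransMeta meta = new TransMeta("/etl/load_sales.ktr");

            Trans trans = new Trans(meta);
            trans.execute(null);       // no command-line arguments
            trans.waitUntilFinished(); // block until every step is done

            if (trans.getErrors() > 0) {
                throw new IllegalStateException("Transformation finished with errors");
            }
        }
    }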

Then there is the Pentaho BA Server, built to be the linchpin of BI delivery for enterprises. It is built on a scalable, auditable platform able to deliver everything from dashboards and reports to OLAP and custom-made features. It supports background processing, bursting results by e-mail, load balancing (through the native load-balancing features of Java web servers such as Tomcat), and integration with corporate directory services such as MS Active Directory and LDAP, with account management and lots of bells and whistles.

The suite's plugin architecture deserves a special remark: both PDI and the BA Server are built to be easily extended with plugins. There are two plugin marketplaces, one for PDI and one for the BA Server, with a good supply of diverse features. If all those plugins are not enough, there are means to develop your own plugin, either by coding in Java (mostly for PDI) or, for the BA Server, with point-and-click ease using Sparkl, a BA Server plugin for easy development and packaging of new BA Server plugins (though some JavaScript, CSS, and HTML is needed).

Any company is able to design and deliver a deep and embracing BI strategy with Pentaho. Given its relatively low price when set beside comparable competition, the most valuable features are the data integration and the results delivery platform.

How has it helped my organization?

I work for the largest government-owned IT enterprise in Brazil, employing over 10,000 people with yearly earnings in excess of half a billion dollars. Designing and delivering timely BI solutions used to be a bogged-down process because everything involved license costs. With Pentaho we were able to better suit our needs and better serve our customers. We use the CE for our departmental BI needs, and deliver solid service to our customers using paid licenses. Also, in being so complete, Pentaho has enabled a whole new level of experimentation and testing. We can completely evaluate a customer need with CE licenses and then deliver the solution at a price, assembling it over EE licenses. We need paid support for our customers in order to be able to answer any outage in a timely manner.

What needs improvement?

Pentaho has a solid foundation and decent user interfaces. They are lacking, however, in the tool space for data exploration/presentation. The recent Data Discovery trend put a lot of strain on suppliers of visual data analysis tools, and Pentaho chose to strengthen its data integration features, aiming for the growing Big Data and Hadoop market. The work on visual data exploration tools was then mainly left for the community to tackle.

So, there is room for improvement regarding the graphical interface for data exploration and presentation. Please note that there is no want of decent tools, only that the tools are not as sharp and as beautiful as QlikView, for instance. Pentaho delivers, no question; it just does not please the eye as much.

For how long have I used the solution?

I have been using the whole Pentaho suite for nine years. I have also self-published a book on Pentaho and regularly write for my BI/Pentaho blog.

What was my experience with deployment of the solution?

Being such a young product, experiencing fast evolution and rapid company growth, things are not always bug free. Every new release comes with its share of new bugs. Upgrades were not without concerns, although there was never a risk of losing data - Pentaho is simple to an extreme, and we hardly ever find a nasty dependency hurting our deliveries.

The main deployment problems were with LDAP and Apache integration. Quite a bit of knowledge of web server architecture is needed to give a team a smooth delivery experience.

What do I think about the stability of the solution?

We did encounter stability issues. Being a data-intensive application, Pentaho is quite sensitive to RAM limitations. Whenever not enough RAM is allocated for it to work, it progressively slows down to a crawl and then to a halt. Lots of well-managed disk cache and server clustering alleviate it, though.

What do I think about the scalability of the solution?

Pentaho scales really very well.

Scaling Pentaho Data Integration is a breeze: just set up the machines, configure the master and slaves, and that is it. One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities, which might be tricky but is easy nonetheless. The bottom line is that data integration scalability is limited only by the developers' ingenuity in compartmentalizing data processing, so that parallelization and remote processing become profitable for clustering.
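
Once the Carte master and slaves are up, a quick sanity check is to poll a server's status page over HTTP; the /kettle/status servlet and the xml=Y flag are standard Carte features. A minimal sketch, where the master host, port, and credentials are hypothetical and should be whatever your Carte configuration defines.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    public class CarteStatusCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical Carte credentials; use the ones from your Carte configuration.
            String auth = Base64.getEncoder().encodeToString("cluster:cluster".getBytes());

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://pdi-master.example.com:8080/kettle/status/?xml=Y"))
                    .header("Authorization", "Basic " + auth)
                    .GET()
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // The status document lists the transformations and jobs the server knows about.
            System.out.println(response.body());
        }
    }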

The Pentaho BA Server also scales well, on a quite standard load-balancing scheme. Being a regular and well-behaved Java program, the Pentaho BA Server can be clustered on a Java web server, like JBoss, or in an Apache/Tomcat multi-server load-balancing scheme.

It is not something for an amateur Pentaho administrator to do, however. In fact, a Pentaho administrator working alone will probably have a degree of difficulty achieving server scaling, and would be better off having help from web server clustering professionals.

How are customer service and technical support?

Customer Service:

My company has been served only by Pentaho's Brazilian representative, who are knockout good guys and gals and deliver at any cost! They have even brought in Pentaho technicians from the USA to assess some of our issues. Only kudos to them. I cannot opine on US or European support, but I have no reason to think less of them.

Technical Support:

Technical support is a mixed issue with Pentaho. As previously stated, it is a young product from a young company. Technical support by means of instruction manuals, forums, wikis, and the like is quite good. However, the fast growth has left some gaps in the documentation body.

For instance, I needed to find out how to enable a certain feature in report designing. I was not able to find it in the official help guides, but on the project leader's blog I found a post talking about it. With the correct terminology I was able to look for it in the international forum, where the answer I needed was lying. So, overall it is good, but it is still on the road to a complete, centralized, well-managed, gapless documentation body.

Which solution did I use previously and why did I switch?

In fact, we are still using the whole lot: MicroStrategy, Business Objects, and PowerCenter. We have not turned off all those implementations; it is just that Pentaho sprang up all around us like a weed - it is so easy to start using and gives results with so little effort that it is almost impossible to use something else. Most of the time, we offer other options only at the customer's request. Otherwise, left to us, we are most likely to propose using Pentaho.

How was the initial setup?

Hard answer: both. We got up to delivering results in almost no time. However, a sizeable lot of vicious little details kept resisting us - mostly issues with stability, later associated with RAM limitations, and user management, tied to LDAP integration. Part of those difficulties stemmed from bugs, too, so it was only a matter of time waiting for Pentaho to fix them.

After that, the customer kicked in a lot of small changes and adaptations, true to the "since-we-are-at-it" scope-creep spirit (some rightful, some pure fancy), which had us and Pentaho scratching our mutual heads. In the end we kind of helped them advance some updates in the Server, and delivered all that was asked.

What about the implementation team?

We started with our in-house team, and when things started to get too weird or complicated, the vendor team landed in. After that first baptism by fire we got a couple of hard-boiled ninjas who were able to firefight anything, and the vendor team was sent back home, with praise.

What was our ROI?

No ROI for us. The company I work for has no business approach to its BI strategy. All we as a company care about is making the customer happy, and that comes at the cost of not letting us turn down some unprofitable projects. So, Pentaho is a good tool, capable of delivering millions of dollars in new/recouped/saved revenue, but we are not positioned for that.

Thinking a bit more, the mere fact that we are able to deliver more, and hence take more orders, might be seen as a return on our investment. Yet I can't put an exact number on it, for even this kind of return is a little unclear.

What's my experience with pricing, setup cost, and licensing?

Pentaho is cheap, and becomes cheaper as your team masters it. However, it would be a total waste of good dollars to take my word for it. Try it for free and go look for professional support from Pentaho. You can also try to compare other tools with Pentaho, but keep in mind that, apart from SAS, all other tools compete with only a part of Pentaho. So you must assemble a set of different products to fully compare with it.

Let us say you are going to build a standard dimensional data mart to serve OLAP. Pentaho has a single price tag, which must be matched against MicroStrategy plus Informatica PowerCenter to make for a correct comparison.

The Community Edition, a free version, is not short on features when compared to the Enterprise Edition, it is just a bit uglier.

Matching a Pentaho license price against only one of them will give the wrong result.

Which other solutions did I evaluate?

Pentaho was a totally unknown product back in 2006-2007. We ran several feature comparison sheets. The biggest and most controversial were against Informatica's PowerCenter and MicroStrategy Intelligence Server. Both were matched by Pentaho to some degree, and there were few things Pentaho was not able to deliver then. But, and this is a rather strong but, most of the time Pentaho had to be tweaked to deliver those items. It was a match, all right, but not a finished product by then.

Since that time the suite has evolved a lot and became more head to head comparable with the same products.

What other advice do I have?

Pentaho has a huge potential to deliver quite a lot of BI value. But in these days, when BI is regarded as a simple multidimensional analytics tool, it seems a bit bloated and off the mark. That is because Pentaho is not aimed at being flashy and eye-pleasing for the commonplace reporting monger (reporting is the farthest you can get from BI and still smell like it), and it requires a bit of strategy to allow for ROI. If you are looking for an immediate, prompt, beautiful remedy, Pentaho might not be your pick. But if you know what you want to accomplish, go on and try it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user414204
Reporting and Big Data Analyst (BI Development and Data Science) at an energy/utilities company with 10,001+ employees
Vendor
It has helped us create great visualizations at practically no cost, since we built an end-to-end open source architecture.

What is most valuable?

  • Open Source Community Edition feature - helped us a lot with our budget in the beginning of our project
  • Ability to produce different types of visualizations (on par with Tableau) - map, pie, bar, heat map, table etc. Several other different options like exporting to CSV, downloading a table and so on
  • Ability to integrate with different databases seamlessly - Hadoop, MySQL etc
  • Ability to highly customize the dashboards to the developer's preference.

How has it helped my organization?

It helped us create great visualizations at practically no cost, since we built an end-to-end open source architecture. This has helped us gain business insights. We have also learned a lot from the development perspective, since this is more of a developer's tool than tools like Tableau/QlikView, which are pretty much automated.

What needs improvement?

There is a lot of room for improvement (for the Community Edition). We are a big telecoms company and we deal with a lot of data, approximately two million per day, and it is too slow in rendering it. Also, certain features aren't available in the Community Edition, like geo maps, which we made possible through intensive coding.

The Enterprise Edition is too pricey. I must warn you that to use Pentaho, the developer must have good knowledge of JavaScript, HTML, CSS, and advanced SQL concepts.

For how long have I used the solution?

We've used it for a year.

What was my experience with deployment of the solution?

Deployment is an issue only if you have a very specific ask of the system.

What do I think about the stability of the solution?

With a complex and large volume database stability becomes an issue.

What do I think about the scalability of the solution?

With a complex and large volume database scalability becomes an issue.

How are customer service and technical support?

Pentaho has good community support - forums.pentaho.com - and there are a lot of other forums.

Which solution did I use previously and why did I switch?

We were using Tableau without many problems at all; it is a great tool. We switched to Pentaho to help with our budget at that time, and also for its web application capability, as anyone with the right permissions can access the dashboards with just a link, and they don't need a license to do so.

How was the initial setup?

It is easily downloadable from the website. Tutorials are available on YouTube for installing Pentaho, but it becomes difficult when your requirements are specific, such as incorporating Hadoop, for which there is no tutorial available. Also, when you encounter a problem, there is not much support you can expect, but a good cloud team can definitely help resolve the installation issues.

What about the implementation team?

It was an in-house implementation. Pentaho is easily downloadable/installable but not very easily customizable. We had a lot of problems before we could have a working version of Pentaho. The JAR files need to be specific. We needed people on our team who are good at building VMs/cloud computing to implement the server version of Pentaho. We cannot ignore the importance of a maintenance team to cope with any ongoing problems (expect a bunch).

What's my experience with pricing, setup cost, and licensing?

I know that the price is very high for Pentaho Enterprise Edition. I think it's $250,000 per year. It's worth it only if you need a web application with features like Tableau's. If you are going to buy something, I would suggest Tableau. It depends on your needs and how you plan to collect the data, but if you can manage with the features that are available in the Community Edition, then Pentaho is a good option.

Which other solutions did I evaluate?

We chose Pentaho over five other Big Data supporting BI solutions.

What other advice do I have?

Research it and know your needs.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user409920
Product Analyst at a recruiting/HR firm with 51-200 employees
Vendor
It enabled seamless data movement from one system to another. Debugging and logging need to improve.

What is most valuable?

Transformations. The wide range of transformations available in the product suite enables me to do data cleaning, transformation, and mapping. I have used Pentaho mostly for ETL purposes.

How has it helped my organization?

Pentaho was used for data massaging in a system integration project. It enabled seamless data movement from one system to another. The biggest advantage of using this tool was that it supports a wide range of input and output formats.

What needs improvement?

Debugging and logging need to improve. Debugging does not always give an accurate picture of what is going wrong with the jobs or transformations.

For how long have I used the solution?

I've been using it for one year.

What was my experience with deployment of the solution?

We've had no issues deploying it.

What do I think about the stability of the solution?

It's been stable.

What do I think about the scalability of the solution?

Data type mismatch issues cause a lot of errors in the transformations, which I believe at times makes it difficult to scale. For example, in many transformations you have to specify the data type of the data, and if the input file changes the data type, these transformations also need to be updated. Hence, one data type change in the input file has cascading effects.

How are customer service and technical support?

I've never had to use it, but they do have a strong online community.

Which solution did I use previously and why did I switch?

There was no previous solution in place.

How was the initial setup?

It was straightforward, as there was not much configuration involved.

What about the implementation team?

I was on the vendor team that implemented this solution for the client.

What was our ROI?

It's difficult to predict the ROI as it was just a component of the system.

What other advice do I have?

The tool is easy to implement, but it needs technical acumen to do so. In other words, although it looks like a simple drag-and-drop kind of tool, it can be fairly complex.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Researcher/Data Scientist at a tech services company with 51-200 employees
Consultant
The data integration feature is the most valuable feature for us.

Valuable Features:

The data integration feature is the most valuable feature for us.

Improvements to My Organization:

We've used Pentaho since 2007 in many open source business intelligence projects.

Room for Improvement:

The dashboards and reports could use more improvement to show more data and to provide more analysis.

Use of Solution:

We've used it since 2007.

Deployment Issues:

We haven't had any issues with deployment.

Stability Issues:

We haven't had any issues with stability.

Scalability Issues:

We haven't had any issues with scalability.

Initial Setup:

It's straightforward and we haven't really encountered complexities in the initial setup that we can't handle.

Disclosure: My company has a business relationship with this vendor other than being a customer: Consultants
ITCS user
Senior Consultant at a consumer goods company with 1,001-5,000 employees
Consultant
The Data Integration graphical drag and drop design is easy for new users to follow and can increase productivity.

Valuable Features

Pentaho Business Analytics platform overall is an outstanding product that offers great cost saving solutions for companies of all sizes. The Pentaho Business Analytics platform is built on top of several underlying open source projects driven by the community’s contributions. There are several features that I find invaluable and with each release, improvements are made.

The Pentaho User Console provides a portal for users that makes it easy for users to explore information interactively. Dashboard Reporting, scheduling jobs, and managing data connections are some of the features that are made easy with the console. For more advanced users you can extend Pentaho Analyzer with custom visualizations or create reporting solutions with Ctools. The Marketplace empowers the community to develop new and innovative plugins and simplifies the installation process of the plugins for the users of the console. The plugin framework provides a plugin contributor that extends the core services offered by the BI Server.

Pentaho Data Integration (Spoon) is another valuable tool for development. Spoon delivers powerful extraction, transformation, and load capabilities using a metadata approach. The Data Integration graphical drag-and-drop design is easy for new users to follow and can increase productivity. More advanced users can extend Pentaho Data Integration by creating transformations and jobs dynamically, as sketched below.
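
As a rough illustration of that dynamic use, one common pattern is to parameterize a template transformation and set its variables from code before execution. Here is a minimal sketch using the Kettle Java API, in which the template path, variable names, and values are all hypothetical and the PDI libraries are assumed to be on the classpath.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class ParameterizedRun {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();

            // Hypothetical template that reads ${INPUT_FILE} and writes to ${TARGET_TABLE}.
            TransMeta meta = new TransMeta("/etl/templates/generic_load.ktr");
            Trans trans = new Trans(meta);

            // Drive the same design with different inputs at runtime.
            trans.setVariable("INPUT_FILE", "/data/incoming/customers.csv");
            trans.setVariable("TARGET_TABLE", "stg_customers");

            trans.execute(null);
            trans.waitUntilFinished();
            System.out.println("Finished with " + trans.getErrors() + " error(s)");
        }
    }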

Improvements to My Organization

My company was able to reduce software costs and hire additional staff given the cost savings that Pentaho provided. We are moving towards a Hadoop environment after the migration of our current ETL processes and Pentaho’s easy to use development tools and big data analytics capabilities were a factor in choosing Pentaho as a solution.

Room for Improvement

For those that run the open source Community Edition, at times it can be difficult to find updated references for support. Even for companies that use the Enterprise Edition, finding useful resources when a problem occurs can be difficult. Pentaho-driven best practices should be made available to both Community and Enterprise users to motivate and empower more users to use the solutions effectively.

Customer Service and Technical Support

Pentaho has stellar support services with extremely intelligent Pentaho and Hitachi consultants all over the world. Those support services and documentation are made available to Enterprise clients that have purchased the Enterprise Edition and have access to the support portal.

Initial Setup

Pentaho is easy to deploy, easy to use and maintain. It’s low cost and a fully supported business intelligence solution. I have used Pentaho in small and large organizations with great success.

Pricing, Setup Cost and Licensing

Enterprise licenses can be paid for the Enterprise Pentaho full service solution which offers support through the portal and access to Pentaho/Hitachi Consultants for additional costs.

Other Advice

Pentaho offers a Community Edition, which is an open source solution and can be downloaded for free. The Community Edition truly gives most companies everything they need, but match your solution needs with your business needs. As a cost-cutting option, Enterprise license fees can be paid to the vendor to fund support when it is needed.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Engineer at a tech company with 51-200 employees
Vendor
It’s user-friendly when using it for a small dataset.

Valuable Features

My company embeds the following features into our product -

  • BI server
  • Analyzer report
  • Interactive report
  • Dashboard

Improvements to My Organization

We can demonstrate a better report UI and vivid experience for our customers.

Room for Improvement

The OEM license pricing is quite expensive. The authentication and authorization parts should be enhanced.

Use of Solution

We've been using the BA Server Enterprise Edition for three years.

Deployment Issues

Most of our customers have a complex authentication environment. Our customers found it difficult to configure flexibly.

Stability Issues

There were no issues with the stability.

Scalability Issues

We have had no issues scaling it.

Customer Service and Technical Support

Customer Service:

7/10

Technical Support:

7/10

Initial Setup

It's kind of complex if the user wants to implement LDAP authentication, such as defining the scope and filters.
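
To make the scope-and-filter point concrete, here is a generic JNDI sketch (not Pentaho's actual configuration format) of the kind of search base, scope, and filter an LDAP integration has to get right; the directory URL, bind account, base DN, and filter are all hypothetical.

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class LdapUserLookup {
        public static void main(String[] args) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389");
            env.put(Context.SECURITY_AUTHENTICATION, "simple");
            env.put(Context.SECURITY_PRINCIPAL, "cn=bind-user,dc=example,dc=com");
            env.put(Context.SECURITY_CREDENTIALS, "secret");
            DirContext ctx = new InitialDirContext(env);

            // The scope and the filter are the parts that usually need tuning.
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            String filter = "(&(objectClass=person)(sAMAccountName=jdoe))";

            NamingEnumeration<SearchResult> results =
                    ctx.search("ou=users,dc=example,dc=com", filter, controls);
            while (results.hasMore()) {
                System.out.println(results.next().getNameInNamespace());
            }
            ctx.close();
        }
    }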

Implementation Team

We implemented it ourselves and then embedded it into our product. It’s not easy to separate out the different components one needs.

Pricing, Setup Cost and Licensing

I don't know much, but I have heard that the price is too high.

Other Solutions Considered

We are looking for a different solution but have not determined one yet.

Other Advice

It’s a comprehensive BA tool including ETL and reporting. It’s easy to use with a small dataset, but you will need to dig deep to optimize the performance when the data set is huge.

Disclosure: My company has a business relationship with this vendor other than being a customer: We're an OEM partner
ITCS user
Senior Software Engineer at a tech services company with 5,001-10,000 employees
Real User
CDE dashboards are valuable. It's not flexible for an enterprise database.

Valuable Features

  • Spoon
  • Schema Workbench
  • CDE dashboards
  • Client budget

Improvements to My Organization

  • License cost has come down
  • Easy maintenance

Room for Improvement

It's not flexible for an enterprise database.

Use of Solution

I've used it for four and a half years.

Deployment Issues

Deployment is easy but upgrading is an issue.

Stability Issues

There are no issues with the stability.

Scalability Issues

There have been no issues with scaling it.

Customer Service and Technical Support

Customer Service:

7/10

Technical Support:

7/10

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user394440
Data Scientist at a tech services company with 501-1,000 employees
Consultant
It became a lot easier for our developers to switch between or join the different development projects.

What is most valuable?

I found Pentaho Data Integration the most valuable component, since it is the most mature open-source ETL tool available. Compared to other proprietary products, it has a less steep learning curve due to its very intuitive user interface. Besides that, it has a pluggable architecture, which makes it quite easy to extend with custom functionality and features.

Another thing worth mentioning is the very active user community around the products which provide some great resources for community support.

How has it helped my organization?

As for the data integration part, each development team was writing its own integration scripts, parsers, and interfaces from scratch on each different project, over and over again. With Pentaho Data Integration, which offers all these common tasks out of the box, we reduced development time significantly. Also, by using such a universal tool and introducing a uniform architecture, it became a lot easier for our developers to switch between and/or join the different development projects.

Also, on the business intelligence side, we moved from developing custom solutions on each track to using the standard functionality of the BI server, thus cutting down both complexity and development time.

What needs improvement?

Since most of our projects start off as a proof of concept with the Community Edition of the products, we found that the differences between the Community and Enterprise Editions are too big on certain levels. It would be a big gain if the Community Edition were a full representation of the Enterprise Edition, making it easier to move on to the Enterprise Edition and support.

For how long have I used the solution?

I started using Pentaho Data Integration around seven years ago and moved on to the full stack about five years ago.

What was my experience with deployment of the solution?

I have seen many different (custom build) deployment solutions for Pentaho throughout the years each having their own pros and cons.

What do I think about the stability of the solution?

We've had no issues with its stability.

What do I think about the scalability of the solution?

Since Pentaho supports running as a single process to a clustered architecture and has a big focus on big data (distributed) environments, scalability hasn't been an issue for us.

How are customer service and technical support?

The open source strategy of Pentaho has resulted in a very active community which provided us all the support we need. Compared to other big vendors my personal experience is that response times are a lot shorter.

Which solution did I use previously and why did I switch?

Most of our previously used solutions were custom built. We have evaluated both open-source and proprietary competitive products but found that Pentaho was the easiest to adopt.

How was the initial setup?

Depending upon the solution's nature, the initial setup for a basic data warehouse architecture is quite straightforward. But as with all solutions, as the landscape grows and user requirements evolve, the complexity increases. I think that Pentaho suits today's demand for a continuous integration approach well. With this in mind, the initial setup is crucial, so that you don't find yourself spending a lot of time and effort refactoring the complete solution over and over again.

What about the implementation team?

We implemented it in-house. Keep your development and implementation cycles short and small if possible. Users demand fast implementation of requirements, so the continuous integration approach becomes more crucial, as does self-service functionality. The latter is not yet the strongest use case for Pentaho.

What was our ROI?

The decrease in development time, compared to our traditional development cycles with pure enterprise Java solutions, should be estimated at around 60%.

What's my experience with pricing, setup cost, and licensing?

Unfortunately, I can't provide any exact figures. But using the Community Edition for the development and test cycles brings down the licensing costs for the complete OTAP (development, test, acceptance, production) street.

What other advice do I have?

As mentioned before, there is a great community of users, developers and other enthusiasts which I recommend to consult for your particular use-case. Check the latest Gartner report (2016) about BI vendors and ultimately visit one of the Pentaho Community Meetups to get more insight.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user54702
Sr. Business Intelligence Engineer at a tech services company with 1,001-5,000 employees
Consultant
We were able to configure our data warehouse and implement more than eight data cubes in a very short period of time.

What is most valuable?

Pentaho Data Integration - even though this is the community edition, it is very powerful compared to other, expensive products like MS SQL, SSIS, or Informatica.

How has it helped my organization?

We were able to configure our data warehouse and implement more than eight data cubes in a very short period of time.

What needs improvement?

It needs to improve scheduling for the integration aspect. Pentaho Mondrian (cubes) should allow snowflake schemas.

For how long have I used the solution?

We've been using the following tools within Pentaho for two years:

  • Pentaho Data Integration 5.3.0.0-213
  • Pentaho Aggregation Designer 2008-2012
  • Pentaho Schema Workbench 3.6.1
  • Pentaho User Console (Mondrian) 5.4.0.1.130

What was my experience with deployment of the solution?

We had no deployment issues.

What do I think about the stability of the solution?

We had no issues with the stability.

What do I think about the scalability of the solution?

We have had no issues scaling it.

How are customer service and technical support?

We haven't used tech support, but the forums are an excellent source of help.

Which solution did I use previously and why did I switch?

I have used Microsoft and Informatica. We chose Pentaho because it is open source.

How was the initial setup?

I believe the software installation (client or server) is very straightforward.
However, complexity occurs when tuning properties for the Mondrian cubes.

What about the implementation team?

We did an in-house implementation, as it was fully customized.

What other advice do I have?

Pentaho is a good choice for Unix/Linux environments. The integration tool is quite good. The configuration for cubes is quite simple (XML), and it is very simple to configure dimensions, fact data, and calculated data, although we use a Mondrian emulator (Sinatra) for our solution implementation.
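
For context, once a cube schema like that is in place it can be queried from Java through olap4j with Mondrian's driver. A minimal sketch, in which the warehouse JDBC details, schema path, and the cube, dimension, and measure names are all hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.olap4j.CellSet;
    import org.olap4j.OlapConnection;

    public class MondrianQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
            // The Catalog property points at the Mondrian cube definition (the XML mentioned above).
            Connection connection = DriverManager.getConnection(
                    "jdbc:mondrian:"
                    + "Jdbc=jdbc:postgresql://dw-host/warehouse;"
                    + "JdbcUser=report;JdbcPassword=secret;"
                    + "Catalog=file:/etc/pentaho/sales_schema.xml");
            OlapConnection olap = connection.unwrap(OlapConnection.class);

            // MDX against a hypothetical cube with a Time dimension and a Sales measure.
            CellSet cells = olap.createStatement().executeOlapQuery(
                    "SELECT {[Measures].[Sales]} ON COLUMNS, "
                    + "{[Time].[Year].Members} ON ROWS "
                    + "FROM [Sales]");
            System.out.println(cells.getCell(0).getFormattedValue());
            olap.close();
        }
    }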

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Programmer Analyst at a tech company with 10,001+ employees
Real User
Big Data plugins and connectors are available by default.

Valuable Features:

We use both DataStage and Pentaho, but I don't see any special features in Pentaho that are not available in DataStage, except the Big Data plugins and connectors. These come by default in Pentaho, and those who have expertise in Java and JavaScript can write their own custom code, or enhance it and call it from Pentaho.
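
The usual place for that custom Java inside PDI is the User Defined Java Class step. Below is a rough sketch of the kind of processRow() body pasted into that step (the step compiles it at runtime); the email and email_domain field names are hypothetical, and the output field still has to be declared on the step's Fields tab.

    // Body of a PDI "User Defined Java Class" step; called once per incoming row.
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
        Object[] r = getRow();          // read one row from the previous step
        if (r == null) {                // no more input
            setOutputDone();
            return false;
        }
        r = createOutputRow(r, data.outputRowMeta.size());

        String email = get(Fields.In, "email").getString(r);
        // Derive a new output field from the input value.
        get(Fields.Out, "email_domain").setValue(r, email.substring(email.indexOf('@') + 1));

        putRow(data.outputRowMeta, r);  // pass the row on to the next step
        return true;
    }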

Improvements to My Organization:

When we wanted to pursue this tool, the major factor was the amount of money we could save. Other than that, there were no special use cases for this product, as it was not that mature when we started using it in 2012, compared to other ETL tools.

Room for Improvement:

It's still not user friendly or robust enough for our needs.

Deployment Issues:

There were issues with the deployment.

Stability Issues:

There have been stability issues.

Scalability Issues:

We've had issues scaling it for what we need.

Other Advice:

Since there are very few skilled people available on the market, we had to take a risk with the project.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Founder and Business Intelligence Consultant at Know Solutions
Consultant
It helps me to create new BI environments for my customers.

Valuable Features

  • Business analytics
  • CDE
  • Flexible dashboard development
  • Cube

Improvements to My Organization

I use Pentaho every day in my company, creating new BI environments for my customers.

Room for Improvement

The ad-hoc reporting needs to be improved.

Deployment Issues

There were no deployment issues.

Stability Issues

There have been no issues with the stability.

Scalability Issues

There have been no issues with the scalability.

Customer Service and Technical Support

I've never had to use technical support.

Initial Setup

It was easy, and only took a few minutes.

Other Advice

You should always check the community version first as it may be better for you.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Programmer Analyst at a tech vendor with 51-200 employees
Vendor
The Pentaho BA Server is supported by CDE (Community Dashboard Editor) to create responsive dashboards which can be used on mobile. The mobile responsiveness of the BA Server should be improved.

Valuable Features:

The Pentaho Business Analytics tool provides ETL and easy creation of Analyzer reports and interactive, attractive dashboards.

The Pentaho BA Server is supported by CDE (Community Dashboard Editor) to create responsive dashboards which can be used on mobile.

Improvements to My Organization:

Organizing the data and representing it in a graphical view gives insight into the data.

Room for Improvement:

The mobile responsiveness of the BA Server should be improved.

Analyzer reports and interactive dashboards should be accessible via the dashboard.

Deployment Issues:

We've had no issues with deployment.

Stability Issues:

It's been stable for us.

Scalability Issues:

It scales without issues.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user384954
Data Modeller and Pentaho Integration Analyst at a financial services firm with 1,001-5,000 employees
Real User
It covers most of our ETL requirements. It relies heavily on Java caching, and sometimes that causes issues.

What is most valuable?

It's free and easy to use. You can actually explain it to any non-ETL developer.

How has it helped my organization?

It covers most of our ETL requirements.

What needs improvement?

There are issues with the Java code, and sometimes we have to fix it or apply a workaround.

For how long have I used the solution?

I've been using it for two years.

What was my experience with deployment of the solution?

There are issues with clustering and partitioning, which we have reported.

What do I think about the stability of the solution?

It relies heavily on Java caching and sometimes it causes issues.

How are customer service and technical support?

Customer Service:

7/10

Technical Support:

7/10

Which solution did I use previously and why did I switch?

I have used ODI, Talend, and Informatica. Pentaho is just the current ETL choice for the organization.

How was the initial setup?

It's very easy.

What about the implementation team?

We have implemented in-house with customization.

What's my experience with pricing, setup cost, and licensing?

We're only using the Community Edition.

What other advice do I have?

It's very easy to use and implement.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
CS
Jaspersoft BI Consultant at a tech services company with 501-1,000 employees
Consultant
It helped in managing the data from different sources into one unique target. I would like to see what code the report tool generates.

What is most valuable?

  • Pentaho data integration
  • Most of the ETL stuff can be done with minimal coding
  • Reporting capabilities

How has it helped my organization?

It helped in managing the data from different sources into one unique target.

What needs improvement?

In the reporting tool, I would like to see what code it generates. As of now, there is no provision to see the underlying code of the PRD file.

For how long have I used the solution?

I've used it for one year.

What was my experience with deployment of the solution?

There have been no issues with deployment.

What do I think about the stability of the solution?

There have been no stability issues.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and technical support?

Customer Service:

I have not had to use the customer service.

Technical Support:

I have not had to use technical support.

Which solution did I use previously and why did I switch?

There was no other solution in place.

How was the initial setup?

It was straightforward at first and became complex later, due to our understanding of the existing structure and the need to align the ETL with it.

What about the implementation team?

We did it in-house. You need to have a good understanding of what the tool can offer, like ETL, MDM, and SCDs.

What's my experience with pricing, setup cost, and licensing?

We're using the free edition.

What other advice do I have?

It's moderately easy to use, learn, and implement. It's nice and you should use it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Senior Software Engineer - Alfresco, Liferay, Pentaho at a construction company with 1,001-5,000 employees
Vendor
It's helped us with the integration of Big Data.

What is most valuable?

Speed, performance and for me the best one is integration with any data type, including Big Data.

How has it helped my organization?

We wanted a tool that we could use to integrate multiple data sets and get responsive reports, and Pentaho helped us do this in a good way.

What needs improvement?

The presentation of data and reports. Pentaho has made a lot of improvements in the newer versions, but it's still an area for improvement when compared to other products in the market.

For how long have I used the solution?

We have been using Pentaho for over four years, starting from v3.2, and we are continuously upgrading to new versions. We're also using Pentaho BI Server v4.8.

What was my experience with deployment of the solution?

The Pentaho BI Server comes with an inbuilt Tomcat container, which is easy to deploy.

What do I think about the stability of the solution?

No. So far so good.

What do I think about the scalability of the solution?

No. So far so good.

How are customer service and technical support?

Being community users, most of the time we depend on forums, and the kind of response we got for tricky questions was just amazing.

Which solution did I use previously and why did I switch?

We were using Talend, but the simplicity of Pentaho attracted us and we settled on Pentaho.

How was the initial setup?

It was very much straightforward, except at some points where we did our own customizations, which we had to take into account during new installations or upgrades.

What about the implementation team?

We implemented in-house on our own servers.

What's my experience with pricing, setup cost, and licensing?

We are using the Community Edition of Pentaho.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
CTO at a tech services company with 51-200 employees
Consultant
R Script needs work. Provides good dashboards for non-IT users.

Valuable Features:

- Correct URL integration

- Excellent Look & Feel

- Good Dashboards integration with Bootstrap

Improvements to My Organization:

- Senior management dashboard

- Pentaho Reporting and CDF integration

Room for Improvement:

- R Script

Use of Solution:

6 months

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Product Owner of Business Intelligence at a computer software company with 501-1,000 employees
Vendor
We found workarounds to a couple of issues, but we spend too much time dealing with them. We're now evaluating other solutions.

With Pentaho we ran into a couple of issues, notably multi-tenancy capability issues. We've found workarounds to the issues, but our developers need to spend a great amount of time dealing with them.

We're looking to expand our product offering to our customers and we'd like to have the ability to give them better access to their data. Pentaho isn't proving to be a great solution for our needs.

Development time is currently too long for developing basic charts, dashboards and reports. All of these basic items take a long time to develop when using Pentaho. This costs us money and time.

Part of these issues is due to the fact that Pentaho documentation is pretty limited. We need to go ad hoc and pay a lot for additional training. There aren't many help files for our developers.

We're currently evaluating other solutions that will allow us to develop dashboards, charts, and reports more quickly, while still taking the price point into consideration.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
CTO at a tech services company with 51-200 employees
Consultant
Top 5
Fast Development (Agile BI), Good Charts and Visualization, Good Security, Good User Interface

What is most valuable?

Pentaho Analyzer (EE)

Saiku (CE)

Marketplace (CE)

R (EE and CE)

Community Dashboard Framework (CE)

Dashboard Editor (EE)

How has it helped my organization?

Powerful Analytics, Fast KPI Analysis

For how long have I used the solution?

4 Years

What was my experience with deployment of the solution?

Integration with GeoServer (especially shapefile layers on maps).

What do I think about the stability of the solution?

None

What do I think about the scalability of the solution?

Migrating old versions of reports (.prpt) to a new version.

How are customer service and technical support?

Customer Service:

5/10

Technical Support:

9/10

Which solution did I use previously and why did I switch?

Yes, QlikView.

How was the initial setup?

Difficulty: medium

What was our ROI?

45%

Which other solutions did I evaluate?

QlikView, Tableau, and SpagoBI.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user164988
Senior Analyst at a tech services company with 10,001+ employees
Real User
A few steps in PDI can be fine tuned, but has enabled our clients and their users to make calculated decisions.

What is most valuable?

Pentaho Data Integration.

How has it helped my organization?

Pentaho reports have enabled our client and their users to make calculated and informed tactical decisions. The drag-and-drop reporting in Pentaho is very user friendly and handles all reporting needs very efficiently.

What needs improvement?

  1. A few steps in PDI, like Database Join, could be fine-tuned.
  2. Work could be done on roles and security.

For how long have I used the solution?

3 and a half years.

What was my experience with deployment of the solution?

None so far.

What do I think about the stability of the solution?

Pretty stable, but the database and repository caches sometimes need to be refreshed manually for changes to take effect.

What do I think about the scalability of the solution?

Yes. A few steps, like the Database Join and Merge Join steps, could be fine-tuned.

How are customer service and technical support?

Customer Service:

3/5. The support is prompt and timely, but there have sometimes been technical limitations.

Technical Support:

3/5.

Which solution did I use previously and why did I switch?

Used MSBI sparingly. Used Pentaho mostly for cost benefits.

How was the initial setup?

Pretty straightforward. The supporting Pentaho documents are also self-explanatory regarding installation guidelines.

What about the implementation team?

In-house team.

What was our ROI?

Awesome, even for the Enterprise Edition, and obviously much better still for the Community Edition.

What's my experience with pricing, setup cost, and licensing?

We are currently using the Community Edition, so there is zero maintenance cost.

Which other solutions did I evaluate?

Not thoroughly enough.

What other advice do I have?

If budget is a constraint, then Pentaho is a very good option. If budget is not an issue, then paid BI tools can easily be considered. Pentaho is a leader in open source technology but still may not match paid BI tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Senior Software Engineer - BI at a tech services company with 501-1,000 employees
Consultant
With the help of the Data Analysis tool, decisions are very easy but Data Mining needs improvement

What is most valuable?

The ability to maintain historical data.

How has it helped my organization?

For decision-making purposes, it serves information to managers so they can make proper decisions using the data. With the help of the Data Analysis tool, decisions are very easy.

What needs improvement?

  • Populating Cube
  • Data Mining

For how long have I used the solution?

Four months.

What was my experience with deployment of the solution?

Yes, we did encounter some deployment issues.

What do I think about the stability of the solution?

No issues encountered.

What do I think about the scalability of the solution?

No issues encountered.

How are customer service and technical support?

Customer Service:

9 out of 10.

Technical Support:

7 out of 10.

Which solution did I use previously and why did I switch?

No previous solution used.

How was the initial setup?

It was complex to set up.

What about the implementation team?

We implemented it through a vendor and I would rate them 9/10.

Which other solutions did I evaluate?

We also looked at Talend.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user175566
CIO at a wellness & fitness company with 51-200 employees
Vendor
The initial setup of the tool is very simple and the product is fairly simple.

What is most valuable?

  • OLAP analysis (in CE edition JPivot)
  • Dashboards (CDF + CDE)

How has it helped my organization?

With the launch of Pentaho, the organization gained aggregated information about the various functional areas and the ability to generate real-time analysis (OLAP) without needing IT.

For how long have I used the solution?

It has been in production since 2010.

What was my experience with deployment of the solution?

None encountered.

What do I think about the stability of the solution?

It's fairly stable.

How are customer service and technical support?

Customer Service:

I use the CE without support.

Technical Support:

I use the CE without support.

Which solution did I use previously and why did I switch?

Yes. I changed because of costs.

How was the initial setup?

The initial setup of the tool is very simple. In the latest version, there is a marketplace for installing new features in the product.

What about the implementation team?

In-House.

What's my experience with pricing, setup cost, and licensing?

The cost of initial setup was only for the hours and technical expertise (internal development).

What other advice do I have?

A very important aspect of performance is the choice of the database that houses the data warehouse. I recommend investing time in deciding on the engine and architecture. We recommend using column-oriented engines.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user173703
IT Consultant at a computer software company with 51-200 employees
Vendor
Creating your own plugin is very easy with Pentaho

Valuable Features

I really like Pentaho Data Integration, an ETL tool that makes your Extract, Transform, and Load process easy, with rich functionality and nice design tools. Creating your own plugin is also very easy.

Use of Solution

I have been using this product for about three years.

Deployment Issues

No, I have not encountered any deployment issues so far; the deployment process is very straightforward.

Stability Issues

No, I didn't. Each Pentaho release has been stable.

Scalability Issues

I haven't explored scalability much, but the product does have scalability capabilities.

Customer Service and Technical Support

Most of the time, I use the Pentaho forums, and the response from the Pentaho community is very good.

Initial Setup

The setup is very easy. For a full installation, Pentaho provides a binary package with an installation wizard. Alternatively, you can simply download and extract the archive, then work from there.

Other Solutions Considered

If you are looking for a complete toolset that is easy to customize for your business intelligence solution, I would recommend this product.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
IT Manager at a transportation company with 51-200 employees
Vendor
In terms of functionality, they're not growing as fast as other companies. It's good for showing the need for BI.

What is most valuable?

Pentaho Data Integration (PDI).

Pentaho Analysis Services

Pentaho Reporting

How has it helped my organization?

We developed Sales and HR data marts, so the managers of these departments now have quick and flexible access to their data. It was an improvement because, in the past, each new analysis demanded IT resources and took time; that no longer happens. End users have much more freedom to discover the information they need.

What needs improvement?

I think Pentaho could greatly improve its UI and its tool for dashboard maintenance.

For how long have I used the solution?

2 years

What was my experience with deployment of the solution?

The most complex deployments are the solutions with the most demanding implementations. Pentaho could invest more in making developers' lives easier.

What do I think about the stability of the solution?

Yes. Once in a while we face an unexpected problem that takes time to overcome, and it hurts user satisfaction.

What do I think about the scalability of the solution?

No. I think the choice of Pentaho was right for my company. It fits our purpose very well, which was to demonstrate to the directors the power of BI for the business. Now that the benefits are recognized and the company is growing, I may evaluate other options in the near future, including Pentaho EE.

How are customer service and technical support?

Customer Service:

My company has a procedure to evaluate all of our suppliers and we have questions about promptness, level of expertise, pre-sale and post-sale, effectiveness and efficiency.

Technical Support:

7 out of 10

Which solution did I use previously and why did I switch?

Yes. When I started with Pentaho in 2011, I had already worked at another company that used the Cognos BI Suite as its BI solution.

How was the initial setup?

The initial setup was straightforward. The setup was done by my team, which had no expertise with the Pentaho BI Suite. In 2 days, I was presented with the first dashboards.

What about the implementation team?

I implemented my first Pentaho project with a vendor team, which helped us a lot, but its level of expertise could have been better. In the middle of the project, we had some delays related to questions that had to be clarified by Pentaho's professionals.

What was our ROI?

The ROI of this product is good because you can get the first outputs in little time. But it is not excellent compared with other BI solutions, like QlikView or Tableau.

What's my experience with pricing, setup cost, and licensing?

My original setup cost for the first project was $30,000 and the final cost was about $35,000.

Which other solutions did I evaluate?

Yes. Cognos, Microstrategy and Jaspersoft.

What other advice do I have?

For me, Pentaho is not growing in terms of functionality as fast as other companies in the same segment. The UI falls short, and for more complex solutions it is necessary to have good developers. However, being an open source solution, it allows IT departments to demonstrate the importance of BI to the company with a low investment.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user165921
Database Analyst at a government with 51-200 employees
Vendor
Working on poverty alleviation in Indonesia, this tool has improved the ability of policy makers to make good decisions.

How has it helped my organization?

I'm working on poverty alleviation in Indonesia; this tool has improved the ability to make good decisions for policy makers.

For how long have I used the solution?

3 years.

What was my experience with deployment of the solution?

Sometimes I had problems, but they could be resolved in a short amount of time.

What do I think about the stability of the solution?

No issues.

What do I think about the scalability of the solution?

No issues with scalability.

How are customer service and technical support?

This is an open source application and gets the full support of the community.

Which solution did I use previously and why did I switch?

No. When I joined the office, I immediately began using Pentaho.

How was the initial setup?

Yes. For the initial setup, we had to understand the database and some XML scripting.

What about the implementation team?

We implemented it with our internal office team.

What was our ROI?

This is a government project, so we did not measure ROI, just KPIs.

Which other solutions did I evaluate?

Yes, we compared it with other applications that have almost the same features, but I went wholeheartedly with Pentaho because it's free and fulfills our needs.

What other advice do I have?

Use this product and feel how the application can change your company's decision-making policy.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user164859
Data Warehouse Specialist at a educational organization with 501-1,000 employees
Vendor
More built-in features could improve the solution, but everything related to ETL is valuable.

What is most valuable?

Everything related to ETL.

How has it helped my organization?

It has provided the ability to successfully integrate many different systems into one big data warehouse.

What needs improvement?

More built-in features, better version control integration, online tutorials.

For how long have I used the solution?

4 years.

What was my experience with deployment of the solution?

No problems.

What do I think about the stability of the solution?

Yes, sometimes the subprocesses keep running even though the main process is stopped.

What do I think about the scalability of the solution?

No.

How are customer service and technical support?

Customer Service:

Good.

Technical Support:

Less than average.

Which solution did I use previously and why did I switch?

No.

How was the initial setup?

Straightforward.

What about the implementation team?

In-house.

What's my experience with pricing, setup cost, and licensing?

The initial cost was $16K for licensing; I'm not sure about costs after that.

Which other solutions did I evaluate?

Yes, Informatica.

What other advice do I have?

Go for it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Final Thoughts – Part 6 of 6
Introduction

This is the last of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

Data Mining

In this sixth part, I originally wanted to at least touch on the only part of the Pentaho BI Suite we have not talked about before: Data Mining. However, as I gathered my materials, I realized that Data Mining (along with its ilk: Machine Learning, Predictive Analysis, etc.) is too big a topic to fit in the space we have here. Even if I tried, the usefulness would be limited at best since, at the moment, while the results are used to solve real-world problems, the usage of Data Mining tools is still exclusively within the realm of data scientists.

In addition, as of late I use Python more for working with datasets that require a lot of munging, preparing, and cleaning. As an extension of that, I ended up using Pandas, scikit-learn, and other Python-specific Data Mining libraries instead of Weka (which is basically what the Pentaho Data Mining tool is).

So for those who are new to Data Mining with Pentaho, here is a good place to start: an interview with Mark Hall, one of the authors of Weka, who now works for Pentaho: https://www.floss4science.com/machine-learning-with-weka-mark-hall

The link above also has some links to where to find more information.

For experienced data scientists, you have probably already made up your mind about which tool suits your needs best; just as I went with Python libraries, you may or may not prefer a GUI approach like Weka's.

New Release: Pentaho 5.0 CE

For the rest of this review, we will go over the new changes that come with the highly anticipated release of the 5.0 CE version. Overall, there are a lot of improvements in various parts of the suite, such as PDI and PRD, but we will focus on the BI Server itself, where the largest impact of the new release can be seen.

A New Repository System

In this new release, one of the biggest shocks for existing users is the switch from the file-based repository system to the new JCR-based one. JCR is a database-backed content repository system implemented by the Apache Foundation and code-named “Jackrabbit.”

The Good:

  • Better metadata management
  • No longer need to refresh the repository manually after publishing solutions
  • A much better UI for dealing with the solutions
  • API to access the solutions via the repository which opens up a lot of opportunities for custom applications

The Bad:

  • It's not as familiar or convenient as the old file-based system
  • Need to use a synchronizer plugin to version-control the solutions

It remains to be seen if this switch will pay off for both the developers and the users in the long run. But it is stable and working for the most part, so I can't complain.

The Marketplace

One of the best features of the Pentaho BI Server is its plugin-friendly architecture. In version 5.0 this architecture has been given a new face called the Marketplace:

This new interface serves two important functions:

  1. It allows admins to install and update plugins (almost all Pentaho CE tools are written as plugins) effortlessly
  2. It allows developers to publish their own plugins to the world

There are already several new plugins available with this new release, notably Pivot4J Analytics, an alternative to Saiku that shows a lot of promise as a very useful tool for working with OLAP data. Another one that excites me is Sparkl, with which you can create other custom plugins.

The Administration Console

The new version also brings about a new Administration Console where we manage Users and Roles:

No longer do we have to fire off another server just to do this basic administrative task. In addition, you can manage the mail server settings (no more wrangling configuration files).

The New Dashboard Editor

As we discussed in Part V of this review, the CDE is a very powerful dashboard editor. In version 5.0, the list of available Components is further lengthened by new ones, and the overall editor seems more responsive in this release.

Usage experience: The improvements in the Dashboard editor are helping me create dashboards for my clients that go beyond static displays. In fact, the one below (for demo purposes only) has an interactivity level that rivals a web application or an electronic form:

NOTE: Nikon and Olympus are trademarks of Nikon Corporation and Olympus Group respectively.

Parting Thoughts

Even though the final product of a Data Warehouse or a BI system is a set of answers and forecasts, or dashboards and reports, it is easy to forget that without the tools that help us consolidate, clean up, aggregate, and analyze the data, we will never get to the results we are aiming for.

As you can probably tell, I serve my clients with whichever tools make sense given their situation, but time and again, the Pentaho BI Suite (the CE version especially) has risen to fulfill the need. I have created Data Warehouses from scratch using Pentaho BI CE, pulling in data from various sources using PDI and creating OLAP cubes with the PSW, which end up as the data sources for the various dashboards (financial, inventory, marketing, etc.) and for published reports created using the PRD.

Of course my familiarity with the tool helps, but I am also familiar with a lot of other BI tools beside Pentaho. And sometimes I do have to use other tools in preference to Pentaho because they suit the needs better.

But as I always mention to my clients, unless you have a good enough relationship with the vendor to avoid paying hundreds of thousands per year just to be able to use tools like IBM Cognos, Oracle BI, or SAP BusinessObjects, there is a good chance that Pentaho (either the EE or CE version) can do the same for less, with even zero license cost in the case of CE.

Given the increased awareness of the value of data analysis in today's companies, these BI tools will continue to become more sophisticated and powerful. It is up to us business owners, consultants, and data analysts everywhere to develop the skills to harness the tools and crank out useful, accurate, and, yes, easy-on-the-eyes decision-support systems. I suspect we will always see Pentaho as one of the viable options, a testament to the quality of the team working on it. For the CE team in particular, it would be remiss not to acknowledge their efforts to improve and maintain a tool this complex using the open source paradigm.

So here we are, at the end of the sixth part. Writing this six-part review has been a blast, and I would like to give a shout-out to IT Central Station, which has graciously hosted this review for all to benefit from. Thanks for reading.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user76890
Engineer at a marketing services firm with 51-200 employees
Vendor
It does a lot of what we need but off-the-shelf solutions often can’t do exactly what you need
Being in the business of online-to-offline ad attribution and advertising analytics, we need tools to help us analyze billions of records to discover interesting insights for our clients. One of the tools we use is Pentaho, an open source business intelligence platform that allows us to manage, transform, and explore our data. It offers some nice GUI tools, can be quickly set up on top of existing data, and has the advantage of being on our home team.

But for all the benefits of Pentaho, making it work for us has required tweaking and in some cases replacing Pentaho with other solutions. Don’t take this the wrong way: we like Pentaho, and it does a lot of what we need. But at the edges, any off-the-shelf solution often can’t do exactly what you need.

Perhaps the biggest problem we faced was getting queries against our cubes to run quickly. Because Pentaho is built around Mondrian, and Mondrian is a ROLAP engine, every query against our cubes requires building dozens of queries that join tables with billions of rows. In some cases this meant that Mondrian queries could take hours to run. Our fix has been to make extensive use of summary tables, i.e. summarizing counts of raw data at the levels we know our cubes will need to execute queries. This has allowed queries that once ran in hours to run in seconds, by doing the summarization for all queries once, in advance. At worst, our Mondrian queries can take a couple of minutes to complete if we ask for really complicated things.
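As a rough illustration of the summary-table approach described above, the sketch below pre-aggregates a hypothetical fact table at the granularity the cubes need; all table and column names (fact_ad_events, dim_date, agg_daily_impressions) are invented for illustration and are not taken from our actual schema.

    -- Hypothetical summary table, pre-aggregated at daily/campaign/region level.
    -- CREATE TABLE ... AS syntax varies slightly by database.
    CREATE TABLE agg_daily_impressions AS
    SELECT
        d.date_key,
        e.campaign_id,
        e.region_id,
        COUNT(*)          AS impressions,   -- raw event rows counted once, in advance
        SUM(e.click_flag) AS clicks         -- pre-summed measure
    FROM fact_ad_events e
    JOIN dim_date d ON d.date_key = e.date_key
    GROUP BY d.date_key, e.campaign_id, e.region_id;

Pointing the cube definitions at a table like this, instead of the billion-row fact table, is what turns hours-long queries into seconds.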

Early on, we tried to extend our internal use of Pentaho to our clients by using Action Sequences, also known as xactions after the Action Sequence file extension. Our primary use of xactions was to create simple interfaces for getting the results of Mondrian queries that could then be displayed to clients in our Rails web application. But in addition to sometimes slow Mondrian queries (in the world of client-facing solutions, even 15 seconds is extremely slow), xactions introduce considerable latency as they start up and execute, adding as much as 5 seconds on top of the time it takes to execute the query.

Ultimately we couldn’t make xactions fast enough to deliver data to the client interface, so we instead took the approach we use today. We first discover what is useful in Pentaho internally, then build solutions that query directly against our RDBMS to quickly deliver results to clients. Although, to be fair to Mondrian, some of these solutions require us to summarize data in advance of user requests to get the speed we want, because the data is just that big and the queries are just that complex.

We’ve also made extensive use of Pentaho Data Integration, also known as Kettle. One of the nice features of Kettle is Spoon, a GUI editor for writing Kettle jobs and transforms. Spoon made it easy for us to set up ETL processes in Kettle and take advantage of Kettle’s ability to easily spread load across processing resources. The tradeoff, as we soon learned, was that Spoon makes the XML descriptions of Kettle jobs and transforms difficult to work on concurrently, a major problem for us since we use distributed version control. Additionally, Kettle files don’t have a really good, general way of reusing code short of writing custom Kettle steps in Java, which makes maintaining our large collection of Kettle jobs and transforms difficult. On the whole, Kettle was great for getting things up and running quickly, but over time we find its rapid development advantages are outweighed by the advantages of using a general programming language for our ETL. The result is that we are slowly transitioning to writing ETL in Ruby, but only on an as-needed basis, since our existing Kettle code works well.

As we move forward, we may find additional places where Pentaho does not fully meet our needs and we must find other solutions to our unique problems. But on the whole, Pentaho has proven to be a great starting platform for getting our analytics up and running and has allowed us to iteratively build out our technologies without needing to develop custom solutions from scratch for everything we do. And, I expect, Pentaho will long have a place at our company as an internal tool for initial development of services we will offer to our clients.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Dashboards – Part 5 of 6
Introduction

This is the fifth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this fifth part, we'll be discussing how to create useful and meaningful dashboards using the tools available to us in the Pentaho BI Suite. As a complete Data Warehouse building tool, Pentaho offers the most important feature for delivering enterprise-class dashboards, namely an Access Control List (ACL). A dashboard-creation tool without this ability to limit dashboard access to a particular group or role within the company is missing a crucial feature, and that is something we cannot recommend to our clients.

On the Enterprise Edition (EE) version 5.0, dashboard creation has a user-friendly UI that is as simple as drag-and-drop. It looks like this:

Figure 1. The EE version of the Dashboard Designer (CDE in the CE version)

Here the user is guided to choose a type of grid layout that is already prepared by Pentaho. Of course, the options to customize the look and change individual components are available under the hood, but it is clear that this UI is aimed at end users looking for quick results. More experienced dashboard designers would feel severely restricted by it.

In the rest of this review, we will go over dashboard creation using the Community Edition (CE) version 4.5. Here we are going to see a more flexible UI, which unfortunately also demands familiarity with JavaScript and chart library customizations to create something more than just basic dashboards.

BI Server Revisited

In the Pentaho BI Suite, dashboards are set up in these two places:

  1. Using special ETLs, we prepare the data to be displayed on the dashboards according to the update frequency required by the user. For example, for daily sales figures, the ETL would be scheduled to run every night. Why do we do this? Because the benefits are two-fold: it increases the performance of the dashboards because they work with pre-calculated data, and it allows us to apply dashboard-level business rules (see the SQL sketch after this list).
  2. The BI Server is where we design, edit, and assign access permissions to dashboards. Deep URLs can be obtained for a particular dashboard to be displayed on a separate website, but some care has to be taken to go through Pentaho user authorization; depending on the web server setup, it could be as simple as passing authorization tokens, or as complex as registering and configuring a custom module.
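To make the first point concrete, a nightly ETL step could refresh a dashboard staging table roughly like the sketch below; the names (fact_sales, dim_product, dash_daily_sales) are hypothetical, and in practice the step would usually be a scheduled PDI job rather than hand-run SQL.

    -- Hypothetical nightly refresh of a pre-calculated dashboard table.
    -- Date arithmetic syntax varies by database; shown here in a generic form.
    DELETE FROM dash_daily_sales
     WHERE sales_date = CURRENT_DATE - 1;

    INSERT INTO dash_daily_sales (sales_date, product_line, total_qty, total_revenue)
    SELECT s.sales_date,
           p.product_line,
           SUM(s.quantity),
           SUM(s.quantity * s.unit_price)
    FROM fact_sales s
    JOIN dim_product p ON p.product_key = s.product_key
    WHERE s.sales_date = CURRENT_DATE - 1
    GROUP BY s.sales_date, p.product_line;

The dashboard components then read from dash_daily_sales, which is where the performance benefit and the dashboard-level business rules come in.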

Next, we will discuss each of these steps in creating a dashboard. As usual, the screenshots below are sanitized and there are no real data being represented. Data from a fictitious microbrewery is used to illustrate and relate the concepts.

Ready, Set, Dash!

The first step is to initiate the creation of a dashboard. This is accomplished by selecting File > New > CDE Dashboard. A little background note: CDE (the Community Dashboard Editor) is part of the Community Tools (Ctools) created by the team that maintains and improves Pentaho CE.

After initiating the creation of a new dashboard, this is what we will see:

Figure 2. The Layout screen where we perform the layout step

The first thing to do is save the newly created (empty) dashboard somewhere within the Pentaho solution folder (just as we did when saving Analytic or Ad-Hoc Reports). To save the dashboard currently being worked on, use the familiar New | Save | Save As | Reload | Settings menu. We will not go into detail on each of these self-explanatory menus.

Now look at the top-right section. There are three buttons that toggle the screen mode; this particular screenshot shows the Layout mode.

In this mode, we take care of the layout of the dashboard. On the left panel, we see the Layout Structure. It is basically a grid made out of Row entries, each of which contains Column(s), which in turn may contain another set of Row(s). The big difference between a Row and a Column is that the Column actually contains the Components, such as charts, tables, and many other types. We give a name to a Column to tie it to its content. Because of this, Column names must be unique within a dashboard.

The panel to the right is a list of properties whose values we can set, mostly HTML and CSS attributes that tell the browser how to render the layout. It is recommended to create a company-wide CSS file to show the company logo, colors, and other visual markings on the dashboard.

So basically, all we are doing in this Layout mode is determining where certain content should appear within the dashboard, and we do that by naming each of the places where we want that content to be displayed.

NOTE: Even though the contents are placed within a Column, it is a good practice to name the Rows clearly to indicate the sections of the dashboard, so we can go back later and be able to locate the dashboard elements quickly.

Lining-Up Components

After we defined the layout of the dashboard using the Layout mode, we move on to the next step by clicking on the Components button on the top horizontal menu as shown in the screenshot below:

Figure 3. The Components mode where we define the dashboard components

Usage experience: Although more complex, the CDE is well implemented and quite robust. During our usage to build dashboards for our clients, we have never seen it produce inconsistent results.

In this Components mode, there are three sections (going from left to right). The left-most panel contains the selection of components (data presentation units), ranging from simple tables to complex charting options (based on the Protovis data visualization library); here we choose how to present the data on the dashboard.

The next section to the right contains the components already chosen for the dashboard we are building. As we select each of these components, its properties are displayed in the section next to it. The Properties section is where we fill in information such as:

  • Where the data is coming from
  • Where the Component will be displayed in the dashboard. This is done by referring to the previously defined Column from the Layout screen
  • Customization such as table column width, the colors of a pie chart, custom scripting that should be run before or after the component is drawn

This clean separation between the Layout and the Components makes it easy for us to create dashboards that are easy to maintain and accommodate different versions of the components.

Where The Data Is Sourced

The last mode is the Data Source mode where we define where the dashboard Components will get their data:

Figure 4. The Data Sources mode where we define where the data is coming from

As seen in the left-most panel, the list of data source types is quite comprehensive. We typically use either SQL or MDX queries to fetch the data set in a format suitable for the Components we defined earlier.

For instance, a data set to be presented in a five-column table will look different from one that will be presented in a pie chart.
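For example, two hypothetical data sources against the fictitious microbrewery data might look like this; the first returns the label/value pair a pie chart component expects, while the second returns the wider detail set suited to a table component (table and column names are invented for illustration).

    -- Pie chart data source: exactly one label column and one value column.
    SELECT beer_style       AS label,
           SUM(liters_sold) AS value
    FROM   fact_taproom_sales
    GROUP  BY beer_style;

    -- Five-column table data source: detailed rows, several columns.
    SELECT sale_date, beer_style, liters_sold, unit_price,
           liters_sold * unit_price AS revenue
    FROM   fact_taproom_sales
    ORDER  BY sale_date;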

This screen follows the others in terms of sections: we have (from left to right) the Data Source type list, the currently defined data sources, and the Properties section.

Usage experience: There may be some confusion for those who are not familiar with the way Pentaho defines a data source. There are two “data source” concepts represented here: one is the Data Source defined in this step for the dashboard; the other is the “data source” or “data model” that the dashboard Data Source connects to and runs its query against.

After we define the Data Sources and name them, we go back to the Components mode and specify these names as the value of the Data source property of the defined components.

Voila! A Dashboard

By the time we finished defining the Data Sources, Components, and Layout, we end up with a dashboard. Ours looks like this:

Figure 5. The resulting dashboard

The title of the dashboard and the date range are contained within one Row, as are the first table and the pie chart. This demonstrates the flexibility of the grid system used in the Layout mode.

The company colors and fonts used in this dashboard are controlled via the custom CSS specified as a Resource in the Layout mode.

All that is left to do at this point is to give the dashboard some role-based permissions so access to it will be limited to those who are in the specified role.

TIP: Never assign permissions at the individual user level. Why? Think about what has to happen when a person changes position and is replaced by someone else.

Extreme Customization

Anything from table column widths to the rotation of the x-axis labels can be customized via the properties. Furthermore, for those who are well-versed in JavaScript, there are many things we can do to make the dashboard more than just a static display.

These customizations can be useful beyond just making things sparkle and easier to read. For example, with some scripting we can apply dashboard-level business rules to the dashboard.

Usage experience: Let's say we wanted certain displayed numbers to turn red when they fall below a certain threshold. We do this using the post-execution property of the component, and the script looks like the sample in Figure 6.

Figure 6. A sample post-execution script

Summary

The CDE is a good tool for building dashboards; coupled with the ACL feature built into the Pentaho BI Server, the two serve as a good platform for planning and delivering your dashboard solutions. Are there other tools out there that can do the same thing with the same degree of flexibility? Sure. But when the only cost is time spent learning (which can be shortened significantly by hiring a competent BI consultant), a free license is quite hard to beat.

To squeeze out its potential, CDE requires a lot of familiarity with programming concepts such as formatting masks, JavaScript scripting, and pre- and post-events, and most of the time the answers to how-to questions can only be found in scattered conversations between Pentaho CE developers. So please be duly warned.

But if we can get past those hurdles, it can produce some of the most useful and clear dashboards. Notice we didn't say “pretty” (as in “gimmicky”), because that is not what makes a dashboard really useful for CEOs and business owners in day-to-day decision-making.

Next in the final part (part-six), we will wrap up the review with a peek into the Weka Data Mining facility in Pentaho, and some closing thoughts.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Pentaho Analytics – Part 4 of 6
Introduction

This is the fourth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this fourth part, we'll be discussing the Pentaho Analytics tools and facilities, which provide the ability to view and “slice and dice” data from multiple dimensions. This particular feature is the one most associated with the term “Business Intelligence” due to its usefulness in aiding cross-data-domain decision-making processes. Any decent BI suite has at least one facility with which users can perform data analysis.

One important note, specifically for Pentaho: the Analytics toolset is where the real advantage of the Enterprise Edition (EE) over the Community Edition (CE) starts to show through, other than the much more polished UI.

In the Pentaho BI Suite, we have these analytics tools:

  1. Saiku Analytics (In EE this is called “Analysis Report”) – A tool built into Pentaho User Console (PUC) that utilizes the available analysis models. Do not confuse this with the Saiku Reporting.
  2. Pentaho Model Data Source – In part three of the review, we discussed this facility to create data models for Ad-hoc reporting. The second usage of this facility is to create an OLAP “cube” for use with the Saiku Analytics tool. Once this is setup by the data personnel, data owners can use it to generate analytic reports.
  3. Schema Workbench – A separate program that allows for handcrafting OLAP cube schemas. Proficiency with the MDX query language is not necessary but can come in handy in certain situations.

As usual, we'll discuss each of these components individually. The screenshots below are sanitized and there are no real data being represented. A fictitious company called “DonutWorld” is used to illustrate and relate the concepts.

Saiku Analytics (Analysis Report in EE)

One of the benefits of having a Data Warehouse is being able to model existing data in a structure that is conducive to analysis. If we try to feed a tool such as this with a heavily normalized transaction database, we are inviting two problems:

1. We will be forced to do complex joins, which will manifest as performance hits and difficulty when business rules change

2. We lose the ability to apply non-transactional business rules to the data which is closer to the rule maintainers (typically those who work closely with the business decision-makers)

Therefore, to use this tool effectively, we need to think in terms of what questions need to be answered, then work our way backwards, employing data personnel to create a suitable model for those questions. Coincidentally, this process of modeling data suitable for reporting is a big part of building a Data Warehouse.

Learning experience: Those who are familiar with MS Excel (or LibreOffice) Pivot Tables will be at home with this tool. Basically, as the model allows, we can design the view or report by assigning dimensions to columns and rows, and then assigning measures to define what kind of numbers we expect to see. We will discuss below what “dimension” and “measure” mean in this context, but for an in-depth treatment, we recommend consulting your data personnel.

Usage experience: The EE version of this tool has a clearer interface as far as where to drop dimensions and measures, but the CE version is usable once we are accustomed to how it works. Another point for the EE version (version 5.0) is the ability to generate total sums in both the row and column directions, along with a much more usable Excel export.

Figure 1. The EE version of the Analysis Report (Saiku Analytics in CE)

Pentaho Model Data Source

The Data Source facility is accessible from within the PUC. As described in Part 3 of this review, once you have logged in, look for a section on the screen that allows you to create or manage existing data sources.

Here we are focusing on using this feature to set up “cubes” instead of “models.” This is something your data personnel should be familiar with, guided by the business questions that need answering.

Unlike a “model”, a “cube” is not flat; rather, it consists of multiple dimensions that determine how the measures are aggregated. From these “cubes”, non-technical users can create reports by designing them just as they would Pivot Tables. The most useful aspect of this tool is that it abstracts the construction of an OLAP cube schema down to its core concepts. For example, given a fact table, this tool will try to generate an OLAP cube schema, and for the most part it does a good job, in the sense that the cube is immediately usable for generating Analysis Reports.
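To make the fact-table idea concrete, a minimal star-schema sketch for the fictitious DonutWorld data might look like the following; the names are hypothetical, and the point is simply that the wizard can treat the *_key columns as links to dimensions and the numeric columns as candidate measures.

    -- Hypothetical DonutWorld fact table in a star schema.
    CREATE TABLE fact_donut_sales (
        date_key      INTEGER       NOT NULL,  -- joins to a date dimension
        store_key     INTEGER       NOT NULL,  -- joins to a store dimension
        product_key   INTEGER       NOT NULL,  -- joins to a product dimension
        quantity_sold INTEGER       NOT NULL,  -- candidate measure (sum)
        sales_amount  DECIMAL(12,2) NOT NULL,  -- candidate measure (sum, avg)
        discount_pct  DECIMAL(5,2)  NOT NULL   -- used later for a CASE WHEN measure
    );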

This tool also hides the distinction between Hierarchies and Levels of dimensions. For the most part, you can do a lot with just one Level anyway, so this is easier to grasp for beginners in OLAP schema design.

Learning experience: The data personnel must be 1) familiar with the BI table structures, or at the very least able to pinpoint which of the tables are facts and which are dimensions; and 2) comfortable designing OLAP dimensions and measures. Data owners must be familiar with the structure and usage of the data. The combined efforts of these two roles are the building blocks of a workflow/process.

Usage experience: Utilizing the workflow/process defined above, an organization will generate a collection of OLAP cubes that can be used to analyze the business data with increasing accuracy and usefulness. The most important consideration from the business standpoint is that all of this will take some time to materialize. The incorrect attitude here would be to expect instant results, which will not transpire unless the dataset is overly simplistic.

Figure 2. Creating a model out of a SQL query

NOTE: Again, this is where the maturity level of the Data Warehouse is tested. For example, a DW with sufficient maturity will notify the data personnel of any data model changes, which will trigger an update of the OLAP cube, which in turn may or may not affect the created reports and dashboards.

If the DW is designed correctly, there should be quite a few fact tables that can readily be used in the OLAP cube.

Schema Workbench

The Schema Workbench is for those who need to create a custom OLAP schema that cannot be generated via the Data Source facility in the PUC. Usually this involves complicated measure definitions, multi-Hierarchy or multi-Level dimensions, or evaluating and optimizing MDX queries.

NOTE: In the 5.0 version of the PUC, we can import existing MDX queries into the Data Source Model, making them available for the Analysis Report (or Saiku Ad-Hoc report in the CE version). As can be seen in the screenshot below, the program is quite complex, with numerous features for handcrafting an OLAP cube schema.

Once a schema is validated in the Workbench, we need to publish it. Using the password defined in pentaho-solutions/system/publisher_config.xml, the Workbench will prompt for the location of the cube within the BI Server and its displayed name. From that point on, it will be available in the drop-down list at the top left of the Saiku Analytics tool.

Figure 3. A Saiku report in progress

OLAP Cube Schema Considerations

Start by defining the fact table (bi_convection in the above example), then start defining dimensions and measures.

We have been talking about these concepts of dimension and measure. Let's briefly define them:

  1. A dimension is a way to view existing business data. For instance, a single figure such as a sales number can be viewed from several perspectives: per sales region, per salesperson or department, or chronologically. Using aggregation functions such as sum, average, min/max, standard deviation, etc., we can come up with different numbers that show the data in a manner we can draw conclusions from.
  2. A measure is a number or count derived from business data that can indicate how the business is doing. For a shoe manufacturing company, the number of shoes sold is obviously one very important measure; another would be the average price of the shoes sold. Combined with dimensions, measures can be used to make business decisions.

In the Schema Workbench, as you assign existing BI table fields to the proper dimensions, it validates the accessibility of the fields using the existing database connection, then creates a view of the measures using a user-configurable aggregation of the numbers.

In the creation of an OLAP cube schema, there is a special dimension that enables us to see data chronologically. Due to its universal nature, this dimension is a good one to start with. The time dimension is typically served by a special BI table that contains a flat list of rows with time and date information at the needed granularity (some businesses require seconds, others days, or even weeks or months).
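A minimal, hypothetical date dimension at daily granularity is sketched below; real implementations typically add fiscal periods, week numbers, holiday flags, and whatever granularity the business actually needs.

    -- Hypothetical date dimension table, one row per calendar day.
    CREATE TABLE dim_date (
        date_key     INTEGER PRIMARY KEY,   -- e.g. 20240131
        full_date    DATE        NOT NULL,
        day_of_month INTEGER     NOT NULL,
        month_number INTEGER     NOT NULL,
        month_name   VARCHAR(12) NOT NULL,
        quarter      INTEGER     NOT NULL,
        year_number  INTEGER     NOT NULL
    );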

TIP: Measures can be defined using the “case when” SQL construct, which opens up a whole other level of flexibility.
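As a hedged illustration of that tip, assuming the fact table carries a discount_pct column as in the earlier hypothetical sketch, a derived measure column could be defined like this:

    -- Hypothetical CASE WHEN measure: count only discounted sales.
    SELECT
        date_key,
        product_key,
        sales_amount,
        CASE WHEN discount_pct > 0 THEN 1 ELSE 0 END AS discounted_sale_count
    FROM fact_donut_sales;

Summing discounted_sale_count in the cube then yields the number of discounted sales for any combination of dimensions.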

When should we use MDX vs SQL?

The MDX query language, with its powerful concepts like ParallelPeriods, is suitable for generating tabular data containing aggregated data that is useful for comparison purposes.

True to its intended purpose, MDX allows for querying data presented in a multi-dimensional fashion, while SQL is easier to grasp and has a wider base of users and experts in any industry.

In reality, we use these two languages at different levels; the key is to be comfortable with both and to discover the cases where one makes more sense than the other.

NOTE: The powerful Mondrian engine is capable, but without judicious use of database indexing, query performance can easily crawl into minutes instead of seconds. This is where data personnel with database tuning experience are extremely helpful.
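As a rough example of what judicious indexing means here, the dimension foreign keys on the fact table are usually the first candidates; the names below are hypothetical, and the right set of indexes ultimately depends on the SQL that Mondrian actually generates for your cubes.

    -- Hypothetical indexes on the fact table's dimension keys,
    -- which Mondrian-generated joins and GROUP BYs tend to hit first.
    CREATE INDEX idx_sales_date    ON fact_donut_sales (date_key);
    CREATE INDEX idx_sales_store   ON fact_donut_sales (store_key);
    CREATE INDEX idx_sales_product ON fact_donut_sales (product_key);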

Summary

The analytics tools in the Pentaho BI Suite are quite comprehensive, certainly better than some of the competing tools out there. The analytic reports are made available on the Pentaho User Console (PUC), where users log in and initiate report generation. There are three facilities available:

The Analysis Report (or Saiku Analytics in CE version) is a good tool for building reports that look into an existing OLAP cube and do the “slicing and dicing” of data.

The Data Source facility can also be used to create OLAP cubes from existing BI tables in the DW. A good use of this facility is to build a collection of OLAP cubes to answer business questions.

The Schema Workbench is a standalone tool which allows for handcrafting custom OLAP cube schemas. This tool is handy for complicated measure definitions and multilevel dimensions. It is also a good MDX query builder and evaluator.

Next in part-five, we will discuss the Pentaho Dashboard design tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Pentaho Reporting – Part 3 of 6
This is the third of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this third part, we'll be discussing the tools and facilities with which all of the reports are designed, generated, and served. A full BI suite should have a few reporting facilities usable by users with different levels of technical/database knowledge.

Why is this important? Because in the real world, the owners of data (the people who consume the reports to make various business decisions) range from accountants and customer account managers to supply-chain managers, C-level executives, manufacturing managers, and so on. Notice that proficiency in writing SQL queries is not a prerequisite for any of those positions.

In the Pentaho BI Suite, we have these reporting components:

  1. Pentaho Report Designer (PRD) – A stand-alone program that is on par with the Jasper or iReport designers and, to a lesser extent, Crystal Reports.
  2. Pentaho Model Data Source – A way to encapsulate data sources, including the most flexible of all, a SQL query. Once this is set up by the data personnel, data owners can use it to generate ad-hoc reports – and dashboards too, which we'll discuss in Part 5 of this review series.
  3. Saiku Reporting Tool – A convenient way to create ad-hoc reports based on the Pentaho Data Sources (see number 2 above).

Let's discuss each of these components individually. The screenshots below are sanitized to remove references to our actual clients. A fictitious company called “DonutWorld” is used to illustrate and relate the concepts.

The Pentaho Report Designer (PRD) is a standalone Java program that feels like the Eclipse IDE because they share the same UI library. If you are already familiar with Jasper Reports, iReports, or Crystal Reports, the concepts are similar (bands, groups, details, sub-reports). You start with a master report in which you can combine different data sources (SQL and MDX queries in this case) into a layout that is managed via a set of properties.

Learning experience: As with any report designer, which is complex software because of the sheer number of tweakable properties governing each element of a report, one has to be prepared to invest time in learning the PRD. While the tools are laid out logically, it will take some time for new personnel to absorb the main concepts. The sub-report facility is one of the most powerful features of this program, and it is the key to creating reports that drill into more than one axis (or dimension) of data.

Usage experience: The placement of elements within the page is not 100% precise, and there were times when I had to work around quirks and inconsistencies around setting default values for properties, especially the ones containing formulas. Be prepared to have dedicated personnel (either a permanent employee or a consultant) who can be reached for report design *and* subsequent modifications. In addition, aesthetic considerations are important in order to create visually engaging reports (who wants to read a boring and bland report?).

Figure 1. The typical look of PRD when designing a report.

The Data Source facility is accessible from within the Pentaho BI Server UI (the PUC, see Part 2 of this review series for more information). Once you have logged in, look for a section on the screen that allows you to create or manage existing data sources.

This feature allows data personnel to set up “models” that can be constructed from various data sources and that represent a flat view of the data, from which non-technical data owners can create ad-hoc reports or dashboards. Obviously this feature does not alleviate the need to know how to use the available tools for creating those reports and dashboards. It simply detaches the crafting of SQL/MDX queries and the intricacies of OLAP data structures from the creation of an ad-hoc report.

Learning experience: Data personnel who are familiar with the Data Warehouse (DW) can easily create models out of SQL queries against existing tables within the DW, or out of MDX queries against existing OLAP cubes. Data owners who are familiar with the data itself can then start to use the Saiku Ad-hoc Reporting tool or the CDE (Community Dashboard Editor) to create dashboards. In reality, expect a couple of weeks for personnel to get accustomed to this feature, assuming a knowledgeable BI teacher or consultant is available during this time.

Usage experience: By separating technical database skills from the ability to generate ad-hoc reports, Pentaho has provided a way for organizations to move their business decision-making process further away from the technical minutiae that tend to bog down the process with details that are not relevant to the business goals. I highly rate this feature as one of the Pentaho BI Suite's more innovative contributions to the area of Business Process Management.


Figure 2. Creating a model out of a SQL query

NOTE: The most important part of using this facility has more to do with business process than with familiarity with the data itself. Without a good process in place, the reports can easily get out of sync with the underlying data model. This is where the construction and maturity of the Data Warehouse are tested. For example, a sufficiently mature DW practice will notify the data personnel of any data-model changes, which will trigger an update of the Model Data Source, which may or may not have an effect on the ad-hoc reports.

If the DW is designed correctly, there should be quite a few fact tables that can readily be translated into a Model Data Source. This is the first step. Now let's look at how to use this model.

Saiku is the name of two tools available from the PUC. The first is the Saiku Analytics tool, which allows us to drill into an OLAP cube and perform analysis using aggregated measures (we'll review this in Part 4). The second is the Saiku Ad-hoc Reporting tool, which is the one we are going to look into at this time. Using modern UI libraries such as jQuery, the developers of Saiku have given us a convenient drag-and-drop UI that is easy to learn and use.

Once a model is published, it will be available to choose from the drop-down list on the top left of the Saiku Ad-hoc Reporting tool. See the screenshot below:
Figure 3. A Saiku report in progress

Next, you can choose from the list of available fields in the model and place them in either the Columns list or the Groups list. From the same list of available fields, you can also specify some values as filters; the most obvious example is the transaction date and time range, which determines what period the report covers.

As you place fields into the proper report elements, the tool starts to populate the preview area with what the report will look like. You can also specify an aggregation for each of the groupings, which is very handy.

There is limited control over the templates that govern the appearance of the report, which obviously won't be enough for serious usage. The best remedy, however, is to export the report to a .prpt file, which you can open in the PRD and tweak to your heart's content.

After you are happy with the report, you can save it for later editing. Another thoughtful design decision by the Pentaho team.

Overall, the Saiku Ad-hoc Reporting tool is a handy facility for crafting quick reports that answer specific questions based on the available model data sources. If your data personnel diligently update and maintain the models, this tool can be invaluable in supporting your business decisions.

None of the above would mean a whole lot without a practical and useful way for the reports to be delivered to their requesters. Here, the comprehensive nature of the Pentaho BI Suite helps by providing facilities such as xactions and input UI controls for report parameters.

For example, a report designed in the PRD can be published on the PUC. At some point it is opened by a user on the PUC, who supplies the necessary parameters; the xaction script then fires an ETL that renders the .prpt file into a .pdf and either emails it to the requester or drops it in a shared folder.
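To illustrate the rendering step at the end of that chain, here is a minimal sketch of how a .prpt file can be turned into a PDF using the classic reporting engine API that ships with the suite. The file names and the parameter name (REGION) are hypothetical, and in practice this work is triggered by the xaction or ETL job rather than hand-written code; treat it as a sketch of the mechanism, not the exact implementation used by the server.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.pentaho.reporting.engine.classic.core.ClassicEngineBoot;
    import org.pentaho.reporting.engine.classic.core.MasterReport;
    import org.pentaho.reporting.engine.classic.core.modules.output.pageable.pdf.PdfReportUtil;
    import org.pentaho.reporting.libraries.resourceloader.Resource;
    import org.pentaho.reporting.libraries.resourceloader.ResourceManager;

    public class RenderPrptToPdf {
        public static void main(String[] args) throws Exception {
            // Boot the reporting engine once per JVM.
            ClassicEngineBoot.getInstance().start();

            // Load the report definition produced with the PRD (hypothetical file name).
            ResourceManager manager = new ResourceManager();
            manager.registerDefaults();
            Resource resource = manager.createDirectly(
                new File("donutworld-sales.prpt"), MasterReport.class);
            MasterReport report = (MasterReport) resource.getResource();

            // Supply the parameter the user would normally enter on the PUC.
            report.getParameterValues().put("REGION", "EMEA");

            // Render to PDF; the output can then be emailed or dropped in a shared folder.
            try (OutputStream out = new FileOutputStream("donutworld-sales.pdf")) {
                PdfReportUtil.createPdf(report, out);
            }
        }
    }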

Reports can also be “burst” via an ETL script that utilizes the Pentaho Reporting Output step available from within Spoon (the ETL editor). I have used this method to distribute periodically generated reports to different recipients, each containing data specific to that recipient's access permission level. This saves a lot of time and increases the efficiency of distributing up-to-date information inside a company.

The reporting tools in the Pentaho BI Suite are designed to allow different users within the company to generate reports that are either pre-designed or ad-hoc. The reports are made available on the Pentaho User Console (PUC), where users log in and initiate report generation. Reports can also be scheduled and generated via ETL scripts.

The PRD will be instantly recognizable to anyone who has experience using tools like Crystal Reports and its derivatives. You can also specify MDX queries against any OLAP cube schema published on the Pentaho BI Server as a data source.

The Model Data Source facility allows data owners who are not data personnel to create ad-hoc reports quickly and save them for future use and modification.

The Saiku Ad-hoc Reporting tool is the UI with which the available models can be used to generate reports on the fly. These reports can also be saved for later use.

Next in part-four, we will discuss the Pentaho Mondrian (MDX query engine) and the OLAP Cube Schema tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user6855
CEO with 51-200 employees
Vendor
Very capable suite of BI, reporting, and data mining tools with sophisticated functionality

Verdict:
This is a very capable suite of BI, reporting, and data mining tools with sophisticated functionality, and will address the needs of many organisations.

Pentaho BI Suite Community Edition (CE) includes ETL, OLAP, metadata, data mining, reporting and dashboards. This is a very broad capability and forms the basis for the commercial offering provided by Pentaho. A variety of open source solutions are brought together to deliver the functionality including Weka for data mining, Kettle for data integration, Mondrian for OLAP and several others to address reporting, BI, dashboards, OLAP analytics and big data.

The Pentaho BI platform provides the environment for building BI solutions and includes authentication, a rules engine and web services. It includes a solution engine that facilitates the integration of reporting, analysis, dashboards and data mining. Pentaho BI server supports web based report management, application integration and workflow.

The Pentaho Report Designer, Reporting Engine and Reporting SDK support the creation of relational and analytical reports with many output formats and data sources.

If you want a version with support, training, and consulting, as well as a few more bells and whistles, then Pentaho provides such services and products.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Pentaho BI Server – Part 2 of 6

Introduction

This is the second of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this second part, we'll be discussing the Pentaho BI Server, from which all of the reports, dashboards, and analytic tools are served to the users. A BI suite usually has a central place where users log in using their assigned credentials. In this case, the server is a standalone web server (an Apache Tomcat instance) that is augmented by various tools providing the functionality; most of these tools are written by Webdetails (webdetails.pt). We'll visit these tools in subsequent parts of the review; for now, let's focus on the server itself.

In the case of Pentaho BI Server, it has two components:

  • The Pentaho User Console (a.k.a. PUC) – this is what we usually associate with the main BI Server in the Pentaho world; it is where users spend the majority of their time generating reports (both real-time and scheduled), using the analytic tools, and building and publishing dashboards. This is also where administrator users can manage who can access which reports, either by User or by Role – obviously, Role-based ACL is cleaner and easier to maintain.

  • The Administration Console (a.k.a. PAC) – this is where admin users go to create new Users and Roles, and to schedule jobs. It is another standalone web server that can be started and stopped when needed; it is totally independent of the main PUC server.

Is it Corporate-Ready?

BI servers are considered ready for corporate “demands” based on the number of users they can support, and the facilities to manage them. The Pentaho BI Suite Enterprise Edition is without a doubt ready for corporate use because it comes with the support that will make sure that is the case.

The Community Edition is more interesting: it is definitely corporate ready, but the personnel who set it up need to be intimately familiar with the ins and outs of the server itself. Having installed three of these, I am confident that the BI Server, thanks to its built-in ACL management, is ready for prime time in the corporate world.

Although the Pentaho BI Server includes a scheduler, another “corporate” feature, I find myself using cron (or Windows Task Scheduler) for the most part. The built-in scheduler is based on the Quartz library for Java; it is a good facility with a decent UI for scheduling reports or ETL runs from within the PUC.

Is it Easy to Use?

The PAC is very easy to use. The UI is simple enough, thanks to the minimal number of menus and options. In a sense, it is a simple facility for managing users, roles, and scheduling – not ACLs, just users and roles.

The PUC is more involved, but by adopting the familiar file-folder look and feel in the left panel, it is quite easy to get into and start using. Administrators will love the way they can set who can Execute, Edit, or Schedule each report, saved analytic view, and dashboard – by the way, Pentaho calls these Solutions.

Setting up the BI Server is better left to consultants who are used to doing it. If in-house personnel will be doing this, it is worth the time to participate in the training webinars that Pentaho holds periodically. The steps to set up a BI server are far from simple, but that is the case for all BI servers, regardless of the brand.

The collapsible left panel serves as the directory of the solutions, with the top part showing the folders and the bottom part showing the individual solutions. The bigger panel on the right is where you actually see the content of the solutions, and in some cases that is where you would create a dashboard using the CDE tool (we'll revisit this in a later part of the review).

Is it Easy to Create Solutions?

Remember that the concept of a "solution" here refers to the different types of reports, dashboards, and analytic views. The Pentaho BI Server employs a "glue" scripting facility called xactions. These are XML documents that contain a sequence of actions that can do various things, such as:

  1. Asking users for input parameters

  2. Issuing a SQL query based on user input

  3. Triggering an ETL that produces reports

Once you are familiar with this facility, it is not that hard to start producing solutions, but it pays to install the included examples and study them to find out how to do certain things with xactions and/or to copy snippets into your own scripts.

On the PUC, we can build these solutions:

  1. Dashboards using CDE

  2. Ad-hoc reports and data models using the built-in Model generator (very handy for accessing those BI tables that are populated by ETL runs)

  3. Analytic Views using tools like Saiku or its equivalent in the Professional and Enterprise editions. NOTE: This requires a pre-published schema, which is built using another tool called the Schema Workbench (we will see this in later parts of this review series)

Is it Customizable?

Being the user-facing tool, one of the requirements is the ability to customize the appearance via themes; at the very least, a BI server needs to allow companies to change the logo to their own.

The good news is, you can do all of that with the Pentaho BI Server. If you opt for the Professional or Enterprise editions, you can rely on the support you have already paid for. For those using the Community Edition, customizing the appearance requires knowledge of how a typical Java web server is structured. Again, any good BI consultant should be able to tackle this without too much difficulty.

Here is an example of a customized PUC login page:

In case you are wondering: yes, you can customize the PUC interface as well, and it even comes with a theme structure in which your graphic artists can redefine the CSS elements.

Summary

The Pentaho BI Server is the central place where users interact with the Pentaho BI Suite. It brings together the solutions (what Pentaho calls content) produced by the other tools in the suite and exposes them to users while protecting them with a robust ACL.

On the balance between ease of use and the ability to customize, the Pentaho BI Server scores well, provided that the personnel in charge are familiar with the Java enterprise environment. To illustrate this, in one project I managed to tweak the security framework to make the PUC part of a single-sign-on Liferay portal, along with other applications such as Opentaps and Alfresco.

Next in part-three, we will discuss the wide array of Pentaho Reporting tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: PDI – Part 1 of 6

Introduction

The Pentaho BI Suite is one of the more comprehensive BI suites that is also available as an Open Source project (the Community Edition). Interestingly, the absence of license fees is far from the only factor in choosing this particular tool to build your Data Warehouses (OLAP systems).

This is the first of a six-part review of the BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this first part, we'll be discussing Pentaho Data Integration (from here on referred to as PDI), which is the ETL tool that comes with the suite. An ETL tool is the means by which you pull data from various sources – typically out of transactional systems – then transform the format and flow it into another data model that is OLAP-friendly. It therefore acts as the gateway to using the other parts of the BI suite.

In the case of PDI, it has two components:

  • Spoon (the GUI), where you string together a set of Steps within a Transformation and optionally string multiple Transformations within a single Job. This is where you would spend the bulk of your time developing ETL scripts.

  • The accompanying set of command-line scripts that we can configure to be launched from a scheduler like cron or Windows Task Scheduler: notably pan, the single-Transformation runner; kitchen, the Job runner; and carte, the slave-server runner. These tools give us the flexibility to create our own network of multi-tiered notification systems, should we need to (see the sketch below for how the same runners can be driven programmatically).
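For readers curious about what pan and kitchen actually do, the same runners can be driven from Java through the PDI (Kettle) API. The sketch below is an illustration only: the transformation file name (load_sales.ktr) and the named parameter (START_DATE) are hypothetical, and in production we simply schedule the stock pan/kitchen scripts from cron rather than writing a launcher like this.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class RunTransformation {
        public static void main(String[] args) throws Exception {
            // Initialize the Kettle environment (plugins, logging), as pan/kitchen do on startup.
            KettleEnvironment.init();

            // Load the transformation definition saved from Spoon (a plain XML .ktr file).
            TransMeta meta = new TransMeta("load_sales.ktr");
            Trans trans = new Trans(meta);

            // Hypothetical named parameter, equivalent to pan's -param:START_DATE=... option.
            trans.setParameterValue("START_DATE", "2013-01-01");

            trans.execute(null);        // start all step threads
            trans.waitUntilFinished();  // block until the transformation completes

            if (trans.getErrors() > 0) {
                System.err.println("Transformation finished with errors.");
                System.exit(1);
            }
        }
    }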

Is it Feature-Complete?

ETL tools are interesting because anyone who has implemented a BI system has a standard list of major features they expect to be available. This standard list does not change from one tool brand to another. Let's see how PDI fares:

  1. Serialized vs. parallel ETL processing: PDI handles parallel (asynchronous) steps using Transformations, which can be strung together in a Job when we need a serialized sequence.

  2. Parameter handling: PDI has a property file that allows us to parameterize things that are specific to different platforms (dev/test/prod), such as database names, credentials, and external servers. It also features parameters that can be created during the ETL run out of the data in the stream, then passed on from one Transformation to another within a Job.

  3. Script management: Just like any other IT documents (or, as some call them, artifacts), ETL scripts need to be managed, version-controlled, and documented. PDI scores high on this front, not because of some specific feature, but due to design decisions that favor simplicity: the scripts are plain XML documents. That makes them very easy to manage, version-control, and, if necessary, batch-edit. NOTE: For those who want enterprise-level script management and version control built into the tool, Pentaho makes it available as part of their Enterprise offerings. But for the rest of us who already have a document management process – because we also develop software using other tools – it is not as crucial.

  4. Clustering: PDI supports round-robin-style load balancing given a set of slave servers. For those using Hadoop clusters, Pentaho recently added support for running Jobs on them.

Is it Easy to Use?

With the drag-and-drop graphical UI approach, ease of use is a given. It is quite easy to string together steps to accomplish the ETL process. The trick is knowing which steps to use, and when to use them.

The documentation on how to use each step could stand improvement, although fortunately it has slowly started to catch up over the years – and should you have the budget, you can always pay for the support that comes with the Enterprise Edition. Overall, though, it is a matter of using the steps enough to become familiar with the use cases.

This is why competent BI consultants are worth their weight in gold: they have been in the trenches and have accumulated ways to deal with the quirks that are bound to be encountered in a software system this complex (not just Pentaho; this applies to any BI suite product out there).



NOTE: I feel obligated to point out one (very) annoying fact: I cannot hit the Enter key to edit the selected step. Think about how many times we would use this functionality in any ETL tool.

Aside from that, in the few years that I've used various versions of the GUI, I've never encountered severe data loss due to stability problems.

Another measure of ease of use that I evaluate a tool by is how easy it is to debug the ETL scripts. With PDI, the logical structure of the scripts can be followed easily, so it is quite debug-friendly.

Is it Extensible?

It may seem a strange question at first, but let us think about it. One of the purposes of using an ETL tool is to deal with a variety of data sources. No matter how comprehensive the included data-format readers and writers are, sooner or later you will have to talk to a proprietary system that is not widely known. We had to do this once for one of our clients: we ended up writing a custom PDI step that communicates with the XML-RPC backend of an ERP system.

The good news is that, with PDI, anyone with some Java SDK development experience can readily implement the published interfaces and thus create their own custom Transformation steps. In this regard, I am quite impressed with the modular design, which allows users to extend the functionality and, consequently, the usefulness of the tool.
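To give a flavor of what such an extension looks like, here is a heavily trimmed sketch of the row-processing part of a custom step, following the BaseStep/StepInterface pattern that PDI publishes for plugin authors. A real plugin also needs the accompanying meta, data, and dialog classes plus plugin registration; the ERP lookup shown here is a placeholder for whatever proprietary backend (such as the XML-RPC service mentioned above) you need to reach.

    import org.pentaho.di.core.exception.KettleException;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;
    import org.pentaho.di.trans.step.*;

    /** Trimmed sketch of a custom step that enriches each row via a proprietary backend. */
    public class ErpLookupStep extends BaseStep implements StepInterface {

        public ErpLookupStep(StepMeta stepMeta, StepDataInterface stepData,
                             int copyNr, TransMeta transMeta, Trans trans) {
            super(stepMeta, stepData, copyNr, transMeta, trans);
        }

        @Override
        public boolean processRow(StepMetaInterface smi, StepDataInterface sdi)
                throws KettleException {
            Object[] row = getRow();      // read a row from the input stream
            if (row == null) {            // no more rows: signal downstream and stop
                setOutputDone();
                return false;
            }

            // Placeholder for the proprietary call; assumes field 0 is a String key
            // that we replace with the value fetched from the backend.
            row[0] = lookupInErp(String.valueOf(row[0]));

            putRow(getInputRowMeta(), row);   // hand the row to the next step, layout unchanged
            return true;                      // ask the engine to call us again for the next row
        }

        private String lookupInErp(String key) {
            return "erp-value-for-" + key;    // stand-in for the real backend call
        }
    }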

The scripting ability built into the Steps is another way to handle proprietary, or extremely complex, data. PDI allows us to write JavaScript (and Java, should you want faster performance) programs that manipulate the data at the row level as well as pre- and post-run, which comes in very handy for initializing variables or sending notifications that contain statistics about all of the rows.

Summary

PDI is one of the jewels in the Pentaho BI Suite. Aside from some minor inconveniences in the GUI tool, the simplicity, extensibility, and stability of the whole package make PDI a good tool for building a network of ETLs marshaling data from one end of a system to another. In some cases, it even serves well as a development tool for the batch-processing side of an OLTP system.

Next in part-two, we will discuss the Pentaho BI Server.

Disclosure: I am a real user, and this review is based on my own experience and opinions.