
Pentaho Data Integration Overview

Pentaho Data Integration is the #17 ranked solution among the top Data Integration Tools. IT Central Station users give Pentaho Data Integration an average rating of 8 out of 10. Pentaho Data Integration is most commonly compared to Talend Open Studio. The top industry researching this solution is Computer Software Company, accounting for 26% of all views.
What is Pentaho Data Integration?

Pentaho data integration prepares and blends data to create a complete picture of your business that drives actionable insights. The complete data integration platform delivers accurate, "analytics ready" data to end users from any source. With visual tools to eliminate coding and complexity, Pentaho puts big data and all data sources at the fingertips of business and IT users alike.

Pentaho Data Integration is also known as Kettle.

Pentaho Data Integration Buyer's Guide

Download the Pentaho Data Integration Buyer's Guide including reviews and more. Updated: October 2021

Pentaho Data Integration Customers
66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
Archived Pentaho Data Integration Reviews (more than two years old)

it_user254223
Project Manager - Business Intelligence at www.datademy.es
Consultant
It has improved our data integration capabilities

How has it helped my organization?

Developed ETL processes to load a data warehouse. Has improved our data integration capabilities.

What is most valuable?

  • Easy to use
  • Development of the product
  • A lot of predefined steps
  • Good open source option

What needs improvement?

There is not a data quality or MDM solution in the Pentaho DI suite.

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

No issues.

What do I think about the scalability of the solution?

I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse.

How are customer service and technical support?

I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support.

Which solution did I use previously and why did I switch?

I switched from our previous solution for cost reasons.

How was the initial setup?

It was not complex.

What's my experience with pricing, setup cost, and licensing?

There is a good open source option (Community Edition).

Which other solutions did I evaluate?

No.

What other advice do I have?

There is a lack of support if you work with the Community Edition.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Consultant at a comms service provider with 11-50 employees
Consultant
Simple to install and simple to use and helps us mine, clean, and arrange terabytes of data

Pros and Cons

  • "It's very simple compared to other products out there."
  • "One thing that I don't like, just a little, is the backward compatibility."

What is most valuable?

It's very simple compared to other products out there.

How has it helped my organization?

We use Pentaho for data integration, but also PI to implement data mining. That has improved the intelligence behind the data. So, we are able to provide our customer with the ability to understand their data. Our customer produces terabytes of data, so arranging and cleaning that data through data integration has helped them understand it and improve their business.

What needs improvement?

One thing that I don't like, just a little, is the backward compatibility. I used Pentaho from version 4, and version 6 does not work with the whole ETL design. So backward compatibility is a problem.

For how long have I used the solution?

I have worked with this product for seven years.

What do I think about the stability of the solution?

It's a stable product. In fact, it contains some mocks where you can write your own Java software and build an ETL specific to your needs.

How is customer service and technical support?

The support is very fast, but there are also a lot of forums to address problems, so you can find the solution to your issue easily. There is also the possibility to buy support, and when we bought support they resolved our problem in 24 hours.

How was the initial setup?

It was very, very simple. I copied the integration folder, started the tool to design the ETL, and it worked. Time was required to design the ETL, just to understand how each block works. Once you understand how each block works, you need to spend no more time learning to use the product.

Which other solutions did I evaluate?

Before using Pentaho, I analyzed other products to understand which is the best ETL product. I tested Talend and Oracle Data Integrator. It is a little more difficult to understand how Oracle Data Integrator works.

So, I preferred Pentaho Data Integration because you just have to drag and drop the block, draw a line to connect the block, write the query, and connect to the DB. There's nothing else you need to do. For Oracle Data Integrator, and also for Talend, you spend more time installing the product. By contrast, with Pentaho, you just have to copy the folder, launch the product, and then you just need the Java machine and it works.

What other advice do I have?

When you start to use this product, if you have just a little experience and know about ETL, you will have to spend little time to learn it. The product is very, very simple to understand. You can build functionality by yourself.

Anyone thinking about an ETL product, if they want high productivity on data cleaning and data movement, Pentaho Data Integration, in my opinion, is the best tool.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Leonardo de Andrade
Brazil IT Coordinator at a transportation company with 1,001-5,000 employees
Real User
Integration between databases and data import for a BI solution is valuable.

What is most valuable?

Data transformation within Pentaho is a nice feature that they have and that I value.

How has it helped my organization?

Integration between databases and data import for a BI solution.

What needs improvement?

I would like to see more improvements with AS400 DB2. I journalled the tables/instance and the data migration is too slow if I compare it with other databases.

What was my experience with deployment of the solution?

There were no issues with the deployment.

What do I think about the stability of the solution?

Until now, the stability of Pentaho is great. I've already tested various scenarios and I didn't feel a loss of performance.

What do I think about the scalability of the solution?

There have been no issues so far in scaling the product.

How was the initial setup?

I used self-learning to implement it and found that the tool is very easy to understand. For some things, I looked at YouTube videos for conceptual ideas during the planning phase.

What about the implementation team?

I did it myself.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user402600
Senior Consultant at a financial services firm with 10,001+ employees
Vendor
Needs improvement on the Hadoop and JMS plugins.

Valuable Features:

It allows for rapid prototyping of a wide array of ETL workloads.

Room for Improvement:

Support for common Hadoop utilities can be expanded, such as bulk load with composite row keys for HBase, and include drivers for Impala out-of-the-box. A richer interface to Hive could also be beneficial as we currently have to go through a raw connection and execute SQL scripts, for which some syntax is not respected.

As of version 6, there are also some new issues introduced that pose a bit of an annoyance:


1) On kettle's ramp up - log4j errors

2) IBM Websphere MQ Producer - variable substitution for the URL does not work - you have to hardcode.

3) shared.xml for DB connections - variable substitution for connection properties does not work - have to hardcode things like Kerberos principal for a Hive/Impala connection.
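
The hardcoding workaround in item 3 can be automated outside the tool. The following is a minimal sketch, not a Pentaho feature: it fills `${NAME}` placeholders in a template before the file is handed to the tool. The template line, the `HIVE_PRINCIPAL` name, and the principal value are hypothetical; the actual layout of shared.xml is not shown in this review.

```python
from string import Template


def render_template(template_text: str, values: dict) -> str:
    """Replace ${NAME} placeholders with concrete values.

    Placeholders without a matching key are left untouched, so a
    partially rendered file can still be inspected.
    """
    return Template(template_text).safe_substitute(values)


# Hypothetical template line; a real connection file will look different.
template_line = "principal=${HIVE_PRINCIPAL};queue=${QUEUE_NAME}"

# Only one value is supplied here (hypothetical principal).
rendered = render_template(template_line, {"HIVE_PRINCIPAL": "hive/host@EXAMPLE.COM"})
print(rendered)
```

Running a step like this once per environment would keep the environment-specific values out of the checked-in file while still giving the tool a fully literal value to read.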

Deployment Issues:

We had no issues deploying it.

Scalability Issues:

The robustness of this solution in a production cluster (>30 nodes) remains to be seen.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user426117
DWH Specialist at a healthcare company with 1,001-5,000 employees
Vendor
It is extremely flexible; it allows you to use variables/parameters for just about everything.

Valuable Features:

It is extremely flexible; it allows you to use variables/parameters for just about everything.

Improvements to My Organization:

It enables us to automate our reporting and ETL to a very high extent.

Room for Improvement:

The product itself is great. The biggest downside, in my opinion, is that it is hard to find (hire) people with expertise. Our experience with Pentaho software is that few people have the required expertise. Hiring additional resources for projects can be tough.

Our solution is that we tend to train our own people. It’s definitely not hard to learn: basically anyone with SQL knowledge and experience in another tool can learn to use Pentaho Data Integration very easily, but you might end up training them yourselves.


Deployment Issues:

We had no issues with the deployment.

Stability Issues:

There were no issues with the stability.

Scalability Issues:

We had no issues scaling it for our needs.

Other Advice:

Train your own people!

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user426030
Global Consultant - Big Data, BI, Analytics, DWH & MDM at a tech consulting company with 1,001-5,000 employees
Consultant
It helps to connect to various data sources including all available databases.

Valuable Features:

It's an ETL platform that includes Big Data enablement. It's the easiest to use, extend, and deploy. It helps to connect to various data sources including all available databases.

We also use Pentaho Analyzer which is an ad-hoc analytics tool built on Mondrian OLAP server that enables the end user to slice and dice the data in various patterns.

Improvements to My Organization:

We implement Pentaho for data warehouses and BI features for our various customers. No software can give as complete functionality for fulfilling end user requirements as Pentaho. In addition, Pentaho offers a flexible platform which enables us to extend the tool to meet any of the end user's requirements.

Another impressive feature is that Big Data implementation/integration is very quick and simple, without the need to write any code. This enabled our clients to get maximum ROI within a short period.

Room for Improvement:

Pentaho Dashboard Designer - the dashboard features need improvement. CTools are available and help to fill the gaps, but they require developer involvement. A full-fledged dashboard designer that performs all the functions we currently handle in CDE/CDF would be a great improvement for Pentaho.

Build Process - an inbuilt build process would make it easier to migrate between DEV-QA-UAT-PROD. Currently this is mostly performed manually.

Data Profiling - including data profiling as part of PDI would be a great improvement to the platform and would help customers save a lot of the effort and cost of data quality.

Use of Solution:

We are Pentaho Service Providers and have implemented more than 130 projects in Pentaho. We are not direct customers of Pentaho but we recommend Pentaho to our clients if it meets their requirements.

Deployment Issues:

We had no issues with the deployment.

Stability Issues:

There have been no stability issues.

Scalability Issues:

We have not had any issues scaling it for our customers.

Initial Setup:

It is quick and easy to implement.

Cost and Licensing Advice:

Pentaho is available both in Community (Free) and Enterprise Edition (Subscription based) depending upon your budget.

Other Advice:

One of the best features to look out for in this platform is its flexibility in enhancing or adapting to your requirements. Implementation can be very quick: you can enable a few dashboards and analytics for your organization in a week's time.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user415695
Project Lead at a tech services company with 10,001+ employees
Consultant
The best benefit of the product is that it is easy to use and to understand.

Valuable Features:

The best benefit of the product is that it is easy to use and to understand.

Improvements to My Organization:

We have a huge amount of data that needs to be cleaned and made more valuable for our organization. This Data Integration helps us to achieve that goal.

Room for Improvement:

I have used multiple versions of this product. The initial version we were on was v3.2, and we had multiple issues, but we currently don't find any blocking issues. In general, it would be good if we could get better performance from this product.

Deployment Issues:

We haven't had any issues with deployment.

Stability Issues:

We haven't had any issues with stability except for those described in the Areas for Improvement.

Scalability Issues:

We haven't had any issues with scalability.

Other Advice:

There are other products out there, but I feel that this is the best one.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user414117
Senior Data Engineer at a tech company with 501-1,000 employees
Vendor
It enables a technical product manager to be able to write ETL jobs themselves.

What is most valuable?

The most valuable thing for me is that it enables a technical product manager to be able to write ETL jobs themselves, which saves developers time so that they can do more important things.

How has it helped my organization?

Now developers focus on improving it as a tool (since it's open source) and teach Project Managers about it. The Project Managers are the ones responsible for their own ETL jobs as they know what they want, so it's best for them to manage their own jobs.

What needs improvement?

Its performance can be improved so it will work better with Big Data. Also, sometimes it can be very buggy which keeps away some potential users.

For how long have I used the solution?

I've used it for two years.

What was my experience with deployment of the solution?

We have had no issues with the deployment.

What do I think about the stability of the solution?

The performance for Big Data needs to be improved.

What do I think about the scalability of the solution?

We have had no issues scaling it for our needs.

How are customer service and technical support?

There is a community that can provide limited technical help. I'll give a 6 to the community since it's not very active.

Which solution did I use previously and why did I switch?

It was already in place when I joined the company.

How was the initial setup?

It's very easy to install.

What about the implementation team?

We did it in-house. It's worth it to have someone in your company who knows Pentaho really well.

What was our ROI?

ROI is pretty good since it is kind of a major thing in our company.

What's my experience with pricing, setup cost, and licensing?

The only cost is the time it takes for the developer to get to know it.

What other advice do I have?

If your ETL jobs are small and straightforward, then this solution is definitely worth it.

Disclosure: My company has a business relationship with this vendor other than being a customer: The company is also contributing back to the open source project.
it_user373128
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees
Vendor
It doesn't have the capability to produce crosstab reports with formatting capabilities. It connects seamlessly to most commonly used data sources.

Valuable Features

It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.

Improvements to My Organization

The organization went with Pentaho ETL and Reporting solutions as cost effective products, as compared to competitors. The ETL part certainly met those objectives, along with serving the purpose.

Room for Improvement

Since there have already been newer versions, maybe some of these issues are already fixed now. The most troublesome missing feature was the capability to produce crosstab reports with formatting capabilities in the BI Reporting product. The one annoyance that troubled us a lot was the fact that every step in a transformation that needed data created its own data connection. With some data sources like Greenplum, this was a problem, because they have a limit on the number of available connections.
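
To make the connection-limit problem concrete, here is a minimal sketch of the general remedy, a bounded pool shared by all steps. It is not how PDI manages connections; the `BoundedPool` class, the `run_step` helper, and the use of an in-memory SQLite database are all assumptions chosen only so the example runs on its own.

```python
import queue
import sqlite3
from contextlib import contextmanager


class BoundedPool:
    """Hand out at most `size` connections; callers wait until one is free."""

    def __init__(self, size, connect):
        self._free = queue.Queue(maxsize=size)
        for _ in range(size):
            self._free.put(connect())

    @contextmanager
    def connection(self):
        conn = self._free.get()  # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._free.put(conn)  # hand the connection back for the next step


# Two connections serve five "steps", instead of one connection per step.
pool = BoundedPool(2, lambda: sqlite3.connect(":memory:"))


def run_step(sql):
    """Hypothetical stand-in for a transformation step that needs data."""
    with pool.connection() as conn:
        return conn.execute(sql).fetchone()[0]


results = [run_step(f"SELECT {i} * 2") for i in range(5)]
print(results)
```

With this shape, the number of open connections is fixed by the pool size rather than by the number of steps, which is the property a connection-limited source such as the one described above would need.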

Use of Solution

I used it for three years, from 2012 to 2015, and only stopped as I left the organization.

Deployment Issues

One issue we encountered constantly with PDI deployments was that the environment parameters for jobs had to be updated manually through the designer module 'Spoon'. Although the product has a feature for keeping Environment Variables outside Spoon, that didn't work for us, as we had one Development server used for Dev, QA and UAT.

Stability Issues

There were no issues with the stability.

Scalability Issues

We had no issues scaling it across the company as needed.

Customer Service and Technical Support

It's about average. Most of the help we got was through Google searches and Wiki pages. One time we had an issue with a feature - our version of PDI could not handle microseconds. The product owner came up with a solution, but instead of applying the patch, wanted to sell it to us for a fee.

Initial Setup

I am only aware of the client side setup which was simple enough. It was pretty much a one step installation process.

Implementation Team

It was done by an in-house team. A couple of issues we realized later were regarding memory configuration for the environment. This needs to be evaluated and fine-tuned; otherwise you can run into job failures with large amounts of data. We ran into this issue with 'Commit' points and 'Sort' steps.

Other Solutions Considered

There was an evaluation performed, however I was not involved in it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Graduate Teaching Assistant with 1,001-5,000 employees
Vendor
We can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool.

Valuable Features:

The most valuable feature is that it can take inputs from all formats, e.g. CSV, text, Excel, JSON, Hadoop, etc. It has the potential to provide the output in the format we require, and we can also use many database connections. The transformations listed are also very useful and are very self-explanatory. 

Also, the data mining feature which comes with the Pentaho business analytics suite was very useful to our project, especially the Weka plugin. We could score the records in the data warehouse, which helped in predicting the values.

Lastly, the GUI is very easy to use, so we can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool. I think that a company wouldn't need to spend more money on getting an experienced person to use this tool. All you need is a balance of experienced users and new trainees to get going. You can also start using the business analytics tool once you have integrated data. Coaching and applying this technology enterprise-wide will enable your business to make data-driven decisions.

Improvements to My Organization:

It makes it possible for the seniors to train new employees and junior staff very quickly. All that is needed is strong knowledge of ETL and BI/Big Data concepts to use this software.

Room for Improvement:

I would like to see the data visualization tool combined with BI so I can see how data is progressing through various stages. I do think that they are working on this already. I also found, in my case, that the statistical data input wasn't working (.sas7bdat input wasn't working).

Deployment Issues:

There have been no issues with the deployment.

Stability Issues:

It could have been the case that I may not have been doing it the right way.

Scalability Issues:

We have had no issues scaling it.

Cost and Licensing Advice:

I would say it is one of the most affordable tools to use for business intelligence.

Other Advice:

You should go for this tool to manage your data warehouse, but I would suggest that you look for other reporting tools, such as Tableau, which are more user friendly and provide great insights in the data.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Business Intelligence Consultant at Sanmargar Team
Vendor
We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools.

Valuable Features:

First of all, the ease of deployment. I’m pretty sure that almost anyone could do simple transformations without having any knowledge of IT. Thanks to its graphical interface, this tool is just drag and click. Another advantage is that it fits everywhere. You can connect it to Big Data sources, relational databases, and all types of files. If the developer missed something, you can try finding it in the marketplace or quickly develop it yourself, because it is open source.

Improvements to My Organization:

We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools. We also build our Customer Centralized File and Data Quality Studio using it. What’s more, we use it for small solutions too, i.e. if we want to quickly export data from database to .xlsx. We also develop our own plugins for PDI and put them into the marketplace. 

Room for Improvement:

A big advantage, but also a problem, is that it is open source. Almost anyone can develop their own Pentaho code and release it. Now, Pentaho is a little messy: some parts of it are super new and some look like they were developed at the beginning. I think that developers should stop inventing new parts of it and take a while to clean the code and optimize the older parts. Some old plugins, after a long time, still don’t work properly.

Use of Solution:

I've been using it for four years, and when I started using it I was in college. I quickly found that PDI with my text search analytic plug-in is useful for preparing notes for classes. When I was bored I came up with a funny tool. It collected data from all my roommates about what they needed from the shop and sent notifications to the phones of the people who were going to the shop.

Deployment Issues:

We have never had any problems with deployment.

Stability Issues:

There are some issues with stability. As I said before, there are some small bugs, but it’s Pentaho: you can always find a workaround for them.

Scalability Issues:

With the Pentaho Community version you just download it, unpack it, and it should be running. If not, you should also install Java.

Customer Service:

Customer service isn’t needed. The solution to every problem is on the internet. If not, you can post it to the community forum and you will get an immediate answer, but I have never had to post a new topic.

Initial Setup:

Straightforward. You just need to unzip the file and you can already run it. There is also some optional setup if you need it. It’s very simple: you just need to edit three files in Notepad.

Implementation Team:

I did this myself and we do it for other companies. All installations are easy, and you do not need to be an IT magician. 

Cost and Licensing Advice:

There is a Community Edition which is free. There is also an Enterprise licence but the price varies depending on the server hardware configuration and the purpose of use (BigData, Hadoop, etc.).

Other Solutions Considered:

I had the chance to test SAS Data Integration but I didn’t fall in love with it like I did with PDI. I think that PDI is easier to use and you can do much more with PDI than with SAS.

Other Advice:

The tool is excellent, and almost everyone can use it. You just need to take it out of the box and run it. There is no limit to the application – you can do everything with it. However, it still has a lot of faults. Not every component runs as you wish. Always look for solutions on the Internet. There are many problems and built transformations/jobs that are already fixed.

Disclosure: My company has a business relationship with this vendor other than being a customer: The company where I work, Sanmargar Team, is a reseller of this solution and a Pentaho partner in Poland.
ITCS user
BI developer - (Jaspersoft/Pentaho/Pentaho C-Tools/Kettle/Talend/Data warehouse) at a tech services company with 501-1,000 employees
Real User
You can get ETL, reporting, analysis, and analytics in a single shop.

Valuable Features:

  • Best in performance in both hosted and local environments
  • Best open source warehouse solution using the Kimball method
  • Best Big Data discovery components and BI
  • Simple and easy to understand and work with
  • Complete cost effective solutions
  • Best support in forums
  • Best visualizations in the market - Protovis & D3
  • Best custom interactivity features
  • Best product for embedded BI
  • Best for integrated mobile-responsive technology, e.g. Bootstrap
  • Best documentation - Open APIs

Improvements to My Organization:

  • It's reduced our costs
  • With self-service we can save time
  • Open plug-ins contributors

Room for Improvement:

  • Searching repository for reports or dashboards
  • Repository UI
  • Loading of percentage reports and dashboards

Other Advice:

It has a fancy look, the best visualization libraries, and is open source. You can get ETL, reporting, analysis, and analytics in a single shop. Small and mid-sized companies, as well as enterprises such as CA, have been implementing Pentaho.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Business Intelligence Supervisor at a manufacturing company with 501-1,000 employees
Vendor
We have performed a lot of setups since we started using it, and have had no issues.

Valuable Features

  • Fast
  • Easy to learn and then teach to our team
  • It integrates with everything on the market

Improvements to My Organization

We had never used a data integration or BI platform before, and struggled with lots of Excel spreadsheets and CSV files. So when we first used Pentaho to automate a data-integration flow, we were stunned by how fast and how easy it was. We are very productive today thanks to that piece of software integrating our data and the platform serving the processed data to our users.

Room for Improvement

An easier upgrade process for community tools would be nice. They also need to update the ad-hoc reports tool, as the one available is outdated. To get round this, we are using Excel as the output for some reports.

Use of Solution

For more than four years.

Deployment Issues

Upgrading the BI platform is a bit of a pain, but the rest is easy to use and to set up.

Stability Issues

There have been no issues with the stability.

Scalability Issues

There have been no issues with the scalability.

Customer Service and Technical Support

We use only the community edition, so we only consult the internet for help. There is a strong community of users all over the world. Here in Brazil, the e-mail list is very helpful.

Initial Setup

We have performed a lot of setups since we started using Pentaho, and there have been no issues there.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user392367
Research Assistant at a university with 1,001-5,000 employees
Vendor
The user-defined class operator is currently very valuable to me.

Valuable Features:

I would say that the user-defined class operator is currently very valuable to me. Other than that, native connectivity to Hadoop (MapR), analytical databases, and enterprise systems is really important to me these days.

Improvements to My Organization:

I am a researcher in the field of data integration, and I am using this tool as a sandbox. I would say that its being open source, together with the high availability of forums and support, has made my work really easy. Also, the reporting and analysis functionality provided gives me more freedom to test my test cases and results.

Room for Improvement:

I would like to have more languages/scripts supported in user-defined classes. Right now the options are very limited. I know, if I want to do core programming I can always import my classes/jars into it, but it would be really nice to have more functionality in terms of programming language and support in UD classes/operator. Besides that, different parallel algorithms/skeletons would be great. For example, it could suggest which parallel algorithm I should use on a particular operator or a set of operators. It would be really cool to have such a functionality.

Other Advice:

If you are looking to integrate unstructured or semi-structured datasets with some parallelization, choose this tool. The parallelization supported by Pentaho Data Integration is really nice functionality to have. You can choose which activities you want to parallelize and that's it. You do not have to write parallel code or anything, as it does this job for you, which is awesome for a not-so-good programmer such as myself.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user384993
Datawarehouse Administrator at a tech services company with 501-1,000 employees
Consultant
We have been able to expose data services through the use of CDA relying on the same database as the reporting tools.

What is most valuable?

Its ability to blend data, and the dashboarding with C*TOOLS for creating responsive single-page apps.

How has it helped my organization?

We have been able to expose data services through the use of CDA relying on the same database as the reporting tools, thus avoiding inconsistencies among the data shown by reports and data acquired by external systems.

What needs improvement?

The User Console, aka workspace, and the development of dashboards. They work, but they require some programming skills. This means continuous application management by the IT department.

For how long have I used the solution?

I've used it for six years.

What was my experience with deployment of the solution?

There were issues, but they were solved with help from tech support.

What do I think about the stability of the solution?

There were issues, but they were solved with help from tech support.

What do I think about the scalability of the solution?

There were issues, but they were solved with help from tech support.

How are customer service and technical support?

It depends. It usually takes a long time, some answers seem to be just a way to buy time, and the commitment seems poor. However, when you finally get to an engineer, you are likely to have your problem solved in a few days.

Which solution did I use previously and why did I switch?

We used MicroStrategy, Cognos, and Business Objects. Pricing was the key driver, but so was the open source licensing, which made us think we would be able to develop our own improvements. This didn't happen, primarily because of the few resources we actually put into development.

How was the initial setup?

It's complex because of the lack of documentation and the absence of an installer for Linux.

What about the implementation team?

We did it in-house, and we had to hire some developers with Java skills for a few months.

What other advice do I have?

Have a vision, and do not let yourself be guided by the technology.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user384984
Sr BI Administrator at a healthcare company with 1,001-5,000 employees
Vendor
It gave ‘out-of-the-box’ widgets for reading XML and JSON interfaces which would otherwise have to be built from scratch.

What is most valuable?

It allows for very quick development due to the intuitive interface. Compared to other ETL tools like PowerCenter, SSIS, and SAS DI Studio, it excels in rapid development cycles.

How has it helped my organization?

It gave ‘out-of-the-box’ widgets for reading XML and JSON interfaces which would otherwise have to be built from scratch.

What needs improvement?

PDI excels at the development part. Administration and monitoring are pretty weak and basic. But, I must say, I have been spoiled by the great capabilities that PowerCenter offers ‘out-of-the-box’. The Pentaho development team seems to rely very heavily on Linux/Unix for the admin part. Debugging could be enhanced with better feedback.

For how long have I used the solution?

We used PDI 4.3 in a pilot against SSIS for a couple of months during 2013. In 2014 I used version 4.4 on a daily basis within a production environment for exactly one year. We also looked into the commercial front-end solution and found it to be too much of a collection of loosely connected applications.

What was my experience with deployment of the solution?

There have been no deployment issues.

What do I think about the stability of the solution?

Stability is a bit of an issue. The GUI quite often ‘freezes’ and there is no alternative to killing the session. Very frequent saving is in order.

What do I think about the scalability of the solution?

There have been no issues with scalability.

How are customer service and technical support?

The community site is pretty brilliant. Every technical component is handled on its own Wiki page. You can even look into the scrum backlog of the dev. team. Absolutely amazing.

Which solution did I use previously and why did I switch?

Heavy ETL solutions were simply too expensive and the SSIS alternative is simply too hideous to consider. It took at least three times as long to develop the same ETL process with SSIS as with Pentaho (and we had to deal with the abject Microsoft ‘debugging’).

How was the initial setup?

Incredibly easy. Just unpack, make sure you have the right drivers installed, and beware of other Java applications running.

What about the implementation team?

We simply did everything ourselves, with a little aid from the community.

What other advice do I have?

Make sure Pentaho solutions are still available as they were prior to the commercial take-over. Administration is not the best developed component. The ETL is brilliant. Make sure that the admin part is covered.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user382572
Pentaho Consultant at a comms service provider with 10,001+ employees
Vendor
Because it is an open source product, it is very easy to build your own solution with it.

What is most valuable?

It is a very good open source ETL tool that's capable of connecting to most databases. It has a lot of functions that make transforming the data very easy. Also, because it is an open source product, it is very easy to build your own solution with it.

How has it helped my organization?

It is also possible to build a new solution quite quickly, so the customer sees results fast.

What needs improvement?

In the community version the scheduling tool is not good, and we had to build it ourselves.

For how long have I used the solution?

I have worked with different versions of Pentaho since 2009.

What was my experience with deployment of the solution?

There are a couple of bugs in the newer versions. We were forced to wait until those bugs were fixed before we could upgrade.

What do I think about the stability of the solution?

There were no issues with its stability.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and technical support?

Because we use the community edition, there is no support from the vendor. When I worked with the Enterprise edition last year the technical support was quick and to the point. I was more than happy with their knowledge.

Which solution did I use previously and why did I switch?

In the past I also worked with SAP BI. The main reason we switched to Pentaho was the cost of SAP. Because of the flexibility of Pentaho, I prefer to work with it.

How was the initial setup?

When I started using Pentaho in 2009 the initial setup was quite complex, mainly because of a lack of good documentation at that time. Since then, it has dramatically improved. Also, the community on the web is quite active and there are some good blogs.

What about the implementation team?

I was hired to do the implementation. I think a good understanding of the product is necessary to implement it well, so if you do not have that knowledge in-house, I would recommend hiring it.

What other advice do I have?

If you don’t have knowledge of the product, I would recommend taking some courses to speed up the learning curve. A cheap way to start with Pentaho is to use the Community Edition. You can do almost everything with it, and purchasing the Enterprise Edition is not necessary.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Data Developer at a tech services company with 10,001+ employees
Consultant
It is possible to understand how to develop an ETL solution even when using it for the first time.

What is most valuable?

  • Pentaho Kettle has a very intuitive and easy to use graphical user interface (GUI)
  • It is possible to understand how to develop an ETL solution even when using it for the first time
  • The Community Edition is free and very efficient
  • They have versions for Windows, Linux and Mac
  • Large selection of options.

How has it helped my organization?

We have developed some complex ETL processes for some clients and they are very satisfied with the results.

What needs improvement?

They could improve the logging generator. Sometimes the error description is so generic that it is not possible to detect the problem.

For how long have I used the solution?

We've used it for three years.

What was my experience with deployment of the solution?

There were no issues with the deployment.

What do I think about the stability of the solution?

There were no issues with the stability.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and technical support?

I use the Community Edition without support or customer service. I recommend the Pentaho Community Forums for technical issues.

Which solution did I use previously and why did I switch?

I have used Informatica PowerCenter, which is an excellent solution. However, it's not as easy to use as Pentaho Kettle.

How was the initial setup?

The initial setup is straightforward. All you need to do is to download it, unzip the file into a folder and execute the Spoon.bat (for Windows) or Spoon.sh (for Linux) to start the graphical user interface (GUI).
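The start-up step described above can be sketched as a small Python helper that picks the launcher script for the current operating system. The script names are taken from this review as written (check the exact capitalization in your own download), and the function name `spoon_launcher` and the folder name used below are hypothetical.

```python
import platform
from pathlib import Path


def spoon_launcher(install_dir, system=None):
    """Return the path of the Spoon start script for the given OS name."""
    # Fall back to the OS this script is running on when none is given.
    system = system or platform.system()
    # Script names as quoted in the review: .bat for Windows, .sh otherwise.
    script = "Spoon.bat" if system == "Windows" else "Spoon.sh"
    return Path(install_dir) / script


if __name__ == "__main__":
    # "pdi" stands in for wherever the download was unzipped.
    print(spoon_launcher("pdi"))
```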

What about the implementation team?

In-house. The implementation is very simple. Data developers will not encounter difficulties in implementing ETL solutions.

What's my experience with pricing, setup cost, and licensing?

The community edition is free. If you need a full BI solution, I would recommend the enterprise edition.

What other advice do I have?

Pentaho Kettle is an excellent solution for implementing ETL processes.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Consultant at a tech vendor with 501-1,000 employees
Vendor
It's open source so there's no concern for pricing and licensing, and we've deployed it with minimal hardware.

What is most valuable?

  • It has a nice GUI that anyone can learn to use in just a few days with minimal training.
  • It has great support for big data technologies; Pentaho 5.3 comes with support for HBase, Pig, Oozie, and Hadoop distribution support.

How has it helped my organization?

  • It's an open-source tool, so you don't need to worry about licensing costs.
  • We've deployed it with very minimal hardware.
  • We have migrated one of our key projects from Microsoft BI to Pentaho Data Integration. This saved a lot of money and also brought a large improvement in performance.

What needs improvement?

The rule executor step can be improved. It has one limitation: we cannot give it a dynamic file name in this step.

For how long have I used the solution?

I've used it for 4.5 years.

What was my experience with deployment of the solution?

We've had no issues with deployment.

What do I think about the stability of the solution?

We've had no issues with stability.

What do I think about the scalability of the solution?

We've had no issues with scalability.

Which solution did I use previously and why did I switch?

We used MSBI. We switched to Pentaho Data Integration, and it has saved us a lot of money and time.

How was the initial setup?

For Pentaho, the initial setup is very straightforward.

What about the implementation team?

We did it in-house.

What's my experience with pricing, setup cost, and licensing?

Because it's open source, there's no issue of pricing or licensing.

What other advice do I have?

Use it for any data warehousing and migration projects. I love this tool and we can use it without spending a penny.

I would say this is the best ETL tool on the market, considering that it is open source, easy to use, and has a very nice GUI.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
CEO with 51-200 employees
Vendor
Easy to use and has a nice GUI. The JSON input needs to perform better.

What is most valuable?

Ease of use, stability, graphical interface, small amount of "voodoo" and cost.

What needs improvement?

There are some steps that should perform better, like the JSON input, but because of the flexibility, we at inflow override it by using scripting steps. Of course it's ideal to use the steps that come with the software, but being able to write your own step is powerful. Also, it would be nice to have the drivers for the data sources shipped with Pentaho Kettle instead of having to look for the right ones on the Internet.
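The review does not show the replacement script, so the following is only an illustration of the kind of work such a script does: parse a JSON document and emit one flat row per record. It is written in Python for readability rather than in a language used by PDI's scripting steps, and the function name `json_to_rows` and the sample fields are hypothetical.

```python
import json


def json_to_rows(text, fields):
    """Parse a JSON array of objects and return one flat tuple per object."""
    records = json.loads(text)
    # Missing keys become None so every row has the same number of columns.
    return [tuple(record.get(name) for name in fields) for record in records]


if __name__ == "__main__":
    sample = '[{"id": 1, "name": "a"}, {"id": 2}]'
    print(json_to_rows(sample, ["id", "name"]))
```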

What was my experience with deployment of the solution?

In every project there are issues with the deployment, but we were able to overcome them.

What do I think about the stability of the solution?

I think that Pentaho Kettle is stable software. If it weren't, we wouldn't have used it (because we don’t like angry customers).

What do I think about the scalability of the solution?

Actually, Pentaho Kettle comes equipped with the option to scale out, out of the box.
And no, we didn't encounter specific scalability problems.

How are customer service and technical support?

Customer Service:

We mainly use material which was written over the years (Pentaho Kettle materials), the forum, Matt Casters' blog, videos, etc. We also try to solve our issues inside the company for our customers before contacting customer service. We even developed a full-scale Pentaho Kettle data integration online course and built a website for it.

When we use the customer service it's very good. There is a large community for the tool, and people gladly help one another.

Technical Support:

Very good support.

Which solution did I use previously and why did I switch?

Before Pentaho Kettle we used stored procedures, writing code and also Informatica. Informatica is a very good tool, but it is not open source so it is far more costly compared to Pentaho Kettle. From my perspective I don't see the difference, we can do almost everything with Pentaho Kettle and if we need a little extra we are tech guys, we solve it.

Of course, from the customer's perspective the cheaper the better, so if the customer has a smaller budget, they get more when using open source Pentaho Kettle. That holds even with the Pentaho Kettle enterprise edition.

What's my experience with pricing, setup cost, and licensing?

From the vendor perspective, I can say that the data integration part (from data source to the warehouse/target) usually takes at least 60% of the whole initial business intelligence project. It depends on the data sources and complexity, for example: big data, NoSQL, XML, web services, "weird" files and more.

After the data integration project is "live" it will work fine until someone breaks something (network connectivity, servers, a DBA who changes the data source, or any other change to the variables the data integration was built upon), but this is true for all data integration software.

The day-to-day costs are very low if there are no new requirements. Luckily for us (as a vendor) once the customer starts and the users get their fancy reports and dashboards there's no turning back, and the requirements are piling up. But these are new requirements, not maintenance.

What other advice do I have?

Instead of trying to decide on a specific data integration tool, pick the right vendor partner, not a biased one. They will be able to recommend the set of tools you need according to your requirements and budget.

Business intelligence projects are made up of at least three components:

  • Data integration tool
  • Data warehouse tool
  • Visualization tool

Several of the software vendors have them all, but not the best solution for each component. From my experience it's better to combine solutions. (Unless it is a small project.)

For example: data integration from Pentaho Kettle; an in-memory/columnar database for the data warehouse if it's big data, or otherwise a traditional database (SQL Server, Oracle, even MySQL for smaller projects); and a BI visualization tool like Yellowfin/Tableau/Sisense/etc.

In the middle you have tens of software vendors that can be suitable for the customer needs.

Of course, if the vendor partner is biased, then suddenly Tableau/Sisense/QlikView/etc. becomes the best data integration tool, or "you don’t need a data integration tool at all", although they don't have the right components. (They are very good tools for visualization but not for "playing" with extracting and transforming complex data.) We work with several vendors, such as Sisense and Yellowfin, which are great tools for the specific solution they were made for.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
CTO at a tech services company with 51-200 employees
Consultant
Top 5
For me, it's the best ETL tool in the world

What is most valuable?

Easy to use, support for all databases (JDBC and ODBC connections), XLS, CSV, TXT files, SAS, R

How has it helped my organization?

Integrates all data sources into one OLTP or OLAP database

For how long have I used the solution?

4 years

What was my experience with deployment of the solution?

None

What do I think about the stability of the solution?

None

What do I think about the scalability of the solution?

None

How are customer service and technical support?

Customer Service: 5/10
Technical Support: 10/10

Which solution did I use previously and why did I switch?

Talend Studio.

How was the initial setup?

Easy

What was our ROI?

100% (PDI CE)

Which other solutions did I evaluate?

Talend Studio

Disclosure: My company has a business relationship with this vendor other than being a customer: EspriSûr Consultants