Pentaho Data Integration and Analytics Benefits
At the start of my team's journey at the company, cross-platform storage analytics was difficult: ingesting data from different analytics sources into a single storage system and building out KPIs and other analytics on top of it.
Pentaho was a good start because we could create different connections, import the data, and then run global queries across that data from various sources. We were able to replace some of our other data tools, like Talend, for managing our data warehouse workflow. Later, we adopted other cloud technologies, so we no longer use Pentaho as our primary tool for those use cases.
The biggest benefit is that it's a low-code solution. When you hire junior ETL developers or engineers who may have a schooling background but no real experience with ETL or coding for ETL, it's a UI-based, low-code solution in which they can make something happen within weeks instead of, potentially, months.
Because it's low-code, while I could technically have done everything in Python alone, that would definitely have taken longer than using Pentaho. In addition, by being able to standardize pipelines to handle the onboarding process for new clients, development costs were significantly reduced. To put that in perspective, before I led the effort to standardize things, it would typically take about a week to build a feed from start to finish, sometimes more depending on how complicated it was. With this solution, instead of taking a week, it was reduced to an afternoon, about three hours. That was a significant difference.
Instead of paying a developer a full week's worth of work, which could be $2,500 or more, it cut it down to three hours or about $300. That's a big difference.
Philip Robinson
Senior Engineer at a comms service provider with 501-1,000 employees
It enables us to create low-code pipelines without custom coding efforts. A lot of transformations are quite straightforward because there are a lot of built-in connectors, which is really good. It has got connectors to Salesforce, which makes it very easy for us to wire up a connection to Salesforce and scrape all of that data into another table. Their flows have got absolutely no code in them. It has a Python integrator, and if you want to go into a coding environment, you've got your choice of writing in Java or Python.
The creation of low-code pipelines is quite important. We have around 200 external data sets that we query and pull data from on a daily basis. The low-code environment makes it easier for our support function to maintain because they can open up a transformation and very easily see what it is doing, rather than having to trawl through reams and reams of code. ETLs written purely in code become very difficult to trace very quickly; you spend a lot of time trying to unpick them, and they are never commented as well as you'd expect. With a low-code environment, you have your transformation there, and it is almost self-documenting, so it is much easier for somebody who didn't write the original transformation to pick it up later on.
We reuse various components. For instance, we might develop a transformation that does a lookup based on the domain name to match to a consumer record, and then we can repeat that bit of code in multiple transformations.
We have a metadata-driven framework. Most of what we do is metadata-driven, which is quite important because it allows us to describe all of our data flows. For example, table one moves to table two, table two moves to table three, etc. Because we've got metadata that explains all of those steps, it helps people investigate where the data comes from and allows us to publish reports that show, "You've got this end metric here, and this is where the data that drives that metric came from." The variable substitution that Pentaho provides to enable metadata-driven frameworks is definitely a key feature.
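The metadata-driven idea above can be sketched in plain Python (this is an illustration, not Pentaho's engine; the table names and template are hypothetical): each step of the flow is a row of metadata, and variable substitution turns one generic template into a concrete statement per step.

```python
# Minimal sketch of a metadata-driven flow with variable substitution.
from string import Template

# Hypothetical metadata describing the flow: table_one -> table_two -> table_three.
flow_metadata = [
    {"source": "table_one", "target": "table_two"},
    {"source": "table_two", "target": "table_three"},
]

# One generic "transformation" template reused for every step.
sql_template = Template("INSERT INTO ${target} SELECT * FROM ${source}")

def render_steps(metadata):
    """Substitute each row of metadata into the shared template."""
    return [sql_template.substitute(row) for row in metadata]

for statement in render_steps(flow_metadata):
    print(statement)
```

Adding a new step then means writing one more line of metadata rather than a new transformation, which is the efficiency the review describes.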
The ability to automate data pipeline templates affects our productivity and costs. We run a lot of processes, and if the automation wasn't reliable, it would take a lot more effort; we would need a much bigger team to support the 200 integrations that we run every day. Because it is a low-code environment, we don't have to have support incidents escalated to third-line support to be investigated, which affects the cost. Very often our support analysts or more junior members are able to look into an issue and fix it themselves without having to escalate it to a more senior developer.
The automation of data pipeline templates affects our ability to scale the onboarding of data because, after we've tried a few different approaches, new requirements fit into a standard approach. It gives us the ability to scale through code reuse, which also ties in with the metadata aspect of things. A lot of our intermediate stages of processing data are configured purely in metadata, so no custom coding is required to implement a transformation. It is really just a matter of writing a few lines of metadata to drive the process, and that gives us quite a big efficiency gain.
It has certainly reduced our ETL development time. I've worked at other places where a similar-sized team managed a system with far fewer integrations. We've managed to scale Pentaho not just for the number of things we do but also for the type of things we do.
We do the obvious direct database connections, but there is a whole raft of different types of integrations that we've developed over time. We have REST APIs, and we download data from Excel files that are hosted in SharePoint. We collect data from S3 buckets in Amazon, and we collect data from Google Analytics and other Google services. We've not come across anything that we've not been able to do with Pentaho. It has proved to be a very flexible way of getting data from anywhere.
Our time savings are probably quite significant. By using some of the components that we've already written, our developers are able to, for instance, put in a transformation from a staging area to its model data area in an hour or two. If they were starting from a blank piece of paper, that would be several days' worth of work.
Buyer's Guide
Pentaho Data Integration and Analytics
April 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,415 professionals have used our research since 2012.
When we get a question from our CEO that needs a response, one that requires a bit of legwork pulling from various market data, our own in-house repositories, and everything else, it allows me to arrive at solutions much faster than scripting it in Python or any other code. I use multiple tools within my toolkit. I'm pretty heavy on Python, but I find that I can do quite a bit of pre-transformation of the data within the PDI Spoon application itself rather than having to do everything through coding in Python.
It has significantly reduced our ETL development time. I can't really quantify the hours, but it's a no-brainer for me. If I have a simple question to answer, I can create any type of job or transformation and get the solution within minutes, as opposed to however many hours of coding it would take. My estimate is that I would previously spend about 75% of my week coding outside the application, whereas with the application I can do things in a fraction of that, so it has reduced my time from 75% to about 5%. In terms of the cost of a full-time employee coding, the savings are roughly the same, from 75% to 5% per week. There is also a broader impact on other colleagues within my team: their processes are currently fairly manual, such as Excel-based, so the time savings carry over to them as well.
I work with a lot of data. We have about 50 terabytes of information, and working with Pentaho Data Integration along with other databases is very fast.
Previously, I had three people to collect all the data and integrate all Excel spreadsheets. To give me a dashboard with the information that I need, it took them a day or two. Now, I can do this work in about 15 minutes.
It enables us to create pipelines with minimal manual coding or custom coding efforts, which is one of its best features. Pentaho is one of the few tools with which you can do anything you can imagine. Our business is changing all the time, and it is best for our business if I can use less time to develop new pipelines.
It provides the ability to develop and deploy data pipeline templates once and reuse them. I use them at least once a day. It makes my daily life easier when it comes to data pipelines.
Previously, I have used other tools such as Integration Services from Microsoft, Data Services for SAP, and Informatica. Pentaho reduces the ETL implementation time by 5% to 50%.
reviewer995501455
Solution Integration Consultant II at a tech vendor with 201-500 employees
We have been able to reduce the effort required to build sophisticated ETLs. We are also now migrating from an on-prem product to a cloud-native application.
We use Lumada's ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues with respect to deploying code, or with code working in one environment but not in another. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines.
Tobias Johnson
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees
We've had it for a long time. So, we've realized a lot of the improvements that anybody would realize from almost any data integration product.
The speed of developing solutions has been the best improvement. It has reduced development time and improved the speed of getting solutions deployed. The reduction in ETL development time varies by the size and complexity of the project, but we probably spend days or weeks less than if we were using a different tool.
It is tremendously flexible in terms of adding custom code by using a variety of different languages if you want to, but we had relatively few scenarios where we needed it. We do very little custom coding. Because of the tool we're using, it is not critical. We have developed thousands of transformations and jobs in the tool.
Our data flow processes became faster with this solution.
Rodrigo Vazquez
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees
I can create instructions faster than writing SQL or code. I am also able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a staging area where I can work with all the information from my production databases, and then I can work with the data that I have created.
Right now, I am working in the business intelligence area, but we use BI across all our companies, not only in one area. I create different data marts for different business units, e.g., HR, IT, sales, and marketing.
In my current company, it does not have any major impact. We use it for old and simple ETLs only.
In terms of the ETL tasks we have put on the platform, it's quite useful. However, the functionality we currently run on the solution could easily be replaced by other tools on the market. It's time to change entirely to Airflow; we'll likely switch in the next six months.
reviewer1872000
Senior Data Analyst at a tech services company with 51-200 employees
Before we used Pentaho, our processes were in Microsoft Excel and the updates from databases had to be done manually. Now all our routines are done automatically and we have more time to do other jobs. It saves us four or five hours daily.
In terms of ETL development time, it depends on the complexity of the job, but if the job is simple it saves two or three hours.
There are a lot of different benefits we receive from using this solution. For example, we can easily accept data from an API and create JSON files. The integration is also very good.
I have created many data pipelines and after they are created, they can be reused on different levels.
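The API-to-JSON pattern mentioned above can be sketched in plain Python rather than a Pentaho transformation (the endpoint and field names are hypothetical; the point is accepting records from an API response and emitting them as a JSON file for downstream steps).

```python
# Minimal sketch: pull records from an API, write them out as JSON.
import json
import urllib.request

def fetch_records(url):
    """Download and decode a JSON payload from an API endpoint."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def write_json_file(records, path):
    """Persist records as a pretty-printed JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
    return path

# Example with in-memory records (no network needed):
write_json_file([{"id": 1, "status": "ok"}], "records.json")
```

In Pentaho this same flow is wired up visually with a REST/JSON input step and a JSON output step, which is what makes the pipeline reusable at different levels.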
Lumada provides us with a single, end-to-end data management experience, from ingestion to insights. This single data management experience is pretty good because you don't have every analyst doing their own thing. When you have one unique tool for it, you can keep improving, as well as maintain good practices and a solid process for projects.
We needed to gather data from many servers at my company. We had probably 10 or 12 equivalent databases spread around the world, with an instance in each country, i.e., Brazil, Paraguay, or Chile. These servers are Microsoft SQL Server-based. We are using Lumada to get the data from these international databases. Because we have the same structure, schemas, and tables on each of these SQL Server-based servers, we can parallelize the extraction from the various servers at the same time. This provides good value for us, as extracting data in parallel accelerates our extraction.
In one integration process, I can retrieve data from 10 or 12 servers at the same time in one transformation. In the past, using SQL Server or other manual tools, we needed to have 10 or 12 different processes, one per server. Using Lumada in parallel accelerates our extraction. The tools that Lumada provides enable us to transform the data during this process, integrating the data in our data warehouse with good performance.
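The fan-out described above can be sketched in plain Python (the server names and the extract stub are hypothetical, and Pentaho does this inside a transformation rather than in code): because every instance shares the same schema, one extraction routine can run against all servers in parallel instead of as 10 or 12 separate processes.

```python
# Sketch: run the identical extraction against many same-schema servers at once.
from concurrent.futures import ThreadPoolExecutor

servers = ["sqlserver-brazil", "sqlserver-paraguay", "sqlserver-chile"]

def extract(server):
    """Stand-in for the per-server extraction query; every server has
    the same tables, so the same logic runs everywhere."""
    return {"server": server, "rows": 100}  # placeholder result

def extract_all(server_list):
    """Fan the extraction out across all servers in parallel."""
    with ThreadPoolExecutor(max_workers=len(server_list)) as pool:
        return list(pool.map(extract, server_list))

results = extract_all(servers)
```

The design point is that parallelism here costs nothing extra in logic: the schema uniformity across countries is what lets one transformation replace a dozen per-server processes.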
Because Lumada uses Java virtual machines, we can deploy and operate on whatever operating system we want. We can deploy on Linux; we have used both a Linux version and a Windows version of Lumada.
It is simple to deploy my ETLs because Lumada has the Pentaho Server version. I installed the desktop version so we can deploy our transformations to the repository. We install Lumada on a server, and then we have a web interface to schedule and reschedule our ETLs: we can set the hour we want our ETL processes and transformations to run and how many times we want to process the data. We save all our transformations in a repository located on a Pentaho Server. Since we have a repository, we can keep many versions of a transformation, such as 1.0, 1.1, and 1.2; I can save four or five versions and ask Lumada to run only the last one saved in the database.
Lumada offers a web interface for following these transformations. We can check the logs to see whether the transformations completed successfully or whether there was a network or database issue. Using Lumada, we can get logs at execution time, and we can also be notified by email whether transformations succeeded or failed. We have a file for each process that we schedule on Pentaho Server.
The area where Lumada has helped us is the commercial area. There are many extractions that compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country, we can get data from Argentina, Chile, Brazil, and Colombia at the same time and consolidate it in one place, our data warehouse. This improves our production performance and meets our need for information about the industry, production data, and commercial data.
reviewer1855218
Data Architect at a consumer goods company with 1,001-5,000 employees
People are now able to get access to the data when they need it. That is what is most important. All the reports go out on time.
The solution enables us to use one tool that gives a single, end-to-end data management experience from ingestion to insights. From the reporting point of view, we are able to make our customers happy. Are they able to get their reports in time? Are they able to get access to the data that they need on time? Yes. They're happy, we're happy, that's it.
With the automation of everything, if I start breaking it into numbers, we don't have to hire three or four people to do one simple task. We've been able to develop some generic IT processes so that we don't have to reinvent the wheel. I just have to extend the existing pipeline and customize it to whatever requirements I have at that point in time. Otherwise, whenever we would get a project, we would actually have to reinvent the wheel from scratch. Now, the generic pipeline templates that we can reuse save us so much time and money.
It has also reduced our ETL development time by 40 percent, and that translates into cost savings.
Before we used Pentaho, we used to do some of this stuff manually, and some of the ETL jobs would run for hours, but most of the ETL jobs, like the monthly reports, now run within 45 minutes, which is pretty awesome. Everything that we used to do manually is now orchestrated.
And now, with everything in the cloud, any concerns about hardware are taken care of for us. That helps with maintenance costs.
reviewer1751571
Systems Analyst at a university with 5,001-10,000 employees
Lumada Data Integration definitely helps with decision-making for our deans and upper executives. They are the ones who use the product the most to make their decisions. The data warehouse is the only source of information that's available for them to use, and to create that data warehouse we had to use this product.
And it has absolutely reduced our ETL development time. The fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs. It also makes it a pretty quick process for our developers to learn and pick up ETLs from each other. It's definitely easy for us to transition ETLs from one developer to another. The ETL functionality satisfies 95 percent of all our needs.
Krisjanis Muskars
Data Architect at a tech services company with 1,001-5,000 employees
As a result of one of the projects that we did in the Middle East, we achieved the main goal of fully digitalizing their population census. They did previous censuses doing door-to-door surveys, but for the last census, using Pentaho Data Integration, we managed to get it all running in a fully digital way, with nothing on paper forms. No one had to go door-to-door and survey the people.
Eric Smets
System Engineer at a tech services company with 11-50 employees
Before, a lot of manual work had to be done, work that isn't done anymore. We have also given additional reports to the end-users and, based upon them, they have to take some action. Based on the feedback of the users, some of the data cleaning tasks that were done manually have been automated. It has also given us a fast response to new data that is introduced into the organization.
Using the solution we were able to reduce our ETL deployment time by between 10 and 20 percent. And when it comes to personnel costs, we have gained 10 percent.
Stephen Knox
Lead, Data and BI Architect at a financial services firm with 201-500 employees
I love the fact that we haven't come up with a problem yet that we haven't been able to address with this tool. I really appreciate its maturity and the breadth of its capabilities.
If we did not have this tool, we would probably have to use a whole different variety of tools, then our environment would be a lot more complicated.
We develop metadata pipelines and use them.
Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us.
reviewer1772286
Director of Software Engineering at a healthcare company with 10,001+ employees
This was an OEM solution for our product. The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product.
reviewer1510395
Technical Manager at a computer software company with 51-200 employees
As we are a software company, we are using the tools provided with the Pentaho Data Integration for our various teams.
Tracy Gettings
Analytics Team Leader at HealtheLink
The solution has allowed us to automate reporting by automating its scheduling.
It is also important to us that the solution enables you to leverage metadata to automate data pipeline templates and reuse them. It allows us to generate reports with fewer resources.
If we didn't have this solution, we wouldn't be able to manage our workload or generate the volume of reporting that we currently do. It's very important for us that it provides a single, end-to-end data management experience from ingestion to insights. We are a high-volume department and without those features, we wouldn't be able to manage the current workload.
The organization went with Pentaho ETL and Reporting solutions as cost-effective products compared to competitors. The ETL part certainly met those objectives, along with serving the purpose.
Now developers focus on improving it as a tool (since it's open source) and teach project managers about it. The project managers are responsible for their own ETL jobs; since they know what they want, it's best for them to manage their own jobs.
It is also possible to build a new solution quite quickly, so the customer sees results quite fast.
We have developed some complex ETL processes for our clients, and they are very satisfied with the results.
It makes it possible for senior staff to train new employees and junior staff very quickly. All that is needed is strong knowledge of ETL and BI/big data concepts to use this software.
We use it almost everywhere: for creating data marts and data warehouses, and for implementing BI reporting tools. We also built our Customer Centralized File and Data Quality Studio using it. What's more, we use it for small solutions too, e.g., if we want to quickly export data from a database to .xlsx. We also develop our own plugins for PDI and put them in the marketplace.
We implement Pentaho data warehouses and BI features for our various customers. No other software gives us such complete functionality for fulfilling end-user requirements, and Pentaho offers a flexible platform that enables us to extend the tool to any end-user requirement.
Another impressive feature is that big data implementation/integration is very quick and simple, without the need to write any code. This enabled our clients to get maximum ROI within a short period.
Integrate all data sources into one OLTP or OLAP database.
It gave us out-of-the-box widgets for reading XML and JSON interfaces, which would otherwise have to be built from scratch.
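To illustrate what those widgets replace, here is a small sketch of reading the same record out of XML and JSON by hand with Python's standard library (the record and field names are hypothetical):

```python
# Parsing equivalent records from XML and JSON interfaces manually.
import json
import xml.etree.ElementTree as ET

xml_payload = "<order><id>42</id><status>shipped</status></order>"
json_payload = '{"id": 42, "status": "shipped"}'

def read_xml_order(payload):
    """Parse one order record from an XML interface."""
    root = ET.fromstring(payload)
    return {"id": int(root.findtext("id")), "status": root.findtext("status")}

def read_json_order(payload):
    """Parse the same record from a JSON interface."""
    return json.loads(payload)

# Both interfaces yield the same record.
assert read_xml_order(xml_payload) == read_json_order(json_payload)
```

The out-of-the-box input steps spare you from writing and maintaining this kind of parsing code for every interface.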
We use Pentaho for data integration, but also PDI to implement data mining. That has improved the intelligence behind the data, so we are able to give our customer the ability to understand their data. Our customer produces terabytes of data, so arranging and cleaning the data during data integration helped them understand it and improve their business.
We developed ETL processes to load a data warehouse. It has improved our data integration capabilities.
We have been able to expose data services through the use of CDA, relying on the same database as the reporting tools, thus avoiding inconsistencies between the data shown by reports and the data acquired by external systems.
We have a huge amount of data that needs to be cleaned and made more valuable for our organization. This data integration helps us achieve that goal.
View full review »It enables us to automate our reporting and ETL to a very high extent.
- It has reduced our costs.
- With self-service, we save time.
- There is an open community of plugin contributors.
I am a researcher in the field of data integration, and I am using this tool as a sandbox. Because it is open source and forums and support are readily available, my work has been really easy. Also, the reporting and analysis functionality gives me more freedom to test my test cases and results.
- It's an open-source tool, so you don't need to worry about licensing costs.
- We've deployed it with very minimal hardware.
- We migrated one key project from Microsoft BI to Pentaho Data Integration, which saved a lot of money and also brought a significant improvement in performance.
Integration between databases and data import for a BI solution.
We never used a data integration or BI platform before, and we struggled with lots of Excel spreadsheets and CSV files. So when we first used Pentaho to automate a data integration flow, we were stunned by how fast and easy it was. We are very productive today thanks to that piece of software integrating our data and the platform serving the processed data to our users.