Pentaho Data Integration and Analytics Benefits

Jacopo Zaccariotto - PeerSpot reviewer
Head of Data Engineering at InfoCert

At the start of my team's journey at the company, it was difficult to do cross-platform storage analytics, meaning ingesting data from different analytics sources into a single storage system and building out KPIs and other analytics.

Pentaho was a good start because we can create different connections and import data, and we can then run global queries on that data from various sources. We've been able to replace some of our other data tools, like Talend, for managing our data warehouse workflows. Later, we adopted other cloud technologies, so we no longer use Pentaho as the primary tool for those use cases.

Ryan Ferdon - PeerSpot reviewer
Senior Data Engineer at Burgiss

The biggest benefit is that it's a low-code solution. When you hire junior ETL developers or engineers who may have a schooling background but no real experience with ETL or coding for ETL, it's a UI-based, low-code solution in which they can make something happen within weeks instead of, potentially, months.

Because it's low-code, while I could technically have done everything in Python alone, that would definitely have taken longer than using Pentaho. In addition, standardizing pipelines to handle the onboarding process for new clients significantly reduced development costs. To put it in perspective, before I led the effort to standardize things, it would typically take about a week to build a feed from start to finish, and sometimes more depending on how complicated it was. With this solution, instead of taking a week, a feed takes an afternoon, about three hours. That is a significant difference.

Instead of paying a developer for a full week's worth of work, which could be $2,500 or more, it was cut down to three hours, or about $300. That's a big difference.
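To make the standardized-feed idea concrete, here is a minimal sketch in Python of how a templated onboarding pipeline can work. It is illustrative only: the config fields, file layout, and column names are hypothetical, not the reviewer's actual pipeline.

    import csv
    from dataclasses import dataclass

    @dataclass
    class FeedConfig:
        client_id: str
        source_path: str   # where the client drops files (hypothetical layout)
        target_table: str  # standardized landing table
        column_map: dict   # client column name -> canonical column name

    def run_standard_feed(cfg: FeedConfig) -> list:
        """Generic feed: read the file and rename columns to the canonical schema."""
        with open(cfg.source_path, newline="") as f:
            rows = [
                {cfg.column_map.get(k, k): v for k, v in row.items()}
                for row in csv.DictReader(f)
            ]
        # A real pipeline would now bulk-load `rows` into cfg.target_table.
        return rows

    # Onboarding a new client becomes one config entry instead of a week of work.
    acme = FeedConfig("acme", "/feeds/acme/positions.csv", "stg_positions",
                      {"AcctNo": "account_id", "MktVal": "market_value"})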

PR
Senior Engineer at a comms service provider with 501-1,000 employees

It enables us to create low-code pipelines without custom coding effort. A lot of transformations are quite straightforward because there are many built-in connectors, which is really good. It has a connector for Salesforce, which makes it very easy for us to wire up a connection and scrape all of that data into another table; those flows have absolutely no code in them. It also has a Python integrator, and if you want to go into a coding environment, you've got your choice of writing in Java or Python.
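For a sense of what the Salesforce connector replaces, here is a hand-coded equivalent using the third-party simple_salesforce Python library. The credentials and fields are placeholders, and this is not how Pentaho implements its step; it is simply the code a developer would otherwise have to write.

    # pip install simple-salesforce
    from simple_salesforce import Salesforce

    sf = Salesforce(
        username="user@example.com",   # placeholder credentials
        password="secret",
        security_token="token",
    )

    # SOQL query; query_all pages through every batch of results.
    result = sf.query_all("SELECT Id, Name, Industry FROM Account")
    rows = [
        {"id": r["Id"], "name": r["Name"], "industry": r["Industry"]}
        for r in result["records"]
    ]
    # `rows` would then be loaded into the target table.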

The creation of low-code pipelines is quite important. We have around 200 external data sets that we query and pull data from on a daily basis. The low-code environment makes it easier for our support function to maintain, because they can open up a transformation and very easily see what it is doing, rather than having to trawl through reams and reams of code. ETLs written purely in code become very difficult to trace very quickly; you spend a lot of time trying to unpick them, and the code never gets commented as well as you'd expect. With a low-code environment, your transformation is right there and is almost self-documenting, so it is much easier for somebody who didn't write the original transformation to pick it up later on.

We reuse various components. For instance, we might develop a transformation that does a lookup based on the domain name to match to a consumer record, and then we can repeat that bit of code in multiple transformations. 

We have a metadata-driven framework. Most of what we do is metadata-driven, which is quite important because it allows us to describe all of our data flows. For example, Table one moves to Table two, Table two moves to Table three, and so on. Because we've got metadata that explains all of those steps, it helps people investigate where the data comes from and allows us to publish reports that show, "You've got this end metric here, and this is where the data that drives that metric came from." The variable substitution that Pentaho provides to enable metadata-driven frameworks is definitely a key feature.

The ability to automate data pipeline templates affects our productivity and costs. We run a lot of processes, and if the tool weren't reliable, it would take a lot more effort; we would need a much bigger team to support the 200 integrations that we run every day. Because it is a low-code environment, we don't have to escalate support incidents to third-line support to be investigated, which affects the cost. Very often our support analysts or more junior members are able to look into an issue and fix it themselves without having to escalate it to a more senior developer.

The automation of data pipeline templates affects our ability to scale the onboarding of data because, after we've tried a few different approaches, new requirements fit into a standard approach. It gives us the ability to scale through code reuse, which also ties in with the metadata aspect of things. A lot of our intermediate data-processing stages are configured purely in metadata, so no custom coding is required to implement a transformation; it is really just writing a few lines of metadata to drive the process, and that gives us quite a big efficiency gain.
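As a minimal sketch of that metadata-driven idea, in illustrative Python: the flow definitions, table names, and SQL template below are hypothetical, but they show how a few lines of metadata plus Pentaho-style ${VAR} substitution can drive a generic process and double as lineage documentation.

    # Each data flow is a row of metadata, not hand-written code.
    FLOWS = [
        {"source": "stg_orders",  "target": "core_orders", "load_date": "2024-04-01"},
        {"source": "core_orders", "target": "mart_orders", "load_date": "2024-04-01"},
    ]

    SQL_TEMPLATE = ("INSERT INTO ${TARGET} SELECT * FROM ${SOURCE} "
                    "WHERE load_date = '${LOAD_DATE}'")

    def render(template: str, flow: dict) -> str:
        """Variable substitution, in the spirit of Pentaho's ${VAR} syntax."""
        return (template
                .replace("${SOURCE}", flow["source"])
                .replace("${TARGET}", flow["target"])
                .replace("${LOAD_DATE}", flow["load_date"]))

    for flow in FLOWS:
        print(render(SQL_TEMPLATE, flow))  # a real runner would execute this SQL
        print(f"lineage: {flow['target']} <- {flow['source']}")  # metadata doubles as lineage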

It has certainly reduced our ETL development time. I've worked at other places where a similar-sized team managed a system with far fewer integrations. We've managed to scale Pentaho not just for the number of things we do but also for the variety of things we do.

We do the obvious direct database connections, but there is a whole raft of different types of integrations that we've developed over time. We have REST APIs, and we download data from Excel files that are hosted in SharePoint. We collect data from S3 buckets in Amazon, and we collect data from Google Analytics and other Google services. We've not come across anything that we've not been able to do with Pentaho. It has proved to be a very flexible way of getting data from anywhere.
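As a rough picture of that variety, the sketch below pulls from a REST API and an S3 bucket with standard Python libraries (requests and boto3). The endpoint and bucket names are made up; in Pentaho the equivalent steps are configured rather than coded.

    import boto3     # AWS SDK for Python
    import requests

    def fetch_rest(url: str) -> list:
        """Pull JSON rows from a REST API."""
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def fetch_s3(bucket: str, key: str) -> bytes:
        """Pull a raw object from an S3 bucket."""
        obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
        return obj["Body"].read()

    rows = fetch_rest("https://api.example.com/v1/accounts")          # placeholder URL
    raw = fetch_s3("example-landing-bucket", "exports/accounts.csv")  # placeholder bucket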

Our time savings are probably quite significant. By using some of the components that we've already written, our developers are able to, for instance, put in place a transformation from a staging area to its model data area in an hour or a couple of hours. If they were starting from a blank piece of paper, that would be several days' worth of work.

Dale Bloom - PeerSpot reviewer
Credit Risk Analytics Manager at MarketAxess

When we get a question from our CEO that needs a response, and answering it requires a bit of legwork pulling in data from various market sources, our own in-house repositories, and everything else, Pentaho allows me to arrive at the solution much faster than scripting it in Python or any other coding. I use multiple tools within my toolkit. I'm pretty heavy on Python, but I find that I can do quite a bit of pre-transformation of the data within the PDI Spoon application itself rather than having to do everything through coding in Python.

It has significantly reduced our ETL development time. I can't really quantify the hours, but it's a no-brainer for me. If I have a simple question to answer, I can create any type of job or transformation and easily get the solution within minutes, as opposed to however many hours of coding it would take. My estimate is that I used to spend about 75% of my week coding outside the application, whereas with the application itself I can do things in a fraction of that time; it has reduced my time from 75% to about 5%. In terms of the cost of a full-time employee coding, the savings are roughly the same, from 75% to 5% per week. There is also a broader impact on other colleagues within my team. Their current processes are fairly manual, such as Excel-based ones, so the time savings carry over to them as well.

Ricardo Díaz - PeerSpot reviewer
COO / CTO at a tech services company with 11-50 employees

I work with a lot of data. We have about 50 terabytes of information, and working with Pentaho Data Integration along with other databases is very fast.

Previously, I had three people collecting all the data and integrating all the Excel spreadsheets. To give me a dashboard with the information I need, it took them a day or two. Now, I can do this work in about 15 minutes.

It enables us to create pipelines with minimal manual coding or custom coding efforts, which is one of its best features. Pentaho is one of the few tools with which you can do anything you can imagine. Our business is changing all the time, and it is best for our business if I can use less time to develop new pipelines.

It provides the ability to develop and deploy data pipeline templates once and reuse them. I use them at least once a day. It makes my daily life easier when it comes to data pipelines.

Previously, I have used other tools such as Integration Services from Microsoft, Data Services for SAP, and Informatica. Pentaho reduces the ETL implementation time by 5% to 50%.

VK
Solution Integration Consultant II at a tech vendor with 201-500 employees

We have been able to reduce the effort required to build sophisticated ETLs. We are also now migrating from an on-prem product to a cloud-native application.

We use Lumada's ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues with deployment of code, or with code working in one environment but not in another. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines.

TJ
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees

We've had it for a long time. So, we've realized a lot of the improvements that anybody would realize from almost any data integration product.

The speed of developing solutions has been the best improvement. It has reduced development time and improved the speed of getting solutions deployed. The reduction in ETL development time varies with the size and complexity of the project, but we probably spend days or weeks less than we would with a different tool.

It is tremendously flexible in terms of adding custom code in a variety of different languages if you want to, but we have had relatively few scenarios where we needed that. We do very little custom coding; with this tool, it is not critical. We have developed thousands of transformations and jobs in it.

Anton Abrarov - PeerSpot reviewer
Project Leader at a mining and metals company with 10,001+ employees

Our data flow processes became faster with this solution.

RV
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees

I can build processes faster with it than by writing SQL or code. I am also able to do some background control of the data process with this tool, so I use it as an ELT tool: I have a staging area where I can land all the information from my production databases, and then I can work with the data that I have created.
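Sketched below is that ELT pattern, using SQLite as a stand-in database and hypothetical table names: the data lands in a staging area first, and the transformation then runs inside the database.

    import sqlite3  # stand-in for the production database

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # 1. Load: copy production data into the staging area untransformed.
    cur.execute("CREATE TABLE stg_sales (region TEXT, amount REAL)")
    cur.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                    [("north", 120.0), ("south", 80.0), ("north", 50.0)])

    # 2. Transform inside the database: the "T" happens after the "L".
    cur.execute("CREATE TABLE fact_sales AS "
                "SELECT region, SUM(amount) AS total FROM stg_sales GROUP BY region")

    print(cur.execute("SELECT * FROM fact_sales ORDER BY region").fetchall())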

Right now, I am working in the business intelligence area, but we use BI across all our companies, not just one area. I create different data marts for different business units, e.g., HR, IT, sales, and marketing.

Ridwan Saeful Rohman - PeerSpot reviewer
Data Engineering Associate Manager at Zalora Group

In my current company, it does not have any major impact. We use it for old and simple ETLs only.

For the ETL jobs we have put on it, it's quite useful. However, the functionality we currently run on the solution could easily be replaced by other tools on the market. It's time to switch entirely to Airflow; we'll likely change in the next six months.

RK
Senior Data Analyst at a tech services company with 51-200 employees

Before we used Pentaho, our processes were in Microsoft Excel and the updates from databases had to be done manually. Now all our routines are done automatically and we have more time to do other jobs. It saves us four or five hours daily.

In terms of ETL development time, it depends on the complexity of the job, but if the job is simple it saves two or three hours.

Aqeel UR Rehman - PeerSpot reviewer
BI Analyst at Vroozi

There are a lot of different benefits we receive from using this solution. For example, we can easily accept data from an API and create JSON files. The integration is also very good.
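A minimal sketch of such an API-to-JSON flow in Python, with a hypothetical endpoint and output path; in PDI, steps such as REST Client and JSON Output cover the same ground without code.

    import json
    import requests

    # Placeholder endpoint; the reviewer's actual API is not specified.
    resp = requests.get("https://api.example.com/v1/orders", timeout=30)
    resp.raise_for_status()

    # Write the accepted API data out as a JSON file.
    with open("orders.json", "w") as f:
        json.dump(resp.json(), f, indent=2)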

I have created many data pipelines and after they are created, they can be reused on different levels.

Renan Guedert - PeerSpot reviewer
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees

Lumada provides us with a single, end-to-end data management experience, from ingestion to insights. This single data management experience is pretty good because you don't have every analyst doing their own thing. When you have one tool for everything, you can keep improving it while maintaining good practices and a solid process for projects.

José Orlando Maia - PeerSpot reviewer
Data Engineer at a tech services company with 201-500 employees

We needed to gather data from many servers at my company. We had probably 10 or 12 equivalent databases spread around the world, e.g., Brazil, Paraguay, and Chile, with an instance in each country, all based on Microsoft SQL Server. We use Lumada to get the data from these international databases. We can parallelize the extraction from the various servers because we have the same structure, schemas, and tables on each of these SQL Server instances. This provides good value for us, as extracting data from all servers at the same time, in parallel, accelerates our extraction.

In one integration process, I can retrieve data from 10 or 12 servers at the same time in one transformation. In the past, using SQL Server or other manual tools, we needed 10 or 12 different processes, one per server. Using Lumada in parallel accelerates our extraction, and the tools Lumada provides enable us to transform the data during this process, integrating it into our data warehouse with good performance.
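Hand-coded, that fan-out might look roughly like the Python sketch below, with placeholder server names and a placeholder query; because every instance shares the same schema, one query runs against all servers concurrently and the results are unioned.

    from concurrent.futures import ThreadPoolExecutor

    import pyodbc  # ODBC driver for SQL Server

    SERVERS = ["sql-brazil", "sql-paraguay", "sql-chile"]        # placeholders
    QUERY = "SELECT country, order_id, amount FROM dbo.orders"   # same schema everywhere

    def extract(server: str) -> list:
        conn = pyodbc.connect(
            f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};"
            "DATABASE=erp;Trusted_Connection=yes;"
        )
        try:
            return conn.cursor().execute(QUERY).fetchall()
        finally:
            conn.close()

    # One logical extraction, all servers in flight at once.
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        all_rows = [row for rows in pool.map(extract, SERVERS) for row in rows]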

Because Lumada runs on the Java virtual machine, we can deploy and operate on whatever operating system we want; we have deployed both the Linux and the Windows versions of Lumada.

It is simple to deploy my ETLs because Lumada includes Pentaho Server. I use the desktop version to develop transformations and deploy them to the repository. We install Lumada on a server and then have a web interface to schedule and reschedule our ETLs: we can set the hour we want our ETL processes and transformations to run and how many times to process the data. All our transformations are saved in a repository on Pentaho Server, so we can keep many versions of a transformation, such as 1.0, 1.1, and 1.2. I can save four or five versions of a transformation and ask Lumada to run only the latest one I saved.
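Alongside the web scheduler, saved transformations can also be launched from the command line with PDI's stock pan.sh tool, for example from cron. A sketch with placeholder paths and parameter names; the flags follow the documented Linux convention, but verify them against your PDI version.

    import subprocess

    # Run a transformation with PDI's command-line launcher (Linux syntax).
    subprocess.run(
        [
            "/opt/pentaho/data-integration/pan.sh",       # placeholder install path
            "-file=/etl/transformations/load_sales.ktr",  # placeholder transformation
            "-param:LOAD_DATE=2024-04-01",                # named parameter
            "-level=Basic",                               # log verbosity
        ],
        check=True,  # raise if the transformation exits with an error
    )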

Lumada offers a web interface to follow these transformations. We can check the logs to see whether a transformation completed successfully or ran into a network or database issue, and we can get logs at execution time. We can also be notified by email when transformations succeed or fail. We have a log file for each process that we schedule on Pentaho Server.

The area where Lumada has helped us most is the commercial area. There are many extractions that compose reports about our sales team's performance and production steps. Since we use Lumada to gather data from each operation in each country, we can get data from Argentina, Chile, Brazil, and Colombia at the same time and then consolidate it in one place, our data warehouse. This improves our production performance and meets our need for information about the industry, production data, and commercial data.

RE
Data Architect at a consumer goods company with 1,001-5,000 employees

People are now able to get access to the data when they need it. That is what is most important. All the reports go out on time.

The solution enables us to use one tool that gives a single, end-to-end data management experience from ingestion to insights. From the reporting point of view, we are able to make our customers happy. Are they able to get their reports in time? Are they able to get access to the data that they need on time? Yes. They're happy, we're happy, that's it.

With the automation of everything, if I break it into numbers, we don't have to hire three or four people to do one simple task. We've been able to develop some generic IT processes so that we don't have to reinvent the wheel; I just extend the existing pipeline and customize it to whatever requirements I have at that point in time. Previously, whenever we got a project, we had to start from scratch. Now, the generic pipeline templates that we can reuse save us so much time and money.

It has also reduced our ETL development time by 40 percent, and that translates into cost savings.

Before we used Pentaho, we used to do some of this stuff manually, and some of the ETL jobs would run for hours, but most of the ETL jobs, like the monthly reports, now run within 45 minutes, which is pretty awesome. Everything that we used to do manually is now orchestrated.

And now, with everything in the cloud, any concerns about hardware are taken care of for us. That helps with maintenance costs.

NA
Systems Analyst at a university with 5,001-10,000 employees

Lumada Data Integration definitely helps with decision-making for our deans and upper executives. They are the ones who use the product the most to make their decisions. The data warehouse is the only source of information that's available for them to use, and to create that data warehouse we had to use this product.

And it has absolutely reduced our ETL development time. The fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs. It also makes it a pretty quick process for our developers to learn and pick up ETLs from each other. It's definitely easy for us to transition ETLs from one developer to another. The ETL functionality satisfies 95 percent of all our needs. 

KM
Data Architect at a tech services company with 1,001-5,000 employees

As a result of one of the projects that we did in the Middle East, we achieved the main goal of fully digitalizing their population census. They did previous censuses doing door-to-door surveys, but for the last census, using Pentaho Data Integration, we managed to get it all running in a fully digital way, with nothing on paper forms. No one had to go door-to-door and survey the people.

ES
System Engineer at a tech services company with 11-50 employees

Before, a lot of manual work had to be done, work that isn't done anymore. We have also given additional reports to the end-users and, based upon them, they have to take some action. Based on the feedback of the users, some of the data cleaning tasks that were done manually have been automated. It has also given us a fast response to new data that is introduced into the organization.

Using the solution we were able to reduce our ETL deployment time by between 10 and 20 percent. And when it comes to personnel costs, we have gained 10 percent.

SK
Lead, Data and BI Architect at a financial services firm with 201-500 employees

I love the fact that we haven't come up with a problem yet that we haven't been able to address with this tool. I really appreciate its maturity and the breadth of its capabilities.

If we did not have this tool, we would probably have to use a whole different variety of tools, then our environment would be a lot more complicated.

We develop metadata pipelines and use them.

Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us. 

DG
Director of Software Engineering at a healthcare company with 10,001+ employees

This was an OEM solution for our product. It has improved our product by giving our users the ability to run ad hoc reports, which is very important to them. Our product does predictive analysis on trends coming in for contracts, and it helps users decide which way to go based on that analysis. Pentaho is not doing the predictions; it reports on the predictions that our product makes. This is a big part of our product.

VM
Technical Manager at a computer software company with 51-200 employees

As we are a software company, we use the tools provided with Pentaho Data Integration across our various teams.

TG
Analytics Team Leader at HealtheLink

The solution has allowed us to automate reporting by automating its scheduling. 

It is also important to us that the solution enables you to leverage metadata to automate data pipeline templates and reuse them. It allows us to generate reports with fewer resources.

If we didn't have this solution, we wouldn't be able to manage our workload or generate the volume of reporting that we currently do. It's very important for us that it provides a single, end-to-end data management experience from ingestion to insights. We are a high-volume department and without those features, we wouldn't be able to manage the current workload.

it_user373128 - PeerSpot reviewer
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees

The organization went with the Pentaho ETL and reporting solutions as cost-effective products compared to competitors. The ETL part certainly met those objectives, along with serving its purpose.

it_user414117 - PeerSpot reviewer
Senior Data Engineer at a tech company with 501-1,000 employees

Now developers focus on improving it as a tool (since it's open source) and teach project managers about it. The project managers are responsible for their own ETL jobs, as they know what they want, so it's best for them to manage their own jobs.

it_user382572 - PeerSpot reviewer
Pentaho Consultant at a comms service provider with 10,001+ employees

It is also possible to build a new solution quite quickly, so the customer sees results fast.

it_user376926 - PeerSpot reviewer
Data Developer at a tech services company with 10,001+ employees

We have developed some complex ETL processes for some clients and they are very satisfied with the results.

it_user396720 - PeerSpot reviewer
Graduate Teaching Assistant with 1,001-5,000 employees

It makes it possible for the seniors to train new employees and junior staff very quickly. All that is needed is strong knowledge of ETL and BI/Big Data concepts to use this software.

it_user391695 - PeerSpot reviewer
Business Intelligence Consultant at Sanmargar Team

We use it almost everywhere: for creating data marts and data warehouses, and for implementing BI reporting tools. We also build our Customer Centralized File and Data Quality Studio using it. What's more, we use it for small solutions too, e.g., quickly exporting data from a database to .xlsx. We also develop our own plugins for PDI and publish them in the marketplace.

it_user426030 - PeerSpot reviewer
Global Consultant - Big Data, BI, Analytics, DWH & MDM at a tech consulting company with 1,001-5,000 employees

We implement Pentaho data warehouses and BI features for our various customers. No other software gives us functionality as complete for fulfilling end-user requirements as Pentaho. In addition, Pentaho offers a flexible platform that enables us to extend the tool to any end-user requirement.

Another impressive feature is that Big Data implementation/integration is very quick and simple, without the need to write any code. This has enabled our clients to get maximum ROI within a short period.

Ricardo Díaz - PeerSpot reviewer
COO at a tech services company with 11-50 employees
We integrate all data sources into one OLTP or OLAP database.
it_user384984 - PeerSpot reviewer
Sr BI Administrator at a healthcare company with 1,001-5,000 employees

It gave us out-of-the-box widgets for reading XML and JSON interfaces, which would otherwise have had to be built from scratch.
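For contrast, this is roughly the from-scratch parsing such widgets replace, using only Python's standard library; the document shapes are hypothetical.

    import json
    import xml.etree.ElementTree as ET

    # Hand-rolled JSON interface parsing...
    payload = json.loads('{"orders": [{"id": 1, "total": 9.5}]}')
    orders = payload["orders"]

    # ...and hand-rolled XML interface parsing.
    doc = ET.fromstring("<orders><order id='1'><total>9.5</total></order></orders>")
    totals = [(o.get("id"), o.findtext("total")) for o in doc.findall("order")]

    print(orders, totals)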

it_user172275 - PeerSpot reviewer
Consultant at a comms service provider with 11-50 employees

We use Pentaho for data integration, but we also use it to implement data mining, which has improved the intelligence behind the data. We are able to provide our customer with the ability to understand their data. Our customer produces terabytes of data, so arranging and cleaning the data during data integration helped them understand their data and improve their business.

it_user254223 - PeerSpot reviewer
Project Manager - Business Intelligence at www.datademy.es

We developed ETL processes to load a data warehouse, which has improved our data integration capabilities.

it_user384993 - PeerSpot reviewer
Datawarehouse Administrator at a tech services company with 501-1,000 employees

We have been able to expose data services through CDA, relying on the same database as the reporting tools, thus avoiding inconsistencies between the data shown by reports and the data acquired by external systems.
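Consuming such a CDA data service might look like the sketch below. The server, path, and dataAccessId are placeholders; the endpoint shape follows CDA's documented doQuery API, but verify it against your installation.

    import requests

    # CDA exposes a saved query as an HTTP endpoint returning JSON.
    resp = requests.get(
        "http://bi.example.com/pentaho/plugin/cda/api/doQuery",
        params={
            "path": "/public/reports/sales.cda",  # placeholder .cda file
            "dataAccessId": "1",                  # query id inside the .cda file
            "outputType": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    rows = resp.json()["resultset"]  # CDA returns metadata plus a resultset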

it_user415695 - PeerSpot reviewer
Project Lead at a tech services company with 10,001+ employees

We have a huge amount of data that needs to be cleaned and made more valuable for our organization. This Data Integration helps us to achieve that goal.

it_user426117 - PeerSpot reviewer
DWH Specialist at a healthcare company with 1,001-5,000 employees

It enables us to automate our reporting and ETL to a very high extent.

it_user8199 - PeerSpot reviewer
BI developer - (Jaspersoft/Pentaho/Pentaho C-Tools/Kettle/Talend/Data warehouse) at a tech services company with 501-1,000 employees
  • It has reduced our costs.
  • With self-service, we save time.
  • There is an open community of plugin contributors.
it_user392367 - PeerSpot reviewer
Research Assistant at a university with 1,001-5,000 employees

I am a researcher in the field of data integration, and I use this tool as a sandbox. Because it is open source, and because forums and support are widely available, it has made my work really easy. The reporting and analysis functionality also gives me more freedom to test my cases and results.

it_user375219 - PeerSpot reviewer
Consultant at a tech vendor with 501-1,000 employees
  • It's an open-source tool, so you don't need to worry about licensing costs.
  • We've deployed it with very minimal hardware.
  • We migrated one of our key projects from Microsoft BI to Pentaho Data Integration, which saved a lot of money and also brought a significant improvement in performance.
it_user369171 - PeerSpot reviewer
Brazil IT Coordinator at a transportation company with 1,001-5,000 employees

Integration between databases and data import for a BI solution.

it_user386202 - PeerSpot reviewer
Business Intelligence Supervisor at a manufacturing company with 501-1,000 employees

We had never used a data integration or BI platform before, and we struggled with lots of Excel spreadsheets and CSV files. So when we first used Pentaho to automate a data-integration flow, we were stunned by how fast and easy it was. We are very productive today thanks to that piece of software integrating our data and the platform serving the processed data to our users.
