Pentaho Data Integration and Analytics Scalability

DP
Enterprise Data Architect at a manufacturing company with 201-500 employees

Scaling out our processes hasn't been a big deal. We're a relatively small shop with only a couple of production databases. We're more of a regional enterprise, and I haven't had any issues with performance yet. It's always been some other product or solution that has gotten in the way. Lumada can handle anything we throw at it. Every night I run reports on our part ledger. That includes 200 million records, and Lumada can chew through it in about an hour and a half. 
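For a rough sense of scale, the nightly figure above can be turned into a throughput number. The sketch below assumes exactly 200 million records and exactly 90 minutes, although the review only says "about"; the variable names are illustrative.

```python
# Back-of-the-envelope throughput for the nightly part-ledger run.
# Assumptions: exactly 200 million rows and exactly 90 minutes
# (the review gives both figures only approximately).
records = 200_000_000
duration_seconds = 90 * 60  # 1.5 hours expressed in seconds

rows_per_second = records / duration_seconds
print(f"{rows_per_second:,.0f} rows/s")  # prints "37,037 rows/s"
```

Under those assumptions the job sustains roughly 37,000 rows per second end to end, database wait time included.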

I know we can extend processing into the Spark realm if we need to. We've thought about that but never really needed it. It's something we keep in our back pocket. Someone suggested trying it out, but it never really got off the ground because other more pressing needs came up. From what I've seen, it'll scale out to whatever I need it to do. Any limitations are in the backend rather than the software. I've done some metrics on it. It's the database that I have to wait on more than the software. It's not doing a whole lot CPU-wise. My limitations are elsewhere, usually.

Right now, we have about 100 users working with Lumada. About 100 people log in to the system, but probably 200 people get reports from it. Only about 50 use the analysis tools, including the top sales managers and all of the buying group. There are also some analysts from various groups who use it constantly. 

Jacopo Zaccariotto - PeerSpot reviewer
Head of Data Engineering at InfoCert

I rate Pentaho six out of 10 for scalability. The scalability depends on how you deploy it. In our case, the on-premise virtual machine is relatively small and doesn't have a lot of resources. That is why Pentaho does not handle big datasets well in our case. 

I'm also unsure if we can deploy Pentaho in the cloud. So when you're not dealing with the cloud, scalability is always limited. We cannot indefinitely pump resources into a virtual machine.

Currently, we have five or six active workflows running each night. Some of them are ingesting data from ADU. Others take data from AWS Redshift or on-premise Oracle. In terms of people, three other people on the data engineering team and I are actively using Pentaho.

Ryan Ferdon - PeerSpot reviewer
Senior Data Engineer at Burgiss

I think it could scale, but only up to a point. I didn't test it on larger datasets. People I've talked to who have worked on larger datasets wouldn't recommend using it, but that is hearsay.

In my former company, there were about five people in the data engineering department who were using the solution in their roles as ETL data integration specialists.

In that company, it's their go-to solution and I think it will work for everything that they need. When I was there, I tried opening pathways to different things, but there were so many feeds already on it, it worked for what they needed, and it's low-code and open source, so I think they'll stick with it. As they gain more clients, they'll increase their usage of it.

Buyer's Guide
Pentaho Data Integration and Analytics
March 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: March 2024.
767,847 professionals have used our research since 2012.
PR
Senior Engineer at a comms service provider with 501-1,000 employees

It meets our purposes. It does have horizontal scaling capability, but it is not something that we needed to use. We have lots of small-sized and medium-sized data sets. We don't have to deal with super large data sets. Where we do have some requirements for that, it works quite well. We can push some of that processing down onto our cloud provider. We've dealt with some of such issues by using S3, Athena, and Redshift. You can almost offload some of the big data processing to those platforms.

Dale Bloom - PeerSpot reviewer
Credit Risk Analytics Manager at MarketAxess

It seems highly scalable. I've used the product in other firms, and we've managed to work pretty coherently pushing our changes for code, revisions, and everything else to Git and things like that.

In terms of users, currently, in my firm, I'm the only user, but the intention is to push it globally for all of our users to be able to use it. 

We would like to be able to support other teams and other departments within the organization. Currently, this is being used only for our credit risk team, but in general, within risk, we have many departments such as operational risk, enterprise risk, market risk, and credit risk. I'm bridging all of them right now. Other teams have also expressed an interest, so it will eventually include our settlements team and potentially even our research team and FP&A.

VK
Solution Integration Consultant II at a tech vendor with 201-500 employees

Lumada is flexible to deploy in any environment, whether on-premises or in the cloud, which is very important. When we are processing data in batches on certain days, e.g., at the end of the week or month, we might have more data and need more processing power or RAM. However, most of the time, there is very minimal usage of that CPU power. The solution has helped us dynamically scale up when we have more data to process, then scale down afterward.

The scalability is another key advantage of this product versus some of the others in the market since we can tweak and modify a number of parameters. We are really impressed with the scalability.

We have close to 80 people who are using this product actively. Their roles go all the way from junior developers to support engineers. We also have people who have very little coding knowledge and are more into the management side of things utilizing this tool.

TJ
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees

Its scalability is very good. We've been running it for a long time, and we've got dozens, if not hundreds, of jobs running a day.

We probably have 200 or 300 people using it across all areas of the business. We have people in production control, finance, and what we call materials management. We have people in manufacturing, procurement, and of course, IT. It is very widely and extensively used. We're increasing its usage all the time.

Anton Abrarov - PeerSpot reviewer
Project Leader at a mining and metals company with 10,001+ employees

We didn't have to scale too much. So, I can't evaluate it properly in terms of scalability.

In terms of its users, only our team was using it. There were approximately 20 users. It was not for the whole company.

RV
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees

It is scalable. 

AG
Assistant General Manager at DTDC Express Limited

In terms of data loading and processes, the scalability is good.

We have a team of four people who are using it for analytics.

Ridwan Saeful Rohman - PeerSpot reviewer
Data Engineering Associate Manager at Zalora Group

I'm not sure that the product could keep up with the data growth. It can be useful for millions of data points. However, I haven't explored the option of billions of data points. I think there are better solutions on the market. The same applies to other drag-and-drop ETL tools, such as SQL Server Integration Services, Informatica, etc.

RK
Senior Data Analyst at a tech services company with 51-200 employees

It's scalable.

Aqeel UR Rehman - PeerSpot reviewer
BI Analyst at Vroozi

This is a good product for an enterprise-level company.

We use this solution for all of our data integration jobs. It handles the transformation. As our business grows and the demand for data integration increases, our usage of this tool will also increase.

Between versions, they have added a lot of plugins.

Michel Philippenko - PeerSpot reviewer
Project Manager at a computer software company with 51-200 employees

I didn't scale the solution. I had to migrate from an old Pentaho to a new Pentaho. I had quite a big set of data, but I didn't add new data. I worked with the same volume of data all the time so I didn't test the scaling.

In the company I consulted for, there were about 15 people who input the data and worked with the technical part of Pentaho. There were a lot of end-users, who were the people interested in the reports; on the order of several thousand end-users. 

RE
Data Architect at a consumer goods company with 1,001-5,000 employees

We are able to scale our environment. For example, if I had that many workloads, I could scale the tool to run on three instances, and all the workloads would be distributed equally.

NA
Systems Analyst at a university with 5,001-10,000 employees

The scalability is great too. We've been able to expand the current system and add a lot of customizations to it.

For maintenance, surprisingly, it's just me who does so in our organization.

KM
Data Architect at a tech services company with 1,001-5,000 employees

If you work with relatively small data sets, it's all okay. But if you are going to use really huge data sets, then you might get into a bit of trouble, at least from what I have seen.

ES
System Engineer at a tech services company with 11-50 employees

At the scale we are using it, the solution is sufficient. The scalability is good, but we don't have that big of a data set. We have a couple of billion data records involved in the integration. 

We have it in one location across different departments with an outside disaster recovery location. It's on a cluster of VMs and running on Linux. The backend data store is PostgreSQL.

Maybe our design wasn't quite optimal for reloading the billions of records every night, but that's probably not due to the product but to the migration. The migration should have been done in a bit of a different way.

SK
Lead, Data and BI Architect at a financial services firm with 201-500 employees

We don't have a huge amount of data, so I can't really answer how we could scale up to very large solutions.

DG
Director of Software Engineering at a healthcare company with 10,001+ employees

The only complaint that I have with Pentaho has been with scaling. As our data grew, we tested it with millions of records. When we started to implement it, we had clients that went from 80 million to 100 million. I think scale did present a problem with the clients. I know that Pentaho talks about being able to manage big data, which is much more data than what we have. I don't know if it was our architecture versus the product limitations, but we did have issues with scaling.

Our product doesn't deal with big data at large. There are probably 17 million records. With those 17 million records, it performs well when the data has been internally cached within Pentaho. However, if you are loading the dataset or querying it for the first time, then it does take a while. Once it has been cached in Pentaho, the subsequent queries are reasonably fast.
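The cold-versus-warm behavior described here is typical of any query cache. The sketch below is not Pentaho's caching layer; it is a minimal illustration using Python's standard `functools.lru_cache`, with a hypothetical `run_query` function and a made-up SQL string standing in for the real workload.

```python
from functools import lru_cache

# Counts how often the (simulated) backend is actually hit.
calls = {"backend": 0}

@lru_cache(maxsize=128)
def run_query(sql):
    # Hypothetical stand-in for an expensive first-time query against the database.
    calls["backend"] += 1
    return f"result of {sql}"

query = "SELECT region, SUM(amount) FROM sales GROUP BY region"  # illustrative only
run_query(query)  # cold: executes the function body
run_query(query)  # warm: identical query text is served from the cache
print(calls["backend"])  # prints 1
```

Only the first call reaches the backend; the repeat is answered from memory, which matches the pattern of a slow first load followed by fast subsequent queries.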

TG
Analytics Team Leader at HealtheLink

Its scalability is very good. We use it with multiple, large databases. We've added to it over time and it scales.

We have about 10 users of the solution including a data quality manager, clinical analyst, healthcare informatics analysts, senior healthcare informatics analyst, and an analytics team leader. It's used very extensively by all of those job roles in their day-to-day work. When we add additional staff members, they routinely get access to and are trained on the solution.

it_user164838 - PeerSpot reviewer
CEO with 51-200 employees

Actually, Pentaho Kettle comes equipped with the option to scale out, out of the box.
And no, we didn't encounter specific scalability problems.

it_user373128 - PeerSpot reviewer
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees

We had no issues scaling it across the company as needed.

it_user414117 - PeerSpot reviewer
Senior Data Engineer at a tech company with 501-1,000 employees

We have had no issues scaling it for our needs.

it_user382572 - PeerSpot reviewer
Pentaho Consultant at a comms service provider with 10,001+ employees

There have been no issues scaling it.

it_user376926 - PeerSpot reviewer
Data Developer at a tech services company with 10,001+ employees

There have been no issues scaling it.

OM
IT-Services Manager & Solution Architect at Stratis

According to the documentation, it's quite scalable. That said, I haven't tried to expand it. We just use a single server and that's all we need right now. We don't have plans to increase usage.

We have three people who use the solution currently.

it_user402600 - PeerSpot reviewer
Senior Consultant at a financial services firm with 10,001+ employees

The robustness of this solution in a production cluster (>30 nodes) remains to be seen.

it_user396720 - PeerSpot reviewer
Graduate Teaching Assistant with 1,001-5,000 employees

We have had no issues scaling it.

VD
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees

I am the only person using the solution currently. There are two other people that occasionally also assist in it. I'm helping them understand the tool and they are beginning to use it. In that sense, we're slowly scaling.

I don't know if the solution scales well on a large scale, however.

It scales very well overall, helped by the very useful option to set the number of copies to start for every step, although this has to be balanced against the side effect of consuming a lot of memory and CPU resources.
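The trade-off mentioned here can be illustrated outside of Pentaho. The following is a minimal sketch, not PDI's implementation: it assumes rows are dealt round-robin to n copies of a step, each running on its own thread. The names `transform` and `run_step_copies` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(row):
    # Hypothetical per-row step logic; stands in for whatever the step does.
    return row * 2

def process_bucket(bucket):
    # One copy of the step working through its own share of the rows.
    return [transform(row) for row in bucket]

def run_step_copies(rows, copies):
    # Deal rows round-robin into one bucket per copy (an assumed distribution scheme).
    buckets = [rows[i::copies] for i in range(copies)]
    # Every copy gets its own thread, so more copies means more threads
    # and more rows held in memory at the same time.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(process_bucket, buckets))

outputs = run_step_copies(list(range(10)), copies=3)
print(outputs)  # prints [[0, 6, 12, 18], [2, 8, 14], [4, 10, 16]]
```

Raising `copies` spreads the same rows over more workers, which is where the extra memory and CPU consumption comes from.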

it_user391695 - PeerSpot reviewer
Business Intelligence Consultant at Sanmargar Team

With the Pentaho Community version, you just download it, unpack it, and it should be running. If not, you should also install Java.

it_user426030 - PeerSpot reviewer
Global Consultant - Big Data, BI, Analytics, DWH & MDM at a tech consulting company with 1,001-5,000 employees

We have not had any issues scaling it for our customers.

Ricardo Díaz - PeerSpot reviewer
COO at a tech services company with 11-50 employees
it_user384984 - PeerSpot reviewer
Sr BI Administrator at a healthcare company with 1,001-5,000 employees

There have been no issues with scalability.

it_user254223 - PeerSpot reviewer
Project Manager - Business Intelligence at www.datademy.es

I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse.

it_user384993 - PeerSpot reviewer
Datawarehouse Administrator at a tech services company with 501-1,000 employees

There were issues, but they were solved with help from tech support.

it_user415695 - PeerSpot reviewer
Project Lead at a tech services company with 10,001+ employees

We haven't had any issues with scalability.

it_user426117 - PeerSpot reviewer
DWH Specialist at a healthcare company with 1,001-5,000 employees

We had no issues scaling it for our needs.

it_user375219 - PeerSpot reviewer
Consultant at a tech vendor with 501-1,000 employees

We've had no issues with scalability.

it_user369171 - PeerSpot reviewer
Brazil IT Coordinator at a transportation company with 1,001-5,000 employees

There have been no issues so far in scaling the product.

it_user386202 - PeerSpot reviewer
Business Intelligence Supervisor at a manufacturing company with 501-1,000 employees

There have been no issues with the scalability.
