Pentaho Data Integration and Analytics Scalability
DP
Dan Peacock
Enterprise Data Architect at a manufacturing company with 201-500 employees
Scaling out our processes hasn't been a big deal. We're a relatively small shop with only a couple of production databases. We're more of a regional enterprise, and I haven't had any issues with performance yet. It's always been some other product or solution that has gotten in the way. Lumada can handle anything we throw at it. Every night I run reports on our part ledger. That includes 200 million records, and Lumada can chew through it in about an hour and a half.
I know we can extend processing into the Spark realm if we need to. We've thought about that but never really needed it. It's something we keep in our back pocket. Someone suggested trying it out, but it never really got off the ground because other more pressing needs came up. From what I've seen, it'll scale out to whatever I need it to do. Any limitations are in the backend rather than the software. I've done some metrics on it. It's the database that I have to wait on more than the software. It's not doing a whole lot CPU-wise. My limitations are elsewhere, usually.
Right now, we have about 100 users working with Lumada. About 100 people log in to the system, but probably 200 people get reports from it. Only about 50 use the analysis tools, including the top sales managers and all of the buying group. There are also some analysts from various groups who use it constantly.
View full review »
I rate Pentaho six out of 10 for scalability. The scalability depends on how you deploy it. In our case, the on-premise virtual machine is relatively small and doesn't have a lot of resources. That is why Pentaho does not handle big datasets well in our case.
I'm also unsure if we can deploy Pentaho in the cloud. So when you're not dealing with the cloud, scalability is always limited. We cannot indefinitely pump resources into a virtual machine.
Currently, we have five or six active workflows running each night. Some of them are ingesting data from ADU. Others take data from AWS Redshift or on-premise Oracle. In terms of people, three other people on the data engineering team and I are actively using Pentaho.
I think it could scale, but only up to a point. I didn't test it on larger datasets. People I've talked to who have worked on larger datasets wouldn't recommend using it, but that is hearsay.
In my former company, there were about five people in the data engineering department who were using the solution in their roles as ETL data integration specialists.
In that company, it's their go-to solution and I think it will work for everything that they need. When I was there, I tried opening pathways to different things, but there were so many feeds already on it, and it works for what they need, and it's low-code and open source, so I think they'll stick with it. As they gain more clients they'll increase their usage of it.
View full review »
Buyer's Guide
Pentaho Data Integration and Analytics
March 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: March 2024.
767,847 professionals have used our research since 2012.
PR
PhilipRobinson
Senior Engineer at a comms service provider with 501-1,000 employees
It meets our purposes. It does have horizontal scaling capability, but it is not something that we needed to use. We have lots of small-sized and medium-sized data sets. We don't have to deal with super large data sets. Where we do have some requirements for that, it works quite well. We can push some of that processing down onto our cloud provider. We've dealt with some of those issues by using S3, Athena, and Redshift. You can almost offload some of the big data processing to those platforms.
View full review »
It seems highly scalable. I've used the product in other firms, and we've managed to work pretty coherently pushing our changes for code, revisions, and everything else to Git and things like that.
In terms of users, currently, in my firm, I'm the only user, but the intention is to push it globally for all of our users to be able to use it.
We would like to be able to support other teams and other departments within the organization. Currently, this is being used only for our credit risk team, but in general, within risk, we have many departments such as operational risk, enterprise risk, market risk, and credit risk. I'm bridging all of them right now. However, with other teams that have expressed an interest, it also will include our settlements team and potentially even our research team and FP&A.
View full review »
VK
reviewer995501455
Solution Integration Consultant II at a tech vendor with 201-500 employees
Lumada is flexible to deploy in any environment, whether on-premises or the cloud, which is very important. When we are processing data in batches on certain days, e.g., at the end of the week or month, we might have more data and need more processing power or RAM. However, most times, there might be very minimal usage of that CPU power. In that way, the solution has helped us to dynamically scale up, then scale down when we see that we have more data that we need to process.
The scalability is another key advantage of this product versus some of the others in the market since we can tweak and modify a number of parameters. We are really impressed with the scalability.
We have close to 80 people who are using this product actively. Their roles go all the way from junior developers to support engineers. We also have people who have very little coding knowledge and are more into the management side of things utilizing this tool.
View full review »
TJ
Tobias Johnson
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees
Its scalability is very good. We've been running it for a long time, and we've got dozens, if not hundreds, of jobs running a day.
We probably have 200 or 300 people using it across all areas of the business. We have people in production control, finance, and what we call materials management. We have people in manufacturing, procurement, and of course, IT. It is very widely and extensively used. We're increasing its usage all the time.
View full review »
We didn't have to scale too much. So, I can't evaluate it properly in terms of scalability.
In terms of its users, only our team was using it. There were approximately 20 users. It was not for the whole company.
View full review »
RV
Rodrigo Vazquez
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees
It is scalable.
View full review »
AG
ABDULGAFFAR
Assistant General Manager at DTDC Express Limited
In terms of data loading and processes, the scalability is good.
We have a team of four people who are using it for analytics.
View full review »
I'm not sure that the product could keep up with the data growth. It can be useful for millions of data points. However, I haven't explored the option of billions of data points. I think there are better solutions on the market. The same applies to other drag-and-drop ETL tools, such as SQL Server Integration Services and Informatica.
View full review »
RK
reviewer1872000
Senior Data Analyst at a tech services company with 51-200 employees
It's scalable.
View full review »
This is a good product for an enterprise-level company.
We use this solution for all of our data integration jobs. It handles the transformation. As our business grows and the demand for data integration increases, our usage of this tool will also increase.
Between versions, they have added a lot of plugins.
View full review »
I didn't scale the solution. I had to migrate from an old Pentaho to a new Pentaho. I had quite a big set of data, but I didn't add new data. I worked with the same volume of data all the time so I didn't test the scaling.
In the company I consulted for, there were about 15 people who input the data and worked with the technical part of Pentaho. There were a lot of end-users, who were the people interested in the reports; on the order of several thousand end-users.
RE
reviewer1855218
Data Architect at a consumer goods company with 1,001-5,000 employees
We are able to scale our environment. For example, if I had that many workloads, I could scale the tool to run on three instances, and all the workloads would be distributed equally.
View full review »
NA
reviewer1751571
Systems Analyst at a university with 5,001-10,000 employees
The scalability is great too. We've been able to expand the current system and add a lot of customizations to it.
For maintenance, surprisingly, it's just me who does so in our organization.
View full review »
KM
Krisjanis Muskars
Data Architect at a tech services company with 1,001-5,000 employees
If you work with relatively small data sets, it's all okay. But if you are going to use really huge data sets, then you might get into a bit of trouble, at least from what I have seen.
ES
Eric Smets
System Engineer at a tech services company with 11-50 employees
At the scale we are using it, the solution is sufficient. The scalability is good, but we don't have that big of a data set. We have a couple of billion data records involved in the integration.
We have it in one location across different departments with an outside disaster recovery location. It's on a cluster of VMs and running on Linux. The backend data store is PostgreSQL.
Maybe our design wasn't quite optimal for reloading the billions of records every night, but that's probably not due to the product but to the migration. The migration should have been done in a bit of a different way.
View full review »
SK
Stephen Knox
Lead, Data and BI Architect at a financial services firm with 201-500 employees
We don't have a huge amount of data, so I can't really answer how we could scale up to very large solutions.
View full review »
DG
reviewer1772286
Director of Software Engineering at a healthcare company with 10,001+ employees
The only complaint that I have with Pentaho has been with scaling. As our data grew, we tested it with millions of records. When we started to implement it, we had clients that went from 80 million to 100 million. I think scale did present a problem with the clients. I know that Pentaho talks about being able to manage big data, which is much more data than what we have. I don't know if it was our architecture versus the product limitations, but we did have issues with scaling.
Our product doesn't deal with big data at large. There are probably 17 million records. With those 17 million records, it performs well when it has been internally cached within Pentaho. However, if you are loading the dataset or querying it for the first time, then it does take a while. Once it has been cached in Pentaho, the subsequent queries are reasonably fast.
View full review »
TG
Tracy Gettings
Analytics Team Leader at HealtheLink
Its scalability is very good. We use it with multiple, large databases. We've added to it over time and it scales.
We have about 10 users of the solution including a data quality manager, clinical analyst, healthcare informatics analysts, senior healthcare informatics analyst, and an analytics team leader. It's used very extensively by all of those job roles in their day-to-day work. When we add additional staff members, they routinely get access to and are trained on the solution.
View full review »
Actually, Pentaho Kettle comes equipped with the option to scale out, out of the box.
And no, we didn't encounter specific scalability problems.
We had no issues scaling it across the company as needed.
View full review »
We have had no issues scaling it for our needs.
View full review »
There have been no issues scaling it.
View full review »
There have been no issues scaling it.
View full review »
OM
Oscar Mejia
IT-Services Manager & Solution Architect at Stratis
According to the documentation, it's quite scalable. That said, I haven't tried to expand it. We just use a single server and that's all we need right now. We don't have plans to increase usage.
We have three people who use the solution currently.
View full review »
The robustness of this solution in a production cluster (>30 nodes) remains to be seen.
View full review »
We have had no issues scaling it.
View full review »
VD
reviewer1384743
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees
I am the only person using the solution currently. There are two other people that occasionally also assist in it. I'm helping them understand the tool and they are beginning to use it. In that sense, we're slowly scaling.
I don't know if the solution scales well on a large scale, however.
It scales very well overall, thanks to the very useful option to set the number of copies to start for every step, although this is balanced by the side effect of consuming a lot of memory and CPU resources.
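As a rough illustration of the trade-off described above, the following Python sketch deals rows out to n parallel copies of a step and runs each copy in its own worker. It is not Pentaho's implementation: the round-robin distribution, the function names `distribute_round_robin` and `run_step_copies`, and the use of a thread pool are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute_round_robin(rows, n_copies):
    """Deal rows out to n_copies buckets in turn (assumed distribution rule)."""
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    buckets = [[] for _ in range(n_copies)]
    for index, row in enumerate(rows):
        buckets[index % n_copies].append(row)
    return buckets


def run_step_copies(rows, n_copies, transform):
    """Apply `transform` to every row, with one worker per hypothetical step copy."""
    buckets = distribute_round_robin(rows, n_copies)
    with ThreadPoolExecutor(max_workers=n_copies) as pool:
        # Every worker holds its own bucket at the same time, so raising
        # n_copies raises the memory and CPU in use concurrently.
        return list(pool.map(lambda bucket: [transform(r) for r in bucket], buckets))


if __name__ == "__main__":
    print(run_step_copies(range(10), 3, lambda value: value * 2))
```

More copies shorten the wall-clock time only while the host has spare cores and memory; past that point the extra workers just compete for the same resources.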
View full review »
With the Pentaho Community version, you just download it and unpack it, and it should run. If it doesn't, you may also need to install Java.
View full review »
We have not had any issues scaling it for our customers.
View full review »
None
View full review »
There have been no issues with scalability.
View full review »
I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse.
View full review »
There were issues, but they were solved with help from tech support.
View full review »
We haven't had any issues with scalability.
View full review »
We had no issues scaling it for our needs.
View full review »
We've had no issues with scalability.
View full review »
There have been no issues so far in scaling the product.
View full review »
There have been no issues with the scalability.
View full review »