We performed a comparison between AWS Glue and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, easy of deployment, and ROI."Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
"AWS Glue is a stable and easy-to-use solution."
"Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs."
"It's fairly straightforward as a product; it's not very complicated."
"AWS Glue is fast and managed by AWS. Hence, you don't have to worry about capacity and the performance of Glue jobs. It has integrations with other data stores of AWS. The product offers metadata management, logging, and ETL processing capabilities. It comes with a powerful feature, Glue Studio, which helps to do queries interactively within the community. It is a managed service and very secure. Another popular and mature service is S3."
"One of the best features of the solution is its ability to easily integrate with other AWS services."
"AWS Glue is a good solution for developers, they have the ability to write code in different languages and other software."
"The solution is stable and reliable."
"The abstraction is quite good."
"It has improved our data integration capabilities."
"The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
"The amount of data that it loads and processes is good."
"We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic."
"The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."
"Provides a good open source option."
"The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
"There should be more connectors for different databases."
"Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background."
"AWS Glue would be improved by making it easier to switch from single to multi-cloud."
"While working on AWS Glue, I could not find any training material for it."
"The setup and installation is a bit complex without advanced knowledge or training."
"The solution’s stability could be improved."
"One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools."
"The monitoring is not that good."
"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
"It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively."
"I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."
"The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products."
"I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing" i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking."
"Some of the scheduling features about Lumada drive me buggy. The one issue that always drives me up the wall is when Daylight Savings Time changes. It doesn't take that into account elegantly. Every time it changes, I have to do something. It's not a big deal, but it's annoying."
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
"I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."
More Pentaho Data Integration and Analytics Pricing and Cost Advice →
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews. AWS Glue is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and Talend Open Studio, whereas Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and SAP Data Services. See our AWS Glue vs. Pentaho Data Integration and Analytics report.
See our list of best Cloud Data Integration vendors.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.