We performed a comparison between AWS Glue and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"AWS Glue is fast and managed by AWS. Hence, you don't have to worry about capacity and the performance of Glue jobs. It has integrations with other data stores of AWS. The product offers metadata management, logging, and ETL processing capabilities. It comes with a powerful feature, Glue Studio, which helps to do queries interactively within the community. It is a managed service and very secure. Another popular and mature service is S3."
"AWS Glue is a stable and easy-to-use solution."
"AWS Glue is quite better than other tools, but you have to learn it properly before you start using it."
"The product has a valuable feature for data catalog."
"AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code."
"AWS Glue is a good solution for developers, they have the ability to write code in different languages and other software."
"I appreciate AWS Glue for its cost-effectiveness."
"The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need."
"This solution allows us to create pipelines using a minimal amount of custom coding."
"We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
"It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
"I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. In terms of being able to quickly build ETL jobs, transform, and then automate them, it's really easy to integrate throughout for data analytics."
"We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic."
"The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
"It's very simple compared to other products out there."
"The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country, we can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."
"AWS Glue would be improved by making it easier to switch from single to multi-cloud."
"There is a learning curve to this tool."
"The solution’s stability could be improved."
"We face performance issues when using AWS Glue for data transformation and integration."
"The price of the solution could improve."
"The mapping area and the use of the data catalog from Glue could be better."
"The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
"I would like to see more improvements with AS400 DB2."
"In terms of the flexibility to deploy in any environment, such as on-premise or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud."
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
"It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable, it causes the system to sometimes hang. I'm not sure if this is the case for paid tiers."
"If you develop it on MacBook, it'll be quite a hassle."
"One thing that I don't like, just a little, is the backward compatibility."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews. AWS Glue is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and Talend Open Studio, whereas Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and SAP Data Services.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.