We compared AWS Glue and IBM InfoSphere DataStage across four categories, based on our users' reviews. Our conclusion, drawn from all of the collected data, appears below.
Comparison Results: For users invested in the AWS ecosystem, AWS Glue is hands down the best choice. Its users are happier with the pricing, too. IBM InfoSphere DataStage can handle a significant amount of data quickly and easily. Once IBM InfoSphere DataStage fine-tunes its processes and moves toward a greater focus on cloud technologies, it will become a more desirable solution in today's cloud-focused marketplace.
"I like that it's flexible, powerful, and allows you to write your own queries and scripts to get the needed transformations."
"The solution integrates well with other AWS products or services."
"AWS Glue is fast and managed by AWS. Hence, you don't have to worry about capacity and the performance of Glue jobs. It has integrations with other data stores of AWS. The product offers metadata management, logging, and ETL processing capabilities. It comes with a powerful feature, Glue Studio, which helps to do queries interactively within the community. It is a managed service and very secure. Another popular and mature service is S3."
"The most valuable feature of AWS Glue is scalability."
"We have found it beneficial when moving data from one source to another."
"I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages."
"AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code."
"It is a stable and scalable solution."
"It works with multiple servers and offers high availability."
"I am impressed with the tool's ETL tracing."
"Offers great flexibility."
"The product is easy to deploy."
"Finding logs is very easy on the solution."
"Compared to other ETL tools, DataStage has excellent debugging and development capabilities. And the availability of connectors, even though we sometimes have to opt for specific ones. Also, the availability of patches is good."
"Once you have Infosphere up and running properly, it is stable."
"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"Only people who can code, either in Java or Python, can use the product freely. Those who don't know Java or Python might find using AWS Glue difficult."
"The setup and installation is a bit complex without advanced knowledge or training."
"I would like to see a more robust interface on the no-code side. This would be nice to be able to split cells."
"The product has only a few built-in transformations."
"The price of the solution could improve."
"In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement."
"Glue could perform better. It sometimes takes too long to test a Glue job. Google Cloud Platform offers more Python scripts than AWS."
"In terms of improvement, the performance of AWS Glue could be faster."
"The interface needs improvement. It is really too technical. That is the main problem."
"The solution can be a bit more user-friendly, similar to Informatica."
"Improvements for DataStage could include better integration with modern data sources like cloud solutions and documents, along with enhancing its capability to handle non-structured data."
"The setup is extremely difficult."
"It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming."
"The initial setup could be more straightforward."
"Their web interface is good but the on-prem sites are outdated. The solution could also be improved if they could integrate the data pipeline scheduling part of their interface."
"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews, while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. Both AWS Glue and IBM InfoSphere DataStage are rated 7.8. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". The top reviewer of IBM InfoSphere DataStage, on the other hand, writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, Informatica Cloud Data Integration, SSIS and Matillion ETL, whereas IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio and Oracle GoldenGate. See our AWS Glue vs. IBM InfoSphere DataStage report.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.