We performed a comparison between AWS Glue and IBM Infosphere DataStage based on our users’ reviews in four categories. After reading all of the collected data, you can find our conclusion below.
Comparison Results: For users vested in the AWS ecosystem, AWS is hands down the best choice. Users are happier with the pricing, too. IBM Infosphere can handle a significant amount of data quickly and easily. Once IBM Infosphere DataStage finetunes processes and moves toward a greater focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.
"The most valuable feature of AWS Glue is scalability."
"Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs."
"It is a stable and scalable solution."
"We have found it beneficial when moving data from one source to another."
"Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues."
"The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
"AWS Glue is a good solution for developers, they have the ability to write code in different languages and other software."
"Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
"IBM is stable and accurate to monitor. It's easy to understand to monitor the data lineage from source to target."
"As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"
"The concept of integration is a valuable feature of the product."
"Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
"The data lineage report can be filtered for reporting. The reports are user-friendly and take less time to find what you need."
"The product is easy to deploy."
"I would like to see a more robust interface on the no-code side. This would be nice to be able to split cells."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement."
"I have encountered challenges with multi-region support."
"The solution’s stability could be improved."
"We face performance issues when using AWS Glue for data transformation and integration."
"Glue could perform better. It sometimes takes too long to test a Glue job. Google Cloud Platform offers more Python scripts than AWS."
"The mapping area and the use of the data catalog from Glue could be better."
"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
"We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."
"The troubleshooting guide is very bad."
"The setup is extremely difficult."
"The solution can be a bit more user-friendly, similar to Informatica."
"In the future, I would like to see more integration with cloud technologies."
"There could be more customization options for the product."
"The template mapping could be easier."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. AWS Glue is rated 7.8, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, Informatica Cloud Data Integration, SSIS and Matillion ETL, whereas IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio and Oracle GoldenGate. See our AWS Glue vs. IBM InfoSphere DataStage report.
See our list of best Cloud Data Integration vendors.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.