We performed a comparison between AWS Glue and IBM InfoSphere DataStage based on our users’ reviews in four categories. After reviewing all of the collected data, we present our conclusion below.
Comparison Results: For users invested in the AWS ecosystem, AWS Glue is hands down the best choice. Users are happier with its pricing, too. IBM InfoSphere DataStage can handle a significant amount of data quickly and easily. Once IBM InfoSphere DataStage fine-tunes its processes and moves toward a greater focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.
"The most valuable feature for me is the visual interface of AWS Glue."
"It's fairly straightforward as a product; it's not very complicated."
"I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages."
"We have found it beneficial when moving data from one source to another."
"The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features."
"Its user interface is quite good. You just need to choose some options to create a job in AWS Glue. The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you."
"Our entire use case was very easily handled or solved using this solution."
"I like its integration and ability to handle all data-related tasks."
"The most valuable feature of the solution is the ability to incorporate very complex business rules in Data Stage."
"ETL is the most valuable feature."
"The concept of integration is a valuable feature of the product."
"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"
"It works with multiple servers and offers high availability."
"Once you have Infosphere up and running properly, it is stable."
"The product is a stable and powerful data management solution that can run in parallel mode for enhanced speed."
"It fails to handle massive databases acquired from various sources."
"The interface for AWS Glue could improve, they do not put a lot of details. You can write the code, in PySpark or in Scala, which is a big advantage, it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment."
"I have encountered challenges with multi-region support."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"The product is expensive for data streaming. This area needs improvement."
"Cost-wise, AWS Glue is expensive, so that's an area for improvement. The process for setting up the solution was also complex, which is another area for improvement."
"I would like to see a more robust interface on the no-code side. This would be nice to be able to split cells."
"In terms of improvement, the performance of AWS Glue could be faster."
"In terms of intermediate storage, we have some challenges, especially with customers who store data in intermediate locations."
"Working with some of the big data components is good, but I can see improvements are needed."
"The interface needs improvement."
"So, there are some features that are missing. If I compare DataStage to Talend, Talend allows you to write custom code in Java or use these tools in your applications as well if you are building a job application. But in DataStage, it does not allow you to write custom code for any component."
"What needs improvement in IBM InfoSphere DataStage is its pricing. The pricing for the solution is higher than its competitors, so a lot of the clients my company has worked with prefer other tools over IBM InfoSphere DataStage because of the high price tag. Another area for improvement in the solution stems from a lot of new types of databases, for example, databases in the cloud and big data have become available, and IBM InfoSphere DataStage is working on various connectors for different data sources, but that still isn't up-to-date, meaning that some connectors are missing for modern data sources. The latest version of IBM InfoSphere DataStage also has a complex architecture, so my team faced frequent outages and that should be improved as well."
"The interface needs improvement. It is really too technical. That is the main problem."
"The setup is extremely difficult."
"It would be great if they can include some basic version of data quality checking features."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews, while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. Both AWS Glue and IBM InfoSphere DataStage are rated 7.8. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". The top reviewer of IBM InfoSphere DataStage, on the other hand, writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and Matillion ETL, whereas IBM InfoSphere DataStage is most compared with IBM Cloud Pak for Data, SSIS, Azure Data Factory, Talend Open Studio and SnapLogic. See our AWS Glue vs. IBM InfoSphere DataStage report.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.