We performed a comparison between AWS Glue and IBM Infosphere DataStage based on our users’ reviews in four categories. After reading all of the collected data, you can find our conclusion below.
Comparison Results: For users vested in the AWS ecosystem, AWS is hands down the best choice. Users are happier with the pricing, too. IBM Infosphere can handle a significant amount of data quickly and easily. Once IBM Infosphere DataStage finetunes processes and moves toward a greater focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.
"I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages."
"Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs."
"The product has a valuable feature for data catalog."
"The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need."
"The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
"Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
"I like its integration and ability to handle all data-related tasks."
"The solution integrates well with other AWS products or services."
"Compared to other ETL tools, DataStage has excellent debugging and development capabilities. And the availability of connectors, even though we sometimes have to opt for specific ones. Also, the availability of patches is good."
"The solution is very easy to use."
"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"
"The solution is stable."
"The product is easy to deploy."
"It is quite useful and powerful."
"In IBM DataStage, the Transformer is the most valuable feature for me. It enables me to apply complex transformations, generate the gateway key, and map source tables into the session table."
"The price of the solution could improve."
"Glue could perform better. It sometimes takes too long to test a Glue job. Google Cloud Platform offers more Python scripts than AWS."
"In terms of improvement, the performance of AWS Glue could be faster."
"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
"The setup and installation is a bit complex without advanced knowledge or training."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"There should be more connectors for different databases."
"It fails to handle massive databases acquired from various sources."
"The solution should be more user-friendly."
"It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming."
"The setup is extremely difficult."
"The solution can be a bit more user-friendly, similar to Informatica."
"I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers."
"We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."
"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
"In terms of intermediate storage, we have some challenges, especially with customers who store data in intermediate locations."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. AWS Glue is rated 7.8, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, Informatica Cloud Data Integration, SSIS and Matillion ETL, whereas IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio and Oracle GoldenGate. See our AWS Glue vs. IBM InfoSphere DataStage report.
See our list of best Cloud Data Integration vendors.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.