We performed a comparison between AWS Glue and IBM Infosphere DataStage based on our users’ reviews in four categories. After reading all of the collected data, you can find our conclusion below.
Comparison Results: For users vested in the AWS ecosystem, AWS is hands down the best choice. Users are happier with the pricing, too. IBM Infosphere can handle a significant amount of data quickly and easily. Once IBM Infosphere DataStage finetunes processes and moves toward a greater focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.
"AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code."
"The key role for Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution and our clients client have been very satisfied with it."
"It's fairly straightforward as a product; it's not very complicated."
"The most valuable feature for me is the visual interface of AWS Glue."
"The most valuable feature of AWS Glue is scalability."
"I like that it's flexible, powerful, and allows you to write your own queries and scripts to get the needed transformations."
"We no longer had to worry much about infrastructure management because AWS Glue is serverless, and Amazon takes care of the underlying infrastructure."
"The solution's technical support is good. Whenever we raise a use case where we face an issue in our company, we get a response from the solution's technical team."
"When we have needed help from the IBM team, they were helpful. Our company is a premium partner so we get fast responses."
"Compared to other ETL tools, DataStage has excellent debugging and development capabilities. And the availability of connectors, even though we sometimes have to opt for specific ones. Also, the availability of patches is good."
"We like the flexibility of modeling."
"Once you have Infosphere up and running properly, it is stable."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"
"It is quite useful and powerful."
"The solution is stable."
"Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
"AWS Glue would be improved by making it easier to switch from single to multi-cloud."
"The technical support for this solution could be improved. In future, we would like to connect more services like Athena or Kinesis to help control more loads of data."
"The solution's visual ETL tool is of no use for actual implementation."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"There should be more connectors for different databases."
"The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."
"I have encountered challenges with multi-region support."
"The mapping area and the use of the data catalog from Glue could be better."
"The response time from support is slow and needs to be improved."
"Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions."
"The interface needs improvement."
"There are three things that could improve - the cloud, monitoring and cloud integration. It's a solid product but not a modern one and of course it depends what you're looking for."
"The initial setup could be more straightforward."
"The interface needs work to be more user-friendly."
"We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."
"The interface needs improvement. It is really too technical. That is the main problem."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. AWS Glue is rated 7.8, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and SnapLogic, whereas IBM InfoSphere DataStage is most compared with IBM Cloud Pak for Data, SSIS, Azure Data Factory, Talend Open Studio and SnapLogic. See our AWS Glue vs. IBM InfoSphere DataStage report.
See our list of best Cloud Data Integration vendors.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.