AWS Glue vs IBM InfoSphere DataStage Comparison 2024

AWS Glue

Read 37 AWS Glue reviews

11,729 views|8,292 comparisons

IBM InfoSphere DataStage

Read 37 IBM InfoSphere DataStage reviews

10,952 views|9,105 comparisons

AWS Glue vs. IBM InfoSphere DataStage

March 2024

Executive Summary

Updated on Sep 5, 2022

We compared AWS Glue and IBM InfoSphere DataStage based on our users’ reviews in four categories. Our conclusion, drawn from all of the collected data, appears below.

Ease of Deployment: For the most part, users of both solutions feel they are easy and straightforward to deploy.

Features: AWS Glue can easily sync data from the source to the solution phase and provides excellent, intuitive automation. Users like that it is very robust and flexible, and that they can write their own queries to achieve the desired transformations quickly. However, they find that it is not very user-friendly and that it only works with other AWS tools and solutions.

IBM InfoSphere DataStage is robust and can handle huge amounts of data with ease. The solution is very user-friendly, providing drag-and-drop features with a large number of capabilities. Users feel the solution lacks virtualization features and is a bit dated. They feel it needs more focus on cloud technologies to be more competitive in the marketplace.

Pricing: AWS Glue users tell us the solution is affordable and offers a pay-as-you-use option. Users feel IBM InfoSphere DataStage is an expensive solution.

Service and Support: Overall, users are satisfied with the service and support of both solutions.

Comparison Results: For users invested in the AWS ecosystem, AWS Glue is hands down the best choice. Users are happier with its pricing, too. IBM InfoSphere DataStage can handle a significant amount of data quickly and easily. Once IBM InfoSphere DataStage fine-tunes its processes and moves toward a greater focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.

To learn more, read our detailed AWS Glue vs. IBM InfoSphere DataStage Report (Updated: March 2024).

772,649 professionals have used our research since 2012.

Featured Review

Sainagaraju Vaduka

Data solution architect at a pharma/biotech company

Excellent scalability, with valuable features, and profitable return on investment

We have a large set of data and we are doing some transformations and identification. We are cleaning the data and transformations. Then we are...

Murali B

Data Engineer at Ernst & Young

Facilitated our peak data integration projects, offers good GUI and availability of connectors is strong

DataStage facilitated our peak data integration projects. For example, big data integrations have happened, particularly when we worked with...

Quotes From Members

We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:

Pros

"I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages."

"Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs."

"The product has a valuable feature for data catalog."

"The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need."

"The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."

"Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."

"I like its integration and ability to handle all data-related tasks."

"The solution integrates well with other AWS products or services."

More AWS Glue Pros →

"Compared to other ETL tools, DataStage has excellent debugging and development capabilities. And the availability of connectors, even though we sometimes have to opt for specific ones. Also, the availability of patches is good."

"The solution is very easy to use."

"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."

"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"

"The solution is stable."

"The product is easy to deploy."

"It is quite useful and powerful."

"In IBM DataStage, the Transformer is the most valuable feature for me. It enables me to apply complex transformations, generate the gateway key, and map source tables into the session table."

More IBM InfoSphere DataStage Pros →

Cons

"The price of the solution could improve."

"Glue could perform better. It sometimes takes too long to test a Glue job. Google Cloud Platform offers more Python scripts than AWS."

"In terms of improvement, the performance of AWS Glue could be faster."

"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."

"The setup and installation is a bit complex without advanced knowledge or training."

"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."

"There should be more connectors for different databases."

"It fails to handle massive databases acquired from various sources."

More AWS Glue Cons →

"The solution should be more user-friendly."

"It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming."

"The setup is extremely difficult."

"The solution can be a bit more user-friendly, similar to Informatica."

"I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers."

"We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."

"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."

"In terms of intermediate storage, we have some challenges, especially with customers who store data in intermediate locations."

More IBM InfoSphere DataStage Cons →

Pricing and Cost Advice

"The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied. There are also some time limits like 0 to 10 minutes or 10 to 20 minutes. If the pricing is according to the minutes, it would be better because you have to limit your job to 10 minutes or 20 minutes."

"It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."

"Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients. In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend."

"Technical support is a paid service, and which subscription you have is dependent on that. You must pay one of them, and it ranges from $15,000 to $25,000 per year."

"This solution is affordable and there is an option to pay for the solution based on your usage."

"AWS Glue is quite costly, especially for small organizations."

"AWS Glue uses a pay-as-you-go approach which is helpful. The price of the overall solution is low and is a great advantage."

"The overall cost of AWS Glue could be better. It cost approximately $1,000 a month. There is paid support available from AWS Glue."

More AWS Glue Pricing and Cost Advice →
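The per-DPU arithmetic described in the first AWS Glue pricing quote above can be sketched as follows. This is a minimal illustration, not AWS's billing logic: the $0.44 per DPU-hour rate is the figure the reviewer quotes, the function name `glue_job_cost` is hypothetical, and the sketch assumes cost is purely proportional to DPUs and runtime, ignoring any minimum billing duration or time-block rounding such as the 10-minute blocks the reviewer mentions.

```python
from decimal import Decimal

# Rate quoted by the reviewer; real AWS pricing varies by region and over time.
RATE_PER_DPU_HOUR = Decimal("0.44")


def glue_job_cost(dpus: int, minutes: int, rate: Decimal = RATE_PER_DPU_HOUR) -> Decimal:
    """Estimate the cost of one job run, assuming purely proportional billing."""
    hours = Decimal(minutes) / Decimal(60)
    # Cost scales linearly with both the number of DPUs and the runtime.
    return (rate * dpus * hours).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for dpus in (1, 5, 10):
        print(f"{dpus} DPU(s) for 60 minutes: ${glue_job_cost(dpus, 60)}")
```

Under these assumptions, going from 1 DPU to 10 DPUs multiplies the hourly cost by ten, which is the scaling effect the reviewer describes.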

"High-cost of ownership: They could take a page from open source software."

"Pricing varies based on use, and it is not as costly as some competing enterprise solutions."

"Small and medium-sized companies cannot afford to pay for this solution."

"The cost is too high."

"It's very expensive."

"Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."

"The price is expensive but there are no licensing fees."

"It is quite expensive."

More IBM InfoSphere DataStage Pricing and Cost Advice →

Questions from the Community

How do you select the right cloud ETL tool?

Top Answer: AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.

How does Talend Open Studio compare with AWS Glue?

Top Answer: We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in…

What are the most common use cases for AWS Glue?

Top Answer: AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or…

Would you upgrade to more premium versions of IBM InfoSphere DataStage?

Top Answer: My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For the…

Is IBM InfoSphere DataStage more difficult to use compared to other tools...

Top Answer: I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work…

Do you rely on IBM Cloud Paks for your data? Have you utilized this produ...

Top Answer: IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands…

Ranking

AWS Glue
1st out of 44 in Cloud Data Integration
Views: 11,729
Comparisons: 8,292
Reviews: 37
Average Words per Review: 419
Rating: 7.8

IBM InfoSphere DataStage
7th out of 101 in Data Integration
Views: 10,952
Comparisons: 9,105
Reviews: 37
Average Words per Review: 467
Rating: 7.9

Comparisons

AWS Database Migration Service vs. AWS Glue: compared 27% of the time.
Informatica PowerCenter vs. AWS Glue: compared 9% of the time.
Informatica Cloud Data Integration vs. AWS Glue: compared 7% of the time.
SSIS vs. AWS Glue: compared 7% of the time.
Matillion ETL vs. AWS Glue: compared 3% of the time.

SSIS vs. IBM InfoSphere DataStage: compared 11% of the time.
IBM Cloud Pak for Data vs. IBM InfoSphere DataStage: compared 11% of the time.
Azure Data Factory vs. IBM InfoSphere DataStage: compared 11% of the time.
Talend Open Studio vs. IBM InfoSphere DataStage: compared 10% of the time.
Oracle GoldenGate vs. IBM InfoSphere DataStage: compared 4% of the time.

Learn More

Amazon Web Services (AWS)

IBM

Overview

AWS Glue is a serverless cloud data integration tool that facilitates the discovery, preparation, movement, and integration of data from multiple sources for machine learning (ML), analytics, and application development. The solution includes additional productivity and data ops tooling for running jobs, implementing business workflows, and authoring.

AWS Glue allows users to connect to more than 70 diverse data sources and manage data in a centralized data catalog. The solution facilitates visual creation, running, and monitoring of extract, transform, and load (ETL) pipelines to load data into users' data lakes. This Amazon product seamlessly integrates with other native applications of the brand and allows users to search and query cataloged data using Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum.

The solution also utilizes application programming interface (API) operations to transform users' data, create runtime logs, store job logic, and create notifications for monitoring job runs. The console of AWS Glue connects all of these services into a managed application, facilitating the monitoring and operational processes. The solution also performs provisioning and management of the resources required to run users' workloads in order to minimize manual work time for organizations.

AWS Glue Features

AWS Glue groups its features into four categories - discover, prepare, integrate, and transform. Within those groups are the following features:

Automatic schema discovery: AWS Glue crawlers connect to the organization's source or target data source through a prioritized list of classifiers to determine the schema for users' data. This feature creates metadata in companies' AWS Glue Data Catalog.
Schemas for data stream management: The AWS Glue Schema Registry enables users to validate and control the evolution of streaming data through registered Apache Avro schemas for no additional charge.
Automatic scaling based on workload: This feature dynamically scales resources up and down based on workload. The feature controls job resources, adding and removing them depending on how much the workload can be split up.
FindMatches: This feature is for machine learning-based data deduplication and cleansing, and works by finding records that are imperfect matches of each other to remove useless data copies.
Edit, debug, and test ETL code: This feature helps users who have chosen to interactively develop their ETL code by providing development endpoints for editing, debugging, and testing the code it generates for them.
AWS Glue DataBrew: An interactive, point-and-click visual interface for specialists to clean and normalize data without the need to write any code.
AWS Glue Interactive Sessions: This feature simplifies the development of data integration jobs by enabling data engineers to interactively prepare and explore data.
AWS Glue Studio Job Notebooks: This AWS Glue feature provides serverless notebooks with minimal setup, allowing developers to start working in a timely manner.
Complex ETL pipeline building: This feature allows the product to be invoked on a schedule, on demand, or based on an event, allowing users to start multiple jobs in parallel or specify dependencies to build complex ETL pipelines.
AWS Glue Studio: This AWS Glue feature allows users to visually transform data through a drag-and-drop interface. The product automatically generates the code for ETL processes for users' data.
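The "complex ETL pipeline building" idea in the list above (jobs with dependencies, some of which can start in parallel) can be illustrated with a small, library-free sketch. This is not the AWS Glue API: the job names are hypothetical, and the ordering uses Python's standard `graphlib.TopologicalSorter` purely to show how dependencies determine which jobs may run together.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_and_clean": {"extract_orders", "extract_customers"},
    "load_warehouse": {"join_and_clean"},
}


def execution_waves(pipeline):
    """Group jobs into waves; jobs within one wave have all dependencies met
    and could therefore be started in parallel."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents unlock
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

In this sketch the two extract jobs form the first wave, the join runs once both have finished, and the load runs last.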

AWS Glue Benefits

AWS Glue offers a wide range of benefits for its users. These benefits include:

Users of other AWS products can easily onboard with AWS Glue, as it is integrated across a wide range of the company's services.
The solution is serverless, which allows for a lower total cost of ownership.
AWS Glue offers more power for users, as it automates much of the effort in building, maintaining, and running ETL jobs.
The product allows customers to easily discover and search across all their AWS datasets through AWS Glue Data Catalog.
AWS Glue does not require additional payment for managing and enforcing schemas for data streams.
The solution facilitates the authoring of scalable ETL jobs for beginners and non-coding experts through a drag-and-drop interface.

Reviews from Real Users

Mustapha A., a cloud data engineer at Jems Groupe, likes AWS Glue because it is a product that is great for serverless data transformations.

Liana I., CEO at Quark Technologies SRL, describes AWS Glue as highly scalable and reliable, with a beneficial pay-as-you-go pricing model.

IBM InfoSphere DataStage is a high-quality data integration tool that aims to design, develop, and run jobs that move and transform data for organizations of different sizes. The product works by integrating data across multiple systems through a high-performance parallel framework. It supports extended metadata management, enterprise connectivity, and integration of all types of data.

The solution is the data integration component of IBM InfoSphere Information Server, providing a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data sources, and other enterprise applications. The tool works with various types of patterns - extract, transform and load (ETL), and extract, load, and transform (ELT). The scalability of the platform is achieved by using parallel processing and enterprise connectivity.

The solution has various versions, catering to different types of companies, which include the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on which version a company has bought, different goals can be achieved. They include the following:

Designing data flows to extract information from multiple sources, transform the data, and deliver it to target databases or applications.
Delivery of relevant and accurate data through direct connections to enterprise applications.
Reduction of development time and improvement of consistency through prebuilt functions.
Utilization of InfoSphere Information Server tools for accelerating the project delivery cycle.

IBM InfoSphere DataStage can be deployed in various ways, including:

As a service: The tool can be accessed through a subscription model, where its capabilities are a part of IBM DataStage on IBM Cloud Pak for Data as a Service. This option offers full management on IBM Cloud.
On premises or in any cloud: The two editions - IBM DataStage Enterprise and IBM DataStage Enterprise Plus - can run workloads on premises or in any cloud when added to IBM DataStage on IBM Cloud Pak for Data as a Service.
On premises: The basic jobs of the tool can be run on premises using IBM DataStage.

IBM InfoSphere DataStage Features

The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:

AI services: The tool offers services such as data science, event messaging, data warehousing, and data virtualization. It accelerates processes through artificial intelligence (AI) and offers a connection with IBM Cloud Paks - the cloud-native insight platform of the solution.
Parallel engine: Through this feature, ETL performance can be optimized to process data at scale. This is achieved through parallel engine and load balancing, which maximizes throughput.
Metadata support: This feature of the product uses the IBM Watson Knowledge Catalog to protect companies' sensitive data and monitor who can access it and at what levels.
Automated delivery pipelines: IBM InfoSphere DataStage reduces costs by automating continuous integration and delivery of pipelines.
Prebuilt connectors: The feature for prebuilt connectivity and stages allows users to move data between multiple cloud sources and data warehouses, including IBM native products.
IBM DataStage Flow Designer: This feature offers assistance through machine learning design. The product offers its clients a user-friendly interface which facilitates the work process.
IBM InfoSphere QualityStage: The tool provides a feature that automatically resolves data quality issues and increases the reliability of the delivered data.
Automated failure detection: Through this feature, companies can reduce infrastructure management efforts, relying on the automated detection that the tool offers.
Distributed data processing: Through this feature, cloud runtimes can be executed remotely while maintaining data sovereignty and decreasing costs.
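The partition-style parallelism behind the "parallel engine" item in the list above can be sketched generically. This is only an illustration of the split, transform, and collect pattern, not DataStage's engine: the function names and the uppercase transformation are hypothetical, and a thread pool stands in for the separate processing nodes a real parallel framework would use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin split of rows into n partitions."""
    return [rows[i::n] for i in range(n)]


def transform(part):
    """Hypothetical transformation stage: uppercase every value."""
    return [row.upper() for row in part]


def run_parallel(rows, n_partitions=4):
    """Apply the same transformation to every partition concurrently."""
    parts = partition(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(transform, parts))
    # Collect step: flatten the per-partition outputs into one list.
    return [row for part in results for row in part]


if __name__ == "__main__":
    print(run_parallel(["alpha", "beta", "gamma", "delta", "epsilon"], n_partitions=2))
```

Because the rows are distributed round-robin, the collected output is grouped by partition rather than kept in the original row order; a real job would add a sort or a keyed partitioner where order matters.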

IBM InfoSphere DataStage Benefits

This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:

Increased speed of workload execution due to better balancing and a parallel engine.
Reduction of data movement costs through integrations and seamless design of jobs.
Modernization of data integration by extending the capabilities of companies' data.
Delivery of reliable data through IBM Cloud Pak for Data.
Utilization of a drag-and-drop interface which assists in the delivery of data without the need for code.
Effective data manipulation allows data to be merged before being mapped and transformed.
Easier access for users to their data through visual maps of the process and the delivered data.

Reviews from Real Users

A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.

Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.

Sample Customers

bp, Cerner, Expedia, FINRA, Hess, Intuit, Kellogg's, Philips, TIME, Workday

Dubai Statistics Center, Etisalat Egypt

Top Industries

REVIEWERS
Computer Software Company: 47%
Financial Services Firm: 18%
Pharma/Biotech Company: 12%
Consumer Goods Company: 6%

VISITORS READING REVIEWS
Financial Services Firm: 20%
Computer Software Company: 14%
Manufacturing Company: 8%
Insurance Company: 7%

REVIEWERS
Computer Software Company: 50%
Insurance Company: 14%
Transportation Company: 7%
Healthcare Company: 7%

VISITORS READING REVIEWS
Financial Services Firm: 26%
Manufacturing Company: 11%
Computer Software Company: 10%
Insurance Company: 8%

Company Size

REVIEWERS
Small Business: 29%
Midsize Enterprise: 13%
Large Enterprise: 58%

VISITORS READING REVIEWS
Small Business: 15%
Midsize Enterprise: 12%
Large Enterprise: 72%

REVIEWERS
Small Business: 45%
Midsize Enterprise: 6%
Large Enterprise: 49%

VISITORS READING REVIEWS
Small Business: 16%
Midsize Enterprise: 10%
Large Enterprise: 74%

AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. AWS Glue is rated 7.8, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, Informatica Cloud Data Integration, SSIS and Matillion ETL, whereas IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio and Oracle GoldenGate. See our AWS Glue vs. IBM InfoSphere DataStage report.

See our list of best Cloud Data Integration vendors.

We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
