
Compare AWS Glue vs. IBM InfoSphere DataStage

AWS Glue: 9,874 views | 8,365 comparisons
IBM InfoSphere DataStage: 15,406 views | 12,676 comparisons
Find out what your peers are saying about MuleSoft, Informatica, Denodo and others in Cloud Data Integration. Updated: November 2021.
552,305 professionals have used our research since 2012.
Quotes From Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:

Pros
AWS Glue:
"One of the best features of the solution is its ability to easily integrate with other AWS services."
"Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
"Its user interface is quite good. You just need to choose some options to create a job in AWS Glue. The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you."
"The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features."

IBM InfoSphere DataStage:
"Once you have Infosphere up and running properly, it is stable."
"We are mostly using transmission rules. It has a lot of functions and logic related to transmission. It is a user-friendly tool with in-built functions."
"Offers great flexibility."
"The most valuable feature is the product's versatility to inject data."
"The Hierarchical Data Stage is good."
"It's a robust solution."
"ETL is the most valuable feature."
"As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."

Cons
AWS Glue:
"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
"Overall, I consider the technical support to be fine, although the response time could be faster in certain cases."
"Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background."
"The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."

IBM InfoSphere DataStage:
"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
"Currently lacking virtualization ability."
"The initial setup could be more straightforward."
"The setup is extremely difficult."
"The template mapping could be easier."
"The interface needs improvement. It is really too technical. That is the main problem."
"It would be useful to provide support for Python, AR, and Java."
"There are three things that could improve - the cloud, monitoring and cloud integration. It's a solid product but not a modern one and of course it depends what you're looking for."

Pricing and Cost Advice
AWS Glue:
"The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied. There are also some time limits like 0 to 10 minutes or 10 to 20 minutes. If the pricing is according to the minutes, it would be better because you have to limit your job to 10 minutes or 20 minutes."
"Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients. In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend."
"It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."
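To make the DPU arithmetic in the first AWS Glue pricing quote concrete, here is a minimal sketch in Python. It assumes the reviewer's figure of 44 cents per DPU-hour and models the "0 to 10 minutes or 10 to 20 minutes" remark as an optional rounding increment. The function name, its parameters, and the increment behaviour are illustrative assumptions, not AWS's published billing rules, and the actual rate may differ by region and over time.

```python
import math

# Rate taken from the reviewer's quote (USD per DPU-hour). This is an
# assumption for illustration; check current AWS pricing before relying on it.
QUOTED_RATE_PER_DPU_HOUR = 0.44


def estimate_job_cost(dpus, runtime_minutes,
                      rate=QUOTED_RATE_PER_DPU_HOUR,
                      billing_increment_minutes=1):
    """Estimate the cost in USD of one job run.

    billing_increment_minutes is a hypothetical knob: the runtime is rounded
    up to the next multiple of this value before the rate is applied.
    """
    if dpus <= 0 or runtime_minutes < 0 or billing_increment_minutes <= 0:
        raise ValueError("dpus and increment must be positive, runtime non-negative")
    # Round the runtime up to the next whole billing increment.
    increments = math.ceil(runtime_minutes / billing_increment_minutes)
    billed_minutes = increments * billing_increment_minutes
    # Cost scales linearly with both DPU count and billed hours.
    return round(dpus * (billed_minutes / 60) * rate, 2)


if __name__ == "__main__":
    for dpus in (1, 5, 10):
        print(dpus, "DPU(s) for one hour:", estimate_job_cost(dpus, 60))
```

Under these assumptions, one DPU for an hour comes to $0.44 and ten DPUs for the same hour to $4.40, which is the multiplication the reviewer describes. With a 10-minute increment, a 12-minute job would be billed as 20 minutes, which is why the reviewer would prefer per-minute pricing.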

IBM InfoSphere DataStage:
"It's quite expensive."
"It's very expensive."
"It is quite expensive."
"The cost is too high."
"The price is expensive but there are no licensing fees."
"Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."

Questions from the Community
Top Answer: AWS Glue and Azure Data factory for ELT best performance cloud services.
Top Answer: We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in…
Top Answer: The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features.
Top Answer: Comparable solutions will have common disadvantages, which is the total cost of the project. It's quite expensive.
Top Answer: From a practice point of view, solutions such as IBM InfoSphere DataStage and Oracle Data Integrator are losing ground, whereas open-source solutions are becoming increasingly powerful. For example…
Ranking
AWS Glue: ranked 5th | 9,874 views | 8,365 comparisons | 4 reviews | 546 average words per review | rating 7.8
IBM InfoSphere DataStage: ranked 6th | 15,406 views | 12,676 comparisons | 12 reviews | 410 average words per review | rating 7.5
Overview

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL.

IBM InfoSphere DataStage integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.
Sample Customers
AWS Glue: bp, Cerner, Expedia, FINRA, Hess, Intuit, Kellogg's, Philips, TIME, Workday
IBM InfoSphere DataStage: Dubai Statistics Center, Etisalat Egypt
Top Industries
VISITORS READING REVIEWS
Computer Software Company: 26%
Media Company: 15%
Comms Service Provider: 10%
Financial Services Firm: 8%
REVIEWERS
Computer Software Company: 63%
Aerospace/Defense Firm: 13%
Healthcare Company: 13%
Financial Services Firm: 13%
VISITORS READING REVIEWS
Computer Software Company: 28%
Comms Service Provider: 14%
Financial Services Firm: 12%
Insurance Company: 6%
Company Size
No Data Available
REVIEWERS
Small Business: 42%
Midsize Enterprise: 4%
Large Enterprise: 54%

AWS Glue is ranked 5th in Cloud Data Integration with 4 reviews while IBM InfoSphere DataStage is ranked 6th in Data Integration Tools with 11 reviews. AWS Glue is rated 7.8, while IBM InfoSphere DataStage is rated 7.4. The top reviewer of AWS Glue writes "Improved our time to implement a new ETL process and has a good price and scalability, but only works with AWS". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "Robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data". AWS Glue is most compared with Talend Open Studio, AWS Database Migration Service, Informatica PowerCenter, SSIS and Informatica Enterprise Data Catalog, whereas IBM InfoSphere DataStage is most compared with SSIS, Talend Open Studio, Azure Data Factory, Informatica PowerCenter and IBM InfoSphere Information Server.

We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.