IBM InfoSphere DataStage Overview

IBM InfoSphere DataStage is the #6 ranked solution in our list of top Data Integration Tools. It is most often compared to SSIS.

What is IBM InfoSphere DataStage?
IBM InfoSphere DataStage integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.
IBM InfoSphere DataStage Buyer's Guide

Download the IBM InfoSphere DataStage Buyer's Guide including reviews and more. Updated: October 2021

IBM InfoSphere DataStage Customers
Dubai Statistics Center, Etisalat Egypt

Pricing Advice

What users are saying about IBM InfoSphere DataStage pricing:
  • "It is quite expensive."
  • "Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
  • "It's very expensive."
  • "It's quite expensive."
  • "The cost is too high."

IBM InfoSphere DataStage Reviews

RT
Data/Solution Architect at a computer software company with 51-200 employees
Real User
Robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data

Pros and Cons

  • "As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
  • "Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."

What is our primary use case?

We use it for creating a pattern for data integration with our data vault. We have also used it for creating APIs.

What is most valuable?

As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. 

Its error logging mechanism is far simpler and easier to understand than other data integration tools.

The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables.

What needs improvement?

Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate.

In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere.

For how long have I used the solution?

It was DataStage previously, and then it became InfoSphere. I have used DataStage for ten years and InfoSphere for one year.

What do I think about the stability of the solution?

It is quite stable. In the newer components of InfoSphere, you have a mapping tool called FastTrack and a metadata generator, which can have issues from time to time, but they get resolved.

What do I think about the scalability of the solution?

It is not that easy to scale on-premises. I have worked on the ones deployed on Windows or Unix, and scalability is often dependent on whether you can add more CPUs or boxes. On the cloud, it would have been easier to scale. However, the current version can only be deployed on Windows or Unix.

How are customer service and technical support?

I have not been in touch with them recently. Earlier, I was in touch with their technical support and raised tickets because some strange errors, such as phantom errors, were being logged in the error log and made no sense. We used to contact their support team to understand these.

Which solution did I use previously and why did I switch?

I have used Informatica and SAS CA. IBM InfoSphere has the highest cost of licensing as compared to others. It is not very widely used, and it is very difficult to find people who have this sort of knowledge. 

The newer version of Informatica is on the cloud and is much more user-friendly than InfoSphere because it presents profiling information in clear graphs and charts. It also provides a lot of templates. For example, if I want to build a complete dimensional structure, Informatica has a template for it that I can simply use. So, Informatica is far easier to use, and it has everything that InfoSphere has. The only drawback is that Informatica is sold in bundles, which is why organizations sometimes don't choose it. For example, data integration and data quality are separate modules with separate pricing.

How was the initial setup?

The initial setup is quite simple. It didn't take more than half an hour to set it up on my laptop.

What about the implementation team?

I implemented it myself. As for maintenance, a given version might not require any at all, although some versions receive bug fixes and minor releases.

What's my experience with pricing, setup cost, and licensing?

It is quite expensive.

What other advice do I have?

I would recommend this solution for large-scale implementation where you need a complex transformation and data integration to happen according to a structured format, either a data vault or a dimension model. It is suitable for big companies because of the cost. It is a very valuable platform for data in large volumes. For small volumes, you have other open-source tools that can do the same thing for you.

I am part of a consultancy, and I have deployed this product for companies. We have five to eight developers. Because InfoSphere is a licensed product, and its licenses cost a lot, there are not many InfoSphere developers.

I would rate IBM InfoSphere DataStage an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
BB
DataStage at a healthcare company with 10,001+ employees
Real User
User-friendly with a lot of functions for transformation rules, but has slow performance and is not suitable for a huge volume of data

Pros and Cons

  • "We are mostly using transformation rules. It has a lot of functions and logic related to transformation. It is a user-friendly tool with in-built functions."
  • "It doesn't have any big data connections. It would be good to have them because most of the systems are moving towards big data. There should also be a user-friendly way to interact with the cloud. Its loading process is very slow. It takes a lot of time for around 5 or 6 million records, and we are not able to provide real-time data to the vendors due to this delay. Its performance needs to be improved. It is also like a legacy system. It is not updated much. In higher versions, they only do small changes. We would like to have new features and new technologies."

What is our primary use case?

We are supporting a healthcare domain vendor located in the US. We get data from various domains, such as health insurance. We have member data, provider data, and consumer data. We also have client-related stuff and broker-related commission data. 

We get the data from these domains, and after receiving it, we apply transformation rules, such as joins. We also standardize the data through formatting and field validations, such as formatting the date field and running date and time validations. We also do other standard transformations with some business logic. After applying all this, we send the data to the business.

What is most valuable?

We are mostly using transformation rules. It has a lot of functions and logic related to transformation. It is a user-friendly tool with in-built functions.

What needs improvement?

It doesn't have any big data connections. It would be good to have them because most of the systems are moving towards big data. There should also be a user-friendly way to interact with the cloud. 

Its loading process is very slow. It takes a lot of time for around 5 or 6 million records, and we are not able to provide real-time data to the vendors due to this delay. Its performance needs to be improved.

It also feels like a legacy system. It is not updated much, and newer versions bring only small changes. We would like to have new features and new technologies.

For how long have I used the solution?

I have been using this solution for around 15 years.

What do I think about the scalability of the solution?

It is easy to scale. In my project, six or seven people are using this solution, but in my company, we have around 15 to 16 projects.

How are customer service and technical support?

We have an internal admin team for support. If they are not able to solve an issue, they raise a ticket with the IBM team. In the last ten years, we had to contact IBM only two to three times. Our internal team is able to handle most of the issues.

How was the initial setup?

Its initial setup has moderate complexity. It required some coordination with the vendor because their system also needs to be ready. We also get maintenance support from them.

What's my experience with pricing, setup cost, and licensing?

Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side.

For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction.
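As a rough check on the reviewer's figures, the sketch below takes the quoted $40 per execution hour at face value and derives the execution hours implied by each line item. Note that the itemized amounts sum to $4,000, so the remaining ~$2,000 of the quoted $6,000 total is presumably the Linux storage and licensing the reviewer mentions but does not break out; that split is an assumption, not something the review states. The variable and function names are illustrative only.

```python
# Monthly figures (USD) as quoted by the reviewer; the hourly rate is taken at face value.
HOURLY_EXECUTION_RATE = 40
QUOTED_MONTHLY_TOTAL = 6_000

itemized = {
    "production executions": 2_000,
    "lower-environment executions (test/dev)": 1_000,
    "Db2 mainframe database": 1_000,
}


def implied_hours(monthly_spend: float, rate: float = HOURLY_EXECUTION_RATE) -> float:
    """Execution hours per month implied by a monthly spend at the given hourly rate."""
    return monthly_spend / rate


prod_hours = implied_hours(itemized["production executions"])
lower_hours = implied_hours(itemized["lower-environment executions (test/dev)"])
itemized_total = sum(itemized.values())
# Assumption: the gap is storage/licensing that the reviewer did not itemize.
unitemized = QUOTED_MONTHLY_TOTAL - itemized_total

print(f"Production: {prod_hours:.0f} h/month")          # Production: 50 h/month
print(f"Lower environments: {lower_hours:.0f} h/month")  # Lower environments: 25 h/month
print(f"Itemized total: ${itemized_total:,}")            # Itemized total: $4,000
print(f"Not itemized: ${unitemized:,}")                  # Not itemized: $2,000
```

In other words, at $40 per hour the quoted spend corresponds to roughly 50 production hours and 25 lower-environment hours per month.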

What other advice do I have?

DataStage is a good tool for the ETL platform, but it is not suitable for a huge volume of data. It works well for low to medium volume of data. I would advise others to do a feasibility study and evaluate available options in the market in terms of features and cost.

I would rate IBM InfoSphere DataStage a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Pravas Ray
Systems Integration Associate Director at a computer software company with 10,001+ employees
Vendor
Top 20
Helpful support, and the Hierarchical Data Stage is good

Pros and Cons

  • "The Hierarchical Data Stage is good."
  • "The interface needs improvement."

What is our primary use case?

We are a consulting company, and we use this solution for our clients by setting up their data. Our clients receive various healthcare-related information from their vendors and business partners. They integrate it and generate data reports from it.

How has it helped my organization?

It improves how our client's organization functions.

What is most valuable?

We mainly use the designer and developer features, sticking to the basic features that we have.

They have many good features. The Hierarchical Data Stage is good.

What needs improvement?

The interface needs improvement. The interface in Informatica is easier than in DataStage.

The licensing can be improved. Many companies are moving away from DataStage because it is expensive.

The biggest open question is how DataStage jobs can be integrated into DevOps when they are stored as binary files.

We would like to see DataStage integrated with DevOps so that a pipeline can be created for auto-deployment. Right now we are all doing it manually.

For how long have I used the solution?

I have been working with IBM InfoSphere DataStage for seven years.

We were on version 11.3 and have recently migrated to version 11.7.

What do I think about the stability of the solution?

It's a stable product; it's not new.

What do I think about the scalability of the solution?

It's very scalable. Our clients are medium-sized companies with a 1.5 billion turnover.

How are customer service and technical support?

We reached out to IBM because the file was not readable, and they resolved the issue.

Technical support is good. I have not found any issues with technical support. I would rate them an eight out of ten.

In some cases, they have a delay in giving suggestions for the configuration.

Which solution did I use previously and why did I switch?

Previously, at another company, I worked with Informatica. There are not a lot of differences, but Informatica's interface is easier than DataStage's.

How was the initial setup?

I don't handle the setup myself, but I understand that it involves many challenges.

Initially, we had challenges with the configuration. We were trying to use the comparison for Excel, and reading the Excel files from the source, but the files were not readable.

What's my experience with pricing, setup cost, and licensing?

It's very expensive.

What other advice do I have?

I am not a developer; I have a team within our company for that.

There is a cloud migration strategy going on, so they are thinking of moving to the cloud. They want a tool that is not heavy and suitable for their budget.

The recommendation for using this tool would depend on the requirements. 

I don't have anything bad to say about this product.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Kirill Slivchikov
Owner at 7Spring Consult
Real User
Top 20
Reliable, simple to install, and useful

Pros and Cons

  • "It is quite useful and powerful."
  • "It would be useful to provide support for Python, R, and Java."

What is our primary use case?

I am a consultant. I provide product information for our clients.

What is most valuable?

IBM InfoSphere DataStage is a good product.

It is quite useful and powerful.

What needs improvement?

From a practice point of view, solutions such as IBM InfoSphere DataStage and Oracle Data Integrator are losing ground, whereas open-source solutions are becoming increasingly powerful.

For example, we are currently working hard on several such projects, and I expect that within a few years open-source solutions will take the lead in the market and be used by large enterprises.

Clients are looking for open-source solutions more and more.

It would be useful to provide support for Python, R, and Java.

For how long have I used the solution?

I have more than 22 years of experience with many different products. 

We have been using IBM InfoSphere DataStage for three to four years.

What do I think about the stability of the solution?

I have no issues with the stability of IBM InfoSphere DataStage.

How are customer service and support?

Clients are quite dependent on support from the vendor. For example, if you want to activate a new feature on the product, you must create a ticket. You have no information on when it will be implemented, and neither does the vendor, because they work through a stream of tickets in order of priority.

Which solution did I use previously and why did I switch?

I am a consultant. I have different projects with different platforms. We are constantly going back and forth to different solutions for different projects.

I have had clients who have used Amazon Redshift.

Over the years, my clients have used many different products. For example, they use IBM Landscape and we use IBM InfoSphere.

How was the initial setup?

The initial setup was straightforward. We did not have issues.

What's my experience with pricing, setup cost, and licensing?

Comparable solutions share a common disadvantage: the total cost of the project.

It's quite expensive.

Which other solutions did I evaluate?

From time to time, I evaluate different products for my clients.

What other advice do I have?

We have had different projects with three or four clients. The average project term has been between nine months and one year.

If you are working with an open-source solution or another solution, you can implement some features yourself. For example, on Amazon, with AWS Lambda, you can easily write your code in Python or Java, and the service will run it for you. You can easily create features yourself, which gives you more ability to make your solution run faster and eliminates dependence on the vendor.

I would rate IBM InfoSphere DataStage an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PB
Senior Data Warehouse Developer at a computer software company with 5,001-10,000 employees
MSP
Top 20
Stable and scalable with a straightforward setup

Pros and Cons

  • "Finding logs is very easy on the solution."
  • "The template mapping could be easier."

What is our primary use case?

We primarily use the solution for the UTS tool as well as for billing and data. We use it to clean data from different systems and to pull in data.

What is most valuable?

The solution is good for bringing in data from third-party systems.

Various aspects of the solution are valuable, but it often depends on the use cases.

Finding logs is very easy on the solution.

Overall, compared to SAS or Informatica, the solution is much easier to navigate.

What needs improvement?

The mod options should be simplified. Some options on DataStage aren't working properly.

The solution needs to lower its price.

The template mapping could be easier.

The solution should allow for compression of data.

For how long have I used the solution?

I've been using the solution for about 12 years.

What do I think about the stability of the solution?

Typically, the solution is stable.

What do I think about the scalability of the solution?

The solution is scalable. We have about six or seven clients using the solution currently.

How are customer service and technical support?

We've never been in direct contact with IBM's technical support. We use a third party, so if we have issues, we turn to them for troubleshooting.

Which solution did I use previously and why did I switch?

We previously used tools such as Informatica. We've also previously used SQL for billing.

How was the initial setup?

The initial setup is easy for DataStage itself, but for billing and metadata, it gets more complicated. In our case, proper documentation was not available when we installed the metadata components, which complicated things a bit.

What about the implementation team?

We handled the implementation ourselves.

What's my experience with pricing, setup cost, and licensing?

The solution is quite expensive in comparison to similar solutions.

Which other solutions did I evaluate?

We did approach other vendors before ultimately choosing IBM.

What other advice do I have?

We use the on-premises deployment model.

If you are comparing the solution to Informatica, this solution is much simpler. In Informatica, for example, there might be two to three ways to find a log, but with DataStage, they make it much easier. However, compared to other vendors, IBM's licensing costs are more expensive.

I'd rate the solution eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Muharrem Iseri
Managing Partner at Caligo
Real User
Top 10
A good platform for integrating data and has good ETL features

Pros and Cons

  • "The ETL tools are probably the most valuable feature. It has an IBM tool, a friendly UI and it makes things more comfortable."
  • "Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions."

What is our primary use case?

My company is a consulting firm, and I'm a managing partner. We consult for large companies in the telecommunications, banking, and insurance sectors. We partner with IBM. We're not using the latest version of DataStage, but one of the more recent ones. I use the product for a migration project: it's a DB2 to Oracle migration, and the source database is on a mainframe. I need DataStage for that purpose.

What is most valuable?

I think the ETL tools are the most valuable features. It's an IBM tool with a friendly UI, which makes things more comfortable. A second good feature is that after you build some ETL jobs, such as source-to-target migrations, DataStage is capable of providing details and extracting data. The pushdown feature is also valuable.

What needs improvement?

The price would be the first thing I would want to change. Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions. I think it would also be helpful if the product was more adaptable to other platforms and vendors. I would also like to see an improvement in support. 

For how long have I used the solution?

I've been using the product for more than two years. 

What do I think about the stability of the solution?

The product is stable. 

What do I think about the scalability of the solution?

The product is scalable.

How are customer service and technical support?

We do sometimes have issues in terms of support with regard to IBM products in this country. It's difficult to get what we want and it could really be improved and made more efficient. 

How was the initial setup?

I recall that the setup was quite difficult. We had to call in some IBM people. It's possible that they have improved that aspect. I know there's a difference in installation depending on whether it's on-prem or cloud. It's possible that the cloud version is easier to install but I don't have experience with that. 

What other advice do I have?

I would rate InfoSphere an eight out of 10. 

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner.
MU
Enterprise and Information Architect at a tech consulting company (self-employed)
Real User
Top 20
It's a good and stable product but performance monitoring could be improved

Pros and Cons

  • "ETL is the most valuable feature."
  • "There are three things that could improve: the cloud, monitoring, and cloud integration. It's a solid product but not a modern one, and of course it depends on what you're looking for."

What is our primary use case?

I'm an enterprise and information architect, and we're a customer of IBM. I don't use the product personally, but I assist companies in the market that do use it with their integration landscape and design.

What is most valuable?

ETL (extract, transform, and load) is the most valuable feature.

What needs improvement?

I think that performance monitoring could be improved. I know that my colleagues don't get good monitoring from it. I'm not sure whether that's because of the product or because they don't normally do it, but performance monitoring is an issue. I also believe integration with the cloud is not very clear. It's typically a heavy system that people install on-premises. You can install it in the cloud, but it's not so straightforward, and you don't find much information unless you go to the IBM Cloud. I think IBM is behind in its cloud strategy; we would like to put it in the cloud, but there isn't much information about that.

There are three things that could improve: the cloud, monitoring, and cloud integration. It's a solid product but not a modern one, and of course it depends on what you're looking for.

For how long have I used the solution?

I've been using this product for 15 years. 

What do I think about the scalability of the solution?

At the company I'm currently working with, you can see that it's scalable.

How are customer service and technical support?

We've been using the product a long time and hardly need technical support, which is great. We have very few problems.

How was the initial setup?

I've never done an installation myself, but I believe it's relatively complex, as IBM products usually are.

What other advice do I have?

I think DataStage is a product that one should look at as a good candidate in this segment.

I would rate this product a seven out of 10.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
CO
Managing Director at a tech services company with 11-50 employees
Real User
Top 20
An extract, transform and load solution that is difficult to set up but stable once you do

Pros and Cons

  • "Once you have InfoSphere up and running properly, it is stable."
  • "The setup is extremely difficult."
  • "The pricing should be lower."

What is our primary use case?

We are using this solution with enterprise customers for ETL (Extract, Transform and Load). The clients are just using it for retail processes.  

What is most valuable?

The ETL features are the most valuable to our clients.  

What needs improvement?

The product is pretty complex to set up. I think it is quite expensive. So, the set up could be simplified and the price could be brought in line.  

For how long have I used the solution?

I have been using IBM InfoSphere for 20 years.

What do I think about the stability of the solution?

Once you have InfoSphere up and running properly, it is stable.

What do I think about the scalability of the solution?

The product is definitely scalable.  

Which solution did I use previously and why did I switch?

We have previously worked with DataStage, Informatica, Talend, and SSIS; we have looked at a lot of tools in the course of providing solutions for customers.

We probably work with SSIS (SQL Server Integration Services) more often because it is cheaper and easier to get hold of. More clients run SSIS than most of the others, but then most clients do not have massive workloads that require something more robust.  

How was the initial setup?

It is quite a complex tool to set up properly in any environment. It requires someone who has experience with the product.  

What other advice do I have?

My advice for anyone considering IBM InfoSphere DataStage is to use a decent consulting house to help you once you commit to the product. Do not assume that you will be able to go it alone unless you have an extremely talented staff.

On a scale from one to ten (where one is the worst and ten is the best), I would rate this product a seven out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner