IBM InfoSphere DataStage Overview

IBM InfoSphere DataStage is the #6 ranked solution among the top Data Integration Tools. IT Central Station users give IBM InfoSphere DataStage an average rating of 8 out of 10. IBM InfoSphere DataStage is most commonly compared to SSIS. The top industry researching this solution is Computer Software Company, accounting for 28% of all views.
What is IBM InfoSphere DataStage?
IBM InfoSphere DataStage integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.
IBM InfoSphere DataStage Buyer's Guide

Download the IBM InfoSphere DataStage Buyer's Guide including reviews and more. Updated: October 2021

IBM InfoSphere DataStage Customers
Dubai Statistics Center, Etisalat Egypt

Archived IBM InfoSphere DataStage Reviews (more than two years old)

KirillSlivchikov
Owner at 7Spring Consult
Real User
Top 20
A powerful tool with parallel data streams

Pros and Cons

  • "The data lineage report can be filtered for reporting. The reports are user-friendly and take less time to find what you need."
  • "We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."

What is our primary use case?

It is in the environment of our client, who is a large Russian bank. They are in the top 20, as of August, and have a re-maintenance project for their data warehouse solution based on IBM technologies. They use IBM BWD, a banking data model, on Netezza, with DataStage as the ETL tool. It is a native case.

We are using the on-premise deployment model.

How has it helped my organization?

Our main goal of this project is to increase the efficiency of the usage of this solution and help the bank to get money from the data.

What is most valuable?

The data lineage report can be filtered for reporting. The reports are user-friendly and take less time to find what you need.

It is a powerful tool with parallel data streams.

What needs improvement?

The previous project was based on Microsoft SQL. It moved huge amounts of data from different data sources and DataStage to a middle stage, then moved it to Netezza. This created a bottleneck in the solution. We are trying to streamline it and create ETL processes. These will take data exactly from the data sources and move them to Netezza without using a middle database. The volume of data is quite detailed. We are talking about records in the tens to hundreds of millions.

We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great.

We would be happy if IBM could give us more tolerance for bad networks or VPN channels, as this happens from time to time.

It would be great if we could use more than one SQL operator in the Source DB connector stage. Currently, in the target DB connection stage, we can use several SQL operators, but in the Source DB connector stage we can use only one. It would be better if we could use several.

Data Vault is becoming more popular. It would be great if support for it appeared in the newest versions.

I would like them to have more database procedures.

For how long have I used the solution?

We began using it in September last year.

What do I think about the stability of the solution?

It is quite stable. I haven't seen any pop up errors. It works properly.

They fixed some bugs in version 11.5.02. It works well now.

What do I think about the scalability of the solution?

It is quite scalable. 

DataStage is okay, but the problem of scalability is with another component of the solution (Netezza). The main problem is with the client's version of Netezza. IBM is ending support for it, and they tell us that we need to move to the next version of Netezza. However, the price is too high for the client and we need to look for another platform.

The client thinks that DataStage can stay in place with another platform.

There are not more than five data analysts and administrators using DataStage because it works at night with ETL processes. Therefore, end users are not using it. The several people who maintain and administer it are the users. 

We have two data specialists who work with it. From the bank, there are about five people who use it.

How are customer service and technical support?

I haven't used the technical support.

Which solution did I use previously and why did I switch?

Our client previously used SSIS from Microsoft. They also used Oracle. However, they did not have a special solution for ETL. Ten years ago, they used another data warehouse solution which used XML files as a transport layer.

DataStage is a directly specialized ETL tool which has instruments built for the ETL process as a stream. It can visualize and can track the ETL process, integrating it with the data governance catalog along with other IBM instruments. Previous solutions, except for SSIS, were just a number of scripts which created a process like peer-to-peer. It wasn't a centralized ETL tool with centralized ETL governance.

How was the initial setup?

It was straightforward technically.

What about the implementation team?

Three to four years ago, they decided to start a new data warehouse project. They were working with another Ukrainian company, which engineered this solution. However, the solution hadn't made it to production because of some problems between the understanding of IT and business. They tried to move it to production several times. After that, they decided to do some technical audits for this solution. They asked us to come and see the solution, then write the audit report, which we did. Then, they asked us what to do with these problems, and this is when we began to help them.

All the components were already in place. We changed it a bit, tweaked the ETL processes, and changed some structures in the data warehouse. This solved the current needs of their business. 

The deployment is continuous. We are working on this project currently. It should take another year. At the moment, we have some Agile processes, in which we are finding new business needs. We try to understand them, then deploy the current user story.

What was our ROI?

The main problem of this project is they are trying to move the old solution to production in order to begin getting return on investment.

What's my experience with pricing, setup cost, and licensing?

There were no problems with the licensing model for the bank.

What other advice do I have?

It is the best solution in the IBM environment. It uses IBM data models, such as data quality tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user953511
Technical Lead at a tech services company with 5,001-10,000 employees
Real User
A powerful solution for complex transactions with an extremely straightforward setup

Pros and Cons

  • "DataStage works better with Linux operating systems when the application services are hosted on Linux system equipment, but it's powerful on Windows too."
  • "I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers."

What is most valuable?

DataStage itself is a very powerful tool and you have a lot of transformations that you can do. In comparison to Informatica, you can run very complex transactions on it. It's a precise and powerful environment. And when you have ETLs, you have all your data documented on the data manager, and you have the rest of management from IBM itself on InfoSphere, it's very powerful, especially when you use the whole suite. It helps with end-user technologies and gives you better imaging.

It's powerful in administration mode. It works well when you're using solutions like DataConnect, IBM CDC, etc.

DataStage works better with Linux operating systems when the application services are hosted on Linux system equipment, but it's powerful on Windows too.

Also, no matter what language you use, you can always transform the data.

I'm not sure about the latest versions, however, as I'm on 11.3, which is about five years old.

What needs improvement?

I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers.

The platform also needs more stability. It caches a lot. It crashes on the application servers that the host allows on the platform.

The solution needs better online tools for data, or for sourcing data on the internet. They have InfoSphere exchange but it's not as useful for DataStage. 

For how long have I used the solution?

I've been using the solution for three years.

What do I think about the stability of the solution?

The stability is related to the operating system your company uses. It crashes occasionally when you're hosting the application servers, but only in the servers on the operating system. With Linux, because Linux itself is very robust, it crashes less. For example, we have a telecom with 40 million users and using Linux it crashed maybe two times a year. However, when the solution was hosted on the Windows platform, it crashed two to three times a month. 

The crashes are related to memory and if it's not automated, you have to deal with it manually. On Windows, you have to release the cache memories manually, but on Linux, you don't have to do that, which is why you get less crashing. 

What do I think about the scalability of the solution?

The solution is very scalable, but at a certain point, it consumes a lot of resources. In order to scale, you need a lot of memory and a lot of people.

How are customer service and technical support?

The quality of technical support you can expect usually depends on the region. In Egypt, it was not great, but in Jordan, Dubai, and Kuwait, it was good. That was a couple of years ago. I'm not using the solution right now, so I can't say for sure if this is still the case.

How was the initial setup?

The solution has one of the most straightforward setups. It's even easier than Office.

What other advice do I have?

The last version I interacted with was 11.3 because the later versions were cloud-based and usually our customers didn't want to use the solution on the cloud.

The advice I would give to anyone trying to implement the solution is this: you need to have accurate sizing. Clients always do the sizing wrong and they need more experience to get the sizing right. Setting up the environments takes sizing into account, but it usually causes a lot of problems if the sizing is poor when it starts to operate. Then you have to re-implement, and it will require an increase in resources that will change your budget.

I would rate this solution nine out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
AM
CEO at DELOMID IT
Real User
A solution that is easy to use for designing and transferring data

Pros and Cons

  • "The most valuable feature is the ability to transfer information via notes."
  • "The documentation and in-application help for this solution need to be improved, especially for new features."

What is our primary use case?

We implement this solution for our customers. The majority of them are Enterprise companies.

What is most valuable?

This solution is very easy to use because you can design, compile, and run.

The most valuable feature is the ability to transfer information via notes.

What needs improvement?

The documentation and in-application help for this solution need to be improved, especially for new features. By comparison, in Talend, there is help available for all of the features.

One of my clients has a problem using this solution with MongoDB.

In the next release of this solution, I would like to see the ability to copy and paste schemas. It would be very good because as it is now, you have to save the schema to a repository and then re-load it. It can be done in Talend, but in DataStage, it is not as good.

For how long have I used the solution?

Eight years.

What do I think about the stability of the solution?

This is a stable solution. You have to be careful when you install a service pack because sometimes it causes problems. There may be a second service pack to solve problems that were introduced by the first one.

What do I think about the scalability of the solution?

Scaling this solution is not difficult. When you first install, you choose which components you need. My clients are enterprise companies, with at least five hundred or a thousand employees.

How are customer service and technical support?

There are several technical support teams, and the quality of support depends on where the customer is situated. Normally, technical support answers quickly, but it can be improved.

How was the initial setup?

The initial setup is not easy. You have to be an expert to install DataStage.

Sometimes I get calls from clients who ask me to install this solution because their Unix administrator is not able to do it. You have to configure your OS, Database, Web server, and more. There are a lot of things to install.  

If you are not experienced then it is not possible to install.

What's my experience with pricing, setup cost, and licensing?

Small and medium-sized companies cannot afford to pay for this solution.

Which other solutions did I evaluate?

This solution is for larger companies. Smaller businesses use Talend.

What other advice do I have?

This is a good product, but there is room for improvement.

I would rate this solution an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Ursula Fladeruf
Owner at mip GmbH
Real User
Powerful, reliable and the ability to run it in parallel mode makes it very fast

Pros and Cons

  • "The product is a stable and powerful data management solution that can run in parallel mode for enhanced speed."
  • "The interface needs work to be more user-friendly."

What is our primary use case?

We are a consultancy company, so we are not using DataStage for our own purposes. We deploy it for our customers. We use it to supply data integration and data warehousing solutions based on the specific needs of clients.

How has it helped my organization?

The product has improved our organization by allowing us to provide the product to clients as a reliable data management solution.

What is most valuable?

The product is a very powerful data management tool and the ability to run it in the parallel mode makes it very, very fast. I would say that the ability to use parallel mode would be one of the most valuable features.

What needs improvement?

The features that could be better start with the user interface. It has been getting better in the last releases and in the past few years, and I guess that they will continue to make progress on this front. But even with the improvements that they have made, it could be even better now, and really should be. I think it's a little bit difficult to use because of the interface. Being user-friendly is important for any product and they need to make this adjustment.

In addition to improvements in the base user interface, I would say it would be good to incorporate more interface options for cloud-based systems.

For how long have I used the solution?

The organization has been using this solution for about 10 years.

What do I think about the stability of the solution?

The product is very stable. The stability enhances the fact that it is powerful and fast, so it is a reliable solution with good performance.

What do I think about the scalability of the solution?

I feel that the product has excellent scalability. We currently have more than 15,000 users as clients worldwide and scalability has never presented as an issue. It's used in the U.S., in Asia, and in Europe, so it seems to perform in various markets satisfactorily.

It's an enterprise-enabled product, so with that designation, it really needs to be scalable to satisfy the needs of clients — and those needs change all the time. It has the ability to connect reliably to a lot of sources and this is a very, very important thing for people who are using it.

How are customer service and technical support?

We do have some experience with technical support and customer services. We have access to the IBM software hotline, which is normal for IBM.  They work with the FTS (Follow the Sun) support model which means they are available 24/7. The availability is good and the response is as well. I would say it is good support.

How was the initial setup?

The initial setup of the product is straightforward. The first deployment may be more complex in one case than another. It depends on the complexity of the processes an organization is building and what they need to consider in future planning. Normally you can install it without much trouble and have your first processes live within a few days.

After that, the project is ongoing and it becomes more complex as you build it out. Normally everything does not have to be in place from the beginning as the solution may be deployed to solve new or future issues. In other words it is not replacing something that is already functioning, it is providing something new. As you build out, you get more processes going and the setup becomes more complex. But the initial setup is quite simple.

What about the implementation team?

We do the deployment with our own team for the client, but the implementation can change depending on the client needs. When the client has specific things that need to be resolved for their situation or they want to install and to implement additional products to integrate with the base solution, that affects the rollout. It's especially true during the implementation stage. The implementation team can begin with as little as one person and it can end up as a team of five, six, or seven members depending on complexity and needs for rapid deployment. It also depends on whether the product is going to be used throughout the company. There are a lot of customers who deploy globally or selectively because they may have a strategy already in place for certain solutions that they may not want to change. For example, they may already have ETL processes being used without DataStage and it may not make sense to convert these processes.

What's my experience with pricing, setup cost, and licensing?

It is very difficult to say how much the product costs because there are variables depending on the configuration. Normally it's priced according to use, so the price can vary quite a lot. The more you use, the more you pay.

In comparison to other products, I would say it's not as expensive as Informatica, but it is intended to be an enterprise solution, so it's not as cheap to deploy as products that are not enterprise solutions.

The products we offer are really very different in pricing compared to open-source products. With open-source you have only the maintenance cost. For the software products we use, you have to invest in the software and then the maintenance costs are in addition to that.

There are no other costs in addition to the standard licensing fee and the maintenance. With IBM, you typically pay for the licenses and the first 12 months of maintenance is included in that cost. Afterward, you pay for the maintenance year-to-year.

Which other solutions did I evaluate?

We currently also use PowerCenter from Informatica as a solution for some clients. It isn't really a previous solution or a solution we evaluated and discarded but it is one that we sometimes use instead of DataStage. It depends on the needs of our customers.

The decision on which product to pick has partly to do with what the client wants to do and what we believe is the better solution for them and their needs. We have some projects where we use PowerCenter simply because our customer wants to use it; we have other projects where we use DataStage because some of these customers are already using DataStage or they prefer it because it is from IBM.

In a similar way, we sometimes use Microsoft Integration Services, which is a very small part of our business. But again it isn't so much that we evaluated the solution and dismissed it or switched from using it. These products are opportunities for us to choose between in order to provide the best solution for clients. We evaluate the options and choose the best fit for the project.

What other advice do I have?

I would rate this particular product as a nine out of ten. It is very powerful and very fast, but the problems with the interface make it less than perfect.

As far as other advice that I would have for other people considering this as a solution, the first and most important is to examine your needs and decide on the processes you want to build. From that, you can immediately have a better idea of the type of solution that might be best for you. Then it is a good idea to get the advice of a consultant — like us.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner.
TD
IT Administrator at an aerospace/defense firm with 10,001+ employees
Real User
A good data solution for ETL jobs that improves task performance

Pros and Cons

  • "The solution has improved the time it takes to perform tasks related to batch applications."
  • "The solution should be more user-friendly."

What is our primary use case?

We primarily use the solution for ETL jobs and movement of data.

How has it helped my organization?

The solution has improved the time it takes to perform tasks related to batch applications.

What is most valuable?

The parallel job scan has been a very valuable feature.

What needs improvement?

The solution should be more user-friendly.

For how long have I used the solution?

I've been using the solution for 10 years.

What do I think about the stability of the solution?

The solution is quite stable.

What do I think about the scalability of the solution?

I've never investigated the scalability, but I can say that it only scales on the machine. I can have three machines running in parallel. We have about 20 people using the solution, and they are mostly developers and admins. We mainly use the solution for ETL jobs.

How are customer service and technical support?

On a scale of one to ten, I would give technical support an eight.

Which solution did I use previously and why did I switch?

We were mostly using ETLs on mainframe jobs.

How was the initial setup?

It's a straightforward process. In terms of deployment of DataStage, it takes about a week.

What about the implementation team?

We use an integrator to assist with implementation.

What other advice do I have?

The advice I would give to others is to make sure they define a framework for development and for management. This could be very useful for the future of the product in the company.

I would rate the entire solution eight out of ten. I really like DataStage. The product fits our requirements perfectly.

We are changing the product now, however, to a cloud-based approach for DataStage.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
RL
Partner at Avydium
User
Its parallel processing capability allows you to go through extremely large data sets in no time at all

Pros and Cons

  • "Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
  • "Working with some of the big data components is good, but I can see improvements are needed."

What is our primary use case?

Complex data integration projects which require integration from multiple data sources.

How has it helped my organization?

I have worked on many implementations using DataStage. All of the projects that I worked on have been successful. This is due mainly to the strict discipline around best practices, and by following a set of standards and templates designed to reduce complexity and improve automation, including strong reference architecture.

What is most valuable?

  • Its parallel processing capability allows you to go through extremely large data sets in no time at all, if you do your job right. 
  • Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job. 
  • High scalability: Start small and go big with the same job. You just need to adjust the configuration file, no need to recompile.
  • Strong metadata management: Business, technical, and process metadata can all be managed from a single place.
  • Ease of integration with other tool sets: Easily supports APIs (or build your own) to support data streaming (or batched) from other systems.
  • Data Quality Management from within the tool: Supporting data sampling, including profiling of data, directly from the development canvas.

What needs improvement?

High-cost of ownership: They could take a page from open source software, such as Talend.

Working with some of the big data components is good, but I can see improvements are needed, such as native support for Spark and HBase.

For how long have I used the solution?

More than five years.

What do I think about the stability of the solution?

No issues.

What do I think about the scalability of the solution?

No issues.

How are customer service and technical support?

Support is always good.

Which solution did I use previously and why did I switch?

Have used quite a few ETL tools in my job.

  • Ab Initio: Even pricier, but has a highly competent ETL tool. It is complete, but hard to use. 
  • Informatica: Not as flexible and does not support the same level of complexity in its maps.
  • Talend: It is a good tool suite, extensive, but can be cumbersome to cite all its pieces.
  • ODI: For the Oracle centric world.
  • SSIS: Weak when compared to any of the above tool sets.

How was the initial setup?

Depends on type of environment that is being installed. I have seen fairly simple to overly complex initial setups due to the environment, not due to the tool.

What about the implementation team?

Both vendor and in-house team implementations:

IBM has top-notch support and tool services along with other partners as well. Depending on the partner, this can go from installation and configuration to solution development, etc.

Most in-house teams that I have seen tend to have good developers, but not always good architects. As with almost every data integration project, if you do not have a strong architecture, your solution will eventually fail.

What was our ROI?

Depends on the project.

Which other solutions did I evaluate?

Have done many ETL tool evaluations based on client requirements. DataStage has always been in the top-three. It may not have been selected due to different weights being used for different sections of the evaluation for different clients, but it has always been in the top-three consistently.

What other advice do I have?

If you have the budget and your solution requires industrial/enterprise strength data integration, this product is always a good choice.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Architect at a tech services company with 51-200 employees
Consultant
Top 20
It has unlimited database connectors and free-of-charge application connectors.

Valuable Features

  • Unlimited database connectors and free-of-charge application connectors - Siebel in our case
  • Lot of transformation components
  • High scalability

Improvements to My Organization

It's standardized our batch integration.

Room for Improvement

It needs a better scheduling mechanism.

Use of Solution

I've been using it as a customer for six years.

Deployment Issues

We had no issues with the deployment.

Stability Issues

We had issues with Java on the AIX platform (version 8.7). We have since migrated to the Linux platform without issues.

Scalability Issues

It's highly scalable.

Customer Service and Technical Support

We have close contact with support in their Polish lab.

Initial Setup

The initial setup is easy as it's done through a web installation process with an HA setup option.

Pricing, Setup Cost and Licensing

Check the Information Server bundle offering, especially with InfoSphere Information Server for Data Integration.

Other Solutions Considered

Our customer checked AbInitio and Informatica PowerCenter – DataStage was best as it could be delivered quickest.

Other Advice

You should also look at Redbooks and DeveloperWorks articles for knowledge gathering.

Disclosure: My company has a business relationship with this vendor other than being a customer: We are a BP of IBM.
it_user273756
Solutions Specialist at a tech services company with 501-1,000 employees
Consultant
It has valuable administrative features, particularly since I don't program in this environment, but avoid the Netezza adapter as it's poor.

Valuable Features

Anything that is administrative related, as I don't program in this environment.

Room for Improvement

The recovery feature. We had DS repos in a bad condition, but IBM couldn't recover it.

Use of Solution

I've been using it for one year.

Deployment Issues

No issues encountered.

Stability Issues

No issues encountered.

Scalability Issues

No issues encountered.

Customer Service and Technical Support

Customer Service:

Mostly, I would give 9/10. I did have one bad experience, so that leaves a bad impression.

Technical Support:

It's generally good, although sometimes I see a lot of confusion about how to resolve issues.

Initial Setup

It was complex.

Implementation Team

It was set up by an outside vendor.

Other Solutions Considered

No other options were evaluated.

Other Advice

Don't use the Netezza adapter, as it's poor.

Disclosure: I am a real user, and this review is based on my own experience and opinions.