IBM InfoSphere DataStage Review

Robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data


What is our primary use case?

We use it for creating a pattern for data integration with our data vault. We have also used it for creating APIs.

What is most valuable?

As a data integration platform, it is easy to use. It is quite robust and handles huge volumes of data well. We have tested it with up to ten million rows, and its parallel processing is robust enough to process that volume internally.

Its error logging mechanism is far simpler and easier to understand than that of other data integration tools.

The newer version of InfoSphere has the data catalog and IGC lineage. They make it easy to trace columns and tables.

What needs improvement?

Its documentation is not up to the mark. While building APIs, we had a lot of trouble finding our way around it because it is not very user-friendly. We tried to get hold of the API documentation, but it is not very well thought out. It should be more structured and more detailed.

In terms of additional features, I would like to see good reporting on performance, along with AI-based performance-tuning recommendations. I would also like to see better data profiling information reported in InfoSphere.

For how long have I used the solution?

It was DataStage previously, and then it became InfoSphere. I have used DataStage for ten years and InfoSphere for one year.

What do I think about the stability of the solution?

It is quite stable. In the newer components of InfoSphere, you have a mapping tool called FastTrack and a metadata generator, which can have issues from time to time, but they get resolved.

What do I think about the scalability of the solution?

It is not that easy to scale on-premises. I have worked with deployments on Windows and Unix, and scalability often depends on whether you can add more CPUs or boxes. It would be easier to scale on the cloud. However, the current version can only be deployed on Windows or Unix.

How are customer service and technical support?

I have not been in touch with them recently. Earlier, I was in touch with their technical support and raised tickets because some weird errors, such as phantom errors, were being logged in the error log and made no sense. We used to get in touch with their support team to understand these.

Which solution did I use previously and why did I switch?

I have used Informatica and SAS CA. IBM InfoSphere has the highest licensing cost compared to the others. It is not very widely used, and it is very difficult to find people who have this sort of knowledge.

The newer version of Informatica is on the cloud and is much more user-friendly than InfoSphere because it presents profiling information in nice graphs and charts. It also provides a lot of templates. For example, if I want to build a complete dimensional structure, Informatica has a template for it, and I just need to use that template. So, the ease of use is far better in Informatica, and it has everything that InfoSphere has. The only thing is that Informatica comes in bundles, which is why organizations sometimes don't go for it. For example, data integration and data quality are separate sections with separate pricing.

How was the initial setup?

The initial setup is quite simple. It didn't take more than half an hour to set it up on my laptop.

What about the implementation team?

I implemented it myself. In terms of maintenance, a particular version might not require any. For some versions, there could be bug fixes and minor releases to apply.

What's my experience with pricing, setup cost, and licensing?

It is quite expensive.

What other advice do I have?

I would recommend this solution for large-scale implementations where you need complex transformation and data integration to happen according to a structured format, either a data vault or a dimensional model. Because of the cost, it is suitable for big companies. It is a very valuable platform for large volumes of data. For small volumes, there are open-source tools that can do the same thing for you.

I am part of a consultancy, and I have deployed this product for companies. We have five to eight developers. Because InfoSphere is a licensed product and its licenses cost a lot, there are not many InfoSphere developers.

I would rate IBM InfoSphere DataStage an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Updated: June 2021.