IBM InfoSphere DataStage Review

A powerful tool with parallel data streams


What is our primary use case?

We use it in the environment of our client, a large Russian bank that was in the top 20 as of August. The bank is running a project to rework and maintain its data warehouse solution, which is based on IBM technologies. They use IBM BDW, a banking data model, on Netezza, with DataStage as the ETL tool. It is a native use case for the product.

We are using the on-premises deployment model.

How has it helped my organization?

The main goal of this project is to make the bank's use of this solution more efficient and to help the bank make money from its data.

What is most valuable?

The data lineage report can be filtered for reporting. The reports are user-friendly, so it takes less time to find what you need.

It is a powerful tool with parallel data streams.

What needs improvement?

The previous project was based on Microsoft SQL Server. It moved huge amounts of data from different data sources via DataStage to a middle stage, then moved it to Netezza. This created a bottleneck in the solution. We are trying to streamline it by creating ETL processes that take data directly from the data sources and move it to Netezza without using a middle database. The data is quite detailed, and the volume is large: we are talking about tens to hundreds of millions of records.

In future versions, we would be happy to see the ability to return several parameters from jobs. Currently, a job can return just one parameter.
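One possible way to live with a single return value is to pack several values into one string and split it again in the calling job. The following is a minimal sketch of that idea in Python, purely for illustration: the helper names, the `|` separator, and the example keys are assumptions and are not part of DataStage.

```python
# Illustrative sketch only: pack_params and unpack_params are hypothetical
# helpers, not part of DataStage or any IBM API.

def pack_params(params, sep="|"):
    """Join key/value pairs into a single string such as 'a=1|b=2'."""
    parts = []
    for key, value in params.items():
        key, value = str(key), str(value)
        # Reject characters that would make the string ambiguous to split.
        if sep in key or "=" in key or sep in value:
            raise ValueError(f"cannot pack {key!r}: contains a reserved character")
        parts.append(f"{key}={value}")
    return sep.join(parts)


def unpack_params(packed, sep="|"):
    """Split a packed string back into a dict of string values."""
    if not packed:
        return {}
    # Split on the first '=' only, so values may themselves contain '='.
    return dict(item.split("=", 1) for item in packed.split(sep))


# Example: two values travel as one string and are recovered by the caller.
packed = pack_params({"rows_loaded": 1200, "status": "OK"})
print(packed)
print(unpack_params(packed))
```

All values come back as strings, so the calling side has to convert numbers itself. This is a workaround rather than a substitute for real multi-parameter returns.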

We would be happy if IBM made the product more tolerant of bad networks or VPN channels, as we run into these from time to time.

It would be great if we could use more than one SQL statement in the source DB connector stage. Currently, we can use several SQL statements in the target DB connector stage, but only one in the source DB connector stage.

Data Vault is becoming more popular. It would be great if support for it appeared in upcoming versions.

I would like them to have more database procedures.

For how long have I used the solution?

We began using it in September last year.

What do I think about the stability of the solution?

It is quite stable. I haven't seen any pop-up errors, and it works properly.

They fixed some bugs in version 11.5.02. It works well now.

What do I think about the scalability of the solution?

It is quite scalable. 

DataStage is okay, but the scalability problem is with another component of the solution (Netezza). The main problem is with the client's version of Netezza. IBM is ending support for it and has told us that we need to move to the next version of Netezza. However, the price is too high for the client, so we need to look for another platform.

The client thinks that DataStage can stay in place with another platform.

No more than five data analysts and administrators use DataStage, because it runs ETL processes at night. End users do not work with it directly; the users are the few people who maintain and administer it.

We have two data specialists who work with it. From the bank, there are about five people who use it.

How are customer service and technical support?

I haven't used the technical support.

If you previously used a different solution, which one did you use and why did you switch?

Our client previously used SSIS from Microsoft. They also used Oracle. However, they did not have a dedicated solution for ETL. Ten years ago, they used another data warehouse solution which used XML files as a transport layer.

DataStage is a dedicated ETL tool with features built for handling the ETL process as a stream. It can visualize and track the ETL process, and it integrates with the data governance catalog and other IBM tools. The previous solutions, except for SSIS, were just collections of scripts that formed a peer-to-peer style process. There was no centralized ETL tool with centralized ETL governance.

How was the initial setup?

It was straightforward technically.

What about the implementation team?

Three to four years ago, they decided to start a new data warehouse project. They were working with another Ukrainian company, which engineered this solution. However, the solution never made it to production because of misunderstandings between IT and the business. They tried to move it to production several times. After that, they decided to have a technical audit of the solution done. They asked us to come and look at the solution, then write the audit report, which we did. Then, they asked us what to do about the problems we found, and this is when we began to help them.

All the components were already in place. We changed them a bit, tweaked the ETL processes, and changed some structures in the data warehouse. This met the current needs of their business.

The deployment is continuous. We are working on this project currently. It should take another year. At the moment, we have some Agile processes, in which we are finding new business needs. We try to understand them, then deploy the current user story.

What was our ROI?

The main challenge of this project is that they are still trying to move the old solution to production so that they can begin getting a return on investment.

What's my experience with pricing, setup cost, and licensing?

There were no problems with the licensing model for the bank.

What other advice do I have?

It is the best solution in the IBM environment. It works with IBM data models and with other IBM tools, such as the data quality tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.