Talend Data Quality Review

It reduces the QA effort immensely by handling most of the test scenarios in a reusable way

What is our primary use case?

Data Quality is used to automate quality control checks on data loaded from batch jobs. This includes BCA for field-level data quality and cross-table checks for key-column mismatches.

The data is in Redshift, and the load volume is around 10 million records per batch load across more than 100 tables in a Data Vault model.
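A cross-table key check of this kind can be sketched in plain Python. The table names and sample keys below are hypothetical, and against Redshift the same check would normally run as a SQL anti-join generated by the Talend job; this is only a minimal illustration of the idea.

```python
# Sketch of a cross-table key-mismatch check, as a DQ job might perform it
# between a Data Vault hub and one of its satellites. The keys below are
# hypothetical sample data, not real table contents.

def key_mismatches(parent_keys, child_keys):
    """Return child keys with no matching parent (orphans) and vice versa."""
    parent, child = set(parent_keys), set(child_keys)
    return {
        "orphan_children": sorted(child - parent),    # violate referential integrity
        "childless_parents": sorted(parent - child),  # never loaded downstream
    }

# Hypothetical hub and satellite key columns
hub_customer = ["C001", "C002", "C003"]
sat_customer_details = ["C001", "C003", "C004"]

result = key_mismatches(hub_customer, sat_customer_details)
print(result)
# {'orphan_children': ['C004'], 'childless_parents': ['C002']}
```

The value of doing this through a tool rather than ad hoc scripts is that the same check is reusable across all 100+ tables with only the key columns changing.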

This was a short three-month project; I used the product from the development phase through QA. It reduced the QA effort immensely by handling most of the test scenarios in a reusable way.

How has it helped my organization?

This product speeds up unit testing and QA for specific test scenarios. As a result, the quality of the development output can be evaluated and adjusted.

What is most valuable?

I like the components provided by Data Quality, such as:

  • Address standardization
  • Fuzzy match
  • Schema compliance check

These components pack in a lot of the code required to perform standard data operations. Doing the same by hand-coding would be error-prone, take a lot of time, and produce output of inconsistent quality.
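To see why hand-coding these checks is unattractive, here is a minimal fuzzy-match sketch using only the Python standard library. Talend's component implements proper matching algorithms (e.g. Levenshtein or Jaro-Winkler distances); `difflib`'s ratio and the 0.85 threshold here are stand-ins chosen for illustration.

```python
# Minimal fuzzy-match sketch using the standard library's difflib.
# The similarity measure and threshold are illustrative assumptions,
# not what the Talend component actually uses internally.
from difflib import SequenceMatcher

def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two values are similar enough to treat as duplicates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(fuzzy_match("Jonathan Smith", "Jonathon Smith"))  # True  (near-duplicate)
print(fuzzy_match("Jonathan Smith", "Mary Jones"))      # False
```

Even this toy version shows the tuning burden: picking a metric and threshold per field is exactly the kind of work a pre-built component absorbs.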

Apart from specific components, I like the idea of storing the results of Data Quality jobs in a DB and being able to run reports against that DB to build a dashboard of quality metrics.
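The results-in-a-DB idea can be sketched as follows. Here `sqlite3` stands in for the reporting database, and the table schema, check names, and figures are hypothetical examples, not Talend's own reporting schema.

```python
# Sketch: store each DQ check's outcome in a results table, then query it
# for dashboard metrics. sqlite3 is a stand-in for the real reporting DB;
# schema, check names, and numbers are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dq_results (
        run_date     TEXT,
        check_name   TEXT,
        rows_checked INTEGER,
        rows_failed  INTEGER
    )
""")
conn.executemany(
    "INSERT INTO dq_results VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "address_standardization",   10_000_000, 1_200),
        ("2024-01-01", "key_mismatch_hub_customer", 10_000_000, 34),
    ],
)

# Dashboard metric: failure rate per check
for name, rate in conn.execute(
    "SELECT check_name, 1.0 * rows_failed / rows_checked FROM dq_results"
):
    print(f"{name}: {rate:.6%}")
```

Once the results live in a table like this, any BI tool can render the quality trend over time instead of forcing QA to page through static reports.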

What needs improvement?

  • Report generation, and the use of the generated report within DI job steps, could be improved. 
  • The generated report often has too many pages to go through if it is not loaded into a DB.
  • There are too many overlapping functions; these could be streamlined and refined into better off-the-shelf functions.

For how long have I used the solution?

Trial/evaluations only.
Disclosure: I am a real user, and this review is based on my own experience and opinions.