Talend Data Quality Review

Visual jobs are easier to understand than a piece of Java code and improve collaboration between colleagues

What is our primary use case?

We’ve created an MDM-like system. The MDM hub is built on an Oracle Database. The system retrieves data from different sources, such as files, Microsoft SQL Server, and Oracle DB. The data then passes through our cleansing process, where we use Talend DQ components, web services, and custom Java code to clean it. Once the data is cleansed, we load it into the MDM hub, where the records are matched and consolidated. The consolidated records are then written back to specific target sources.

How has it helped my organization?

It’s easy to monitor the processes. Every morning I open the Talend Administration Center to check the status of the processes. Within seconds I can see which processes ran successfully, which failed, and why they failed.

We’re also able to respond much more quickly to changes and demands from the business. We can create and change jobs quickly. When the business wants new data for a report, we can provide the data within hours.

The jobs are visual and this has improved collaboration between colleagues. It’s much easier to understand a visual job than a piece of Java code.

What is most valuable?

The numerous components Talend provides. With these components, you can create jobs quickly and efficiently.

I also really like that there are no out-of-the-box solutions when it comes to developing jobs. Other vendors may offer modules that cleanse your addresses, for example. In Talend, you have the freedom to develop the whole process yourself. This can be tricky, but it also makes it fun.

What needs improvement?

When we upgraded to version 6.4.1, we tried using a Git repository instead of an SVN repository. After a few incidents where things disappeared and changes were not saved, we decided to go back to SVN.

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

We’ve never had any real issues with the Talend server applications. The only issues we had were related to the limited resources of our development and test environments, and nothing a restart couldn’t fix.

If we encounter issues, it’s most likely when using Talend Open Studio. The Studio can be slow, get stuck, or crash. Again, this can be caused by your machine’s resources or by your connection to the repository. When we run into problems with the Studio, we restart it. In emergencies, we create and use a new workspace.

What do I think about the scalability of the solution?

All my projects have been relatively small. I have never needed to scale.

How are customer service and technical support?

My experiences with support have been quite good. I’ve never had an issue that wasn’t resolved, or felt that the engineers lacked the knowledge to help me. Depending on the location of the support engineer, communication can sometimes be a bit difficult.

Which solution did I use previously and why did I switch?

I’ve never used another solution.

How was the initial setup?

Talend provides an installer which makes the installation straightforward.

If you want to tweak the installation, you’ll need some knowledge of the various third-party applications, such as Tomcat, Elasticsearch, and Kafka. Some of these tweaks are documented in the Installation Guide.

What's my experience with pricing, setup cost, and licensing?

I have never had to deal with pricing and licensing, but I would advise first taking a look at the Open Studio edition. Figure out what you need, then purchase the appropriate license.

Which other solutions did I evaluate?

My company had already partnered with Talend before I started. We’re also using Informatica and we’re looking into Human Inference.

What other advice do I have?

Keep your jobs small and simple: split large jobs into multiple smaller ones. One of the major pitfalls is creating one huge job that does everything. This is detrimental to the job’s performance, and it makes the job harder to read and understand, let alone debug.

Always use metadata and context groups. Deployment will be a lot easier.

Use the documentation features in your jobs. Name the components, data flows, and subjobs. This will increase the readability of your jobs.

I would give it an eight out of 10. Over the last four years I have seen the product grow and improve, but there is still room for improvement.

Disclosure: My company has a business relationship with this vendor other than being a customer: Talend Gold Partner.

GaryM

Is this a plugin to Talend ETL or standalone? Why do you need to manually "open the Talend Administration Center to check the status of the process"? Does it not notify you (email, text) if there's an error? What kind of errors do you have? We run similar processing using Melissadata DQ, and it's very rare that there's an error, so I'm curious what would cause an error when doing matching/DQ cleansing...

Dries Nuyts

Hi GaryM

The Administration Center is a standalone solution, which would ideally be installed on a server. It's a web portal in which you can manage your Talend environment. More information on the Administration Center: https://www.youtube.com/watch?v=KljdggzHKQU

You can indeed configure the Administration Center to send emails when processes fail. We don't use this option at this particular customer. Instead, we check the Administration Center and take appropriate action when needed.

Most of our errors are caused by processes not related to the MDM. Our MDM is currently stable. When we do have MDM issues, they're most likely caused by FTP, network, or database timeouts.

Kind regards

Dries Nuyts