Talend Open Studio Review

Creates a job stream that connects to multiple data sources, but needs better installation configuration for other databases


How has it helped my organization?

By being able to cross-match records across multiple data sources and create a logical dataflow with options to place rejected records in a separate table, we are able to cleanse and create golden records in multiple categories. Rejected records, once identified, can be assessed for repair. This also means that we can identify how and where the rejected record occurred.
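The match-and-reject pattern described above can be sketched outside Talend in a few lines of Python. This is only an illustration of the idea, not how Talend Studio implements it: the source names (`crm`, `billing`), the `email` match key, and the merge rule are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MatchResult:
    golden: list = field(default_factory=list)    # merged "golden" records
    rejected: list = field(default_factory=list)  # rejects with source and reason


def normalise(value):
    """Lower-case and collapse whitespace so near-identical keys compare equal."""
    return " ".join(str(value).lower().split())


def cross_match(crm_rows, billing_rows, key="email"):
    """Match two hypothetical sources on `key`; route anything unmatched to rejects."""
    result = MatchResult()
    billing_by_key = {}
    for row in billing_rows:
        if not row.get(key):
            result.rejected.append(
                {"source": "billing", "reason": "missing key", "record": row})
            continue
        billing_by_key[normalise(row[key])] = row

    for row in crm_rows:
        if not row.get(key):
            result.rejected.append(
                {"source": "crm", "reason": "missing key", "record": row})
            continue
        match = billing_by_key.get(normalise(row[key]))
        if match is None:
            result.rejected.append(
                {"source": "crm", "reason": "no match in billing", "record": row})
            continue
        # Assumed merge rule: non-empty CRM values win, billing fills the gaps.
        golden = {**match, **{f: v for f, v in row.items() if v not in (None, "")}}
        golden[key] = normalise(row[key])
        result.golden.append(golden)
    return result
```

Because each reject carries its source and a reason, it is possible to see how and where the record was rejected before deciding whether it can be repaired.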

What is most valuable?

Talend Studio connected to Talend MDM (Master Data Management) is the most valuable feature. We use Talend Studio to create a job stream that connects to multiple data sources and then matches, compares, or creates a golden record for overall identification. It also has a good catalogue of objects that can be dragged and dropped for building models.

What needs improvement?

It needs better installation configuration for other databases. Although the installation allows you to select another database, this doesn't mean that all connection points in the application point to the database selected. You actually need to do a search through the entire install to locate the configuration settings and change them.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

In version 6.2 we did encounter issues with the job servers and specifically with ESB. Version 6.3 is better but large jobs can cause the MDM server to fall over, requiring a reboot.

We've built in some self-healing scripts to detect a loss of connectivity and force a restart of the services.
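A self-healing check of this kind might look roughly like the sketch below. It is not our actual script: the TCP port probe, the failure threshold, and the use of `systemctl` to restart a service are assumptions chosen for illustration.

```python
import socket
import subprocess
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_service(name):
    """Restart a systemd unit. Assumes systemd; the unit name is up to the caller."""
    subprocess.run(["systemctl", "restart", name], check=True)


def watch(check, restart, max_failures=3, interval=30.0, cycles=None,
          sleep=time.sleep):
    """Call `check` every `interval` seconds and call `restart` after
    `max_failures` consecutive failures. Runs forever when `cycles` is None.
    Returns the number of restarts triggered."""
    failures = 0
    restarts = 0
    done = 0
    while cycles is None or done < cycles:
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        done += 1
        sleep(interval)
    return restarts
```

It could be wired up as `watch(lambda: port_is_open("mdm-host", 8180), lambda: restart_service("talend-mdm"))`, where the host, port, and unit name are placeholders. Requiring several consecutive failures avoids restarting the server on a single slow response.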

What do I think about the scalability of the solution?

Our Talend installation has been deployed onto Red Hat OpenStack, separating out MDM, TAC, DQ, and the job servers. I made a point of determining the data storage requirements for each server and setting a memory ulimit to match the resource profile of each component. It was trial and error, but it paid off by allowing the Talend system to process large jobs of 200-300 million records over a number of hours, rather than days.

How is customer service and technical support?

Support tends to be good for the usual types of issues, but once a problem gets more complex and deeply into the nuts and bolts of the product, support struggles.

Which solutions did we use previously?

Initially we used Pentaho; however, it was determined that this was not as feature rich as Talend.

How was the initial setup?

The initial setup out of the box is straightforward. However, it becomes more complex as you start to distribute the components and get forced down a path of connecting to one type of database for all the components. In my case, I had to deploy Talend using Red Hat Ansible and use only a PostgreSQL database.

I needed to first install the software, search for all references to H2 or PostgreSQL, change the configuration files, and then do it all over again for the distributed installs; then translate this into Ansible scripts. So although Talend itself is not directly what made this complex, the Talend installation gives the option to install to PostgreSQL but doesn't use PostgreSQL for all database repositories.
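The search step can be automated with a small script along these lines. It is a sketch under assumptions: the file extensions and the JDBC patterns below are guesses at what such configuration files typically contain, not a list taken from the Talend documentation, so they would need adjusting to the actual install.

```python
import os
import re

# Assumed patterns: JDBC URLs and driver class names for H2 and PostgreSQL.
PATTERN = re.compile(
    r"jdbc:(h2|postgresql):|org\.h2\.Driver|org\.postgresql\.Driver",
    re.IGNORECASE,
)
# Assumed set of configuration file types worth scanning.
CONFIG_SUFFIXES = (".properties", ".xml", ".cfg", ".conf", ".ini", ".yml", ".yaml")


def find_db_references(root):
    """Walk `root` and return (path, line_number, line) for every config line
    that mentions an H2 or PostgreSQL connection."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.lower().endswith(CONFIG_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if PATTERN.search(line):
                            hits.append((path, lineno, line.strip()))
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
    return hits
```

Running it against each component's install directory gives a list of files and line numbers to review, which is also a convenient starting point for the Ansible tasks that rewrite those settings.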

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing are fairly straightforward. It is reasonably priced and managed. It's a good solution overall.

Which other solutions did I evaluate?

Pentaho and, prior to that, SAS MDM, which was similar but made it harder to create models. We also ran a PoC for IBM InfoSphere MDM, but the cost was considered unacceptable.

What other advice do I have?

Make sure you have someone with the technical skills and patience to install in a distributed deployment. Learn the product well and build in your own log shipping with Splunk, Elastic, or Telegraf to ease your diagnostic pains.

Disclosure: I am a real user, and this review is based on my own experience and opinions.