What is our primary use case?
Talend has different modules: Talend Data Integration (DI), Talend Data Quality (DQ), Talend MDM, and Talend Data Mapper (TDM). We have Talend DI, Talend DQ, and TDM, and our use cases span these modules. We don't use Talend MDM because we have a different solution for MDM; our EDF team is using an Informatica solution for that.
We have a platform that deals with MongoDB, Oracle, and SQL Server databases. We also have Teradata and Kafka. The first use case was to ensure that when data traverses from one application to another, there is no data loss. This use case was more around data reconciliation, and it was also loosely tied to data quality.
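The review doesn't show how Talend implements that reconciliation internally, but the idea can be sketched in a few lines: compare row counts and per-row checksums between a source extract and a target extract keyed by a business key. The table data below is hypothetical.

```python
import hashlib

def row_checksum(row):
    """Hash the concatenated column values so rows can be compared cheaply."""
    joined = "|".join(str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, key_index=0):
    """Compare two extracts keyed by a business key; report missing and changed rows."""
    src = {row[key_index]: row_checksum(row) for row in source_rows}
    tgt = {row[key_index]: row_checksum(row) for row in target_rows}
    missing = sorted(set(src) - set(tgt))  # rows lost in transit
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return {"source_count": len(src), "target_count": len(tgt),
            "missing": missing, "changed": changed}

# Hypothetical member extracts from two applications
source = [(101, "Ada", "CA"), (102, "Bob", "TX"), (103, "Cy", "NC")]
target = [(101, "Ada", "CA"), (103, "Cy", "NV")]

report = reconcile(source, target)
print(report["missing"])   # → [102]: this row was dropped in transit
print(report["changed"])   # → [103]: this row differs between systems
```

Checksumming rows instead of comparing every column keeps the comparison cheap even when the extracts are wide; only keys that mismatch need a detailed look.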
The second use case was related to data consistency. We wanted to make sure that the data is consistent across various applications. For example, we are a healthcare company. If I'm just validating the claim system, I need to see how to inject the data into those systems without any issues.
The third use case was related to whether the data is matching the configurations. For example, in production, I want to see:
- Is there any data issue or duplicate data?
- Is the data coming from different states getting fed into the system and matching the configurations that have been set in our different engines, such as enrollment, billing, and so on?
- Is it able to process this data with our configuration?
- Is it giving the right output?
The fourth use case was to see if I can virtually create data. For example, I want to test with some data that is not available in the current environment, or I'm trying to create some EDI files, which are 834 and 837 transaction files. These are the enrollment and claims processing files that come from different providers. If I want to test these files, do I have the right information within my systems, and who can give me that information?
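Virtual data creation of this kind can be sketched as simple record synthesis. The minimal Python example below fabricates enrollment-style member records; the field names and value pools are illustrative assumptions, not the actual 834 layout.

```python
import random

FIRST_NAMES = ["Alice", "Bruno", "Carmen", "Dev"]
STATES = ["CA", "TX", "NC", "WA"]

def make_member(member_id, rng):
    """Fabricate one enrollment-style record from fixed value pools."""
    return {
        "member_id": f"M{member_id:06d}",
        "first_name": rng.choice(FIRST_NAMES),
        "state": rng.choice(STATES),
        "plan_code": rng.choice(["GOLD", "SILVER", "BRONZE"]),
    }

def make_members(n, seed=42):
    """Seeding the generator keeps the synthetic file reproducible across test runs."""
    rng = random.Random(seed)
    return [make_member(i + 1, rng) for i in range(n)]

members = make_members(5)
for m in members:
    print(m["member_id"], m["state"], m["plan_code"])
```

The point of the seed is self-service repeatability: two testers generating "the same" file get identical records, so a failing test can be reproduced exactly.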
The fifth use case was related to masking the information so that in your environment, people don't have access to certain data. For example, across the industry, people pull data from production and just push it into the lower environment to test, but because this is healthcare data, we have a lot of PHI and PII information. If you have your PHI and PII information in production and I am pulling that data, I have everything that is in production in the test environment. So, I know your address, and I know your residence. I could get into your systems and do anything. This is the main issue for us with HIPAA compliance. How do we mask that information so that in your environment, people don't have access to it?
These are different use cases on which we started our journey. Now, it is going more into the cloud, and we are using Talend to interact with various cloud environments in AWS. We are also interacting with Redshift and Snowflake by using Talend. So, it is expanding. We are using version 7.1, and we are migrating to version 7.3 very soon.
How has it helped my organization?
It is saving a lot of time. A person doesn't need to sit and create a file to test. Instead, there are self-service automation processes, and with a few clicks, people are able to generate the data and process it to complete the testing. This gives more confidence in the quality of the deployments that happen in production. The outages have also been reduced.
Overall, from 2017 to 2020, we have saved around 140,000 to 160,000 hours, which is only with respect to the data. I don't know how much we have saved because of masking. If masking were not there and compliance-related issues came up, it could be $2 billion to $3 billion of expense that a company has to bear. Because masking is there, it gives more confidence. Not having the PHI and PII footprint in the lower environment has helped our organization.
What is most valuable?
It is saving a lot of time. Today, we can mask around a hundred million records in 10 minutes. Masking is one of the key pieces that is used heavily by the business and IT folks. Normally in the software development life cycle, before you promote anything into the production environment, you have to test it in the test environment to make sure that when the data goes into production, it works, but these are all production files. For example, every year we acquire a new company or a new state for which we're going to do the entire back office, which is related to claims processing, payments, and member enrollment. If you get the production data and process it again, it becomes a compliance issue. Therefore, for any migrations that are happening, we have developed a new capability called pattern masking. This feature looks at those files, masks that information, and processes it through the system. With this, there is no PHI or PII element, and there is data integrity across different systems.
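Talend's actual pattern-masking internals aren't described here, but the two properties named above, keeping the data pattern and keeping integrity across systems, can be sketched as deterministic, format-preserving masking. In this minimal Python example, the `secret` value and the character-pool logic are illustrative assumptions.

```python
import hashlib
import string

DIGITS = string.digits
UPPER = string.ascii_uppercase

def _pick(pool, secret, value, position):
    """Deterministically pick a replacement character from the pool."""
    digest = hashlib.sha256(f"{secret}:{value}:{position}".encode()).digest()
    return pool[digest[0] % len(pool)]

def pattern_mask(value, secret="demo-secret"):
    """Replace letters with letters and digits with digits, keeping the
    original pattern (lengths, separators, letter case) intact. The same
    input always yields the same output, which preserves referential
    integrity when the masked value appears in several systems."""
    out = []
    for i, ch in enumerate(value):
        if ch.isdigit():
            out.append(_pick(DIGITS, secret, value, i))
        elif ch.isalpha():
            repl = _pick(UPPER, secret, value, i)
            out.append(repl if ch.isupper() else repl.lower())
        else:
            out.append(ch)  # keep separators so the format still validates
    return "".join(out)

ssn = "123-45-6789"
print(pattern_mask(ssn))                       # same shape, different digits
print(pattern_mask(ssn) == pattern_mask(ssn))  # → True: deterministic
```

Because the masking is a pure function of the value and a secret, two systems masking the same member ID independently still agree, which is what lets downstream joins keep working on masked data.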
It has seamless integration with different databases. It has components with which you can easily integrate with different databases, on the cloud or on-premises.
It is a drag-and-drop kind of tool. Instead of writing a lot of Java code or SQL queries, you can just drag and drop things. It is all very pictorial. It easily tells you where a job is failing, so you can quickly figure out why it is happening and then fix it.
What needs improvement?
They don't have any AI capabilities. Talend DQ is specifically for data quality, and it only has data profiling. I cannot generate any reports with Talend DQ today, so I need an ETL tool. It only provides plain Excel files, or I have to create some views. If, instead of our buying a new tool, Talend provided a reporting capability or solution, it would be great. It would reduce the development effort for creating these kinds of reports.
We also manage the infrastructure for Talend. From the licensing perspective, for cloud, they only have seat licenses where one person is tied to one license, but for on-premise, they have concurrent licenses. It would be really awesome if they can provide concurrent licenses for the cloud so that if one person is not there, somebody else can use that license. Currently, it is not possible unless a person deactivates his or her license and moves the same seat license to someone else. We are one of the biggest customers in the central zone of the US for Talend, and this is the feedback that we have provided them again and again, but they come back and say that they aren't able to provide concurrent licenses on the cloud.
In version 7.3, there is a feature for tokenization and de-tokenization of data. This is the feature that we are looking for. It is useful if somebody wants to see what we have masked and how to unmask it. This feature is not there in version 7.1. There are also a few other capabilities on the cloud, but we don't yet have a big footprint in the cloud.
For how long have I used the solution?
We have been using this solution since 2017. I was the person who brought this solution into this organization.
What do I think about the stability of the solution?
It is stable. I haven't seen any kind of outages for Talend DQ.
What do I think about the scalability of the solution?
Scalability depends on how many job servers you have. For example, if you have one job server and you are trying to process 2 million, 3 million, or 1 billion records, it might take more time. If you have more job servers so that you can run these jobs in parallel, your jobs will run faster. Networking also comes into play. For example, I am in California, and if I am trying to access something in North Carolina and process data, it could be slow. If my server is located in California, it would be pretty fast.
In terms of the number of users, DQ is specific to the data governance team, which has five to seven people. For the Talend solution as a whole, we have around 150 people. It is a big solution, but its maintenance is not that big of an effort because you are not writing any code. If you know Talend and a little bit of Java, managing it should not be that big of an effort.
How are customer service and technical support?
Sometimes, we have challenges because they don't understand the business, and we have to explain it to them. I can't expect them to understand everything about healthcare and then give me a solution. They provide services to a lot of different industries.
They have been pretty responsive. We are in the high tier, and they have defined SLAs in terms of the turnaround time for any kind of issue. They have definitely been very helpful. In the past, when we were not in that particular tier, we had some challenges where it took a little bit of time to get a response. Sometimes, they also sent some odd responses, and we had to go back and forth, but for a showstopper, their response has been pretty good.
What was our ROI?
We were able to save 140,000 to 160,000 hours based on the solutions and capabilities that we built from 2017 to 2020. If I multiply that by $80 an hour, it would be somewhere around $11 million to $13 million that we have already saved. If I take five licenses for three years, the savings would be $350,000. I don't know the ROI with respect to Informatica. Slowly, our EDF team is switching over from Informatica to Talend, and they say that it is pretty huge.
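The savings estimate is straightforward arithmetic; multiplying the hour figures by an assumed $80 blended hourly rate gives the dollar range.

```python
# Hours saved from 2017 to 2020, per the estimate above
low_hours, high_hours = 140_000, 160_000
rate = 80  # assumed blended hourly rate in dollars

low_savings = low_hours * rate
high_savings = high_hours * rate
print(f"${low_savings:,} to ${high_savings:,}")  # → $11,200,000 to $12,800,000
```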
What's my experience with pricing, setup cost, and licensing?
It is cheaper than Informatica. Talend Data Quality costs somewhere between $10,000 and $12,000 per year for a seat license. It would cost around $20,000 per year for a concurrent license. It is the same for the whole big data solution, which comes with Talend DI, Talend DQ, and TDM.
What other advice do I have?
I would rate Talend Data Quality a nine out of ten.
Which version of this solution are you currently using?
7.1