Ease of use, level of integration with SSIS, performance, and attribute-level survivorship logic.
This vendor offers a large variety of components, from on-prem to cloud SaaS as well as a hybrid of cloud and on-prem. MatchUp for SSIS, which is on-prem, is the only component we currently use. The fact that we didn't have to become experts in a new tool and worry about how to integrate data quality processes into our ETL was huge for us, and I think its level of integration with SSIS is likely unique in the industry.
We are using it daily for 1) direct matching, 2) column-level survivorship/golden record generation for millions of customer records, and 3) mail householding. We started with B2C customers and later added B2B customers. The tool supports matching specific to organization names and individual names (as well as a variety of other specialized types of data values) and works well in both cases. For example, it can pull out nicknames and match on those. One of the results for us is feeding the end result to Adobe Campaign for marketing automation in the cloud via an SSIS extract to flat files. But the main output is an analytical golden record of our customer data for our EDW. This has provided a very effective, holistic, maintenance-free, and extremely cost-effective solution.
Initial POC was up and running in just a few days with no training needed. The plug-in into our ETL tool was seamless and fully integrated into our existing processes. Most of our effort went into gathering and validating customer survivorship requirements, not into the technology itself, and this took several months to refine. Any needed adjustments could be made very quickly, allowing us to focus on business requirements instead of implementing technology.
Improvements to My Organization:
De-duplicates our customer data in an effective way so that we are able to reduce marketing costs and increase the quality of communication with customers. Replaced weekly de-duplication with daily frequency. The tool can handle our B2C volume in a reasonable amount of time. Survivorship handles very complex column-level rules efficiently, providing a first-time single version of truth for our customer data. Its inherent intelligence in name and address parsing provides a very accurate exact match with no false positives and no unexpected false negatives. We are continually impressed by its sophistication and ease of use. The tool does not require a middle tier or specialized staff like every other tool on the market.
Room for Improvement:
- It needs to provide resizable forms/windows like all other SSIS windows.
- Licensing has been problematic. The vendor provided an incorrect expiration date in the licensing key, causing a production failure until they could send us a corrected file. Then it went down again when the 1-year license unexpectedly expired, likely due to miscommunication on paying the yearly invoice. Bottom line is their license expiration process needs to be improved so it doesn't unexpectedly cause production matching to go down.
- Provide for incremental matching using the MatchUp for SSIS tool (they provide this for other solutions such as the standalone tool and the MatchUp web service).
- Provide ability to sort mapped columns when using advanced survivorship (only allowed when not using column-level survivorship).
- Provide an option of a procedural language (such as C#) for survivorship expressions rather than relying on SSIS expression language.
- Provide a more sophisticated ability to concatenate groups of data fields into common blocks of data for advanced survivorship prioritization (we do most of this in SQL prior to feeding the data to the tool).
- Provide ability to only do survivorship with no matching (matching is currently required when running data through the tool).
- Tool is single-threaded - make it support multiple threads, as it ties up a single CPU at 100%, rotating through different CPUs every few seconds.
- Documentation that is specific to MatchUp for SSIS (most of it was written for MatchUp Object, which is a web service API they provide that is similar but not exactly the same).
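To picture the "blocks" request in the list above, here is a minimal Python sketch of one possible reading of it. It is not MatchUp's behavior or API: the field names, the `last_updated` priority rule, and the `pick_address_block` helper are all hypothetical. The idea is that related columns (here, an address) survive together from a single source record rather than being chosen column by column.

```python
from datetime import date

# Hypothetical block: columns that should survive together from one record.
ADDRESS_BLOCK = ("street", "city", "zip")

def pick_address_block(records):
    """Return the address columns from the most recently updated record
    whose address block is fully populated (an assumed priority rule)."""
    complete = [r for r in records if all(r.get(c) for c in ADDRESS_BLOCK)]
    if not complete:
        # No record has a complete address, so nothing survives for the block.
        return {c: None for c in ADDRESS_BLOCK}
    best = max(complete, key=lambda r: r["last_updated"])
    return {c: best[c] for c in ADDRESS_BLOCK}

# One matched group of duplicate customer records (made-up sample data).
group = [
    {"street": "1 Main St", "city": "Austin", "zip": "",
     "last_updated": date(2016, 5, 1)},
    {"street": "9 Oak Ave", "city": "Dallas", "zip": "75201",
     "last_updated": date(2015, 1, 1)},
]
block = pick_address_block(group)
```

In this sample the newer record is skipped because its zip is empty, so the whole address comes from the older, complete record instead of mixing a street from one record with a zip from another.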
Use of Solution:
We started the POC and kept refining the performance and survivorship requirements over a 6-month period before rolling it into production. It has worked flawlessly since going into production 11 months ago.
I had the POC functioning in just a few days - the rest of the time was largely spent designing a method to detect deltas (based on last names shared with those customers who had changes) and providing the advanced calculations needed to support survivorship prioritization prior to the data flow using the tool (such as revenue history for customers, etc).
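The delta detection described above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not our actual implementation (which lives in SQL and SSIS): the `select_delta` helper, the `id` and `last_name` fields, and the simple lowercase normalization are all hypothetical.

```python
def select_delta(customers, changed_ids):
    """Return every customer who shares a last name with a changed customer.

    Unchanged customers with the same last name are included because they
    are the candidates a changed record could newly match against.
    """
    changed_names = {
        c["last_name"].strip().lower()
        for c in customers
        if c["id"] in changed_ids
    }
    return [
        c for c in customers
        if c["last_name"].strip().lower() in changed_names
    ]

# Made-up sample data: only customer 1 changed since the last run.
customers = [
    {"id": 1, "last_name": "Smith"},
    {"id": 2, "last_name": "smith "},
    {"id": 3, "last_name": "Jones"},
]
delta = select_delta(customers, changed_ids={1})
delta_ids = [c["id"] for c in delta]
```

Only the rows in `delta` would then be fed to the matching data flow, which keeps the daily input far smaller than the full customer base.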
Only issues were some lack of clear documentation on 1) maintenance of matching rules across platforms, and 2) how to pass through columns as-is with no survivorship applied. Getting it working was a breeze once I figured it out but it took a few emails with the vendor who was very responsive.
No. Only seeming issue is that the tool spends time where all rows are loaded into the tool but none are coming out which appears as if it's hung when in fact it's working fine. You just have to be patient and wait but it would be nice to see progress.
First a caveat - we probably run higher volumes than most organizations. For B2B and daily matching you could probably process a delta in a matter of a few minutes with this tool. So what follows describes complexities for us that may not apply to your situation.
This version of the tool (SSIS plugin) does not directly support incremental loading. If you require processing just deltas, that requires some custom architecture to filter the input to just deltas based on your change detection. This only applies if you have large volumes of customers you want to process at frequent intervals. That said, it can process many millions of customers in a few hours or a delta of up to a million in about 20 minutes. So if you're a B2C organization then this is definitely scalable for that purpose. In fact this tool is orders of magnitude faster than the last matching tool I used, and that tool wasn't a simple plug-in to an ETL tool. I recently heard of another matching tool that takes as long to match just a few thousand customers as this tool takes to run millions.
Note: I suspect an essential ingredient when considering scalability is whether you're calling a web service for matching or just on-prem. This solution is only on-prem and so it is able to load all the records into memory in batch.
Single threading significantly limits additional scalability although it meets our daily processing needs for millions of customer records. It grabs one CPU and takes it to 100% and keeps it there. My hope is the vendor will improve this bit over time as they have so much of the tool over the past 5 years.
Attempts to combine completely different matching methods cause a significant performance slowdown on large volumes. This required us to run 2 different matches, which is fine but just a little inconvenient if you are both B2C and B2B like us. For example, we match orgs and individuals separately and then re-combine them for survivorship.
Also, combining survivorship and matching in the same data flow did not scale well. I had to do 2 different runs - one for matching and then another for survivorship to make it perform to our requirements.
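The split into a matching run followed by a survivorship run can be pictured with the Python sketch below. It is a simplified stand-in, not the tool's logic: the exact-key grouping in `match_pass` replaces the tool's far more sophisticated matching, and the revenue-based prioritization, function names, and field names are assumptions made for illustration.

```python
from collections import defaultdict

def match_pass(records):
    """Run 1: group records on a naive match key (stand-in for real matching)."""
    groups = defaultdict(list)
    for r in records:
        key = (r["name"].strip().lower(), r["zip"])
        groups[key].append(r)
    return list(groups.values())

def survivorship_pass(groups, columns):
    """Run 2: for each column, keep the first non-empty value found when the
    group's records are ranked by revenue, highest first (assumed rule)."""
    golden = []
    for group in groups:
        ranked = sorted(group, key=lambda r: r["revenue"], reverse=True)
        golden.append({
            col: next((r[col] for r in ranked if r[col]), None)
            for col in columns
        })
    return golden

# Made-up sample data: the first two rows are the same person.
records = [
    {"name": "Ann Lee", "zip": "75201", "email": "",
     "phone": "555-0100", "revenue": 900},
    {"name": "ann lee ", "zip": "75201", "email": "ann@example.com",
     "phone": "555-0199", "revenue": 100},
    {"name": "Bob Roy", "zip": "10001", "email": "bob@example.com",
     "phone": "", "revenue": 50},
]
groups = match_pass(records)
golden = survivorship_pass(groups, ["name", "email", "phone"])
```

Here the golden record for Ann takes her name and phone from the higher-revenue row but her email from the other row, which is the column-level behavior we rely on. Keeping the two runs separate also means the survivorship rules can be re-run without repeating the match.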
Good - they were responsive and provide emails when new versions are available but documentation on new versions and understanding version history is sparse. They even provided direct access to their developer but this turned out to not be needed.
Excellent, they were quick to respond (if within their office hours) and allowed me to contact one of their developers directly regarding more complex questions. They regularly release new versions and have greatly improved the product over the past few years.
I have used Datamentors Database as well as SAS Dataflux. We also tried Oracle's Fusion product until it failed miserably, so we rejected it. The Melissa Data tool is light-years ahead of Datamentors and far easier to use than either of those tools, and the price cannot be compared as the other tools are very expensive, especially Dataflux and Oracle. The Oracle tool (which is actually supposed to be a full CDI product) failed just trying to do basic matching, didn't do attribute-level survivorship, and the cost was beyond prohibitive. My current shop previously custom coded simple matching and a match key to get by with an inexpensive solution. This replaces the custom solution, does it in a WAY more sophisticated way than you could do on your own, and is faster than the custom solution. The custom solution also only did matching, not survivorship, so this is a big step forward for the organization. Consider also that this runs on your ETL server, whereas most DQ matching solutions require a separate server.
Initial setup on the first install was VERY easy. Propagating the matching rules to the next server was easy IF you know which file to copy which isn't well documented. The tool is extremely easy to use when you know just a few little things which aren't documented.
This was an in-house implementation. The vendor was very responsive in answering questions, making in-house implementation feasible.
ROI is TBD as this is replacing a custom-coded solution. The ROI will be in accuracy over the current method (which is significant).
Cost and Licensing Advice:
This vendor has no equal in pricing for equivalent functionality. First, no one else offers this level of integration with SSIS. Second, other vendors with equal functionality all cost many times the cost of this tool. Third, this is one of the "go to" vendors for matching purposes, as some master data and data quality tools are actually calling the MelissaData MatchUp object in the backend but charging you a lot for their pretty GUI to do this for you.
Other Solutions Considered:
Microsoft's DQS, which could not scale over 100,000 customer records. DQS actually supports calling MelissaData MatchUp to use its more sophisticated matching, but it's a moot point if DQS can't handle the volume.
This tool is a dream compared to my previous experience with matching/de-duplication tools. And the pricing is incredible given its functionality. High value and low cost. If you're an SSIS shop (they also support other ETL tools, however) and you need to de-duplicate, household and/or do column-level survivorship, then this tool can't be beat.
Disclosure: I am a real user, and this review is based on my own experience and opinions.