Melissa Data Quality Review
SSIS MatchUp Component is Amazing


Primary Use Case

We use this tool for generating a golden record of our customer data and generating our analytical golden id utilized for each customer.  Each column follows a set of survivorship rules across multiple source data records.

Improvements to My Organization

De-duplicates our customer data in a low-support and high performance process so that we able to reduce marketing costs and increase the quality of communication with customers.

It replaced a weekly primitive de-duplication (best record) home-grown matching process with daily a new daily sophisticated process that is supported by a vendor.

Its survivor-ship logic handles very complex column-level rules efficiently providing a first-time single version of truth for our customer data. It's inherent intelligence into name and address parsing provides a very accurate exact match with no false positives and no unexpected false negatives. We are continually impressed by its sophistication and ease of use. The tool does not requires a middle tier or specialized staff like every other tool on the market.

Valuable Features

The high value in this tool is its relatively low cost, ease of use, tight integration with SSIS, superior performance (compared to competitors), and attribute-level advanced survivor-ship logic.

This vendor offers a large variety of components from on-prem to cloud SaaS as well as hybrid of cloud and on-prem. This review is specific to the "MatchUp for SSIS" component.

For us, this tool had very high value due to the fact that we didn't have to become experts in some overly complicated DQ tool. And because it is fully integrated with our EDW ETL rather than having to originate and integrate an external application.

We are using it for daily 1) direct matching, 2) column-level survivor-ship and 3) mail house-holding. We started with B2C customers and later added B2B customers. The tool supports unique matching specific to organization names and individual names (as well as a variety of other specialized types of data values) and works well in both cases. For example it can pull out nicknames and match on those.

One of the business and operational benefits for us is feeding the end result to Adobe Campaign for marketing automation. But the primary output is simply creating and managing an analytical golden record for our customer data. This has provided a very effective, holistic, maintenance-free, and extremely cost effective solution for us.

The initial POC was up and running in just a few days with no training needed. The plug-in into our ETL tool was seamless and fully integrated into our existing processes. Most of our effort was due to the need to identify customer survivor-ship requirements and validation. Any needed adjustment changes could be done very quickly allowing us to focus on business requirements instead of implementing technology.

Room for Improvement

First a caution...the 2016 version is very buggy.  I think we were one of their first customers on 2016 so we ended up being the first ones to discover a few pretty significant bugs.  That said, after about 6 weeks working through the issues we found workarounds for all of them. And I believe the issues were made worse by running in a VM that was setup with Hot Swap CPU enabled which also caused issues in Sql Server. 

Work arounds included...

  • - Upgrading from SSIS 2012 to 2016 isn't fully supported. The only workarounds are either hacking the package XML or re-creating matching/survivorship logic.
  • - Running in 64 bit mode gave an "unhandled exception" window in the debugger and just hangs when invoked from a server. And it also randomly hangs in 32 bit mode. The root cause turned out to be alias names in the mappings that exceeded a certain size. Unfortunately the vendor didn't know about this issue instead of me having to find it.

More longer term annoyances...

- Licensing has been problematic. As soon as a license expires (which is specific to each server) the product stops functioning. And to update the license key you have to actually go into an SSIS dataflow and create an SSIS package on each server to update the license.   

- The tool needs to provide resizable forms/windows like all other SSIS windows. Vendor claims its an SSIS limitation however all SSIS components are resizable so that isn't true. This is just an annoyance but needless.

- The tool needs to provide for incremental matching using the MatchUp for SSIS tool (they provide this for other solutions such as standalone tool and MatchUp web service). We had to code our own incremental logic to work around this.

- Tool needs ability to sort mapped columns in the GUI when using advanced survivorship (only allowed when not using column-level survivorship).

- It should provide an option for a procedural language (such as C# or VB) for survivor-ship expressions rather than relying on SSIS expression language.

- It should provide a more sophisticated ability to concatenate groups of data fields into common blocks of data for advanced survivor-ship prioritization (we do most of this in SQL prior to feeding the data to the tool).

- It should provide the ability to only do survivor-ship with no matching (matching is currently required when running data through the tool).

- Tool should provide a component similar to BDD to enable the ability to split into multiple thread matches based on data partitions for matching and survivor-ship rather than requiring custom coding or doubling all the matching rules by using the BDD component (which is a standard SSIS DF component).

- Documentation needs to be provided that is specific to MatchUp for SSIS.  Most of their wiki pages were written for the web service API MatchUp Object rather than the SSIS component.

- They need to update their wiki site documentation as much of it is not kept current. Its also very very basic offering very little in terms of guidelines. For example, the tool is single-threaded so getting great performance requires running multiple parallel data flows or BDD in a data flow which you can figure out on your own but many SSIS practitioners aren't familiar with those techniques.

Use of Solution

Three to five years.

Stability Issues

No as long as you don't try to match on null last names or lots of duplicate (exact match) records or try to run it in the default 64 bit mode of SSIS (issue here is only with new versions).

Scalability Issues

We can run 9 million customer record exact matches in 10 minutes using 5 partitions/parallel dataflows. Survivorship takes another 50 minutes. This is on an 8 vproc VM. I'm sure you could run faster with dedicated hardware and running more parallel dataflows. The tool starts to exponentially slow down once you pass about 2 million customers in a single dataflow so its best to keep it at or under that number although mileage will vary depending on the complexity of your matching.

In fact this tool is magnitudes faster than the last matching tool I used and it wasn't a simple plug-in to an ETL tool. I recently heard of another matching tool that takes longer to match just a few thousand as this tool takes to run millions of customers.

Note:

We probably run higher volumes than most organizations. For B2B and daily matching you could probably process a delta in a matter of a few minutes with this tool. So below describes complexities for us that may not apply to your situation.

I suspect an essential ingredient when considering scalability is whether you're calling a web service for matching or just on-prem. Their SSIS component is only on-prem.

Combining survivor-ship and matching in the same data flow slows performance. We got much better performance by running them in two separate runs - the first for just matching and then another for just survivor-ship (re-using the previous grouping numbers in the first match) to make it perform to our requirements.

Customer Service and Technical Support

Customer Service:

Fairly typical vendor support. They are immediately attentive to problems and provide email notifications of software versions.

Technical Support:

They regularly release new versions and have greatly improved the product over the past few years.

However unless you can interact directly with the development team problems may not get resolved in a timely manner. I have usually been left coming up with my own solution in the time I was waiting for their support to provide answers from their support team.

Previous Solutions

I have used Datamentors and SAS Dataflux in the past with good success although I would easily take this product over those products for just matching/survivorship purposes for many reasons. We had tested Oracle's Fusion product until it failed miserably. The MelissaData tool is light-years ahead of Datamentors, far easier to use than either of those tools and the price can't be compared. The Oracle tool (which is actually supposed to be full CDI product) actually failed just trying to do basic matching and didn't do attribute level survivor-ship and the cost was clearly setup to help Larry purchase a new boat.

My current shop previously custom coded simple matching and a match key to get them by with an inexpensive solution. This replaced that custom solution and does it in a far more sophisticated way than you could do on your own.

Initial Setup

Initial setup on the first install was VERY easy. Propagating the matching rules to the next server was easy IF you know which file to copy which isn't well documented. The tool is extremely easy to use when you know just a few little things which aren't documented. Their development staff were very helpful in providing simple tips on how to set it up.

Implementation Team

This was in-house implementation. The vendor was very responsive in answering questions.

Pricing, License Cost and Setup

This vendor has no equal in pricing for equivalent functionality. First no one else offers this level of integration with SSIS. Second other vendors with equal functionality all cost many times the cost of this tool. Third it doesn't require a separate server or large learning curve of new software. Fourth, this is one of the "go to" vendors for matching purposes as some master data and data quality tools are actually calling MelissaData Matchup object in the backend then charging you a lot for their pretty GUI to do this for you.

Other Solutions Considered

I evaluated Microsoft's DQS which could not scale over 100,000 customer records. DQS actually supported calling MelissaData Matchup in the old Microsoft Marketplace (no longer available) to use it's more sophisticated matching but it was a moot point if DQS can't handle the volume.  

Other Advice

This tool is a dream compared to my previous experience with matching/de-duplication tools. And the pricing is incredible given its functionality. High value and very lost cost. If you're an SSIS shop (they support other ETL tools also however) and you need to de-duplicate, household and/or do column-level survivorship then this tool can't be beat.

I highly advise running parallel threads by splitting your dataflow into multiple paths.  This allow parallel matching and increaes throuput significantly.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
1 visitor found this review helpful

5 Comments

Oscar LiVendor

Hey Gary,

I'm the Product Channel Manager for SSIS components here at Melissa Data. Thank you for your insightful review! I'll be take a closer look into improving our product based on your recommendations.

Here are a few notes/comments:

• Forms are not re-sizable.

This might be a SSIS restriction but I will check with our development team to confirm this.

• Providing for incremental matching for MatchUp for SSIS (they provide this for other solutions such as standalone tool and MatchUp web service).

SSIS is meant to be a batch tool so incremental (on-demand) doesn’t fit this model very well.
However, we’ll explore this.

• Provide ability to sort mapped columns when using advanced survivorship (only allowed when not using column-level survivorship).

We’ll definitely look into this feature and add to our requested feature list.

• Provide an option of a procedural language (such as C#) for survivorship expressions rather than relying on SSIS expression language.

This may be difficult as we are limited by what is exposed in SSIS. The increased complexity with writing or imbedding C# compiler and compiling on the fly may affect stability of the component.

• Provide ability to only do survivorship with no matching (matching is currently required when running data through the tool).

We’ll definitely look into this feature and add to our requested feature list.

• Tool is single-threaded - make it support multiple threads.

This is a possibility in the future. Currently, one core is utilized to run the matchup component and one core is utilized to run the object.

• Documentation that is specific to MatchUp for SSIS (most of it was written for MatchUp Object which is web service API they provide).

This is one area we will work on first. Matchup is a complicated tool and our documentation could definitely be improved.

Thanks,
Oscar

Like (2)04 January 16
GaryMReal UserTOP 5POPULAR

Oscar - much appreciated. I love your product and just want to see it become even better. I'm very impressed with the improvements over the past few years. I did brief POC a couple years prior before it did survivorship and liked what I saw even then. I'm impressed that your firm has kept the significant coming on a regular basis. Keep it up!

Like (0)06 February 16
GaryMReal UserTOP 5POPULAR

Update - the tool is now SqlServer/SSIS 2016 compatible!

Like (0)15 December 16
Thomas StoughReal User

I am asking for pricing for your Personator Suite. This relates to E-commerce. Small scale on a Shopify platform. Thank you

Like (0)09 June 18
GaryMReal UserTOP 5POPULAR

Thomas, you will have to contact the vendor. I think their list prices are on their web page so not sure why you are asking a technical forum.

Like (0)11 June 18
Guest
Why do you like it?

Sign Up with Email