Sql Server Integration Services (SSIS) 2016 Review
SSIS 2016 - the Good the Bad, and the Ugly
SSIS is a unique ETL tool for a number of reasons. Most shops already own it but may treat it as a one-off free utility rather than as an enterprise ETL tool even though it can compete with the best of ETL tools. Some of the lack of respect is due to immature beginnings in its former "DTS" days. And some of it is because it comes bundled with Sql Server and does not normally run (but can) on a separate ETL server.
It can't be good if its free right? Especially when competitors easily charge 6 figures for their tools. Don't we get what we pay for? My answer is, generally yes but this may be an exception worth noting.
SSIS has become extremely flexible, extensible and integrated with many other Microsoft tools and a multitude of add-ons - both paid and free. It's come a very long ways since its DTS days. It's incredibly cost effective, easy to learn the basics quickly (although like all ETL tools requires the traditional learning curve to get good at) and has an immense user base. There are also endless bits of quality shareware available that seamlessly plug-in and a wide variety of low priced vendor supported add-on products to fill in any out-of-the-box gaps (see my other review of MellissaData Data Quality MatchUp for SSIS). And if you can't find something you need and you know how to code C#.net you can extend the tool yourself. So unlike any other tool on the market, there's always a way to make something work with SSIS.
This comes from the fact that SSIS is merely Visual Studio code and comes with templates to add your own custom components. A large variety of pre-built shareware is available at the codeplex.com website and vendors such as MellisaData and Pragmatic Works provide sophisticated add-on components from advanced realtime monitoring to state-of-the-art data quality plug-ins. BIML shareware allows for automated ETL code generation based on XML templates. Some of the shareware available on Codeplex are very high value such as the MultipleHash component providing very sophisticated hashing to support CDC and SCD operations. If you don't like out of the box functionality (such as the SCD wizard which is largely worthless) then there's likely a worthwhile supported replacement by some vendor for a reasonable price. There's built in support for CDC from many database sources such as Sql Server and Oracle for no added charge which is unheard of among their competitors.
Adding plug-ins such as BiXpress, Task Factory and DocXpress from Pragmatic Works are relatively inexpensive tools that do some really amazing things. BiXpress is a MUST for providing real-time and historical monitoring of ETL including tracking package parameter and local variable value changes both in real-time and historical. I highly recommend MelissaData Matchup for SSIS - you'll never see any other data quality tool as easy to use as that one which seamlessly integrates with SSIS.
Many improvements are here such as ability to deploy a single package within a project (a feature lost when going from 2008R2 to the new project deployment model in 2012). Also new adapters such as Odata.
Unlike any other ETL tool - if you learn Visual Studio, you gain familiarity with an entire toolset. Its navigation and project organization is common to all other .net development. Along with Visual Studio you get all the source control plug-ins inherent with the tool such as TFS (aka. TFVC in VSO which is cloud based and free!), and Git.
2012 Version introduced very flexible parameter capability superior to many other ETL tools. Project parameters integrate into Sql Agent to provide job-level ability to dynamically change runtime values such as source/target connections.
Either via shareware or purchased products. Github provides a huge amount of free shareware code - some of which is very high quality. Vendors such as KingswaySoft and PragmaticWorks and many others provide multitude of inexpensive add-ons from adapters to enhanced components to connect to just about everything. I use SSIS to connect to Microsoft Dynamics CRM, Oracle Netsuite, on-prem Oracle, hosted source using SOAP calls, and Azure SqlServer as well as flat file loads and on-prem SqlServers.
Built in reports show package execution and messages. Logging can also be customized so only what is needed is logged. There is also an excellent logging replacement called BiXpress that provides both historical and real-time monitoring which is more efficient and much more robust than the built-in logging capabilities. And none of this requires custom coding to make it useful unlike many other ETL tools.
Just google if you don't know the answer - from youtube to blogs there's an incredible amount of information out there about this tool. I suspect far more than any other ETL tool.
List of good features...
You have to write push down join & lookup SQL to the database yourself via joins in the data flow source to get very high performance. That said, this is a common complaint for nearly all ETL tools on the market and those that offer an alternative such as Informatica offer them at a very expensive add-on price and even then don't work for all join situations. (My best practice is complex joins go into views of the data lake/landing area tables anyways so the queries are easily audible but I know there are deferring opinions here.)
No surprise here as this is a common issue with ETL SCD wizards. Requires custom/shareware enhanced wizard or one from 3rd party vendor to get good performance. This begs the question why Microsoft can't come up with a better wizard since it's pretty straight forward to code a dynamic merge as a workaround and someone wrote a much better shareware version. Pragmatic Works also sells a much better and supported version in their toolkit along with many other tools.
Here's a list of constraints or potential gotchas...
Deployment instability with very large (in size of packages) projects
It is not unusual to get "out of memory" errors IF you use the default deployment wizard which is 32 bit. However there is a 64 bit version that eliminates this issue but you have to realize that using it isn't the default.
SSDT (the development tool) keeps grabbing more memory as you open new solutions so you have to exit at least once a day to free up memory. It is also 32 bit development environment which keeps the code small but also makes SSDT unstable if you open more than about 30 packages at a time (such as when you're applying framework code to a bunch of packages - you have to limit how many you do at a time).
Containers that help group tasks have several very annoying bugs. For example, sometimes if you attempt to resize the container it will make the diagram tool move about wildly and out of control. A task within a container sometimes becomes detached and you can't get it back into the container. Some containers just decide they aren't going to accept more tasks. The workarounds include: exit SSDT and come back in, and create a new container and move all the tasks (which may not be reasonable if you have lots of tasks with precedence sequence constraints). Another sometimes helpful workaround is using auto-layout which often resolves display problems.
Prior to 2012 not recommended!
Prior versions had many issues including debugging instability with large parent-child package call volumes and .com locking issues when running many parallel threads. It's largest drawback however was it was WAY overly complicated with its configuration XML file method of passing data between packages. That said, it was still superior to ETL tools that require passing parameters via just files (such as Informatica)! But these issues were resolved with 2012 when they introduced project and package parameters and they also improved memory management for parent/child package calls.