I know how it is. You may have started your ETL project using the SQL Server import / export wizard or you may have done a point integration of a couple of tables through data tools. You might even have built an entire solution from the ground up and been pretty sure that you thought of everything. You most likely have not. Data is a tricky thing. So tricky in fact that I over the years have built up an almost paranoid distrust against it. The only sure thing I can say is that it will change (both intentionally and unintentionally) over time and your meticulously crafted solution will fail. Best case scenario is that it simply will stop working. Worst case scenario is that this error / these errors have not caused a failure technically but have done faulty insert / update / delete operations against your data warehouse for months. This is not discovered until you have a very angry business manager on the line who has been doing erroneous reporting up the corporate chain for months. This is the most likely scenario. A good framework should have functionality for recording data lineage (what has changed) and the ability to gracefully handle technical errors. It won’t prevent these kinds of errors from happening but it will help you recover from them a lot faster. For inspiration read The Data Warehouse ETL Toolkit.
Integration Services is a powerful ETL tool that can handle almost any data integration challenge you throw at it. To achieve this it has to be very flexible. Like many of Microsoft’s products its very developer oriented. The issue with this is that there are as many ways of solving a problem as there are Business Intelligence consultants on a project. By implementing a SSIS framework (and sticking with it!) you ensure that the solution handles similar problems in similar ways. So when the lead developer gets hit by that bus you can put another consultant on the project who only needs to be trained on the framework to be productive. A framework will also boost productivity. The up-front effort of coding it, setting it up and forcing your team to use it is dwarfed by the benefits of templates, code reuse and shared functionality. Again, read The Data Warehouse ETL Toolkit for inspiration.
Come into the habit of testing your MDX and DAX calculations as soon as possible. Ideally this should happen as soon as you finish a calculation, scope statement, etc. Both MDX and DAX get complicated really fast and unless you are a Chris Webb you will loose track pretty quickly of dependencies and why numbers turn out as they do. Test your statements in isolation and the solution as a whole and verify that everything works correctly. Also these things can have a severe performance impact so remember to clear the analysis services cache and do before and after testing (even if you have cache warmer). Note that clearing the cache means different things to tabular and dimensional as outlined here.
Note that you need the enterprise version of SQL Server for most of this. If you have large data sets you should design your solution from the ground up to utilize partitioning. You will see dramatic performance benefits from aligning your partitions all the way from your SSIS process to your Analysis Services cubes / tabular models. Alignment means that if you partition your relational fact table by month and year, you should do the same for your analysis services measure group / tabular table. Your SSIS solution should also be partition-aware to maximize its throughput by exploiting your partitioning scheme.
I feel a bit sorry for the Excel provider. It knows that people seeing it will think “Obviously I can integrate Excel data with my SSIS solution, its a MS product and MS knows that much of our data is in Excel”. The problem is that Excel files are inherently unstructured. So for all but the simplest Excel workbooks the provider will struggle to figure out what data to read. Work around this by either exporting your Excel data to flat files or look at some third party providers.