This is the first in a series of posts introducing SSIS by showing how you can get it to do things that you probably already know how to do in Excel.
- Getting started
- File Save As (a kind of Hello, World! program for SSIS)
- Filtering rows
- Gouping and calculating
- Joining streams of data
I won’t be covering how to test SSIS, as there is already another post on testing ETL code.
What?
What is SSIS?
SSIS is Microsoft’s ETL tool, which you can get for free via the free version of their SQL Server database (see below). SSIS stands for SQL Server Integration Services, because ETL tools are often used to integrate data from different sources.
What is ETL?
ETL stands for Extract Transform Load, which is another way of saying Read Process Write. An ETL tool lets you design a program by dragging boxes onto a canvas and joining them up with lines. When you run the program, some of the boxes will tell the computer to read input data from files, database tables etc, some will tell it to write output to files, database tables etc, and the rest will tell it how to convert inputs into outputs. The lines are like pipes, that direct the data between the operations represented by the different boxes.
What is Excel?
If you don’t know that, then I don’t think these posts will be all that helpful for you. There are plenty of books and online courses that should be able to help you.
Why?
Why learn SSIS, especially when it looks like Excel can do everything that SSIS can?
SSIS can cope with much bigger inputs than Excel can, so when Excel starts to creak SSIS could be the next thing to use. Also SSIS can do many things that Excel can’t do, such as looping, fairly easy error handling and fiddly logic, running stored procedures (if it’s talking to a database), or automating repetitive things via BIML.
So for simple and small tasks, SSIS is likely to be overkill and Excel is the better tool for the job. However, as your task gets bigger and / or more complex, eventually SSIS is the better tool for the job. Knowing how to use both is the best option, so you can always use the most appropriate tool.
In the other posts in the series, the examples I will use will be tiny. If I weren’t using them to teach SSIS then Excel would be the obvious choice of tool. I hope that you will have this in mind when we go through the examples.
Why use Microsoft Technology A (Excel) to teach Microsoft Technology B (SSIS)?
Many people are already familiar with Excel, and so the basic operations in the other articles will be familiar if people can think of them in terms of the Excel operation they already know. If I just jumped in and described SSIS’ features it might be more confusing and harder to make the connection to Excel.
I will deliberately avoid Microsoft Technology C (SQL Server database). While SSIS and a database form a natural pair, a database will bring with it unavoidable complications such as transactions. Not everyone is familiar with databases, and ETL tools like SSIS can be useful even if they never go near a database. By excluding databases, I’m able to concentrate on the essentials of SSIS.
How?
How do I get started?
There is a free version and a paid-for version. I will assume you want the free version.
Even though I won’t go into database stuff at all in these posts, you have to download the SQL Server installation package to get SSIS. The good news is that you can choose which bits of the package you want to install, so I hope that you won’t end up installing more than you need to.
If you install SQL Server then you will get extra things that you might find useful for SSIS, but you can easily cope without them if you don’t want SQL Server. SQL Server comes with a scheduler called SQL Agent which can start your SSIS tasks, but you can start them manually or use Windows Scheduler. Also, SSIS tasks can physically live in the database (like a stored procedure) but they can also live in the file system as XML files.
Microsoft has instructions for installing SSIS, which should leave you with the SSIS run-time and a visual editor. The visual editor is a plug-in to Visual Studio, called SSDT (SQL Server Data Tools). There is a free version of Visual Studio (Developer or Community edition) which supports SSDT.
Conclusion
I hope that after following the instructions you will have SSIS installed, but you probably don’t know how to do anything with it. That is something I will tackle in the next post, where I show you how to do something like Hello, World! in SSIS. It will copy all of the contents of one file into another file, as if you had done File > Save As in Excel.