Automating the deployment of SSIS packages

Introduction

This post describes how I have automated the deployment of SSIS packages to different environments (test, staging, production etc.)  There are definitely other ways of doing it, and some of these are possibly better – I think that this is the oldest-school way possible.  I will describe our context which I hope will explain a little of the reasons why I chose the approach I did.

If you’re new to Octopus, you might be interested in a previous post that’s an introduction to Octopus.  Also, while you’re reading on, be aware that both SSIS and Octopus have projects and packages.  These are not the same thing!

For Octopus, a package is something like a zip file that is moved about.  It can contain whatever you like.  In SSIS, a package is a single file of code that has the suffix .dtsx.

An Octopus project says how a given package progresses through different environments (test -> staging -> production etc.), whereas an SSIS project is a Visual Studio thing that groups together related files e.g. several packages.

Requirements and constraints

This is a combination of fixed things that I want (requirements) and more negotiable things that would make some approaches easier than others (constraints).

The requirements are:

  1. I need to be able to deploy an SSIS project (of more than one SSIS package) to different machines;
  2. The different machines will mean the same code will connect to different database instances, and read / write different bits of file system – each deployment uses a single database and a single bit of the file system;
  3. I need the deployment to a given machine to be a single click and then forget (so that the chance of it going wrong is as small as possible, and so people don’t need to sit there baby-sitting the process);
  4. When deployed, the SSIS code will be run periodically via SQL Agent;
  5. The SSIS code needs to be edited in Visual Studio, with the minimum of messing about before or after an editing session (otherwise people will eventually forget to do all the messing about correctly and things will go wrong).

The constraints are:

  1. The SSIS project is run as files in a directory, rather than in the database (the bulk of the work is done via stored procedures that the SSIS code calls, before you stress too much over performance – SSIS is a thin orchestration layer over the top);
  2. The SSIS project is currently set to package deployment (rather than project deployment);
  3. We currently store all code in Git and deploy C# code using Jenkins and Octopus, so the team knows how to deploy with them and how to create and edit the relevant configuration.

I’ll now give some background, to help you understand the approach.

SSIS variables give configurability

SSIS connects to the outside world (e.g. to files or databases) via connection managers.  The simple approach with these is for the connection managers to contain a text string that is e.g. the path to the file or the database connection string.

You can add in a layer of indirection via SSIS variables.  The connection manager can be configured to evaluate an expression, whose value can be made of zero or more variables, string constants and operations that combine them (e.g. concatenate them).

SSIS variables can be given a default value, which can be changed if required by the code as it runs.  (We won’t need to change them for our configuration needs, but variables can also keep track of state as the code runs, e.g. the row id of the last row in the previous batch processed.)

Using variables in connection managers like this, the code can remain basically the same for all environments, but the variables can be changed to point to different databases or bits of file system.

SSIS package configurations

SSIS code that uses package deployment can be configured via external configuration files called package configurations.  You need to enable package configurations for a given SSIS package while you’re editing it.  You can then further edit the package by adding zero or more different configuration files.  For a given configuration file you say which variables’ values you want that file to contain.  You then edit the file to have a specific value for each variable.

The configuration file and the package are tied together.  A configuration file contains an id that says which package it came from.  Unfortunately, when you add a package configuration to a package, the package gets the full path to the file.  In our environment, the file system for the different machines isn’t constant.  I.e. they don’t all have a drive called D where the code is installed in \path\to\installation\directory.

When you kick off an SSIS package via SQL Server Agent, you can specify a configuration file to use – either in a specific tab in the configuration settings, or by adding to the command to execute on the command line.

Unfortunately, I haven’t found a way of telling Visual Studio to use a given configuration file that doesn’t involve editing the package.

Octopus is a shifter of packages and doesn’t care how they’re made or what they contain

Octopus is a nice example of separation of concerns.  Octopus accepts packages (e.g. .nupkg files), and then pushes them out to remote machines under your control.

It doesn’t care how the package came into existence, as long as it plays by the rules of the relevant file format (e.g. it contains a .nuspec file).  It also doesn’t care what’s inside the package – it just sees the (package, remote machine) pair as something on its To Do list.  The package is copied to the remote machine, with configuration changes on the way if required, then the contents of the package are unpacked.

Our C# code is packaged up with the help of a set of MSBuild rules that re-use the existing information in the .csproj files.  We don’t have that help for SSIS, but you can manually create a .nuspec file to list the files to include in the package and for us the list is very short and hardly ever changes.  nuget takes the .nuspec file created by hand and creates the package accordingly.

Approach

I will work through the approach starting at the end and working backwards.

Diagram showing development machine -> central Git repo -> build machine -> Octopus server -> target machine

On target machine X, SQL Agent is configured to run the packages using the config files specific to X.  (There is a config file per package.)  This means that the code on each machine is configured appropriately for that machine.

The .nupkg file containing of all code and config files is installed on a given machine using Octopus.  Most config files will go unused on a given machine, but they don’t contain anything sensitive so I don’t bother to delete them.  (I’ll spare you the details of the Octopus config as it’s fairly boring.)

The .nupkg file is created via a hand-written .nuspec file that lists all packages and all config files.  Jenkins runs nuget on this, and then uses Octopus CLI tools to push the resulting package to Octopus.

The configuration files are created as follows – there is slightly tedious fiddly about forced on you here by SSIS.  The SSIS package is edited to enable package configurations, a package configuration is added and then immediately deleted.  (Package configurations are left enabled.)  This doesn’t delete the file in the file system; it’s just that the package doesn’t know about it any more.  Because the file is generated via the normal tools (Visual Studio), it is given the id of the SSIS package.  The package configuration file is then copy/paste/edit’d until there is a clone per target environment, with the values set correctly.

The package is edited so that the variables have default values that work for the development environment.  The combination of this plus no package configuration files being configured in the package is legal, and means that Visual Studio uses the values that work for development.  When the package is run on a different machine, the values in the config file specified via SQL Agent will override the default values and so tune it to the new machine.

Summary

This is a fairly convoluted process, so well done for getting this far!  There are probably better ways of doing this, but this works for us and is a big improvement on the manual process it replaces.  Visual Studio, the SSIS run time engine, Jenkins, Octopus and SQL Agent have their foibles, which make this a bit of a faff, but I hope this post helps you understand it a bit.

UPDATE: Added the diagram.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s