Automating the deployment of SSIS packages

Introduction

This post describes how I have automated the deployment of SSIS packages to different environments (test, staging, production etc.)  There are definitely other ways of doing it, and some of these are possibly better – I think that this is the oldest-school way possible.  I will describe our context which I hope will explain a little of the reasons why I chose the approach I did.

If you’re new to Octopus, you might be interested in a previous post that’s an introduction to Octopus.  Also, while you’re reading on, be aware that both SSIS and Octopus have projects and packages.  These are not the same thing!

For Octopus, a package is something like a zip file that is moved about.  It can contain whatever you like.  In SSIS, a package is a single file of code that has the suffix .dtsx.

An Octopus project says how a given package progresses through different environments (test -> staging -> production etc.), whereas an SSIS project is a Visual Studio thing that groups together related files e.g. several packages.

UPDATE 1st July 2021: Octopus can do variable substitution in any file, including SSIS config files, which simplifies things.  I’ve changed things to reflect that.

Requirements and constraints

This is a combination of fixed things that I want (requirements) and more negotiable things that would make some approaches easier than others (constraints).

The requirements are:

  1. I need to be able to deploy an SSIS project (of more than one SSIS package) to different machines;
  2. The different machines will mean the same code will connect to different database instances, and read / write different bits of file system – each deployment uses a single database and a single bit of the file system;
  3. I need the deployment to a given machine to be a single click and then forget (so that the chance of it going wrong is as small as possible, and so people don’t need to sit there baby-sitting the process);
  4. When deployed, the SSIS code will be run periodically via SQL Agent;
  5. The SSIS code needs to be edited in Visual Studio, with the minimum of messing about before or after an editing session (otherwise people will eventually forget to do all the messing about correctly and things will go wrong).

The constraints are:

  1. The SSIS project is run as files in a directory, rather than in the database (the bulk of the work is done via stored procedures that the SSIS code calls, before you stress too much over performance – SSIS is a thin orchestration layer over the top);
  2. The SSIS project is currently set to package deployment (rather than project deployment);
  3. We currently store all code in Git and deploy C# code using Jenkins and Octopus, so the team knows how to deploy with them and how to create and edit the relevant configuration.

I’ll now give some background, to help you understand the approach.

SSIS variables give configurability

SSIS connects to the outside world (e.g. to files or databases) via connection managers.  The simple approach with these is for the connection managers to contain a text string that is e.g. the path to the file or the database connection string.

You can add in a layer of indirection via SSIS variables.  The connection manager can be configured to evaluate an expression, whose value can be made of zero or more variables, string constants and operations that combine them (e.g. concatenate them).

SSIS variables can be given a default value, which can be changed if required by the code as it runs.  (We won’t need to change them for our configuration needs, but variables can also keep track of state as the code runs, e.g. the row id of the last row in the previous batch processed.)

Using variables in connection managers like this, the code can remain basically the same for all environments, but the variables can be changed to point to different databases or bits of file system.

SSIS package configurations

SSIS code that uses package deployment can be configured via external configuration files called package configurations, that have the file suffix dtsConfig.  You need to enable package configurations for a given SSIS package while you’re editing it.  You can then further edit the package by adding zero or more different configuration files.  For a given configuration file you say which variables’ values you want that file to contain.  You then edit the file to have a specific value for each variable.

The configuration file and the package are tied together.  A configuration file contains an id that says which package it came from.  Unfortunately, when you add a package configuration to a package, the package gets the full path to the file.  In our environment, the file system for the different machines isn’t constant.  I.e. they don’t all have a drive called D where the code is installed in \path\to\installation\directory.

When you kick off an SSIS package via SQL Server Agent, you can specify a configuration file to use – either in a specific tab in the configuration settings, or by adding to the command to execute on the command line.

Unfortunately, I haven’t found a way of telling Visual Studio to use a given configuration file that doesn’t involve editing the package.

Octopus is a shifter of packages and doesn’t care how they’re made or what they contain

Octopus is a nice example of separation of concerns.  Octopus accepts packages (e.g. .nupkg files), and then pushes them out to remote machines under your control.

It doesn’t care how the package came into existence, as long as it plays by the rules of the relevant file format (e.g. it contains a .nuspec file).  It also doesn’t care what’s inside the package – it just sees the (package, remote machine) pair as something on its To Do list.  The package is copied to the remote machine, with configuration changes on the way if required, then the contents of the package are unpacked.

Our C# code is packaged up with the help of a set of MSBuild rules that re-use the existing information in the .csproj files.  We don’t have that help for SSIS, but you can manually create a .nuspec file to list the files to include in the package and for us the list is very short and hardly ever changes.  nuget takes the .nuspec file created by hand and creates the package accordingly.

Octopus can update configuration as it deploys

Octopus has two main benefits for deployment:

  1. It acts as the hub in a hub / spoke model, so that upstream parts of the CI pipeline can deal with only one thing (the Octopus server) and then delegate to it the details of deploying out to the targets (the spokes).
  2. It can tailor configuration of the code it’s deploying, so that it is different for each deployment target, e.g. so the deployed code has a different database connection string for each target.

So far we’ve used only the first benefit, but we can make use of the second benefit too. Because we already use Octopus to deploy our C# code to different targets, we already have rules in place that say which database connection string to use for which target. For the C# code, these apply to web.config and app.config files, but for SSIS packages they can apply to the .dtsConfig files.

Octopus understands web.config and app.config files very well, and so setting up rules to change them is straightforward. The important part of this is that you don’t need to change the web.config or app.config files to highlight the bits that Octopus should change or what rule it should use.

Octopus doesn’t have as good an understanding of .dtsConfig files, so you need to give it a bit of help. You need to edit the .dtsConfig file and replace the default value that’s currently there with #{SomeOctopusVariableName}, where SomeOctopusVariableName might be something like ReportingDBconnectionString. There’s more detail on general variable substitution on the Octopus site. As Octopus deploys the SSIS code, it will replace #{SomeOctopusVariableName} with the relevant value according to the rules you’ve set up.

Before we used Octopus variable substitution we had a different .dtsConfig file per deployment target, all of which were deployed to all targets, and SQL Agent on each target was configured to pick the relevant one for that target. This had at least two problems:

  1. There were loads of config files – more than necessary
  2. The database connection strings were hard-coded into the config files, so that when we changed database connection strings we needed to edit the config files. This was particularly annoying given that we had already configured Octopus to hold the database connection strings and their connection to specific deployment targets, because of using it to deploy C# code.

Approach

I will work through the approach starting at the end and working backwards.

On each target machine, SQL Agent is configured to run the packages using the config files.  (There is a config file per package.)

The .nupkg file containing of all code and config files is installed on a given machine using Octopus.  (I’ll spare you the details of the Octopus config as it’s fairly boring.) As it deploys it, the Octopus server updates the config files according to the variable substitution rule specific to the destination machine.

The .nupkg file is created via a hand-written .nuspec file that lists all packages and all config files.  Jenkins runs nuget on this, and then uses Octopus CLI tools to push the resulting package to Octopus.

The configuration files are created as follows – there is slightly tedious fiddly about forced on you here by SSIS.  The SSIS package is edited to enable package configurations, a package configuration is added and then immediately deleted.  (Package configurations are left enabled.)  This doesn’t delete the file in the file system; it’s just that the package doesn’t know about it any more.  Because the file is generated via the normal tools (Visual Studio), it is given the id of the SSIS package.  The package configuration file is then edited to replace the values with the magic text that Octopus expects of the form #{SomeOctopusVariableName} to trigger variable substitution, as described above.

The package is edited so that the variables have default values that work for the development environment.  The combination of this plus no package configuration files being configured in the package is legal, and means that Visual Studio uses the values that work for development.  When the package is run on a different machine, the values in the config file specified via SQL Agent will override the default values and so tune it to the new machine.

Summary

This is a fairly convoluted process, so well done for getting this far!  There are probably better ways of doing this, but this works for us and is a big improvement on the manual process it replaces.  Visual Studio, the SSIS run time engine, Jenkins, Octopus and SQL Agent have their foibles, which make this a bit of a faff, but I hope this post helps you understand it a bit.

UPDATE: Added the diagram.

UPDATE 1st July 2021: Changed to use Octopus variable substitution

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s