Someone asked a question on the Ministry of Testing forum about CI/CD pipelines, and I realised my answer was a blog post (an extended waffle, that others might find useful). The question was from someone just starting with CI/CD, and they asked more experienced people: What do you wish you knew when you started, and what is the most important thing you learned so far?
The answers below are general, rather than specific to any one technology (either of the pipeline itself, or of the stuff flowing through the pipeline). I have already written more specific stuff on Jenkins and Octopus, and on deploying SSIS packages using Octopus.
Building software using software
My main point is to think of the build pipeline as a software artefact in its own right. (See also: your tests are code too.) So you should do as many of the normal software development things with it as are helpful for you. These turn into questions like:
- What are the requirements for the pipeline, and who decides what they are and how they are prioritised? Does the priority reflect the effort and risk involved?
- Are there dependencies between the different bits of work that will influence the order things get implemented?
- Is stuff under source control, is it reviewed, is it tested (see Testing below)?
- Do you need a separate development environment, or is it OK to tinker directly with production? If you have a separate development environment, how will you address risks of (not) making the same changes to production?
Product management for the pipeline
If the questions above make it sound like you need product management for the pipeline, with some kind of backlog and owner, then yes, I think you do. If you don’t do this explicitly, then it will be implicit in people’s heads and probably inconsistent as a result.
So, for instance, you might have requirements that appear to pull in opposite directions, such as: must have fast feedback, and must run loads of tests. One way of resolving this is to throw lots of hardware at the problem. Another way is to have more than one pipeline – a fast one that’s run often e.g. as soon as a commit happens, and a slower one that runs if the first one succeeds or overnight etc. It might be that the people asking for fast feedback aren’t aware of the need for many tests to be run, and vice versa. Having everything out in the open and agreed on will help to expose this tension.
I’m not saying which solution (lots of hardware or lots of pipelines) is better for you, just that in general having the pipeline’s requirements explicit and agreed upon will help manage expectations and minimise grumpiness.
Testing
We had a bug recently in one of our pipelines such that errors were being swallowed rather than stopping the pipeline. So it might be worth having some way of testing your pipeline[s], to check that they really will stop if there’s a problem in step X (if that’s what you want).
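One way to check this is to feed the pipeline a step that fails on purpose and confirm that nothing after it runs. The sketch below is only an illustration: `run_pipeline` and the step names are hypothetical, and each step is modelled as a plain Python callable returning an exit code rather than anything from a real CI tool.

```python
def run_pipeline(steps):
    """Run steps in order and stop at the first non-zero exit code.

    `steps` is a list of (name, callable) pairs; each callable returns an
    exit code, where 0 means success. Returns the names of the steps that
    were run and the name of the step that failed (or None).
    """
    ran = []
    for name, step in steps:
        ran.append(name)
        if step() != 0:
            # Do not swallow the error: report the failing step and stop.
            return ran, name
    return ran, None


def check_pipeline_stops_on_failure():
    """A test of the pipeline itself: a failure in step 2 must block step 3."""
    succeed = lambda: 0
    fail = lambda: 1
    return run_pipeline(
        [("build", succeed), ("unit tests", fail), ("deploy", succeed)]
    )


if __name__ == "__main__":
    ran, failed = check_pipeline_stops_on_failure()
    print("steps run:", ran)
    print("failed step:", failed)
```

If "deploy" ever shows up in the list of steps that ran, the pipeline is swallowing errors.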
As you’re running tests on the pipeline or iterating through changes to the pipeline, this could generate lots of notifications to Slack, Teams, email or wherever your pipeline’s connected to. This can get annoying for other people, as can lots of messages from you saying “Please ignore messages about the pipeline for the next afternoon while I tinker with it.” It could be that you just need people to live with this, or you could have a separate test environment for the pipeline that sends notifications only to you.
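As a rough sketch of the second option, the pipeline could choose its notification targets based on which environment it is running in. Everything here is an assumption for illustration: the environment names, the `#team-builds` channel and the `notification_targets` function are all made up.

```python
def notification_targets(environment, owner):
    """Decide who hears about a pipeline run.

    Hypothetical rule: only the production pipeline posts to the shared
    channel; any other environment notifies just the person tinkering.
    """
    if environment == "production":
        return ["#team-builds"]
    return [f"@{owner}"]


if __name__ == "__main__":
    print(notification_targets("production", "sam"))
    print(notification_targets("pipeline-test", "sam"))
```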
Multiple environments and source control
As mentioned above, testing links to other issues, such as safely porting changes between environments, which in turn links to things like having important things under source control. If you can easily pull from source control to each environment, then moving changes between environments is easier.
Over time, particularly if there’s a rush, it could be that workarounds accumulate that are outside of source control. For instance, a batch file that runs particular commands and is called from the pipeline. If possible, try to avoid this. Firstly, if the pipeline’s machine dies then you might lose your only copy of that important file. Also, it’s another thing that needs to be copied across machines. Finally, you have business logic in more than one place (the main pipeline and these external scripts), which can make it easy to overlook the logic outside the pipeline.
Who needs to get notified about what, when and with what info?
What level of detail is useful, to whom? Do you need e.g. a list of all the unit regression tests that have passed or failed, or just a summary of how many there are of each kind? If you summarise, how can you get the extra detail if you need it, e.g. to investigate build failures? For instance, you could summarise in the main output, and write the detail elsewhere, and put something in the summary about where to find the details.
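The summarise-and-point-elsewhere idea could look something like the sketch below. The result format (a list of dictionaries with `name` and `outcome` keys) and the `summarise` function are assumptions made for the example, not the output of any particular test runner.

```python
import json
from collections import Counter
from pathlib import Path


def summarise(results, detail_path):
    """Write the full results to `detail_path` and return a one-line summary.

    `results` is a list of dicts such as
    {"name": "test_login", "outcome": "passed"}.
    """
    counts = Counter(result["outcome"] for result in results)
    # The detail goes to a file, so the main output stays short.
    Path(detail_path).write_text(json.dumps(results, indent=2))
    return (
        f"{counts['passed']} passed, {counts['failed']} failed "
        f"(details: {detail_path})"
    )


if __name__ == "__main__":
    example = [
        {"name": "test_login", "outcome": "passed"},
        {"name": "test_logout", "outcome": "failed"},
    ]
    print(summarise(example, "test-details.json"))
```

The summary line is what goes in the notification; the file is there for whoever has to investigate a failure.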
It’s worth thinking about why and how a pipeline should start. There are three main categories of trigger:
- A human starts things
- Some code starts things
- A clock starts things
A human could decide to start a pipeline now, because they want to update a target environment – particularly if the pipeline takes a while to run (and a slow pipeline is generally a bad idea).
Some code could start a pipeline, such as code that knows that new or changed code has been created, or a previous pipeline has run successfully. The second one lets you chain pipelines end-to-end – earlier pipelines passing show that it’s worth running a later one that takes more time or other resources. For instance, you don’t bother running performance tests if the functional tests are failing.
It could be that pipelines run automatically overnight or over the weekend, i.e. they’re triggered by a clock.
There’s no single right answer. You could mix and match – for instance, run pipeline A every time there are code changes, and then run pipeline B overnight and pipeline C afterwards if pipeline B finishes successfully.
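Chaining pipelines so that a later one runs only if the earlier one passed can be sketched as follows. This is a minimal illustration under assumptions: `run_chain` is a made-up name, and each pipeline is represented by a callable that returns True on success, standing in for whatever your CI tool actually does.

```python
def run_chain(pipelines):
    """Run pipelines in order; once one fails, skip all the later ones.

    `pipelines` is a list of (name, run) pairs, where `run` is a callable
    returning True on success. Returns a dict mapping each name to
    "passed", "failed" or "skipped".
    """
    outcomes = {}
    blocked = False
    for name, run in pipelines:
        if blocked:
            # An earlier pipeline failed, so do not spend resources on this one.
            outcomes[name] = "skipped"
            continue
        if run():
            outcomes[name] = "passed"
        else:
            outcomes[name] = "failed"
            blocked = True
    return outcomes


if __name__ == "__main__":
    chain = [
        ("functional tests", lambda: False),
        ("performance tests", lambda: True),
    ]
    print(run_chain(chain))
```

Here the performance tests are skipped because the functional tests failed, which is the behaviour described above.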
Don’t try to boil the ocean
As with normal software development, getting feedback quickly and often is a good idea when building a pipeline, because it limits the size of your mistakes. Start small, with something that’s low risk, low effort and / or high value. Then add the next thing, then the next. Don’t try to go from nothing to a Rolls-Royce pipeline in one big bang.
I’ve tried to keep this general, rather than specific to technologies. I think it is useful to approach pipeline development as you would approach developing production software – with the product management and software development process and tools you’re familiar with.