Building Distributed Architectures: Effective CI/CD Pipelines

“Continuous integration and continuous delivery allow us to build, test, and deliver software with greater speed, reliability, and automation than ever before.” ― Jez Humble, Co-author of “Continuous Delivery”


Today’s article explores the second goal in our series, “Building Distributed Architectures”. In this post, we’ll delve into the importance of CI/CD pipelines and explore the best practices for creating robust and efficient pipelines that ensure smooth and reliable software delivery, while minimizing human error and manual effort.

Before going further, note that this article is part of a series covering how to build an effective distributed architecture. If you haven’t already, I recommend reading the previous articles to get up to speed:

  1. Building Distributed Architectures: Creating a Blueprint for Success
  2. Goals
  3. Code-Driven

Building distributed systems involves many moving parts that must work effectively with each other. Furthermore, more than one such environment may exist at any given moment; as set out in the previous article in this series, our goal is to have at least three environments: development, staging and production. Each of these environments has its own purpose, and the development environment is the one that is updated most frequently. In order to ensure that changes are propagated to the other environments, we need a reliable and efficient CI/CD pipeline. We also need to keep good track of the versions of the components deployed in each environment, so that we can easily identify the changes that are deployed in each one.

This article focuses on setting up an effective pipeline that meets these demands. Since my day-to-day work involves using Azure DevOps, I will be focusing on that platform, but the concepts are applicable to any CI/CD platform with minimal changes.

The CI/CD Pipeline

A CI/CD pipeline must be the single source of truth for any deployed system. This means that any change that must end up in a running environment must follow the same set of standard steps, and manual changes must be avoided at all costs. In the short run this might seem like overhead, but in the long run it pays off, as it ensures that the environments are always in sync and that changes are propagated in a consistent manner. It also reduces the risk of human error, as the steps are always the same and the pipeline can be tested and validated. Using the CI/CD pipeline as the single source of truth also allows us to easily identify the changes that are deployed in each environment, as we can always check the pipeline history and see what was deployed and when.

I will outline a few key points that I consider important when designing a CI/CD pipeline:

  1. Reusability and modularity - basically applying the DRY (Don’t Repeat Yourself) principle.
  2. Versioning - identifying the version of each component that is deployed in each environment and the overall version of the product.
  3. Handling multiple releases, bug fixes and features.
  4. Reentrant pipelines.

Reusability and modularity

First of all, to make a CI/CD pipeline efficient, we must treat it like any other piece of software. This means using templates, variables, and other features that allow us to reuse the same pipeline for multiple projects. This is especially important when we have multiple projects deployed in the same environment, as we can use the same pipeline for all of them with minimal changes; likewise, similar projects end up with similar pipelines that can share the same templates.

In order to achieve this goal, a good practice is to use templates. Templates allow us to define a set of steps that can be reused in multiple pipelines. For example, we can have a template that installs the Azure CLI, or a template that installs the Terraform CLI. This allows us to reuse the same steps in multiple pipelines, without having to copy-paste the same steps in each pipeline. Also, if we need to update the steps, we can do it in one place and the changes will be propagated to all pipelines that use the template.
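As a rough illustration, a step template that installs a specific Terraform CLI version could look like the sketch below. The template name, parameter, and download approach are assumptions for this article, not an existing implementation; it assumes the templates live in a ci-templates folder of a shared ci-pipelines repository that is referenced as a pipeline resource (shown later in this article), and that the agent runs Linux.

# ci-templates/steps/install-terraform.yaml (hypothetical sketch)
parameters:
  terraformVersion: '1.5.7'

steps:
  - bash: |
      # Download and install the requested Terraform CLI version on a Linux agent
      curl -sSLo terraform.zip "https://releases.hashicorp.com/terraform/${{ parameters.terraformVersion }}/terraform_${{ parameters.terraformVersion }}_linux_amd64.zip"
      unzip -o terraform.zip
      sudo mv terraform /usr/local/bin/terraform
      terraform version
    displayName: Install Terraform ${{ parameters.terraformVersion }}

A consuming pipeline then needs a single line to reuse it, overriding only the parameters it cares about:

steps:
  - template: ci-templates/steps/install-terraform.yaml@ci-pipelines
    parameters:
      terraformVersion: '1.6.0'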

I would generally recommend setting up a single project that contains all these templates. Following the structure of Azure DevOps pipelines, the templates can be organized using the following folder structure:

/ci-templates
│
├── /deployments
│ ├── update-product-version.yaml
│ ├── tag-repositories.yaml
│ ├── publish-release-notes.yaml
│ └── ...
│
├── /jobs
│ ├── /node
│ │ ├── build.yaml
│ │ ├── publish-artifacts.yaml
│ │ └── ...
│ ├── /dotnet
│ │ ├── build.yaml
│ │ ├── publish-artifacts.yaml
│ │ └── ...
│ ├── /golang
│ │ ├── build.yaml
│ │ ├── publish-artifacts.yaml
│ │ └── ...
│ ├── ...
│ ├── build.yaml
│ ├── publish-artifacts.yaml
│ ├── semantic-release.yaml
│ └── ...
│
├── /steps
│ ├── /build
│ │ ├── /node
│ │ │ ├── build.yaml
│ │ │ ├── test.yaml
│ │ │ ├── doc.yaml
│ │ │ ├── prod.yaml
│ │ │ ├── publish-build-artifact.yaml
│ │ │ └── ...
│ │ ├── /dotnet
│ │ │ └── ...
│ │ ├── /golang
│ │ │ └── ...
│ │ ├── ...
│ │ └── create-fs-snapshot.yaml
│ ├── ...
│ └── ...

Deployments

The deployment templates address cross-cutting concerns such as:

  • Updating the product version
  • Tagging repositories with the actual versions for easy identification
  • Creating release notes or publishing internal documentation
  • Deploying brand-specific assets (logos, images, etc)
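To make one of these concrete, here is a minimal sketch of what a tag-repositories.yaml template could look like. The parameter name and tag format are illustrative assumptions; the actual template may tag multiple repositories and use a different naming convention.

# ci-templates/deployments/tag-repositories.yaml (sketch)
parameters:
  productVersion: ''

steps:
  - checkout: self
    persistCredentials: true
  - bash: |
      # Tag the repository with the product version so a deployment can be traced back to the exact sources
      git tag "product-v${{ parameters.productVersion }}"
      git push origin "product-v${{ parameters.productVersion }}"
    displayName: Tag repository with product version ${{ parameters.productVersion }}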

Jobs

The job templates address specific tasks that are common to multiple pipelines, such as:

  • Building a project (e.g. a basic NodeJS build job, a .NET build job, etc.)
  • Publishing artifacts (e.g. publishing a Docker image, publishing a Helm chart, etc.)
  • Running the semantic-release tool to automatically version the project build and to generate release notes

If the jobs require specialization (such as build jobs), then a general job can be defined with parameters, and in turn this job will reference the more specialized job based on the parameters. For example, a general build job can be defined with the following Azure DevOps YAML syntax:

parameters:
  build: true
  test: true
  doc: false
  platform: node

jobs:
  # Forward to the platform-specific build job (e.g. ./node/build.yaml, ./dotnet/build.yaml)
  - template: ./${{ parameters.platform }}/build.yaml
    parameters:
      build: ${{ parameters.build }}
      test: ${{ parameters.test }}
      doc: ${{ parameters.doc }}

This build job will load the appropriate specialized job based on the platform parameter. This allows us to reuse the same job template for multiple projects, without having to copy-paste the same steps in each pipeline.

A specialized build job for NodeJS projects can be defined as follows:

parameters:
  build: true
  test: true
  doc: false

jobs:
  # Build, test, and document the NodeJS project
  - job: BuildAndTest
    displayName: Build (${{ parameters.build }}) and test (${{ parameters.test }}) and document (${{ parameters.doc }}) node project
    steps:
      - checkout: self
        persistCredentials: true
      - ${{ if eq(parameters.build, true) }}:
        - template: ../../steps/build/create-fs-snapshot.yaml
        - template: ../../steps/build/node/build.yaml
      - ${{ if eq(parameters.test, true) }}:
        - template: ../../steps/build/node/test.yaml
      - ${{ if eq(parameters.doc, true) }}:
        - template: ../../steps/build/node/doc.yaml
      - ${{ if eq(parameters.build, true) }}:
        - template: ../../steps/build/node/prod.yaml
      - ${{ if eq(parameters.build, true) }}:
        - template: ../../steps/build/node/publish-build-artifact.yaml

Note: The specifics of each step are detailed below in the Steps section.

Steps

The step templates address specific tasks that are common to multiple jobs, such as:

  • Installing NPM packages
  • Running the build/test/doc command for a specific platform
  • Collecting the build artifacts
  • Publishing the build artifacts
  • etc.

In the example above, a NodeJS project goes through the following steps:

  • create-fs-snapshot.yaml - this creates an internal snapshot of the build folder before installing packages. This is useful in the later steps to identify exactly which files need to be published as artifacts.
  • build.yaml - this installs the NPM packages and runs the build command
  • test.yaml - this runs the test command
  • doc.yaml - this runs the doc command, generating the documentation files from the codebase
  • prod.yaml - This will install the production dependencies and remove the development dependencies from the build folder
  • publish-build-artifact.yaml - This will create an archive containing all necessary files from the initial snapshot and the production dependencies, and will publish it as a build artifact. This artifact can be used later in the pipeline to publish the build to a Docker registry, or to publish it as a Helm chart.

Of course, this is just an example, the build steps will be different for each project and technology, but the general idea is to have the granular details in the steps, while the jobs and deployments act as recipes that combine the steps in a specific order.
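To make this more concrete, here is a minimal sketch of what the node/build.yaml step template could contain. The Node version and script names are assumptions; real projects will pin their own versions and commands.

# ci-templates/steps/build/node/build.yaml (sketch)
steps:
  - task: NodeTool@0
    displayName: Use Node.js
    inputs:
      versionSpec: '18.x'
  - bash: |
      # Install the NPM packages and run the project's build script
      npm ci
      npm run build
    displayName: Install dependencies and build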

Using the templates in each pipeline

Once the templates are defined in a separate project, they can be used in each pipeline. For example, a pipeline for a NodeJS project can be defined as follows:

# Set continuous build trigger on the master branch
trigger:
  batch: true
  branches:
    include:
      - master
      - main      
      - release-*
      - feature-*
      - staging
      - prod
    exclude:
      - refs/tags/*

resources:
  repositories:
    # Reference the ci-pipelines repository for the CI templates
    - repository: ci-pipelines
      type: git
      name: my-devops-project/ci-pipelines

stages:
  - stage: Build
    jobs:
      - template: ci-templates/jobs/build.yaml@ci-pipelines
        parameters:
          build: true # change this to false if the project doesn't need building
          test: true # change this to false if the project doesn't need testing
          doc: true # change this to false if the project doesn't produce doc artifacts
          platform: node
      - template: ci-templates/jobs/semantic-release.yaml@ci-pipelines
        parameters:
          build: true # change this to false if the project doesn't build any output; this must match the build parameter from the previous step as it will affect the semantic-release step
          platform: node
  - stage: Publish
    dependsOn: Build
    jobs:
      - template: ci-templates/jobs/publish-container.yaml@ci-pipelines # this will run the steps to produce a Docker container and publish it to a container registry
      - template: ci-templates/jobs/publish-helm-chart.yaml@ci-pipelines # this will run the steps to produce a Helm chart and publish it to a chart registry

Some notes on the above pipeline

First of all, the project pipeline uses one or more stages to separate the jobs. This is not always necessary, but it produces a cleaner view in the Azure DevOps UI, as each stage is displayed as a separate section. Stages can also be used to define dependencies between jobs, so that a job only starts after the previous one has finished, so I would recommend this as a good practice. There is, though, a downside to this approach: there is a small overhead in pipeline execution time, as each stage acquires its own agents. This is not a big issue, as the stages are usually independent, but it is something to keep in mind.

The pipeline does not actually deploy the Helm chart into the Kubernetes environment; it only publishes it to a chart registry. The actual deployment of the chart is done in another pipeline, which will be outlined below. This decouples the build of a project from the lifecycle of that project inside the running environments.

Generally, each project’s pipeline should produce artifacts that are published to a location from which they can be used later on. The pipeline above describes the templates and steps for a typical NodeJS microservice that will end up creating a Docker image and a Helm chart. The Docker image will be published to a Docker registry, while the Helm chart will be published to a Helm chart registry. These artifacts can be used later on in the deployment pipeline to deploy the project in the Kubernetes environment.
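For illustration, a publish-container.yaml job template could be sketched roughly as below. The artifact name, service connection, build context, and tagging scheme are assumptions, not the exact implementation referenced in the pipeline above.

# ci-templates/jobs/publish-container.yaml (sketch)
parameters:
  imageRepository: ''

jobs:
  - job: PublishContainer
    steps:
      # Fetch the build artifact produced by the Build stage (artifact name is an assumption)
      - download: current
        artifact: build-output
      - task: Docker@2
        displayName: Build and push the Docker image
        inputs:
          command: buildAndPush
          containerRegistry: 'my-container-registry'   # hypothetical service connection name
          repository: ${{ parameters.imageRepository }}
          Dockerfile: '$(Pipeline.Workspace)/build-output/Dockerfile'
          buildContext: '$(Pipeline.Workspace)/build-output'
          tags: |
            $(Build.BuildNumber)
            latest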

For NodeJS projects that produce only NPM packages, the above pipeline will not contain the Publish Stage.

For other technologies such as dotnet, the pipeline will be nearly identical, the only difference being the value of the platform parameter.

Approaching the pipelines in this way ensures that the details of the steps are hidden from the project, yet centralized. If a change needs to be done in the steps, it can be done in one place and the changes will be propagated to all pipelines that use the templates. This is particularly useful when the solution grows and the number of projects becomes larger, as it doesn’t require changing each individual pipeline.

Versioning

A typical requirement is to be able to identify the version of the software. This is important for various reasons, such as:

  • Better communication with the customers
  • Easier identification of the changes that are deployed in each environment
  • Marketing activities centered around new releases and updates

This may sound simple, but in reality when having a distributed environment, we must handle the individual versions of each component and the overall version of the product. Therefore, our CI/CD pipeline must be able to handle this requirement.

In order to manage the versioning aspect, as well as having a coherent and consistent behavior across the entire solution, there are a few decisions and steps that must be taken into account.

Versioning of each component

Having a clear and standardized versioning scheme is essential for consistency. There are multiple approaches to this, but the most common one is Semantic Versioning. This approach defines a version as a set of three numbers, separated by dots, as follows: MAJOR.MINOR.PATCH. The version is incremented based on the following rules:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards compatible manner, and
  • PATCH version when you make backwards compatible bug fixes.

We employ semantic versioning in all of the projects that make up the solution. For this, we use the semantic-release tool, which automatically determines the next version number based on the commit messages. The tool is integrated into the CI/CD pipeline and runs after the build step: it analyzes the commit messages and determines the next version number based on the changes that were made. For example, a commit with a BREAKING CHANGE note increments the MAJOR version, a feat commit increments the MINOR version, a fix commit increments the PATCH version, and a chore commit does not trigger a new version at all. This gives us a clear and consistent versioning scheme across all projects.
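As an example, with the default conventional-commit style rules, commit messages map to releases roughly as follows (the exact behavior depends on the configured commit-analyzer preset, and the version numbers are purely illustrative):

fix(api): handle missing authorization header        -> PATCH release (e.g. 1.4.2 -> 1.4.3)
feat(ui): add dark mode toggle                        -> MINOR release (e.g. 1.4.3 -> 1.5.0)
feat(api): drop the legacy /v1 endpoints
  BREAKING CHANGE: /v1 routes are no longer served    -> MAJOR release (e.g. 1.5.0 -> 2.0.0)
chore(deps): bump eslint                              -> no release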

The semantic-release tool will also generate release notes based on the commit messages, which can be used to communicate the changes to the customers. Also, the tool will tag the repository with the version number, which allows us to easily identify the version of each component.

To set up the semantic release, a configuration must be defined in the root of the project, in a file named .releaserc. This file contains the configuration for the tool, such as the plugins that are used, the commit message format, the release rules, etc. An example configuration file can be defined as follows:

success: []
fail: []
branches:
  - 'master'
  - 'main'
  - 'staging'
  - 'prod'
  - 'release-+([0-9])?(.{+([0-9]),x}).x'
plugins:
  - '@semantic-release/commit-analyzer'
  - '@semantic-release/release-notes-generator'
  - '@semantic-release/changelog'
  - '@semantic-release/npm'

This configuration file triggers the semantic-release algorithm on the branches named main, master, staging, prod or on any branch that has a name matching the format release-[MAJOR].x or release-[MAJOR].[MINOR].x

The steps taken during the semantic release are:

  1. Analyze the commit messages and determine the next version number (this also takes into account the repo tags and branch name)
  2. Generate the release notes
  3. Generate the changelog
  4. Publish the NPM package (if the project is an NPM package).
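In the pipeline, the semantic-release.yaml job template essentially checks out the repository with credentials and runs the tool. Below is a minimal sketch; the Node version and the token variable are assumptions, and the exact authentication setup for pushing tags and publishing packages depends on the git host and registry, so it is omitted here.

# ci-templates/jobs/semantic-release.yaml (sketch)
jobs:
  - job: SemanticRelease
    displayName: Run semantic-release
    steps:
      - checkout: self
        persistCredentials: true    # lets semantic-release push tags back to the repository
      - task: NodeTool@0
        inputs:
          versionSpec: '18.x'
      - bash: |
          npm ci
          npx semantic-release
        displayName: Analyze commits and release
        env:
          NPM_TOKEN: $(npmToken)    # hypothetical secret, only needed when publishing NPM packages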

Notes

  • We use the semantic-release project even in solutions which are not based on NodeJS. This means that a package.json file must exist even in these projects.
  • For projects that either are not based on NodeJS or that do not actually publish an NPM package, it is enough to place the private: true flag in the package.json file, so that the NPM publish step is skipped.
  • For monorepos that contain sub-projects that are NPM packages, the above configuration is slightly different:
success: []
fail: []
branches:
    - 'master'
    - 'main'
    - 'staging'
    - 'prod'
    - 'release-+([0-9])?(.{+([0-9]),x}).x'
plugins:
    - '@semantic-release/commit-analyzer'
    - '@semantic-release/release-notes-generator'
    - '@semantic-release/changelog'
    - - '@semantic-release/npm'
      - pkgRoot: ./dist/packages/my-lib-1
    - - '@semantic-release/npm'
      - pkgRoot: ./dist/packages/my-lib-2

The configuration above is for an NX project that has the subprojects in the packages folder and that publishes two NPM packages: my-lib-1 and my-lib-2. The pkgRoot parameter instructs the semantic-release tool to publish the NPM package from the specified folder.

Branching and merging strategies

Although this is a whole topic in itself, I will briefly outline the branching and merging strategy that we use in our projects, as it also affects the CI/CD pipelines and the versioning.

Looking back at our standard environments - dev, staging and prod, our versioning strategy is linked to the environments. In a nutshell, we use the following branching strategy:

  • The main or master branch contains the code that will be deployed in the dev environment. We use both names for historical reasons, but to be consistent it is better to settle on a fixed name e.g. main and use it across all projects.
  • The staging and prod branches contain stable releases that are deployed in the respective environments. Whenever a project is deemed stable enough, it will be merged onto the staging branch. After the QA process gives the green light, it will be merged into the prod branch. The prod branch is the one that is deployed in the production environment.
  • Bug fixes are done by branching the corresponding prod or staging branch into a release-[MAJOR].[MINOR] branch. The MINOR version used as a basis for the branch should be the highest one for that specific [MAJOR]. The semantic-release configuration ensures that any new versions created on this branch can only affect the [PATCH] version and that they will be higher than the initial [MAJOR].[MINOR].[PATCH] version. This also ensures that changes on this branch cannot produce versions that already exist (semantic-release will report an error). Once the bug is fixed, the branch is merged back into main so that newer versions contain the fix.
  • New features are developed on feature branches called feature-[name]. Changes pushed here will not create new versions.

The branching strategy can be extended with other rules, such as - commits to main branch can be done directly, while merges into the staging and prod branches must be done through pull requests. This ensures that the code is reviewed before being merged into the stable branches.

From a CI/CD perspective:

  • the project pipeline will run an automatic CI build for commits on any of these standard branches (main, master, staging, prod, release-..., feature-...)
  • the master CI pipeline, which I will describe below, is triggered continuously on the main branch, while the staging and prod deployments are triggered manually, when a new version is ready to be deployed. The release-... and feature-... branches are not deployed in any environment, they are only used for development purposes, therefore the master CI pipeline is not triggered on these branches. Optionally, depending on the bug fix strategy, these rules can be extended to handle the release-... branches as well (treating them identically to the prod branch).

There are multiple branching strategies that can be employed and the semantic-release tool can be configured to work with any of them. The above strategy is just an example, but it is important to have a clear strategy that is followed across all projects.

Versioning of the product

As mentioned before, even though each project in the whole solution has its own version, we also need to have a version for the entire solution. Also, we need control over when the version is incremented, as we don’t want to have a new version for each commit that is pushed to the main branch. This is where the master CI pipeline comes into play.

A master CI pipeline describes the steps that are taken for the entire product. It uses deployments, jobs and steps from the same common template project that is used by the project pipelines. The master CI pipeline is triggered continuously on the main branch and manually for the staging and prod environments, and it is responsible for the following:

  1. Set product version stage. Generates a new version (the version is not used until the end of the pipeline, but we need it from the beginning for the next steps). For the version of the whole product we don’t use semantic versioning; instead, the versions are controlled in a semi-automatic way: the product MAJOR and MINOR versions are defined as build variables, which must be adjusted manually, while the PATCH version is a timestamp (or, optionally, an incremental value). Releasing a new official version (whether minor or major) is typically an activity that must be coordinated with other business departments (such as marketing, sales, etc.), therefore the decision of switching to a new version involves a manual step.
  2. Deployment stage. This stage has individual jobs for each microservice or component of the solution that must be deployed to an environment. The environment, as well as its settings, is parameterized, so that by changing the environment name the pipeline will target dev, staging or prod, respectively.
  3. Update product version stage. The final step in the master CI pipeline is to update the product version. This is reflected either in setting some global value or in updating a ConfigMap or Secret in the Kubernetes environment. It is done at the end of the pipeline so that, if any of the previous steps fail, the version is not updated.

The master CI pipeline is set up with continuous triggering for the main/master branches of each project and manual triggering for the staging and prod branches. I consider this a good practice because, even if the staging or prod merges are guarded by a PR, in the end the whole deployment is managed as a whole, generating a single new version. Conversely, any push to the main branch of a project will trigger the deployment to the dev environment, which ensures that dev is always up to date with the latest changes. As an optional alternative, staging can be set up as a continuous deployment, but only when the staging branches are guarded by a PR.
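Putting these stages together, a skeleton of such a master CI pipeline could look like the sketch below. The template names (deploy-helm-release.yaml in particular), variable names, and the environment parameter are assumptions used purely for illustration; only update-product-version.yaml matches a template name mentioned earlier.

# master-ci.yaml (sketch)
parameters:
  - name: environment
    displayName: Target environment
    type: string
    default: dev
    values: [dev, staging, prod]

variables:
  productMajor: 2     # adjusted manually when a new MAJOR version is planned
  productMinor: 5     # adjusted manually when a new MINOR version is planned

resources:
  repositories:
    - repository: ci-pipelines
      type: git
      name: my-devops-project/ci-pipelines

stages:
  - stage: SetProductVersion
    jobs:
      - job: ComputeVersion
        steps:
          - bash: |
              # PATCH is a timestamp, so every run of the master pipeline yields a unique, increasing version
              PATCH=$(date +%Y%m%d%H%M)
              VERSION="$(productMajor).$(productMinor).$PATCH"
              echo "##vso[build.updatebuildnumber]$VERSION"
            displayName: Compute product version

  - stage: Deploy
    dependsOn: SetProductVersion
    jobs:
      # One deployment job per microservice/component, all parameterized with the target environment
      - template: ci-templates/jobs/deploy-helm-release.yaml@ci-pipelines   # hypothetical template
        parameters:
          environment: ${{ parameters.environment }}
          release: my-service

  - stage: UpdateProductVersion
    dependsOn: Deploy
    jobs:
      # Assumed to be a jobs-level template that reads the product version from the run number set above
      - template: ci-templates/deployments/update-product-version.yaml@ci-pipelines
        parameters:
          environment: ${{ parameters.environment }}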

Reentrant pipelines

An important aspect of a healthy CI/CD process is that the pipelines must be reentrant. This means that running the same pipeline multiple times should not cause issues and, ideally, if there are no changes between runs, there should be no net effect. Having the pipelines designed in such a way makes it easy to re-run them in case of failures, or to re-run them after changes in the environment. There are many situations in which a pipeline job can fail, such as:

  • The pipeline agent is not available
  • Timeout during the pipeline execution
  • Temporary outage of the external services (such as Docker registry, Helm registry, etc)
  • etc.

In such cases, the pipeline should support the option of retrying the failed jobs or the whole pipeline altogether.
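One practical way to achieve this is to build the deployment steps on naturally idempotent commands. A minimal sketch, assuming hypothetical pipeline variables for the release name, chart reference, namespace, and values file:

steps:
  # 'helm upgrade --install' is idempotent: re-running it with the same chart version
  # and values leaves the release unchanged, so the job can be retried safely after a failure.
  - bash: |
      helm upgrade --install "$(releaseName)" "$(chartRef)" \
        --version "$(chartVersion)" \
        --namespace "$(namespace)" --create-namespace \
        --values "$(valuesFile)" \
        --atomic --wait
    displayName: Deploy Helm release (reentrant)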

Other improvements

There are many improvements that can be done to the outlined pipelines, but they are out of the scope of this article. Just to name a few:

  • Setting up pipelines that create a complete environment from scratch, run e2e tests and then drop the environment
  • Integrating static and dynamic vulnerability scanning tools into the build/deployment process
  • Integrating code quality tools into the build/deployment process
  • Integrating performance testing tools into the build/deployment process
  • Integrating observability tools into the build/deployment process
  • etc.

Conclusions

This article delves into the crucial considerations when designing a robust CI/CD (Continuous Integration and Continuous Delivery) pipeline for your software projects. While it’s an extensive read, it’s essential to address these key aspects:

  1. Single Source of Truth: The CI/CD pipeline serves as the authoritative source for all deployments. It ensures consistency and reliability across environments.

  2. Software Development Principles: Treat the CI/CD pipeline like any other software project. Emphasize reusability and modularity, following the “Don’t Repeat Yourself” (DRY) principle.

  3. Handling Diverse Changes: Design the pipeline to accommodate multiple types of changes, including releases, bug fixes, and new features.

  4. Reentrancy: Create a pipeline that can be rerun without adverse effects, crucial for addressing failures or adapting to evolving requirements.

  5. Versioning Control: Manage versioning at both the component and product levels. Consider adopting semantic versioning for components and ensure clear control over the product’s version.

  6. Branching and Merging Strategies: Implement a well-defined branching and merging strategy across the project, aligning it with CI/CD practices.

  7. Support for Multiple Environments: Craft the pipeline to seamlessly adapt to various deployment environments, such as development, staging, and production.

  8. Scalability for Multiple Projects: Ensure that the pipeline can accommodate multiple projects within your ecosystem with minimal adjustments.

  9. Technology Agnosticism: Design a pipeline flexible enough to cater to various technologies used in your organization’s projects.

  10. Deployment Flexibility: Enable the pipeline to target different deployment destinations, supporting diverse deployment targets.

By incorporating these considerations into your CI/CD pipeline design, you can establish a robust, adaptable, and efficient pipeline that aligns with best practices in software delivery.

Having a well-designed CI/CD pipeline is essential for any distributed system. It ensures that the environments are always in sync and that the changes are propagated in a consistent manner. Also, it reduces the risk of human error, as the steps are always the same and the pipeline can be tested and validated. Using the CI/CD pipeline as the single source of truth also allows us to easily identify the changes that are deployed in each environment, as we can always check the pipeline history and see what was deployed and when. It is one of the most crucial aspects of a distributed system and it must be designed with care early in the stages of the project.

As with any other part of the project, the CI/CD pipeline must be designed with the future in mind, but it must also be flexible enough to allow changes and improvements as the project evolves. It is important to have a clear strategy and to follow it across all projects, but it is also important to be able to adapt to the changing requirements and to be able to improve the pipeline as the project evolves. Using templates and modularizing the pipeline is a good way to achieve this without having to rewrite the pipeline from scratch for each project, especially when the number of such projects grows over time.

Written on September 17, 2023