I’ll describe our Continuous Integration (CI) pipeline, which several teams use to develop software that is later deployed as Docker containers. Our programmers use git to develop new features in feature branches, which are usually merged into the master branch. The master branch represents the current development state of the software. Most projects also have a production branch, which always contains code that is ready to be deployed to production at any given moment. Some software packages use a versioned release model, so each major version has its own branch.
As developers write code they always run at least the unit tests locally on their development machines. New code is committed to a feature branch, which another developer then merges into master and pushes to the git repository. This triggers a build on a Jenkins server. We have several Jenkins environments; the most important are testing and staging. Testing provides a CI environment where developers can verify that their code is production compatible, and staging is for the testing team, so that they have time to test a release candidate before it’s actually deployed to production.
High level anatomy of a CI server
The CI server runs Linux with Docker support. A Jenkins instance is currently installed directly on the host system instead of in a container (we had some issues with running it in one, as it needs to launch containers itself). Two sets of backend services are also started on the server. The first is a minimal set required for a development environment: these containers run in --net=host mode and bind to their default ports, and they are used for the unit tests.
Then there’s a separate set of services in containers which together look just like the production environment. These services also obtain fresh backups of the production databases, so that developers can test new code against a copy of live data. They run in the default Docker bridge mode, so each has its own IP address from the 172.18.x.x address space. More on this later.
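As a rough sketch, the two service sets could be started along these lines. This is a dry run: the DOCKER variable prints each command instead of executing it (set DOCKER=docker to actually launch containers), and all image and container names are placeholders, not our real ones:

```shell
# Dry run: DOCKER prints the docker commands instead of executing them.
# Set DOCKER=docker to actually start the containers.
DOCKER="echo docker"

start_backend_sets() {
  # Minimal set for unit tests: host networking, default ports on localhost.
  $DOCKER run -d --net=host --name mongodb-dev mongo
  $DOCKER run -d --net=host --name redis-dev redis

  # Production-like set: default bridge networking, so each container
  # gets its own 172.18.x.x address and carries a production service name.
  $DOCKER run -d --name mongodb-cluster-a-node-1 mongo
  $DOCKER run -d --name redis-cluster-a-node-1 redis
}

start_backend_sets
```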
Services for the development environment and for running unit tests
Developers can run a subset of the required services on their development laptops. For each database type (redis, mongodb etc.) only a minimal number of processes is started (say, one mongodb, one redis and no more), so that the environment doesn’t consume too many resources. The tests can still be programmed to assume that actual databases are available. When the tests are executed on a CI machine, a similar set of services is found on the same localhost ports. So the CI machine has both a set of services bound to the default ports on localhost and a separate set of services which represents the production environment (see the next paragraph).
We also provide a single Vagrant image which contains this minimal set of backend services in containers, so that developers can easily spin up the development environment on their laptops.
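A minimal Vagrantfile for such an image might look roughly like this; the box name and images are assumptions, not our actual setup:

```ruby
# Hypothetical sketch: a VM provisioned with Docker that starts the
# minimal backend set on boot. Box and image names are placeholders.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision "docker" do |d|
    d.run "mongodb-dev", image: "mongo", args: "--net=host"
    d.run "redis-dev",   image: "redis", args: "--net=host"
  end
end
```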
When a build is triggered, the following sequence is executed. A failure in any step breaks the execution chain and the build is marked as a failure:
- Code is pulled from the git repository.
- The code includes a Dockerfile, which is used to build a container image tagged with the git revision id. As a result, each commit id maps to exactly one container image.
- The container is started so that it executes the unit test suite inside the container. The suite accesses a set of empty databases which are reserved for unit and integration testing.
- The integration test suite is executed: first a special network container is started with --name=networkholder. It acts as the base for the container network and runs redir, which redirects certain ports from inside the container to the host system, so that dependent services (like redis, mongodb etc.) can be accessed on the container’s “localhost”. Then the application container is started with --net=container:networkholder, so that it reuses the network container’s network stack, and the application starts listening for incoming requests. Finally a third container is started in the same network space (--net=container:networkholder) and executes the integration test suite.
- A new application container is started (usually replacing the container from the previous build) so that developers can access the running service across the network. This application container has access to a production-like set of backend services (like databases) containing a fresh copy of the production data.
- A set of live tests is executed against the application container launched in the previous step. These tests are written so that they could run continuously in production. This step verifies that the build works with a deployment similar to the one production has.
- The build is now considered successful. If it was a staging build, the container image is uploaded to a private Docker registry so that it can be deployed to production. Some services run an additional container build in which all build tools and other unnecessary binaries are stripped from the image for security reasons.
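The sequence above can be sketched as a dry run. The image name myapp, the test script names and the redir image are placeholders, and the DOCKER variable prints the commands instead of running them (set DOCKER=docker to execute them for real):

```shell
# Dry run: DOCKER prints the docker commands instead of executing them.
DOCKER="echo docker"
# Tag comes from the git revision id; "unknown" is only a fallback
# so the sketch runs outside a git checkout.
REV=$(git rev-parse HEAD 2>/dev/null || echo unknown)

build_steps() {
  # One image per commit: the tag is the git revision id.
  $DOCKER build -t "myapp:$REV" .

  # Unit tests run inside the freshly built container.
  $DOCKER run --rm "myapp:$REV" ./run-unit-tests.sh

  # Integration tests: the network holder owns the network stack and
  # runs redir; the app and the test suite join its network namespace.
  $DOCKER run -d --name networkholder myapp-redir
  $DOCKER run -d --net=container:networkholder --name myapp "myapp:$REV"
  $DOCKER run --rm --net=container:networkholder "myapp:$REV" ./run-integration-tests.sh
}

build_steps
```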
Service naming in testing, staging and production
Each of our services has a unique service name, for example a set of mongodb services would have the names “mongodb-cluster-a-node-1” to “mongodb-cluster-a-node-3”. The name is used to create a DNS record like “mongodb-cluster-a-node-1.us-east-1.domain.com”, so that each production region has its own domain (here “us-east-1.domain.com”). All our machines add the region domain to the search path in /etc/resolv.conf, so applications can find the services using the plain service name, without the domain. This has the additional benefit that we can run the same backend services on our CI servers: each server has a local DNS resolver which resolves the service names to the local Docker containers.
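For example, the resolver configuration on a production host in us-east-1 would contain roughly this (the nameserver address is a placeholder):

```
# /etc/resolv.conf on a production host in the us-east-1 region
# (nameserver address is a placeholder)
search us-east-1.domain.com
nameserver 10.0.0.2
```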
Consider this setup:
- The application config has a setting like mongodb.host = “mongodb-cluster-a-node-1:27017”
- In the production environment, the service mongodb-cluster-a-node-1 is deployed on some arbitrary machine and a DNS record is created: mongodb-cluster-a-node-1.us-east-1.domain.com A 10.2.2.1
- The testing and staging environments both run the mongodb-cluster-a-node-1 service locally in a container. This container has its own IP address, for example 172.18.2.1.
When the application runs in testing or staging, it resolves mongodb-cluster-a-node-1. The request goes to a local dnsmasq on the CI machine, which resolves the name “mongodb-cluster-a-node-1” to the container at IP 172.18.2.1. The application connects to this IP, which is local to the same machine.
When the application runs in production, it resolves mongodb-cluster-a-node-1. The request goes to the libc DNS lookup code, which uses the search property from /etc/resolv.conf. As a result, a DNS query is eventually made for mongodb-cluster-a-node-1.us-east-1.domain.com, which returns an IP on some arbitrary machine.
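On the CI machine, the corresponding dnsmasq rule can be as simple as a single address mapping (using the container IP from the example above):

```
# dnsmasq entry on the CI server: resolve the bare service name
# directly to the local container's bridge IP
address=/mongodb-cluster-a-node-1/172.18.2.1
```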
This setup allows us to use the same configuration in the testing, staging and production environments, so that we can verify that all high availability client libraries can connect to all required backends and that the software will work in the production environment.
This setup suits our needs quite well. It leverages the sandboxing which Docker gives us and enables us to do new deployments with great speed: the CI server needs around three minutes to finish a single build, plus two minutes for deployment with our Orbitctl tool, which deserves its own blog post. The developers can use, in a compact Vagrant environment, the same containers we use to run our actual production instances, reducing the overhead of maintaining separate environments.