Continuous Integration with Docker-REDHAWK

Introduction

Over the last year I’ve published a number of blog posts, ranging from how to use Docker-REDHAWK up through some of the challenges of getting a functional GitLab instance set up with a secure Docker container registry (via TLS). I also posted about using Docker-Compose to test REST-Python. For this post, we’ll focus on transitioning that last part into a GitLab CI pipeline in such a way that we:

  1. Permit concurrent test environments on the same host
  2. Clean up after ourselves (both at the build runner and container registry)

Before we get too far into this: the setup I describe below uses a GitLab CI Runner configured with /var/run/docker.sock mapped into its job containers. This allows appropriately configured containers (e.g., those based on the docker image) to control the host’s Docker daemon.
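For reference, that socket mapping lives in the runner’s config.toml. A minimal sketch, assuming the Docker executor and omitting unrelated settings:

[[runners]]
  executor = "docker"
  [runners.docker]
    image   = "docker:latest"
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]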

Test Environment

REST-Python is our fork of the REDHAWK SDR program’s older REST backend to a REDHAWK system. It supports a number of great features like event channels and server-side decimation (to support lighter-weight edge devices). And since it exposes endpoints into REDHAWK’s infrastructure and assets, it needs a running system for testing. Each element of that infrastructure is already wrapped in a Docker-REDHAWK image (a configurable Domain is in geontech/redhawk-domain, for example), so the effort here is to set up those services around the REST-Python environment and run its tests against those known endpoints.

Attempt 1: GitLab CI YAML services

GitLab’s built-in CI YAML format lets you specify an image for a job as well as services. Each of these ultimately points to a Docker image name in a repository somewhere (DockerHub, for example). At face value, then, I expected to set up a job like this:

test:
  stage: test
  variables:
    OMNISERVICEIP: omniserver
  image:
    name: INTERNAL_TEST_IMAGE
    entrypoint: ["bash", "-l", "-c"]
  services:
    - name: geontech/redhawk-omniserver
      alias: omniserver
    - geontech/redhawk-domain
    - geontech/redhawk-gpp
  script:
    - nameclt list
    - ./test.sh

The test.sh script would then use the SCA File Manager on the domain to copy any test apparatus assets (waveforms, etc.) into the SDRROOT so that we can launch a signal generator waveform and test BULKIO streaming, for example. Piece of cake, right?

Unfortunately, the three services representing the OmniORB service, a Domain, and a GPP have no communication with one another, even though nameclt list would prove connectivity from the job’s container to the omniserver container, for example. The reason is that, behind the scenes, GitLab links the service containers to the job container but not to one another. That’s not terribly helpful, but it’s a start.
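Conceptually, the runner does something along these lines (a simplified sketch of the linking behavior, not the runner’s actual commands):

# Each service container is started and linked to the job container only:
docker run -d --name svc-omniserver geontech/redhawk-omniserver
docker run -d --name svc-domain     geontech/redhawk-domain
docker run -d --name svc-gpp        geontech/redhawk-gpp
docker run --link svc-omniserver:omniserver --link svc-domain --link svc-gpp ... INTERNAL_TEST_IMAGE
# svc-domain has no name for 'omniserver', so the Domain never finds the naming service.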

Attempt 2: Docker-Compose services

An alternative route to setting up services is to define your test system as a Docker Compose stack. Each container defined in the compose file can then share a named virtual network with the others, which ensures that every service can communicate. Going this route also allows concurrent instances of testing on the same host, since the network can be uniquely named. For example:

networks:
  backend:
  frontend:

services:
  omniorb:
    networks:
      - backend
  rest:
    networks:
      - backend
      - frontend

Not shown in the example above are the rest of the service definitions; we’ll get to that in a moment.

To keep each stack’s name unique, we’ll use the docker-compose option -p (project) when launching the environment. There are a variety of CI_... built-in variables that can be used here to ensure the name is unique. If we use CI_COMMIT_REF_SLUG, for example, it resolves to the branch name, slugified so it will fit Docker-Compose’s required format. See GitLab’s list of predefined environment variables for more creative options.
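To sketch why this works (branch names below are illustrative), the project name prefixes every container and network Compose creates, so two branches can run the same stack side by side:

docker-compose -p feature-streaming up -d   # creates feature-streaming_redhawk, feature-streaming_rest_1, ...
docker-compose -p develop up -d             # a second, independent copy of the same stack
docker-compose -p feature-streaming down    # tears down only that branch's stack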

One area where environment variables will get resolved is in the image definition of a service. That’s handy for us because we can define it as a variable name with a default, and then export our build’s unique image name prior to running docker-compose. That makes the compose file reusable both in the build environment and at a developer’s desk. For example, the compose file’s REST service would be defined:

services:
  rest:
    image: ${CONTAINER_TEST_IMAGE:-geontech/redhawk-webserver}
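At a developer’s desk, the same file can point at a locally built image simply by exporting the variable first (the image name below is illustrative):

export CONTAINER_TEST_IMAGE=registry.example.com/rest-python:my-branch
docker-compose up -d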

Note: We could also have defined two compose files, with the test-specific overrides living in a docker-compose.override.yml.

Important: The CONTAINER_TEST_IMAGE variable is defined globally in the .gitlab-ci.yml as $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG.

In this way, if a user wants to use this compose file for local testing, they will end up with the publicly named geontech/redhawk-webserver image, or they can easily replace it with their own locally cached name. Let’s look at the complete compose file:

version: "3.4"
services:
  # OmniORB services
  omniorb:
    image: geontech/redhawk-omniserver:${REDHAWK_VERSION:-2.0.8}
    hostname: omniorb
    networks:
      - redhawk

  # Standard REDHAWK SDR Domain
  domain:
    image: geontech/redhawk-domain:${REDHAWK_VERSION:-2.0.8}
    depends_on:
      - omniorb
    environment:
      - DOMAINNAME=REDHAWK_DEV
      - OMNISERVICEIP=omniorb
      - OMNISERVICEPORTS=19000:19050
    networks:
      - redhawk

  # Standard GPP 
  gpp:
    image: geontech/redhawk-gpp:${REDHAWK_VERSION:-2.0.8}
    depends_on:
      - domain
    environment:
      - DOMAINNAME=REDHAWK_DEV
      - NODENAME=GPP_Node
      - GPPNAME=GPP
      - OMNISERVICEIP=omniorb
      - OMNISERVICEPORTS=19100:19150
    networks:
      - redhawk

  # FEI FileReader Device runner mounted to the snapshot.
  fei_file_reader:
    image: geontech/redhawk-fei-filereader:${REDHAWK_VERSION:-2.0.8}
    build: https://github.com/GeonTech/FEI_FileReader
    depends_on:
      - domain
    environment:
      - DOMAINNAME=REDHAWK_DEV
      - NODENAME=FEI_FR_Node
      - OMNISERVICEIP=omniorb
      - OMNISERVICEPORTS=19200:19250
    networks:
      - redhawk

  # REST-Python Unit Under Test (UUT)
  rest:
    image: ${CONTAINER_TEST_IMAGE:-geontech/redhawk-webserver}
    depends_on:
      - domain
      - gpp
      - fei_file_reader
    environment:
      - OMNISERVICEIP=omniorb
    command: [sleep, '3600']
    networks:
      - redhawk
      - web

networks:
  redhawk:
  web:

As one can see, there are two network names that are easily namespaced in a CI environment (via the compose project name) so that concurrent test environments will not collide. You’ll also notice the command for the REST service is overridden here to simply sleep; this keeps its supervisord instance from starting a second REST service inside the container at startup (after all, we’ll be running the server manually during testing, so we don’t need the other one). Other items of interest are the OMNISERVICEIP environment variable, set to the omniorb service’s name, and the OMNISERVICEPORTS variable, which constrains OmniORB’s port selection strategy.

Compose-based Testing in GitLab

As mentioned in the introduction, the GitLab CI Runner configured in this system has access to /var/run/docker.sock. This allows us to define jobs using the docker image, for example, to control the host’s Docker daemon. That is our strategy here: use docker-compose from within a Docker container, and then call docker-compose exec ... to manipulate the test environment. Let’s take a look at that idea.

Build It

First, the build stage where we take our local Dockerfile and build a version of the REST service that we’ll test.

.dind: &dind
  image: docker:latest

.container_registry: &container_registry
  <<: *dind
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  after_script:
    - docker logout $CI_REGISTRY

build:
  stage: build
  <<: *container_registry
  script:
    # Patch and build the app
    - sed -i -E "s/(geontech.+\:)(.+)/\1"$REDHAWK_VERSION"/g" Dockerfile
    - docker build --pull --rm -t $CONTAINER_TEST_IMAGE .
    # Login, push, clean-up
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker push $CONTAINER_TEST_IMAGE
    - docker logout $CI_REGISTRY
    - docker rmi $CONTAINER_TEST_IMAGE

If you’re familiar with Docker-REDHAWK (CentOS), you know the Dockerfiles are all templates that get patched at build time with the target version of REDHAWK SDR. In the above job’s script I’m doing effectively the same thing: an in-place patch of the Dockerfile. The script then logs into the local container registry (provided by our onsite GitLab instance) and pushes the image. The last step deletes the test image from the local build system to avoid cluttering it with build artifacts.
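To make the patch concrete, here is what that sed expression does to an illustrative FROM line (assuming REDHAWK_VERSION is set to 2.0.8; the base image name is an example):

# before:  FROM geontech/redhawk-webserver:latest
# after:   FROM geontech/redhawk-webserver:2.0.8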

Test It

test:
  stage: test
  <<: *container_registry
  script:
    # Install docker-compose
    - apk add --no-cache py2-pip && pip install docker-compose

    # Change to 'tests', pull the test image, and bring up the stack
    # without rebuilding it.  Then install other assets and run the test.
    # Once finished, bring down the stack and clean up before returning
    # the test result code.
    - cd tests
    - docker pull ${CONTAINER_TEST_IMAGE}
    - docker-compose -p ${CI_COMMIT_REF_SLUG} up -d --no-build
    - docker-compose -p ${CI_COMMIT_REF_SLUG} exec -T rest
      bash -l -c 'yum install -y rh.SigGen rh.FileWriter'
    - docker-compose -p ${CI_COMMIT_REF_SLUG} exec -T rest
      bash -l -c './test.sh' || RESULT=$?
    - docker-compose -p ${CI_COMMIT_REF_SLUG} down
    - docker rmi ${CONTAINER_TEST_IMAGE}
    - exit ${RESULT}

There are several comments in that code block which describe the main concepts well. You can see where we install docker-compose, pull the test image, and then run docker-compose ... up to bring up all the containers, which are now namespaced uniquely thanks to the -p (project) option.

We then use docker-compose exec -T ... to run a few commands in the rest service’s container. Because we’re executing specific commands non-interactively via bash -l -c ..., we pass the -T option to disable pseudo-TTY allocation for the exec. Now, one thing you probably noticed is that the exec that runs the test.sh script is ORed (||) with a command that stores the return code ($?) in a variable. This ensures that even if tests fail, the job still runs docker-compose down, which destroys all containers and the network(s) related to the compose file (because, again, we want to be good citizens and clean up after ourselves). We then exit with that return code so that any failure(s) can halt the pipeline as appropriate.
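The capture-then-teardown pattern, in isolation, looks like this (a minimal sketch; initializing RESULT covers the passing case):

RESULT=0
docker-compose -p "${CI_COMMIT_REF_SLUG}" exec -T rest bash -l -c './test.sh' || RESULT=$?
docker-compose -p "${CI_COMMIT_REF_SLUG}" down    # always runs, pass or fail
exit ${RESULT}                                    # propagate the test result to GitLab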

More Background: the test.sh runner script in REST-Python uses the REDHAWK SDR Python packages (from ossie.utils import redhawk, sb) to make test assets and then deploy them into the SDRROOT. That is why we first yum install ... a few REDHAWK assets local to the REST server, build the test assets, and then copy them to the Domain’s SDRROOT via its File Manager interface:

import os

# ossie.utils ships with REDHAWK SDR; sb is the sandbox, redhawk attaches to a domain.
from ossie.utils import redhawk, sb

# 'Default' is the test suite's constants module (component and waveform names);
# the import path here is illustrative.
from defaults import Default


def install_app():
    FILE_NAME = '{0}.sad.xml'.format(Default.WAVEFORM)
    INSTALL_DIR = os.path.join('/', 'waveforms', Default.WAVEFORM)
    INSTALL_FILE = os.path.join(INSTALL_DIR, FILE_NAME)

    # Make a simple waveform and render it as SAD XML
    sb.launch(Default.COMPONENT)
    waveform_xml = sb.generateSADXML(Default.WAVEFORM)

    # Install the waveform in the SDRROOT of the running domain
    domain_name = os.getenv('DOMAINNAME', 'REDHAWK_DEV')
    omniorb_ip = os.getenv('OMNISERVICEIP', 'localhost')
    try:
        print 'Attaching to domain "{0}" on naming service "{1}"'.format(
            domain_name, omniorb_ip)
        dom = redhawk.attach(domain_name, omniorb_ip)

        if not dom.fileMgr.exists(INSTALL_DIR):
            print 'Creating ' + INSTALL_DIR
            dom.fileMgr.mkdir(INSTALL_DIR)
        print 'Writing ' + INSTALL_FILE
        waveform_sca = dom.fileMgr.create(INSTALL_FILE)
        waveform_sca.write(waveform_xml)
        print 'Finished.  Closing file.'
        waveform_sca.close()
    except Exception as exc:
        print 'Failed to install waveform: {0}'.format(exc)

Note: Since we’re using Compose, we could also just as easily define a volume shared between the domain and rest containers rather than using the File Manager. Your docker-compose down would then need the -v (--volumes) option as well.
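A minimal sketch of that alternative (the mount path is REDHAWK’s conventional SDRROOT and may differ in your images):

services:
  domain:
    volumes:
      - sdrroot:/var/redhawk/sdr
  rest:
    volumes:
      - sdrroot:/var/redhawk/sdr

volumes:
  sdrroot: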

Clean It

When you’re finished testing and have already merged the code into a protected branch, perhaps it’s time to delete that branch and its associated test image (again: good citizen, clean up after yourself). One of the downsides of the GitLab container registry is that deleting images pushed to it is only documented in terms of the UI, which makes cleaning up after automated processes very time consuming if you have many projects. In fact, as of today there are at least a dozen open issues on GitLab CE and EE where both free and paid users have been requesting this feature for over a year! So what do we do?

Well, I found this post, linked by one of those users, which loosely spelled out an approach of probing the various REST APIs (GitLab and the container registry) to ultimately send a DELETE for the image. I’m going to take a moment to fill in the gaps, since their server environment is slightly different. For us, the registry and SCM servers are one and the same, like the default described by GitLab. The SCM UI is on port 443, and the container registry is on port 4567. Therefore we have two different URLs we have to touch in order to resolve exactly what to delete from the registry.
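To make that split concrete, the endpoints and image name look something like this (hostnames are illustrative):

# GitLab UI / API:     https://gitlab.example.com          (port 443)
# Container registry:  https://gitlab.example.com:4567
# Full image name:     gitlab.example.com:4567/group/rest-python:branch-slug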

The script below also requires you to define Secret Variables for GITLAB_USER and GITLAB_PAT. The former is a user with Master* access to the repository, and the latter is a Personal Access Token for that user. If you prefer, you could use an Impersonation Token in place of the Personal Access Token. Either way, the token must have API scope, since the user will be sending a DELETE.

*Note: The GitLab permissions model states that both Developer and Master can delete images from the registry. However in practice, I’ve found this to be bugged. The Developer user can delete an image when logged into the UI but not through the REST API.

Here’s the resulting script which takes the full image name as the first argument:

#!/usr/bin/env bash
echo "${1:?Full image name (registry, tag, etc.) is required}"
[ -n "${GITLAB_USER}" ] || echo "You must set GITLAB_USER to a GitLab user with adequate permissions"
[ -n "${GITLAB_PAT}" ] || echo "You must set GITLAB_PAT to the personal access token for GITLAB_USER"
[ -n "${GITLAB_USER}" ] && [ -n "${GITLAB_PAT}" ] || exit 1

# ###########################################################################
# First, replace the existing image with a relatively empty one to save space.
# ###########################################################################
echo "Replacing ${1} with an empty image at the registry to save space"
DIR='/tmp/docker-dummy'
mkdir -p $DIR

# generate a file containing a random string
(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 13; echo '') > ${DIR}/dummyfile
# generate the dummy image with only that one file 
echo "FROM scratch" > ${DIR}/Dockerfile
echo "ADD dummyfile ." >> ${DIR}/Dockerfile
# build and push it
docker build -t $1 ${DIR}/
docker push $1

# ###########################################################################
# Second, split the first argument into registry, image, and tag
# ###########################################################################
pattern='^([^/^:]+)(:?[0-9]*)/(.+):(.+$)'
[[ $1 =~ $pattern ]]
GITLAB=${BASH_REMATCH[1]}
REGISTRY=${GITLAB}${BASH_REMATCH[2]}
IMAGE=${BASH_REMATCH[3]}
TAG=${BASH_REMATCH[4]}

# ###########################################################################
# Third, get the user's registry token from GitLab using the gitlab user and 
# personal access token.
# ###########################################################################
echo "Retrieving a token for modifying the repository"
TOKEN=$(curl https://${GITLAB}/jwt/auth \
    --get \
    --silent --show-error \
    -d client_id=docker \
    -d offline_token=true \
    -d service=container_registry \
    -d "scope=repository:${IMAGE}:pull,*" \
    --fail \
    --user ${GITLAB_USER}:${GITLAB_PAT} \
    | sed -r "s/(\{\"token\":\"|\"\})//g")

# ###########################################################################
# Fourth, get the manifest that will be deleted
# ###########################################################################
echo "Retrieving the manifest related to that token"
MANIFEST=$(curl https://${REGISTRY}/v2/${IMAGE}/manifests/${TAG} \
    --head \
    --fail \
    --silent --show-error \
    -H "accept: application/vnd.docker.distribution.manifest.v2+json" \
    -H "authorization: Bearer ${TOKEN}" \
    | grep -i "Docker-Content-Digest" \
    | grep -oi "sha256:\w\+")

# ###########################################################################
# Fifth, delete the image via the manifest.
# ###########################################################################
echo "Deleting image..."

curl "https://${REGISTRY}/v2/${IMAGE}/manifests/${MANIFEST}" \
    -X DELETE \
    --fail \
    --silent --show-error \
    -H "accept: application/vnd.docker.distribution.manifest.v2+json" \
    -H "authorization: Bearer ${TOKEN}"

The script has several comments to peruse. The gist is that the GITLAB_USER credentials are used to fetch a token that can access the container registry. The script then uses that token to resolve the tag to its manifest digest before sending the DELETE against that manifest. So what does a clean-up CI task look like?

cleanup-test:
  stage: cleanup
  when: always
  image: alpine:latest
  script:
    - apk add --no-cache git bash curl
    - git clone ${DOCKER_UTILS} docker-util
    - docker-util/delete-image.sh $CONTAINER_TEST_IMAGE

Naturally, you’ll want this to run after you go through a release/tagging process so that you can push the tagged image into longer-term storage, staging, etc. As written, the job always runs (whether the previous stage passes or fails), and the only thing it does is clone a copy of the above delete-image.sh script and run it.
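If you want to guard against deleting an image you still need, one option (a sketch using the same job definition) is to skip the clean-up on tag pipelines:

cleanup-test:
  stage: cleanup
  when: always
  except:
    - tags        # keep the image around for tag/release pipelines
  # ... rest of the job as above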

Warning: One drawback of this DELETE API, compared to say the CLI or DockerHub, is that you cannot delete just a redundant tag. Tags are only links to a given SHA manifest (digest), and the DELETE hook requires the digest as its argument, so this script ends up deleting all tags for the image. Compare this to the CLI, where docker rmi ... deletes a specific tag and only removes the underlying image when you remove the last tag for that manifest.

Conclusion

That’s it for this post about using Docker-REDHAWK within a GitLab CI environment via Compose. We’ve shown how to run through Build, Test, and Clean Up jobs in such a way that we can stand up and test the REST-Python server in a completely functional REDHAWK SDR environment without leaving artifacts behind unnecessarily.

Sometime soon, the above environment for REST-Python will be pushed publicly for interested parties to test. Hopefully, in the meantime, the above was enough information for piecing together your own CI pipeline(s). This model likely works similarly for Jenkins, Travis, CircleCI, etc.; I would love to hear any success stories on other platforms using these images.
