In this article, we explore how Docker tags work, the risks and benefits of using them, and a mechanism for pinning to specific digests to bring us closer to reproducible builds.

How Docker Image Tags Work

Docker tags are mutable, named references to Docker images, much like branch refs in Git. They make it easy for users to pull and run images, and for image authors to roll out updates automatically.

For example, to pull the latest Debian GNU/Linux image for the buster release:

$ docker pull debian:buster

or, similarly, in the FROM line of a Dockerfile:

$ cat Dockerfile-hello-world
FROM debian:buster
CMD [ "echo", "hello world" ]
$ docker build -f Dockerfile-hello-world . -t hello-world
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM debian:buster
 ---> dc2eddc15825
Step 2/2 : CMD [ "echo", "hello world" ]
 ---> Using cache
 ---> 252a30f53d74
Successfully built 252a30f53d74
Successfully tagged hello-world:latest

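Whenever a pull command is issued, the Docker client checks which image the buster tag currently points to and downloads it (if it isn't already cached locally). A build does the same for its FROM image, although the classic builder shown above reuses a locally cached image unless docker build is run with --pull.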

Multiple tags can point to the same image. For example, at the time of writing, latest, buster, 10, and 10.8 all point to the same image. The latest tag is just like any other tag, except that it is the default when pushing or pulling an image without specifying a tag:

$ docker pull debian

The image author can push a new image to the buster tag at any time and for any reason, most likely to ship security and/or bug fixes. From then on, whenever the tag is used to pull an image or in the FROM line of a Dockerfile, that newest image is downloaded and used automatically.

The Problem with Docker Tags

Docker tags make it easy to benefit from the hard work of others by running their images directly or by creating new derived ones. Simply head over to Docker Hub, choose a tag, and each time you pull the image, you'll get the latest version pushed by the authors.

The downside is that every time a tag is pulled, whatever image it currently points to is used. That is often not what you want if you value build reproducibility, and you really should!

Without reproducibility, it can be difficult to isolate issues introduced by dependencies from those introduced in your own application code.

A new version of a tagged image may fix a critical bug or vulnerability that does not affect an application, and at the same time, introduce other major bugs or vulnerabilities that do affect the application. Automatically including this new version in the application could be a costly mistake, not least if the new version resulted from a supply chain attack.

There are other reasons a Docker build may not be reproducible, such as stale local image caches and non-reproducible Dockerfile instructions (for example, a RUN apt-get update that fetches whatever package versions are current), and we shouldn't ignore them either!

Most dependency management systems (eventually) include some mechanism to pin dependencies to fixed versions (e.g., Maven, npm's package-lock.json, Go modules), so that each time an application is built, exactly the same dependencies are used.

Pinning Docker Images

Docker is no different. Since the release of the Docker Registry HTTP API V2, it has been possible to use digests in place of tags when pulling images or in the FROM lines of Dockerfiles.

For example:

$ docker pull debian@sha256:839535f161ac382d771e74b8bb8157be00d3d813345a58fd28aa52e1bf242c91

or

FROM debian@sha256:839535f161ac382d771e74b8bb8157be00d3d813345a58fd28aa52e1bf242c91
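To make these reference forms concrete, here is a minimal Python sketch of how a reference of the form name[:tag][@digest] can be split into its parts. The parse_reference helper and ImageRef type are hypothetical illustrations, not part of any Docker library, and they skip details such as normalizing debian to docker.io/library/debian.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    name: str
    tag: Optional[str]
    digest: Optional[str]


def parse_reference(ref: str) -> ImageRef:
    """Split a reference of the form name[:tag][@digest] (simplified)."""
    name, _, digest = ref.partition("@")
    tag = None
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon after the last slash introduces the tag; an earlier colon
    # belongs to a registry host:port, as in "localhost:5000/app".
    if last_colon > last_slash:
        name, tag = name[:last_colon], name[last_colon + 1:]
    # Like `docker pull debian`, default to "latest" when neither a tag
    # nor a digest is given.
    if tag is None and not digest:
        tag = "latest"
    return ImageRef(name, tag, digest or None)
```

A digest-only reference like the one above yields a digest and no tag, while a bare debian falls back to the latest tag, just as docker pull debian does.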

These digests are SHA-256 hashes of manifest documents stored in a Docker registry, and each one addresses one of:

  • manifest-list: a list of manifests, one for each supported platform
  • manifest: the configuration and the list of layers (with their digests) for a single, platform-specific Docker image
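Because a digest is just a hash of content, any client can check it. The following sketch assumes we already hold the raw manifest bytes exactly as the registry served them; content_digest and verify_digest are hypothetical helper names. The hash must be taken over those exact bytes: re-serializing the JSON (for example, pretty-printing it) changes the bytes and therefore the digest.

```python
import hashlib


def content_digest(raw: bytes) -> str:
    """Return a "sha256:<hex>" digest for exactly these bytes."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def verify_digest(raw: bytes, expected: str) -> bool:
    """True if the raw manifest bytes hash to the expected digest."""
    return content_digest(raw) == expected
```

This is why a digest can be trusted as an immutable identifier: if a single byte of the manifest changes, so does its digest.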

On a linux/amd64 machine:

$ docker pull debian:buster
buster: Pulling from library/debian
Digest: sha256:9d4ab94af82b2567c272c7f47fa1204cd9b40914704213f1c257c44042f82aac
Status: Image is up to date for debian:buster
docker.io/library/debian:buster

Note the digest:

sha256:9d4ab94af82b2567c272c7f47fa1204cd9b40914704213f1c257c44042f82aac

At the time of writing, the digest for debian:buster on linux/amd64 shown on Docker Hub was:

sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b

The Docker CLI is actually showing the digest of a manifest-list, which it automatically resolves to the digest of the manifest for the current platform (as required for running or building).

We can verify this by inspecting the manifest-list:

$ docker manifest inspect debian@sha256:9d4ab94af82b2567c272c7f47fa1204cd9b40914704213f1c257c44042f82aac

which clearly shows that the digest for the linux/amd64 platform matches the one shown on Docker Hub:

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:cc53ac4b589c1c12fa7700a8266a58ae5b13a2c7730949e7bff1a74afc3dedb7",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v5"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:9d2f2c6b0fc442536e9b24b247cb0af2cb671196fc0df00cc0ee18a43a8243e7",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:917ab0af993d3997abb8404f8b309c832d73bc9996cb04cc89c656258c6b3999",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:043b4267e27221c075efb9b567e1264dcb75a46f779b4041bd764d8d438493b2",
         "platform": {
            "architecture": "386",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:1f55a6a90eceb1b622021f7c484248a95e8faded60660ecbf06152f3dfcefc5e",
         "platform": {
            "architecture": "mips64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:10184f1b8dc52c6bb3145868585deeb36d02a141a0bfc3c6bd7cd42adc1b5101",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 529,
         "digest": "sha256:a4661b41a4f92b1bf91aafa05cfa9e039686560713b2785c00e2c73c299ae0a4",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}
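The resolution step the CLI performs can be sketched in Python: given manifest-list JSON like the above, pick the entry whose platform matches. This is a simplified, hypothetical helper (resolve_platform_digest); real clients apply more elaborate matching rules, for instance around default and compatible CPU variants.

```python
import json
from typing import Optional


def resolve_platform_digest(manifest_list_json: str, os_name: str,
                            architecture: str,
                            variant: Optional[str] = None) -> Optional[str]:
    """Return the digest of the first manifest matching the platform, or None."""
    manifest_list = json.loads(manifest_list_json)
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") != os_name:
            continue
        if platform.get("architecture") != architecture:
            continue
        # Only compare variants when the caller asks for a specific one.
        if variant is not None and platform.get("variant") != variant:
            continue
        return entry["digest"]
    return None
```

Asking this sketch for linux/amd64 returns the sha256:a4e85239... digest shown on Docker Hub. Asking for linux/arm without a variant returns the first arm entry in the list (v5 here), which is why the variant matters on ARM.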

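So it's possible to use a tag, a platform-specific image digest, or a manifest-list digest when pulling an image or in a Dockerfile's FROM line. The latter two both give us improved build reproducibility: a manifest-list digest pins the image for every platform at once, while a platform-specific digest pins it for one platform only.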
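Pinning a Dockerfile can be sketched as a small text rewrite. The following Python sketch assumes we already know the digest for each name:tag (for example, from docker pull output as above); the pin_from_lines helper and its digests mapping are hypothetical. It keeps the tag next to the digest (debian:buster@sha256:<digest>), a form the image reference grammar accepts, so readers can still see which tag was pinned while the digest identifies the image. FROM lines that use flags such as --platform are deliberately left untouched.

```python
import re
from typing import Dict

# Matches "FROM <image>" plus anything after it (such as "AS build").
# With a flag like --platform, the captured "image" is the flag itself,
# which is never a key in `digests`, so such lines are left unchanged.
FROM_LINE = re.compile(r"^(\s*FROM\s+)(\S+)(.*)$", re.IGNORECASE)


def pin_from_lines(dockerfile: str, digests: Dict[str, str]) -> str:
    """Append digests to FROM images listed in `digests` (keyed by "name:tag")."""
    pinned = []
    for line in dockerfile.splitlines():
        match = FROM_LINE.match(line)
        if match:
            prefix, image, rest = match.groups()
            # Skip images that are already pinned or that we have no digest for.
            if "@" not in image and image in digests:
                line = f"{prefix}{image}@{digests[image]}{rest}"
        pinned.append(line)
    result = "\n".join(pinned)
    # Preserve a trailing newline if the original file had one.
    return result + "\n" if dockerfile.endswith("\n") else result
```

Because already-pinned images are skipped, running the rewrite twice is harmless; moving a pin to a newer digest is a separate, deliberate step.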

The Problem with Docker Image Pinning

Now that our images are pinned to specific digests, the problem becomes keeping them up to date.

  • How do we know if the good folks at Debian have released a new buster image?
  • How would we adopt it in a controlled way?
  • What was fixed and/or broken?

One solution would be to:

  • Visit Docker Hub each day and find the digests of all the tags being used
  • Try to figure out the upstream source code repository
  • Try to find the specific commits of the source Dockerfile (and its FROM lines if any) and all the packages included
  • Maybe try to figure out which vulnerabilities have been fixed and/or introduced by scanning the image using a third-party tool
  • Decide whether adopting this new image will, on balance, make things better or worse
  • Update all Dockerfiles to the latest digests
  • Rebuild all images, test and release them
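The first step, finding the current digest of every tag in use and spotting which ones have moved, is the easiest part to automate. Here is a Python sketch under a few assumptions: it uses Docker Hub's anonymous token endpoint and a HEAD request for the tag's manifest, reading the Docker-Content-Digest response header, and the helper names current_digest and find_moved_tags are hypothetical. It is not Atomist's implementation, and it does nothing about the harder judgment steps that follow.

```python
import json
import urllib.request
from typing import Dict, Tuple

# Accept manifest lists / OCI indexes so the registry reports the
# multi-platform digest rather than a single-platform manifest.
ACCEPT = ", ".join([
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
])


def current_digest(repository: str, tag: str) -> str:
    """Ask Docker Hub which digest `repository:tag` points to right now.

    `repository` is the full Hub path, e.g. "library/debian" for official images.
    """
    token_url = ("https://auth.docker.io/token?service=registry.docker.io"
                 f"&scope=repository:{repository}:pull")
    with urllib.request.urlopen(token_url) as resp:
        token = json.load(resp)["token"]
    request = urllib.request.Request(
        f"https://registry-1.docker.io/v2/{repository}/manifests/{tag}",
        method="HEAD",
        headers={"Authorization": f"Bearer {token}", "Accept": ACCEPT},
    )
    with urllib.request.urlopen(request) as resp:
        return resp.headers["Docker-Content-Digest"]


def find_moved_tags(pinned: Dict[str, str],
                    current: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {tag: (pinned_digest, current_digest)} for tags that have moved."""
    return {
        tag: (old, current[tag])
        for tag, old in pinned.items()
        if tag in current and current[tag] != old
    }
```

For instance, current_digest("library/debian", "buster") would return the digest the tag points to at that moment; comparing such results against the digests pinned in our Dockerfiles with find_moved_tags tells us which pins are stale, but deciding whether to move them remains a human call.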

For most teams, I expect this is more or less intractable. I'd be surprised if there are many Docker users for whom a reproducible build is a realistic prospect, especially given that Docker Hub currently has no mechanism to notify users when a tag is updated to point to a new public image.

How We Are Solving This at Atomist

At Atomist, we're solving these problems for our own internal processes and for our users.

To that end, we've developed a new tool that automatically pins the FROM lines in Dockerfiles to the current digests of their tags, monitors Docker Hub for updates, and raises pull requests when the authors publish new images to those tags.

It also supports private registries on Amazon ECR, Google GCR, and Docker Hub, for base images that aren't publicly available on Docker Hub.

Go ahead and try our Docker Base Image Policy on your own Dockerfiles for free.

If you have questions or suggestions, let me know in our Atomist community Slack or @kipzter.