Optimizing builds with cache management

You will likely find yourself rebuilding the same Docker image over and over again, whether it’s for the next release of your software or during local development. Because building images is a common task, Docker provides several tools that speed up builds.

The most important feature for improving build speeds is Docker’s build cache.

How does the build cache work?

Understanding Docker’s build cache helps you write better Dockerfiles that result in faster builds.

Have a look at the following example, which shows a simple Dockerfile for a program written in C.

# syntax=docker/dockerfile:1
FROM ubuntu:latest

RUN apt-get update && apt-get install -y build-essential
COPY main.c Makefile /src/
WORKDIR /src/
RUN make build

Each instruction in this Dockerfile translates (roughly) to a layer in your final image. You can think of image layers as a stack, with each layer adding more content on top of the layers that came before it:

Image layer diagram showing the above commands chained together one after the other

Whenever a layer changes, that layer will need to be re-built. For example, suppose you make a change to your program in the main.c file. After this change, the COPY command will have to run again in order for those changes to appear in the image. In other words, Docker will invalidate the cache for this layer.

Image layer diagram, but now with the link between COPY and WORKDIR marked as invalid

If a layer changes, all other layers that come after it are also affected. When the layer with the COPY command gets invalidated, all layers that follow will need to run again, too:

Image layer diagram, but now with all links after COPY marked as invalid

And that’s the Docker build cache in a nutshell. Once a layer changes, then all downstream layers need to be rebuilt as well. Even if they wouldn’t build anything differently, they still need to re-run.

Note

Suppose you have a RUN apt-get update && apt-get upgrade -y step in your Dockerfile to upgrade all the software packages in your Debian-based image to the latest version.

This doesn’t mean that the images you build are always up to date. Rebuilding the image on the same host one week later will still get you the same packages as before. The only way to force a rebuild is by making sure that a layer before it has changed, or by clearing the build cache using docker builder prune.
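If you do need newer packages, a quick way to force the rebuild is to clear the cache or bypass it for a single build. A minimal sketch (the image name myapp is illustrative):

# Remove the stored build cache so every step re-runs on the next build
$ docker builder prune

# Or bypass the cache for one build, leaving the stored cache untouched
$ docker build --no-cache -t myapp .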

How can I use the cache efficiently?

Now that you understand how the cache works, you can begin to use the cache to your advantage. While the cache will automatically work on any docker build that you run, you can often refactor your Dockerfile to get even better performance. These optimizations can save precious seconds (or even minutes) off of your builds.

Order your layers

Putting the commands in your Dockerfile into a logical order is a great place to start. Because a change causes a rebuild for steps that follow, try to make expensive steps appear near the beginning of the Dockerfile. Steps that change often should appear near the end of the Dockerfile, to avoid triggering rebuilds of layers that haven’t changed.

Consider the following example. A Dockerfile snippet that runs a JavaScript build from the source files in the current directory:

# syntax=docker/dockerfile:1
FROM node
WORKDIR /app
COPY . .          # Copy over all files in the current directory
RUN npm install   # Install dependencies
RUN npm run build # Run build

This Dockerfile is rather inefficient. Updating any file causes a reinstall of all dependencies every time you build the Docker image, even if the dependencies didn’t change since last time!

Instead, the COPY command can be split in two. First, copy over the package management files (in this case, package.json and yarn.lock). Then, install the dependencies. Finally, copy over the project source code, which is subject to frequent change.

# syntax=docker/dockerfile:1
FROM node
WORKDIR /app
COPY package.json yarn.lock ./   # Copy package management files
RUN npm install                  # Install dependencies
COPY . .                         # Copy over project files
RUN npm run build                # Run build

By installing dependencies in earlier layers of the Dockerfile, there is no need to rebuild those layers when a project file has changed.

Keep layers small

One of the best things you can do to speed up image building is to just put less stuff into your build. Fewer parts mean the cache stays smaller, and there are fewer things that could be out of date and need rebuilding.

To get started, here are a few tips and tricks:

Don’t include unnecessary files

Be considerate of what files you add to the image.

Running a command like COPY . /src will copy your entire build context into the image. If you’ve got logs, package manager artifacts, or even previous build results in your current directory, those will also be copied over. This could make your image larger than it needs to be, especially as those files are usually not useful.

Avoid adding unnecessary files to your builds by explicitly stating the files or directories you intend to copy over. For example, you might only want to add a Makefile and your src directory to the image filesystem. In that case, consider adding this to your Dockerfile:

COPY ./src ./Makefile /src/

As opposed to this:

COPY . /src

You can also create a .dockerignore file, and use that to specify which files and directories to exclude from the build context.
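For instance, a .dockerignore with entries along these lines (the entries are illustrative; adjust them to your project) keeps common clutter out of the build context:

# .dockerignore (illustrative entries)
.git
node_modules
*.log
dist/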

Use your package manager wisely

Most Docker image builds involve using a package manager to help install software into the image. Debian has apt, Alpine has apk, Python has pip, NodeJS has npm, and so on.

When installing packages, be considerate. Make sure to only install the packages that you need. If you’re not going to use them, don’t install them. Remember that this might be a different list for your local development environment and your production environment. You can use multi-stage builds to split these up efficiently.
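As a rough sketch for a Debian-based image, you can also skip recommended-but-unneeded packages and clean up the package lists in the same layer (the package names are illustrative):

RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*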

Use the dedicated RUN cache

The RUN command supports a specialized cache, which you can use when you need a more fine-grained cache between runs. For example, when installing packages, you don’t always need to fetch all of your packages from the internet each time. You only need the ones that have changed.

To solve this problem, you can use RUN --mount=type=cache. For example, for your Debian-based image you might use the following:

RUN \
    --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y git

Using the explicit cache with the --mount flag keeps the contents of the target directory preserved between builds. When this layer needs to be rebuilt, then it’ll use the apt cache in /var/cache/apt .
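The same idea applies to other package managers. For example, a sketch for the Node.js image from earlier could cache npm’s download directory (assuming npm’s default cache location of /root/.npm):

RUN --mount=type=cache,target=/root/.npm \
    npm install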

Minimize the number of layers

Keeping your layers small is a good first step, and the logical next step is to reduce the number of layers that you have. Fewer layers mean that you have less to rebuild when something in your Dockerfile changes, so your builds complete faster.

The following sections outline some tips you can use to keep the number of layers to a minimum.

Use an appropriate base image

Docker provides over 170 pre-built official images for almost every common development scenario. For example, if you’re building a Java web server, use a dedicated image such as openjdk . Even when there’s not an official image for what you might want, Docker provides images from verified publishers and open source partners that can help you on your way. The Docker community often produces third-party images to use as well.

Using official images saves you time and ensures you stay up to date and secure by default.
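For example, a slimmer variant of an official image is often all you need; the tags here are illustrative:

# Prefer a slim variant when the extra tooling of the full image isn't needed
FROM node:slim

# Rather than
# FROM node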

Use multi-stage builds

Multi-stage builds let you split up your Dockerfile into multiple distinct stages. Each stage completes a step in the build process, and you can bridge the different stages to create your final image at the end. The Docker builder will work out dependencies between the stages and run them using the most efficient strategy. This even allows you to run multiple builds concurrently.

Multi-stage builds use two or more FROM commands. The following example illustrates building a simple web server that serves HTML from your docs directory in Git:

# syntax=docker/dockerfile:1

# stage 1
FROM alpine as git
RUN apk add git

# stage 2
FROM git as fetch
WORKDIR /repo
RUN git clone https://github.com/your/repository.git .

# stage 3
FROM nginx as site
COPY --from=fetch /repo/docs/ /usr/share/nginx/html

This build has three stages: git, fetch, and site. In this example, git is the base for the fetch stage. The site stage uses the COPY --from flag to copy data from the fetch stage’s docs/ directory into the Nginx server directory.

Each stage has only a few instructions, and when possible, Docker will run these stages in parallel. Only the instructions in the site stage will end up as layers in the final image. The entire git history doesn’t get embedded into the final result, which helps keep the image small and secure.
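During development you can also build just one stage of a multi-stage Dockerfile with the --target flag; as a quick sketch:

# Build only up to the fetch stage, for example to check that the clone works
$ docker build --target fetch .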

Combine commands together wherever possible

Most Dockerfile commands, and RUN commands in particular, can often be joined together. For example, instead of using RUN like this:

RUN echo "the first command"
RUN echo "the second command"

It’s possible to run both of these commands inside a single RUN, which means that they will share the same cache! This is achievable using the && shell operator to run one command after another:

RUN echo "the first command" && echo "the second command"
# or to split to multiple lines
RUN echo "the first command" && \
    echo "the second command"

Another shell feature that lets you simplify and concatenate commands in a neat way is the heredoc. It enables you to create multi-line scripts with good readability:

RUN <<EOF
set -e
echo "the first command"
echo "the second command"
EOF

(Note the set -e command, which makes the script exit immediately if any command fails, instead of continuing.)

Other resources

For more information on using cache to do efficient builds, see:

  • Garbage collection
  • Cache storage backends


Docker driver

The Buildx Docker driver is the default driver. It uses the BuildKit server components built directly into the Docker engine. The Docker driver requires no configuration.

Unlike the other drivers, builders using the Docker driver can’t be manually created. They’re only created automatically from the Docker context.

Images built with the Docker driver are automatically loaded to the local image store.

Synopsis

# The Docker driver is used by buildx by default
docker buildx build .

It’s not possible to configure which BuildKit version to use, or to pass any additional BuildKit parameters to a builder using the Docker driver. The BuildKit version and parameters are preset by the Docker engine internally.

If you need additional configuration and flexibility, consider using the Docker container driver.

Further reading

For more information on the Docker driver, see the buildx reference.


Local and tar exporters

The local and tar exporters output the root filesystem of the build result into a local directory. They’re useful for producing artifacts that aren’t container images.

  • local exports files and directories.
  • tar exports the same, but bundles the export into a tarball.

Synopsis

Build and export the filesystem using the local or tar exporter:

$ docker buildx build --output type=local[,parameters] .
$ docker buildx build --output type=tar[,parameters] .

The following parameter is available for both exporters:

  • dest (String): Path to copy files to
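For example, to write the build result to a local directory or to a tarball (the destination paths are illustrative):

$ docker buildx build --output type=local,dest=./build-output .
$ docker buildx build --output type=tar,dest=./build-output.tar .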

Further reading

For more information on the local or tar exporters, see the BuildKit README.


Continuous integration with Docker

Continuous Integration (CI) is the part of the development process where you’re looking to get your code changes merged with the main branch of the project. At this point, development teams run tests and builds to vet that the code changes don’t cause any unwanted or unexpected behaviors.


There are several uses for Docker at this stage of development, even if you don’t end up packaging your application as a container image.

Docker as a build environment

Containers are reproducible, isolated environments that yield predictable results. Building and testing your application in a Docker container makes it easier to prevent unexpected behaviors from occurring. Using a Dockerfile, you define the exact requirements for the build environment, including programming runtimes, operating system, binaries, and more.

Using Docker to manage your build environment also eases maintenance. For example, updating to a new version of a programming runtime can be as simple as changing a tag or digest in a Dockerfile. No need to SSH into a pet VM to manually reinstall a newer version and update the related configuration files.
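As a minimal sketch of that idea (the base image tag and build command are illustrative), bumping the toolchain version becomes a one-line change to the FROM instruction:

# Build environment pinned to a specific runtime version
FROM golang:1.21

WORKDIR /src
COPY . .
RUN go build ./...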

Additionally, just as you expect third-party open source packages to be secure, the same should go for your build environment. You can scan and index a builder image, just like you would for any other containerized application.

The following links provide instructions for how you can get started using Docker for building your applications in CI:

  • GitHub Actions
  • GitLab
  • Circle CI
  • Render

Docker in Docker

You can also use a Dockerized build environment to build container images using Docker. That is, your build environment runs inside a container which itself is equipped to run Docker builds. This method is referred to as “Docker in Docker”.

Docker provides an official Docker image that you can use for this purpose.
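As a rough sketch of running that image directly (many CI platforms wire this up for you; TLS is disabled here only to keep the example short):

# Start a Docker daemon in a container; --privileged is required for dind
$ docker network create dind-net
$ docker run -d --name dind --privileged --network dind-net \
    -e DOCKER_TLS_CERTDIR="" docker:dind

# Point a Docker CLI container at that daemon and verify the connection
$ docker run --rm --network dind-net -e DOCKER_HOST=tcp://dind:2375 \
    docker:cli docker info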

What’s next

Docker maintains a set of official GitHub Actions that you can use to build, annotate, and push container images on the GitHub Actions platform. See Introduction to GitHub Actions to learn more and get started.


Docker container driver

The buildx Docker container driver allows creation of a managed and customizable BuildKit environment in a dedicated Docker container.

Using the Docker container driver has a couple of advantages over the default Docker driver. For example:

  • Specify custom BuildKit versions to use.
  • Build multi-arch images (see QEMU).
  • Use advanced options for cache import and export.

Synopsis

Run the following command to create a new builder, named container, that uses the Docker container driver:

$ docker buildx create \
  --name container \
  --driver=docker-container \
  --driver-opt=[key=value,...]
container

The following driver-specific options can be passed to --driver-opt:

  • image (String): Sets the image to use for running BuildKit.
  • network (String): Sets the network mode for running the BuildKit container.
  • cgroup-parent (String, default /docker/buildx): Sets the cgroup parent of the BuildKit container if Docker is using the cgroupfs driver.
  • env.<key> (String): Sets the environment variable key to the specified value in the BuildKit container.
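For example, a sketch that pins the BuildKit image and uses host networking (the version tag is illustrative):

$ docker buildx create \
  --name container \
  --driver=docker-container \
  --driver-opt=image=moby/buildkit:v0.12.0,network=host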

Usage

When you run a build, Buildx pulls the specified image (by default, moby/buildkit). When the container has started, Buildx submits the build to the containerized build server.

$ docker buildx build -t <image> --builder=container .
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
#1 [internal] booting buildkit
#1 pulling image moby/buildkit:buildx-stable-1
#1 pulling image moby/buildkit:buildx-stable-1 1.9s done
#1 creating container buildx_buildkit_container0
#1 creating container buildx_buildkit_container0 0.5s done
#1 DONE 2.4s
...

Loading to local image store

Unlike when using the default docker driver, images built with the docker-container driver must be explicitly loaded into the local image store. Use the --load flag:

$ docker buildx build --load -t <image> --builder=container .
...
 => exporting to oci image format                                                                                                      7.7s
 => => exporting layers                                                                                                                4.9s
 => => exporting manifest sha256:4e4ca161fa338be2c303445411900ebbc5fc086153a0b846ac12996960b479d3                                      0.0s
 => => exporting config sha256:adf3eec768a14b6e183a1010cb96d91155a82fd722a1091440c88f3747f1f53f                                        0.0s
 => => sending tarball                                                                                                                 2.8s
 => importing to docker

The image becomes available in the image store when the build finishes:

$ docker image ls
REPOSITORY                       TAG               IMAGE ID       CREATED             SIZE
<image>                          latest            adf3eec768a1   2 minutes ago       197MB

Cache persistence

The docker-container driver supports cache persistence, as it stores all the BuildKit state and related cache into a dedicated Docker volume.

To persist the docker-container driver’s cache even after removing and recreating the builder with docker buildx rm and docker buildx create, remove the builder using the --keep-state flag:

For example, to create a builder named container and then remove it while persisting state:

# setup a builder
$ docker buildx create --name=container --driver=docker-container --use --bootstrap
container
$ docker buildx ls
NAME/NODE       DRIVER/ENDPOINT              STATUS   BUILDKIT PLATFORMS
container *     docker-container
  container0    desktop-linux                running  v0.10.5  linux/amd64
$ docker volume ls
DRIVER    VOLUME NAME
local     buildx_buildkit_container0_state

# remove the builder while persisting state
$ docker buildx rm --keep-state container
$ docker volume ls
DRIVER    VOLUME NAME
local     buildx_buildkit_container0_state

# the newly created driver with the same name will have all the state of the previous one!
$ docker buildx create --name=container --driver=docker-container --use --bootstrap
container

QEMU

The docker-container driver supports using QEMU (user mode) to build non-native platforms. Use the --platform flag to specify which architectures you want to build for.

For example, to build a Linux image for amd64 and arm64:

$ docker buildx build \
  --builder=container \
  --platform=linux/amd64,linux/arm64 \
  -t <registry>/<image> \
  --push .

Warning

QEMU performs full CPU emulation of non-native platforms, which is much slower than native builds. Compute-heavy tasks like compilation and compression/decompression will likely take a large performance hit.
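If the build host doesn’t have QEMU emulators registered yet, a common way to install them is the tonistiigi/binfmt helper image, sketched below (Docker Desktop ships them preinstalled):

# Register QEMU binfmt handlers for all supported architectures
$ docker run --privileged --rm tonistiigi/binfmt --install all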

Custom network

You can customize the network that the builder container uses. This is useful if you need to use a specific network for your builds.

For example, let’s create a network named foonet:

$ docker network create foonet

Now create a docker-container builder that will use this network:

$ docker buildx create --use \
  --name mybuilder \
  --driver docker-container \
  --driver-opt "network=foonet"

Boot and inspect mybuilder:

$ docker buildx inspect --bootstrap

Inspect the builder container and see what network is being used:

$ docker inspect buildx_buildkit_mybuilder0 --format={{.NetworkSettings.Networks}}
map[foonet:0xc00018c0c0]

Further reading

For more information on the Docker container driver, see the buildx reference.


Configuring your builder

This page contains instructions on configuring your BuildKit instances when using our Setup Buildx Action.

BuildKit container logs

To display BuildKit container logs when using the docker-container driver, you must either enable step debug logging, or set the --debug flag for buildkitd using the buildkitd-flags input of the Docker Setup Buildx action:

name: ci

on:
  push:

jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          buildkitd-flags: --debug
      -
        name: Build
        uses: docker/build-push-action@v3
        with:
          context: .

Logs will be available at the end of the job.

Daemon configuration

If you’re using the docker-container driver (the default for this action), you can provide a BuildKit configuration to your builder with the config or config-inline inputs:

Registry mirror

You can configure a registry mirror using an inline block directly in your workflow with the config-inline input:

name: ci

on:
  push:

jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          config-inline: |
            [registry."docker.io"]
              mirrors = ["mirror.gcr.io"]

For more information about using a registry mirror, see Registry mirror.

Max parallelism

You can limit the parallelism of the BuildKit solver which is particularly useful for low-powered machines.

You can use the config-inline input as in the previous example, or point the config input at a dedicated BuildKit config file in your repository:

# .github/buildkitd.toml
[worker.oci]
  max-parallelism = 4

Then reference that file in your workflow:

name: ci

on:
  push:

jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          config: .github/buildkitd.toml

Append additional nodes to the builder

Buildx supports running builds on multiple machines. This is useful for building multi-platform images on native nodes for more complicated cases that aren’t handled by QEMU. Building on native nodes generally has better performance, and allows you to distribute the build across multiple machines.

You can append nodes to the builder you’re creating using the append option. It takes its input as a YAML string document, which works around a GitHub Actions limitation: input fields can only contain strings. Each node supports the following attributes:

  • name (String): Name of the node. If empty, it defaults to the name of the builder it belongs to, with an index number suffix. Set it explicitly if you want to modify or remove the node in a later step of your workflow.
  • endpoint (String): Docker context or endpoint of the node to add to the builder.
  • driver-opts (List): List of additional driver-specific options.
  • buildkitd-flags (String): Flags for the buildkitd daemon.
  • platforms (String): Fixed platforms for the node. If not empty, these values take priority over the detected ones.

Here is an example using remote nodes with the remote driver and TLS authentication:

name: ci

on:
  push:

jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: remote
          endpoint: tcp://oneprovider:1234
          append: |
            - endpoint: tcp://graviton2:1234
              platforms: linux/arm64
            - endpoint: tcp://linuxone:1234
              platforms: linux/s390x
        env:
          BUILDER_NODE_0_AUTH_TLS_CACERT: ${{ secrets.ONEPROVIDER_CA }}
          BUILDER_NODE_0_AUTH_TLS_CERT: ${{ secrets.ONEPROVIDER_CERT }}
          BUILDER_NODE_0_AUTH_TLS_KEY: ${{ secrets.ONEPROVIDER_KEY }}
          BUILDER_NODE_1_AUTH_TLS_CACERT: ${{ secrets.GRAVITON2_CA }}
          BUILDER_NODE_1_AUTH_TLS_CERT: ${{ secrets.GRAVITON2_CERT }}
          BUILDER_NODE_1_AUTH_TLS_KEY: ${{ secrets.GRAVITON2_KEY }}
          BUILDER_NODE_2_AUTH_TLS_CACERT: ${{ secrets.LINUXONE_CA }}
          BUILDER_NODE_2_AUTH_TLS_CERT: ${{ secrets.LINUXONE_CERT }}
          BUILDER_NODE_2_AUTH_TLS_KEY: ${{ secrets.LINUXONE_KEY }}

Authentication for remote builders

The following examples show how to handle authentication for remote builders, using SSH or TLS.

SSH authentication

To be able to connect to an SSH endpoint using the docker-container driver, you have to set up the SSH private key and configuration on the GitHub Runner:

name: ci

on:
  push:

jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      -
        name: Set up SSH
        uses: MrSquaare/ssh-setup-action@523473d91581ccbf89565e12b40faba93f2708bd # v1.1.0
        with:
          host: graviton2
          private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          private-key-name: aws_graviton2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          endpoint: ssh://me@graviton2

TLS authentication

You can also set up a remote BuildKit instance using the remote driver. To ease the integration in your workflow, you can use environment variables that set up authentication using BuildKit client certificates for the tcp:// endpoint:

  • BUILDER_NODE_<idx>_AUTH_TLS_CACERT
  • BUILDER_NODE_<idx>_AUTH_TLS_CERT
  • BUILDER_NODE_<idx>_AUTH_TLS_KEY

The <idx> placeholder is the position of the node in the list of nodes.

name: ci

on:
  push:

jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: remote
          endpoint: tcp://graviton2:1234
        env:
          BUILDER_NODE_0_AUTH_TLS_CACERT: ${{ secrets.GRAVITON2_CA }}
          BUILDER_NODE_0_AUTH_TLS_CERT: ${{ secrets.GRAVITON2_CERT }}
          BUILDER_NODE_0_AUTH_TLS_KEY: ${{ secrets.GRAVITON2_KEY }}

Standalone mode

If you don’t have the Docker CLI installed on the GitHub Runner, the Buildx binary gets invoked directly, instead of calling it as a Docker CLI plugin. This can be useful if you want to use the kubernetes driver in your self-hosted runner:

name: ci

on:
  push:

jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: kubernetes
      -
        name: Build
        run: |
          buildx build .

Isolated builders

The following example shows how you can select different builders for different jobs.

An example scenario where this might be useful is when you are using a monorepo, and you want to pinpoint different packages to specific builders. For example, some packages may be particularly resource-intensive to build and require more compute. Or they require a builder equipped with a particular capability or hardware.

For more information about remote builders, see the remote driver and the append builder nodes example.

name: ci

on:
  push:
    branches:
      - "main"

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        uses: docker/setup-buildx-action@v2
        id: builder1
      -
        uses: docker/setup-buildx-action@v2
        id: builder2
      -
        name: Builder 1 name
        run: echo ${{ steps.builder1.outputs.name }}
      -
        name: Builder 2 name
        run: echo ${{ steps.builder2.outputs.name }}
      -
        name: Build against builder1
        uses: docker/build-push-action@v3
        with:
          builder: ${{ steps.builder1.outputs.name }}
          context: .
          target: mytarget1
      -
        name: Build against builder2
        uses: docker/build-push-action@v3
        with:
          builder: ${{ steps.builder2.outputs.name }}
          context: .
          target: mytarget2