How to reduce image size using Docker multi-stage builds

This article describes how to use Docker's multi-stage builds to significantly reduce image size. The technique suits images whose Dockerfile compiles a program (for example with javac) and therefore has to install a build toolchain (for example, one for Java).

First, some vocabulary. I keep the following terms in their original English form, because on principle I don't think technical terms should be translated. They are also the words you will see in the official documentation:

multi-stage
build
image
stage

The result: the image shrank from over 110 MB to 92 MB.

Comparing the Dockerfiles

Dockerfile before optimization:

FROM openjdk:8u171-jdk-alpine3.8

ADD . /app
WORKDIR /app

RUN apk add maven \
  && mvn clean package \
  && apk del maven \
  && mv target/final.jar / \
  && cd / \
  && rm -rf /app \
  && rm -rf /root/.m2

ENTRYPOINT java -jar /final.jar

Optimized Dockerfile:

FROM openjdk:8u171-jdk-alpine3.8 as builder

ADD . /app
WORKDIR /app

RUN apk add maven \
  && mvn clean package \
  && apk del maven \
  && mv target/final.jar /

FROM openjdk:8u181-jre-alpine3.8 as environment
WORKDIR /
COPY --from=builder /final.jar .
ENTRYPOINT java -jar /final.jar

The optimized Dockerfile contains two FROM instructions, and the first one names its stage with AS. This is a multi-stage build.

Learn about multi-stage builds

Multi-stage builds, introduced in Docker 17.05, let a Dockerfile contain multiple FROM instructions, each of which starts a new stage. Each stage is independent, and a stage can pull in files from another stage with COPY --from. As an analogy, think of the final image as a dish of stir-fried green peppers: first you stir-fry the raw peppers, then you plate them and serve.

The analogy maps as follows:
image -> the dish
first stage -> stir-frying
second stage -> serving

Both stages work toward producing the final dish (the image). All we want to serve is what was stir-fried in the first stage, and we want the plate to be as light as possible: it should carry the food and nothing else, no cooking tools and no leftover scraps (intermediate products).

The process looks like this:

ingredients (omitted) -> [Stage 1: stir-fry]
   At this point the pan holds the cooking tools, the stir-fried peppers, and leftover scraps.
-> [Stage 2: serve, keeping only the result]
   Take the stir-fried peppers (COPY --from) and leave everything else behind.
-> the finished dish (the final image)

Now you should have a general picture of the multi-stage build process. Let's hand the mic to Java and see how to use a compiler inside a Dockerfile to build a JAR, keep only the built JAR and the runtime in the image, and throw everything else away:

# Stage 1 - compile ("stir-fry")
# The JDK image ships the compiler
FROM openjdk:8u171-jdk-alpine3.8 as builder
ADD . /app
WORKDIR /app

# Install Maven, build the JAR, remove Maven, move the JAR to /
RUN apk add maven \
  && mvn clean package \
  && apk del maven \
  && mv target/final.jar /

# The JAR is built, so the JDK is no longer needed and must not stay in the image.
# Start stage 2 - run ("serve"). Nothing from stage 1, including the build tools, is carried over.
# The JRE image contains only the runtime.
FROM openjdk:8u181-jre-alpine3.8 as environment
# Take only the result of stage 1 (index 0, the "stir-fried" JAR) and nothing else
COPY --from=0 /final.jar .

# The image now holds only the runtime and the JAR
ENTRYPOINT java -jar /final.jar

That covers the basic idea of multi-stage builds.

Using multi-stage builds

The core instruction of a multi-stage build is FROM, which needs little introduction for seasoned Docker users. In a multi-stage build, each FROM starts a new stage. You can think of a stage as a new image (not strictly accurate, but a useful mental model) that is isolated from the other stages, down to its environment variables. Only the last stage ends up in the final image.
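To see the isolation in action, here is a minimal sketch: an environment variable set in the first stage is simply absent in the second, because the second stage starts from a fresh base image rather than from the first stage. The variable name GREETING is made up for illustration.

```dockerfile
# Stage 0: defines a (hypothetical) environment variable
FROM alpine:3.8
ENV GREETING="hello from stage 0"
RUN echo "stage 0 sees: $GREETING"

# Stage 1: a fresh FROM, so nothing from stage 0 is inherited,
# including the GREETING variable set above
FROM alpine:3.8
# The shell expands ${GREETING:-unset} to "unset" because the variable does not exist here
RUN echo "stage 1 sees: ${GREETING:-unset}"
```

In the build output, the second stage's RUN step should print "stage 1 sees: unset". With BuildKit you may need --progress=plain to see RUN output, and BuildKit may skip stage 0 entirely because the final stage does not depend on it.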

Here is a simple multi-stage build example:

# Stage 1
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 2" > /demo/hi-2.txt

Build this Dockerfile yourself with docker build -t <tag> ., then run docker save <tag> > docker.tar and inspect the archive. If all goes well, it contains only the Alpine base layers and /demo/hi-2.txt.

This Dockerfile defines two stages: the first creates hi-1.txt and the second creates hi-2.txt. Only the second (last) stage ends up in the final image; the first is discarded.

Copying files - a bridge between stages

If stages were completely isolated from each other, multiple stages would be pointless: everything a stage produced would be thrown away as soon as the next, completely fresh stage began.

We can use the COPY instruction to fetch files from other stages. COPY works exactly as usual; just add --from=<stage>, where <stage> is the index of an earlier stage, counting from 0. Let's modify the previous example so that the final image contains the output of both stages:

# Stage 1
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2
FROM alpine:3.8
WORKDIR /demo
COPY --from=0 /demo/hi-1.txt /demo/
RUN echo "Hello, stage 2" > /demo/hi-2.txt

Rebuild and save the image again, and you will find an extra layer containing hi-1.txt.
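The Docker documentation also allows --from to refer to an image that is not a stage of the current Dockerfile; the builder fetches that image and copies the file out of it. A sketch extending the example above, where nginx:latest and its config path are chosen only for illustration:

```dockerfile
# Stage 1
FROM alpine:3.8
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2
FROM alpine:3.8
WORKDIR /demo
# Copy from an earlier stage by index, as before
COPY --from=0 /demo/hi-1.txt /demo/
# --from can also name an external image (illustrative choice: nginx:latest)
COPY --from=nginx:latest /etc/nginx/nginx.conf /demo/nginx.conf
RUN echo "Hello, stage 2" > /demo/hi-2.txt
```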

Stage naming - quick identification

For those of us with a seven-second memory, referring to stages by index every time is not ideal. Instead, you can give the stages names so they are easy to identify.

Naming a stage is simple: just add AS <name> after the image in the FROM line.

Now let's update the Dockerfile to name each stage and COPY by name.

# Stage 1, its name is "build1"
FROM alpine:3.8 as build1
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# Stage 2, its name is "build2"
FROM alpine:3.8 as build2
WORKDIR /demo
# Refer to the stage by name instead of by index
COPY --from=build1 /demo/hi-1.txt /demo/
RUN echo "Hello, stage 2" > /demo/hi-2.txt

Rebuild and save, the result should be the same as last time.
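A stage name can also be used in a FROM line, so one stage can start from another instead of from a fresh base image. In that case the isolation described earlier does not apply: the new stage inherits the files, WORKDIR, and ENV of the stage it builds on, with no COPY needed. A minimal sketch reusing the names above:

```dockerfile
# Stage "build1"
FROM alpine:3.8 AS build1
WORKDIR /demo
RUN echo "Hello, stage 1" > /demo/hi-1.txt

# "build2" starts from build1 rather than from alpine, so /demo/hi-1.txt
# and WORKDIR /demo are already present
FROM build1 AS build2
RUN echo "Hello, stage 2" > /demo/hi-2.txt
```

Saving this image should show both hi-1.txt and hi-2.txt, even though there is no COPY --from.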

Build only some stages - easy debugging

Docker also offers a convenient debugging aid: building only some of the stages. The build can stop at a given stage and skip all the stages after it. This makes debugging easier and lets a single Dockerfile serve production, development, and testing.

Using the previous Dockerfile, build with the --target <stage> option:

$ docker build --target build1 -t <tag> .

Save the image again, and you will find only the content of build1 (hi-1.txt).
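To illustrate the production/development/testing split, here is a sketch of one Dockerfile with a stage per purpose. The stage names and the use of curl as a "debugging tool" are assumptions for illustration, not a convention Docker imposes.

```dockerfile
# Shared base stage
FROM alpine:3.8 AS base
WORKDIR /demo
RUN echo "app" > /demo/app.txt

# Development stage: adds an extra tool (curl, chosen only as an example)
FROM base AS development
RUN apk add --no-cache curl

# Test stage: a placeholder check that fails the build if app.txt is missing
FROM base AS test
RUN test -f /demo/app.txt && echo "tests passed"

# Production stage: last in the file, so a plain `docker build .` produces it
FROM base AS production
CMD ["cat", "/demo/app.txt"]
```

You would then pick a variant with, for example, docker build --target development -t demo:dev . or docker build --target test . for a test run. Note that the older (non-BuildKit) builder builds every stage that appears before the target in the file, while BuildKit builds only the stages the target depends on.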

Summary

That's all there is to multi-stage builds. Now go back to the two Dockerfiles at the beginning and compare them. Can you spot what made the unoptimized image so large?

It ships the full JDK, which is only useful at compile time; once the JAR is built, only the JRE is needed. A multi-stage build separates the compile stage from the run stage, so the final image contains only what it needs at run time.

