Six ways to reduce the size of Docker images

Since I started working on Vulhub in 2017, I have been wrestling with a troublesome problem: when writing a Dockerfile, how can I reduce the size of the image generated by docker build? This article summarizes six methods I have used to reduce image size.

1. Using Alpine Linux

Alpine Linux is a Linux distribution built on BusyBox and musl libc. Its biggest advantage is its small size: a plain Alpine base Docker image is only 2.67MB compressed.

Many official Docker images have Alpine versions, such as PHP:
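
As a quick check (the tags below are only examples, and exact sizes vary from version to version), we can pull both variants and compare them with docker images:

$ docker pull php:7.4
$ docker pull php:7.4-alpine
$ docker images php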

Comparing the two, we can see that the Alpine variant is roughly one fifth the size of the ordinary version.

However, most images on Docker Hub do not have an Alpine version, MySQL and PHP-Apache among them. If we need to build on top of these environments, we have to write the Alpine version ourselves or find a third-party image.

Another disadvantage of Alpine is that it uses musl libc as a replacement for the traditional glibc. When compiling software, you may run into unpredictable problems that cost a lot of unnecessary time.

2. Install only minimal dependencies

Package managers such as apt-get, yum, and apk are tools we cannot avoid when building images. Plain Docker base images usually lack tools such as wget, curl, git, and gcc, which we have to install manually.

Let's take apt as an example. When installing software, apt-get accepts the option --no-install-recommends. With this option, recommended but non-essential dependencies are not installed alongside the package. For example, when we install wget with this option, the number of packages to be installed drops from 6 to 3:
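
For illustration, the apt summaries look roughly like this (observed on a Debian-based image; exact package counts differ between releases):

$ apt-get install wget
...
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
$ apt-get install --no-install-recommends wget
...
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.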

This reduces the size of the image to a certain extent, but the side effect is that the target software may lose some functionality.

For example, wget will then be unable to verify the authenticity of the server's certificate, and the command fails.
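
The failure looks roughly like this (exact wording depends on the wget version); it happens because the recommended ca-certificates package was skipped:

$ wget https://example.com
ERROR: The certificate of 'example.com' is not trusted.
ERROR: The certificate of 'example.com' doesn't have a known issuer.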

Therefore, the general practice is to add --no-install-recommends whenever we use apt, and to fix any errors promptly if they appear later. Known problems like the wget one above can be predicted and handled in advance:

apt-get install --no-install-recommends wget ca-certificates

3. Clean up the mess for apt

Some tools are only needed during the build phase, and I don't want them taking up precious image space. We can delete these intermediate dependencies once the build is complete.

Let's take apt as an example. After using it, we need to do the following:

  • Remove unnecessary dependencies: apt-get purge --autoremove ...
  • Delete the local package list: rm -rf /var/lib/apt/lists/*

In this process, we run into a tricky question: which dependencies are "unnecessary"?

For example, compiling PHP may involve three tools: wget, libxml, and gcc. All three need to be installed before compiling PHP, but once compilation is finished we can uninstall wget and gcc, while libxml must stay.

The reason is that libxml provides a shared library that PHP is dynamically linked against. If we uninstall it, PHP fails because the shared library cannot be found:

root@8eab53da8d5b:/# php -v
php: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

So, is there a more convenient way to automatically find the dependencies that are not needed as shared libraries and delete only those?

Of course there is. A simple approach is to traverse the newly compiled executable files, use the ldd command to list the shared libraries each one depends on, and then look up which package provides each library file.
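
For instance (output abbreviated; the exact libraries depend on how PHP was configured):

$ ldd /usr/local/bin/php
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x...)
        ...
$ dpkg-query --search /usr/lib/x86_64-linux-gnu/libxml2.so.2
libxml2:amd64: /usr/lib/x86_64-linux-gnu/libxml2.so.2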

These packages provide all the shared libraries PHP depends on. We then use apt-mark to mark them as "manually installed" so that apt-get purge will not uninstall them automatically.

Then, we can automatically uninstall the remaining unused packages. The complete shell script is as follows:

# list the shared libraries used by every executable under /usr/local,
# map each library file back to the Debian package that provides it,
# and mark those packages as manually installed so they survive the purge;
# finally, auto-remove everything else that apt considers unneeded
find /usr/local -type f -executable -exec ldd '{}' ';' \
 | awk '/=>/ { print $(NF-1) }' \
 | sort -u \
 | xargs -r dpkg-query --search \
 | cut -d: -f1 \
 | sort -u \
 | xargs -r apt-mark manual \
; \
apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false;

4. Try to install and uninstall intermediate dependencies in one step

A Docker image is like a layer cake: it is made up of stacked layers. We can use the docker history <image name> command to see which layers any image consists of and how large each layer is.
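
For example, on alpine:3.12 the output looks roughly like this (IDs and dates elided):

$ docker history alpine:3.12
IMAGE       CREATED   CREATED BY                                      SIZE
...         ...       /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>   ...       /bin/sh -c #(nop) ADD file:... in /             5.57MB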

For a Dockerfile, the data of every layer is kept in the image, even if a later layer deletes files that an earlier layer created.

For example, we have the following Dockerfile:

FROM alpine:3.12
RUN truncate -s 50M /sample.dat
RUN rm -rf /sample.dat

Let's build this Dockerfile and see how big the resulting image is.
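
Checking roughly like this (ID and date elided), the answer is 58MB:

$ docker build -t sample .
$ docker images sample
REPOSITORY   TAG      IMAGE ID   CREATED   SIZE
sample       latest   ...        ...       58MB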

In comparison, plain alpine:3.12 is only 5.57MB. That is, even though we deleted the /sample.dat file and it no longer appears in the final filesystem, its data remains in the image's layer history forever.

Therefore, when deleting the "intermediate dependencies" mentioned above, we must write the installation, use, and uninstallation steps in a single RUN instruction to ensure the space is actually released. For example:

FROM debian:buster

RUN apt-get update \
 && apt-get install -y gcc \
 && gcc ... \
 && apt-get purge -y --autoremove gcc \
 && rm -rf /var/lib/apt/lists/*

5. Multi-stage builds

Docker 17.05 introduced the concept of multi-stage builds, which greatly simplifies all of the operations described above.

In simple terms, multi-stage builds let us split the build of a Docker image into multiple "stages". In the common case of compiling software, we can isolate the compilation stage and, once compilation finishes, copy only the resulting binary into a new base image. The biggest advantage is that the second image no longer contains any of the intermediate dependencies used during compilation: clean and clear.

Taking the most common case of a Java project as an example: compiling the jar package requires tools such as the JDK and Maven, but at run time we only need a JRE. Let's compare the sizes of the two images maven:3-openjdk-8 and openjdk:8-jre:
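
Exact sizes vary by tag and architecture, so it is worth running the comparison yourself:

$ docker pull maven:3-openjdk-8
$ docker pull openjdk:8-jre
$ docker images | grep -E 'maven|openjdk'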

The difference is more than double.

Taking the Shiro 1.2.4 environment in Vulhub as an example, its Dockerfile contains two FROM instructions:

FROM maven:3-jdk-8 AS builder

LABEL MAINTAINER="phithon <[email protected]>"

COPY ./code/ /usr/src/

WORKDIR /usr/src

RUN cd /usr/src; \
 mvn -U clean package -Dmaven.test.skip=true

FROM openjdk:8u102-jre

LABEL MAINTAINER="phithon <[email protected]>"

COPY --from=builder /usr/src/target/shirodemo-1.0-SNAPSHOT.jar /shirodemo-1.0-SNAPSHOT.jar

EXPOSE 8080

CMD ["java", "-jar", "/shirodemo-1.0-SNAPSHOT.jar"]

The first FROM starts the maven:3-jdk-8 stage, in which Maven compiles the source code; the second FROM switches to the smaller openjdk:8u102-jre image and uses the COPY --from= syntax to copy the jar file produced by the previous stage into the JRE environment.

In the end, two images are left on the machine: the intermediate builder image and the Shiro 1.2.4 image we actually want. The latter can be used on its own by any other user, while the former can simply be deleted.

As image authors, we no longer need to worry about removing intermediate dependencies to keep the image small: nothing used in the first stage is carried into the final production image.

However, multi-stage builds still suffer from the shared-library problem mentioned above. If we copy only the executable file out of the build stage, the "shared library not found" error can still occur in the new environment. Therefore, I personally feel that multi-stage builds are best suited to languages that produce portable or statically linked artifacts, such as Java and Go, and are still unfriendly to dependency-heavy projects in C or Python.
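
To illustrate why statically compiled languages fit this model so well, here is a minimal sketch of a multi-stage Go build (the project layout and binary name are hypothetical). Setting CGO_ENABLED=0 makes the Go toolchain produce a statically linked binary, so it runs even on a base image with a different libc, such as Alpine:

FROM golang:1.16 AS builder
WORKDIR /src
COPY . .
# CGO_ENABLED=0 disables cgo, so the resulting binary is statically linked
RUN CGO_ENABLED=0 go build -o /app .

FROM alpine:3.12
COPY --from=builder /app /app
CMD ["/app"]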

6. Use the slim version of the image

Careful readers may have noticed that the official Docker Debian image has a slim version, and the default version is more than twice its size:
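
This is easy to check locally (sizes vary by release, so no numbers are quoted here):

$ docker pull debian:stretch
$ docker pull debian:stretch-slim
$ docker images debian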

As the name suggests, debian:stretch-slim is indeed much slimmer: it deletes many files, such as man pages, that will never be used inside a container.

Some higher-level images are based on the slim version of Debian, such as Python. If we develop a Python project, we can use the python:slim base image.

To sum up, these six methods do not interfere with one another, and we can use them all at once. That said, the fifth one, multi-stage builds, is likely to become the mainstream approach.
