Detailed explanation of Cgroup, the core principle of Docker

cgroup is a powerful kernel facility. It can not only limit the resources of processes isolated by namespaces, but also assign weights to resources, account for their usage, and more.

What is cgroup

cgroup is short for control groups.

Control groups are built into the Linux kernel. They place processes (tasks) into groups, set permissions and limits on those groups, and thereby control the processes in them. The idea resembles users and groups: a user inherits the permissions of the group it belongs to.

cgroups are a mechanism in the Linux kernel. The mechanism can combine or separate a series of tasks and their subtasks according to specific behavior, dividing them into levels by resource, and so provides a framework for unified resource control. cgroups can control, limit, and isolate the physical resources a process needs, including CPU, memory, and IO, which gives container virtualization its most basic guarantee. They are the foundation on which Docker and similar virtualization management tools are built.

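Characteristics of cgroup

  • API: cgroups are managed through an API that is exposed as a file system.
  • Granularity: cgroup management can go down to the level of individual threads.
  • Subsystems: all resource management functions are implemented as subsystems behind a unified interface.
  • Inheritance: a child process starts in the same cgroup as its parent, so controlling the parent's cgroup also covers the children it creates.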

The role of cgroup

The kernel implements cgroups by hooking into process resource management. This provides a unified interface that covers everything from resource control of a single process to virtualization at the operating system level.

Cgroup provides four functions:

  1. Resource limiting: a cgroup caps the total amount of a resource that a group of processes can use. For example, you can set an upper limit on how much host memory a program may use.
  2. Priority assignment: weight values are assigned to hardware resources. When two programs compete for the CPU, the weights determine which one is favored.
  3. Resource statistics: usage of hardware resources such as CPU and memory can be counted, including how long they have been used.
  4. Process control: a group of processes can be suspended and resumed.
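
As a rough illustration of the four functions above, each one can be tied to a control file in the cgroup v1 file system under /sys/fs/cgroup. The sketch below is only a lookup helper and not part of any cgroup tool: the function names `cg_control_file` and `cg_show`, the keywords `limit`, `priority`, `stats`, and `control`, and the group name `demo` are assumptions made for this example. The file names themselves are standard cgroup v1 control files.

```shell
# Map each of the four cgroup functions to one representative cgroup v1 file.
# The keywords (limit, priority, stats, control) are hypothetical labels.
cg_control_file() {
    case "$1" in
        limit)    echo "memory/memory.limit_in_bytes" ;;  # resource limiting
        priority) echo "cpu/cpu.shares" ;;                # priority (weight)
        stats)    echo "cpuacct/cpuacct.usage" ;;         # resource statistics
        control)  echo "freezer/freezer.state" ;;         # suspend/resume
        *)        echo "unknown function: $1" >&2; return 1 ;;
    esac
}

# Print the current value of that file for one cgroup.
# usage: cg_show <function> <group> [root]   (root defaults to /sys/fs/cgroup)
cg_show() {
    cg_root=${3:-/sys/fs/cgroup}
    cg_file=$(cg_control_file "$1") || return 1
    # Build <root>/<subsystem>/<group>/<file>
    cat "$cg_root/${cg_file%/*}/$2/${cg_file##*/}"
}
```

For example, `cg_show limit demo` would print the memory limit in bytes of a cgroup named demo, provided such a group exists in the memory hierarchy.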

Glossary

  • Task: a process in the system, identified by its PID.
  • cgroup: resource control is implemented in units of control groups (cgroups). Each cgroup contains tasks. There can be many cgroups, each restricting different things, and no two groups may share the same name.
  • Subsystem: a resource scheduling controller that defines what exactly is controlled. For example, the cpu subsystem controls the allocation of CPU time, the memory subsystem controls memory usage within a cgroup, the blkio subsystem controls reads and writes to the hard disk, and so on.
  • Hierarchy: a tree made up of cgroups. Each hierarchy schedules resources through the subsystems bound to it. A node can have zero or more child nodes, and child nodes inherit the properties of their parent. A system can have multiple hierarchies. The hierarchy is a logical concept.

Relationship: a cgroup can contain multiple tasks, a subsystem determines the type of restriction applied to cgroups, a hierarchy can contain multiple cgroups, and a system can have multiple hierarchies.

Four rules of hierarchy tree

Traditionally, processes start from init, which acts as the root node (the parent process). It creates child processes as child nodes, and each child can create further children, so the processes form a tree. cgroups are structured in a similar way, and child nodes inherit the properties of their parent nodes. The biggest difference is that a system may contain several cgroup hierarchies at the same time: where the process model is a single tree rooted at init, the cgroup model consists of multiple hierarchical trees.

If there were only one hierarchy, every task would be subject to the restrictions of each subsystem attached to it, which would cause trouble for tasks that do not need those restrictions.

1. One or more subsystems can be attached to the same hierarchy

In the figure, a hierarchy contains a cgroup called cpu_mem_cg with two child nodes, cg1 and cg2. Two subsystems, cpu and memory, are attached to the hierarchy of cpu_mem_cg, so the CPU and memory usage of cg1 and cg2 is controlled at the same time.

2. A subsystem can be attached to more than one hierarchy only if none of those hierarchies has any other subsystem attached

As shown in the figure, the cpu subsystem is first attached to hierarchy A. It cannot be attached to hierarchy B at the same time, because B already has the memory subsystem attached. If neither A nor B had any subsystem attached, the cpu subsystem could be attached to both A and B.

In other words, when several hierarchies have no subsystems of their own, a single subsystem such as cpu can be attached to each of them in turn.
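
To see how subsystems are actually grouped into hierarchies on a machine, one option is to read /proc/cgroups, where each line gives a subsystem name followed by the ID of the hierarchy it is attached to. The following is a minimal sketch that assumes a cgroup v1 system. The function name `subsystems_in_hierarchy` is hypothetical, and the hierarchy IDs differ from machine to machine.

```shell
# List the subsystems attached to a given hierarchy ID, joined with commas.
# usage: subsystems_in_hierarchy <hierarchy_id> [file]   (file defaults to /proc/cgroups)
subsystems_in_hierarchy() {
    awk -v id="$1" '
        /^#/ { next }                                   # skip the header line
        $2 == id { names = names sep $1; sep = "," }    # column 2 is the hierarchy ID
        END { print names }
    ' "${2:-/proc/cgroups}"
}
```

Subsystems that print together, such as cpu and cpuacct on the system used in this article, share one hierarchy, which is rule 1 in practice.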

3. A process (task) cannot belong to different cgroups in the same hierarchy

Each time a new hierarchy is created, the system initializes it with a default cgroup, called the root cgroup, which initially contains every task in the system. Within a hierarchy you have created, a task can exist in only one cgroup at a time. The same task cannot appear twice in one hierarchy, but it can also belong to cgroups in other hierarchies.

If you add a task that already belongs to one cgroup in a hierarchy to another cgroup in the same hierarchy, it is removed from the cgroup it was previously in.

In the example:

httpd, with PID 58950, has been added to cg1 in hierarchy A. This httpd process cannot also be put into cg2: adding it to cg2 removes it from cg1. It can, however, be put into the cg3 control group in hierarchy B.

This rule prevents conflicting limits. Suppose the httpd process is in cg1 in hierarchy A, the CPU usage limit for cg1 is 30%, and the CPU usage limit for cg2 is 50%. If httpd could belong to cg2 as well, its CPU usage limit would be contradictory.
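
The effect of rule 3 can be imitated on ordinary files. The sketch below is a simulation, not the kernel's implementation: it assumes a scratch directory whose subdirectories stand in for cgroups, each holding a `tasks` file with one PID per line. The function name `cg_move_task` is hypothetical.

```shell
# Simulate rule 3: a PID may appear in only one "tasks" file per hierarchy.
# usage: cg_move_task <hierarchy_dir> <pid> <target_cgroup>
cg_move_task() {
    hier=$1; pid=$2; target=$3
    for f in "$hier"/*/tasks; do
        [ -f "$f" ] || continue
        # Drop the PID from every cgroup in this hierarchy...
        grep -vx -- "$pid" "$f" > "$f.tmp" || true   # grep exits 1 when no lines remain
        mv "$f.tmp" "$f"
    done
    # ...then record it in the target, so it appears exactly once.
    echo "$pid" >> "$hier/$target/tasks"
}
```

On a real cgroup file system the removal step is unnecessary: writing 58950 to the tasks file of cg2 is enough, and the kernel takes the PID out of cg1 itself.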

4. The newly forked child process is in the same cgroup as the parent process in the initial state

A new child process (child_task) started by a task is placed in the same cgroup as the original task by default, but the child_task can later be moved to a different cgroup in the hierarchy.

Once the fork has completed, the parent and the child are completely independent as far as cgroups are concerned: moving one does not move the other.

As shown in the figure, when someone visits the httpd process 58950, it forks a child process, httpd 58951. By default, 58951 and 58950 are both in cg1, and they are parent and child. httpd 58951 can then be moved to cg2, after which the two processes are controlled independently by their respective cgroups.
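
Rule 4 can be checked from a shell by comparing the cgroup membership recorded in /proc for the shell and for a process it starts. This is a small sketch that assumes a Linux /proc file system. The function name `child_inherits_cgroup` is hypothetical.

```shell
# Return 0 if a freshly started child process reports the same cgroup
# membership as the current shell, 1 if it differs, 2 if /proc is unreadable.
child_inherits_cgroup() {
    parent_cg=$(cat "/proc/$$/cgroup") || return 2
    # The child shell starts cat, which reads its own membership via /proc/self.
    child_cg=$(sh -c 'cat /proc/self/cgroup') || return 2
    [ "$parent_cg" = "$child_cg" ]
}
```

Calling `child_inherits_cgroup && echo inherited` prints `inherited` when the child started in the parent's cgroups, which is the expected default.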

Subsystem

What exactly can a subsystem control?

Verify by doing the following.

[root@localhost ~]# yum -y install libcgroup-tools

After installing this package, the cgroup commands used below (lscgroup and lssubsys) are available.

List all cgroup control groups in the system:

[root@localhost ~]# lscgroup
net_cls,net_prio:/
freezer:/
hugetlb:/
cpu,cpuacct:/
cpu,cpuacct:/machine.slice
cpu,cpuacct:/user.slice
cpu,cpuacct:/system.slice
cpu,cpuacct:/system.slice/network.service
cpu,cpuacct:/system.slice/docker.service
...

List the available subsystems, that is, the resources that can be controlled:

[root@localhost ~]# lssubsys -a
cpuset
cpu,cpuacct
memory
devices
freezer
net_cls,net_prio
blkio
perf_event
hugetlb
pids

Each subsystem listed above has a corresponding directory under /sys/fs/cgroup:

[root@localhost ~]# ll /sys/fs/cgroup/
total 0
drwxr-xr-x. 5 root root 0 Mar 25 04:50 blkio
lrwxrwxrwx. 1 root root 11 Mar 25 04:50 cpu -> cpu,cpuacct
lrwxrwxrwx. 1 root root 11 Mar 25 04:50 cpuacct -> cpu,cpuacct
drwxr-xr-x. 5 root root 0 Mar 25 04:50 cpu,cpuacct
drwxr-xr-x. 2 root root 0 Mar 25 04:50 cpuset
drwxr-xr-x. 5 root root 0 Mar 25 04:50 devices
drwxr-xr-x. 2 root root 0 Mar 25 04:50 freezer
drwxr-xr-x. 2 root root 0 Mar 25 04:50 hugetlb
drwxr-xr-x. 5 root root 0 Mar 25 04:50 memory
lrwxrwxrwx. 1 root root 16 Mar 25 04:50 net_cls -> net_cls,net_prio
drwxr-xr-x. 2 root root 0 Mar 25 04:50 net_cls,net_prio
lrwxrwxrwx. 1 root root 16 Mar 25 04:50 net_prio -> net_cls,net_prio
drwxr-xr-x. 2 root root 0 Mar 25 04:50 perf_event
drwxr-xr-x. 5 root root 0 Mar 25 04:50 pids
drwxr-xr-x. 5 root root 0 Mar 25 04:50 systemd

The directory holds more entries than lssubsys prints, because several of them are symbolic links (the systemd entry is a hierarchy used by systemd itself rather than a subsystem):

# The following three all belong to cpu,cpuacct
cpu -> cpu,cpuacct
cpuacct -> cpu,cpuacct
cpu,cpuacct
# The following three all belong to net_cls, net_prio
net_cls -> net_cls,net_prio
net_prio -> net_cls,net_prio
net_cls,net_prio
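
The same mapping can be read programmatically. The helper below is a minimal sketch with a hypothetical name, `cg_hierarchy_of`. It assumes the cgroup v1 layout shown in the listing and uses `readlink` to follow the symbolic links.

```shell
# Print the hierarchy directory that a name under the cgroup root resolves to:
# the link target if the entry is a symbolic link, otherwise the name itself.
# usage: cg_hierarchy_of <name> [root]   (root defaults to /sys/fs/cgroup)
cg_hierarchy_of() {
    cg_root=${2:-/sys/fs/cgroup}
    if [ -L "$cg_root/$1" ]; then
        readlink "$cg_root/$1"
    else
        echo "$1"
    fi
}
```

On the system above, `cg_hierarchy_of cpuacct` would print `cpu,cpuacct`, while `cg_hierarchy_of memory` would print `memory`.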

What does each subsystem control?

No.  Subsystem    Meaning
1    blkio        Limits input and output on block devices (optical disks, solid-state disks, USB drives, ...).
2    cpu          Regulates the tasks' use of the CPU.
3    cpuacct      Automatically generates reports on the CPU resources used by each task.
4    cpuset       (For multi-processor physical machines) Assigns dedicated CPUs to tasks.
5    devices      Allows or denies the tasks' access to devices (keyboard, mouse, ...).
6    freezer      Suspends and resumes tasks. For example, stopping a task from using the CPU is called suspension.
7    memory       Limits the memory used by tasks and automatically generates reports on memory usage.
8    perf_event   Allows unified performance monitoring of tasks, such as measuring CPU performance and hard disk read and write efficiency on Linux.
9    net_cls      Not used directly by Docker. Tags network packets with a class identifier (classid) so that the Linux traffic controller can identify packets originating from a specific cgroup.

Note: cgroups provide no way to limit how much hard disk space a container uses. They can only limit the rate of reads and writes to the hard disk.

How cgroups work

Look at the tasks file in the cgroup's cpu directory. It lists the PIDs of the processes whose CPU usage is controlled by this cgroup. To put a process under this CPU control, write its PID to the tasks file. The same applies to the other hardware resource controllers.

[root@localhost ~]# cat /sys/fs/cgroup/cpu/tasks 
1
2
4
5
6
7
8
9
...
68469
68508
68526
68567
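
The steps described above can be sketched as two small helpers. Both function names are hypothetical, and the group name `demo` in the usage note is an assumption. Writing to a real cgroup hierarchy requires root.

```shell
# Create a child cgroup. On a mounted cgroup v1 hierarchy, making a directory
# is enough: the kernel populates it with control files such as "tasks".
# usage: cg_create <cgroup_dir>
cg_create() {
    mkdir -p "$1"
}

# Put one process into the cgroup by writing its PID to the tasks file.
# usage: cg_add_task <cgroup_dir> <pid>
cg_add_task() {
    echo "$2" > "$1/tasks"
}
```

For example, `cg_create /sys/fs/cgroup/cpu/demo` followed by `cg_add_task /sys/fs/cgroup/cpu/demo 69060` would place the process with PID 69060 under the CPU control of the demo group.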

On a running system this list grows automatically, because the kernel itself adds new processes to it.

The real working principle of cgroups is the hook. In essence, cgroups are implemented by attaching hooks to the kernel's process management. When a running task touches a certain resource, the hook of the subsystem attached for that resource is triggered to check resource usage. The subsystem then applies the technique appropriate to that resource category to enforce limits and priorities.

How is the hook implemented?

Put simply, the kernel data structure that manages each task (task_struct) stores a pointer to a cgroup-related structure, and the subsystem hooks reach the task's cgroups through that pointer.

Each task points to exactly one such structure, while one structure can be shared by many tasks.

When a hook follows the pointer and reads that structure, it finds the cgroup settings that apply to the task, and resource control can then be performed.

In practice, a cgroup hierarchy has to be mounted with mount before users can work with it.

For example, the httpd program below has PID 69060:

[root@localhost ~]# yum -y install httpd
[root@localhost ~]# systemctl start httpd
[root@localhost ~]# netstat -anput | grep 80
tcp6 0 0 :::80 :::* LISTEN 69060/httpd

Check the mounts file in the process's /proc directory. It contains a large number of cgroup mounts.

Each cgroup entry is followed by its mount directory. For example, /sys/fs/cgroup/cpu,cpuacct is where the hierarchy that controls CPU usage is mounted and made visible to the httpd process. The file holds many similar mount entries for other hardware resource controllers, such as blkio, perf_event, and memory. The cgroups the process actually belongs to are recorded separately in /proc/69060/cgroup.

[root@localhost ~]# cat /proc/69060/mounts
rootfs / rootfs rw 0 0
/dev/mapper/centos-root / xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
devtmpfs /dev devtmpfs rw,seclabel,nosuid,size=914476k,nr_inodes=228619,mode=755 0 0
tmpfs /dev/shm tmpfs rw,seclabel,nosuid,nodev 0 0
devpts /dev/pts devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
mqueue /dev/mqueue mqueue rw,seclabel,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,seclabel,relatime 0 0
...
cgroup /sys/fs/cgroup/systemd cgroup rw,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,seclabel,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,seclabel,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0
...
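
To pull only the cgroup entries out of such a long listing, the file can be filtered on its third field, the file system type. This is a minimal sketch. The function name `cgroup_mount_points` is hypothetical.

```shell
# Print the mount point of every cgroup file system listed in a mounts file.
# usage: cgroup_mount_points [file]   (file defaults to /proc/self/mounts)
cgroup_mount_points() {
    # Fields: device, mount point, file system type, options, dump, pass
    awk '$3 == "cgroup" { print $2 }' "${1:-/proc/self/mounts}"
}
```

For the httpd process above, `cgroup_mount_points /proc/69060/mounts` would print one line per mounted hierarchy, such as /sys/fs/cgroup/cpu,cpuacct and /sys/fs/cgroup/memory.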

This is how cgroups are exposed through mount, and it is the same for every process. Once the subsystems are mounted as file systems, cgroups and hierarchies can be managed with ordinary file operations, just like the rest of the operating system, including permission management and subdirectories. Apart from the cgroup file system, the kernel provides no other interface for accessing cgroups. To operate on cgroups, you must first use mount to mount a cgroup hierarchy.

Resource Control Operations

We need to know how each hardware resource is controlled.

For example:

Each entry in the cpu directory of a cgroup controls one specific aspect of CPU usage.

[root@localhost ~]# cd /sys/fs/cgroup/cpu
[root@localhost cpu]# ls
cgroup.clone_children cpuacct.stat cpu.cfs_quota_us cpu.stat system.slice
cgroup.event_control cpuacct.usage cpu.rt_period_us machine.slice tasks
cgroup.procs cpuacct.usage_percpu cpu.rt_runtime_us notify_on_release user.slice
cgroup.sane_behavior cpu.cfs_period_us cpu.shares release_agent

The specific usage of these files will be explained one by one in the next article.
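
As a small preview, two of the files listed above, cpu.cfs_quota_us and cpu.cfs_period_us, work as a pair: a group may run for at most quota microseconds in every period, and a quota of -1 means no limit. The sketch below only does that arithmetic. The function name `cpu_limit_percent` is hypothetical.

```shell
# Convert a CFS quota/period pair into a limit expressed as a percentage of one CPU.
# usage: cpu_limit_percent <cfs_quota_us> <cfs_period_us>
cpu_limit_percent() {
    if [ "$1" -lt 0 ]; then
        echo "unlimited"            # a quota of -1 means no limit is enforced
    else
        echo $(( $1 * 100 / $2 ))   # integer percentage; values above 100 span several CPUs
    fi
}
```

Inside a cgroup's cpu directory, `cpu_limit_percent "$(cat cpu.cfs_quota_us)" "$(cat cpu.cfs_period_us)"` reports the limit currently in force. A quota of 50000 with a period of 100000 corresponds to 50.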

Docker command line restrictions

  • -c/--cpu-shares: limit CPU priority
  • -m/--memory: limit memory usage
  • --memory-swap: limit the total size of memory + swap
  • --blkio-weight: set the block IO weight
  • bps/iops limits:
    --device-read-bps
    --device-write-bps
    --device-read-iops
    --device-write-iops
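
A command line combining several of these options might look like the output of the sketch below. The helper prints the command instead of running it, so it can be tried without Docker installed. The function name `docker_limit_cmd`, the sample values, and the image name `httpd` are assumptions made for illustration.

```shell
# Print (without running) a docker run command that applies four of the limits.
# usage: docker_limit_cmd <cpu_shares> <memory> <memory_swap> <blkio_weight> <image>
docker_limit_cmd() {
    echo "docker run -d" \
        "--cpu-shares $1" \
        "--memory $2" \
        "--memory-swap $3" \
        "--blkio-weight $4" \
        "$5"
}
```

For example, `docker_limit_cmd 512 200m 300m 600 httpd` prints a command that would start a container with half the default CPU weight, 200 MB of memory, 300 MB of memory plus swap, and a block IO weight of 600.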

These options act on the cpu, memory, and blkio subsystems.

The cgroup directory structure is as follows.

/sys/fs/cgroup holds the hardware resource controls for all processes.

By default, the controls for non-Docker processes are stored directly in the /sys/fs/cgroup/{cpu,memory,blkio...}/ directories. The process IDs of Docker containers do not appear at this level.

The /sys/fs/cgroup/cpu/docker/ directory stores the controls for Docker on the host.

The controls for each container created by Docker are stored in the /sys/fs/cgroup/cpu/docker/<container ID>/ directory.
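
The layout above can be turned into a small path helper. This is a sketch under two assumptions: Docker uses the cgroupfs cgroup driver (with the systemd driver the directories are laid out differently), and the full container ID is known, for example from `docker inspect`. The function name `container_cgroup_dir` is hypothetical.

```shell
# Build the cgroup v1 directory of a container, following the layout above.
# usage: container_cgroup_dir <subsystem> <container_id> [root]
container_cgroup_dir() {
    echo "${3:-/sys/fs/cgroup}/$1/docker/$2"
}
```

With the ID stored in a variable `id`, `cat "$(container_cgroup_dir cpu "$id")/cpu.shares"` would show the CPU weight applied to that container.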
