Detailed explanation of Linux CPU load and CPU utilization

CPU Load and CPU Utilization

Both of these can reflect the busyness of a machine to a certain extent.

CPU utilization reflects how busy the CPU is at the current moment. It fluctuates because a process that has been occupying CPU time may enter an I/O wait state, in which it no longer consumes CPU even though its work is not finished.
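As a rough sketch of where a utilization figure comes from: on Linux, the first line of /proc/stat holds cumulative CPU time counters (user, nice, system, idle, iowait, irq, softirq, steal, ...), and utilization over an interval is the non-idle share of the difference between two samples. The function names below are illustrative, and counting iowait as "not busy" is an assumption of this sketch rather than the only possible convention.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the cumulative counters from the aggregate 'cpu' line as ints."""
    with open(path) as f:
        fields = f.readline().split()
    # fields[0] is the label "cpu"; the rest are user, nice, system, idle, iowait, ...
    return [int(x) for x in fields[1:]]


def utilization(prev, curr):
    """Percentage of non-idle CPU time between two samples of the counters."""
    # Only the first eight fields are used: guest and guest_nice are already
    # included in user and nice, so adding them would count that time twice.
    deltas = [c - p for p, c in zip(prev[:8], curr[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 3 is idle and index 4 is iowait; both are treated as not busy here.
    not_busy = sum(deltas[3:5])
    return 100.0 * (total - not_busy) / total
```

To use it, take one sample with read_cpu_times(), wait a second or so, take a second sample, and pass both to utilization(). A process that spends the interval blocked on I/O adds to iowait rather than to the busy share, which is one reason the figure moves around.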

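The load average is the average number of processes that are either occupying the CPU or waiting for CPU time over a given period. Here, a process waiting for CPU time means one that is ready to run and waiting to be scheduled; processes in an ordinary (interruptible) wait state are excluded. On Linux, processes in uninterruptible sleep, typically blocked on disk I/O, are also counted, which is why heavy I/O can push the load up while the CPU stays idle.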
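For reference, the kernel exposes these averages in /proc/loadavg as a single line such as "0.52 0.58 0.59 2/389 12345": the 1-, 5- and 15-minute averages, then the number of currently runnable scheduling entities over the total, then the most recently created PID. A minimal parsing sketch follows; the function names and dictionary keys are illustrative.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format into a dictionary."""
    one, five, fifteen, entities, _last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "runnable": int(runnable),
        "total": int(total),
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the current load averages (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Python's standard library also returns the same three averages through os.getloadavg() on Unix-like systems, which avoids the parsing when the runnable/total counts are not needed.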

From the above, a machine can easily end up with low CPU usage and a high load, so its busyness should be judged from both figures together. In my own experience, a dual-core Xeon 2.8 GHz machine with 2 GB of memory ran at a load average of about 50 with CPU usage close to 100% (the application does a lot of I/O). Even so, the application stayed smooth and the actual access latency was not very high. When the CPU still has idle time, improving I/O response is therefore the key to reducing the load.

Many people think a machine is very busy once the load reaches the dozens. I think that if CPU utilization is relatively low at that point, the high load alone does not explain the problem well: as soon as the process currently on the CPU finishes, the waiting processes get a response right away. In that case, I/O read and write speed is what should be optimized. Conversely, if CPU usage is always above 90%, the machine really is busy, even if the load average is only in the single digits (for example, because one process is always running).

As also noted in a previous article, when CPU usage is low but the load is very high, the likely cause of the high load is I/O.
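The reasoning in this section can be condensed into a small decision helper. This is only a sketch: the 90% "busy" threshold and the per-core load of 1.0 come from the text, while the 50% cutoff for "low" CPU usage and the label strings are assumptions made for illustration.

```python
def diagnose(cpu_percent, load, cores, low_cpu=50.0, busy_cpu=90.0):
    """Combine CPU utilization and load average into a rough diagnosis.

    low_cpu is a hypothetical cutoff for "low" utilization; tune it to taste.
    """
    per_core = load / cores
    if cpu_percent >= busy_cpu:
        # Busy regardless of how small the load average looks.
        return "cpu-busy"
    if per_core > 1.0 and cpu_percent < low_cpu:
        # Many processes queued while the CPU idles: look at I/O first.
        return "io-wait-suspected"
    if per_core > 1.0:
        return "queueing"
    return "ok"
```

For example, diagnose(20, 50, 2) returns "io-wait-suspected", which matches the case of a high load with a mostly idle CPU described above.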

An Analogy for CPU Load

To determine whether the system is overloaded, you must understand what the load average really means. Below, I will try to explain it in the plainest possible language, based on the article "Understanding Linux CPU Load".
First, assume the simplest case: your computer has only one CPU, and all calculations must be done by it.
Imagine this CPU as a bridge with only one lane that every vehicle must use. (Obviously, the bridge carries traffic in one direction only.)

A system load of 0 means there are no cars on the bridge.

The system load is 0.5, which means that half of the bridge has cars on it.

A system load of 1.0 means that there are cars on every section of the bridge, so the bridge is "full". Note, however, that traffic still flows without delay up to this point.

The system load is 1.7, which means that there are too many vehicles and the bridge is already fully occupied (100%), and the vehicles waiting to board the bridge account for 70% of the vehicles on the bridge. By analogy, a system load of 2.0 means that the number of vehicles waiting to board the bridge is the same as the number of vehicles on the bridge deck; a system load of 3.0 means that the number of vehicles waiting to board the bridge is twice the number of vehicles on the bridge deck. In short, when the system load is greater than 1, the vehicles behind must wait; the greater the system load, the longer they must wait to cross the bridge.

The system load of the CPU works in basically the same way. The traffic capacity of the bridge is the maximum workload of the CPU; the vehicles are the processes, either being processed by the CPU or waiting for it.
If the CPU processes a maximum of 100 processes per minute, then a system load of 0.2 means that the CPU only processes 20 processes in this minute; a system load of 1.0 means that the CPU processes exactly 100 processes in this minute; a system load of 1.7 means that in addition to the 100 processes being processed by the CPU, there are 70 processes waiting in line for the CPU to process.
In order to ensure the smooth operation of the computer, the system load should not exceed 1.0, so that no process needs to wait and all processes can be processed as soon as possible. Obviously, 1.0 is a critical value. If it exceeds this value, the system is no longer in the optimal state and you need to intervene.
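The arithmetic in the example above can be written out directly. The sketch below keeps the text's hypothetical capacity of 100 processes per minute on a single CPU; the function name is illustrative.

```python
def split_load(load, capacity=100):
    """Split a single-CPU load into (processed, waiting) process counts.

    capacity is the hypothetical maximum number of processes handled per minute.
    """
    processed = round(min(load, 1.0) * capacity)
    waiting = round(max(load - 1.0, 0.0) * capacity)
    return processed, waiting


for load in (0.2, 1.0, 1.7):
    processed, waiting = split_load(load)
    print(f"load {load}: {processed} processed, {waiting} waiting")
```

Anything above 1.0 shows up only in the waiting count, which is why 1.0 is the critical value for a single CPU.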

CPU Load - Multiprocessor

Above, we assumed that your computer has only 1 CPU. What would happen if your computer had 2 CPUs installed?
2 CPUs means that the computer's processing power has doubled, and the number of processes that can be processed simultaneously has also doubled.
Let’s use the bridge as an analogy again. Two CPUs mean that the bridge has two lanes, doubling its traffic capacity.

Therefore, with 2 CPUs the system load can reach 2.0, at which point each CPU is at 100% of its workload. More generally, for a computer with n CPUs, the maximum acceptable system load is n.

CPU Load - Multi-core Processors

Chip manufacturers often include multiple CPU cores within a single CPU, which is called a multi-core CPU.
In terms of system load, the effect of multi-core CPU is similar to that of multiple CPUs, so when considering system load, you must consider how many CPUs the computer has and how many cores each CPU has. Then, divide the system load by the total number of cores. As long as the load per core does not exceed 1.0, the computer is running normally.
How do you know how many CPU cores your computer has?
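The "cat /proc/cpuinfo" command shows CPU information. The "grep -c 'model name' /proc/cpuinfo" command directly returns the number of logical CPUs the kernel sees. This equals the total number of cores unless hyper-threading is enabled, in which case each core is counted once per hardware thread.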
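The same count and the per-core normalization can be sketched in a few lines. The first function mirrors the grep command above by counting "model name" lines in text in /proc/cpuinfo format; the second divides a load value by the CPU count, falling back to os.cpu_count() (which reports logical CPUs) when none is given. Both function names are illustrative.

```python
import os


def count_logical_cpus(cpuinfo_text):
    """Count 'model name' lines, like grep -c 'model name' /proc/cpuinfo."""
    return sum(1 for line in cpuinfo_text.splitlines() if "model name" in line)


def load_per_core(load, cores=None):
    """Divide a load average by the number of CPUs."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    return load / cores
```

A per-core value at or below 1.0 corresponds to the "running normally" condition described above.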

Rules of thumb for system load

Is 1.0 the ideal value for system load?

Not necessarily. System administrators often leave some wiggle room. When this value reaches 0.7, you should pay attention. The rule of thumb is this:

  • When the system load is continuously greater than 0.7, you must start investigating where the problem lies to prevent the situation from getting worse.
  • When the system load is continuously greater than 1.0, you must find a solution to reduce this value.
  • When the system load reaches 5.0, it means that your system has serious problems, has been unresponsive for a long time, or is close to crashing. You should not allow your system to reach this value.
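These rules translate into a simple threshold check. The sketch below assumes the input has already been divided by the number of cores; the label strings are made up for illustration.

```python
def load_status(per_core_load):
    """Map a per-core load value to the rule-of-thumb levels from the text."""
    if per_core_load >= 5.0:
        return "critical"      # serious problems, close to unresponsive
    if per_core_load > 1.0:
        return "reduce"        # find a way to bring the load down
    if per_core_load > 0.7:
        return "investigate"   # start looking for the cause
    return "ok"
```

Because the rules speak of load that is continuously above a threshold, a check like this is best applied to a sustained reading rather than to a single sample.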

My machine has 24 cores, so what is the appropriate load?

[[email protected] /home/ahao.mah/ALIOS_QA]# grep 'model name' /proc/cpuinfo | wc -l
24

The answer is:

[[email protected] /home/ahao.mah/ALIOS_QA]# echo "0.7*24" | bc
16.8
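The same calculation generalizes to any core count. In this small sketch, the 0.7 and 1.0 factors are the per-core rules of thumb from the text, and the function name and dictionary keys are illustrative.

```python
def load_thresholds(cores):
    """Scale the 0.7 and 1.0 per-core rules of thumb to a whole machine."""
    return {
        "investigate": round(0.7 * cores, 2),  # start paying attention here
        "saturated": float(cores),             # every core fully occupied
    }


print(load_thresholds(24))
```

For 24 cores this gives 16.8 as the point to start investigating and 24.0 as the point where every core is fully occupied.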

Optimal observation time

The last question: "load average" returns three averages, the 1-minute, 5-minute, and 15-minute system load. Which one should you refer to?

If only the 1-minute load is greater than 1.0 and the other two are below 1.0, the spike is temporary and not a serious problem.

If the 15-minute load average is greater than 1.0 (after adjusting for the number of CPU cores), the problem is persistent rather than temporary. You should therefore mainly watch the 15-minute load and treat it as the indicator of whether the computer is running normally.
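One way to apply this guidance is to compare the three averages after dividing by the core count. The sketch below follows the two cases described in the text; the "watch" label for in-between situations is an assumption added so the function always returns something.

```python
def load_trend(load1, load5, load15, cores=1):
    """Classify the three load averages after normalizing by core count."""
    one, five, fifteen = (value / cores for value in (load1, load5, load15))
    if fifteen > 1.0:
        return "sustained"   # the problem persists
    if one > 1.0 and five <= 1.0:
        return "transient"   # only the 1-minute figure is high
    if one > 1.0 or five > 1.0:
        return "watch"       # hypothetical in-between case
    return "normal"
```

For instance, load_trend(30, 28, 26, cores=24) returns "sustained", because even the 15-minute average is above one per core.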
