Detailed explanation of top command output in Linux


Preface

I believe everyone has used the top command on Linux. Ever since I started using Linux, I have relied on top to see which processes use the most CPU and memory, but I did not understand the rest of its output. What do these indicators mean, and when should I pay attention to them? Where does top get its data, and how does it calculate the values it shows?

Demo Environment

# uname -a
Linux VM_1_11_centos 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

top Command

The top command is a commonly used performance analysis tool on Linux. Much like the Windows Task Manager, it displays overall system resource usage and the resource usage of each process in real time, refreshing every 3 seconds by default.

top - 11:00:54 up 54 days, 23:35, 6 users, load average: 16.32, 18.75, 21.04
Tasks: 209 total, 3 running, 205 sleeping, 0 stopped, 1 zombie
%Cpu(s): 29.7 us, 18.9 sy, 0.0 ni, 49.3 id, 1.7 wa, 0.0 hi, 0.4 si, 0.0 st
KiB Mem : 32781216 total, 1506220 free, 6525496 used, 24749500 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 25607592 avail Mem 

 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND                                                                                 
root 20 0 15.6g 461676 4704 R 198.0 1.4 11:15.26 python                                                                                  
root 20 0 9725596 240028 4672 R 113.0 0.7 7:48.49 python                                                                                  
root 20 0 6878028 143196 4720 S 82.4 0.4 1:35.03 python

The first line is equivalent to the output of the uptime command. 11:00:54 is the current time. up 54 days, 23:35 is how long the system has been running. 6 users means 6 users are currently logged in. load average: 16.32, 18.75, 21.04 gives the system's 1-minute, 5-minute, and 15-minute load averages, respectively.

Load Average

The load average is the average number of active processes. These are processes that are running, processes that are ready to run and waiting for a CPU, and processes in uninterruptible sleep, which are usually waiting for disk I/O. If the load average equals the number of CPU cores, every core is fully utilized. If it is greater than the number of cores, the system is overloaded. A common rule of thumb is that a load above 70% of the number of cores already deserves attention.

The three values should also be read together to see the trend. Suppose the 1-minute load is high while the 5-minute and 15-minute loads are low. This means the load has only just risen and should be watched. If all three values are high, check whether some process is consuming a lot of CPU or performing frequent I/O. The cause may also be too many running processes and frequent context switching between them.

For example, the demo environment above is an 8-core CentOS machine. Its load averages of 16.32, 18.75, and 21.04 are roughly 2 to 2.6 times the core count, so the system has been overloaded for at least the last 15 minutes. The falling trend from 21.04 to 16.32 suggests it is easing.
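The reading of the three load averages can be turned into a small check. This is a minimal sketch, assuming Linux and Python 3. The helper names load_per_core, describe_trend and read_loadavg are hypothetical. The 0.7 threshold simply encodes the 70%-of-cores rule of thumb above; it is a heuristic, not a kernel limit.

```python
import os

def load_per_core(loads, cores):
    """Divide each load average by the number of CPU cores (rounded to 2 places)."""
    return [round(load / cores, 2) for load in loads]

def describe_trend(load1, load5, load15, cores, threshold=0.7):
    """Rough reading of the 1-, 5- and 15-minute load averages.

    threshold=0.7 is the "70% of the number of cores" rule of thumb,
    used here as an assumption rather than a hard limit."""
    limit = threshold * cores
    high = [load > limit for load in (load1, load5, load15)]
    if all(high):
        return "sustained high load: look for CPU-heavy processes, heavy I/O or too many processes"
    if high[0]:
        return "recent increase: keep watching"
    if any(high):
        return "load was high earlier and is now falling"
    return "within the rule-of-thumb limit"

def read_loadavg(path="/proc/loadavg"):
    """Return the 1-, 5- and 15-minute load averages (first three fields)."""
    with open(path) as f:
        return tuple(float(x) for x in f.read().split()[:3])

if __name__ == "__main__":
    cores = os.cpu_count() or 1
    l1, l5, l15 = read_loadavg()
    print(load_per_core((l1, l5, l15), cores), describe_trend(l1, l5, l15, cores))
```

Take the demo machine: 8 cores with loads of 16.32, 18.75 and 21.04. The sketch gives per-core loads of 2.04, 2.34 and 2.63 and classifies the situation as a sustained high load.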

Tasks: 214 total, 4 running, 209 sleeping, 0 stopped, 1 zombie

The Tasks line is the second line of top's output; the line quoted here comes from a different snapshot than the sample above. It shows how many processes exist and what state they are in:

  • 214 total: there are 214 processes in total.
  • 4 running: 4 are running or ready to run.
  • 209 sleeping: 209 are sleeping.
  • 0 stopped: none are stopped.
  • 1 zombie: there is one zombie process.
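These counts come from the state letter of each process under /proc. Below is a minimal sketch that rebuilds a Tasks-style summary, assuming Linux and Python 3. The helper names state_of, tally and read_states are hypothetical. The grouping of states is also an assumption that approximates top: R is running, T/t is stopped, Z is zombie, and everything else (S, D, I, ...) is sleeping.

```python
import os
from collections import Counter

def state_of(stat_line):
    """Return the one-letter state field of a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may contain spaces or ')',
    so we locate the LAST ')' instead of splitting on whitespace."""
    return stat_line[stat_line.rindex(")") + 2]

def tally(states):
    """Group raw states into Tasks-line buckets (assumed mapping, see above)."""
    states = list(states)
    counts = Counter()
    for s in states:
        if s == "R":
            counts["running"] += 1
        elif s in ("T", "t"):
            counts["stopped"] += 1
        elif s == "Z":
            counts["zombie"] += 1
        else:  # S, D, I and any other state are counted as sleeping here
            counts["sleeping"] += 1
    counts["total"] = len(states)
    return counts

def read_states(proc="/proc"):
    """Collect the state of every process listed under /proc."""
    states = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                states.append(state_of(f.read()))
        except (OSError, ValueError):
            continue  # the process exited between listdir() and reading it
    return states

if __name__ == "__main__":
    c = tally(read_states())
    print(f"Tasks: {c['total']} total, {c['running']} running, {c['sleeping']} sleeping, "
          f"{c['stopped']} stopped, {c['zombie']} zombie")
```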

Zombie Processes

A child process becomes a zombie when it terminates and its parent does not call wait()/waitpid() to collect it. A terminated child does not disappear immediately. Its entry in the process table, holding its PID and exit status, stays there until the parent reaps it. If the parent has already exited, the child is adopted by init (PID 1), which reaps it.

It follows that if a parent neither reaps its children nor exits, zombies accumulate. Each zombie occupies a slot in the process table, and the table's capacity is limited. Too many zombies can therefore prevent the system from creating new processes, so a large zombie count deserves attention. In the per-process list further down, the S column shows each process's state, and Z marks a zombie.
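To see the mechanism in action, here is a minimal sketch (Linux only, Python standard library). It forks a child that exits immediately, then reads the child's state from /proc/<pid>/stat before the parent calls waitpid(). Finally it reaps the child and checks that its entry is gone. The helper names proc_state and demo_zombie are made up for this illustration.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is in parentheses and may contain spaces,
    # so the state is the character two positions after the last ')'.
    return data[data.rindex(")") + 2]

def demo_zombie(timeout=5.0):
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately without cleanup handlers
    # Parent: deliberately do NOT call wait() yet, so the child stays a zombie.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)  # give the child time to exit
        state = proc_state(pid)
    os.waitpid(pid, 0)  # reap: the process-table entry is released here
    reaped = not os.path.exists(f"/proc/{pid}")
    return state, reaped

if __name__ == "__main__":
    state, reaped = demo_zombie()
    print(f"state before waitpid: {state}, entry gone after waitpid: {reaped}")
```

On a typical Linux system this should report state Z before waitpid() is called and show that /proc/<pid> has disappeared afterwards.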

How to eliminate zombie processes:

1. Find the PID of the zombie's parent. pstree -p shows the parent-child relationships between processes, and ps -o ppid= -p <zombie PID> prints the parent PID directly. Then kill the parent with kill -9 <parent PID>. Once the parent exits, init adopts the zombie and reaps it automatically. (Running kill -9 on the zombie itself has no effect, because it has already terminated.)

2. Restart the system.
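Step 1 can also be scripted. The following sketch scans /proc for processes in state Z and reports each one's parent PID, which is the process you would then inspect or kill. It assumes Linux and the Python standard library. The helper names parse_stat and find_zombies are hypothetical.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line.

    comm may contain spaces or parentheses, so split on the first " ("
    and the last ") " rather than on plain whitespace."""
    pid_str, rest = stat_line.split(" (", 1)
    comm, after = rest.rsplit(") ", 1)
    fields = after.split()
    return int(pid_str), comm, fields[0], int(fields[1])

def find_zombies(proc="/proc"):
    """List (pid, comm, ppid) for every process currently in state Z."""
    zombies = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process vanished while we were scanning
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies

if __name__ == "__main__":
    for pid, comm, ppid in find_zombies():
        print(f"zombie {pid} ({comm}), parent {ppid}")
```

Check what the parent actually is before killing it. If the reported parent is PID 1, init should reap the zombie on its own.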

%Cpu(s): 31.9 us, 30.3 sy, 0.0 ni, 37.0 id, 0.0 wa, 0.0 hi, 0.8 si, 0.0 st

The %Cpu(s) in the third line indicates the overall CPU usage.

  • us (user): percentage of CPU time spent in user mode
  • sy (system): percentage of CPU time spent in kernel mode
  • ni (nice): percentage of CPU time spent running low-priority (positive nice value) processes in user mode
  • id (idle): percentage of CPU time spent idle
  • wa (iowait): percentage of CPU time spent idle while waiting for I/O to complete
  • hi (hardware interrupts): percentage of CPU time spent servicing hardware interrupts
  • si (software interrupts): percentage of CPU time spent servicing soft interrupts
  • st (steal): when the system runs in a virtual machine, percentage of CPU time the hypervisor gave to other virtual machines

So overall CPU usage = 100% - id. Each of the other fields points to a different cause:

  • us high: most CPU time is spent in user code, which may need optimizing.
  • sy high: the time is spent in the kernel. This comes either from frequent system calls or from frequent context switching between processes or threads.
  • wa high: some process is doing a lot of I/O. This is usually disk I/O, though network file systems such as NFS can also contribute.
  • si high: the CPU is busy processing soft interrupts. Receiving and sending network packets triggers soft interrupts, so a large number of small packets makes them fire very frequently. A typical SYN flood drives si very high.
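The rule "usage = 100% - id" is easy to apply mechanically to a line copied from top. This sketch parses a %Cpu(s) line and reports overall usage. It also reports the largest non-idle component, which is the one the paragraph above suggests investigating first. It is Python, and parse_cpu_summary, cpu_usage and busiest are hypothetical helper names.

```python
import re

def parse_cpu_summary(line):
    """Turn "%Cpu(s): 29.7 us, 18.9 sy, ..." into {"us": 29.7, "sy": 18.9, ...}."""
    _, _, body = line.partition(":")
    return {name: float(value) for value, name in re.findall(r"([\d.]+)\s+(\w+)", body)}

def cpu_usage(fields):
    """Overall CPU usage as defined in the text: 100% minus idle."""
    return round(100.0 - fields["id"], 1)

def busiest(fields):
    """Name of the largest non-idle component (us, sy, wa, si, ...)."""
    return max((name for name in fields if name != "id"), key=fields.get)

if __name__ == "__main__":
    f = parse_cpu_summary("%Cpu(s): 29.7 us, 18.9 sy, 0.0 ni, 49.3 id, 1.7 wa, 0.0 hi, 0.4 si, 0.0 st")
    print(cpu_usage(f), busiest(f))
```

For the first sample in this article, this gives 50.7% overall usage, with us as the largest component.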

KiB Mem : 32781216 total, 663440 free, 7354900 used, 24762876 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 24771700 avail Mem

Lines 4 and 5 show memory usage, in KiB.

  • total: total physical memory.
  • free: memory that is completely unused.
  • used: memory in use.
  • buff: memory used by kernel buffers for block devices (raw disk blocks and file system metadata).
  • cache: the page cache holding file contents.
  • avail: an estimate of how much memory is available for starting new applications without swapping.
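The used value is not independent of the others. In both memory snapshots shown in this article, it equals total minus free minus buff/cache, which makes it clear that used does not include cache:

```latex
\begin{aligned}
\text{used} &= \text{total} - \text{free} - \text{buff/cache} \\
6525496 &= 32781216 - 1506220 - 24749500 \\
7354900 &= 32781216 - 663440 - 24762876
\end{aligned}
```

This is also why avail (25607592 and 24771700 KiB) is far larger than free: most of buff/cache can be reclaimed when applications need memory.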

Swap uses a disk partition or a file as an extension of memory. Swap total is the total amount of swap space, free is the unused amount, and used is the amount in use. If all three values are 0, swap is disabled. The demo environment is a virtual machine, and swap is commonly disabled on virtual machines.

Below the summary lines, top lists the details of each process:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  • PID: process ID
  • USER: user name of the process owner, such as root
  • PR: scheduling priority of the process
  • NI: nice value of the process (-20 to 19); the smaller the value, the higher the priority
  • VIRT: total virtual memory used by the process
  • RES: resident (physical) memory used by the process, including shared pages
  • SHR: shared memory, the part of RES that may be shared with other processes
  • S: process state (R running, S sleeping, D uninterruptible sleep, T stopped, Z zombie)
  • %CPU: percentage of CPU time used by the process since the last refresh; for a multithreaded process this can exceed 100% (for example, 198.0 above means roughly two cores' worth)
  • %MEM: RES as a percentage of total physical memory
  • TIME+: total CPU time used by the process since it started, shown to hundredths of a second
  • COMMAND: the command that started the process (by default only the program name is shown; top -c shows the full command line with its arguments)
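%MEM is RES expressed as a share of total physical memory (MemTotal). The sample output is consistent with this: 461676 KiB out of 32781216 KiB is about 1.4%, matching the first python process. Here is a minimal sketch, assuming Linux and Python 3. mem_percent, res_kib and memtotal_kib are hypothetical helper names. RES is taken as the second field of /proc/<pid>/statm (resident pages) multiplied by the page size.

```python
import os

def mem_percent(res_kib, memtotal_kib):
    """%MEM as top shows it: resident memory as a share of MemTotal."""
    return 100.0 * res_kib / memtotal_kib

def res_kib(pid):
    """Resident set size in KiB: 2nd field of /proc/<pid>/statm is resident pages."""
    with open(f"/proc/{pid}/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024

def memtotal_kib(path="/proc/meminfo"):
    """Read MemTotal (in kB) from /proc/meminfo."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise ValueError("MemTotal not found")

if __name__ == "__main__":
    pid = os.getpid()
    print(f"PID {pid}: RES {res_kib(pid)} KiB, %MEM {mem_percent(res_kib(pid), memtotal_kib()):.1f}")
```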

Calculation principle

Before explaining how top calculates its indicators, we need to introduce the proc file system, because all of top's data comes from it. proc is a virtual file system that serves as an interface between the Linux kernel and user space. The kernel exposes its current state through proc files, and users can change some kernel behavior by writing to them. Unlike ordinary files, proc files are not stored on disk. Their contents are generated on the fly, because the kernel's state changes constantly.

All of the CPU indicators that top displays are derived from /proc/stat:

# cat /proc/stat 
cpu 1151829380 20277 540128095 1909004524 21051740 0 10957596 0 0 0
cpu0 143829475 3918 67658924 235696976 5168514 0 1475030 0 0 0
cpu1 144407338 1966 67616825 236756510 3969110 0 1392212 0 0 0
cpu2 144531920 2287 67567520 238021699 2713175 0 1363460 0 0 0
cpu3 143288938 2366 67474485 239715220 2223739 0 1356698 0 0 0
cpu4 143975390 3159 67394206 239494900 1948424 0 1343261 0 0 0
cpu5 144130685 2212 67538520 239431294 1780756 0 1349882 0 0 0
cpu6 144009592 2175 67536945 239683876 1668203 0 1340087 0 0 0
cpu7 143656038 2193 67340668 240204045 1579816 0 1336963 0 0 0

The first line represents the total CPU information, followed by detailed information for each CPU.

But what is the information in these specific columns? We can find the answer through man proc:

user (1) Time spent in user mode.

nice (2) Time spent in user mode with low priority (nice).

system (3) Time spent in system mode.

idle (4) Time spent in the idle task. This value should be USER_HZ times the second entry in the /proc/uptime pseudo-file.

iowait (since Linux 2.5.41)
     (5) Time waiting for I/O to complete.

irq (since Linux 2.6.0-test4)
     (6) Time servicing interrupts.

softirq (since Linux 2.6.0-test4)
     (7) Time servicing softirqs.

steal (since Linux 2.6.11)
     (8) Stolen time, which is the time spent in other operating systems when running in a virtualized environment.

guest (since Linux 2.6.24)
     (9) Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.

guest_nice (since Linux 2.6.33)
     (10) Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel).

In other words, starting from the second column, the values are the CPU time spent in user, nice, system, idle, iowait, irq (hard interrupts), softirq (soft interrupts), steal, guest, and guest_nice. They are measured in units of USER_HZ, which is usually 100, so one unit is usually 10 ms. So how does top calculate the percentages?

These CPU times are cumulative, so top takes two samples and works with the differences between them. By default the samples are 3 seconds apart. First, read the user value user1 and the current total CPU time total1, where

total = user + nice + system + idle + iowait + irq + softirq + steal

guest and guest_nice are not added, because the kernel already includes them in user and nice. Three seconds later, read user2 and total2 in the same way.

The user percentage for those 3 seconds is then (user2 - user1) / (total2 - total1) * 100%. There is no need to divide by 3, because the difference in total already covers the whole interval. The other fields, and each individual CPU, are calculated the same way.
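The calculation can be sketched in Python as follows. The function names are hypothetical. The code assumes a kernel new enough (2.6.11 or later) to report the steal column. It leaves out guest and guest_nice because the kernel already counts them in user and nice.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Map the first eight counters of a /proc/stat "cpu" line to field names."""
    parts = line.split()
    return parts[0], dict(zip(FIELDS, (int(v) for v in parts[1:9])))

def cpu_percentages(before, after):
    """Percentage of the interval spent in each state, from two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())  # total2 - total1
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}

def read_cpu_lines(path="/proc/stat"):
    """Return {"cpu": {...}, "cpu0": {...}, ...} from /proc/stat."""
    with open(path) as f:
        return dict(parse_cpu_line(line) for line in f if line.startswith("cpu"))

def sample(interval=3.0):
    """Take two samples `interval` seconds apart, like top's default refresh."""
    first = read_cpu_lines()
    time.sleep(interval)
    second = read_cpu_lines()
    return {name: cpu_percentages(first[name], second[name])
            for name in first if name in second}

if __name__ == "__main__":
    for name, pct in sample().items():
        print(name, " ".join(f"{k}={v:.1f}" for k, v in pct.items()))
```

In the result, sample()["cpu"]["user"] corresponds to us on the %Cpu(s) line, sample()["cpu0"] to the first CPU, and so on.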

top's memory indicators are taken from the corresponding fields of /proc/meminfo:

# cat /proc/meminfo 
MemTotal: 32781216 kB
MemFree: 1043556 kB
MemAvailable: 25108920 kB
Buffers: 427516 kB
Cached: 22084612 kB
SwapCached: 0 kB
Active: 18640888 kB
Inactive: 10534920 kB
Active(anon): 6664480 kB
Inactive(anon): 412 kB
Active(file): 11976408 kB
Inactive(file): 10534508 kB
Unevictable: 4 kB
Mlocked: 4 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 1092 kB
Writeback: 0 kB
AnonPages: 6663764 kB
Mapped: 347808 kB
Shmem: 1212 kB
Slab: 2201292 kB
SReclaimable: 1957344 kB
SUnreclaim: 243948 kB
KernelStack: 73392 kB
PageTables: 57300 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 16390608 kB
Committed_AS: 42170784 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 61924 kB
VmallocChunk: 34359625048 kB
HardwareCorrupted: 0 kB
AnonHugePages: 364544 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 376680 kB
DirectMap2M: 26886144 kB
DirectMap1G: 8388608 kB

Among them, total corresponds to MemTotal, free to MemFree, and avail to MemAvailable. buff/cache is derived from Buffers and Cached. used is not read directly but calculated as total - free - buff/cache.
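Below is a sketch of how the Mem line could be rebuilt from /proc/meminfo. It assumes buff/cache = Buffers + Cached + SReclaimable, which is how recent procps-ng versions of top appear to count it. Older versions may leave out SReclaimable, so a flag makes that part optional. It also assumes used is whatever remains of MemTotal after free and buff/cache. The helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value} (values in kB or plain counts)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key] = int(rest.split()[0])
    return info

def top_mem_line(info, include_sreclaimable=True):
    """Recompute top's total/free/used/buff/cache/avail values (assumed formulas)."""
    buff_cache = info["Buffers"] + info["Cached"]
    if include_sreclaimable:
        buff_cache += info.get("SReclaimable", 0)
    used = info["MemTotal"] - info["MemFree"] - buff_cache
    return {"total": info["MemTotal"], "free": info["MemFree"], "used": used,
            "buff/cache": buff_cache, "avail": info["MemAvailable"]}

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(top_mem_line(parse_meminfo(f.read())))
```

With the /proc/meminfo values shown above, this gives buff/cache = 24469472 KiB and used = 7268188 KiB.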

Summary

This article started from the output of the top command and explained which indicators deserve attention when their values look abnormal. It then covered how top calculates its CPU figures and where its memory figures come from.

Well, that’s all for this article. I hope the content of this article will be of certain reference value to your study or work. Thank you for your support of 123WORDPRESS.COM.
