Troubleshooting ideas and solutions for high CPU usage in Linux systems

Preface

As Linux operations engineers, we sometimes find that the CPU usage of a Linux server has reached 100% and stays there. Sustained high CPU usage disrupts the normal operation of business systems and can cause losses for the company.

Many operations staff are at a loss when this happens. When the offending process is a Java application (jstack ships with the JDK), one of the following two methods can usually locate the cause quickly:

Method 1

Step 1: Run

top

Then press Shift+P to sort by CPU usage, and note the PID of the process that is using too much CPU.

Step 2: Use

top -H -p [process id]

Find the id of the thread that consumes the most resources in the process

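Step 3: Use

printf "%x\n" [thread id] or echo 'obase=16;[thread id]' | bc

Convert the thread ID to hexadecimal. The letters must be lowercase in order to match the jstack output: printf "%x" already produces lowercase, whereas bc prints uppercase digits A-F that need to be lowercased first (for example with tr 'A-F' 'a-f').

bc is the command-line calculator in Linux.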

Step 4: Run

jstack [process id] | grep -A 10 [thread id in hexadecimal]

View the state and stack information of that thread.
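The four steps of Method 1 can also be run without the interactive top screen. The following is only a sketch: the function names first_tid_from_top and tid_to_nid are hypothetical helpers, and it assumes that top keeps its default sort order (by %CPU) and that the JVM's thread dump prints native thread IDs as nid=0x followed by lowercase hex.

```shell
#!/usr/bin/env bash
# Sketch of a non-interactive Method 1. Helper names are hypothetical.

# Read `top -H -b` output on stdin and print the first thread ID listed.
# Assumes top's default sort (by %CPU), so the first row after the
# column header is the busiest thread.
first_tid_from_top() {
    awk 'seen && NF { print $1; exit } $1 == "PID" { seen = 1 }'
}

# Convert a decimal thread ID to lowercase hex with a 0x prefix,
# the form assumed here for the nid field of a jstack thread dump.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Usage (needs a running JVM and a JDK on PATH; the PID is a placeholder):
#   pid=12345
#   tid=$(top -H -b -n 1 -p "$pid" | first_tid_from_top)
#   jstack "$pid" | grep -A 10 "nid=$(tid_to_nid "$tid") "
```

Batch mode (-b) with a single iteration (-n 1) makes top print one snapshot and exit, which is what allows its output to be piped into the helper.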

Method 2

Step 1: Run

top

Then press Shift+P to sort by CPU usage, and note the PID of the process that is using too much CPU.

Step 2: Use

ps -mp [process id] -o THREAD,tid,time | sort -rn

List the threads of the process and find those that use a lot of CPU.

Step 3: Use

printf "%x\n" [thread id] or echo 'obase=16;[thread id]' | bc

Convert the ID of the thread you want to inspect to hexadecimal. The letters must be lowercase; bc prints uppercase, so printf is the simpler choice.

Step 4: Use

jstack [process id] | grep -A 30 [thread id in hexadecimal]

Print the stack information of that thread.
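Step 2 of Method 2 depends on the sort picking out the CPU column. A possible variation, shown here only as a sketch, asks ps for just the columns of interest and sorts on the %CPU field explicitly. The helper name busiest_threads is hypothetical, and the ps options assume the procps version of ps found on most Linux distributions. Note that the %CPU value reported by ps is CPU time divided by the thread's lifetime, not an instantaneous reading.

```shell
#!/usr/bin/env bash
# Sketch: list the busiest threads of a process. Helper name is hypothetical.

# Read rows of "TID %CPU TIME" on stdin and print the N rows with the
# highest %CPU (second field), highest first. N defaults to 5.
busiest_threads() {
    sort -k2,2 -rn | head -n "${1:-5}"
}

# Usage (the PID is a placeholder):
#   pid=12345
#   ps -L -p "$pid" -o tid=,pcpu=,time= | busiest_threads 3
# -L lists one row per thread; the trailing "=" on each column name
# suppresses the header line so that only data rows reach sort.
```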

Case Study

Scenario Description

Troubleshooting high CPU usage of a Java process in a production environment

Troubleshooting process

1. The top command shows that the Java process with PID 2633 is using up to 300% CPU (roughly three full cores), which is causing the fault.

2. With the process identified, how do we locate the specific thread or code? First, list the threads of the process, sorted so that those with the highest CPU usage come first:

[root@localhost ~]# ps -mp 2633 -o THREAD,tid,time | sort -rn

In the output (not reproduced here), the thread with TID 3626 consumes the most CPU: it has accumulated 12 minutes of CPU time.

3. Convert that thread's TID to hexadecimal:

[root@localhost ~]# printf "%x\n" 3626
e2a

4. Finally, use the jstack command to print the stack information of this thread within the process:

[root@localhost ~]# jstack 2633 |grep "e2a" -A 30
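A fixed -A 30 may cut off a deep stack or run on into the next thread's entry. As a sketch of an alternative, the hypothetical helper thread_stack below prints exactly one thread's entry. It assumes that the thread dump separates entries with a blank line and that each entry's header line contains nid=0x followed by the lowercase hex thread ID.

```shell
#!/usr/bin/env bash
# Sketch: extract a single thread's entry from a jstack dump.
# Helper name is hypothetical.

# Read a thread dump on stdin and print the entry whose header contains
# "nid=<id> ", stopping at the blank line that ends the entry.
# The trailing space keeps 0xe2 from matching 0xe2a.
thread_stack() {
    awk -v nid="nid=$1 " '
        index($0, nid) { show = 1 }   # header line of the wanted thread
        show && !NF    { exit }       # blank line: end of that entry
        show                          # print lines inside the entry
    '
}

# Usage, with the PID and TID from the case study:
#   jstack 2633 | thread_stack "$(printf '0x%x' 3626)"
```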

Detecting a fault matters as much as diagnosing it. Most monitoring tools, such as Zabbix, Nagios, and Alibaba Cloud Monitoring (for cloud servers), can track server load in real time. However, most of them require operations staff to define rules or run checks themselves before a problem surfaces. How can we receive alerts passively instead?

One practical option is the operations tool Professor Wang. Users whose services run on Alibaba Cloud only need to bind a read-only AccessKey for the resources to be monitored, and alerts for those cloud resources are then sent promptly to the relevant team members.

Moving from active checking to passive notification reduces the workload of operations engineers and makes it less likely that an alert is missed or ignored.

Summary

That is the full content of this article. I hope it serves as a useful reference for your study or work.
