Troubleshooting high CPU usage in a Tomcat process

This article records how a Tomcat process with excessive CPU usage was investigated. The cause turned out to be too many TCP connections.

Problem Description

Under Linux, the CPU usage of a Tomcat web service was very high: top showed over 200%. Requests were not being answered, and repeated restarts did not change the behavior.

Troubleshooting

1. Get process information

The JVM processes on the host can be listed quickly with the jps command shipped with the JDK. The first column of its output is the pid.

jps
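
On a host that runs several JVMs, the Tomcat process can be picked out of the jps listing by its main class. This is a minimal sketch, assuming Tomcat was started through its standard launcher class org.apache.catalina.startup.Bootstrap. The function name tomcat_pids is made up for this example.

```shell
# Hypothetical helper: reads `jps -l` output on stdin
# (one "<pid> <main class or jar>" pair per line) and prints the pid of
# every JVM whose main class is Tomcat's Bootstrap launcher.
tomcat_pids() {
    awk '$2 == "org.apache.catalina.startup.Bootstrap" { print $1 }'
}

# Usage on a live host (not executed here):
#   jps -l | tomcat_pids
```

If Tomcat is embedded in another application, the main class will differ and the match string has to be adjusted.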

2. View jstack information

jstack pid

The dump shows a large number of threads blocked in log4j, waiting for a lock:

org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=201 (Compiled frame)
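
To put a number on "a large number", the occurrences of that frame in a saved dump can be counted. This is a small sketch. The helper name count_frame is an assumption of this example, not part of any tool.

```shell
# Hypothetical helper: reads a thread dump on stdin and prints how many
# stack lines contain the given frame (matched as a fixed string, not a regex).
count_frame() {
    grep -cF -- "$1"
}

# Usage on a live host (not executed here):
#   jstack <pid> | count_frame 'org.apache.log4j.Category.callAppenders'
```

Running it a few seconds apart shows whether the same number of threads stays stuck at that frame.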

A search showed that log4j 1.x has a known locking problem here: callAppenders synchronizes on the logger, so threads that log heavily queue up on the same lock, and deadlocks have been reported.

With that in mind, I changed the log4j configuration to log only at the error level and restarted Tomcat. The blocked threads disappeared from the stack dump, but the CPU usage of the process stayed high.

3. Further investigation

To analyze CPU usage thread by thread, we can use a community-contributed script that reports the CPU usage of each thread in a Java process.

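#!/bin/bash
# Usage: script_name [top_n] [pid]
# Prints the jstack stacks of the top_n busiest threads of a Java process,
# tagging the first line of each stack with that thread's %CPU from ps.

typeset top=${1:-10}
# Default to the first java process owned by the current user.
typeset pid=${2:-$(pgrep -u "$USER" java | head -n 1)}
typeset tmp_file=/tmp/java_${pid}_$$.trace

"$JAVA_HOME/bin/jstack" "$pid" > "$tmp_file"

# List all threads sorted by %CPU (ascending), keep only those of our process,
# then take the last top_n lines, i.e. the busiest threads.
ps H -eo pid,tid,%cpu --sort=%cpu --no-headers \
    | awk -v "pid=$pid" '$1==pid{print $2, $3}' \
    | tail -n "$top" \
    | while read -r tid cpu
do
    # jstack prints the native thread id in hex as nid=0x...
    typeset nid=$(printf '0x%x' "$tid")
    # Print that thread's stack (up to the next blank line) and mark its first line with the CPU figure.
    awk -v "cpu=$cpu" -v "pat=nid=$nid " 'index($0, pat),/^$/{print $0"\t"(isF++?"":"cpu="cpu"%")}' "$tmp_file"
done

rm -f "$tmp_file"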

Scope of the script

ps derives %CPU from the counters under /proc as CPU time used divided by how long the thread has been running, so the figure is an average rather than a real-time reading, and it is sampled at a slightly different moment from the jstack dump. This is why the statistics may not line up exactly with what jstack shows. The script is still very helpful when the load comes from a few threads that consume CPU continuously: those threads stay at the top of the list regardless of the timing difference.

Besides this script, a simpler method is to take the process ID and view the resource usage of each thread in the process with the following command:

top -H -p pid

In this view the PID column holds the thread ID. Convert it to hexadecimal, then find the thread whose nid matches it in the jstack output.
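
The conversion and lookup can be sketched as follows, assuming the regular jstack text format in which each thread header line carries nid=0x... followed by a space. The function names tid_to_nid and thread_stack are made up for this example.

```shell
# Convert a decimal thread id (the PID column of `top -H`) to the
# hexadecimal form that jstack prints as nid=0x....
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Hypothetical helper: print the stack of one thread from a saved jstack dump.
# $1 = decimal thread id, $2 = path to the dump file.
thread_stack() {
    # Print from the header line carrying that nid up to the next blank line.
    awk -v pat="nid=$(tid_to_nid "$1") " 'index($0, pat), /^$/' "$2"
}

# Usage on a live host (not executed here):
#   jstack <pid> > /tmp/dump.txt
#   thread_stack <tid from top -H> /tmp/dump.txt
```

The trailing space in the match pattern keeps a short id such as 0x1a2b from also matching a longer one such as 0x1a2b3.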

With this method, the CPU usage of the threads in the Tomcat process added up to about 80%, far less than the 200%+ that top reported for the process.

This means no single thread was occupying the CPU for a long time. The load was more likely made up of many short, CPU-intensive bursts. I therefore suspected insufficient JVM memory and frequent GC.

jstat -gc pid

The JVM memory usage was not abnormal, but the GC count was increasing noticeably.

With memory ruled out, and since this is a network service, the next thing to check was its network connections.
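
How fast the GC count grows can be read from the YGC and FGC columns of repeated jstat samples. This is a minimal sketch: because the column layout of jstat -gc differs between JDK versions, it locates the two columns by their header names. The function name gc_count_delta is an assumption of this example.

```shell
# Hypothetical helper: reads `jstat -gc` output (header line plus one or more
# sample lines) on stdin and prints how much the young-GC and full-GC
# counters grew between the first and the last sample.
gc_count_delta() {
    awk '
        NR == 1 {
            # Find the YGC and FGC columns by name.
            for (i = 1; i <= NF; i++) {
                if ($i == "YGC") y = i
                if ($i == "FGC") f = i
            }
            next
        }
        !seen++ { y0 = $y; f0 = $f }   # first sample
        { y1 = $y; f1 = $f }           # latest sample so far
        END { if (seen) printf "YGC +%d FGC +%d\n", y1 - y0, f1 - f0 }
    '
}

# Usage on a live host (not executed here): 10 samples, 1000 ms apart.
#   jstat -gc <pid> 1000 10 | gc_count_delta
```

A steadily growing YGC with a flat FGC points to many short-lived allocations rather than a heap that is too small.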

4. Locating the problem

Querying the TCP connections on Tomcat's port showed a large number of connections in the ESTABLISHED state plus some in other states, more than 400 in total.

netstat -anp | grep port
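
To see how the connections on the port are distributed over TCP states, the netstat output can be aggregated. This is a minimal sketch, assuming the Linux netstat column layout (local address in column 4, state in column 6). The function name count_tcp_states is made up for this example.

```shell
# Hypothetical helper: reads `netstat -ant` (or -anp) output on stdin and
# prints "<count> <state>" for TCP sockets whose local port is $1,
# most frequent state first.
count_tcp_states() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print n[s], s }
    ' | sort -rn
}

# Usage on a live host (not executed here):
#   netstat -ant | count_tcp_states <port>
```

Matching on the end of the local address avoids counting other ports that merely contain the same digits.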

Tracing the source of these connections showed that the application calling this Tomcat service ran a large number of background threads that polled the service frequently. They used up the service's Tomcat connections, so it could not accept any further requests.
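
One way to find where the connections come from is to group the established connections on the port by remote host. This is a sketch under the same assumption about the Linux netstat column layout (remote address in column 5). The function name count_remote_hosts is hypothetical.

```shell
# Hypothetical helper: reads `netstat -ant` output on stdin and prints
# "<count> <remote host>" for ESTABLISHED connections to local port $1,
# busiest remote host first.
count_remote_hosts() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") {
            host = $5
            sub(/:[0-9]+$/, "", host)   # drop the remote port
            n[host]++
        }
        END { for (h in n) print n[h], h }
    ' | sort -rn
}

# Usage on a live host (not executed here):
#   netstat -ant | count_remote_hosts <port>
```

A single host holding most of the connections is a strong hint that one client application is responsible.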

TCP connection states (Linux netstat prints them with underscores, for example SYN_SENT, SYN_RECV, FIN_WAIT1, TIME_WAIT):

  • LISTEN: Listening for connection requests from any remote TCP and port
  • SYN-SENT: A connection request has been sent and a matching connection request is awaited (a large number of sockets in this state may mean the host is infected and worth checking)
  • SYN-RECEIVED: A connection request has been received and one sent back, and the other side's acknowledgment is awaited (a large number of sockets in this state suggests a SYN flood)
  • ESTABLISHED: An open connection over which data can be exchanged
  • FIN-WAIT-1: Waiting for a connection termination request from the remote TCP, or for the acknowledgment of a termination request sent earlier
  • FIN-WAIT-2: Waiting for a connection termination request from the remote TCP
  • CLOSE-WAIT: Waiting for a connection termination request from the local user, i.e. the local application has not closed the socket yet
  • CLOSING: Waiting for the remote TCP to acknowledge the connection termination request
  • LAST-ACK: Waiting for the acknowledgment of the termination request previously sent to the remote TCP (normally short-lived; a large number that persists is worth investigating)
  • TIME-WAIT: Waiting long enough to be sure the remote TCP received the acknowledgment of its connection termination request
  • CLOSED: No connection state at all

5. Root Cause Analysis

The direct trigger was the client's polling loop: a request failed, the client kept polling anyway, and new background threads on the client kept joining the polling, until the server's Tomcat connections were exhausted.
