Solve the problem of MySQL Threads_running surge and slow query

Solve the problem of MySQL Threads_running surge and slow query

background

The year before the New Year should be a time to review the work of the year and wrap up, but all kinds of promotions and activities are waiting for the Spring Festival, so many problems have been encountered. Looking back on the problems encountered recently, I found that several of them are similar. It’s a good time to organize them as a case for wrapping up before the New Year. The manifestation is that the database is suspended and unresponsive. This happens when there is high business pressure or when the business is running normally, but problems suddenly occur.

Problem Description

Since Tencent Cloud Database MySQL itself has fault detection and high availability mechanisms, when these problems occurred, several minutes had passed from the time the user reported the problem to the time the actual intervention for troubleshooting was made, but the high availability switch was not triggered, indicating that the problem may not be a failure of the database itself, nor is it some external reason that caused the database to be unavailable.

Checking the status of the database at that time, I found a very abnormal indicator:

Around the time of the problem, the total number of connections and the number of threads_running began to soar in a short period of time, and for nearly half a minute, even the monitoring plug-in could not collect any data. During the same time period, CPU usage (reached 100%) and the number of slow queries also soared. Basically, it can be confirmed that the CPU usage, slow query, and number of connections are related. We can analyze the cause of this problem based on these three indicators.

Cause Analysis

99% of the time, as long as the number of slow queries is soaring, the problem is related to slow queries, but case analysis cannot be so hasty to draw conclusions. Now that we have gotten back to the point, since the goal has been narrowed down to three indicators, let's consider the significance of these three indicators separately and see what problems the abnormalities of these indicators will cause.

CPU

A high CPU usage indicates that MySQL's computing power is fully utilized. Only user threads and MySQL's own system threads can occupy MySQL's computing resources. This problem obviously has nothing to do with MySQL system threads, indicating that user threads are occupying a large amount of CPU computing resources, and the utilization rate reaches 100%, indicating that the degree of competition for this resource is very serious. It may cause the originally highly efficient query to become very slow due to the lack of CPU resources, and the query will be transformed from a high-efficiency query to an inefficient and slow query, resulting in the phenomenon of database pseudo-death or hang.

Slow query

Slow query is a common problem. Due to low query efficiency, it will occupy excessive CPU, IO, memory and other resources, thus affecting other normal queries. From the monitoring indicators, CPU usage, IO usage, and memory usage may increase to varying degrees. In severe cases, these indicators will also soar, resulting in slow response of the entire database.

Number of connections

The number of connections is usually an indicator of "actual failure". For example, when the number of connections reaches the upper limit of max_connections, the entire database cannot create new connections. The program side directly reports an error instead of being unresponsive. For the thread_running indicator, refer to the description in the official document:

The number of threads that are not sleeping.

To put it simply, a spike in this metric means that there are a large number of active users connected to the MySQL instance at that time. And judging from the monitoring chart of this case, there is a soaring trend, indicating that a large number of active connections appeared in a short period of time.

analyze

After a simple analysis of these three indicators, we can find that they influence each other:

  1. Slow query accumulation can cause high CPU usage;
  2. Too high a CPU will result in lower overall query efficiency, which in turn causes some efficient queries to become slow queries;
  3. The execution efficiency of slow queries is too low and they will remain active for a long time, so the Threads_running indicator will definitely increase.
  4. When excessive concurrency suddenly arrives, a large number of active queries will cause the Threads_running indicator to soar. At the same time, this spike-like peak can easily fill up the CPU.

It seems that the reasons for the surge in the three indicators are self-consistent, and relying solely on these three indicators cannot truly determine the cause of the problem. So think carefully about why the reasons for the surge in these indicators are self-consistent? You will find that there is a core phenomenon, or a commonality: queries need to be able to be accumulated. if:

  1. The accumulated queries are not efficient to begin with, so the cause of this problem is basically slow queries.
  2. The accumulated queries are very efficient, so the cause of this problem may be excessive instantaneous concurrency, or other reasons that cause the CPU usage to surge, which in turn affects these highly efficient queries.

Therefore, by checking the accumulated queries, you can identify the problem more directly. In the case shown in the figure above, the accumulated queries use a large number of group by and order by, and the query efficiency is relatively low, so the root cause is still slow query.

Expand

As mentioned at the beginning, there have been several problems recently with similar causes. In addition to this surge case, there are also phenomena as shown below.

threads_running remains at a relatively stable value. Referring to the analysis in the previous article, it can be found that this phenomenon means that in normal times, there are about 10 queries that are active for a long time. A failure scenario can be predicted: the business volume continues to increase, and the number of active queries increases. When efficient queries are affected and the efficiency is reduced to a certain extent, the front-end program/user will initiate a retry due to timeout or slow response. Then, due to the reduction in query efficiency, the retry is repeatedly triggered, which triggers an avalanche effect and slowly drags down the database.

Fortunately, among multiple instances of similar phenomena, only one had a problem, which was the predicted scenario, and the others were optimized in time.

To sum up

Although it is still a slow query problem, this case shows the usefulness of another MySQL indicator, threads_running: monitoring active connections, discovering some high-concurrency and abnormal queries in advance, and preventing the database from accumulating queries and causing pseudo-death.

The above is the detailed content of how to solve the problem of MySQL Threads_running surge and slow query. For more information about MySQL Threads_running surge and slow query, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • MySQL slow query pitfalls
  • MYSQL slow query and log example explanation
  • The role and opening of MySQL slow query log
  • MYSQL slow query and log settings and testing
  • Enabling and configuring MySQL slow query log
  • Example of MySQL slow query
  • Mysql sql slow query monitoring script code example
  • How to locate MySQL slow queries
  • MySQL slow query method and example
  • Detailed explanation of the reason why the MySQL 5.7 slow query log time is 8 hours behind the system time
  • Mysql slow query optimization method and optimization principle
  • How to optimize MySQL performance through MySQL slow query

<<:  Solution to Django's inability to access static resources with uwsgi+nginx proxy

>>:  Web Design Tutorial (5): Web Visual Design

Recommend

Summary of methods to include file contents in HTML files

In the forum, netizens often ask, can I read the ...

Common styles of CSS animation effects animation

animation Define an animation: /*Set a keyframe t...

Who is a User Experience Designer?

Scary, isn't it! Translation in the picture: ...

How to set static IP in centOS7 NET mode

Preface NAT forwarding: Simply put, NAT is the us...

Talk about the 8 user instincts behind user experience in design

Editor's note: This article is contributed by...

Vue3.0+vite2 implements dynamic asynchronous component lazy loading

Table of contents Create a Vite project Creating ...

Javascript destructuring assignment details

Table of contents 1. Array deconstruction 2. Obje...

JavaScript canvas to achieve scratch lottery example

This article shares the specific code of JavaScri...

JQuery implements hiding and displaying animation effects

This article shares the specific code of JQuery t...

What does href=# mean in a link?

Links to the current page. ------------------- Com...

Docker solution for logging in without root privileges

When you use the docker command for the first tim...

How to implement scheduled backup of MySQL database

1. Create a shell script vim backupdb.sh Create t...

MySQL uses find_in_set() function to implement where in() order sorting

This article introduces a tutorial about how to u...

Detailed process record of nginx installation and configuration

Table of contents 1 Introduction to nginx 1 What ...