Detailed analysis of MySQL instance crash cases

[Problem description]

Our production environment has a cluster of multiple MySQL servers (MySQL 5.6.21), which crash from time to time. However, the error log only records the restart information, but not the crash stack:

mysqld_safe Number of processes running now: 0
mysqld_safe mysqld restarted

Next, check the system log /var/log/message file. There is no other abnormal information when crashing, and it is not caused by OOM.

【Troubleshooting ideas】

Because no valuable information is recorded in the log. To locate the cause of the crash, first enable the MySQL core dump function.

Here are the steps to enable core dump:

1. Add two configuration items in the my.cnf file

[mysqld]

core_file

[mysqld_safe]

core-file-size=unlimited

2. Modify system parameters and configure suid_dumpable

echo 1 >/proc/sys/fs/suid_dumpable

3. Restart the MySQL service and the configuration will take effect

【Problem Analysis】

After core dump is enabled, a core file is generated when the server crashes again.

Using gdb to analyze the generated core file, you can see the stack information at the time of crash as follows:

From the function table_esms_by_digest::delete_all_rows , we can see that the crash was triggered by the truncate table events_statements_summary_by_digest operation.

We have an internal DML analysis tool that is used to count the number of database accesses for additions, deletions, modifications, and queries per minute. The data source of this tool is the events_statements_summary_by_digest table. The collection program collects data from this table once a minute and performs a truncate operation after the collection is completed.

After pausing the DML collection program on this group of clusters, MySQL no longer crashed.

Further analysis of multiple core files revealed that the final function calls all occurred on the _lf_pinbox_real_free function.

Combined with the on-site environment, there are two places worth analyzing:

1. Abnormal memory value. When printing the variable, the address of the variable here is low, which is not normal:

(gdb) p pins->pinbox

$2 = (LF_PINBOX *) 0x1367208

2. The red part is the operation of pfs releasing digest records one by one. An error occurred when releasing a row of data:

void reset_esms_by_digest()

{

uint index;

if (statements_digest_stat_array == NULL)

return;

PFS_thread *thread = PFS_thread::get_current_thread();

if (unlikely(thread == NULL))

return;

for (index = 0; index < digest_max; index++)

{

statements_digest_stat_array[index].reset_index(thread);

statements_digest_stat_array[index].reset_data();

}

digest_index = 1;

}

There are two possible causes of error:

1. Under high concurrency, conflicts occur in memory access;

2. A special SQL causes an error when processing hash.

Searching for similar problems online has made further progress and basically confirmed that this problem is caused by a bug

The following Mysql bug report describes a similar problem

https://bugs.mysql.com/bug.php?id=73979

A more detailed description of the environment is in the following link

https://bugs.launchpad.net/percona-server/+bug/1351148

The bug fix on 5.6.35 was found to be very similar to the situation we encountered.

Compared with the modification of _lf_pinbox_real_free, this part has indeed undergone major adjustments.

Here is a code snippet from the MySQL 5.6.35 function _lf_pinbox_real_free:

static void _lf_pinbox_real_free(LF_PINS pins)

{

LF_PINBOX pinbox= pins->pinbox;

struct st_match_and_save_arg arg = {pins, pinbox, pins->purgatory};

pins->purgatory = NULL;

pins->purgatory_count = 0;

lf_dynarray_iterate(&pinbox->pinarray,

(lf_dynarray_func)match_and_save, &arg);

if (arg.old_purgatory)

{

void *last = arg.old_purgatory;

while (pnext_node(pinbox, last))

last = pnext_node(pinbox, last);

pinbox->free_func(arg.old_purgatory, last, pinbox->free_func_arg);

}

}

Below is a code snippet of the MySQL 5.6.21 function _lf_pinbox_real_free

static void _lf_pinbox_real_free(LF_PINS pins)

{

int npins;

void list;

void **addr = NULL;

void first= NULL, last= NULL;

LF_PINBOX pinbox= pins->pinbox;

npins= pinbox->pins_in_array+1;

if (pins->stack_ends_here != NULL)

{

int alloca_size = sizeof(void )LF_PINBOX_PINSnpins;

if (available_stack_size(&pinbox, *pins->stack_ends_here) > alloca_size)

{

struct st_harvester hv;

addr = (void **) alloca(alloca_size);

hv.granary = addr;

hv.npins= npins;

_lf_dynarray_iterate(&pinbox->pinarray,

(lf_dynarray_func)harvest_pins, &hv);

npins = hv.granary-addr;

if (npins)

qsort(addr, npins, sizeof(void *), (qsort_cmp)ptr_cmp);

}

}

At the same time, it was observed that the problematic cluster had abnormal indicators, with QPS less than 6000 and Threads_connected nearly 8000. (Compared with other high-concurrency clusters, QPS is over 20,000 and Threads_connected is only around 300).

After checking the connection method on the application side, we found that one of the applications had nearly a hundred application servers, which might initiate requests at the same time without reasonable connection reuse. Maintaining a large number of connection threads increased the probability of bug triggering.

The description of Bugs Fixed is as follows:

Miscalculation of memory requirements for qsort operations could result in stack overflow errors in situations with a large number of concurrent server connections. (Bug #73979, Bug #19678930, Bug #23224078)

【Solution】

We analyzed the core file at the time of the crash, found the triggering conditions of the crash, paused the DML collection program (truncate table events_statements_summary_by_digest operation), and then resumed it.

Later I learned that this was a bug in MySQL, which was fixed in MySQL version 5.6.35. This bug is more likely to be triggered when the application establishes a large number of connections to the database.

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. If you have any questions, you can leave a message to communicate. Thank you for your support for 123WORDPRESS.COM.

<<: A troubleshooting experience of centos Docker bridge mode unable to access the host Redis service

>>: Lombok implementation JSR-269

Solution - BASH: /HOME/JAVA/JDK1.8.0_221/BIN/JAVA: Insufficient permissions

Based on vue-simple-uploader, encapsulate the global upload plug-in function of file segment upload, instant upload and breakpoint resume

Table of contents 1. Introduction 2. About vue-si...

Detailed analysis of MySQL instance crash cases

Solution - BASH: /HOME/JAVA/JDK1.8.0_221/BIN/JAVA: Insufficient permissions

One line of code solves various IE compatibility issues (IE6-IE10)

How to modify the root user password in mysql 8.0.16 winx64 and Linux

Detailed explanation of Nginx reverse generation Mogilefs distributed storage example

A practical record of restoring a MySQL Slave library

How to upgrade all Python libraries in Ubuntu 18.04 at once

select the best presets to create full compatibility with all browsersselect

Introduction to the use of this in HTML tags

Tutorial on installing and configuring MySql5.7 in Alibaba Cloud ECS centos6.8

Getting Started Tutorial on Animating SVG Path Strokes Using CSS3

Recommend

Detailed explanation of the murder caused by a / slash in Nginx proxy_pass

Pure CSS3 to achieve pet chicken example code

JS 9 Promise Interview Questions

MySQL reports an error: Can't find file: './mysql/plugin.frm' solution

Whitespace processing in HTML/CSS and how to preserve whitespace in the page

Basic steps to use Mysql SSH tunnel connection

Solution to the problem that directly setting the width and height of a hyperlink does not work

Let’s talk about the symbol data type in ES6 in detail

Solution to the cross-domain problem of SpringBoot and Vue interaction

Docker uses Supervisor to manage process operations

Web Design Principles of Hyperlinks

Vue realizes price calendar effect

MySQL chooses the right storage engine

10 ways to view compressed file contents in Linux (summary)

Based on vue-simple-uploader, encapsulate the global upload plug-in function of file segment upload, instant upload and breakpoint resume