A performance bug about MySQL partition tables

A performance bug about MySQL partition tables

1. Problem Description

Recently, I encountered a problem. When using partitioned tables for data query/load, the performance dropped by about 50% compared to ordinary tables. The main bottleneck appeared in the CPU. Since it was the CPU bottleneck, we can collect perf top -a -g and pstack to find the performance bottleneck. At the same time, compared with ordinary tables, we found that the CPU was mainly consumed in the function build_template_field as shown in the following figure:

2. Stack analysis using pt-pmap

In order to verify with perf top -g -a , we also obtained pstack at that time. Due to the large number of threads, we formatted it through pt-pmap to facilitate the acquisition of useful information as follows:

After formatting, we removed the idle wait stack and found a large number of them as shown above, which was also confirmed by the performance in perf top -a -g.

3. Analysis of bottleneck points in this column

We see that a lot of cpu is consumed here

ha_innobase::build_template
 ->build_template_field
   ->dict_col_get_clust_pos

template is almost always bound to a specific query, which means that a normal statement requires at least one template . Its structure is row_prebuilt_t , which contains the query tuple, the query table, the index used for the query, transaction-related information, persistent cursor, the length of the MySQL layer query row, auto-increment information, ICP-related information, mysql_row_templ_t structure and other information. The mysql_row_templ_t information is one for each field. Its main function is to record the related attributes of the feild information at the MySQL layer and the columns information at the Innodb layer. It is used to quickly convert a row of records between the MySQL layer and the Innodb layer. In order to initialize mysql_row_templ_t , the above logic appears.

The logic is roughly as follows:

Loop through each field in the table (one level of loop) ha_innobase::build_template
Is it a field that needs to be accessed? build_template_needs_field
This includes all fields that are queried and written. The more fields you need to access, the slower it will be.
If not, do not continue the loop
If you need access
build_template_field (filled with mysql_row_templ_t structure)
Loop through each field of the primary key (second-level loop)
Including pseudo columns, the primary key is all the fields in the table. The more fields in the table, the slower it is) dict_col_get_clust_pos
Confirm the position of this field in the primary key
pos0 primary key pos1 DB_TRX_ID pos2 DB_ROLL_PTR pos3 other fields for users
Loop through each field of the index (two-level loop, but there are usually not too many index fields, so it won't be slow here) dict_index_t::get_col_pos
Confirm the position of this field in the index, if not, return NULL
Return pos For example, the primary key is id1 and the secondary index is id2 id3. The secondary index is pos0 id2 pos1 id3 pos2 id1
Continue to complete other properties such as mysql null bitmap, mysql display length, mysql character set, etc.

Here we can see that there are actually two layers of loops, that is, loops within loops (time complexity O(M×N)), and the loops have the greatest impact in two places:

  • First level, the number of fields in the table
  • In the second layer, the fields that need to be accessed (both read and write) are looped through the primary key (that is, all fields)

This is why it's slow here. However, templates are usually not created multiple times for a query. For example, a large query of a common table is created only before the statement locates data for the first time. This is a special point in the comparison between partitioned tables and common tables. Let me describe it below.

4. Multiple creation of templates in the partition table

Assume we have the following partition table:

create table t(
    id1 int,
    id2 int,
    primary key(id1),
    key(id2)
)engine=innodb
partition by range(id1)(
    partition p0 values ​​less than(100),
    partition p1 values ​​less than(200),
    partition p2 values ​​less than(300));    

insert into t values(1,1);
insert into t values(101,1);
insert into t values(201,1);
insert into t values(2,2);
insert into t values(3,2);
insert into t values(4,2);
insert into t values(7,2);
insert into t values(8,2);
insert into t values(9,2);
insert into t values(10,2);

We use the statement " select * from t where id2=1 ". Obviously, id2 is a secondary index. Since all MySQL secondary indexes are local partitions, the values ​​here are distributed in three partitions. For such a statement, when the normal table should continue to access the position after the last positioning ( next_same ), it is changed index read to locate again by encapsulating the partition table. We can clearly see that this is scan next partition, and its part=1 is the second partition, which is our p1 (the first is 0).

In this way, template needs to be rebuilt for each scan next partition partition), which leads to the above problem. This is understandable. The new partition is a new InnoDB file, so the last located persistent cursor is actually useless, which is equivalent to a new table access. Here is another judgment on whether to create template :

  if (m_prebuilt->sql_stat_start) {
    build_template(false);
  }

In addition to being set to true at the beginning of the statement, m_prebuilt->sql_stat_start is also set to true each time the partition is changed as follows:

ha_innopart::set_partition:
m_prebuilt->sql_stat_start = m_sql_stat_start_parts.test(part_id);

5. About a special process

There is also a stack in our fault pstack as follows:

This stack is not actually complete, but Partition_helper::handle_ordered_index_scan appears in it. This function is actually related to the sorting of the partition table. If we consider such a situation, for the secondary index select max(id2) from t, we need to first access each partition to obtain the maximum value and then compare the maximum value of each partition to get the final result. MySQL uses priority queue for processing, which should be part of the function completed by this function (I didn't look at it carefully). Secondly, we first have QUICK_RANGE_SELECT , which is used for range queries, so we construct it as follows:

select * from t where id2<2 order by id2;


Stack:

This is because the id2 field only ensures that the data is sorted by size within the partition, but it is out of order in the entire table and requires additional processing.

6. Problem Simulation

With these preparations, we can construct a partition table with 300 fields and 25 partitions. The latest test version is 8.0.26

create table tpar300col(
    id1 int,
    id2 int,
    id3 int,
    id4 int,
...
    id299 varchar(20),
    id300 varchar(20),
    primary key(id1),
    key(id2)
)engine=innodb
partition by range(id1)(
    partition p0 values ​​less than(100),
    partition p1 values ​​less than(200),
    partition p3 values ​​less than(300),
 ...
    partition p25 values ​​less than(2500));  

insert into tpar300col values(1 ,1,1,
.... insert one piece of data per partition into tpar300col values(2401,1,1

Then construct some other data id2 not to be 1, and create a stored procedure:

delimiter //

CREATE PROCEDURE test300col()
begin 
  declare num int;
  set num = 1; 
while num <= 1000000 do
  select * from tpar300col where id2=1;
  set num = num+1;
end while;
end //
implement:
 /opt/mysql/mysql3340/install/mysql8/bin/mysql -S--socket=/opt/mysql/mgr3315/data/mgr3315.sock -e'use test;call test300col();' > log.log

Then perf top observes the following:

This confirmed the problem.

VII. Conclusion

This problem is actually related to the data dispersion of the secondary index relative to the partition key, but we cannot control the data of the secondary index, and the index must be used. We can only try to avoid it in some ways. Of course, I also submitted a BUG, ​​as follows:

https://bugs.mysql.com/bug.php?id=104576

I don't know if there is a way to fix this problem. For example, for a partitioned table, the fields of each partition are actually the same. Do I need to rebuild mysql_row_templ_t.clust_rec_field_no every time? If it is not necessary, then the problem will be solved by itself. The official has verified that this problem does exist. Here are some ways to avoid it:

  • The partition table should not have too many fields
  • The fields to be accessed should not always use select *
  • Avoid using hash partitioning, which can increase this problem.

This is the end of this article about a performance bug in MySQL partition tables. For more information about a performance bug in MySQL partition tables, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Why must the partition key in a MySQL partition table be part of the primary key?
  • MySQL partition table is classified by month
  • Detailed explanation of MySQL partition table
  • Management and maintenance of Mysql partition tables
  • MySQL optimization partition table
  • Summary of MySQL partition table management commands

<<:  Solution to inserting a form with a blank line above and below

>>:  My CSS architecture concept - it varies from person to person, there is no best, only suitable

Recommend

CSS3 realizes particle animation effect when matching kings

When coding, you will find that many things have ...

How to automatically number the results of MYSQL query data

Preface In fact, I have never encountered this ki...

Vue SPA first screen optimization solution

Table of contents Preface optimization SSR Import...

VM VirtualBox virtual machine mount shared folder

One environment Install VMware Tools on CentOS 7 ...

Summary of discussion on nginx cookie validity period

Every visit will generate Cookie in the browser, ...

Detailed explanation of several ways to install CMake on Ubuntu

apt install CMake sudo apt install cmake This met...

JS ES new features: Introduction to extension operators

1. Spread Operator The spread operator is three d...

mysql-8.0.16 winx64 latest installation tutorial with pictures and text

I just started learning about databases recently....

Vue implements the drag and drop sorting function of the page div box

vue implements the drag and drop sorting function...

border-radius method to add rounded borders to elements

border-radius:10px; /* All corners are rounded wi...

Avoiding Problems Caused by Closures in JavaScript

About let to avoid problems caused by closure Use...

The whole process of developing a Google plug-in with vue+element

Simple function: Click the plug-in icon in the up...

N ways to cleverly implement adaptive dividers with CSS

Dividing lines are a common type of design on web...

How to configure pseudo-static and client-adaptive Nginx

The backend uses the thinkphp3.2.3 framework. If ...