Detailed explanation of MySQL Group by optimization

Detailed explanation of MySQL Group by optimization

A standard Group by statement contains sorting, grouping, and aggregation functions, such as select a,count(*) from t group by a; This statement uses a for sorting by default. If column a has no index, a temporary table will be created to count a and count(*), and then sorted by a through sort_buffer.

Standard execution process

structure:

create table t1(id int primary key, a int, b int, index(a));
delimiter;;
create procedure idata()
begin
 declare i int;

 set i=1;
 while(i<=1000)do
 insert into t1 values(i, i, i);
 set i=i+1;
 end while;
end;;
delimiter ;
call idata();

The function is to insert 1000 statements into t1, from (1,1,1) to (1000,1000,1000).

Execute select id%10 as m, count(*) as c from t1 group by m;

Analysis:

Using index means that this statement uses a covering index and selects index a, so there is no need to return to the table.
Using temporary means using a temporary table;
Using filesort indicates that sorting is required.

process:

1. Create a temporary memory table with two fields m and c. The primary key is m.
2. Scan the index a of table t1, take out the id values ​​on the leaf nodes one by one, and calculate the result of id%10, which is recorded as x;
1) If there is no row with primary key x in the temporary table, insert a record (x,1);
2) If there is a row with primary key x in the table, add 1 to the c value of the row x;

Step 2 If it is found that the total field length stored in the temporary table in memory reaches the size set by the parameter tmp_table_size, the temporary table in memory will be upgraded to a temporary table on disk, and then the traversal calculation will be restarted.
3. After the traversal is completed, sort according to field m and return the result set to the client.

The final sorting is the operation in the dashed box in the figure below. If the size of sort_buffer is not large enough, a temporary table will be used to assist in sorting.

optimization

The overall process of the unoptimized (that is, the grouping columns have no index) group by can be summarized as follows: because the data is unordered, a temporary table needs to be created, and then each data is judged one by one to which group it belongs, and finally sorted according to the grouping columns. Therefore, there are two approaches to optimization:

Remove sorting

If the returned data does not need to be sorted, you can prohibit sorting, that is, change the above statement to select a,count(*) from t group by a order by null.

Sort by

If the records are sorted by the sort field, the data becomes the following structure:

In this way, when actually obtaining the fields to be returned or calculating the aggregate function, you only need to access them in sequence. When the column value becomes the next one, you know that the current group access is over and the previously counted data is returned directly. This avoids the creation of temporary tables, and sorting does not require additional sorting using sort_buffer. This greatly improves the efficiency of execution.

accomplish

1. If the grouping field is suitable for creating an index, create an index for the grouping field directly.

MySQL version 5.7 supports the generated column mechanism, which is used to implement the associated update of column data. You can create a column z as follows, and then create an index on the z column (if it is MySQL 5.6 and earlier, you can also create a normal column and index to solve this problem)

alter table t1 add column z int generated always as(id % 100), add index(z);

Then parse:

No temporary table or additional sorting is used, so performance is improved.

2. If the grouping field is not suitable (the usage rate is very low), you can use SQL_BIG_RESULT to try to optimize.

By adding the SQL_BIG_RESULT hint to the group by statement, you can tell the optimizer that this statement involves a large amount of data and that you should use a temporary disk table directly. The MySQL optimizer sees that the disk temporary table is stored in a B+ tree, and its storage efficiency is not as high as that of an array. Therefore, since SQL_BIG_RESULT is used to indicate a large amount of data, considering the disk space, it is better to store it directly in an array. So after using SQL_BIG_RESULT the optimizer will use a disk temporary table of array structure.

However, if the conditions for using a disk temporary table are not met, the disk temporary table will not be used. That is, when the sort_buffer space can store the total length of the fields to be returned and sorted, the array structure sort_buffer will be used. If the total field exceeds the sort_buffer size, then an array structure disk temporary table will be added to assist in sorting.

If there is enough space in sort_buffer, the data will be sorted inside sort_buffer, which will serve as an index.

Still using the above example, using SQL_BIG_RESULT

alter table t1 add column z int generated always as(id % 100), add index(z);

The specific process is as follows:

1. Initialize sort_buffer and put an integer field into it, denoted as m;
2. Scan index a of table t1, take out the id values ​​in it one by one, and store the value of id%10 in sort_buffer;
3. After the scan is completed, sort the field m of sort_buffer (if the memory of sort_buffer is insufficient, temporary files on disk will be used to assist in sorting);
4. After the sorting is completed, an ordered array is obtained.

Analysis:

It can be seen that the temporary table is not used at this time, but sort_buffer is used directly for sorting, which saves the performance consumption caused by using the temporary table.

Summarize

1. If there is no requirement to sort the results of the group by statement, add order by null after the statement; then generally there is no need to use a temporary table (the above two optimizations are both proposed under the premise of requiring sorting)
2. Try to let the group by process use the index of the above table. The confirmation method is to check if there is no Using temporary and Using filesort in the explain result.
3. If the amount of data that needs to be counted in group by is not large, try to use only temporary tables in memory; you can also avoid using temporary tables on disk by appropriately increasing the tmp_table_size parameter;
4. If the amount of data is too large, use the SQL_BIG_RESULT hint to tell the optimizer to directly use the sorting algorithm to get the group by result.

The above is the detailed explanation of MySQL Group by optimization. For more information about MySQL Group by optimization, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Detailed explanation of the group by statement in MySQL database group query
  • Detailed explanation of group by and having in MySQL
  • MySQL efficient query left join and group by (plus index)
  • How to optimize MySQL group by statement
  • How to use MySQL group by and order by together
  • mysql group by grouping multiple fields
  • Implement group by based on MySQL to get the latest data of each group
  • A brief discussion on group by in MySQL

<<:  Detailed explanation of 5 solutions for CSS intermediate adaptive layout

>>:  Solution to the failure of docker windows10 shared directory mounting

Recommend

Encapsulate a simplest ErrorBoundary component to handle react exceptions

Preface Starting from React 16, the concept of Er...

Linux common text processing commands and vim text editor

Today, let's introduce several common text pr...

Tutorial on installing AutoFs mount service under Linux

Whether it is Samba service or NFS service, the m...

Detailed analysis of Vue child components and parent components

Table of contents 1. Parent components and child ...

Vue implements star rating with decimal points

This article shares the specific code of Vue to i...

VMware Workstation 14 Pro installs CentOS 7.0

The specific method of installing CentOS 7.0 on V...

More than 100 lines of code to implement react drag hooks

Preface The source code is only more than 100 lin...

How to use not in to optimize MySql

Recently, when using select query in a project, I...

MySQL tutorial thoroughly understands stored procedures

Table of contents 1. Concepts related to stored p...

Detailed tutorial on installing Docker and docker-compose suite on Windows

Table of contents Introduction Download and insta...

Detailed explanation of Nginx status monitoring and log analysis

1. Nginx status monitoring Nginx provides a built...

HTML exceeds the text line interception implementation principle and code

The HTML code for intercepting text beyond multipl...