How to optimize MySQL group by statement

How to optimize MySQL group by statement

In MySQL, create a new table with three fields, id, a, and b. Insert 1,000 records with the same fields, as follows:

mysql> show create table t1\G
*************************** 1. row ***************************
    Table: t1
Create Table: CREATE TABLE `t1` (
 `id` int(11) NOT NULL,
 `a` int(11) DEFAULT NULL,
 `b` int(11) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> select * from t1 limit 10;
+----+------+------+
| id | a | b |
+----+------+------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
| 7 | 7 | 7 |
| 8 | 8 | 8 |
| 9 | 9 | 9 |
| 10 | 10 | 10 |
+----+------+------+
10 rows in set (0.00 sec)

When we execute the following SQL containing group by, we can see the execution plan:

mysql> explain select id%10 as m, count(*) as c from t1 group by m limit 10;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY,a | a | 5 | NULL | 1000 | 100.00 | Using index; Using temporary; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+
1 row in set, 1 warning (0.00 sec)

At the end there are:

  • Using index: Covering index
  • using temporary: using a temporary memory table
  • using filesort: sorting operation is used

In order to better understand the execution process of this group by statement, I draw a picture to represent it:

Comparing with the above table, we can easily find that the execution process of this group by statement is as follows:

a. First, create a temporary memory table with two fields m and c. The primary key is m; m is id%10, and c is the number of counts (*)

b. Scan index a of table t1, take out the id values ​​on the leaf nodes one by one, calculate the result of id%10, and record it as x; if there is no row with primary key x in the temporary table, insert a record (x,1); if there is a row with primary key x in the table, add 1 to the c value of row x;

c. After the traversal is completed, sorting is performed according to field m, and the result set is returned to the client. (Note that this sorting action is automatically added by group by.)

If we don't want the group by statement to automatically sort for us, we can add order by null at the end of the statement, so that we can remove the sorting process after order by. as follows:

mysql> explain select id%10 as m, count(*) as c from t1 group by m order by null;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY,a | a | 5 | NULL | 1000 | 100.00 | Using index; Using temporary |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

As you can see, the words "using filesort" at the end of explain have disappeared. Let’s look at the results again:

mysql> select id%10 as m, count(*) as c from t1 group by m;
+------+-----+
| m | c |
+------+-----+
| 0 | 100 |
| 1 | 100 |
| 2 | 100 |
| 3 | 100 |
| 4 | 100 |
| 5 | 100 |
| 6 | 100 |
| 7 | 100 |
| 8 | 100 |
| 9 | 100 |
+------+-----+
10 rows in set (0.00 sec)
mysql> select id%10 as m, count(*) as c from t1 group by m order by null;
+------+-----+
| m | c |
+------+-----+
| 1 | 100 |
| 2 | 100 |
| 3 | 100 |
| 4 | 100 |
| 5 | 100 |
| 6 | 100 |
| 7 | 100 |
| 8 | 100 |
| 9 | 100 |
| 0 | 100 |
+------+-----+
10 rows in set (0.00 sec)

When we do not add order by null, group by will automatically sort for us, so the record with m=0 will be at the first position. If we add order by null, then group by will not automatically sort, and the record with m=0 will be at the end.

In our current statement, there are a total of 1000 records in table t1. Modulo 10, there are only 10 results, which can be placed in the temporary memory table. The temporary memory table is controlled by tmp_table_size in MySQL.

mysql> show variables like "%tmp_table%";
+----------------+----------+
| Variable_name | Value |
+----------------+----------+
| max_tmp_tables | 32 |
| tmp_table_size | 39845888 |
+----------------+----------+
2 rows in set, 1 warning (0.00 sec)

When our results are large enough and the temporary table in memory is not enough to save them, MySQL will use the temporary table on disk, and the entire access speed will become very slow. So how do we optimize the group by operation?

01

Group by optimized index

It is not difficult to see from the above description that when group by is used for grouping, the temporary tables created all have a unique index. If the amount of data is large, the execution speed of group by will be very slow. To optimize this situation, we have to analyze why group by requires a temporary table?

This problem is actually because the logic of group by is to count the number of times different values ​​appear. Since the results of each row of records after group by are unordered, a temporary table is needed to store these intermediate result sets. What if all our values ​​were arranged and in order?

For example, we have a table whose record id column is:

0,0,0,1,1,2,2,2,2,3,4,4,

When we use group by, we just accumulate the same values ​​from left to right. This way there is no need for a temporary table.

We are all familiar with the above structure. When we create an index on a data column, the column itself is sorted. When group by is based on this column, there is no need to sort in this process because the index is naturally sorted. To achieve this optimization, we add a new column z to table t1 as follows:

mysql> alter table t1 add column z int generated always as(id % 10), add index(z);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> select z as m, count(*) as c from t1 group by z;
+------+-----+
| m | c |
+------+-----+
| 0 | 100 |
| 1 | 100 |
| 2 | 100 |
| 3 | 100 |
| 4 | 100 |
| 5 | 100 |
| 6 | 100 |
| 7 | 100 |
| 8 | 100 |
| 9 | 100 |
+------+-----+
10 rows in set (0.00 sec)

mysql> explain select z as m, count(*) as c from t1 group by z;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | z | z | 5 | NULL | 1000 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

We add a new field z, whose value is the value after id%10, and create an index. Then we group the z column through group by. We can see that there is no temporary table in the result.

Therefore, using indexes can help us remove the temporary tables that group by depends on.

02

Group by optimization---direct sorting

If we already know that the amount of data in the table is very large and the temporary memory table is definitely not enough to accommodate the sorting, we can actually skip the sorting process of the temporary memory table by telling group by to perform disk sorting.

In fact, there is such a method in MySQL: adding the SQL_BIG_RESULT hint in the group by statement can tell the optimizer: this statement involves a large amount of data, please use the disk temporary table directly. When we use this statement, MySQL will automatically use the array method to organize the fields in the disk temporary table instead of the B+ tree we all know. Regarding this knowledge point, here is an introduction to the official document:

SQL_BIG_RESULT or SQL_SMALL_RESULT can be used with GROUP BY or DISTINCT to tell the optimizer that the result set has many rows or is small, respectively. For SQL_BIG_RESULT, MySQL directly uses disk-based temporary tables if they are created, and prefers sorting to using a temporary table with a key on the GROUP BY elements. For SQL_SMALL_RESULT, MySQL uses in-memory temporary tables to store the resulting table instead of using sorting. This should not normally be needed.

The entire group by processing process will become:

a. Initialize sort_buffer and put an integer field into it, denoted as m;

b. Scan index a of table t1, take out the id values ​​in it one by one, and store the value of id%100 into sort_buffer;

c. After the scan is completed, sort the field m of sort_buffer (if the memory of sort_buffer is insufficient, temporary files on disk will be used to assist in sorting);

d. After sorting is completed, an ordered array is obtained. Like 0,0,0,1,1,2,2,3,3,3,4,4,4,4

e. Based on the ordered array, get the different values ​​in the array and the number of occurrences of each value.

In yesterday’s article, we analyzed the use of temporary tables in union statements. Today, we analyzed the use of temporary tables in group by statements. So when does MySQL use temporary tables?

When does MySQL use internal temporary tables?

1. If the statement execution process can read data and get results directly, no additional memory is required. Otherwise, additional memory is required to save the intermediate results.

2. If the execution logic requires the use of two-dimensional table features, temporary tables will be given priority. For example, UNION requires a unique index constraint, and GROUP BY requires another field to store the cumulative count.

The above is the details of how to optimize MySQL group by statements. For more information about MySQL group by optimization, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Detailed explanation of the group by statement in MySQL database group query
  • Detailed explanation of group by and having in MySQL
  • MySQL efficient query left join and group by (plus index)
  • Detailed explanation of MySQL Group by optimization
  • How to use MySQL group by and order by together
  • mysql group by grouping multiple fields
  • Implement group by based on MySQL to get the latest data of each group
  • A brief discussion on group by in MySQL

<<:  How to use vuex in Vue project

>>:  Summary of common docker commands (recommended)

Recommend

What are the core modules of node.js

Table of contents Global Object Global objects an...

How to set an alias for a custom path in Vue

How to configure custom path aliases in Vue In ou...

Vue element implements table adding, deleting and modifying data

This article shares the specific code of vue elem...

How to write beautiful HTML code

What Beautiful HTML Code Looks Like How to write ...

How to modify the group to which a user belongs in Linux

Modify the group to which a user belongs in Linux...

Two methods to stretch the background image of a web page

There are two solutions: One is CSS, using backgro...

Detailed installation process of mysql5.7.21 under win10

This article shares the installation of MySQL 5.7...

How to Rename a Group of Files at Once on Linux

In Linux, we usually use the mv command to rename...

Detailed explanation of the difference between tinyint and int in MySQL

Question: What is the difference between int(1) a...

mysql5.5 installation graphic tutorial under win7

MySQL installation is relatively simple, usually ...

A simple method to implement scheduled backup of MySQL database in Linux

Here are the detailed steps: 1. Check the disk sp...

HTML table tag tutorial (17): table title vertical alignment attribute VALIGN

The table caption can be placed above or below th...