In MySQL, create a new table with three fields, id, a, and b. Insert 1,000 records with the same fields, as follows: mysql> show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) NOT NULL, `a` int(11) DEFAULT NULL, `b` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `a` (`a`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.00 sec) mysql> select * from t1 limit 10; +----+------+------+ | id | a | b | +----+------+------+ | 1 | 1 | 1 | | 2 | 2 | 2 | | 3 | 3 | 3 | | 4 | 4 | 4 | | 5 | 5 | 5 | | 6 | 6 | 6 | | 7 | 7 | 7 | | 8 | 8 | 8 | | 9 | 9 | 9 | | 10 | 10 | 10 | +----+------+------+ 10 rows in set (0.00 sec) When we execute the following SQL containing group by, we can see the execution plan: mysql> explain select id%10 as m, count(*) as c from t1 group by m limit 10; +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+ | 1 | SIMPLE | t1 | NULL | index | PRIMARY,a | a | 5 | NULL | 1000 | 100.00 | Using index; Using temporary; Using filesort | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+ 1 row in set, 1 warning (0.00 sec) At the end there are:
In order to better understand the execution process of this group by statement, I draw a picture to represent it: Comparing with the above table, we can easily find that the execution process of this group by statement is as follows: a. First, create a temporary memory table with two fields m and c. The primary key is m; m is id%10, and c is the number of counts (*) b. Scan index a of table t1, take out the id values on the leaf nodes one by one, calculate the result of id%10, and record it as x; if there is no row with primary key x in the temporary table, insert a record (x,1); if there is a row with primary key x in the table, add 1 to the c value of row x; c. After the traversal is completed, sorting is performed according to field m, and the result set is returned to the client. (Note that this sorting action is automatically added by group by.) If we don't want the group by statement to automatically sort for us, we can add order by null at the end of the statement, so that we can remove the sorting process after order by. as follows: mysql> explain select id%10 as m, count(*) as c from t1 group by m order by null; +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------+ | 1 | SIMPLE | t1 | NULL | index | PRIMARY,a | a | 5 | NULL | 1000 | 100.00 | Using index; Using temporary | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------+ 1 row in set, 1 warning (0.00 sec) As you can see, the words "using filesort" at the end of explain have disappeared. Let’s look at the results again: mysql> select id%10 as m, count(*) as c from t1 group by m; +------+-----+ | m | c | +------+-----+ | 0 | 100 | | 1 | 100 | | 2 | 100 | | 3 | 100 | | 4 | 100 | | 5 | 100 | | 6 | 100 | | 7 | 100 | | 8 | 100 | | 9 | 100 | +------+-----+ 10 rows in set (0.00 sec) mysql> select id%10 as m, count(*) as c from t1 group by m order by null; +------+-----+ | m | c | +------+-----+ | 1 | 100 | | 2 | 100 | | 3 | 100 | | 4 | 100 | | 5 | 100 | | 6 | 100 | | 7 | 100 | | 8 | 100 | | 9 | 100 | | 0 | 100 | +------+-----+ 10 rows in set (0.00 sec) When we do not add order by null, group by will automatically sort for us, so the record with m=0 will be at the first position. If we add order by null, then group by will not automatically sort, and the record with m=0 will be at the end. In our current statement, there are a total of 1000 records in table t1. Modulo 10, there are only 10 results, which can be placed in the temporary memory table. The temporary memory table is controlled by tmp_table_size in MySQL. mysql> show variables like "%tmp_table%"; +----------------+----------+ | Variable_name | Value | +----------------+----------+ | max_tmp_tables | 32 | | tmp_table_size | 39845888 | +----------------+----------+ 2 rows in set, 1 warning (0.00 sec) When our results are large enough and the temporary table in memory is not enough to save them, MySQL will use the temporary table on disk, and the entire access speed will become very slow. So how do we optimize the group by operation? 01 Group by optimized index It is not difficult to see from the above description that when group by is used for grouping, the temporary tables created all have a unique index. If the amount of data is large, the execution speed of group by will be very slow. To optimize this situation, we have to analyze why group by requires a temporary table? This problem is actually because the logic of group by is to count the number of times different values appear. Since the results of each row of records after group by are unordered, a temporary table is needed to store these intermediate result sets. What if all our values were arranged and in order? For example, we have a table whose record id column is: 0,0,0,1,1,2,2,2,2,3,4,4, When we use group by, we just accumulate the same values from left to right. This way there is no need for a temporary table. We are all familiar with the above structure. When we create an index on a data column, the column itself is sorted. When group by is based on this column, there is no need to sort in this process because the index is naturally sorted. To achieve this optimization, we add a new column z to table t1 as follows: mysql> alter table t1 add column z int generated always as(id % 10), add index(z); Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> select z as m, count(*) as c from t1 group by z; +------+-----+ | m | c | +------+-----+ | 0 | 100 | | 1 | 100 | | 2 | 100 | | 3 | 100 | | 4 | 100 | | 5 | 100 | | 6 | 100 | | 7 | 100 | | 8 | 100 | | 9 | 100 | +------+-----+ 10 rows in set (0.00 sec) mysql> explain select z as m, count(*) as c from t1 group by z; +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | t1 | NULL | index | z | z | 5 | NULL | 1000 | 100.00 | Using index | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) We add a new field z, whose value is the value after id%10, and create an index. Then we group the z column through group by. We can see that there is no temporary table in the result. Therefore, using indexes can help us remove the temporary tables that group by depends on. 02 Group by optimization---direct sorting If we already know that the amount of data in the table is very large and the temporary memory table is definitely not enough to accommodate the sorting, we can actually skip the sorting process of the temporary memory table by telling group by to perform disk sorting. In fact, there is such a method in MySQL: adding the SQL_BIG_RESULT hint in the group by statement can tell the optimizer: this statement involves a large amount of data, please use the disk temporary table directly. When we use this statement, MySQL will automatically use the array method to organize the fields in the disk temporary table instead of the B+ tree we all know. Regarding this knowledge point, here is an introduction to the official document:
The entire group by processing process will become: a. Initialize sort_buffer and put an integer field into it, denoted as m; b. Scan index a of table t1, take out the id values in it one by one, and store the value of id%100 into sort_buffer; c. After the scan is completed, sort the field m of sort_buffer (if the memory of sort_buffer is insufficient, temporary files on disk will be used to assist in sorting); d. After sorting is completed, an ordered array is obtained. Like 0,0,0,1,1,2,2,3,3,3,4,4,4,4 e. Based on the ordered array, get the different values in the array and the number of occurrences of each value. In yesterday’s article, we analyzed the use of temporary tables in union statements. Today, we analyzed the use of temporary tables in group by statements. So when does MySQL use temporary tables? When does MySQL use internal temporary tables? 1. If the statement execution process can read data and get results directly, no additional memory is required. Otherwise, additional memory is required to save the intermediate results. 2. If the execution logic requires the use of two-dimensional table features, temporary tables will be given priority. For example, UNION requires a unique index constraint, and GROUP BY requires another field to store the cumulative count. The above is the details of how to optimize MySQL group by statements. For more information about MySQL group by optimization, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: How to use vuex in Vue project
>>: Summary of common docker commands (recommended)
Table of contents Global Object Global objects an...
How to configure custom path aliases in Vue In ou...
This article shares the specific code of vue elem...
Table of contents vite Build Configuration vite.c...
What Beautiful HTML Code Looks Like How to write ...
Modify the group to which a user belongs in Linux...
There are two solutions: One is CSS, using backgro...
This article shares the installation of MySQL 5.7...
In Linux, we usually use the mv command to rename...
When using MySQL to run certain statements, a dea...
Question: What is the difference between int(1) a...
MySQL installation is relatively simple, usually ...
First, let me explain that what we want to do is ...
Here are the detailed steps: 1. Check the disk sp...
The table caption can be placed above or below th...