External temporary tables A temporary table created by CREATE TEMPORARY TABLE is called an external temporary table. This temporary table is only visible to the current user and will be automatically closed when the current session ends. This temporary table can have the same name as a non-temporary table (the non-temporary table will not be visible to the current session until the temporary table is deleted). Internal temporary table An internal temporary table is a special lightweight temporary table used for performance optimization. This temporary table is automatically created by MySQL and is used to store intermediate results of certain operations. These operations may be included in the optimization phase or in the execution phase. This internal table is not visible to the user, but you can use EXPLAIN or SHOW STATUS to check whether MYSQL uses an internal temporary table to help complete an operation. Internal temporary tables play a very important role in the optimization process of SQL statements. Many operations in MySQL rely on internal temporary tables for optimization. However, using internal temporary tables requires creating tables and storing and replacing intermediate data, so users should try to avoid using temporary tables when writing SQL statements. There are two types of internal temporary tables: one is the HEAP temporary table, in which all data is stored in memory and no IO operation is required for operations on this table. The other is the OnDisk temporary table. As the name implies, this temporary table stores data on disk. OnDisk temporary tables are used to process operations with large intermediate results. If the data stored in the HEAP temporary table is larger than MAX_HEAP_TABLE_SIZE (for details, please refer to the system variables section in the MySQL manual), the HEAP temporary table will be automatically converted to an OnDisk temporary table. In 5.7, OnDisk temporary tables can use the MyISAM engine or the InnoDB engine through the INTERNAL_TMP_DISK_STORAGE_ENGINE system variable. This article mainly introduces which operations may use internal temporary tables. If users can use internal temporary tables as little as possible for query optimization when writing SQL statements, the efficiency of query execution will be effectively improved. First we define a table t1. CREATE TABLE t1( a int, b int); INSERT INTO t1 VALUES(1,2),(3,4); All the operations below are based on table t1. Using the SQL_BUFFER_RESULT hint in SQL statements SQL_BUFFER_RESULT is mainly used to allow MySQL to release the lock on the table as early as possible. Because if the amount of data is large, it takes a long time to send the data to the client. By buffering the data in a temporary table, the time that the read lock occupies the table can be effectively reduced. For example: mysql> explain format=json select SQL_BUFFER_RESULT * from t1; EXPLAIN { "query_block": { "select_id": 1, "cost_info": { "query_cost": "2.00" }, "buffer_result": { "using_temporary_table": true, "table": { "table_name": "t1", "access_type": "ALL", ... If the SQL statement contains DERIVED_TABLE. In 5.7, due to the adoption of a new optimization method, we need to use set optimizer_switch='derived_merge=off' to prohibit the derived table from being merged into the outer query. For example: mysql> explain format=json select * from (select * from t1) as tt; EXPLAIN { "query_block": { "select_id": 1, "cost_info": { "query_cost": "2.40" }, "table": { "table_name": "tt", "access_type": "ALL", ... "materialized_from_subquery": { "using_temporary_table": true, ... If we query the system table, the data in the system table will be stored in an internal temporary table. We currently cannot use EXPLAIN to check whether internal temporary tables are required to read system table data, but we can use SHOW STATUS to check whether internal temporary tables are used. For example: mysql> select * from information_schema.character_sets; mysql> show status like 'CREATE%'; If the DISTINCT statement is not optimized away, that is, the DISTINCT statement is optimized and converted to a GROUP BY operation or the DISTINCT is eliminated using a UNIQUE INDEX, an internal temporary table will be used. mysql> explain format=json select distinct a from t1; EXPLAIN { { "query_block": { "select_id": 1, "cost_info": { "query_cost": "1.60" }, "duplicates_removal": { "using_temporary_table": true, ... If the query has an ORDER BY clause and cannot be optimized away. In the following cases, internal temporary tables are used to cache intermediate data and then sort the intermediate data. 1) If the connection table uses BNL (Batched Nestloop)/BKA (Batched Key Access) For example: 1))BNL is turned on by default mysql> explain format=json select * from t1, t1 as t2 order by t1.a; EXPLAIN { "query_block": { "select_id": 1, "cost_info": { "query_cost": "22.00" }, "ordering_operation": { "using_temporary_table": true, ... 2)) After turning off BNL, ORDER BY will use filesort directly. mysql> set optimizer_switch='block_nested_loop=off'; Query OK, 0 rows affected (0.00 sec) mysql> explain format=json select * from t1, t1 as t2 order by t1.a; EXPLAIN { "query_block": { "select_id": 1, "cost_info": { "query_cost": "25.00" }, "ordering_operation": { "using_filesort": true, ... 2) The ORDER BY columns do not belong to the columns of the first joined table in the execution plan. For example: mysql> explain format=json select * from t as t1, t as t2 order by t2.a; EXPLAIN { "query_block": { "select_id": 1, "cost_info": { "query_cost": "25.00" }, "ordering_operation": { "using_temporary_table": true, ... 3) If the ORDER BY expression is a complex expression. So what kind of ORDER BY expression does MySQL consider to be a complex expression? 1)) If the sort expression is a SP or UDF. For example: drop function if exists func1; delimiter | create function func1(x int) returns int deterministic begin declare z1, z2 int; set z1 = x; set z2 = z1+2; return z2; end| delimiter ; explain format=json select * from t1 order by func1(a); { "query_block": { "select_id": 1, "cost_info": { "query_cost": "2.20" }, "ordering_operation": { "using_temporary_table": true, ... 2)) The ORDER BY column contains an aggregate function To simplify the execution plan, we use INDEX to optimize the GROUP BY statement. For example: create index idx1 on t1(a); explain format=json SELECt a FROM t1 group by a order by sum(a); | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "1.20" }, "ordering_operation": { "using_temporary_table": true, "using_filesort": true, "grouping_operation": { "using_filesort": false, ... drop index idx1 on t1; 3)) The ORDER BY column contains a SCALAR SUBQUERY, but the SCALAR SUBQUERY is not optimized. For example: explain format=json select (select rand() from t1 limit 1) as a from t1 order by a; | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "1.20" }, "ordering_operation": { "using_temporary_table": true, "using_filesort": true, ... 4) If the query has both ORDER BY and GROUP BY statements, but the columns used in the two statements are different. Note: If it is 5.7, we need to set sql_mode to non-only_full_group_by mode, otherwise an error will be reported. Also to simplify the execution plan, we use INDEX to optimize the GROUP BY statement. For example: set sql_mode=''; create index idx1 on t1(b); explain format=json select t1.a from t1 group by t1.b order by 1; | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "1.40" }, "ordering_operation": { "using_temporary_table": true, "using_filesort": true, "grouping_operation": { "using_filesort": false, ... drop index idx1 on t1; If the query has a GROUP BY clause and cannot be optimized away. In the following cases, internal temporary tables are used to cache intermediate data, and then GROUP BY is performed on the intermediate data. 1) If the connection table uses BNL (Batched Nestloop)/BKA (Batched Key Access). For example: explain format=json select t2.a from t1, t1 as t2 group by t1.a; | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "8.20" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "4.00" ... 2) If the GROUP BY column does not belong to the first joined table in the execution plan. For example: explain format=json select t2.a from t1, t1 as t2 group by t2.a; | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "8.20" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "nested_loop": [ ... 3) If the columns used in the GROUP BY statement are different from the columns used in the ORDER BY statement. For example: set sql_mode=''; explain format=json select t1.a from t1 group by t1.b order by t1.a; | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "1.40" }, "ordering_operation": { "using_filesort": true, "grouping_operation": { "using_temporary_table": true, "using_filesort": false, ... 4) If GROUP BY contains ROLLUP and is based on multi-table outer joins. For example: explain format=json select sum(t1.a) from t1 left join t1 as t2 on true group by t1.a with rollup; | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "7.20" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "4.00" }, ... 5) If the columns used in the GROUP BY statement are from SCALAR SUBQUERY and are not optimized away. For example: explain format=json select (select avg(a) from t1) as a from t1 group by a; | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "3.40" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "2.00" }, ... Convert IN expressions to semi-join for optimization 1) If the semi-join execution mode is Materialization For example: set optimizer_switch='firstmatch=off,duplicateweedout=off'; explain format=json select * from t1 where a in (select b from t1); | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "5.60" }, "nested_loop": [ { "rows_examined_per_scan": 1, "materialized_from_subquery": { "using_temporary_table": true, "query_block": { "table": { "table_name": "t1", "access_type": "ALL", ... 2) If the semi-join execution mode is Duplicate Weedout For example: set optimizer_switch='firstmatch=off'; explain format=json select * from t1 where a in (select b from t1); | { "query_block": { "select_id": 1, "cost_info": { "query_cost": "4.80" }, "duplicates_removal": { "using_temporary_table": true, "nested_loop": [ { ... If the query statement contains UNION, MySQL will use an internal temporary table to help the UNION operation eliminate duplicates. For example: explain format=json select * from t1 union select * from t1; | { "query_block": { "union_result": { "using_temporary_table": true, "table_name": "", ... If the query statement uses multiple table updates. Here, Explain cannot show that the internal temporary table is used, so you need to check the status. For example: update t1, t1 as t2 set t1.a=3; show status like 'CREATE%'; If the aggregate function contains the following functions, the internal temporary table will also be used. 1) count(distinct *) For example: explain format=json select count(distinct a) from t1; 2) group_concat For example: explain format=json select group_concat(b) from t1; In summary, the 10 situations listed above, MySQL will use internal temporary tables to cache intermediate results. If the amount of data is large, the internal temporary table will store the data on disk, which will obviously affect performance. In order to minimize performance loss, we need to avoid the above situation as much as possible. Summarize The above is all the content of this article on the detailed usage of two MySQL temporary tables. I hope it will be helpful to everyone. Interested friends can refer to: Several important MySQL variables, Detailed explanation of MySQL prepare principles, How to delete MySQL table data, etc. If you have any questions, please leave a message. Everyone is welcome to communicate and discuss. You may also be interested in:
|
<<: How to use cc.follow for camera tracking in CocosCreator
>>: Centos 7.4 server time synchronization configuration method [based on NTP service]
environment Hostname ip address Serve Jenkins 192...
After installing Navicat The following error may ...
Elastic stack, commonly known as ELK stack, is a ...
This article example shares the specific code of ...
Preface Share two methods to monitor whether an e...
When we type a letter on the keyboard, how is it ...
MySQL installation instructions MySQL is a relati...
Similar structures: Copy code The code is as foll...
This article shares the specific code for JavaScr...
1. Key points for early planning of VMware vSpher...
Introduction Describes the use cases and solution...
Sometimes in our actual work, we need to import d...
Note: nginx installed via brew Website root direc...
Note: It is recommended that the virtual machine ...
For Windows User Using openGauss in Docker Pull t...