Detailed explanation of the usage of two types of temporary tables in MySQL

Detailed explanation of the usage of two types of temporary tables in MySQL

External temporary tables

A temporary table created by CREATE TEMPORARY TABLE is called an external temporary table. This temporary table is only visible to the current user and will be automatically closed when the current session ends. This temporary table can have the same name as a non-temporary table (the non-temporary table will not be visible to the current session until the temporary table is deleted).

Internal temporary table

An internal temporary table is a special lightweight temporary table used for performance optimization. This temporary table is automatically created by MySQL and is used to store intermediate results of certain operations. These operations may be included in the optimization phase or in the execution phase. This internal table is not visible to the user, but you can use EXPLAIN or SHOW STATUS to check whether MYSQL uses an internal temporary table to help complete an operation. Internal temporary tables play a very important role in the optimization process of SQL statements. Many operations in MySQL rely on internal temporary tables for optimization. However, using internal temporary tables requires creating tables and storing and replacing intermediate data, so users should try to avoid using temporary tables when writing SQL statements.

There are two types of internal temporary tables: one is the HEAP temporary table, in which all data is stored in memory and no IO operation is required for operations on this table. The other is the OnDisk temporary table. As the name implies, this temporary table stores data on disk. OnDisk temporary tables are used to process operations with large intermediate results. If the data stored in the HEAP temporary table is larger than MAX_HEAP_TABLE_SIZE (for details, please refer to the system variables section in the MySQL manual), the HEAP temporary table will be automatically converted to an OnDisk temporary table. In 5.7, OnDisk temporary tables can use the MyISAM engine or the InnoDB engine through the INTERNAL_TMP_DISK_STORAGE_ENGINE system variable.

This article mainly introduces which operations may use internal temporary tables. If users can use internal temporary tables as little as possible for query optimization when writing SQL statements, the efficiency of query execution will be effectively improved.

First we define a table t1.

CREATE TABLE t1( a int, b int); INSERT INTO t1 VALUES(1,2),(3,4);

All the operations below are based on table t1.

Using the SQL_BUFFER_RESULT hint in SQL statements

SQL_BUFFER_RESULT is mainly used to allow MySQL to release the lock on the table as early as possible. Because if the amount of data is large, it takes a long time to send the data to the client. By buffering the data in a temporary table, the time that the read lock occupies the table can be effectively reduced.

For example:

mysql> explain format=json select SQL_BUFFER_RESULT * from t1; 
EXPLAIN 
{ 
 "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "2.00" 
}, 
"buffer_result": { 
 "using_temporary_table": true, 
 "table": { 
"table_name": "t1", 
"access_type": "ALL", 
...

If the SQL statement contains DERIVED_TABLE.

In 5.7, due to the adoption of a new optimization method, we need to use set optimizer_switch='derived_merge=off' to prohibit the derived table from being merged into the outer query.

For example:

mysql> explain format=json select * from (select * from t1) as tt; 
EXPLAIN 
{ 
 "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "2.40" 
}, 
"table": { 
 "table_name": "tt", 
 "access_type": "ALL", 
 ... 
 "materialized_from_subquery": { 
"using_temporary_table": true, 
...

If we query the system table, the data in the system table will be stored in an internal temporary table.

We currently cannot use EXPLAIN to check whether internal temporary tables are required to read system table data, but we can use SHOW STATUS to check whether internal temporary tables are used.

For example:

mysql> select * from information_schema.character_sets; 
mysql> show status like 'CREATE%';

If the DISTINCT statement is not optimized away, that is, the DISTINCT statement is optimized and converted to a GROUP BY operation or the DISTINCT is eliminated using a UNIQUE INDEX, an internal temporary table will be used.

mysql> explain format=json select distinct a from t1; 
EXPLAIN 
{ 
{ 
 "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "1.60" 
}, 
"duplicates_removal": { 
 "using_temporary_table": true, 
...

If the query has an ORDER BY clause and cannot be optimized away. In the following cases, internal temporary tables are used to cache intermediate data and then sort the intermediate data.

1) If the connection table uses BNL (Batched Nestloop)/BKA (Batched Key Access)

For example:

1))BNL is turned on by default

mysql> explain format=json select * from t1, t1 as t2 order by t1.a; 
EXPLAIN 
{ 
 "query_block": { 
 "select_id": 1, 
 "cost_info": { 
"query_cost": "22.00" 
 }, 
 "ordering_operation": { 
"using_temporary_table": true, 
 ...

2)) After turning off BNL, ORDER BY will use filesort directly.

mysql> set optimizer_switch='block_nested_loop=off'; 
Query OK, 0 rows affected (0.00 sec) 
mysql> explain format=json select * from t1, t1 as t2 order by t1.a; 
EXPLAIN 
{ 
  "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "25.00" 
}, 
"ordering_operation": { 
 "using_filesort": true, 
...

2) The ORDER BY columns do not belong to the columns of the first joined table in the execution plan.

For example:

mysql> explain format=json select * from t as t1, t as t2 order by t2.a; 
EXPLAIN 
{ 
  "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "25.00" 
}, 
"ordering_operation": { 
 "using_temporary_table": true, 
...

3) If the ORDER BY expression is a complex expression.

So what kind of ORDER BY expression does MySQL consider to be a complex expression?

1)) If the sort expression is a SP or UDF.

For example:

drop function if exists func1; 
delimiter | 
create function func1(x int) 
returns int deterministic 
begin 
declare z1, z2 int; 
set z1 = x; 
set z2 = z1+2; 
return z2; 
end| 
delimiter ; 
explain format=json select * from t1 order by func1(a); 
{ 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "2.20" 
}, 
"ordering_operation": { 
 "using_temporary_table": true, 
...

2)) The ORDER BY column contains an aggregate function

To simplify the execution plan, we use INDEX to optimize the GROUP BY statement.

For example:

 create index idx1 on t1(a); 
 explain format=json SELECt a FROM t1 group by a order by sum(a); 
 | { 
  "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "1.20" 
}, 
"ordering_operation": { 
 "using_temporary_table": true, 
 "using_filesort": true, 
 "grouping_operation": { 
"using_filesort": false, 
... 
 drop index idx1 on t1;

3)) The ORDER BY column contains a SCALAR SUBQUERY, but the SCALAR SUBQUERY is not optimized.

For example:

explain format=json select (select rand() from t1 limit 1) as a from t1 order by a; 
| { 
 "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "1.20" 
}, 
"ordering_operation": { 
 "using_temporary_table": true, 
 "using_filesort": true, 
...

4) If the query has both ORDER BY and GROUP BY statements, but the columns used in the two statements are different.

Note: If it is 5.7, we need to set sql_mode to non-only_full_group_by mode, otherwise an error will be reported.

Also to simplify the execution plan, we use INDEX to optimize the GROUP BY statement.

For example:

set sql_mode=''; 
create index idx1 on t1(b); 
explain format=json select t1.a from t1 group by t1.b order by 1; 
| { 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "1.40" 
}, 
"ordering_operation": { 
 "using_temporary_table": true, 
 "using_filesort": true, 
 "grouping_operation": { 
"using_filesort": false, 
... 
drop index idx1 on t1;

If the query has a GROUP BY clause and cannot be optimized away. In the following cases, internal temporary tables are used to cache intermediate data, and then GROUP BY is performed on the intermediate data.

1) If the connection table uses BNL (Batched Nestloop)/BKA (Batched Key Access).

For example:

explain format=json select t2.a from t1, t1 as t2 group by t1.a; 
| { 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "8.20" 
}, 
"grouping_operation": { 
 "using_temporary_table": true, 
 "using_filesort": true, 
 "cost_info": { 
"sort_cost": "4.00" 
...

2) If the GROUP BY column does not belong to the first joined table in the execution plan.

For example:

explain format=json select t2.a from t1, t1 as t2 group by t2.a; 
| { 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "8.20" 
}, 
"grouping_operation": { 
 "using_temporary_table": true, 
 "using_filesort": true, 
 "nested_loop": [ 
...

3) If the columns used in the GROUP BY statement are different from the columns used in the ORDER BY statement.

For example:

set sql_mode=''; 
explain format=json select t1.a from t1 group by t1.b order by t1.a; 
| { 
  "query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "1.40" 
}, 
"ordering_operation": { 
 "using_filesort": true, 
 "grouping_operation": { 
"using_temporary_table": true, 
"using_filesort": false, 
...

4) If GROUP BY contains ROLLUP and is based on multi-table outer joins.

For example:

explain format=json select sum(t1.a) from t1 left join t1 as t2 on true group by t1.a with rollup; 
| { 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "7.20" 
}, 
"grouping_operation": { 
 "using_temporary_table": true, 
 "using_filesort": true, 
 "cost_info": { 
"sort_cost": "4.00" 
 }, 
...

5) If the columns used in the GROUP BY statement are from SCALAR SUBQUERY and are not optimized away.

For example:

explain format=json select (select avg(a) from t1) as a from t1 group by a; 
| { 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "3.40" 
}, 
"grouping_operation": { 
 "using_temporary_table": true, 
 "using_filesort": true, 
 "cost_info": { 
"sort_cost": "2.00" 
 }, 
... 

Convert IN expressions to semi-join for optimization

1) If the semi-join execution mode is Materialization

For example:

set optimizer_switch='firstmatch=off,duplicateweedout=off'; 
explain format=json select * from t1 where a in (select b from t1); 
| { 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "5.60" 
}, 
"nested_loop": [ 
 { 
"rows_examined_per_scan": 1, 
 "materialized_from_subquery": { 
"using_temporary_table": true, 
"query_block": { 
 "table": { 
"table_name": "t1", 
"access_type": "ALL", 
... 

2) If the semi-join execution mode is Duplicate Weedout

For example:

set optimizer_switch='firstmatch=off'; 
explain format=json select * from t1 where a in (select b from t1); 
| { 
"query_block": { 
"select_id": 1, 
"cost_info": { 
 "query_cost": "4.80" 
}, 
"duplicates_removal": { 
 "using_temporary_table": true, 
 "nested_loop": [ 
{ 
...

If the query statement contains UNION, MySQL will use an internal temporary table to help the UNION operation eliminate duplicates.

For example:

explain format=json select * from t1 union select * from t1; 
| { 
"query_block": { 
"union_result": { 
 "using_temporary_table": true, 
 "table_name": "", 
... 

If the query statement uses multiple table updates.

Here, Explain cannot show that the internal temporary table is used, so you need to check the status.

For example:

update t1, t1 as t2 set t1.a=3; 
show status like 'CREATE%'; 

If the aggregate function contains the following functions, the internal temporary table will also be used.

1) count(distinct *) 
For example: 
explain format=json select count(distinct a) from t1; 
2) group_concat 

For example:

explain format=json select group_concat(b) from t1; 

In summary, the 10 situations listed above, MySQL will use internal temporary tables to cache intermediate results. If the amount of data is large, the internal temporary table will store the data on disk, which will obviously affect performance. In order to minimize performance loss, we need to avoid the above situation as much as possible.

Summarize

The above is all the content of this article on the detailed usage of two MySQL temporary tables. I hope it will be helpful to everyone. Interested friends can refer to: Several important MySQL variables, Detailed explanation of MySQL prepare principles, How to delete MySQL table data, etc. If you have any questions, please leave a message. Everyone is welcome to communicate and discuss.

You may also be interested in:
  • Detailed explanation of the difference between Mysql temporary table and partition table
  • Analysis of the principle and creation method of Mysql temporary table
  • Analysis of mysql temporary table usage [query results can be stored in temporary tables]
  • How to use MySQL 5.7 temporary tablespace to avoid pitfalls
  • MySQL FAQ series: When to use temporary tables
  • In-depth analysis of JDBC and MySQL temporary tablespace
  • Simple usage of MySQL temporary tables
  • The difference between Update and select in MySQL for single and multiple tables, and views and temporary tables
  • A brief discussion on MySQL temporary tables and derived tables
  • Basic creation and use tutorial of temporary tables in MySQL
  • Some basic usage methods of temporary tables in MySQL
  • How to use temporary tables to speed up MySQL queries
  • Examples of using temporary tables in MySQL

<<:  How to use cc.follow for camera tracking in CocosCreator

>>:  Centos 7.4 server time synchronization configuration method [based on NTP service]

Recommend

The process of deploying a project to another host using Jenkins

environment Hostname ip address Serve Jenkins 192...

How to connect to MySQL visualization tool Navicat

After installing Navicat The following error may ...

How to build a multi-node Elastic stack cluster on RHEL8 /CentOS8

Elastic stack, commonly known as ELK stack, is a ...

Native js to implement a simple calculator

This article example shares the specific code of ...

How to use JS to check if an element is within the viewport

Preface Share two methods to monitor whether an e...

Overview of the Differences between Linux TTY/PTS

When we type a letter on the keyboard, how is it ...

MySQL installation tutorial under Windows with pictures and text

MySQL installation instructions MySQL is a relati...

Several situations where div is covered by iframe and their solutions

Similar structures: Copy code The code is as foll...

JavaScript to achieve simple provincial and municipal linkage

This article shares the specific code for JavaScr...

MySQL compression usage scenarios and solutions

Introduction Describes the use cases and solution...

Java imports data from excel into mysql

Sometimes in our actual work, we need to import d...

Implementation of multi-site configuration of Nginx on Mac M1

Note: nginx installed via brew Website root direc...

Methods and steps for deploying GitLab environment based on Docker

Note: It is recommended that the virtual machine ...

Detailed explanation of how to configure openGauss database in docker

For Windows User Using openGauss in Docker Pull t...