A brief discussion on the execution details of Mysql multi-table join query

A brief discussion on the execution details of Mysql multi-table join query

First, build the case demonstration table for this blog:

create table a(a1 int primary key, a2 int ,index(a2)); --Both fields have indexescreate table c(c1 int primary key, c2 int ,index(c2), c3 int); --Both fields have indexescreate table b(b1 int primary key, b2 int); --Primary key indexcreate table d(d1 int, d2 int); --No indexinsert into a values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
insert into b values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
insert into c values(1,1,1),(2,4,4),(3,6,6),(4,5,5),(5,3,3),(6,3,3),(7,2,2),(8,8,8),(9,5,5),(10,3,3);  
insert into d values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);

How to choose the driver table?

The concept of a driving table refers to the first table that is processed when multiple tables are queried. The records of this table are used to associate other tables. The determination of the driving table is critical, as it will directly affect the association order of multiple table connections and also determine the query performance during subsequent associations.

The selection of the driving table follows a principle:在對最終結果集沒影響的前提下,優先選擇結果集最小的那張表作為驅動表. Changing the driver table means changing the connection order. The driver table can be optimized only if it will not change the final output result. In the case of outer joins, changing the driving table will often affect the output results. For example, the left table of a left join and the right table of a right join. If the driving table chooses the left or right side of the join, the final output results are likely to be different.

Use the result set to select the driving table, what is the result set? How to calculate the result set? Before selecting, MySQL will make a result record estimate for each table that can be used as a driving table according to the screening conditions of each table in where, and estimate the number of rows returned by each table, and then multiply it by the total byte size of the fields queried in select:

每行查詢字節數* 預估的行數= 預估結果集

Use where to estimate the number of result rows, following the following rules:

  • If there is no filter condition for the corresponding table in where, the default is the entire table, regardless of whether there is a relevant condition in on.
  • If there is a filter condition in where, but the index cannot be used for filtering, the default is the entire table.
  • If there is a filter condition in where and an index can be used, the number of rows returned will be estimated based on the index.

We use the table created above as a basis and use the following SQL as a case to demonstrate:

select a.*,c.c2 from a join c on a.a2=c.c2 where a.a1>5 and c.c1>5;

View its execution plan through explain:

insert image description here

The first row in the explain display result is the driving table. In this case, table c is the driving table.

If you modify the SQL, change the condition c.c2 in the select to c.* :

select a.*,c.* from a join c on a.a2=c.c2 where a.a1>5 and c.c1>5;

View its execution plan through explain:

insert image description here

At this time, the driving table is still c. Logically, the amount of data in c.* must be larger than that in a.*. It seems that the rule of result set size does not work here.

In this case, if a is used as the driving table and is associated with table c through index c2, then you need to go back to the table for query again, because the data of c.* cannot be obtained through c2 alone, and you need to query again through the primary key c1 on c2. The previous SQL query is for c2, so no additional query is required. At the same time, because table a has only two fields, a.* can be directly obtained through the a2 index without the need for additional queries.

To sum up, although the result set is larger when using the C table as the driver, it can reduce an extra table return query, so MySQL believes that using the C table as the driver is more efficient.

The result set is a major factor in choosing a driving table, but it is not the only factor.

What is the internal logic of the two-table association query?

The association query between MySQL tables uses the Nested-Loop join algorithm, which is a nested loop join as the name suggests. However, there may be different variants depending on the scenario: such as Index Nested-Loop join, Simple Nested-Loop join, Block Nested-Loop join, Betched Key Access join, etc.

  • When使用索引joins, there are two algorithms: Index Nested-Loop join and Batched Key Access join ;
  • When未使用索引, there are two algorithms: Simple Nested-Loop join and Block Nested-Loop join ;

Let's first look at the case with an index, using the table created at the beginning of the blog, the sql is as follows:

select a.*,c.* from a join c on a.a2=c.c2 where a.a1>4;

View its execution plan through explain:

insert image description here

First, determine the driving table a according to the logic of the first step, and then query a record a1=5 through a.a1>4, a., associate c2 of this record with the c table, obtain the primary key c1 on the c2 index, and then use the value of c1 to query c.* on the clustered index to form a complete result, put it into the net buffer, and then according to the condition a.a1>4, a. take the next record and repeat this process. The process diagram is as follows:

insert image description here

The driven table is associated through the index, using the Index Nested-Loop join algorithm, and the MSYQL join buffer is not used. According to the filter conditions of the driver table, the indexes of the driven table are associated one by one. Each time a matching record is associated, it is put into the net-buffer and then the association continues. This buffer area is controlled by the net_buffer_length parameter, with a minimum of 4k, a maximum of 16M, and a default of 1M. If the net-buffer is full, send it to the client, clear the net-buffer, and continue the previous process.

From the above process, we know that when each record of the driving table is associated with the driven table, if data not included in the index is needed, it is necessary to return to the table once to query the record on the clustered index. This is a random query process. Each record is a random query, and the performance is not very high. MySQL selectively optimizes this situation and converts this random query into a sequential query. The execution process is as follows:

insert image description here

At this time, the Batched Key Access join algorithm will be used. As the name suggests, it is a batch key access connection.

Query the driving table one by one according to the where condition, put the data rows that match the record into the join buffer, and then obtain the index record of the driven table according to the associated index and store it in the read_rnd_buffer. Both join buffer and read_rnd_buffer have size limits. When either reaches the upper limit, data processing for this batch will be stopped. After processing, the data will be cleared and the next batch will be executed. That is, the data that meets the conditions in the driving table may not be able to be processed all at once, but must be processed in batches.

When the batch limit is reached, the index of the driven table in read_rnd_buffer is sorted in ascending order by primary key, so that approximate sequential query can be performed when returning to the table for query:

insert image description here

insert image description here

As shown in the figure above, the left side is a schematic diagram of random query before sorting, and the right side is a schematic diagram of sequential query using MRR ( Multi-Range Read ) after sorting.

Because the data of MySQL's InnoDB engine is arranged according to the clustered index, when the non-clustered index is sorted according to the primary key, the primary key is used to query, which turns the random query into a sequential query. The computer's sequential query has a pre-reading mechanism, which will read up to 1M of data when reading a page of data. This is where sequential reading comes in handy.

The BKA algorithm can optimize the execution logic when it is necessary to return the driven table to the table. If it is not necessary to return the table, then the BKA algorithm is naturally not needed.

If you want to use the BKA optimization algorithm, you need to set it before executing the SQL statement:

set optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';

The first two parameters are used to enable MRR ( Multi-Range Read ). The reason for doing this is that the optimization of the BKA algorithm needs to rely on MRR. According to the official documentation, the current optimizer strategy is more inclined not to use MRR when judging consumption. Setting mrr_cost_based to off means that MRR is always used. )

Finally, use explain to view the execution plan after turning on the parameter:

insert image description here

The above are all cases where there are indexes associated with the driven table. Next, let's look at the case where there are no indexes associated with the driven table.

If index join is not used, the simplest Simple Nested-Loop join is to fetch a piece of data from the driving table according to the where condition, then scan the driven table in its entirety and put the records that meet the condition into the final result set. In this way, each record of the driving table is accompanied by a full table scan of the driven table. This is a Simple Nested-Loop join.

Of course, MySQL does not directly use Simple Nested-Loop join, but optimizes it. Instead of obtaining the data of the driving table one by one, it obtains multiple records, that is, obtains them piece by piece, which is named Block Nested-Loop join. Each time a batch of data is taken, the upper limit is the size of the join buffer, and then the driven table is fully scanned, each data is matched with all the rows in the join buffer, and the matched data are put into the final result set. This greatly reduces the number of times the driven table is scanned.

The processes of BNL ( Block Nested-Loop join ) and BKA ( Batched Key Access join ) are somewhat similar, but there is no read_rnd_buffer step.

The sample sql is as follows:

select a.*, d.* from a join d on a.a2=d.d2 where a.a1>7;

Use explain to view its execution plan:

insert image description here

How to perform multi-table join? Is it the result set of joining two tables first and then associating the third table, or is there one record that runs through the whole thing?

In fact, looking at the name of the join algorithm: Nested-Loop join, you can tell that it is a nested loop join of multiple tables, rather than first joining two tables to get the result and then joining them in sequence. Its form is similar to the following:

for row1 in table1 filtered by where{
	for row2 in table2 associated by table1.index1 filtered by where{
		for row3 in table3 associated by table2.index2 filtered by where{
			put into net-buffer then send to client;
		}
	}	
}

For different join methods, there are the following situations:

Index Nested-Loop join :

The sql is as follows:

select a.*,b.*,c.* from a join c on a.a2=c.c2 join b on c.c2=b.b2 where b.b1>4;

View its execution plan through explain:

insert image description here

Its internal execution process is as follows:

insert image description here

Before execution, the MySQL executor will determine the association order of each table. First, the first record b5 of the driving table b is filtered through the where condition, and then the associated field b2 of this record is associated with the index a2 of the second table a. The index position is located through Btree. There may be more than one matching index. When the previous item is matched, check whether there is a2's filtering condition in where and whether the condition requires data outside the index. If so, return to the table, use the primary key on the a2 index to query the data, and then make a judgment. Then use the joined information to associate Table C in Chapter 3 in the same way.

Block Nested-Loop join and Batched Key Access join : These two join algorithms are similar to Index Nested-Loop join algorithm, but because they can use the join buffer, they can filter a batch of data from the driving table each time instead of one row. At the same time, each join keyword corresponds to a join buffer, that is, the driving table and the second table use one join buffer, and the obtained block result set and the third chapter table use one join buffer.

This blog mainly talks about the above three problems, how to determine the driving table, the execution details of the association between two tables, and the execution process of the association between multiple tables.

This is the end of this article about the execution details of MySQL multi-table join query. For more relevant MySQL multi-table join query content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Comparison of efficiency between single-table query and multi-table join query in MySql database
  • Detailed explanation of MySQL multi-table join query
  • MySQL multi-table join introductory tutorial
  • MySQL multi-table join query example explanation
  • mysql three tables connected to create a view
  • A simple tutorial on optimizing table join queries in MySQL
  • Basic multi-table join query tutorial in MySQL
  • MySQL and PHP basics and application topics: table connection

<<:  Solution to the problem that Centos8 cannot install docker

>>:  Website performance: Image and Cookie optimization and mobile application optimization

Recommend

Some experience sharing on enabling HTTPS

As the domestic network environment continues to ...

How to use an image button as a reset form button

When we make a form, we often set a submit button ...

Detailed tutorial on how to quickly install Zookeeper in Docker

Docker Quickly Install Zookeeper I haven't us...

How to use vue3+TypeScript+vue-router

Table of contents Easy to use Create a project vu...

CSS3 to achieve menu hover effect

Result: html <nav id="nav-1"> <...

Detailed explanation of the EXPLAIN command and its usage in MySQL

1. Scenario description: My colleague taught me h...

Detailed explanation of MySQL combined index method

For any DBMS, indexes are the most important fact...

Linux Autofs automatic mount service installation and deployment tutorial

Table of contents 1. Introduction to autofs servi...

How to use squid to build a proxy server for http and https

When we introduced nginx, we also used nginx to s...

Avoiding Problems Caused by Closures in JavaScript

About let to avoid problems caused by closure Use...

Summary of basic operations for MySQL beginners

Library Operations Query 1.SHOW DATABASE; ----Que...

Detailed tutorial for installing MySQL 8.0.11 compressed version under win10

After reinstalling my computer recently, I downlo...

Correct modification steps for Docker's default network segment

background A colleague is working on his security...