First, build the case demonstration table for this blog: create table a(a1 int primary key, a2 int ,index(a2)); --Both fields have indexescreate table c(c1 int primary key, c2 int ,index(c2), c3 int); --Both fields have indexescreate table b(b1 int primary key, b2 int); --Primary key indexcreate table d(d1 int, d2 int); --No indexinsert into a values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10); insert into b values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10); insert into c values(1,1,1),(2,4,4),(3,6,6),(4,5,5),(5,3,3),(6,3,3),(7,2,2),(8,8,8),(9,5,5),(10,3,3); insert into d values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10); How to choose the driver table?The concept of a driving table refers to the first table that is processed when multiple tables are queried. The records of this table are used to associate other tables. The determination of the driving table is critical, as it will directly affect the association order of multiple table connections and also determine the query performance during subsequent associations. The selection of the driving table follows a principle: Use the result set to select the driving table, what is the result set? How to calculate the result set? Before selecting, MySQL will make a result record estimate for each table that can be used as a driving table according to the screening conditions of each table in where, and estimate the number of rows returned by each table, and then multiply it by the total byte size of the fields queried in select:
Use where to estimate the number of result rows, following the following rules:
We use the table created above as a basis and use the following SQL as a case to demonstrate: select a.*,c.c2 from a join c on a.a2=c.c2 where a.a1>5 and c.c1>5; View its execution plan through explain: The first row in the explain display result is the driving table. In this case, table c is the driving table. If you modify the SQL, change the condition select a.*,c.* from a join c on a.a2=c.c2 where a.a1>5 and c.c1>5; View its execution plan through explain: At this time, the driving table is still c. Logically, the amount of data in c.* must be larger than that in a.*. It seems that the rule of result set size does not work here. In this case, if a is used as the driving table and is associated with table c through index c2, then you need to go back to the table for query again, because the data of c.* cannot be obtained through c2 alone, and you need to query again through the primary key c1 on c2. The previous SQL query is for c2, so no additional query is required. At the same time, because table a has only two fields, a.* can be directly obtained through the a2 index without the need for additional queries. To sum up, although the result set is larger when using the C table as the driver, it can reduce an extra table return query, so MySQL believes that using the C table as the driver is more efficient. The result set is a major factor in choosing a driving table, but it is not the only factor. What is the internal logic of the two-table association query?The association query between MySQL tables uses the Nested-Loop join algorithm, which is a nested loop join as the name suggests. However, there may be different variants depending on the scenario: such as Index Nested-Loop join, Simple Nested-Loop join, Block Nested-Loop join, Betched Key Access join, etc.
Let's first look at the case with an index, using the table created at the beginning of the blog, the sql is as follows: select a.*,c.* from a join c on a.a2=c.c2 where a.a1>4; View its execution plan through explain: First, determine the driving table a according to the logic of the first step, and then query a record a1=5 through a.a1>4, a., associate c2 of this record with the c table, obtain the primary key c1 on the c2 index, and then use the value of c1 to query c.* on the clustered index to form a complete result, put it into the net buffer, and then according to the condition a.a1>4, a. take the next record and repeat this process. The process diagram is as follows: The driven table is associated through the index, using the Index Nested-Loop join algorithm, and the MSYQL join buffer is not used. According to the filter conditions of the driver table, the indexes of the driven table are associated one by one. Each time a matching record is associated, it is put into the net-buffer and then the association continues. This buffer area is controlled by the net_buffer_length parameter, with a minimum of 4k, a maximum of 16M, and a default of 1M. If the net-buffer is full, send it to the client, clear the net-buffer, and continue the previous process. From the above process, we know that when each record of the driving table is associated with the driven table, if data not included in the index is needed, it is necessary to return to the table once to query the record on the clustered index. This is a random query process. Each record is a random query, and the performance is not very high. MySQL selectively optimizes this situation and converts this random query into a sequential query. The execution process is as follows: At this time, the Batched Key Access join algorithm will be used. As the name suggests, it is a batch key access connection. Query the driving table one by one according to the where condition, put the data rows that match the record into the join buffer, and then obtain the index record of the driven table according to the associated index and store it in the read_rnd_buffer. Both join buffer and read_rnd_buffer have size limits. When either reaches the upper limit, data processing for this batch will be stopped. After processing, the data will be cleared and the next batch will be executed. That is, the data that meets the conditions in the driving table may not be able to be processed all at once, but must be processed in batches. When the batch limit is reached, the index of the driven table in read_rnd_buffer is sorted in ascending order by primary key, so that approximate sequential query can be performed when returning to the table for query: As shown in the figure above, the left side is a schematic diagram of random query before sorting, and the right side is a schematic diagram of sequential query using MRR ( Because the data of MySQL's InnoDB engine is arranged according to the clustered index, when the non-clustered index is sorted according to the primary key, the primary key is used to query, which turns the random query into a sequential query. The computer's sequential query has a pre-reading mechanism, which will read up to 1M of data when reading a page of data. This is where sequential reading comes in handy. The BKA algorithm can optimize the execution logic when it is necessary to return the driven table to the table. If it is not necessary to return the table, then the BKA algorithm is naturally not needed. If you want to use the BKA optimization algorithm, you need to set it before executing the SQL statement: set optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on'; The first two parameters are used to enable MRR ( Finally, use explain to view the execution plan after turning on the parameter: The above are all cases where there are indexes associated with the driven table. Next, let's look at the case where there are no indexes associated with the driven table. If index join is not used, the simplest Of course, MySQL does not directly use Simple Nested-Loop join, but optimizes it. Instead of obtaining the data of the driving table one by one, it obtains multiple records, that is, obtains them piece by piece, which is named Block Nested-Loop join. Each time a batch of data is taken, the upper limit is the size of the join buffer, and then the driven table is fully scanned, each data is matched with all the rows in the join buffer, and the matched data are put into the final result set. This greatly reduces the number of times the driven table is scanned. The processes of The sample sql is as follows: select a.*, d.* from a join d on a.a2=d.d2 where a.a1>7; Use explain to view its execution plan: How to perform multi-table join? Is it the result set of joining two tables first and then associating the third table, or is there one record that runs through the whole thing?In fact, looking at the name of the join algorithm: Nested-Loop join, you can tell that it is a nested loop join of multiple tables, rather than first joining two tables to get the result and then joining them in sequence. Its form is similar to the following: for row1 in table1 filtered by where{ for row2 in table2 associated by table1.index1 filtered by where{ for row3 in table3 associated by table2.index2 filtered by where{ put into net-buffer then send to client; } } } For different join methods, there are the following situations: The sql is as follows: select a.*,b.*,c.* from a join c on a.a2=c.c2 join b on c.c2=b.b2 where b.b1>4; View its execution plan through explain: Its internal execution process is as follows: Before execution, the MySQL executor will determine the association order of each table. First, the first record b5 of the driving table b is filtered through the where condition, and then the associated field b2 of this record is associated with the index a2 of the second table a. The index position is located through Btree. There may be more than one matching index. When the previous item is matched, check whether there is a2's filtering condition in where and whether the condition requires data outside the index. If so, return to the table, use the primary key on the a2 index to query the data, and then make a judgment. Then use the joined information to associate Table C in Chapter 3 in the same way. This blog mainly talks about the above three problems, how to determine the driving table, the execution details of the association between two tables, and the execution process of the association between multiple tables. This is the end of this article about the execution details of MySQL multi-table join query. For more relevant MySQL multi-table join query content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Solution to the problem that Centos8 cannot install docker
>>: Website performance: Image and Cookie optimization and mobile application optimization
As the domestic network environment continues to ...
When we make a form, we often set a submit button ...
Docker Quickly Install Zookeeper I haven't us...
Table of contents Easy to use Create a project vu...
HTML tag: superscript In HTML, the <sup> tag...
Result: html <nav id="nav-1"> <...
1. Scenario description: My colleague taught me h...
For any DBMS, indexes are the most important fact...
Table of contents 1. Introduction to autofs servi...
When we introduced nginx, we also used nginx to s...
Update: Recently, it was discovered that the serv...
About let to avoid problems caused by closure Use...
Library Operations Query 1.SHOW DATABASE; ----Que...
After reinstalling my computer recently, I downlo...
background A colleague is working on his security...