1. MySQL join bufferIn the process of MySQL processing join operations, join buffer is an important concept and an important optimization method for MySQL table join. Although this concept is not complicated to implement, it is an important method to optimize MySQL join connections. It can greatly improve the efficiency of join queries when "brute force" connections are performed. The authoritative description of this concept comes from the description of this concept in the MySQL documentation. The description is short, but concise and explains the main implementation idea of this optimization: Table name Type t1 range t2 ref t3 ALL The join is then done as follows: - While rows in t1 matching range - Read through all rows in t2 according to reference key - Store used fields from t1, t2 in cache - If cache is full - Read through all rows in t3 - Compare t3 row against all t1, t2 combinations in cache - If row satisfies join condition, send it to client - Empty cache - Read through all rows in t3 - Compare t3 row against all stored t1, t2 combinations in cache - If row satisfies join condition, send it to client 2. Join buffer cache storage space allocationIn the following function, table_count indicates the number of non-const tables before this table in all join tables, because this table needs to cache the "need to read" records in all previous tables (tables[i].table->read_set is set). Each execution of the double loop copies the description structure of the field to be cached (and its corresponding data source). In other words, the double loop is only for assignment and metadata storage, and the final cache->buff=(uchar*) my_malloc(size,MYF(0)) is the actual allocation of the record content that meets the conditions. static int join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count) { … for (i=0 ; i < table_count ; i++) { bool have_bit_fields = FALSE; uint null_fields=0,used_fields; Field **f_ptr,*field; MY_BITMAP *read_set = tables[i].table->read_set; for (f_ptr=tables[i].table->field,used_fields=tables[i].used_fields; used_fields ; f_ptr++) { field= *f_ptr; if (bitmap_is_set(read_set, field->field_index)) { used_fields--; length+=field->fill_cache_field(copy); … } } cache->length=length+blobs*sizeof(char*); cache->blobs=blobs; *blob_ptr=0; /* End sequence */ size=max(thd->variables.join_buff_size, cache->length); if (!(cache->buff=(uchar*) my_malloc(size,MYF(0)))) DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */ cache->end=cache->buff+size; reset_cache_write(cache); DBUG_RETURN(0); } 3. Ordinary multi-table query implementationOf course, this "ordinary" can also be understood as "simple" and "intuitive", which is also the execution process in most cases. Normal query is actually a recursive call to each table, which is exactly the same as matrix multiplication. This correspondence is very intuitive and very common. This regular query action is implemented through the sub_select function, which essentially executes tsecer_select() { for (r = first ; r != end ; r = next) { if(sofartest()) { nexttable.tsecer_select() } } } The sofartest() here means "use all currently read tables to make judgments", which is the expression pushed down in where. For example, in the query select * from a, b where aa > 10 and bb + aa = 10, after table a is read, the check whether aa > 10 is already possible. Of course, this is a description method that is not even pseudocode, and the actual code corresponds to: enum_nested_loop_state sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records) { … error= (*join_tab->read_first_record)(join_tab); rc = evaluate_join_record(join, join_tab, error); … while (rc == NESTED_LOOP_OK) { error = info->read_record(info); rc = evaluate_join_record(join, join_tab, error); } … return rc; } static enum_nested_loop_state evaluate_join_record(JOIN *join, JOIN_TAB *join_tab, int error) { … if (select_cond) { select_cond_result = test(select_cond->val_int()); /* check for errors evaluating the condition */ if (join->thd->is_error()) return NESTED_LOOP_ERROR; } … if (found) { enum enum_nested_loop_state rc; /* A match from join_tab is found for the current partial join. */ rc = (*join_tab->next_select)(join, join_tab+1, 0); if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS) return rc; if (join->return_tab < join_tab) return NESTED_LOOP_OK; /* Test if this was a SELECT DISTINCT query on a table that was not in the field list; In this case we can abort if we found a row, as no new rows can be added to the result. */ if (not_used_in_distinct && found_records != join->found_records) return NESTED_LOOP_NO_MORE_ROWS; } … } As you can see here, this is a recursion, used to generate a Cartesian cross product set, which is very concise and cute in terms of both program implementation and mathematical expression. 4. Select implementation of join bufferWhen using join buffer cache, next_select function points to sub_select_cache enum_nested_loop_state sub_select_cache(JOIN *join,JOIN_TAB *join_tab,bool end_of_records) { enum_nested_loop_state rc; if (end_of_records) { rc = flush_cached_records(join,join_tab,FALSE); if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS) rc = sub_select(join,join_tab,end_of_records); return rc; } if (join->thd->killed) // If aborted by user { join->thd->send_kill_message(); return NESTED_LOOP_KILLED; /* purecov: inspected */ } if (join_tab->use_quick != 2 || test_if_quick_select(join_tab) <= 0) { if (!store_record_in_cache(&join_tab->cache)) return NESTED_LOOP_OK; // There is more room in cache return flush_cached_records(join,join_tab,FALSE); } rc = flush_cached_records(join, join_tab, TRUE); if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS) rc = sub_select(join, join_tab, end_of_records); return rc; } Combined with the instructions in the MySQL documentation, the meaning of the code here is quite obvious. The judgment of end_of_records at the beginning corresponds to if (!store_record_in_cache(&join_tab->cache)) return NESTED_LOOP_OK; // There is more room in cache return flush_cached_records(join,join_tab,FALSE); correspond - Store used fields from t1, t2 in cache - If cache is full The store_record_in_cache function determines whether the cache is full. If the cache can hold more cache, it stores the previous table's combined records in the cache and returns NESTED_LOOP_OK. Note: This place can be said to be the key to the entire cache optimization, because the table scan is not started here. On the other hand, if the cache data is full, the flush_cached_records function is called to perform the following process: - Read through all rows in t3 - Compare t3 row against all t1, t2 combinations in cache - If row satisfies join condition, send it to client - Empty cache The special thing about this process is that the traversal is driven by comparing each record of the table with all the combinations of t1 and t2 in the cache to determine whether the push-down where condition is satisfied (If row satisfies join condition), then the join_tab->next_select function is executed (send it to client). static enum_nested_loop_state flush_cached_records(JOIN *join,JOIN_TAB *join_tab,bool skip_last) { … info= &join_tab->read_record; do {//Traverse all records in table t3... for (i=(join_tab->cache.records- (skip_last ? 1 : 0)) ; i-- > 0 ;) {//Traverse all t1 and t2 record combinations in cache read_cached_record(join_tab); skip_record = FALSE; if (select && select->skip_record(join->thd, &skip_record)) {// reset_cache_write(&join_tab->cache); return NESTED_LOOP_ERROR; } if (!skip_record) {//Satisfy the push-down where condition//Execute the traversal of the next table rc= (join_tab->next_select)(join,join_tab+1,0); if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS) { reset_cache_write(&join_tab->cache); return rc; } } … } while (!(error=info->read_record(info))); 5. Let’s take an example to illustrate this process The core idea of this implementation is not complicated, and it is even simpler and more intuitive when combined with specific examples. x + y y == 5 * 5, which is the most common classic Pythagorean number value of "the hypotenuse is three, the leg is four, the side is five". mysql> create table harry (x int); Query OK, 0 rows affected (0.03 sec) mysql> insert harry values (1),(2),(3),(4),(5); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> create table tsecer (y int); Query OK, 0 rows affected (0.01 sec) mysql> insert tsecer values (1),(2),(3),(4),(5); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> explain select * from harry, tsecer where x * x + y * y = 5 * 5; +----+-------------+--------+------+---------------+------+---------+------+------+--------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+------+--------------------------------+ | 1 | SIMPLE | harry | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | tsecer | ALL | NULL | NULL | NULL | NULL | 5 | Using where; Using join buffer | +----+-------------+--------+------+---------------+------+---------+------+------+--------------------------------+ 2 rows in set (0.00 sec) mysql> 1. Not using joinbufferWithout the join buffer, for each x value in the harry table, the corresponding tsecer table must be fully scanned, and then the combination of x and y is used to determine whether x is satisfied. x + y The condition y == 5 * 5. Since x has a total of 5 values, tsecer needs to scan the entire table 5 times. 2. Use joinbuffer For each value of x, the tsecer table first caches this value in the joinbuffer when executing. If the buffer content is not empty, the value of x at this time is stored in the buffer and then returned directly. When the join buffer is full or it is the last record, the scan of the tsecer table is started. For each record read from the tsecer table, it is combined with each record previously cached to see whether it meets its own judgment conditions. 3. Reasons why cache can be optimizedEssentially, the reason for this efficiency improvement is that the "utilization rate" of each record obtained from the table is improved. When using the intuitive scanning method, the full table scan only matches one combination, while after using the buffer, it matches all combinations in the cache. The above is the detailed content of the MySQL join buffer principle. For more information about MySQL join buffer, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: Sample code for implementing follow ads with JavaScript
>>: Seven Principles of a Skilled Designer (2): Color Usage
Table of contents 1. What is Javascript? 2. What ...
Table of contents 01 JavaScript (abbreviated as: ...
Preface In fact, I have never encountered this ki...
Recently, the company is preparing to develop an ...
1. Add the viewport tag to the HTML header. At th...
In applications with paging queries, queries that...
In the nginx process model, tasks such as traffic...
Table of contents 1. Understanding the stack stru...
Today I will use the server nginx, and I also nee...
Table of contents We have written about drag and ...
Table of contents background How to determine whe...
Table of contents MAH 1. Introduction to MAH Arch...
Table of contents Preface Laying the foundation p...
Why use prettier? In large companies, front-end d...
Overview binlog2sql is an open source MySQL Binlo...