MySQL join buffer principle

1. MySQL join buffer
2. JoinBufferCache storage space allocation
3. Ordinary multi-table query implementation
4. Select implementation of joinbuffer
5. Let’s take an example to illustrate this process

1. Not using joinbuffer
2. Use joinbuffer
3. Reasons why cache can be optimized

1. MySQL join buffer

In the process of MySQL processing join operations, join buffer is an important concept and an important optimization method for MySQL table join. Although this concept is not complicated to implement, it is an important method to optimize MySQL join connections. It can greatly improve the efficiency of join queries when "brute force" connections are performed.

The authoritative description of this concept comes from the description of this concept in the MySQL documentation. The description is short, but concise and explains the main implementation idea of this optimization:
Assume you have the following join:

Table name Type
t1 range
t2 ref
t3 ALL
The join is then done as follows:
 
- While rows in t1 matching range
 - Read through all rows in t2 according to reference key
  - Store used fields from t1, t2 in cache
  - If cache is full
    - Read through all rows in t3
      - Compare t3 row against all t1, t2 combinations in cache
        - If row satisfies join condition, send it to client
    - Empty cache
 
- Read through all rows in t3
 - Compare t3 row against all stored t1, t2 combinations in cache
   - If row satisfies join condition, send it to client

2. Join buffer cache storage space allocation

In the following function, table_count indicates the number of non-const tables before this table in all join tables, because this table needs to cache the "need to read" records in all previous tables (tables[i].table->read_set is set).

Each execution of the double loop copies the description structure of the field to be cached (and its corresponding data source). In other words, the double loop is only for assignment and metadata storage, and the final cache->buff=(uchar*) my_malloc(size,MYF(0)) is the actual allocation of the record content that meets the conditions.

static int
join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)
{
…
  for (i=0 ; i < table_count ; i++)
  {
    bool have_bit_fields = FALSE;
    uint null_fields=0,used_fields;
    Field **f_ptr,*field;
    MY_BITMAP *read_set = tables[i].table->read_set;
    for (f_ptr=tables[i].table->field,used_fields=tables[i].used_fields;
 used_fields ;
 f_ptr++)
    {
      field= *f_ptr;
      if (bitmap_is_set(read_set, field->field_index))
      {
used_fields--;
length+=field->fill_cache_field(copy);
…
      }
  }
 
  cache->length=length+blobs*sizeof(char*);
  cache->blobs=blobs;
  *blob_ptr=0; /* End sequence */
  size=max(thd->variables.join_buff_size, cache->length);
  if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))
    DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */
  cache->end=cache->buff+size;
  reset_cache_write(cache);
  DBUG_RETURN(0);
}

3. Ordinary multi-table query implementation

Of course, this "ordinary" can also be understood as "simple" and "intuitive", which is also the execution process in most cases. Normal query is actually a recursive call to each table, which is exactly the same as matrix multiplication. This correspondence is very intuitive and very common.

This regular query action is implemented through the sub_select function, which essentially executes

tsecer_select()
{
for (r = first ; r != end ; r = next)
{
if(sofartest())
{
nexttable.tsecer_select()
}
}
}

The sofartest() here means "use all currently read tables to make judgments", which is the expression pushed down in where. For example, in the query select * from a, b where aa > 10 and bb + aa = 10, after table a is read, the check whether aa > 10 is already possible. Of course, this is a description method that is not even pseudocode, and the actual code corresponds to:

enum_nested_loop_state
sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
{
…
    error= (*join_tab->read_first_record)(join_tab);
    rc = evaluate_join_record(join, join_tab, error);
…
  while (rc == NESTED_LOOP_OK)
  {
    error = info->read_record(info);
    rc = evaluate_join_record(join, join_tab, error);
  }
…
  return rc;
}
static enum_nested_loop_state
evaluate_join_record(JOIN *join, JOIN_TAB *join_tab,
                     int error)
{
…
  if (select_cond)
  {
    select_cond_result = test(select_cond->val_int());
 
    /* check for errors evaluating the condition */
    if (join->thd->is_error())
      return NESTED_LOOP_ERROR;
  }
…
    if (found)
    {
      enum enum_nested_loop_state rc;
      /* A match from join_tab is found for the current partial join. */
      rc = (*join_tab->next_select)(join, join_tab+1, 0);
      if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS)
        return rc;
      if (join->return_tab < join_tab)
        return NESTED_LOOP_OK;
      /*
        Test if this was a SELECT DISTINCT query on a table that
        was not in the field list; In this case we can abort if
        we found a row, as no new rows can be added to the result.
      */
      if (not_used_in_distinct && found_records != join->found_records)
        return NESTED_LOOP_NO_MORE_ROWS;
    }
…
}

As you can see here, this is a recursion, used to generate a Cartesian cross product set, which is very concise and cute in terms of both program implementation and mathematical expression.
In the MySQL implementation, the for loop in the tsecer_select function is roughly equivalent to the while loop in sub_select, and the contents of the loop body in the tsecer_select function are placed in the evaluate_join_record function, where sofartest corresponds to evaluate_join_record::test(select_cond->val_int()); the nexttable.tsecer_select() statement in tsecer_select corresponds to evaluate_join_record::(*join_tab->next_select)(join, join_tab+1, 0).

4. Select implementation of join buffer

When using join buffer cache, next_select function points to sub_select_cache

enum_nested_loop_state
sub_select_cache(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
{
  enum_nested_loop_state rc;
 
  if (end_of_records)
  {
    rc = flush_cached_records(join,join_tab,FALSE);
    if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS)
      rc = sub_select(join,join_tab,end_of_records);
    return rc;
  }
  if (join->thd->killed) // If aborted by user
  {
    join->thd->send_kill_message();
    return NESTED_LOOP_KILLED; /* purecov: inspected */
  }
  if (join_tab->use_quick != 2 || test_if_quick_select(join_tab) <= 0)
  {
    if (!store_record_in_cache(&join_tab->cache))
      return NESTED_LOOP_OK; // There is more room in cache
    return flush_cached_records(join,join_tab,FALSE);
  }
  rc = flush_cached_records(join, join_tab, TRUE);
  if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS)
    rc = sub_select(join, join_tab, end_of_records);
  return rc;
}

Combined with the instructions in the MySQL documentation, the meaning of the code here is quite obvious. The judgment of end_of_records at the beginning corresponds to

    if (!store_record_in_cache(&join_tab->cache))
      return NESTED_LOOP_OK; // There is more room in cache
    return flush_cached_records(join,join_tab,FALSE);

correspond

  - Store used fields from t1, t2 in cache
  - If cache is full

The store_record_in_cache function determines whether the cache is full. If the cache can hold more cache, it stores the previous table's combined records in the cache and returns NESTED_LOOP_OK. Note: This place can be said to be the key to the entire cache optimization, because the table scan is not started here. On the other hand, if the cache data is full, the flush_cached_records function is called to perform the following process:

    - Read through all rows in t3
      - Compare t3 row against all t1, t2 combinations in cache
        - If row satisfies join condition, send it to client
    - Empty cache

The special thing about this process is that the traversal is driven by comparing each record of the table with all the combinations of t1 and t2 in the cache to determine whether the push-down where condition is satisfied (If row satisfies join condition), then the join_tab->next_select function is executed (send it to client).

static enum_nested_loop_state
flush_cached_records(JOIN *join,JOIN_TAB *join_tab,bool skip_last)
{
…
  info= &join_tab->read_record;
  do
  {//Traverse all records in table t3...
        for (i=(join_tab->cache.records- (skip_last ? 1 : 0)) ; i-- > 0 ;)
        {//Traverse all t1 and t2 record combinations in cache read_cached_record(join_tab);
          skip_record = FALSE;
          if (select && select->skip_record(join->thd, &skip_record))
          {//
            reset_cache_write(&join_tab->cache);
            return NESTED_LOOP_ERROR;
          }
          if (!skip_record)
          {//Satisfy the push-down where condition//Execute the traversal of the next table rc= (join_tab->next_select)(join,join_tab+1,0);
            if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS)
            {
              reset_cache_write(&join_tab->cache);
              return rc;
            }
          }
…
  } while (!(error=info->read_record(info)));

5. Let’s take an example to illustrate this process

The core idea of this implementation is not complicated, and it is even simpler and more intuitive when combined with specific examples.
For example, we use two simple tables, one storing a value of x and the other storing a value of y. We want to calculate all the values in these two tables that satisfy x through a join operation.

x + y

y == 5 * 5, which is the most common classic Pythagorean number value of "the hypotenuse is three, the leg is four, the side is five".

mysql> create table harry (x int);
Query OK, 0 rows affected (0.03 sec)
 
mysql> insert harry values (1),(2),(3),(4),(5);
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
 
mysql> create table tsecer (y int);                   
Query OK, 0 rows affected (0.01 sec)
 
mysql> insert tsecer values (1),(2),(3),(4),(5);     
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
 
mysql> explain select * from harry, tsecer where x * x + y * y = 5 * 5;
+----+-------------+--------+------+---------------+------+---------+------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+------+--------------------------------+
| 1 | SIMPLE | harry | ALL | NULL | NULL | NULL | NULL | 5 | |
| 1 | SIMPLE | tsecer | ALL | NULL | NULL | NULL | NULL | 5 | Using where; Using join buffer |
+----+-------------+--------+------+---------------+------+---------+------+------+--------------------------------+
2 rows in set (0.00 sec)
 
mysql>

1. Not using joinbuffer

Without the join buffer, for each x value in the harry table, the corresponding tsecer table must be fully scanned, and then the combination of x and y is used to determine whether x is satisfied.

x + y

The condition y == 5 * 5. Since x has a total of 5 values, tsecer needs to scan the entire table 5 times.

2. Use joinbuffer

For each value of x, the tsecer table first caches this value in the joinbuffer when executing. If the buffer content is not empty, the value of x at this time is stored in the buffer and then returned directly. When the join buffer is full or it is the last record, the scan of the tsecer table is started. For each record read from the tsecer table, it is combined with each record previously cached to see whether it meets its own judgment conditions.
For the example we have seen, the five values of the harry table are all in the cache. During the scan of the tsecer table, for each record read from tsecer, combined with the "each" cache in the cache, it is determined whether the combined result meets the conditions. If any group meets the conditions, then continue with next_select.
In this example using buffer, you can see that only one scan is performed on the tsecer table. Generally speaking, the database scan code is the highest (because it involves disk reads). Using buffer reduces the scan of the tsecer table to one, so the efficiency is greatly improved, especially when multiple tables are involved and/or the number of records in each table is large.

3. Reasons why cache can be optimized

Essentially, the reason for this efficiency improvement is that the "utilization rate" of each record obtained from the table is improved. When using the intuitive scanning method, the full table scan only matches one combination, while after using the buffer, it matches all combinations in the cache.

The above is the detailed content of the MySQL join buffer principle. For more information about MySQL join buffer, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in: