Detailed explanation of the basic implementation principle of MySQL DISTINCT

Preface

DISTINCT is implemented in much the same way as GROUP BY: the rows are grouped, and only one record is taken from each group. As with GROUP BY, it can be completed through a loose index scan or a tight index scan. When DISTINCT cannot be completed using indexes alone, MySQL falls back to a temporary table.
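The "one record per group" idea can be sketched in a few lines of Python. This is only a conceptual model, not MySQL's code; the function name `distinct_via_grouping` and the sample rows are made up for illustration.

```python
def distinct_via_grouping(rows, key):
    """Group rows by key(row) and return one representative row per group."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    # GROUP BY would go on to aggregate each group; DISTINCT keeps one member.
    return [members[0] for members in groups.values()]


# Hypothetical (group_id, payload) rows.
rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
first_per_group = distinct_via_grouping(rows, key=lambda row: row[0])
print([row[0] for row in first_per_group])
```

Note that nothing in this sketch sorts the rows: grouping alone is enough to produce the distinct values.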

One difference from GROUP BY, however, is that DISTINCT does not require sorting. If a query that performs only a DISTINCT operation cannot be completed using the index alone, MySQL uses a temporary table to "cache" the data, but does not perform a filesort on the data in the temporary table.

Of course, if the DISTINCT query also groups with GROUP BY and uses an aggregate function such as MAX, filesort cannot be avoided.

Below we will use several simple Query examples to demonstrate the implementation of DISTINCT.

1. First, let's look at the DISTINCT operation performed through a loose index scan:

sky@localhost : example 11:03:41> EXPLAIN SELECT DISTINCT group_id 
  -> FROM group_message\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: group_message
     type: range
possible_keys: NULL
     key: idx_gid_uid_gc
   key_len: 4
     ref: NULL
     rows: 10
    Extra: Using index for group-by
1 row in set (0.00 sec)

The Extra column of the execution plan clearly shows "Using index for group-by". What does this mean? Why does the execution plan report that GROUP BY is performed through the index when the query contains no GROUP BY?

This follows from how DISTINCT is implemented. DISTINCT also has to group the data and then return one row from each group to the client. The Extra information here tells us that MySQL completes the entire operation with a loose index scan.
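To make the "loose" part concrete, here is a rough Python sketch of the idea: rather than reading every index entry, the scan jumps straight to the first entry of the next group. It assumes the index is a sorted list of (group_id, user_id) tuples with integer values; the function name and data are hypothetical, and the real storage engine works on B-tree pages rather than a Python list.

```python
from bisect import bisect_right


def loose_index_scan_distinct(index_entries):
    """Return the distinct leading-column values of a sorted list of tuples.

    After reading the first entry of a group, seek past every entry that
    shares the same leading value instead of visiting them one by one.
    """
    distinct_values = []
    pos = 0
    while pos < len(index_entries):
        group_id = index_entries[pos][0]
        distinct_values.append(group_id)
        # (group_id, inf) sorts after every (group_id, user_id) entry, so this
        # finds the first entry whose leading column is greater than group_id.
        pos = bisect_right(index_entries, (group_id, float("inf")), lo=pos)
    return distinct_values


# Hypothetical index contents, already sorted by (group_id, user_id).
index_entries = [(1, 10), (1, 11), (2, 10), (2, 12), (2, 12), (5, 3)]
print(loose_index_scan_distinct(index_entries))
```

Only one entry per group is actually read here; the rest are skipped by the seek, which is the saving a loose index scan is after.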

Of course, it would be friendlier and easier to understand if the MySQL Query Optimizer reported "Using index for distinct" here, haha.

2. Let's look at an example that uses a tight index scan:

sky@localhost : example 11:03:53> EXPLAIN SELECT DISTINCT user_id 
  -> FROM group_message
  -> WHERE group_id = 2\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: group_message
     type: ref
possible_keys: idx_gid_uid_gc
     key: idx_gid_uid_gc
   key_len: 4
     ref: const
     rows: 4
    Extra: Using where; Using index
1 row in set (0.00 sec)

The plan here is exactly the same as for GROUP BY implemented through a tight index scan. While executing this Query, MySQL has the storage engine scan all index keys with group_id = 2 to obtain every user_id, and then relies on the sorted order of the index to keep one row each time the user_id key value changes. The DISTINCT operation is complete once all index keys with group_id = 2 have been scanned.
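The "keep one row whenever the key value changes" step can be sketched as follows. This is a simplified model that assumes the index is a list of (group_id, user_id) tuples sorted in that order; the function name and sample data are invented for the example.

```python
_UNSET = object()  # sentinel meaning "no user_id emitted yet"


def tight_index_scan_distinct(index_entries, group_id):
    """Emit each user_id once while scanning every entry for one group_id.

    Because the entries are sorted by (group_id, user_id), equal user_id
    values are adjacent, so a change in value marks a new distinct result.
    """
    result = []
    previous = _UNSET
    for gid, user_id in index_entries:
        if gid != group_id:
            continue  # a real index lookup would seek directly to this range
        if user_id != previous:
            result.append(user_id)
            previous = user_id
    return result


# Hypothetical index contents, already sorted by (group_id, user_id).
index_entries = [(1, 10), (2, 10), (2, 10), (2, 12), (2, 15), (2, 15), (3, 1)]
print(tight_index_scan_distinct(index_entries, group_id=2))
```

Unlike the loose scan, every matching entry is read, but no extra memory or sorting is needed because the index order already places duplicates next to each other.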

3. Now let's see what happens when DISTINCT cannot be accomplished using an index alone:

sky@localhost : example 11:04:40> EXPLAIN SELECT DISTINCT user_id 
  -> FROM group_message
  -> WHERE group_id > 1 AND group_id < 10\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: group_message
     type: range
possible_keys: idx_gid_uid_gc
     key: idx_gid_uid_gc
   key_len: 4
     ref: NULL
     rows: 32
    Extra: Using where; Using index; Using temporary
1 row in set (0.00 sec)

When MySQL cannot complete the DISTINCT operation by relying on the index alone, it has to use a temporary table. Note, however, that a DISTINCT completed through a temporary table differs slightly from GROUP BY: there is no filesort.

MySQL's grouping algorithm does not necessarily need sorting to complete a grouping operation, as already mentioned in the earlier GROUP BY optimization tips. Here MySQL performs the grouping and completes the DISTINCT operation without sorting, which is why no filesort appears.
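Why no sort is needed can be seen in a small Python sketch. It models the temporary table as a set of already-seen values, which is an assumption made for illustration; the article does not describe how MySQL stores the temporary table internally. The function name and the sample stream are likewise hypothetical.

```python
def temp_table_distinct(user_ids):
    """Deduplicate a stream of values without sorting it.

    The set plays the role of the temporary table: a value is returned
    only the first time it is seen, in arrival order.
    """
    seen = set()
    output = []
    for user_id in user_ids:
        if user_id not in seen:
            seen.add(user_id)
            output.append(user_id)
    return output


# user_id values as they might arrive from a range scan over several
# group_id values: ordered within each group, but not ordered overall.
stream = [10, 12, 7, 10, 12, 9]
print(temp_table_distinct(stream))
```

In a range scan over several group_id values the same user_id can show up in different groups, so adjacent-duplicate checking is no longer enough; remembering what has been seen solves that without ever ordering the rows.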

4. Finally, try combining it with GROUP BY:

sky@localhost : example 11:05:06> EXPLAIN SELECT DISTINCT max(user_id) 
  -> FROM group_message
  -> WHERE group_id > 1 AND group_id < 10
  -> GROUP BY group_id\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: group_message
     type: range
possible_keys: idx_gid_uid_gc
     key: idx_gid_uid_gc
   key_len: 4
     ref: NULL
     rows: 32
    Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

Compared with the third example, this plan has an additional filesort because the query uses GROUP BY together with the MAX function. The MAX value of each group cannot be obtained from the index alone, so the operation can only be completed through sorting.
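What this query computes can be modelled in three logical steps: group, aggregate, then deduplicate. The Python sketch below only mirrors the result of the query, not the execution path MySQL takes; the `sorted` call merely stands in for the ordering step reported as filesort, and the function name and rows are invented for the example.

```python
def distinct_max_per_group(rows):
    """Model SELECT DISTINCT MAX(user_id) ... GROUP BY group_id."""
    # Steps 1 and 2: group by group_id and keep the largest user_id per group.
    max_by_group = {}
    for group_id, user_id in rows:
        if group_id not in max_by_group or user_id > max_by_group[group_id]:
            max_by_group[group_id] = user_id

    # Stand-in for the sort step: visit the groups in group_id order.
    maxima = [max_by_group[group_id] for group_id in sorted(max_by_group)]

    # Step 3: DISTINCT over the aggregated values.
    seen = set()
    result = []
    for value in maxima:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical (group_id, user_id) rows with 1 < group_id < 10.
rows = [(2, 10), (2, 12), (3, 7), (3, 12), (4, 9)]
print(distinct_max_per_group(rows))
```

Groups 2 and 3 share the same maximum here, so the final DISTINCT step removes one of the two values.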

Since the implementation of DISTINCT is basically the same as that of GROUP BY, this article will not draw pictures to show the implementation process.
