In-depth explanation of MySQL common index and unique index

Scenario

1. Maintain a citizen system with a field for ID number

2. The business code can ensure that no two duplicate ID numbers are written (if the business cannot guarantee this, you can rely on the unique index of the database to constrain it)

3. Common SQL query statements: SELECT name FROM CUser WHERE id_card = 'XXX'

4. Create an index

The ID number is relatively large, so it is not recommended to set it as the primary key
From a performance perspective, should you choose a normal index or a unique index?

Assume that the values in field k are not repeated

Query process

1. Query statement: SELECT id FROM T WHERE k=5

2. Query process

Starting from the root of the B+ tree, search by layer to the leaf node, which is the data page in the lower right corner of the above figure.
Use binary search to locate specific records within the data page

3. For common indexes

Find the first record that meets the condition (5,500), then find the next record until the first record that does not meet k=5 is found

4. For unique index

Since the index defines uniqueness, the search stops after the first record that meets the conditions is found.

Performance Differences

1. Performance difference: very small

2. InnoDB data is read and written in units of data pages, the default is 16KB

3. When a record needs to be read, the record itself is not read from the disk, but read in units of data pages.

4. When the record with k=5 is found, the data page where it is located is already in memory

5. For ordinary indexes, only one more pointer search and one more calculation are required – CPU consumption is very low

If the record k=5 happens to be the last record of the data page, then if you want to get the next record, you need to read the next data page.
Very low probability: For an integer field index, a data page (16KB, compact format) can store approximately 745 values

change buffer

1. When a data page needs to be updated, if the data page is in memory, it is updated directly

2. If the data page is not in memory, without affecting data consistency

InnoDB caches these update operations in the change buffer
There is no need to read this data page from disk (random read)
The next time a query needs to access this data page, the data page is read into memory and the operations related to this data page in the change buffer are performed (merge)

3. Change buffer is persistent data, which has a copy in memory and is also written to disk

4. Record the update operation in the channge buffer first to reduce random disk reads and improve the execution speed of the statement

5. In addition, reading data pages into memory requires occupying the buffer pool. Using channge buffer can avoid occupying memory and improve memory utilization

6. The change buffer uses the memory in the buffer pool and cannot be increased indefinitely. The control parameter innodb_change_buffer_max_size

# Default is 25, maximum is 50
mysql> SHOW VARIABLES LIKE '%innodb_change_buffer_max_size%';
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| innodb_change_buffer_max_size | 25 |
+-------------------------------+-------+

merge

1. merge: apply the operations in the change buffer to the original data page

2. The execution process of merge

Read data pages from disk into memory (old version data pages)
Find the change buffer record for this data page from the change buffer (possibly multiple)
Then execute them one by one to get the new version of the data page
Write redolog, including: table update of data page + change buffer change

3. After the merge is executed, the data page in the memory and the disk page corresponding to the change buffer have not been modified and are dirty pages.

Through other mechanisms, dirty pages are flushed to the corresponding physical disk pages.

4. Trigger timing

Visit this data page
The system background thread merges periodically
The database is shut down normally

Conditions of Use

1. For a unique index, all update operations must first determine whether the operation violates the uniqueness constraint.

2. The update of unique index cannot use change buffer, only common index can use change buffer

The primary key cannot use the change buffer either.
For example, to insert (4,400), you must first determine whether there is a record with k=4 in the table. The premise of this determination is to read the data page into memory.
Since the data page has been read into the memory, it is enough to directly update the data page in the memory without writing the change buffer.

Usage scenarios

1. The more changes a data page has recorded in the change buffer before it is merged, the greater the benefit.

2. For businesses with more writes than reads, the probability of a page being accessed immediately after it is written is extremely low, so the use of change buffer is most effective at this time.

For example, billing and logging systems

3. If the update mode of a business is: query will be performed immediately after writing

Although the update operation is recorded in the change buffer, the data page will be read from the disk immediately after the query, triggering the merge process.
Instead of reducing random reads, it increases the cost of maintaining the change buffer.

Update Process

Insert(4,400)

The target page is in memory

For the unique index, find the position between 3 and 5, determine if there is no conflict, and insert this value
For a normal index, find the position between 3 and 5 and insert this value
Performance Difference: Minimal

The target page is not in memory

1. For a unique index, you need to read the data page into memory, determine if there is any conflict, and insert the value

Random disk reads are very expensive

For common indexes, just record the update operation in the change buffer.

Reduced random disk reads, significantly improved performance

Index selection

1. There is not much difference in query performance between common index and unique index. The main consideration is update performance. It is recommended to choose common index.

2. Scenarios where it is recommended to close the change buffer

If all updates are followed by queries for this record
Control parameter innodb_change_buffering

mysql> SHOW VARIABLES LIKE '%innodb_change_buffering%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| innodb_change_buffering | all |
+-------------------------+-------+

# Valid Values (>= 5.5.4)
none / inserts / deletes / changes / purges / all

# Valid Values (<= 5.5.3)
none / inserts

# The predecessor of the change buffer is the insert buffer, which can only optimize insert operations

change buffer + redolog

Update Process

Current state of the k-tree: After finding the corresponding location, the data page Page 1 where k1 is located is in memory, and the data page Page 2 where k2 is located is not in memory

INSERT INTO t(id,k) VALUES (id1,k1),(id2,k2);

# Memory: buffer pool
# redolog:ib_logfileX
# Data table space: t.ibd
# System tablespace: ibdata1

1. Page 1 is in memory, update the memory directly

2. Page 2 is not in memory, record in the changer buffer: add (id2,k2) to Page 2

3. The above two actions are included in the redolog (disk sequential write)

4. The transaction is now complete and the cost of executing the update statement is very low

Write twice to memory + once to disk

5. When a transaction is committed, the change buffer operation records are also recorded in the redolog

Therefore, the change buffer can be restored during crash recovery.

The dotted line is the background operation and does not affect the response time of the update operation

Reading process

Assumption: The read statement occurs shortly after the update statement, the data in memory is still there, and it has nothing to do with the system tablespace (ibdata1) and redolog (ib_logfileX).

SELECT * FROM t WHERE k IN (k1,k2);

1. Read Page 1 and return it directly from memory (at this time, Page 1 may still be a dirty page and has not actually been written to disk)

2. Read Page 2, read the data page into memory through random disk read, and then apply the operation log in the change buffer (merge)

Generate a correct version and return

Improve update performance

1. redolog: saves random disk write IO consumption (sequential write)

2. Change buffer: saves IO consumption of random disk reads

References

"MySQL Practice 45 Lectures"

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. If you have any questions, you can leave a message to communicate. Thank you for your support for 123WORDPRESS.COM.

You may also be interested in:

Detailed Analysis of the Selection of MySQL Common Index and Unique Index
Which one should I choose between MySQL unique index and normal index?
Efficiency comparison between common index and unique index in MySQL
Detailed explanation of the difference between MySQL normal index and unique index

<<: Comparison of two implementation methods of Vue drop-down list

>>: Detailed tutorial on installing Docker and nvidia-docker on Ubuntu 16.04