Which one should I choose between MySQL unique index and normal index?

Imagine you are designing a user table. Each person's ID card number is unique and is often used for lookups. The ID number field is large, so it is not a good choice for the primary key. If the business code already guarantees that no duplicate ID numbers are inserted, you can index the column with either a unique index or a normal index. Which should you choose? Below, we analyze how each one behaves during queries and during updates.

Query process

Assume that k is an indexed column of table t. To execute select id from t where k=5, InnoDB starts at the root of the B+ tree for index k. It descends level by level to the leaf data page that contains k=5, then uses binary search inside that page to locate the record.

For a normal index, InnoDB finds the first record with k=5 and then moves on to the next record. It keeps going until it reaches the first record whose k is not 5.

For a unique index, the value can appear at most once, so the search stops as soon as the matching record is found.

InnoDB reads and writes data in units of pages, which are 16 KB by default. When one record is needed, the whole page containing it is read into memory. So once the page holding k=5 is in memory, the normal index does only one extra "fetch the next record and compare" step compared with the unique index. That step costs a pointer move and a comparison, so it can be ignored.
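The two lookup strategies can be sketched as a toy model of a single leaf page. This is an illustration only, not InnoDB code. The page is assumed to be a sorted Python list of (k, id) tuples, and lookup_in_page is a hypothetical helper. Besides finding the matches, it counts how many records it compares after the binary search.

```python
from bisect import bisect_left

def lookup_in_page(page, key, unique):
    """Toy lookup of `key` in one leaf page of the index on k.

    `page` is a sorted list of (k, id) tuples, a hypothetical stand-in for
    the records on an InnoDB page. Returns (matching ids, records compared).
    """
    # Binary search for the first record with k >= key; (key,) sorts before
    # every (key, id) tuple, so this lands on the first match if one exists.
    pos = bisect_left(page, (key,))
    ids, compared = [], 0
    while pos < len(page):
        k, row_id = page[pos]
        compared += 1
        if k != key:
            break            # normal index: first non-matching record ends the scan
        ids.append(row_id)
        if unique:
            break            # unique index: at most one match, stop immediately
        pos += 1
    return ids, compared


page = [(3, 30), (5, 50), (7, 70)]
print(lookup_in_page(page, 5, unique=True))   # ([50], 1)
print(lookup_in_page(page, 5, unique=False))  # ([50], 2)
```

The normal index pays exactly one extra comparison here. This matches the point above that the difference is negligible when the page is already in memory.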

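If the record with k=5 happens to be the last record on its page, the normal index must read the next data page to check the following record, which costs more. However, a 16 KB page usually holds many index records. This case is therefore rare, and its average cost can be ignored.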
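To illustrate the page-boundary case, the sketch below extends the toy model to several leaf pages. It is a hypothetical illustration, not how InnoDB is implemented. The helper leaf_pages_read and the list-of-lists page layout are assumptions, and the linear search for the starting page only stands in for the B+ tree descent.

```python
def leaf_pages_read(pages, key, unique):
    """Count the leaf pages a lookup of `key` touches (toy model).

    `pages` is a list of leaf pages in key order, each a sorted list of k
    values, standing in for InnoDB's linked leaf pages. Assumes `key` exists.
    """
    # Stand-in for the B+ tree descent: first page whose last value >= key.
    i = next(idx for idx, page in enumerate(pages) if page[-1] >= key)
    read = 1
    if unique:
        return read          # unique index: the match is final
    # Normal index: if the page ends with `key`, the next record lives on the
    # next page, which must be read before the scan can stop.
    while pages[i][-1] == key and i + 1 < len(pages):
        i += 1
        read += 1
    return read


pages = [[1, 3, 5], [7, 9, 11]]
print(leaf_pages_read(pages, 3, unique=False))  # 1
print(leaf_pages_read(pages, 5, unique=False))  # 2: 5 is last on its page
print(leaf_pages_read(pages, 5, unique=True))   # 1
```

The extra page read happens only when the match is the last record on its page. With many records per page, that situation is uncommon.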

So in general, there is not much difference between ordinary indexes and unique indexes during the query process.

Change buffer

Before comparing how unique and normal indexes behave during updates, let's first look at the change buffer.

What is a change buffer?

When an update is performed and the data page to be updated is already in memory, the page is updated directly. If the page is not in memory, InnoDB caches the update operation in the change buffer, as long as doing so does not affect data consistency. This way the page does not have to be read from disk. The next time that data page is read into memory (for example, by a query), the buffered changes for it are applied to the page. Applying the operations in the change buffer to the original data page to produce the latest result is called merge. A merge also happens in two other situations: a background thread performs merges periodically, and one is performed when the database is shut down normally.

The execution process of merge is as follows:

  1. Read the old version of the data page from disk.
  2. Find the records associated with the data page from the change buffer and apply them one by one to get the new version of the data page.
  3. Write the redo log, which records both the data page change and the change buffer change.
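The update path and the three merge steps can be modeled with a small class. This is a hypothetical sketch: ToyInnoDB and its fields are invented for illustration. Pages are lists of values, the change buffer maps a page id to pending inserts, and the redo log is a list of tuples. The model shows two things. Buffering avoids a disk read until the page is needed, and after a merge the new page lives in memory while the copy on disk is still the old version.

```python
from collections import defaultdict

class ToyInnoDB:
    """Toy model of the buffer pool, change buffer, and merge (not real InnoDB)."""

    def __init__(self, disk_pages):
        self.disk = {pid: list(recs) for pid, recs in disk_pages.items()}
        self.buffer_pool = {}                   # page id -> records in memory
        self.change_buffer = defaultdict(list)  # page id -> buffered inserts
        self.redo_log = []
        self.disk_reads = 0

    def insert(self, page_id, value):
        if page_id in self.buffer_pool:
            # Page already in memory: update it directly.
            self.buffer_pool[page_id].append(value)
            self.redo_log.append(("page", page_id, value))
        else:
            # Page not in memory: buffer the change instead of reading the page.
            self.change_buffer[page_id].append(value)
            self.redo_log.append(("change_buffer", page_id, value))

    def read(self, page_id):
        if page_id not in self.buffer_pool:
            self.merge(page_id)
        return sorted(self.buffer_pool[page_id])

    def merge(self, page_id):
        # 1. Read the old version of the data page from disk.
        page = list(self.disk[page_id])
        self.disk_reads += 1
        # 2. Apply this page's buffered changes one by one.
        for value in self.change_buffer.pop(page_id, []):
            page.append(value)
        # 3. Write redo log covering the data change and the change buffer change.
        self.redo_log.append(("merge", page_id))
        # The new page version stays in memory (dirty); disk is flushed later.
        self.buffer_pool[page_id] = page


db = ToyInnoDB({1: [10], 2: [20]})
db.insert(2, 25)          # page 2 is not in memory, so the insert is buffered
print(db.disk_reads)      # 0
print(db.read(2))         # [20, 25]: merge ran on first read
print(db.disk[2])         # [20]: the on-disk page is still the old version
```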

The change buffer is persistent data, so it exists both in memory and on disk. It was originally called the insert buffer because at first only inserts were buffered. Support for deletes and updates was added later, and it was renamed the change buffer.

Recording updates in the change buffer first avoids reading data pages from disk into memory, which can significantly speed up statement execution. Reading a page into memory also takes up space in the buffer pool, so avoiding those reads improves memory utilization as well.

The buffer pool is the area of memory where InnoDB caches table and index data as it is accessed. It lets frequently used data be read and updated directly in memory, which speeds up processing. On dedicated database servers, up to 80% of physical memory is often assigned to the buffer pool.

The innodb_change_buffer_max_size variable sets the maximum size of the change buffer as a percentage of the total buffer pool size. The default is 25 and the maximum is 50.
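Because the setting is a percentage, the resulting upper bound is simple arithmetic. The helper below is a hypothetical illustration, not a MySQL API. It assumes the documented range of 0 to 50 and the default of 25.

```python
def change_buffer_cap_bytes(buffer_pool_bytes, max_size_pct=25):
    """Upper bound on change buffer size for a given buffer pool size.

    max_size_pct mirrors innodb_change_buffer_max_size: a percentage of the
    total buffer pool size, default 25 and allowed range 0 to 50.
    """
    if not 0 <= max_size_pct <= 50:
        raise ValueError("innodb_change_buffer_max_size must be between 0 and 50")
    return buffer_pool_bytes * max_size_pct // 100


GiB = 1024 ** 3
print(change_buffer_cap_bytes(8 * GiB) / GiB)      # 2.0
print(change_buffer_cap_bytes(8 * GiB, 50) / GiB)  # 4.0
```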

When is the change buffer useful?

As described above, the change buffer caches update operations until the page is merged, which avoids reading data pages. The merge is the moment when the data is actually updated. So the more changes the change buffer accumulates for a page before that page is merged, the greater the benefit.

Therefore, the change buffer is most effective for write-heavy, read-light workloads in which pages are rarely queried right after being written. Billing and logging systems are common examples.

If the workload queries data immediately after updating it, the update can still be recorded in the change buffer. However, the following query reads the page right away and triggers a merge. The number of random read IOs is not reduced, and the cost of maintaining the change buffer is added on top, so the change buffer actually hurts performance.
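The two workload shapes can be compared with a toy simulation that counts random disk reads. This is a hypothetical model, not a benchmark. Its assumptions are as follows:

- The buffer pool is an LRU cache holding only pool_capacity pages, set to 1 so that pages are evicted between statements.
- A buffered write costs no disk read.
- Reading an uncached page costs one disk read, plus one merge if changes are pending for it.

```python
from collections import OrderedDict

def simulate(trace, use_change_buffer, pool_capacity=1):
    """Count (random disk reads, merges) for a toy trace.

    trace: list of ("W" | "R", page_id). The buffer pool is a tiny LRU cache
    of `pool_capacity` pages, a deliberately small, hypothetical setting.
    """
    pool = OrderedDict()
    pending = {}          # page_id -> number of buffered changes
    disk_reads = merges = 0

    def load(page):
        nonlocal disk_reads, merges
        disk_reads += 1
        if pending.pop(page, 0):
            merges += 1   # apply buffered changes while loading the page
        pool[page] = True
        if len(pool) > pool_capacity:
            pool.popitem(last=False)   # evict the least recently used page

    for op, page in trace:
        if page in pool:
            pool.move_to_end(page)     # hit: read or update in memory
        elif op == "W" and use_change_buffer:
            pending[page] = pending.get(page, 0) + 1   # buffer the write
        else:
            load(page)
    return disk_reads, merges


write_heavy = [("W", 1), ("W", 2), ("W", 3), ("W", 1), ("W", 2), ("W", 3), ("R", 1)]
read_after_write = [("W", 1), ("R", 1), ("W", 2), ("R", 2), ("W", 3), ("R", 3)]
for name, trace in [("write-heavy", write_heavy), ("read-after-write", read_after_write)]:
    print(name, simulate(trace, False), simulate(trace, True))
```

In the write-heavy trace, the change buffer turns seven random reads into one. Pages 2 and 3 still have pending changes that a background merge would eventually apply, at a cost of up to two more reads, which this model does not count. In the read-after-write trace, the number of reads is unchanged and every buffered change is merged right away. That is the extra maintenance cost described above.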

Update Process

For a unique index, every update must check whether it violates the uniqueness constraint, and that check requires the relevant data page to be in memory. Once the page has been read into memory, updating it directly is faster, so there is no reason to use the change buffer. As a result, the change buffer is only used for normal indexes.

To make this concrete, consider inserting a new record into the table:

If the data page to be updated by the new record is in memory:

For a unique index, find the appropriate position, determine if there is any conflict, insert the value, and the statement ends.

For a normal index: find the position, insert the value, and the statement ends.

Therefore, when the data page is in memory, the only difference between a unique index and a normal index is one extra conflict check. It costs a little CPU time and can be ignored.

If the data page to be updated by the new record is not in memory:

For a unique index, the data page is read into memory and checked for conflicts, then the value is inserted and the statement ends.

For a normal index, the change is recorded in the change buffer and the statement ends.

Reading a data page from disk into memory involves random IO, which is one of the most expensive operations in a database. A normal index avoids this read, so it can significantly improve update performance compared with a unique index.
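The four insert cases can be summarized as a small cost function. This is a hypothetical sketch: insert_cost and batch_disk_reads are invented names. The model counts only disk reads and uniqueness checks, and it leaves out the merge that the normal index's buffered changes will need later.

```python
def insert_cost(page_in_memory, unique):
    """Return (disk_reads, uniqueness_checks) for inserting one index entry.

    A toy model of the four cases above; real InnoDB does more work.
    """
    if page_in_memory:
        # Both index types update the page in memory; only unique checks it.
        return 0, (1 if unique else 0)
    if unique:
        # The uniqueness check needs the page, so it is read from disk first.
        return 1, 1
    # Normal index: record the change in the change buffer, no disk read.
    return 0, 0


def batch_disk_reads(pages, cached, unique):
    """Total disk reads for inserting into `pages` in order.

    `cached` is the set of page ids already in memory; a page read from disk
    is assumed to stay in memory for the rest of the batch.
    """
    in_memory = set(cached)
    total = 0
    for page in pages:
        reads, _ = insert_cost(page in in_memory, unique)
        total += reads
        if reads:
            in_memory.add(page)
    return total


pages = [1, 2, 3, 2, 4]
print(batch_disk_reads(pages, cached={1}, unique=True))   # 3
print(batch_disk_reads(pages, cached={1}, unique=False))  # 0
```

The normal index's buffered changes still have to be merged eventually. That cost is deferred, however, and as discussed above it is shared by all the changes made to the same page before the merge.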

Choice of unique or normal index

Comparing the two for queries and updates, we see that during queries the difference is small, except in rare edge cases.

The main difference arises during an update when the data page to be updated is not in memory. A unique index cannot use the change buffer because it must check uniqueness. It therefore has to read the page from disk into memory first, and that read is random IO, which is relatively slow.

Therefore, if the business needs good update performance, choose a normal index. This assumes the business code already guarantees uniqueness, so data correctness is not compromised.

If every update is immediately followed by a query of the same data, consider turning off the change buffer, for example by setting innodb_change_buffering to none. In all other cases, the change buffer can improve update performance significantly.

The effect is especially large on mechanical hard drives, where random IO is particularly slow.

Comparison of redo log and change buffer

The redo log makes InnoDB crash-safe. Through WAL (write-ahead logging: write the log first, write data pages to disk later), it also improves write efficiency.

The change buffer, in contrast, saves the random IO of reading data pages from disk into memory.

Let's analyze the relationship between the two through an insert statement:

mysql> insert into t(id,k) values(id1,k1),(id2,k2);

Assume k is a normal index, the data page where k1 is inserted (page1) is already in memory, and the page for k2 (page2) is not.

When performing an insert operation, the following four parts are mainly involved:

InnoDB buffer pool: the memory area

redo log: the log files

System tablespace (ibdata1): holds the on-disk part of the change buffer

Data tablespace (t.ibd): holds the data of table t

When innodb_file_per_table is enabled, each table is stored in its own tablespace file (such as t.ibd). Otherwise, it is stored in the system tablespace.

The execution process is as follows:

  1. Page1, where k1 goes, is in memory, so it is updated directly in memory.
  2. Page2, where k2 goes, is not in memory, so the change is recorded in the change buffer in memory instead.
  3. Both of these actions are recorded in the redo log.
  4. The transaction commits.

So the cost of executing this update statement is very low: two writes to memory and one sequential write to disk (the redo log). Flushing page1 to t.ibd and the change buffer to ibdata1 happens later in background threads and does not affect response time.
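The foreground steps of this insert can be written out as a trace. The sketch below is a hypothetical toy model: trace_insert and its event strings are invented. It records where each foreground step lands and leaves out the background flushes.

```python
def trace_insert(rows, pages_in_memory):
    """List the foreground steps of a multi-row insert into normal index k.

    rows: (k value, page id) pairs. Hypothetical toy model of the walkthrough.
    Returns (where, action) tuples in execution order.
    """
    steps = []
    redo_entries = []
    for k, page in rows:
        if page in pages_in_memory:
            steps.append(("memory", f"update {page} with {k}"))
            redo_entries.append(f"{page} += {k}")
        else:
            steps.append(("memory", f"buffer {k} for {page} in change buffer"))
            redo_entries.append(f"change buffer += ({page}, {k})")
    # Both changes go into the redo log in a single sequential write.
    steps.append(("disk", "redo log: " + "; ".join(redo_entries)))
    steps.append(("commit", "transaction committed"))
    return steps


for where, action in trace_insert([("k1", "page1"), ("k2", "page2")], {"page1"}):
    print(f"{where:>6}: {action}")
```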

Let’s look at another query statement:

select * from t where k in (k1, k2)

Assume the read runs shortly after the update, while the data is still in memory. In that case the read involves neither the system tablespace nor the redo log.

Execution process:

  1. Page1, which holds k1, is in memory, so the result is returned directly from memory. Note that the data on disk is not read, and the on-disk version of page1 may still be the old one.
  2. Page2, which holds k2, is not in memory. It must be loaded from disk, and its buffered changes in the change buffer must be applied, before the correct result can be returned. This also shows why the change buffer is a poor fit when data is read immediately after it is updated.
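The read path can be sketched with plain dictionaries. This is a hypothetical toy model: read_keys and the dictionary layout are invented. The buffer pool maps page ids to in-memory records, the change buffer maps page ids to pending inserts, and the disk holds the old page versions.

```python
def read_keys(lookups, buffer_pool, change_buffer, disk):
    """Resolve `lookups` ((key, page id) pairs) under the toy model.

    Returns (keys found, page ids that had to be read from disk).
    """
    found, loaded = [], []
    for key, page in lookups:
        if page not in buffer_pool:
            # Load the old page version from disk and merge buffered changes.
            buffer_pool[page] = disk[page] + change_buffer.pop(page, [])
            loaded.append(page)
        if key in buffer_pool[page]:
            found.append(key)
    return found, loaded


buffer_pool = {"page1": ["k1"]}         # page1 was updated in memory
change_buffer = {"page2": ["k2"]}       # k2 is still buffered
disk = {"page1": [], "page2": []}       # on-disk pages are the old versions
print(read_keys([("k1", "page1"), ("k2", "page2")], buffer_pool, change_buffer, disk))
# (['k1', 'k2'], ['page2'])
print(disk["page1"])  # []: page1 on disk is still stale
```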

To summarize the relationship between redo log and change buffer:

Storage location: the change buffer is also persisted on disk, in the system tablespace (ibdata1). The redo log is stored in its own files.

Record content: the change buffer records the update operations themselves. The redo log records changes to normal data pages as well as changes to the change buffer.

Writing changes to data pages: buffered changes are applied to data pages by the merge operation, not by replaying the redo log.

Effect on updates: the redo log turns random disk writes into sequential writes, while the change buffer saves random disk reads.

If the server loses power unexpectedly, will the change buffer be lost?

No. The changes in the change buffer are recorded in the redo log, so they will not be lost.

Part of the change buffer is on disk and part is in memory. The part on disk is already persisted, so it will not be lost.
For data in memory:

  1. If the change buffer was written but neither the redo log nor the binlog was committed, the transaction is rolled back, so this data never takes effect.
  2. If the change buffer, redo log, and binlog were all written and the transaction committed, nothing is lost. The change buffer contents are restored from the redo log.
  3. If the change buffer was written, the redo log was written but left uncommitted (in the prepare state), and the binlog was fully written, crash recovery commits the transaction. The change buffer is then restored from the redo log.
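The three cases follow the crash-recovery rule commonly described for MySQL's two-phase commit between the redo log and the binlog. The function below is a simplified, hypothetical sketch of that decision. The state labels are assumptions, and real recovery matches prepared transactions to binlog entries by XID.

```python
def recover(redo_state, binlog_complete):
    """Decide a transaction's fate after a crash (simplified two-phase commit rule).

    redo_state: "none", "prepare", or "commit" (hypothetical labels).
    binlog_complete: whether the transaction's binlog entry was fully written.
    """
    if redo_state == "commit":
        return "commit"      # committed: replay the redo log, change buffer included
    if redo_state == "prepare" and binlog_complete:
        return "commit"      # binlog proves the transaction finished: commit it
    return "rollback"        # otherwise the change never takes effect


for case in [("none", False), ("commit", True), ("prepare", True), ("prepare", False)]:
    print(case, "->", recover(*case))
```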
