Detailed Analysis of the Selection of MySQL Common Index and Unique Index

Suppose there is a user management system in which each person registers with a unique mobile phone number, and the business code already ensures that duplicate mobile phone numbers will not be written. When the user management system needs to look up a name by mobile phone number, it executes a SQL statement like this:

select name from users where mobile = '15202124529';

Usually you would consider creating an index on the mobile field. Since the mobile phone number field is relatively large, it is usually not used as the primary key. So now there are two options:

1. Create a unique index on the mobile field
2. Create a normal index on the mobile field
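
In SQL, the two options look like this (the index names here are illustrative):

-- Option 1: a unique index on the mobile field
ALTER TABLE users ADD UNIQUE INDEX uk_mobile (mobile);

-- Option 2: a normal (non-unique) index on the mobile field
ALTER TABLE users ADD INDEX idx_mobile (mobile);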

Since the business code already ensures that duplicate mobile phone numbers will not be written, both options are logically correct.

From a performance perspective, should we choose a unique index or a normal index?

As shown in the figure below, assume that the values on field k are not repeated.

Next, we will analyze how these two kinds of index on field k (in a table with primary key ID) affect the performance of query statements and update statements.

Query Process

Assume the query statement is select id from T where k=5. The query searches the index tree from the root of the B+ tree, layer by layer, down to a leaf node, which is the data page in the lower right corner of the figure. Within the data page, we can assume binary search is used to locate the record (each data page stores its records as an ordered array, and the data pages are chained together in a doubly linked list).

  • For a normal index, after finding the first record that meets the condition (5,500), you need to find the next record until you encounter the first record that does not meet the k=5 condition.
  • For a unique index, since the index defines uniqueness, the search will stop after the first record that meets the conditions is found.
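
This difference also shows up in the execution plan. As a sketch (assuming a table T with an index on field k), an equality lookup through a unique index is reported with access type const, while the same lookup through a normal index is reported as ref:

EXPLAIN SELECT id FROM T WHERE k = 5;
-- with a unique index on k: type = const (at most one matching row)
-- with a normal index on k: type = ref   (the scan continues until k <> 5)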

So, how big will the performance gap be due to this difference? The answer is, very little.

Reason: within a page that is already in memory, finding the next record costs only one pointer move and one comparison. Unless the key column is very large, so that just a few consecutive keys fill an entire page and the scan must cross a page boundary, no extra page IO is incurred, and only a page IO would produce a noticeable performance difference. Amortized over all queries, the difference is negligible.

InnoDB reads and writes data in units of data pages. That is, when a record needs to be read, the record is not read from disk by itself; instead, the whole page containing it is read into memory. In InnoDB, the default size of each data page is 16KB.
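
You can confirm the page size on your own instance; 16384 bytes is the default:

SHOW VARIABLES LIKE 'innodb_page_size';
-- innodb_page_size | 16384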

Update Process

In order to explain the impact of common indexes and unique indexes on the performance of update statements, we need to first introduce the change buffer.

  • When a data page needs to be updated and the page is in memory, it is updated directly.
  • If the data page is not yet in memory, then, without affecting data consistency:
  1. InnoDB caches the update operation in the change buffer, so the data page does not need to be read from disk;
  2. the next time a query needs to access this data page, the page is read into memory;
  3. the operations recorded in the change buffer for this page are then applied to it.

In this way, the correctness of the data logic is guaranteed.

It should be noted that although it is called a buffer, the change buffer is actually persistent data: it has a copy in memory and is also written to disk.

The process of applying the operations in the change buffer to the old data page to obtain the new data page is called merge.

P.S. Besides being triggered by access to the data page, a merge is also performed periodically by a background thread, and during a normal shutdown of the database.

(During a merge, the operations in the change buffer are first applied to the data page in memory, and this change is recorded in the redo log. While MySQL is running normally, when the redo log fills up and the checkpoint must advance, pages whose in-memory content differs from the on-disk content, i.e., dirty pages, are flushed to disk. If MySQL crashes, the in-memory state, including dirty pages, is lost and is recovered by replaying the redo log.)

Obviously, if the update operation can be recorded in the change buffer first to reduce disk reading, the execution speed of the statement will be significantly improved.

Moreover, reading a data page into memory takes up space in the buffer pool, so the change buffer also avoids unnecessary memory use and improves memory utilization.

Under what conditions can the change buffer be used?

For a unique index, all update operations must first determine whether the operation violates the uniqueness constraint.

For example, to insert the record (4,400), InnoDB must first determine whether a record with k=4 already exists in the table, and the only way to determine this is to read the corresponding data page into memory.

If everything has already been read into memory, it would be faster to update the memory directly, and there would be no need to use a change buffer.

Therefore, updates to a unique index cannot use the change buffer; in fact, only normal indexes can use it.
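
As an illustration (using a hypothetical table t with a unique index on k), it is exactly this uniqueness check that forces the page read:

-- With a unique index on k, InnoDB must read the target page first
-- to verify that no row with the same k value already exists.
INSERT INTO t (id, k) VALUES (4, 400);
-- fails with ERROR 1062 (Duplicate entry) if the k value is already present;
-- a normal index skips this check, so the change can simply be buffered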

The change buffer uses memory from the buffer pool, so it cannot grow without limit. Its size can be set dynamically via the parameter innodb_change_buffer_max_size; setting this parameter to 50 means the change buffer may occupy at most 50% of the buffer pool.
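
For example (the default value of innodb_change_buffer_max_size is 25):

-- check the current change buffer settings
SHOW VARIABLES LIKE 'innodb_change_buffer%';

-- allow the change buffer to occupy at most 50% of the buffer pool (dynamic, no restart needed)
SET GLOBAL innodb_change_buffer_max_size = 50;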

P.S. Database buffer pool (buffer pool): https://www.jianshu.com/p/f9ab1cb24230

Analysis: InnoDB's processing flow for inserting a new record

Now that we understand the change buffer mechanism, what is InnoDB's processing flow for inserting a new record (4,400) into this table?

1. The first case: the target page of the update is in memory. At this time, InnoDB's processing flow is as follows:

  • For a unique index, find the position between 3 and 5, check that there is no conflict, insert the value, and the statement execution ends;
  • For a normal index, find the position between 3 and 5, insert the value, and the statement execution ends.

In this case, the difference in update performance between a normal index and a unique index is just one extra uniqueness check, which consumes only a tiny amount of CPU time. But this is not the main point.

2. The second case: the target page of the update is not in memory. At this time, InnoDB's processing flow is as follows:

  • For a unique index, the data page needs to be read into memory, and if there is no conflict, the value is inserted and the statement execution ends.
  • For common indexes, the update is recorded in the change buffer and the statement execution is completed.

Reading data from disk into memory involves random IO access and is one of the most expensive operations in a database. Since the change buffer reduces random disk access, the improvement in update performance will be obvious.

The change buffer is mainly used to cache update operations and apply them asynchronously. Each update can be recorded in the change buffer very quickly, and multiple changes to the same page can be merged into a single disk write.
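
You can observe change buffer activity (historically called the "insert buffer") in the engine status output; the numbers below are illustrative:

SHOW ENGINE INNODB STATUS\G
-- ...
-- INSERT BUFFER AND ADAPTIVE HASH INDEX
-- -------------------------------------
-- Ibuf: size 1, free list len 0, seg size 2, 0 merges
-- ...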

Change buffer usage scenarios

From the analysis above, it is clear that the change buffer accelerates updates, and that it applies only to normal indexes, not to unique indexes.

Can the change buffer accelerate every scenario that uses normal indexes?

Because merge is the moment when the data is actually updated, and the main purpose of the change buffer is to cache pending changes, the more changes the change buffer accumulates for a page before that page is merged (that is, the more times the page would otherwise have had to be updated), the greater the benefit.

Therefore, for write-heavy, read-light businesses, the probability that a page is accessed immediately after being written is small, and the change buffer works best. This type of business model is common in billing and logging systems. (The change buffer suits write-heavy, read-light scenarios; read-heavy, write-light scenarios instead increase its maintenance cost.)

Conversely, if a business's update pattern is to query a record immediately after writing it, then even though the update can be recorded in the change buffer first, the merge process is triggered right away because the data page is accessed soon afterwards. The number of random IOs is not reduced, and the cost of maintaining the change buffer is added on top, so for this business model the change buffer actually has a negative effect. (Querying immediately after updating through a normal index triggers a merge of the on-disk data with the operations recorded in the change buffer, generating IO without saving any.)

Index selection and practice

Based on the above analysis, how should we choose between common indexes and unique indexes?

In fact, there is no difference in query capability between these two types of index; the main consideration is their impact on update performance. Therefore, it is recommended to choose normal indexes whenever possible.

If every update is immediately followed by a query of the same record, you should disable the change buffer.
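
Change buffering can be switched off with the innodb_change_buffering parameter:

-- disable change buffering for workloads that query right after writing
SET GLOBAL innodb_change_buffering = 'none';
-- (the default is 'all': buffer inserts, delete-marking operations, and purges)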

In all other cases, the change buffer can improve update performance. In actual use, the combination of ordinary indexes and change buffers is very effective in optimizing updates of tables with large amounts of data.

P.S. The change buffer is especially effective on mechanical hard disks. So when you have a library of "historical data" tables, pay special attention to the indexes on those tables: use normal indexes wherever possible, and make the change buffer as large as possible, to guarantee the write speed of these "historical data" tables.

Change buffer and redo log

Understanding the principle of the change buffer may remind you of the redo log and WAL (Write-Ahead Logging, whose key idea is to write the log first and write data pages to disk later).

Indeed, the core mechanism by which WAL improves performance is to minimize random reads and writes.

Execute this insert statement on the table:

mysql> insert into t(id,k) values(id1,k1),(id2,k2);
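
For reference, a minimal schema consistent with this statement might look like the following (hypothetical; idx_k is a normal index, which is what allows the change buffer to be used):

create table t (
  id int primary key,
  k int not null,
  index idx_k (k)
) engine = InnoDB;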

Assume that, in the current state of the k index tree, after the insert positions are located, the data page containing k1 is in memory (the InnoDB buffer pool) while the data page containing k2 is not. The figure below shows the update flow with the change buffer.

Figure 3: Update process with change buffer

Analyzing this update statement, you will find that it involves four parts:

Memory, redo log (ib_log_fileX), data tablespace (t.ibd), system tablespace (ibdata1).

Data tablespace: the table's data file; the corresponding disk file is "table_name.ibd". System tablespace: stores system information, such as the data dictionary; the corresponding disk file is "ibdata1".

In the figure, the data tablespace and the system tablespace can be thought of as standing in for the on-disk B+ tree structures.

This update statement performs the following operations (in numerical order in the figure):

  1. Page 1 is in memory; update it directly.
  2. Page 2 is not in memory; record "I want to insert a row into Page 2" in the change buffer area of memory.
  3. Record the above two actions in the redo log (3 and 4 in the figure).

After doing the above, the transaction is complete. As you can see, the cost of executing this update statement is very low: two writes to memory plus one write to disk (the two redo log records are written to disk together), and that disk write is sequential.

The change buffer and the redo log differ in granularity: a change is held in the change buffer only when the page it belongs to is not in memory, whereas the redo log records every data-modifying operation in a transaction, even when the modified page is already in memory.

At the same time, the two dotted arrows in the figure are background operations and do not affect the update response time.

So how are subsequent read requests handled?

For example, suppose we now execute select * from t where k in (k1, k2).

If the read statement occurs shortly after the update statement and the data is still in memory, then these two read operations have nothing to do with the system tablespace (ibdata1) or the redo log (ib_log_fileX).

Figure 4: Reading process with change buffer

As can be seen from the figure: when reading Page 1, it is returned directly from the memory.

After WAL, does a read have to go to disk? Does it have to apply the updates from the redo log before it can return?

Actually, no. Although the disk still holds the old data, as the figure shows, reading Page 1 returns the result directly from memory, and the result is correct. To read Page 2, the page must first be read from disk into memory, and then the operations recorded for it in the change buffer are applied to produce the correct version before the result is returned. As you can see, the page is not read into memory until Page 2 actually needs to be read.
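
If you want to check whether a table's pages are currently cached in the buffer pool, you can query information_schema (note that this query is expensive and should be avoided on busy production systems):

SELECT TABLE_NAME, COUNT(*) AS cached_pages
FROM information_schema.INNODB_BUFFER_PAGE
WHERE TABLE_NAME LIKE '%t%'   -- match your own table name here
GROUP BY TABLE_NAME;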

If we simply compare the benefits of these two mechanisms in improving update performance, the redo log mainly saves the IO consumption of random disk writes (converting them to sequential writes), while the change buffer mainly saves the IO consumption of random disk reads.

Questions for consideration:

1. As can be seen from Figure 3, the change buffer is initially written in memory. If the machine loses power and restarts at that moment, will the change buffer be lost? Losing it would be no small matter: when the data page is later read from disk there would be no merge process, which effectively means data is lost. Can this happen?

Answer:

1. The change buffer is partly in memory and partly in ibdata.

A purge operation persists the corresponding change buffer data to ibdata.

2. The redo log records both the changes to data pages and the new writes into the change buffer.

If power is lost, change buffer data that has already been persisted (purged) does not need to be recovered; we only need to consider the data that has not yet been persisted.

The situations fall into the following categories:

(1) The change buffer is written, the redo log has been fsynced but not committed, and the binlog has not been fsynced to disk: this part of the data is lost.

(2) The change buffer is written, the redo log is written but not committed, and the binlog has already been fsynced to disk: first recover the redo log from the binlog, then recover the change buffer from the redo log.

(3) The change buffer is written, and both the redo log and the binlog have been fsynced: the data can be recovered directly from the redo log.
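
Which of these scenarios applies depends on when the redo log and the binlog are fsynced, which is controlled by two parameters, shown here with their fully durable settings:

-- fsync the redo log at every transaction commit
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- fsync the binlog at every transaction commit
SET GLOBAL sync_binlog = 1;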

Summary

This concludes this article on choosing between MySQL normal indexes and unique indexes. In short: the two index types are almost identical in query performance, but for updates a normal index can take advantage of the change buffer while a unique index cannot. So, when the business code already guarantees uniqueness, prefer the normal index.
