1 Conceptual distinction
Values in an ordinary index may repeat; values in a unique index, like those in a primary key, may not. A unique index can therefore serve as a data-validation mechanism. For example, if we stipulate that the ID card number field of a student table must never contain duplicates, we enforce that with a unique index. (The student ID field is usually set as the primary key.)
A primary key guarantees that every row in a table is unique: values such as an ID card number or a student ID must not repeat within the table. A unique index enforces the same kind of uniqueness, with two differences: a table can have only one primary key, and the primary key cannot be NULL, whereas a table can have multiple unique indexes, and a column in a unique index may be NULL (in InnoDB, several NULLs can even coexist, because NULL is never considered equal to NULL). For example, a school will typically use the student ID as the primary key of its student table and put a unique index on the ID card number; an Education Bureau, by contrast, may make the ID card number the primary key and put a unique index on the student ID. The choice of primary key depends on the actual application; the hard rules are only that it must be unique and cannot be NULL.

2 Case introduction
In a resident-information system, every person has a unique ID card number. If the system needs to look up a name by ID card number, it executes SQL similar to: select name from CUser where id_card = 'ooxx'; You would certainly create an index on the id_card field. However, id_card is a large field, and using it as the primary key is not recommended. That leaves two options: create a unique index on id_card, or create an ordinary (non-unique) index on it. A sketch of both follows.
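For concreteness, here is a minimal sketch of the two alternatives. The column types and index names are assumptions for illustration, not taken from the article:

CREATE TABLE CUser (
  id BIGINT NOT NULL AUTO_INCREMENT,
  id_card VARCHAR(18),
  name VARCHAR(64),
  PRIMARY KEY (id),
  UNIQUE KEY uk_id_card (id_card)   -- option 1: unique index
  -- KEY idx_id_card (id_card)      -- option 2: ordinary index (use instead of the line above)
) ENGINE=InnoDB;

-- With option 1, a duplicate is rejected by the database itself:
-- INSERT INTO CUser (id_card, name) VALUES ('ooxx', 'a');  -- ok
-- INSERT INTO CUser (id_card, name) VALUES ('ooxx', 'b');  -- ERROR 1062: Duplicate entry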
Assuming the business code already guarantees that duplicate ID card numbers are never written, both options are logically correct. But from a performance perspective, should you choose the unique index or the ordinary index? Consider the following case on a table T with an index on field k, where the values of k are assumed not to repeat.
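The article does not show T's definition; the following sketch is an assumption consistent with the discussion below (an integer k, an index on k, and rows whose k values skip 4):

CREATE TABLE T (
  id INT PRIMARY KEY,
  k INT NOT NULL,
  KEY idx_k (k)   -- replace with UNIQUE KEY uk_k (k) for the unique-index variant
) ENGINE=InnoDB;

INSERT INTO T (id, k) VALUES (100,1),(200,2),(300,3),(500,5),(600,6);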
Next, we analyze the performance.

3 Query performance
select id from T where k=4
This query traverses the B+ tree from the root down to a leaf node, and within the data page the record can be considered to be located by binary search. From there the two index types diverge: with an ordinary index, after finding the first record satisfying k=4, InnoDB must read the next record and keep going until it meets the first record that does not satisfy k=4; with a unique index, uniqueness guarantees at most one match, so retrieval stops as soon as the first matching record is found.
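As a hedged illustration (not from the article), EXPLAIN makes this difference visible in the access type; an existing value is used here so that a row-lookup plan is shown:

EXPLAIN SELECT id FROM T WHERE k = 5;
-- With the ordinary index, MySQL typically reports type=ref, key=idx_k;
-- with a unique index on k, it reports type=const, since at most one row can match.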
Even so, the performance gap is tiny. InnoDB reads and writes data in units of data pages: when a record is read, it is not fetched from disk on its own; the whole page containing it is read into memory. So for an ordinary index, the one extra operation of "finding and checking the next record" amounts to one pointer dereference and one comparison. If the k=4 record happens to be the last record on its data page, fetching the next record does require reading the next page, which is slightly more involved; but for an integer field a data page can hold nearly a thousand keys, so this case is rare, and averaged out its cost is negligible relative to ordinary CPU overhead. For updates, however, MySQL's change buffer comes into play, and there the gap becomes real.

4 Update performance
Now let's insert a new record (4,400) into the table. What does InnoDB do? It depends on whether the target data page is already in memory.
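As a concrete statement, reading "(4,400)" as k=4, id=400 (an assumption, but one that matches the "position between 3 and 5" wording below and the sketch of T above):

INSERT INTO T (id, k) VALUES (400, 4);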
4.1 In memory
For a unique index: find the position between 3 and 5, check that there is no conflict, insert the value, and the statement finishes.
For an ordinary index: find the position between 3 and 5, insert the value, and the statement finishes. When the page is in memory, then, the only difference between the two index types is one uniqueness check, which costs a negligible amount of CPU time.
4.2 Not in memory
For a unique index: the data page must first be read into memory; only then can InnoDB verify that there is no conflict, insert the value, and finish the statement.
For an ordinary index: the update is simply recorded in the change buffer, and the statement finishes. Reading a data page from disk into memory involves random IO, one of the most expensive operations in a database; because the change buffer avoids that random read, update performance improves significantly.

5 Index selection in practice
So how should you choose between an ordinary index and a unique index? The two are equivalent for queries, so the main consideration is update performance, and the recommendation is to prefer ordinary indexes. One caveat: if every update is immediately followed by a query of the same record, the change buffer should be disabled, because the page will be read into memory right away anyway, triggering an immediate merge; the random IO is not saved, and maintaining the change buffer just adds cost. In other scenarios the change buffer improves update performance (a sketch of the relevant settings follows). The combination of ordinary indexes and the change buffer is especially effective for optimizing updates on tables with large amounts of data, and the mechanism pays off most on mechanical hard drives. So if you keep something like a "historical data" library on mechanical disks for cost reasons, pay attention to the indexes on those tables: use ordinary indexes where possible and give the change buffer room to grow, to sustain the write speed of the "historical data" tables.
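As a hedged sketch of the knobs involved (these are standard InnoDB system variables; the values shown are illustrative, not recommendations from the article):

-- Which operations the change buffer may absorb: none, inserts, deletes, changes, purges, or all.
SET GLOBAL innodb_change_buffering = 'all';     -- 'none' disables it, for the update-then-read workload
-- Maximum change buffer size, as a percentage of the buffer pool (default 25, maximum 50).
SET GLOBAL innodb_change_buffer_max_size = 50;  -- "increase the change buffer" for write-heavy archives
SHOW VARIABLES LIKE 'innodb_change_buffer%';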
6 Change buffer and redo log
WAL improves performance by minimizing random reads and writes, and the change buffer and the redo log are easy to confuse, so let's put them into one workflow to tell them apart.

6.1 Insertion process
insert into t(id,k) values(id1,k1),(id2,k2);
Suppose that, in the current state of the k index tree, after locating the insert positions, the data page containing k1 is in memory (in the InnoDB buffer pool) while the data page for k2 is not. This update does the following:
1. Since Page 1 is in memory, update it directly in memory.
2. Since Page 2 is not in memory, record "insert a row into Page 2" in the change buffer, which also lives in memory.
3. Record both of the above actions in the redo log on disk.
After that, the transaction completes. The cost of executing this update statement is very low: two writes to memory and one write to disk (the first two actions are written to the redo log together, in a single disk write), and that disk write is sequential.

6.2 How are subsequent read requests handled?
select * from t where k in (k1, k2);
Suppose this read arrives soon after the update, while the relevant data is still in memory. In that case, neither the system tablespace nor the redo log is involved in the reads.
Reading Page 1 returns directly from memory. One might ask: after WAL, must a read go to disk, or apply the redo log first, before it can return? Actually, no: even though the disk still holds the old data, the page in memory is already up to date, so returning it directly gives the correct result. Reading Page 2 is different: Page 2 must be read from disk into memory, and the operations logged in the change buffer must then be applied to produce the correct version of the page before the result is returned. Notice that Page 2 is not read into memory until this read actually needs it. With this workflow in mind, the two mechanisms can be compared directly: the redo log mainly saves the IO cost of random disk writes (it turns them into sequential writes), while the change buffer mainly saves the IO cost of random disk reads.
7 Conclusion
Because unique indexes cannot use the change buffer optimization, if the business can tolerate it, it is recommended from a performance perspective to prefer non-unique indexes.

7.1 Whether to use a unique index
The sticking point is that "the business may not be able to guarantee uniqueness." This article discusses performance under the premise that the business code already guarantees no duplicate data is written. If the business cannot guarantee that, or it requires the database to enforce the constraint, then you have no choice but to create a unique index. In that case the value of this article is to offer one more troubleshooting angle: if you encounter a situation where inserting a large amount of data is slow and the memory hit rate is low, this mechanism is worth checking.

7.2 If a write uses the change buffer and the host then restarts abnormally, is the change buffer data lost?
It is not lost. Although only memory is updated at write time, the change buffer operations are also recorded in the redo log when the transaction commits, so they can be recovered during crash recovery.
7.3 Does the merge process write the data directly back to disk?
The merge process runs as follows:
1. Read the data page from disk into memory (the old version of the page).
2. Find the change buffer records for this page (there may be several), and apply them in order to obtain the new version of the page.
3. Write the redo log; this redo log entry covers both the data change and the change buffer change.
At this point the merge is finished. Neither the data page nor the change buffer's corresponding location on disk has been modified yet; both memory versions are dirty pages, and flushing each of them back to its own physical storage is a separate process.
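As a hedged aside (not from the article), you can observe merges via InnoDB's status output; the change buffer is reported under its historical name, the insert buffer:

SHOW ENGINE INNODB STATUS\G
-- Look for the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section, e.g. lines such as
-- "merged operations: insert N, delete mark N, delete N", which count applied buffer records.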
Questions to think about
In constructing the first example, session A cooperates while session B deletes all the data and then re-inserts it; the explain result then shows the rows field jumping from 10001 to more than 37000. If session A is left out and delete from t, call idata(), and explain are executed on their own, the rows field stays around 10,000. What is the reason? (If it does not reproduce, check whether the transaction isolation level is REPEATABLE READ and whether the table uses the InnoDB engine.)

Why is the explain estimate wrong after this sequence of operations? The delete statement removed all the data, and call idata() then inserted 100,000 new rows, which appears simply to replace the original 100,000 rows. But session A had opened a transaction without committing it, so the 100,000 rows that were deleted cannot actually be purged. As a result, each of the old rows has two versions: the old version is the pre-delete data, and the new version is the data marked as deleted. Index a therefore effectively holds two copies of the data, which inflates the sampled statistics. You might object that the same should apply to the primary key: why, then, does explain still show about 100,000 scanned rows when there is no force index clause? (The implication being that if the primary key's estimate also doubled, the optimizer might still consider index a the more appropriate choice.) The answer is that the primary key's row count is not sampled from the index; it is estimated directly from the table's total row count, and for that number the optimizer uses the value reported by show table status.
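As a hedged footnote (a standard remedy, not spelled out in the text above): when index statistics drift like this, re-sampling them usually corrects the optimizer's estimate:

ANALYZE TABLE t;               -- re-samples index cardinality statistics
SHOW TABLE STATUS LIKE 't'\G   -- the Rows value here is the whole-table estimate the optimizer uses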