In MySQL, most indexes (such as PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in BTREE, but when using the memory engine, you can choose BTREE index or HASH index. The two different types of indexes have different usage scopes.
Obviously, if the values are very different and the main search is for equal values (=, <, >, in), the hash index is a more efficient choice, with a search complexity of O(1). 1. HASH index The storage address is calculated by using the hash function, so when retrieving, there is no need to traverse from the root node and search level by level like Btree. Due to the special structure of Hash index, its retrieval efficiency is very high. The index retrieval can be located once, unlike B-Tree index which needs multiple IO accesses from root node to branch node and finally to page node. Therefore, the query efficiency of Hash index is much higher than that of B-Tree index. Many people may have questions again. Since the efficiency of Hash index is much higher than that of B-Tree, why don’t people use Hash index instead of B-Tree index? Everything has two sides, and the same is true for Hash indexes. Although Hash indexes are highly efficient, they also have many limitations and disadvantages due to their particularity. The main ones are as follows. (1) Hash indexes can only satisfy "=", "IN" and "<=>" queries, and cannot be used for range queries (range queries are slow). Since the hash index compares the hash values after the hash operation, it can only be used for equal value filtering and cannot be used for range-based filtering, because the size relationship of the hash values after being processed by the corresponding hash algorithm cannot be guaranteed to be exactly the same as before the hash operation. (2) Hash indexes cannot be used to avoid data sorting operations. Since the hash index stores the hash value after hash calculation, and the size relationship of the hash value is not necessarily exactly the same as the key value before the hash operation, the database cannot use the index data to avoid any sorting operation; (3) Hash indexes cannot be queried using partial index keys. For a composite index, the hash index calculates the hash value by merging the composite index keys together, rather than calculating the hash value separately. Therefore, when querying through the first one or several index keys of the composite index, the hash index cannot be used. (4) Hash indexes cannot avoid table scans at any time. As we know, a hash index is a table that stores the hash value of the hash operation result and the corresponding row pointer information after the index key is hashed. Since different index keys have the same hash value, even if the number of records that satisfy a certain hash key value is obtained, the query cannot be completed directly from the hash index. It is still necessary to perform a corresponding comparison by accessing the actual data in the table and obtain the corresponding result. (5) The performance of a Hash index is not necessarily higher than that of a B-Tree index when a large number of hash values are equal. For index keys with low selectivity, if a Hash index is created, a large amount of record pointer information will be stored in the same Hash value and associated with it. This makes it very troublesome to locate a certain record, and will waste multiple table data accesses, resulting in poor overall performance. 2. B+ Tree
As shown in the figure, if you want to find data item 29, disk block 1 will be loaded from the disk to the memory first. At this time, an IO occurs. A binary search is used in the memory to determine that 29 is between 17 and 35. The P2 pointer of disk block 1 is locked. The memory time is very short (compared to the disk IO) and can be ignored. Disk block 3 is loaded from the disk to the memory through the disk address of the P2 pointer of disk block 1. The second IO occurs. 29 is between 26 and 30. The P2 pointer of disk block 3 is locked. Disk block 8 is loaded into the memory through the pointer. The third IO occurs. At the same time, a binary search is performed in the memory to find 29, and the query ends. A total of three IOs are performed. The reality is that a 3-layer b+ tree can represent millions of data. If searching millions of data only requires three IOs, the performance improvement will be huge. If there is no index, each data item will require an IO, then a total of millions of IOs will be required, which is obviously very costly.
1. The index field should be as small as possible: Through the above analysis, we know that the number of IO times depends on the height h of b+number. Assuming that the data in the current data table is N, and the number of data items in each disk block is m, then h=㏒(m+1)N. When the data volume N is constant, the larger the m, the smaller the h; and m = disk block size/data item size. The disk block size is the size of a data page, which is fixed. If the space occupied by the data item is smaller and the number of data items is greater, the height of the tree is lower. This is why each data item, i.e. the index field, should be as small as possible. For example, int occupies 4 bytes, which is half of bigint 8 bytes. This is why the b+ tree requires that the real data be placed in the leaf nodes rather than the inner nodes. Once placed in the inner nodes, the data items of the disk blocks will drop significantly, causing the tree to increase in height. When the data item is equal to 1, it will degenerate into a linear list. 2. The leftmost matching feature of the index (i.e. matching from left to right): When the data items of the b+ tree are composite data structures, such as (name, age, sex), the b+ tree builds the search tree in order from left to right. For example, when data such as (Zhang San, 20, F) is retrieved, the b+ tree will first compare the name to determine the next search direction. If the names are the same, then the age and sex are compared in turn to finally get the retrieved data. However, when data without a name such as (20, F) comes, the b+ tree does not know which node to check next, because name is the first comparison factor when building the search tree, and it is necessary to search based on the name first to know where to query next. For example, when retrieving data like (Zhang San, F), the b+ tree can use name to specify the search direction, but the next field age is missing, so it can only find the data with the name equal to Zhang San, and then match the data with the gender of F. This is a very important property, namely the leftmost matching feature of the index. The above is the detailed content of the difference between MySQL btree index and hash index. For more information about MySQL btree index and hash index, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: Detailed steps for deepin20 to install NVIDIA closed-source drivers
>>: Vue custom encapsulated button component
Table of contents What is maintainable code? Code...
gzip is a command often used in Linux systems to ...
This article shares the specific code of js canva...
Table of contents 1. Using Set()+Array.from() 2. ...
Docker allows network services to be provided by ...
Docker provides a way to automatically deploy sof...
Vue plugin reports an error: Vue.js is detected o...
Table of contents Impact of full table scan on th...
Table of contents Overview Object rest attribute ...
1. As mentioned above I saw this macro when I was...
Solution to "Could not run curl-config"...
1. There are many Python version management tools...
environment Server: centos7 Client: window Deploy...
Table of contents Preface What are constructible ...
Most navigation bars are arranged horizontally as...