Difference between MySQL btree index and hash index

Difference between MySQL btree index and hash index

In MySQL, most indexes (such as PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in BTREE, but when using the memory engine, you can choose BTREE index or HASH index. The two different types of indexes have different usage scopes.

  • B-tree indexes have the ability to perform range and prefix searches. For a B-tree with N nodes, the complexity of retrieving a record is O(LogN). Equivalent to binary search.
  • Hash indexes can only be used for equality searches, but no matter how large the hash table is, the search complexity is O(1).

Obviously, if the values ​​are very different and the main search is for equal values ​​(=, <, >, in), the hash index is a more efficient choice, with a search complexity of O(1).
If the difference between values ​​is relatively poor and range searches are the main focus, B-tree is a better choice because it supports range searches.

1. HASH index

The storage address is calculated by using the hash function, so when retrieving, there is no need to traverse from the root node and search level by level like Btree.

Due to the special structure of Hash index, its retrieval efficiency is very high. The index retrieval can be located once, unlike B-Tree index which needs multiple IO accesses from root node to branch node and finally to page node. Therefore, the query efficiency of Hash index is much higher than that of B-Tree index.

Many people may have questions again. Since the efficiency of Hash index is much higher than that of B-Tree, why don’t people use Hash index instead of B-Tree index? Everything has two sides, and the same is true for Hash indexes. Although Hash indexes are highly efficient, they also have many limitations and disadvantages due to their particularity. The main ones are as follows.

(1) Hash indexes can only satisfy "=", "IN" and "<=>" queries, and cannot be used for range queries (range queries are slow).

Since the hash index compares the hash values ​​after the hash operation, it can only be used for equal value filtering and cannot be used for range-based filtering, because the size relationship of the hash values ​​after being processed by the corresponding hash algorithm cannot be guaranteed to be exactly the same as before the hash operation.

(2) Hash indexes cannot be used to avoid data sorting operations.

Since the hash index stores the hash value after hash calculation, and the size relationship of the hash value is not necessarily exactly the same as the key value before the hash operation, the database cannot use the index data to avoid any sorting operation;

(3) Hash indexes cannot be queried using partial index keys.

For a composite index, the hash index calculates the hash value by merging the composite index keys together, rather than calculating the hash value separately. Therefore, when querying through the first one or several index keys of the composite index, the hash index cannot be used.

(4) Hash indexes cannot avoid table scans at any time.

As we know, a hash index is a table that stores the hash value of the hash operation result and the corresponding row pointer information after the index key is hashed. Since different index keys have the same hash value, even if the number of records that satisfy a certain hash key value is obtained, the query cannot be completed directly from the hash index. It is still necessary to perform a corresponding comparison by accessing the actual data in the table and obtain the corresponding result.

(5) The performance of a Hash index is not necessarily higher than that of a B-Tree index when a large number of hash values ​​are equal.

For index keys with low selectivity, if a Hash index is created, a large amount of record pointer information will be stored in the same Hash value and associated with it. This makes it very troublesome to locate a certain record, and will waste multiple table data accesses, resulting in poor overall performance.

2. B+ Tree

  • The search process of b+ tree

As shown in the figure, if you want to find data item 29, disk block 1 will be loaded from the disk to the memory first. At this time, an IO occurs. A binary search is used in the memory to determine that 29 is between 17 and 35. The P2 pointer of disk block 1 is locked. The memory time is very short (compared to the disk IO) and can be ignored. Disk block 3 is loaded from the disk to the memory through the disk address of the P2 pointer of disk block 1. The second IO occurs. 29 is between 26 and 30. The P2 pointer of disk block 3 is locked. Disk block 8 is loaded into the memory through the pointer. The third IO occurs. At the same time, a binary search is performed in the memory to find 29, and the query ends. A total of three IOs are performed. The reality is that a 3-layer b+ tree can represent millions of data. If searching millions of data only requires three IOs, the performance improvement will be huge. If there is no index, each data item will require an IO, then a total of millions of IOs will be required, which is obviously very costly.

  • B+ tree properties

1. The index field should be as small as possible:

Through the above analysis, we know that the number of IO times depends on the height h of b+number. Assuming that the data in the current data table is N, and the number of data items in each disk block is m, then h=㏒(m+1)N. When the data volume N is constant, the larger the m, the smaller the h; and m = disk block size/data item size. The disk block size is the size of a data page, which is fixed. If the space occupied by the data item is smaller and the number of data items is greater, the height of the tree is lower. This is why each data item, i.e. the index field, should be as small as possible. For example, int occupies 4 bytes, which is half of bigint 8 bytes. This is why the b+ tree requires that the real data be placed in the leaf nodes rather than the inner nodes. Once placed in the inner nodes, the data items of the disk blocks will drop significantly, causing the tree to increase in height. When the data item is equal to 1, it will degenerate into a linear list.

2. The leftmost matching feature of the index (i.e. matching from left to right):

When the data items of the b+ tree are composite data structures, such as (name, age, sex), the b+ tree builds the search tree in order from left to right. For example, when data such as (Zhang San, 20, F) is retrieved, the b+ tree will first compare the name to determine the next search direction. If the names are the same, then the age and sex are compared in turn to finally get the retrieved data. However, when data without a name such as (20, F) comes, the b+ tree does not know which node to check next, because name is the first comparison factor when building the search tree, and it is necessary to search based on the name first to know where to query next. For example, when retrieving data like (Zhang San, F), the b+ tree can use name to specify the search direction, but the next field age is missing, so it can only find the data with the name equal to Zhang San, and then match the data with the gender of F. This is a very important property, namely the leftmost matching feature of the index.

The above is the detailed content of the difference between MySQL btree index and hash index. For more information about MySQL btree index and hash index, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Differences between MySQL Hash index and B-Tree index
  • Why does MySQL database index choose to use B+ tree?
  • Summary of B-tree index knowledge points in MySQL optimization
  • A brief discussion on MySQL B-tree index and index optimization summary
  • How to get the height of MySQL innodb B+tree
  • Comparison between Btree and Hash index in MySQL
  • Brief Analysis of MySQL B-Tree Index

<<:  Detailed steps for deepin20 to install NVIDIA closed-source drivers

>>:  Vue custom encapsulated button component

Recommend

Teach you how to write maintainable JS code

Table of contents What is maintainable code? Code...

Linux gzip command compression file implementation principle and code examples

gzip is a command often used in Linux systems to ...

js canvas realizes rounded corners picture

This article shares the specific code of js canva...

Seven ways to implement array deduplication in JS

Table of contents 1. Using Set()+Array.from() 2. ...

How to bind Docker container to external IP and port

Docker allows network services to be provided by ...

A quick guide to Docker

Docker provides a way to automatically deploy sof...

Vue plugin error: Vue.js is detected on this page. Problem solved

Vue plugin reports an error: Vue.js is detected o...

A brief discussion on whether too many MySQL data queries will cause OOM

Table of contents Impact of full table scan on th...

Detailed explanation of TS object spread operator and rest operator

Table of contents Overview Object rest attribute ...

In-depth analysis of the Linux kernel macro container_of

1. As mentioned above I saw this macro when I was...

How to solve the error of PyCurl under Linux

Solution to "Could not run curl-config"...

Detailed deployment of docker+gitlab+gitlab-runner

environment Server: centos7 Client: window Deploy...

This article teaches you how to import CSS like JS modules

Table of contents Preface What are constructible ...

Example of horizontal arrangement of li tags in HTMl

Most navigation bars are arranged horizontally as...