Summary
When discussing MySQL index issues in interviews, I found that some candidates could explain the differences between a B+ tree, a B tree, and a balanced binary tree, but could not tell a B+ tree index apart from a hash index. That is clearly rote memorization without understanding the essence of indexing. This article analyzes the principles behind these structures. You are welcome to leave a message to discuss.

Question
If you are confused by the following questions, or have only a vague understanding of them, please read on. I believe this article will help you.
Analysis
First of all, why discuss the MySQL index and the Redis skip list together? Because they solve the same problem: quickly finding the location (or the corresponding value) of an item in a data set, given a key. When you look at the problem from this angle, the difference between a B+ tree index and a hash index becomes much clearer.

The problem of searching a data set
Now that we have drawn the boundaries of the problem domain, namely searching a data set, what issues need to be considered?
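To make the "find a value by key" framing concrete, here is a small sketch contrasting a hash lookup with an ordered-structure range query. The dict and sorted list are stand-ins for a hash index and a B+ tree index respectively; the data is made up for illustration.

```python
import bisect

# Hash-index analogue: a dict gives fast point lookups by key,
# but its keys carry no order, so a range scan must touch every entry.
hash_index = {"carol": 3, "alice": 1, "bob": 2}
print(hash_index["bob"])  # point lookup: 2

# B+-tree analogue: keys kept sorted, so a range query is a binary
# search for the lower bound plus a sequential scan.
ordered_keys = sorted(hash_index)               # ["alice", "bob", "carol"]
lo = bisect.bisect_left(ordered_keys, "b")      # first key >= "b"
hi = bisect.bisect_right(ordered_keys, "c")     # first key > "c"
print(ordered_keys[lo:hi])                      # keys in ["b", "c"]: ['bob']
```

This is exactly why a hash index cannot serve range queries efficiently while a B+ tree can: the order of the keys is part of the structure.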
Let's look at several commonly used search structures.

Hash
A hash stores data as key-value pairs. Through a hash function, the value for a given key can be found quickly.

B+ tree
The B+ tree evolved from the balanced binary tree. Why didn't we learn about structures like the B+ tree and the skip list in algorithm class? Because they came out of engineering practice: they are compromises built on top of the theory. The B+ tree is, first of all, an ordered structure. To keep the tree from growing too tall and hurting search efficiency, each leaf node stores a page of data rather than a single record. To better support range queries, the B+ tree redundantly stores the keys of non-leaf nodes in the leaf nodes. To support scanning forward ("page turning"), the leaf nodes are linked together by pointers.

Skip list
The skip list extends the linked list; Redis uses it to implement the sorted set data structure.
Level 0: stores the original data as an ordered linked list; every node is on this chain.
Level 1 and above: nodes linked together by pointers, forming a subset of the original data. The higher the level, the fewer nodes it links, which significantly improves search efficiency.

Summary
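The level structure described above can be sketched in a few lines. This is a minimal illustration with hand-wired levels; a real implementation such as Redis's zskiplist assigns each node's height probabilistically on insert.

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i]: next node on level i

def skiplist_search(head, target):
    """Descend from the top level: move right while the next key is
    still below the target, then drop one level; finish on level 0."""
    node = head
    for lvl in range(len(node.forward) - 1, -1, -1):
        while node.forward[lvl] is not None and node.forward[lvl].key < target:
            node = node.forward[lvl]
    node = node.forward[0]
    return node if node is not None and node.key == target else None

# Level 2:  head -> 1 -----------> 12
# Level 1:  head -> 1 ----> 7
# Level 0:  head -> 1 -> 3 -> 7 -> 12
head = Node(None, 3)
n1, n3, n7, n12 = Node(1, 3), Node(3, 1), Node(7, 2), Node(12, 1)
head.forward = [n1, n1, n1]
n1.forward = [n3, n7, n12]
n3.forward = [n7]
n7.forward = [n12, None]
n12.forward = [None]

print(skiplist_search(head, 7).key)  # found: 7
print(skiplist_search(head, 5))      # not present: None
```

The higher levels act as an express lane: the search for 7 skips directly from 1 to 7 on level 1 instead of visiting 3 on level 0, which is where the efficiency gain comes from.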
If you are interested in the LSM structure, you can read Cassandra vs Mongo (1): Storage Engine.

Storage Engine: B-Tree

Cache Management
The core of cache management lies in the replacement algorithm. Common replacement algorithms include FIFO (First In, First Out) and LRU (Least Recently Used). Relational databases improve on LRU, mainly using LIRS (Low Inter-reference Recency Set).

LSM
Log-Structured Merge tree: the core idea is not to write data from memory to disk immediately, but to buffer it in memory first and flush it to disk after a certain amount has accumulated.

LSM vs B-Tree
Compared with a B-Tree, the LSM tree sacrifices some read performance to obtain better write performance, and compensates for reads with other mechanisms, such as a Bloom filter.

1. Write
Writing to a B-tree first requires finding the corresponding block and then inserting the new data. As more and more data is written, nodes must be split to maintain the B-tree structure, which increases the probability of random writes on insert and reduces performance. The LSM tree builds a small sorted tree in memory and continuously merges runs when flushing to disk. Because writes go to memory rather than disk, they are very efficient.

2. Read
A B-tree read searches from the root node down to a leaf, reading one node at a time; if the corresponding page is not in memory, it is read from disk into the cache. The LSM tree as a whole is not ordered, so we don't know where a given piece of data lives: we binary-search each small sorted structure in turn, returning the data if it is found and moving on to the next structure otherwise. So the LSM tree sacrifices read performance.
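The write and read paths described above can be sketched in a toy form. `TinyLSM` and its parameters are illustrative names, not any real engine's API: writes land in an in-memory memtable, a full memtable is flushed as an immutable sorted run, and reads check the memtable first and then each run from newest to oldest.

```python
import bisect

class TinyLSM:
    """A toy LSM store: memtable in memory, immutable sorted runs on 'disk'."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}      # mutable in-memory buffer
        self.runs = []          # flushed sorted runs, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value          # write hits memory only
        if len(self.memtable) >= self.limit:
            # flush: sort once and append as an immutable run
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # newest data wins
            return self.memtable[key]
        for run in reversed(self.runs):     # then runs, newest first
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)   # binary search per run
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None                         # not found in any structure

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # triggers a flush
db.put("a", 9)      # newer value stays in the memtable
print(db.get("a"))  # 9 (memtable shadows the flushed run)
print(db.get("b"))  # 2 (found by binary search in a run)
```

Note how a read may have to probe several structures while a write never touches the runs: that is the read-for-write trade the section describes, and why real engines add Bloom filters to skip runs that cannot contain the key.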
However, the reason the LSM tree can still serve as a large-scale data storage system is that read performance can be improved in other ways; for example, reads come to depend more on the memory/cache hit rate than on disk access.

Cassandra
Cassandra is a NoSQL database whose write performance is better than its read performance. One reason for the good write performance is that it uses an LSM storage engine.

Mongo

MMAPv1
MongoDB 3.2 and earlier used the MMAPv1 storage engine by default, which is B-Tree based.

Padding
The MMAPv1 storage engine uses a process called "record allocation" to allocate disk space for document storage. Unlike Cassandra, MongoDB updates the original document in place; if the original document's space is insufficient, the document must be moved to a new location and the corresponding indexes updated, which leads to unnecessary updates and data fragmentation. To avoid this, MMAPv1 pre-allocates (pads) space for each document. But this can waste resources. Mongo pre-allocates following a power-of-2 growth strategy of 64M, 128M, 256M ... 2G, with a maximum of 2G. The problem is not obvious when the data is small, but at 2G the waste of disk space becomes apparent. This also differs from relational databases, which store long record data separately.

Lock
MMAPv1 uses collection-level locks, which means only one insert, delete, or update can run on a collection at a time; concurrent operations must wait.

WiredTiger
The default storage engine since 3.2 is also B-Tree based. It uses concurrency techniques such as lock-free structures and hazard pointers to work better on multi-core machines, and its lock level is the document. Compression was also introduced to reduce disk usage.

Summary
The above is the full content of this article.
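The power-of-2 pre-allocation described in the Padding section amounts to rounding each requested record size up to the next power of two. A minimal sketch of that arithmetic (a toy illustration, not MMAPv1's actual allocator):

```python
def power_of_two_alloc(size):
    """Round a requested record size up to the next power of two,
    mirroring the pre-allocation strategy described above."""
    alloc = 1
    while alloc < size:
        alloc *= 2
    return alloc

print(power_of_two_alloc(100))   # 128
print(power_of_two_alloc(1000))  # 1024
```

The rounding is what leaves the document room to grow in place, and also what causes the wasted space: in the worst case nearly half of each allocated slot is padding.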
I hope the content of this article offers some reference value for your study or work. Thank you for your support of 123WORDPRESS.COM.