Summary

In interviews, when discussing MySQL index issues, I found that some candidates could recite the differences between a B+ tree, a B-tree, and a balanced binary tree, yet could not explain how a B+ tree index differs from a hash index. That is rote memorization without an understanding of the essence of indexing. This article aims to analyze the principles behind these structures; feel free to leave a comment to discuss.

Questions

If you are confused by, or have only a vague understanding of, the following questions, please read on. I believe this article will help you.
Analysis

First of all, why discuss the MySQL index and the Redis skip list together? Because they solve the same problem: quickly finding the location (or the corresponding value) of an item in a data set, given a key. When you look at the problem from this perspective, the difference between a B+ tree index and a hash index becomes much clearer.

The problem of searching a data set

Now we have drawn the boundary of the problem domain: searching a data set. What issues need to be considered to solve it?
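To make "how the keys are organized decides which queries are fast" concrete, here is a small illustrative Python comparison (the keys and values are made up for the example): a hash (a `dict`) answers exact-key lookups in one step but keeps no key order, while a sorted array, standing in here for any ordered structure such as a B+ tree, supports range queries via binary search.

```python
import bisect

data = {"k03": "v3", "k01": "v1", "k07": "v7", "k05": "v5"}

# Hash index: one hash computation gives the value for an exact key...
assert data["k05"] == "v5"

# ...but the keys have no order, so a range query must scan every entry.
scan = [v for k, v in data.items() if "k02" <= k <= "k06"]

# Ordered structure (a stand-in for a B+ tree): binary search finds the
# start of the range, then we walk forward to the end of the range.
keys = sorted(data)                      # ["k01", "k03", "k05", "k07"]
lo = bisect.bisect_left(keys, "k02")
hi = bisect.bisect_right(keys, "k06")
ranged = [data[k] for k in keys[lo:hi]]  # values for keys in [k02, k06]
```

The full scan and the binary-search range query return the same values, but the ordered version touches only the keys inside the range, which is exactly what a B+ tree index gives you and a hash index does not.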
Let's look at several commonly used search structures.

Hash

A hash stores data as key-value pairs. Through a hash function, the value can be found quickly given the key; but because the keys are stored in no particular order, a hash cannot support range queries.

B+ Tree

The B+ tree evolved from the balanced binary tree. Why didn't we learn about structures like the B+ tree and the skip list in algorithms class? Because they came out of engineering practice: they are compromises made on top of the theory. The B+ tree is, first of all, an ordered structure. To keep the tree from growing too tall and hurting search efficiency, each leaf node stores a page of data rather than a single record. To better support range queries, the B+ tree redundantly stores the keys of the non-leaf nodes in the leaf nodes. To support reading the next page, the leaf nodes are linked to each other by pointers.

Skip list

The skip list extends the ordinary linked list; Redis uses it to implement its sorted set data structure.

Level 0: stores the original data as an ordered linked list; every node is on this chain.
Level 1 and above: a subset of the nodes, connected in series by pointers. The higher the level, the fewer nodes it links, which significantly improves search efficiency.

Summary
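The level structure described above can be sketched in a few lines of Python (a minimal toy, not the Redis implementation; the class and constants are my own): each inserted node is given a random number of levels, and a search walks from the top level down, moving right while the next key is still smaller.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one "next" pointer per level

class SkipList:
    MAX_LEVEL = 4

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        # Each higher level holds roughly half the nodes of the one below.
        lvl = 0
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        # Walk top-down, remembering the rightmost node before `key` per level.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def search(self, key):
        node = self.head
        # Move right on the highest level possible, then drop a level.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key
```

Level 0 is the complete ordered linked list; the upper levels are the "express lanes" that let a search skip most of the chain.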
If you are interested in the LSM structure, you can read the following summary of Cassandra vs Mongo (1): Storage Engine.

Cassandra vs Mongo (1): Storage Engine

B-Tree

Cache Management

The core of cache management lies in the replacement algorithm. Common replacement algorithms include FIFO (First In, First Out) and LRU (Least Recently Used). Relational databases improve on LRU, mainly using LIRS (Low Inter-reference Recency Set).

LSM

The core idea of the LSM tree (Log-Structured Merge tree) is not to write data from memory to disk immediately, but to keep it in memory first and flush it to disk only after a certain amount has accumulated.

LSM vs B-Tree

Starting from the B-tree design, the LSM tree sacrifices some read performance to obtain better write performance, and uses other mechanisms, such as a Bloom filter, to compensate for reads.

1. Writes

When writing to a B-tree, the first step is to find the corresponding block, and then insert the new data. As more and more data is written, nodes have to be split to maintain the B-tree structure, which increases the probability of random writes on insert and reduces performance. An LSM tree builds a small sorted tree in memory and keeps merging while flushing to disk. Because writes go only to memory, not to disk, they are very efficient.

2. Reads

A B-tree read performs a binary search from the root node down to a leaf, reading one node at a time; if the corresponding page is not in memory, it is read from disk into the cache. The LSM tree as a whole is not one ordered structure, so we do not know where the data is: we binary-search each small sorted structure in turn, returning the data if it is found and moving on to the next sorted structure otherwise. So the LSM tree sacrifices read performance.
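The write and read paths described above can be sketched in a few lines of Python (a toy model under my own names and simplifications, not how any real engine lays out data): puts go into an in-memory memtable, a full memtable is flushed as an immutable sorted run, and gets check the memtable first and then each run from newest to oldest.

```python
import bisect

class TinyLSM:
    """Toy LSM tree: writes buffer in a memtable; flushes produce
    immutable sorted runs that reads probe from newest to oldest."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []          # sorted (key, value) lists, newest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value       # write touches memory only: fast
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        run = sorted(self.memtable.items())  # small sorted tree -> sorted run
        self.runs.insert(0, run)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:                # binary-search each sorted run
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None                          # not found in any structure
```

Checking newest runs first also gives update semantics for free: the most recently flushed value for a key wins, which is the same reason real LSM engines compact runs in the background.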
However, the reason the LSM tree can still serve large-scale data storage systems is that its read performance can be improved in other ways; for example, reads depend more on the memory/cache hit rate than on disk access.

Cassandra

Cassandra is a NoSQL database whose write performance is better than its read performance. One reason for the good write performance is that it uses an LSM storage engine.

Mongo

MMAPv1

MongoDB 3.2 and earlier used the MMAPv1 storage engine by default, which is of the B-tree family.

Padding

The MMAPv1 storage engine uses a process called "record allocation" to allocate disk space for document storage. What makes MongoDB different from Cassandra is that it updates the original document in place: if the original document's space is insufficient, the document has to be moved to a new location and the corresponding indexes updated, which leads to unnecessary updates and data fragmentation. To avoid this, MMAPv1 introduces the concept of padding boundaries, pre-allocating space for each document, though this may waste resources. Mongo pre-allocates data files in a power-of-2 growth sequence of 64 MB, 128 MB, 256 MB ... 2 GB, with a maximum of 2 GB. The problem is not obvious when the data volume is small, but once files reach 2 GB, the heavy disk usage becomes apparent. This also differs from relational databases, which store long record data separately.

Lock

MMAPv1 uses collection-level locks, which means only one write (insert, delete, or update) can run on a collection at a time; concurrent operations have to wait.

WiredTiger

The default storage engine since MongoDB 3.2, also of the B-tree family. It uses concurrency techniques such as lock-free structures and hazard pointers to work better on multi-core machines. Its lock granularity is the document. It also introduced compression to reduce disk usage.

Summary

The above is the full content of this article.
I hope the content of this article is a useful reference for your study or work. Thank you for your support of 123WORDPRESS.COM.