A brief talk about Mysql index and redis jump table

summary

During the interview, when discussing about MySQL index issues, I found that some people could talk about the differences between B+ tree, B tree, and balanced binary tree, but could not tell the difference between B+ tree and hash index. It is obvious that this is rote memorization and does not understand the essence of indexing. This article aims to analyze the principles behind this. Welcome to leave a message to discuss

question

If you are confused or have only a vague understanding of the following questions, please continue reading. I believe this article will definitely help you.

How to implement mysql index
What is the difference between mysql index structure B+ tree and hash? What scenarios are they suitable for?
Are there other ways to implement database indexes?
How is redis jump table implemented
What are the differences between skip lists, B+ trees, and LSM trees?

Analysis

First of all, why should we discuss MySQL index and Redis skip table together? Because they solve the same problem, which is to quickly find the location (or corresponding value) of a data set according to the specified key.

When you think about the problem from this perspective, do you still not know the difference between B+ tree index and hash index?

The problem of finding a data set

Now we have clearly defined the boundaries of the problem domain in order to solve the problem of searching for data sets. What issues need to be considered in this regard?

What search methods need to be supported, single key/multiple key/range search,
Insertion/deletion efficiency
Search efficiency (i.e. time complexity)
Storage size (space complexity)

Let's look at several commonly used search structures

hash

Hash is in the form of key and value. Through a hash function, the value can be quickly found according to the key.

B+ Tree

The B+ tree evolved from the balanced binary tree. Why didn’t we learn about structures like B+ tree and skip list in the algorithm class? Because they are all obtained from engineering practice and are compromised on the basis of theory.

The B+ tree is first of all an ordered structure. In order to prevent the tree from being too high and affecting the search efficiency, a page of data is stored on the leaf node instead of a single data, which improves the search efficiency. In order to better support range queries, the B+ tree redundantly stores non-leaf node data on the leaf nodes. In order to support page turning, the leaf nodes are connected by pointers.

Jump table

The jump list is extended on the basis of the linked list in order to implement the sorted set data structure of redis. Level 0: It stores the original data. It is an ordered linked list. Each node is on the chain. Level 0+: The nodes are connected in series through pointers. It is a subset of the original data. The higher the level, the less data is connected in series. This can significantly improve the search efficiency.

Summarize

Data Structure	Implementation principle	Key query method	Search efficiency	Storage size	Insertion and deletion efficiency
Hash	Hash Table	Support single key	Close to O(1)	Small, no additional storage except data	O(1)
B+ Tree	Extended from balanced binary tree	Single key, range, paging	O(Log(n)	In addition to the data, there are also left and right pointers, as well as leaf node pointers	O(Log(n), the tree structure needs to be adjusted, and the algorithm is more complicated
Jump table	Extended from ordered linked list	Single key, paging	O(Log(n)	In addition to data, there are pointers, but the number of pointers per node is less than 2, so it takes up less space than a B+ tree.	O(Log(n), only needs to process linked lists, the algorithm is relatively simple

If you are interested in LSM structure, you can read cassandra vs mongo (1) Storage Engine

Cassandra vs Mongo (1) Storage Engine

Summary

Storage Engine:

B-Tree

Cache Management

The core of cache management lies in the replacement algorithm. Common replacement algorithms include FIFO (First In First Out) and LRU (Least Recently Used). Relational databases have been improved on the basis of LRU, mainly using LIRS (Low Inter-reference Recency Set)
The cache is divided into two levels. The first level uses LRU. The most recently used data will enter the first level. If the data is accessed twice or more in a short period of time, it becomes hot data and enters the second level. This avoids the situation where a large amount of hot data in the cache may be replaced when performing a full table scan.

LSM

Log-Structured Merge Tree: The core idea of the structured merge tree is not to write data from memory to disk immediately, but to save it in memory first, and then flush it to disk after a certain amount of data has accumulated.

LSM VS B-Tree

LSM sacrifices some read performance to obtain better write performance based on B-Tree, and uses other implementations to compensate for the read performance, such as boom-filter.

1. Write

When writing to a B-tree, the first step is to find the corresponding block location and then insert the new data. As more and more data is written, nodes have to be split to maintain the B-tree structure. This will increase the probability of random writes when inserting data and reduce performance.

LSM forms a small sorted tree in memory, and then continuously merges it when flushing to disk. Because all writes are done in memory and not to disk, writes are very efficient.

2. Read

The B-tree starts with a binary query from the root node to the leaf node, reading one node at a time. If the corresponding page is not in the memory, the disk is read to cache the data.

The entire structure of the LSM tree is not ordered, so we don’t know where the data is. We need to do a binary query from each small ordered structure. If we find the data, we return it. If we don’t find the data, we continue to look for the next ordered structure. So LSM sacrifices read performance. However, the reason why LSM can be used as a large-scale data storage system is that the read performance can be improved in other ways. For example, the read performance depends more on the memory/cache hit rate rather than the disk read.

Cassandra

Cassandra is a NoSql database with better write performance than read performance. One reason for the good write performance is that it uses the LSM storage engine.

Mongo

MMAPv1

Mongo 3.2 and earlier used the MMAPv1 storage engine by default, which is based on the B-Tree type.

Padding

The MMAPv1 storage engine uses a process called "record allocation" to allocate disk space for document storage. What is different between MongoDB and Cassandra is that you need to update the original document. If the original document space is insufficient, you need to move the document to a new location and update the corresponding index. This will lead to some unnecessary updates and data fragmentation.

In order to avoid the above situation, there is the concept of boundaries, which is to pre-allocate space for the document. But this may result in a waste of resources. Mongo pre-allocates according to the power-of-2 increasing strategy of 64M, 128M, 256M...2G, with a maximum of 2G. The problem is not obvious when the data size is small, but when it reaches 2G, the problem of large disk usage becomes apparent.

This is also different from relational databases, which store long record data separately.

Lock

MMAPv1 uses collection-level locks, which means that only one collection can be added, deleted, or modified at a time. When performing concurrent operations, waiting will occur.

WiredTiger

The default storage engine for 3.2 and later is also based on B-Tree. It uses concurrent technologies such as lock-free and risk pointer to make it work better on multi-core machines.

The lock level is document. And compression was introduced to reduce disk usage.

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. Thank you for your support of 123WORDPRESS.COM.

You may also be interested in:

MySQL index for beginners
Solve MySQL deadlock routine by updating different indexes
Understanding MySQL deadlock routines through unique index S lock and X lock
Share some key interview questions about MySQL index
Index in MySQL
MySQL Learning (VII): Detailed Explanation of the Implementation Principle of Innodb Storage Engine Index
How to add index to mysql using shell script
Solutions to MySQL batch insert and unique index problems
Guide to Efficient Use of MySQL Indexes

<<: Vue+element ui realizes anchor positioning

>>: Summary of new usage of vi (vim) under Linux

Detailed explanation of MySQL combined query

A brief talk about Mysql index and redis jump table

Detailed explanation of MySQL combined query

The difference between shtml and html

How to view the storage location of MySQL data files

How to Rename Multiple Files at Once in Linux

Summary of MySQL's commonly used concatenation statements

Vue song progress bar sample code

Implementation of breakpoint resume in Node.js

vue2.x configuration from vue.config.js to project optimization

Example code for circular hover effect using CSS Transitions

CSS3 timeline animation

Recommend

The difference between Div and table in HTML (discussed in detail in all aspects)

Detailed explanation of the process of querying user permissions using mysql statements

Is a design that complies with design specifications a good design?

Detailed explanation of upgrading Python and installing pip under Linux

MySQL database import and export data error solution example explanation

Integration practice of Vue+Element background management framework

Antd+vue realizes the idea of dynamic verification of circular attribute form

Practical record of handling MySQL automatic shutdown problems

In-depth explanation of MySQL stored procedures (in, out, inout)

Webservice remote debugging and timeout operation principle analysis

How to copy MySQL table

Html+CSS floating advertisement strip implementation

Example sharing of anchor tag usage in HTML

CSS to achieve zoom in and out close button (example code)

MySQL 8.0.15 winx64 decompression version installation and configuration method graphic tutorial