A brief talk about Mysql index and redis jump table

A brief talk about Mysql index and redis jump table

summary

During the interview, when discussing about MySQL index issues, I found that some people could talk about the differences between B+ tree, B tree, and balanced binary tree, but could not tell the difference between B+ tree and hash index. It is obvious that this is rote memorization and does not understand the essence of indexing. This article aims to analyze the principles behind this. Welcome to leave a message to discuss

question

If you are confused or have only a vague understanding of the following questions, please continue reading. I believe this article will definitely help you.

  • How to implement mysql index
  • What is the difference between mysql index structure B+ tree and hash? What scenarios are they suitable for?
  • Are there other ways to implement database indexes?
  • How is redis jump table implemented
  • What are the differences between skip lists, B+ trees, and LSM trees?

Analysis

First of all, why should we discuss MySQL index and Redis skip table together? Because they solve the same problem, which is to quickly find the location (or corresponding value) of a data set according to the specified key.

When you think about the problem from this perspective, do you still not know the difference between B+ tree index and hash index?

The problem of finding a data set

Now we have clearly defined the boundaries of the problem domain in order to solve the problem of searching for data sets. What issues need to be considered in this regard?

  1. What search methods need to be supported, single key/multiple key/range search,
  2. Insertion/deletion efficiency
  3. Search efficiency (i.e. time complexity)
  4. Storage size (space complexity)

Let's look at several commonly used search structures

hash

Hash is in the form of key and value. Through a hash function, the value can be quickly found according to the key.

B+ Tree

The B+ tree evolved from the balanced binary tree. Why didn’t we learn about structures like B+ tree and skip list in the algorithm class? Because they are all obtained from engineering practice and are compromised on the basis of theory.

The B+ tree is first of all an ordered structure. In order to prevent the tree from being too high and affecting the search efficiency, a page of data is stored on the leaf node instead of a single data, which improves the search efficiency. In order to better support range queries, the B+ tree redundantly stores non-leaf node data on the leaf nodes. In order to support page turning, the leaf nodes are connected by pointers.

Jump table

The jump list is extended on the basis of the linked list in order to implement the sorted set data structure of redis. Level 0: It stores the original data. It is an ordered linked list. Each node is on the chain. Level 0+: The nodes are connected in series through pointers. It is a subset of the original data. The higher the level, the less data is connected in series. This can significantly improve the search efficiency.

Summarize

Data Structure Implementation principle Key query method Search efficiency Storage size Insertion and deletion efficiency
Hash Hash Table Support single key Close to O(1) Small, no additional storage except data O(1)
B+ Tree Extended from balanced binary tree Single key, range, paging O(Log(n) In addition to the data, there are also left and right pointers, as well as leaf node pointers O(Log(n), the tree structure needs to be adjusted, and the algorithm is more complicated
Jump table Extended from ordered linked list Single key, paging O(Log(n) In addition to data, there are pointers, but the number of pointers per node is less than 2, so it takes up less space than a B+ tree. O(Log(n), only needs to process linked lists, the algorithm is relatively simple

If you are interested in LSM structure, you can read cassandra vs mongo (1) Storage Engine

Cassandra vs Mongo (1) Storage Engine

Summary

Storage Engine:

B-Tree

Cache Management

The core of cache management lies in the replacement algorithm. Common replacement algorithms include FIFO (First In First Out) and LRU (Least Recently Used). Relational databases have been improved on the basis of LRU, mainly using LIRS (Low Inter-reference Recency Set)
The cache is divided into two levels. The first level uses LRU. The most recently used data will enter the first level. If the data is accessed twice or more in a short period of time, it becomes hot data and enters the second level. This avoids the situation where a large amount of hot data in the cache may be replaced when performing a full table scan.

LSM

Log-Structured Merge Tree: The core idea of ​​the structured merge tree is not to write data from memory to disk immediately, but to save it in memory first, and then flush it to disk after a certain amount of data has accumulated.

LSM VS B-Tree

LSM sacrifices some read performance to obtain better write performance based on B-Tree, and uses other implementations to compensate for the read performance, such as boom-filter.

1. Write

When writing to a B-tree, the first step is to find the corresponding block location and then insert the new data. As more and more data is written, nodes have to be split to maintain the B-tree structure. This will increase the probability of random writes when inserting data and reduce performance.

LSM forms a small sorted tree in memory, and then continuously merges it when flushing to disk. Because all writes are done in memory and not to disk, writes are very efficient.

2. Read

The B-tree starts with a binary query from the root node to the leaf node, reading one node at a time. If the corresponding page is not in the memory, the disk is read to cache the data.

The entire structure of the LSM tree is not ordered, so we don’t know where the data is. We need to do a binary query from each small ordered structure. If we find the data, we return it. If we don’t find the data, we continue to look for the next ordered structure. So LSM sacrifices read performance. However, the reason why LSM can be used as a large-scale data storage system is that the read performance can be improved in other ways. For example, the read performance depends more on the memory/cache hit rate rather than the disk read.

Cassandra

Cassandra is a NoSql database with better write performance than read performance. One reason for the good write performance is that it uses the LSM storage engine.

Mongo

MMAPv1

Mongo 3.2 and earlier used the MMAPv1 storage engine by default, which is based on the B-Tree type.

Padding

The MMAPv1 storage engine uses a process called "record allocation" to allocate disk space for document storage. What is different between MongoDB and Cassandra is that you need to update the original document. If the original document space is insufficient, you need to move the document to a new location and update the corresponding index. This will lead to some unnecessary updates and data fragmentation.

In order to avoid the above situation, there is the concept of boundaries, which is to pre-allocate space for the document. But this may result in a waste of resources. Mongo pre-allocates according to the power-of-2 increasing strategy of 64M, 128M, 256M...2G, with a maximum of 2G. The problem is not obvious when the data size is small, but when it reaches 2G, the problem of large disk usage becomes apparent.

This is also different from relational databases, which store long record data separately.

Lock

MMAPv1 uses collection-level locks, which means that only one collection can be added, deleted, or modified at a time. When performing concurrent operations, waiting will occur.

WiredTiger

The default storage engine for 3.2 and later is also based on B-Tree. It uses concurrent technologies such as lock-free and risk pointer to make it work better on multi-core machines.

The lock level is document. And compression was introduced to reduce disk usage.

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. Thank you for your support of 123WORDPRESS.COM.

You may also be interested in:
  • MySQL index for beginners
  • Solve MySQL deadlock routine by updating different indexes
  • Understanding MySQL deadlock routines through unique index S lock and X lock
  • Share some key interview questions about MySQL index
  • Index in MySQL
  • MySQL Learning (VII): Detailed Explanation of the Implementation Principle of Innodb Storage Engine Index
  • How to add index to mysql using shell script
  • Solutions to MySQL batch insert and unique index problems
  • Guide to Efficient Use of MySQL Indexes

<<:  Vue+element ui realizes anchor positioning

>>:  Summary of new usage of vi (vim) under Linux

Recommend

Where is the project location deployed by IntelliJ IDEA using Tomcat?

After IntelliJ IDEA deploys a Javaweb project usi...

Incredible CSS navigation bar underline following effect

The first cutter in China github.com/chokcoco Fir...

Implementation of Docker to build Zookeeper&Kafka cluster

I've been learning Kafka recently. When I was...

Tomcat maxPostSize setting implementation process analysis

1. Why set maxPostSize? The tomcat container has ...

SQL method for calculating timestamp difference

SQL method for calculating timestamp difference O...

JavaScript imitates Jingdong carousel effect

This article shares the specific code for JavaScr...

Summary of MySQL LOAD_FILE() function method

In MySQL, the LOAD_FILE() function reads a file a...

Small problem with the spacing between label and input in Google Browser

Code first, then text Copy code The code is as fol...

Native JS realizes the special effect of spreading love by mouse sliding

This article shares with you a js special effect ...

Reasons and solutions for MySQL sql_mode modification not taking effect

Table of contents Preface Scenario simulation Sum...

Document Object Model (DOM) in JavaScript

Table of contents 1. What is DOM 2. Select elemen...