Summary of MySQL's commonly used database and table sharding solutions

1. Database bottleneck

Whether the bottleneck is IO or CPU, it eventually increases the number of active database connections until that number approaches or reaches the maximum the database can carry. From the business service's point of view, few or no database connections are available. You can imagine what happens next: concurrency and throughput drop, and the service may crash.

1. IO bottleneck

The first type: disk read IO bottleneck. There is too much hot data for the database cache to hold, so each query generates a large amount of IO and slows down -> database sharding and vertical table sharding.

The second type: network IO bottleneck. Too much data is requested and the network bandwidth is insufficient -> database sharding.

2. CPU bottleneck

The first type: SQL problems, such as SQL containing join, group by, order by, or conditions on non-indexed fields, all of which increase CPU work -> optimize the SQL, create appropriate indexes, and move business calculations to the business service layer.

The second type: a single table holds too much data, so queries scan too many rows, SQL runs inefficiently, and the CPU becomes the bottleneck first -> horizontal table sharding.

2. Database and table sharding

1. Horizontal database sharding

Concept: Taking a field as the basis, split the data in one database into multiple databases according to some strategy (hash, range, etc.).

Result:

  • Every database has the same structure;
  • Every database holds different data, with no overlap;
  • The union of all databases is the full data set.

Scenario: The system's absolute concurrency has risen to the point where table sharding alone cannot fundamentally solve the problem, and there is no clear business boundary along which to shard the databases vertically.

Analysis: With more databases, the IO and CPU load is spread across them, so the pressure on each one is relieved by a corresponding multiple.
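As a minimal sketch of the hash strategy, the function below maps a partition key to one of N databases. The function name, the use of CRC32 as the hash, and the database count are illustrative assumptions, not something a particular sharding tool prescribes.

```python
import zlib


def route_to_database(partition_key: str, num_databases: int) -> int:
    """Return the 0-based index of the database that owns this key.

    Hypothetical helper: CRC32 is used only because it is stable across
    processes, unlike Python's built-in hash() for strings.
    """
    if num_databases <= 0:
        raise ValueError("num_databases must be positive")
    return zlib.crc32(partition_key.encode("utf-8")) % num_databases
```

Because every database has the same structure, the application only needs this index to pick a connection; the SQL itself does not change.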

2. Horizontal table sharding

Concept: Taking a field as the basis, split the data in one table into multiple tables according to some strategy (hash, range, etc.).

Result:

  • Every table has the same structure;
  • Every table holds different data, with no overlap;
  • The union of all tables is the full data set.

Scenario: The system's absolute concurrency has not risen, but a single table holds so much data that SQL efficiency suffers and the CPU load grows until it becomes the bottleneck.

Analysis: Each table holds less data, so a single SQL statement runs more efficiently, which naturally lightens the CPU load.
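The range strategy can be sketched in the same spirit. The table base name, the boundaries, and the function name below are hypothetical; the point is only that a sorted list of upper bounds decides which table a row belongs to.

```python
from bisect import bisect_right

# Hypothetical boundaries: ids below 1,000,000 go to table 0,
# ids below 2,000,000 go to table 1, everything else goes to table 2.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000]


def table_for_order(order_id: int, base_name: str = "orders") -> str:
    """Return the name of the table that stores this order id (range strategy)."""
    index = bisect_right(RANGE_UPPER_BOUNDS, order_id)
    return f"{base_name}_{index}"
```

Range rules make it easy to add a new table for new ids, at the cost of concentrating recent writes on the last table; hash rules spread writes evenly but make expansion harder.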

3. Vertical database sharding

Concept: Taking the table as the unit, move different tables into different databases according to the business they belong to.

Result:

  • Each database has a different structure;
  • Each database holds different data, with no overlap;
  • The union of all databases is the full data set.

Scenario: The system's absolute concurrency has risen, and separate business modules can be abstracted out.

Analysis: At this point the module can essentially be turned into a service.

For example, as the business grows there are more and more shared configuration tables and dictionary tables. These tables can be moved into a separate database, or even exposed as a service. Likewise, once a set of business models has taken shape, the related tables can be moved into a separate database, or even exposed as a service.

4. Vertical table sharding

Concept: Taking the field as the unit, split the fields of one table into different tables (a main table and an extension table) according to how actively they are used.

Result:

  • Each table has a different structure;
  • Each table holds different data. In general the tables share at least one column, usually the primary key, which is used to associate their rows;
  • The union of all tables is the full data set.

Scenario: The system's absolute concurrency has not risen. The table does not have many rows, but it has many fields, and hot and non-hot data are stored together, so each row takes up a lot of space. As a result, fewer rows fit in the database cache, and queries generate a large amount of random read IO against the disk, causing an IO bottleneck.

Analysis: A list page and a detail page are a helpful way to picture this. The principle of vertical table sharding is to put the hot data (possibly redundant fields that are often queried together) into the main table and the non-hot data into the extension table. More hot data can then be cached, which reduces random read IO. After the split, getting all the data for a row requires combining data from both tables.

But remember: do not use join for this. A join not only adds CPU load but also couples the two tables together (they must live on the same database instance). Associate the data in the business service layer instead: fetch the main table rows and the extension table rows separately, then combine them on the shared key.
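A minimal sketch of associating the two result sets in the service layer, assuming each query has already returned its rows as a list of dicts and that both tables share an "id" column. The function name and the row shape are illustrative only.

```python
def merge_main_and_extension(main_rows, extension_rows, key="id"):
    """Combine main-table rows with extension-table rows on a shared key.

    Rows are plain dicts. A main row without a matching extension row is
    returned unchanged, which mirrors a LEFT JOIN.
    """
    extension_by_key = {row[key]: row for row in extension_rows}
    merged = []
    for row in main_rows:
        combined = dict(row)  # copy so the caller's rows are not mutated
        combined.update(extension_by_key.get(row[key], {}))
        merged.append(combined)
    return merged
```

Since the two queries are independent, the main table and the extension table are free to live on different database instances.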

3. Database and table sharding tools

  • ShardingSphere: jar, formerly Sharding-JDBC;
  • TDDL: jar, Taobao Distributed Data Layer;
  • Mycat: middleware.

Note: Evaluate the pros and cons of each tool yourself, giving priority to its official website and community.

4. Steps for sharding databases and tables

Estimate the number of databases or tables from capacity (current capacity and expected growth) -> choose the partition key (one that distributes data evenly) -> choose the sharding rule (hash, range, etc.) -> execute (generally with dual writes) -> plan for capacity expansion (minimize data movement).
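The first step, estimating how many shards are needed, is simple arithmetic. The sketch below assumes a target maximum row count per shard, which is a hypothetical tuning value you would pick from your own measurements, and rounds the answer up to a power of two so that later expansions can double the shard count.

```python
import math


def estimate_shard_count(current_rows, yearly_growth_rows, years, max_rows_per_shard):
    """Estimate the number of shards needed for the projected data volume.

    All parameters are illustrative inputs; max_rows_per_shard is an
    assumed per-shard budget, not a MySQL limit.
    """
    projected_rows = current_rows + yearly_growth_rows * years
    needed = max(1, math.ceil(projected_rows / max_rows_per_shard))
    # Round up to the next power of two (1, 2, 4, 8, ...).
    return 1 << (needed - 1).bit_length()
```

For instance, 20 million current rows growing by 10 million a year over 3 years gives 50 million projected rows; with a budget of 10 million rows per shard that is 5 shards, rounded up to 8.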

5. Issues with sharding

1. Non-partition key query problem

This assumes horizontal database and table sharding with the commonly used hash strategy.

Besides the partition key, the client queries by only one non-partition key

Mapping method

Gene method

Note: On write, the user_id is generated with the gene method, as shown in the figure. As for the x-bit gene: with 8 tables, 2^3 = 8, so x is 3 and the gene is 3 bits. A query by user_id can be routed directly to the corresponding database or table by taking the modulus.

A query by user_name first computes user_name_code with the user_name_code generation function, then takes the modulus to route to the corresponding database or table. The Snowflake algorithm is commonly used to generate ids.
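The gene idea can be sketched as follows, assuming 8 tables (a 3-bit gene). The user_name_code function here is a stand-in built on CRC32, and the `sequence` argument stands for whatever unique number an id generator such as Snowflake would supply; both are assumptions made for illustration.

```python
import zlib

GENE_BITS = 3                # 2**3 = 8 tables
TABLE_COUNT = 1 << GENE_BITS


def user_name_code(user_name: str) -> int:
    """Hypothetical user_name_code generation function (stable hash)."""
    return zlib.crc32(user_name.encode("utf-8"))


def make_user_id(sequence: int, user_name: str) -> int:
    """Build a user_id whose lowest GENE_BITS bits come from user_name."""
    gene = user_name_code(user_name) & (TABLE_COUNT - 1)
    return (sequence << GENE_BITS) | gene


def table_for_user_id(user_id: int) -> int:
    return user_id % TABLE_COUNT


def table_for_user_name(user_name: str) -> int:
    return user_name_code(user_name) % TABLE_COUNT
```

Because the low 3 bits of user_id are copied from user_name_code, taking either value modulo 8 yields the same table, so a query by user_name lands on the table that holds the row without a separate lookup.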

Besides the partition key, the client queries by more than one non-partition key

Mapping method

Redundancy method

Note: Queries by order_id or buyer_id are routed to the db_o_buyer database; queries by seller_id are routed to the db_o_seller database. This feels a bit like putting the cart before the horse. Is there a better way? What about changing the technology stack?
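The routing rule in the note can be written down directly. The database names come from the text; the function name and the idea of passing the query conditions as a dict are assumptions for the sake of a small example.

```python
def database_for_order_query(conditions: dict) -> str:
    """Pick the redundant copy of the order data that can serve this query.

    `conditions` maps column names to values, e.g. {"seller_id": 42}.
    """
    if "order_id" in conditions or "buyer_id" in conditions:
        return "db_o_buyer"
    if "seller_id" in conditions:
        return "db_o_seller"
    raise ValueError("query has no supported routing key")
```

The price of this method is that every order is stored twice, once per database, and the two copies have to be kept consistent on write.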

Besides the partition key, the back office also runs queries on various combinations of non-partition keys

NoSQL approach

Redundancy method


2. Non-partition key cross-database and cross-table paging query problem

This assumes horizontal database and table sharding with the commonly used hash strategy.

Note: Solve this with a NoSQL approach (Elasticsearch (ES), etc.).

3. Capacity expansion issues

This assumes horizontal database and table sharding with the commonly used hash strategy.

Horizontal database expansion (replica promotion method)

Note: Each expansion multiplies the number of databases, typically doubling it.
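Doubling matters because of a property of modulo routing, sketched below under the assumption that keys are integers routed by `key % shard_count`. When the shard count doubles, a key either stays on its old shard or moves to the shard numbered old index plus old count, so a replica of each old database that already holds a full copy of its data can be promoted to serve as that new shard without copying rows between databases. The function names are illustrative.

```python
def shard_index(key: int, shard_count: int) -> int:
    """Modulo routing for an integer partition key."""
    return key % shard_count


def destination_after_doubling(key: int, old_count: int) -> str:
    """Describe where a key lives after the shard count doubles."""
    old_index = shard_index(key, old_count)
    new_index = shard_index(key, old_count * 2)
    if new_index == old_index:
        return "stays"
    # The only other possibility is old_index + old_count, which is the
    # shard served by the promoted replica of the old database.
    assert new_index == old_index + old_count
    return "promoted replica"
```

After promotion, each database still holds rows that now belong to its sibling; those can be deleted in the background without blocking traffic.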

Horizontal table expansion (dual-write migration method)

Step 1: (synchronous dual write) Modify the application configuration and code to add dual writes, then deploy;

Step 2: (synchronous dual write) Copy the old data from the old database to the new database;

Step 3: (synchronous dual write) Verify the old data in the new database against the old database, treating the old database as the source of truth;

Step 4: (synchronous dual write) Modify the application configuration and code to remove the dual writes, then deploy;

Note: Dual writing is a general-purpose solution.
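The first three steps can be sketched with two in-memory dicts standing in for the old and new databases. The class and function names are hypothetical, and a real migration would also need to handle write failures, deletes, and rows that change while the copy is running.

```python
class DualWriteStore:
    """Step 1: every write goes to the old store and, while enabled, the new one."""

    def __init__(self, old_store: dict, new_store: dict, dual_write: bool = True):
        self.old_store = old_store
        self.new_store = new_store
        self.dual_write = dual_write

    def write(self, key, value):
        self.old_store[key] = value
        if self.dual_write:
            self.new_store[key] = value


def backfill(old_store: dict, new_store: dict) -> None:
    """Step 2: copy old rows, without overwriting rows already dual-written."""
    for key, value in old_store.items():
        new_store.setdefault(key, value)


def find_mismatches(old_store: dict, new_store: dict) -> list:
    """Step 3: list keys whose new copy is missing or differs from the old one."""
    return [
        key for key, value in old_store.items()
        if key not in new_store or new_store[key] != value
    ]
```

Once find_mismatches returns an empty list, step 4 removes the dual-write code path and redeploys.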

6. Summary of database and table sharding

Before sharding databases and tables, first find out where the bottleneck is; only then can you split sensibly (database or table? horizontally or vertically? into how many pieces?). Do not shard for the sake of sharding.

Choosing the key is very important. It must take into account both even distribution of data and queries on non-partition keys.

As long as the requirements are met, the sharding rules should be as simple as possible.
