Summary of MySQL database and table sharding

As a project grows, its database holds more and more data, and individual tables become very large. Queries slow down, and because of table locking, application operations can be severely affected, so the database becomes a performance bottleneck.

When this happens, we can consider sharding: splitting a single database or table into multiple databases and multiple tables. Requests are then directed to different databases and tables according to a fixed algorithm. The data is spread across many tables, which reduces the access pressure on any single table and improves database access performance.

The following is a summary of the database and table sharding used in the project:

Single database and single table

A single database with a single table is the most common design. For example, the database db contains one user table, and every user can be found in that table.

Single database with multiple tables

As the number of users grows, the user table holds more and more data. Beyond a certain size, queries on the user table gradually slow down, which affects the performance of the entire DB. With MySQL there is a more serious problem: when you need to add a column, MySQL may lock the table (especially in older versions without online DDL), and all read and write operations then have to wait.

The user table can be split horizontally by some rule into tables with exactly the same structure, such as user_0000 and user_0001. The data in user_0000 + user_0001 + ... together makes up exactly the complete data set.
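As a minimal sketch of such a horizontal split, assume rows are assigned by id modulo the number of tables (the text does not fix the rule at this point). The helper names shard_table_name and split_rows are hypothetical:

```python
from collections import defaultdict

def shard_table_name(base: str, index: int) -> str:
    """Build a shard table name such as user_0000 (four-digit suffix)."""
    return f"{base}_{index:04d}"

def split_rows(rows, table_count, base="user"):
    """Group rows by target table; assumes each row is a dict with an integer 'id'."""
    tables = defaultdict(list)
    for row in rows:
        tables[shard_table_name(base, row["id"] % table_count)].append(row)
    return dict(tables)

rows = [{"id": i, "name": f"user{i}"} for i in range(1, 7)]
tables = split_rows(rows, table_count=2)

# Every row lands in exactly one table, so the shards together form the full data set.
assert sum(len(v) for v in tables.values()) == len(rows)
```

All shard tables share one structure, so the same query text can be reused with only the table name changed.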

Multiple databases and multiple tables

As the amount of data increases, the storage space of a single DB may not be enough. As the number of queries increases, a single database server may no longer be able to keep up. At this point, the data can be split horizontally again, this time across multiple databases.

Rules for sharding

When designing the tables, you need to decide the rule by which data is divided among databases and tables. For example, when a new user registers, the program must determine which table the user's information goes into. Similarly, when a user logs in, we must find the corresponding record through the user's account, and both lookups have to follow the same rule.

Routing

Routing is the process of finding the target database and table by applying the sharding rule. For example, if the rule is user_id mod 4, then when a user registers a new account with id 123, 123 mod 4 = 3 tells us the account should be saved in the user_0003 table. When user 123 logs in, the same calculation, 123 mod 4, tells us the record is in user_0003.
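The routing rule can be sketched as a small function. The mod 4 rule comes from the text. Extending it to several databases, with tables numbered globally and spread evenly, is an assumption made here for illustration, as are the function name route and the db0/db1 naming:

```python
def route(user_id: int, db_count: int = 1, tables_per_db: int = 4):
    """Return (database, table) for a user_id.

    Assumption: tables are numbered globally and spread evenly across databases.
    With db_count=1 this reduces to the user_id mod 4 rule.
    """
    total_tables = db_count * tables_per_db
    slot = user_id % total_tables
    db_index = slot // tables_per_db
    return f"db{db_index}", f"user_{slot:04d}"

# Registration and login both call the same function, so they always agree.
database, table = route(123)
```

Because routing is a pure function of user_id, changing the number of tables changes where existing rows are expected to be, so the table count is best chosen with future growth in mind.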

Problems and precautions with database and table sharding

1. Choosing the sharding dimension

Suppose a user purchases a product, and the transaction record must be saved and later retrieved. If the tables are divided by the user dimension, all of one user's transaction records are saved in the same table, so finding a user's purchases is quick and convenient. However, the purchases of a given product are then likely to be spread over multiple tables, which makes them harder to find. Conversely, if the tables are divided by the product dimension, the purchases of a product are easy to find, but looking up one buyer's transaction records becomes troublesome.

The common solutions are:

a. Scan all the tables. This is basically impractical because it is far too inefficient.

b. Record two copies of the data, one sharded by the user dimension and the other by the product dimension.

c. Use a search engine. If the real-time requirement is very high, this in turn requires real-time search.
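Option b can be sketched as follows. In-memory dictionaries stand in for the two sets of shard tables, and the shard count of 4 and all names are assumptions made for illustration:

```python
from collections import defaultdict

SHARDS = 4  # assumed shard count for each dimension

by_user = defaultdict(list)     # stand-in for order tables sharded by user_id
by_product = defaultdict(list)  # stand-in for order tables sharded by product_id

def save_order(order):
    """Write the same record twice, once per sharding dimension."""
    by_user[order["user_id"] % SHARDS].append(order)
    by_product[order["product_id"] % SHARDS].append(order)

def orders_of_user(user_id):
    """Read one shard only: the one selected by user_id."""
    return [o for o in by_user[user_id % SHARDS] if o["user_id"] == user_id]

def orders_of_product(product_id):
    """Read one shard only: the one selected by product_id."""
    return [o for o in by_product[product_id % SHARDS] if o["product_id"] == product_id]

save_order({"order_id": 1, "user_id": 123, "product_id": 7})
save_order({"order_id": 2, "user_id": 456, "product_id": 7})
```

Both lookups touch a single shard, at the cost of double storage and of keeping the two copies consistent when a write fails halfway.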

2. Problems with join queries

Join queries are basically impossible because the related tables may not be in the same database.
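One common workaround, not described in the original text, is to perform the join in application code: query each database separately and merge the results. The sketch below uses two in-memory SQLite databases from the Python standard library as stand-ins for db0 and db1. The table layout is an assumption:

```python
import sqlite3

# Two independent in-memory databases stand in for db0 and db1.
db0 = sqlite3.connect(":memory:")
db1 = sqlite3.connect(":memory:")
db0.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
db1.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)")
db0.executemany("INSERT INTO user VALUES (?, ?)", [(1, "alice"), (2, "bob")])
db1.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, "book"), (11, 2, "pen"), (12, 1, "lamp")])

def orders_with_user_names():
    """Emulate a JOIN in application code: query each database, then merge."""
    orders = db1.execute("SELECT id, user_id, item FROM orders ORDER BY id").fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    placeholders = ",".join("?" * len(user_ids))
    names = dict(db0.execute(
        f"SELECT id, name FROM user WHERE id IN ({placeholders})", user_ids))
    return [(order_id, names[user_id], item) for order_id, user_id, item in orders]
```

This costs one extra round trip per database involved, which is why keeping related data together is usually preferable.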

3. Avoid cross-database transactions

Avoid modifying tables in db0 and tables in db1 within a single transaction. It makes the operation more complicated and hurts efficiency.

4. Try to keep related data on the same DB server

For example, if seller A's products and transaction information are all placed in db0, then when db1 fails, seller A's information can still be used normally. In other words, avoid making data in one database depend on data in another database.

One master, multiple slaves

In most practical applications, reads far outnumber writes. MySQL replication makes read-write separation possible: all writes must go to the Master, while reads can be served by either the Master or the Slaves. A Slave has exactly the same structure as the Master. A Master can have multiple Slaves, and a Slave can even have further Slaves attached to it. This approach can effectively raise the QPS of the DB cluster.

All writes are performed on the Master first and then synchronized to the Slaves, so there is always some delay between the Master and a Slave. When the system is very busy the delay gets worse, and adding more Slave machines makes it worse still.

In addition, the Master is the bottleneck of the cluster. When there are too many writes, the stability of the Master is seriously affected, and if the Master fails, the entire cluster cannot work properly.

Therefore:

1. When read pressure is very high, consider adding Slave machines. Once the number of Slaves reaches a certain level, however, you have to consider splitting the database.

2. When write pressure is very high, database sharding is necessary.
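A minimal sketch of read-write separation at the application level follows. The class name, the server addresses, and the keyword-based detection of writes are assumptions for illustration, and a real deployment would more likely rely on a proxy or driver feature:

```python
import itertools

class ReadWriteRouter:
    """Pick a server for a SQL statement: writes go to the master, reads rotate over slaves."""

    WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._readers = itertools.cycle(slaves or [master])

    def pick(self, sql: str) -> str:
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word in self.WRITE_KEYWORDS:
            return self.master
        return next(self._readers)

router = ReadWriteRouter("master:3306", ["slave1:3306", "slave2:3306"])
```

Because of the replication delay, a read that must see a write made just before it should be sent to the Master rather than through the rotation.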

Why should MySQL be divided into databases and tables?

It can be said that wherever MySQL is used, once the amount of data becomes large, you soon run into the same problem: splitting databases and tables.

This raises a question: why do we need to split databases and tables at all? Can't MySQL handle large tables?

In fact, it can. In projects I have worked on, a single table had a physical file larger than 80 GB and more than 500 million records, and it was a core table: the friend relationship table.

But this is not the best approach. File systems such as Ext3 have many problems of their own with very large files; at that level, the XFS file system can be used instead. But when a single MySQL table is too large, one problem is hard to solve: operations that change the table structure become basically impossible. Therefore, large projects inevitably end up applying database and table sharding.

InnoDB itself has only two locks on the B-tree of the data file, the leaf node lock and the child node lock. As you can imagine, when a page split occurs or a new leaf is added, data cannot be written to the table. Therefore, sharding is a better choice.

So how many databases and tables are appropriate?

Tests show that write and read performance is relatively good when a single table holds fewer than 10 million records. Leaving some headroom, a table whose columns are all numeric types can be kept below 8 million records, and a table that contains character-type columns can be kept below 5 million records.

If the plan is 100 databases with 100 tables each, for a user service for example:

5 million * 100 * 100 = 50,000,000,000 = 50 billion records.
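The capacity estimate can be checked with a few lines of arithmetic. The helper tables_needed is a hypothetical addition that runs the same calculation in reverse, from an expected row count to a table count:

```python
ROWS_PER_TABLE = 5_000_000   # conservative per-table limit from the text
DATABASES = 100
TABLES_PER_DATABASE = 100

capacity = ROWS_PER_TABLE * DATABASES * TABLES_PER_DATABASE
print(f"{capacity:,} records")  # 50,000,000,000 records, i.e. 50 billion

def tables_needed(expected_rows: int, rows_per_table: int = ROWS_PER_TABLE) -> int:
    """Smallest number of tables that keeps every table at or below the limit."""
    return (expected_rows + rows_per_table - 1) // rows_per_table
```

For example, a service expecting 50 million rows would need 10 tables at this limit.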

Once you have a rough idea in mind, it is relatively easy to make plans based on business.

A practical question

Suppose a website has tens of millions of users, but only 1% of them are active. How can the access speed for active users be improved by optimizing the database?

Answer:

You can use MySQL partitioning to place active users in one partition and inactive users in another. The active-user partition holds a relatively small amount of data, so access for active users becomes faster.

You can also split the table horizontally, putting active users in one table and inactive users in another, which likewise speeds up access for active users.
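The second approach can be sketched as follows. Dictionaries stand in for the two tables, and the rule for moving a user between them (on login, for instance) is an assumption, since the text does not say how activity is tracked:

```python
active_users = {}    # stand-in for a small table of active users
inactive_users = {}  # stand-in for a large table of inactive users

def find_user(user_id):
    """Look in the small active table first; fall back to the inactive table."""
    user = active_users.get(user_id)
    if user is None:
        user = inactive_users.get(user_id)
    return user

def mark_active(user_id):
    """Move a user into the active table, e.g. after a login."""
    user = inactive_users.pop(user_id, None)
    if user is not None:
        active_users[user_id] = user

inactive_users[123] = {"id": 123, "name": "alice"}
mark_active(123)
```

Most requests come from active users and are answered from the small table, so only the rare lookup of an inactive user pays for the large one.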
