First, let's talk about why we need to split tables at all. When a table grows to millions of rows, every query takes longer, and a join query may stall altogether. In daily development we often meet such large tables, meaning tables that hold millions or even tens of millions of records. At that size, queries and inserts take too long, and performance is even worse when joins are involved. The purpose of table sharding and table partitioning is to reduce the load on the database and shorten the time needed for insert, delete, update, and select operations.

The amount of data in a database is not necessarily controllable. Without sharding, the number of tables and the amount of data in each table keep growing as time passes and the business develops, and so does the cost of every insert, delete, update, and query. In addition, a database that cannot be deployed in a distributed way is limited by the resources of a single server (CPU, disk, memory, I/O), so the amount of data it can hold and its processing capacity will eventually hit a bottleneck.

MySQL executes a SQL statement as follows:

1. Receive the SQL.
2. Put the SQL in the execution queue.
3. Execute the SQL.
4. Return the execution result.

Where is most of the time spent? First, waiting in the queue; second, executing the SQL. These two are really the same thing, because while one statement is waiting, another is being executed. So what we need to shorten is the SQL execution time.

MySQL has a locking mechanism that comes in two forms, table locking and row locking. Why does this mechanism exist?
It exists to protect the integrity of the data. For example, if two SQL statements want to modify the same data in the same table, can both modify it at the same time? Obviously not. MySQL handles this situation in one of two ways: table locking (the MyISAM storage engine) or row locking (the InnoDB storage engine). With table locking, nobody else can operate on the table until the current operation on it has finished. Row locking works the same way at the row level: other SQL statements must wait until the current operation on that row has finished before they can touch it. If a table holds too much data, each execution takes too long and the wait grows with it, which is why we need to split tables.

2. Sharding approaches

1. Build a MySQL cluster

Some people may ask what a MySQL cluster has to do with splitting tables. A cluster is not table splitting in the strict sense, but it achieves a similar effect. What is the point of a cluster? It reduces the load on each database, or put simply, it reduces the number of SQL statements in each queue. For example, if 10 SQL requests are all placed in the queue of one database server, they have to wait a long time. If the same 10 requests are distributed across the queues of 5 database servers, each queue holds only 2, and the waiting time is greatly shortened. That is why I include clustering among the sharding approaches.
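The queue arithmetic above can be sketched in a few lines. This is only an illustration: the `distribute` helper is hypothetical, and round-robin assignment is an assumption, since the text does not say how requests are spread across servers.

```python
from collections import deque
from itertools import cycle
from typing import Deque, List


def distribute(requests: List[str], num_servers: int) -> List[Deque[str]]:
    """Spread requests over per-server queues in round-robin order (assumed policy)."""
    queues: List[Deque[str]] = [deque() for _ in range(num_servers)]
    # zip stops when the requests run out; cycle keeps rotating over the queues.
    for request, queue in zip(requests, cycle(queues)):
        queue.append(request)
    return queues


requests = [f"sql_{i}" for i in range(10)]
one_server = distribute(requests, 1)
five_servers = distribute(requests, 5)

print(max(len(q) for q in one_server))    # 10: everything waits in one queue
print(max(len(q) for q in five_servers))  # 2: each server holds two requests
```

The per-statement execution time is unchanged here; only the queue depth per server shrinks, which matches the disadvantage of clustering listed in this article.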
I have set up some MySQL clusters before: installing and configuring mysql proxy on Linux with read-write separation, and installing and configuring mutual master-slave MySQL replication with data synchronization.

Advantages: good scalability, and none of the complicated application logic (PHP code) that comes with splitting into multiple tables.

Disadvantages: the amount of data in a single table is unchanged, so a single operation still takes just as long, and the hardware cost is high.

2. Split the table

There are two ways to split a table:

1. Vertical splitting divides by field: fields are moved into different tables. Moving the string-type fields out of the original table, for example, speeds up queries on the main table. Suppose a database has 30 million user records with dozens of fields such as id, user, password, first_name, last_name, email, addr. Login only needs the user and password fields, and looking them up in the wide table is slow. Putting user and password in a table of their own makes this faster, and the other user fields go into another table. This is just an example.
2. Horizontal splitting divides by record: the data is split into multiple tables with the same structure. Suppose a database has 30 million user records and processing is slow. The 30 million records can be divided into five parts of 6 million each and placed on different machines.

Splitting horizontally in advance: for tables that we can predict will be large and frequently accessed, we divide them into several tables up front. Such predictions are usually about right. The posts table of a forum is bound to become very large over time, with hundreds of thousands or even millions of rows. A chat room has a message table; with dozens of people chatting together all night, the data in this table becomes very large over time. There are many situations like this.
Therefore, for large tables whose growth can be predicted, we divide them into N tables in advance, where N depends on the actual situation. Take the chat message table as an example: I create 100 such tables in advance, message_00, message_01, message_02 ... message_98, message_99, and then use the user ID to decide which table a user's chat messages go in, for example by taking the remainder of the user ID divided by the number of tables.

3. In practice

Vertical and horizontal splitting need to be combined. If a database has 30 million users, consider splitting vertically first and then horizontally. That is, first move the secondary fields into a user_info table, leaving only key fields such as user id, password, and username in the main user table. Then split horizontally, dividing both the user table and the user_info table into multiple tables with the same structure.

Next, let's look at how MySQL is deployed when data is stored in separate tables:

1. Simple MySQL master-slave replication

MySQL master-slave replication enables read-write separation and greatly improves read performance.

[Figure: master-slave architecture]
[Figure: master-slave replication process]

However, master-slave replication also brings a series of performance bottlenecks:

1. Writes cannot be scaled
2. Writes cannot be cached
3. Replication delay
4. Increased lock rate
5. As tables grow, the cache hit rate drops

These problems have to be solved, which leads to the following optimizations.

2. MySQL vertical partitioning

If the businesses are sufficiently independent of each other, putting the data of different businesses on different database servers is a good solution. In addition, if one business crashes, the others keep running normally.
This split also distributes the load, which greatly improves database throughput.

[Figure: database architecture after vertical partitioning]

However, no matter how independent the businesses are, there are always some connections between them. Users, for example, are associated with basically every business. Moreover, this approach cannot solve a surge of data in a single table, so why not try horizontal partitioning?

3. MySQL horizontal sharding

This is a very good idea. Users are grouped by some rule (for example, by a hash of the ID), and the data of each group is stored in one database partition, i.e. a shard. As the number of users grows, it is enough to add and configure another server.

[Figure: horizontal sharding principle]

How do you determine which shard a user is on? You can create a table that maps users to shards. On each request, first look up the user's shard ID in this table, then query the relevant data from that shard.

[Figure: locating a user's shard through the mapping table]

Single database, single table

Single database, single table is the most common database design. For example, there is a user table in the database db, and every user can be found in that one table.

Single database, multiple tables

As the number of users grows, the user table holds more and more data. Past a certain size, queries on the user table gradually slow down and drag down the performance of the whole DB. With MySQL there is a more serious problem: adding a column can lock the table, and all reads and writes then have to wait.

The user table can be split horizontally in some way into tables with exactly the same structure, such as user_0000 and user_0001. The data of user_0000 + user_0001 + ... is exactly the complete data set.
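As a rough illustration of the two ideas above, the sketch below maps a user ID to a table such as user_0000 and keeps a user-to-shard directory that is consulted before each query. The names `table_for_user` and `ShardDirectory` are hypothetical, the modulo rule is just one possible "certain rule", and the in-memory dictionary stands in for the mapping table that would live in a database.

```python
from typing import Dict

NUM_TABLES = 2  # assumption: only user_0000 and user_0001 exist


def table_for_user(user_id: int, num_tables: int = NUM_TABLES) -> str:
    """Return the name of the physical table that holds this user's row."""
    return f"user_{user_id % num_tables:04d}"


class ShardDirectory:
    """In-memory stand-in for the table that maps users to shards."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self._shard_of: Dict[int, int] = {}

    def register(self, user_id: int) -> int:
        # Assign a shard once (here by ID modulo) and remember the assignment.
        return self._shard_of.setdefault(user_id, user_id % self.num_shards)

    def lookup(self, user_id: int) -> int:
        # Each request consults the directory first, then queries that shard.
        return self._shard_of[user_id]


directory = ShardDirectory(num_shards=4)
directory.register(123)

print(table_for_user(6))      # user_0000
print(table_for_user(7))      # user_0001
print(directory.lookup(123))  # 3
```

Storing the assignment in a directory, instead of recomputing it from the ID every time, means existing users stay where they are if the number of shards changes later; the cost is one extra lookup per request.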
Multiple databases, multiple tables

As the amount of data grows, the storage of a single DB may run out, and as the number of queries grows, a single database server may no longer keep up. At this point the data can be split horizontally across databases.

Sharding rules

When designing tables, you need to decide the rules by which data is divided among databases and tables. For example, when a new user registers, the program must decide which table the user's information goes into. Likewise, at login we must find the matching record through the user's account, and all of this has to follow a fixed rule.

Routing

Routing is the process of finding the right database and table by applying the sharding rules. For example, if the rule is user_id mod 4, then when a user registers a new account with ID 123, id mod 4 tells us the account should be saved in the User_0003 table. When user 123 logs in, computing 123 mod 4 again tells us the record is in User_0003.

Problems and precautions with splitting databases and tables

1. Choosing the splitting dimension

If a user purchases a product, the transaction record must be saved and later retrieved. If tables are split by the user dimension, all of a user's transaction records are in the same table, so looking up a user's purchases is quick and convenient. However, the purchases of a given product are likely to be scattered across multiple tables, which makes them harder to find. Conversely, if tables are split by the product dimension, the purchases of a product are easy to find, but finding a buyer's transaction records becomes troublesome.

The common solutions are:

a. Scan all the tables. This is basically unworkable because it is far too inefficient.
b. Record two copies of the data, one split by the user dimension and one by the product dimension.
c. Use a search engine. If the real-time requirement is very high, this in turn becomes a real-time search problem.

2. Join queries

Join queries are basically impossible, because the related tables may not be in the same database.

3. Avoid cross-database transactions

Avoid modifying tables in db0 and tables in db1 within the same transaction. It makes the operation more complicated and hurts efficiency.

4. Keep related data on the same DB server

For example, if seller A's products and transaction information are both placed in db0, then when db1 fails, seller A's data can still be used normally. In other words, avoid having data in one database depend on data in another.

One master, multiple backups

In practice, reads far outnumber writes in most cases. MySQL supports read-write separation: all writes go to the Master, while reads can be served by the Master or the Slaves. The structure of a Slave is exactly the same as that of the Master. A Master can have multiple Slaves, and a Slave can even have Slaves of its own. This effectively raises the QPS of the DB cluster.

All writes are performed on the Master first and then synchronized to the Slaves, so there is a certain delay between Master and Slave. When the system is very busy the delay gets worse, and adding more Slaves makes it worse still. The Master is also the bottleneck of the cluster: with too many writes its stability is seriously affected, and if the Master fails, the whole cluster stops working properly.

Therefore:

1. When read pressure is very high, consider adding Slaves, but once the number of Slaves reaches a certain level, you have to consider splitting the database.
2. When write pressure is very high, the database must be split.

Why split MySQL databases and tables?

It is fair to say that wherever MySQL is used, once the data volume gets large you soon face the problem of splitting databases and tables. This raises a question: why split at all? Can't MySQL handle large tables? In fact it can. In projects I have worked on, a single table had a physical file of more than 80 GB and more than 500 million records, and it was a core table: the friend relationship table. But this is not the best approach. File systems such as Ext3 have many problems handling large files; at this scale you can switch to the XFS file system. However, an oversized MySQL table has one problem that is hard to solve: operations that alter the table structure become basically impossible. That is why large projects end up splitting databases and tables.

From the perspective of InnoDB itself, there are only two locks on the B-tree of the data file, the leaf node lock and the child node lock. As you can imagine, when a page split occurs or a new leaf is added, data cannot be written to the table. So splitting databases and tables is still the better choice.

So how many databases and tables are appropriate? In testing, read and write performance is relatively good with up to about 10 million records in a single table. Leaving some buffer, keep tables that contain only numeric columns below 8 million records, and tables with character columns below 5 million.

If you plan for 100 databases with 100 tables each, then for the user business: 5 million * 100 * 100 = 50,000,000,000, i.e. 50 billion records.

Once you have a rough figure in mind, it is relatively easy to plan around the business.
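The capacity estimate is plain multiplication, and a short sketch makes it easy to redo with other numbers. The constants below come from the figures in the text (5 million rows per table, 100 tables, 100 databases); the function names are hypothetical and chosen only for this illustration.

```python
import math

ROWS_PER_TABLE = 5_000_000   # per-table ceiling for tables with character columns
TABLES_PER_DATABASE = 100
DATABASES = 100


def total_capacity(rows_per_table: int, tables_per_database: int, databases: int) -> int:
    """Total rows the layout can hold if every table is filled to its ceiling."""
    return rows_per_table * tables_per_database * databases


def tables_needed(expected_rows: int, rows_per_table: int = ROWS_PER_TABLE) -> int:
    """Smallest number of tables that keeps each table at or under the ceiling."""
    return math.ceil(expected_rows / rows_per_table)


capacity = total_capacity(ROWS_PER_TABLE, TABLES_PER_DATABASE, DATABASES)
print(f"{capacity:,}")             # 50,000,000,000
print(tables_needed(30_000_000))   # 6
```

Working backwards with `tables_needed` is often more useful than the forward estimate: for the 30 million user records used as an example earlier, six tables of 5 million rows are enough.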