Many friends have asked in forums and message areas under what circumstances MySQL needs to be sharded, and what design method is the best choice. Based on these questions, the editor has compiled some application scenarios and examples of the best design methods for MySQL sharding. 1. Sub-table Scenario: For large-scale Internet applications, the number of record rows in a single database table may reach tens of millions or even hundreds of millions, and the database faces extremely high concurrent access. MySQL architecture using Master-Slave replication mode, Only the reading of the database can be expanded, while the writing operation of the database is still concentrated on the Master. In addition, it is impossible to have an unlimited number of Slaves mounted on a single Master. The number of Slaves is limited by the Master's capabilities and load. Therefore, the database throughput capacity needs to be further expanded to meet the needs of high concurrent access and massive data storage! For a single table that is accessed very frequently and has a huge amount of data, the first thing we need to do is to reduce the number of records in the single table in order to reduce the time required for data query and improve the database throughput. This is the so-called table sharding! Before sharding a table, you first need to select an appropriate sharding strategy so that the data can be evenly distributed across multiple tables without affecting normal queries! For Internet companies, most data is associated with users, so user ID is the most commonly used sub-table field. Because most queries need to carry user ID, this will not affect the query and can make the data more balanced. Distributed to each table (of course, in some scenarios, the distribution of hot and cold data may be unbalanced), as shown in the following figure: Suppose there is an order table that records user purchase information. Since the order table has too many records, it will be split into 256 tables. The split records are stored in the corresponding table according to user_id%256, and the front-end application finds the corresponding order storage table and accesses it according to the corresponding user_id%256. In this way, user_id becomes a necessary query condition, otherwise the data cannot be accessed because the table where the data is stored cannot be located. Note: The number of tables after splitting is generally 2 to the power of n, which is the reason why it is split into 256 tables above! Assume that the order table structure is as follows: create table order_( order_id bigint(20) primary key auto_increment, user_id bigint(20), user_nick varchar(50), auction_id bigint(20), auction_title bigint(20), price bigint(20), auction_cat varchar(200), seller_id bigint(20), seller_nick varchar(50) ) After the table is split, assuming user_id = 257 and auction_id = 100, you need to query the corresponding order information based on auction_id. The corresponding SQL statement is as follows: select * from order_1 where user_id=257 and auction_id = 100; Among them, order_1 is calculated based on 257%256, which means the first order table after the partition. 2. Database Division Scenario: Table sharding can solve the problem of decreased query efficiency caused by excessive data volume in a single table, but it cannot bring about a qualitative improvement in the concurrent processing capability of the database. In the face of high concurrent read and write access, when the database master When the server cannot bear the pressure of write operations, it is meaningless to expand the slave server no matter how you do it. Therefore, we must change our thinking and split the database to improve the database writing capability. This is the so-called database partitioning! Similar to the table sharding strategy, database sharding can use a keyword modulo method to route data access, as shown in the following figure: Still using the previous order table, assuming that the value of the user_id field is 258, the original single database is divided into 256 databases, then the application's access request to the database will be routed to the second database (258%256 = 2). 3. Sub-library and sub-table Scenario: Sometimes the database may face the pressure of high concurrent access and the need to store massive data. In this case, it is necessary to adopt both the table sharding strategy and the library sharding strategy for the database in order to expand the system at the same time. Concurrent processing capabilities and improving the query performance of a single table, this is what is called sharding. The strategy of sharding is more complicated than the previous strategy of sharding only or only sharding only. A routing strategy of sharding is as follows: 1. Intermediate variable = user_id % (number of sub-databases * number of tables in each database) 2. Library = integer (intermediate variable / number of tables in each library) 3. Table = Intermediate variable % Number of tables in each library Also use user_id as the routing field. First, use user_id to modulo the number of libraries * the number of each library table to get an intermediate variable. Then divide the intermediate variable by the number of each library table and round it up to get The corresponding library; and the intermediate variable modulo the number of each library table, that is, the corresponding table is obtained. The detailed process of the database and table sharding strategy is as follows: Assume that the original single-database single-table order is split into 256 libraries, each containing 1024 tables. Then, according to the routing strategy mentioned above, for the access of user_id=262145, the routing calculation process is as follows: 1. Intermediate variable = 262145 % (256 * 1024) = 1 2. Library = rounded (1/1024) = 0 3. Table = 1 % 1024 = 1 This means that the query and modification of the order record of user_id=262145 will be routed to the first order_1 table in the 0th database for execution! ! ! You may also be interested in:
|
<<: Four ways to switch tab pages in VUE
>>: Explanation of installation and configuration of building go environment under linux
Original : http://developer.yahoo.com/performance...
There are two types of dead link formats defined b...
1. Install Docker yum -y install docker-io The &q...
background Today, while cooperating with other pr...
This article shares the specific code of js to ac...
Now most projects have begun to be deployed on Do...
The ".zip" format is used to compress f...
Table of contents vuex persistence Summarize vuex...
The Nginx ngx_http_image_filter_module module (ng...
This article describes the MySQL user rights mana...
1 QPS calculation (number of queries per second) ...
The parent node of the parent node, for example, t...
There are three types of virtual hosts supported ...
1. Version Information # cat /etc/system-release ...
I recently upgraded a test server operating syste...