1 ReviewIn the previous section, we explained in detail how to partition a database, including vertical splitting (Scale Up) and horizontal splitting (Scale Out). We also briefly summarized several strategies for horizontal partitioning. Let’s review them now. 2 Five strategies for horizontal partitioning2.1 HashThis strategy is to calculate the Hash Key of one or more columns of the table, and finally partition the data areas corresponding to different values of the Hash code. For example, we can create a strategy to partition the year of the date in the table so that each year will be clustered in one interval. PARTITION BY HASH(YEAR(createtime)) PARTITIONS 10 2.2 RangeThis strategy is to divide the data into different ranges. For example, we can divide a table with tens of millions of data into four partitions by ID, with each partition containing about 5 million data, and all data exceeding 7.5 million are placed in the fourth partition. PARTITION BY RANGE(id) ( PARTITIONP0 VALUES LESS THAN(2500001), PARTITIONP1 VALUES LESS THAN(5000001), PARTITIONp2 VALUES LESS THAN(7500001), PARTITIONp3 VALUES LESS THAN MAXVALUE ) 2.3. KeyAn extension of the Hash strategy, where the Hash Key is generated by the MySQL system. 2.4. List (predefined list)This strategy allows the system to split the rows by the values corresponding to the defined list. For example, we divide the area according to the job code, and the codes of different job types correspond to different partitions to achieve the purpose of divide and conquer. PARTITION BY LIST(gwcode) ( PARTITIONP0 VALUES IN (46,77,89), PARTITIONP1 VALUES IN (106,125,177), PARTITIONP2 VALUES IN (205,219,289), PARTITIONP3 VALUES IN (302,317,458,509,610) ) The above SQL script uses the list matching LIST function to partition the employee position numbers into 4 partitions. Administrative positions with numbers 46, 77, and 89 are in partition P0, technical positions with numbers 106, 125, and 177 are in partition P1, and so on. 2.5. CompositeThe composite mode is actually a combination of the above modes. For example, you can perform Hash partitioning on the basis of Range. 3 Testing Range Strategy3.1 Create master table and sub-tableWe create a common user table users, and then create a partition table users_part to partition users born in the 1980s by year, as follows: 3.1.1 Summary table statementmysql> CREATE TABLE users ( "id" int(10) unsigned NOT NULL, "name" varchar(100) DEFAULT NULL, "birth" datetime )ENGINE=InnoDB DEFAULT CHARSET=utf8; Query OK, 0 rows affected 3.1.2 Split table statementNote that the last line assigns all users born after 1989 to the 10th partition. We simulate users born in the 1980s, and the actual business will be split according to the specific situation. mysql> create table users_part ( "id" int(10) unsigned NOT NULL, "name" varchar(100) DEFAULT NULL, "birth" datetime ) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY RANGE (year(birth)) ( PARTITION p0 VALUES LESS THAN (1981), PARTITION p1 VALUES LESS THAN (1982), PARTITION p2 VALUES LESS THAN (1983), PARTITION p3 VALUES LESS THAN (1984), PARTITION p4 VALUES LESS THAN (1985), PARTITION p5 VALUES LESS THAN (1986), PARTITION p6 VALUES LESS THAN (1987), PARTITION p7 VALUES LESS THAN (1988), PARTITION p8 VALUES LESS THAN (1989),17 PARTITION p9 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected 3.2 Initialize table dataWe can use functions or stored procedures to initialize data in batches and insert 10 million pieces of data here. DROP PROCEDURE IF EXISTS init_users_part; delimiter $ /* Set the statement terminator to $*/ CREATE PROCEDURE init_users_part() begin DECLARE srt int default 0; while srt < 10000000 /* Set to write 1000W data*/ do insert into `users_part` values (srt, concat('username_',idx1),adddate('1980-01-01',rand() * 3650)); /*Randomly select values within 10 years*/ set srt = srt + 1; end while; end $ delimiter ; call init_users_part(); 3.3 Synchronize data to the complete tablemysql> insert into users select * from users_part; //Copy 10 million data to the unpartitioned complete table users Query OK, 10000000 rows affected (51.59 sec) Records: 10000000 Duplicates: 0 Warnings: 0 3.4 Test the efficiency of SQL executionmysql> select count(*) from users_part where `birth` > '1986-01-01' and `birth` < '1986-12-31'; +----------+ | count(*) | +----------+ |976324| +----------+ 1 row in set (0.335 sec) mysql> select count(*) from users where `birth` > '1986-01-01' and `birth` < '1986-12-31'; +----------+ | count(*) | +----------+ |976324| +----------+ 1 row in set (5.187 sec) The results are quite clear. The execution efficiency of the partitioned table is indeed higher, and the execution time is less than 1/10 of that of the non-partitioned table. 3.5 Using Explain to perform plan analysismysql> explain select count(*) from users_part where `birth` > '1986-01-01' and `birth` < '1986-12-31'; +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | 1 | SIMPLE | users_part | p7 | ALL | NULL | NULL | NULL | NULL | 987769| 100.00 | Using where | +----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql> explain select count(*) from users where `birth` > '1986-01-01' and `birth` < '1986-12-31'; +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | users | NULL | ALL | NULL | NULL | NULL | NULL |10000000 | 100.00 | Using where | +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) Here we focus on two key parameters: one is partitions. In users_part, it is p7, which means that the data is retrieved in the seventh partition. The users table is null, which means that it is a full-area scan without partitions. Another parameter is rows, which is the number of rows predicted to be scanned. The users table is obviously a full table scan. 3.6 Improving Indexing EfficiencyBecause we use the birth field for partitioning and conditional queries, we try to create an index on the birth field to optimize efficiency. mysql> create index idx_user on users(birth); Query OK, 0 rows affected (1 min 7.04 sec) Records: 10000000 Duplicates: 0 Warnings: 0 mysql> create index idx_user_part on users_part(birth); Query OK, 0 rows affected (1 min 1.05 sec) Records: 10000000 Duplicates: 0 Warnings: 0 List of database file sizes after index creation: 2008-05-24 09:23 8,608 no_part_tab.frm 2008-05-24 09:24 255,999,996 no_part_tab.MYD 2008-05-24 09:24 81,611,776 no_part_tab.MYI 2008-05-24 09:25 0 part_tab#P#p0.MYD 2008-05-24 09:26 1,024 part_tab#P#p0.MYI 2008-05-24 09:26 25,550,656 part_tab#P#p1.MYD 2008-05-24 09:26 8,148,992 part_tab#P#p1.MYI 2008-05-24 09:26 25,620,192 part_tab#P#p10.MYD 2008-05-24 09:26 8,170,496 part_tab#P#p10.MYI 2008-05-24 09:25 0 part_tab#P#p11.MYD 2008-05-24 09:26 1,024 part_tab#P#p11.MYI 2008-05-24 09:26 25,656,512 part_tab#P#p2.MYD 2008-05-24 09:26 8,181,760 part_tab#P#p2.MYI 2008-05-24 09:26 25,586,880 part_tab#P#p3.MYD 2008-05-24 09:26 8,160,256 part_tab#P#p3.MYI 2008-05-24 09:26 25,585,696 part_tab#P#p4.MYD 2008-05-24 09:26 8,159,232 part_tab#P#p4.MYI 2008-05-24 09:26 25,585,216 part_tab#P#p5.MYD 2008-05-24 09:26 8,159,232 part_tab#P#p5.MYI 2008-05-24 09:26 25,655,740 part_tab#P#p6.MYD 2008-05-24 09:26 8,181,760 part_tab#P#p6.MYI 2008-05-24 09:26 25,586,528 part_tab#P#p7.MYD 2008-05-24 09:26 8,160,256 part_tab#P#p7.MYI 2008-05-24 09:26 25,586,752 part_tab#P#p8.MYD 2008-05-24 09:26 8,160,256 part_tab#P#p8.MYI 2008-05-24 09:26 25,585,824 part_tab#P#p9.MYD 2008-05-24 09:26 8,159,232 part_tab#P#p9.MYI 2008-05-24 09:25 8,608 part_tab.frm 2008-05-24 09:25 68 part_tab.par Test SQL performance again mysql> select count(*) from users_part where `birth` > '1986-01-01' and `birth` < '1986-12-31'; +----------+ | count(*) | +----------+ |976324| +----------+ 1 row in set (0.171 sec) mysql> select count(*) from users where `birth` > '1986-01-01' and `birth` < '1986-12-31'; +----------+ | count(*) | +----------+ |976324| +----------+ 1 row in set (0.583 sec) Here we can see that after adding indexes to key fields and restarting (net stop mysql, net start mysql), the performance of the partitioned table has improved slightly. The performance improvement of the non-partitioned full table is the most obvious, almost approaching the efficiency of the partitioned table. 3.7 Analysis of cross-region execution efficiencyFrom the above analysis, we can see that the efficiency of executing in a single zone is significantly lower than that of not partitioning. This is because the scanning range is reduced after partitioning. So what will happen if we add a range of birth years to the above conditions to make it cross-regional? Let’s test it. mysql> select count(*) from users_part where `birth` > '1986-01-01' and `birth` < '1987-12-31'; +----------+ | count(*) | +----------+ |976324| +----------+ 1 row in set (1.914 sec) mysql> select count(*) from users where `birth` > '1986-01-01' and `birth` < '1987-12-31'; +----------+ | count(*) | +----------+ |976324| +----------+ 1 row in set (3.871 sec) It can be seen that the performance will be worse after crossing regions. It should be understood that the more cross-zones there are, the worse the performance will be. Therefore, when designing partitions, you should be aware of this and avoid frequent cross-zone situations, and carefully judge the partition boundary conditions. 3.8 Summary1. Partitioned and unpartitioned file space is roughly the same (data and index files) 2. When the key fields in the query statement are not indexed, the partitioning time is much better than the non-partitioning time 3. If the fields in the query statement are indexed, the difference between partitioning and non-partitioning is reduced, but it is still better than the non-partitioning case, and this advantage will become more obvious as the amount of data increases. 4. For large amounts of data, it is recommended to use the partitioning function, regardless of whether it is indexed or not. 5. According to the MySQL manual, increasing myisam_max_sort_file_size will increase partition performance (the maximum size of temporary files allowed when MySQL rebuilds the index) 6. When designing partitions, carefully judge the partition boundary conditions to avoid excessively frequent cross-zone operations, otherwise the performance will not be ideal. 4 Partitioning Strategy Detailed Explanation4.1 HASHHASH partitioning is mainly used to ensure that data is evenly distributed among a predetermined number of partitions, while in RANGE and LIST partitioning, you must explicitly specify in which partition a given column value or set of column values should be stored. In HASH partitioning, MySQL automatically completes these tasks. All you have to do is specify a column value or expression based on the column value to be hashed, and the number of partitions into which the partitioned table is to be split. Here is an example: /*Hash*/ drop table if EXISTS `t_userinfo`; CREATE TABLE `t_userinfo` ( `id` int(10) unsigned NOT NULL, `personcode` varchar(20) DEFAULT NULL, `personname` varchar(100) DEFAULT NULL, `depcode` varchar(100) DEFAULT NULL, `depname` varchar(500) DEFAULT NULL, `gwcode` int(11) DEFAULT NULL, `gwname` varchar(200) DEFAULT NULL, `gravalue` varchar(20) DEFAULT NULL, `createtime` DateTime NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY HASH(YEAR(createtime)) PARTITIONS 4( PARTITION P0 DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data0/idx', PARTITION P1 DATA DIRECTORY = '/data1/data' INDEX DIRECTORY = '/data1/idx', PARTITION P2 DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data2/idx', PARTITION P3 DATA DIRECTORY = '/data3/data' INDEX DIRECTORY = '/data3/idx' ); In the above example, the HASH function is used to perform a HASH operation on the createtime date and partition the data based on this date. There are 10 partitions in total. Add a "PARTITION BY HASH (expr)" clause to the table creation statement, where "expr" is an expression that returns an integer. It can be the name of a column with a field type of MySQL integer, or an expression that returns a non-negative number. In addition, you may need to add a "PARTITIONS num" clause at the end, where num is a non-negative integer representing the number of partitions into which the table will be split. Each partition has its own independent data and index file storage directory, and the physical disk partitions where these directories are located may also be completely independent, which can improve disk IO throughput. 4.2 RANGEAssign multiple rows to the same partition based on column values that fall within a given contiguous interval. These intervals must be contiguous and non-overlapping, defined using the VALUES LESS THAN operator. Here is an example: /*Range*/ drop table if EXISTS `t_userinfo`; CREATE TABLE `t_userinfo` ( `id` int(10) unsigned NOT NULL, `personcode` varchar(20) DEFAULT NULL, `personname` varchar(100) DEFAULT NULL, `depcode` varchar(100) DEFAULT NULL, `depname` varchar(500) DEFAULT NULL, `gwcode` int(11) DEFAULT NULL, `gwname` varchar(200) DEFAULT NULL, `gravalue` varchar(20) DEFAULT NULL, `createtime` DateTime NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY RANGE(gwcode) ( PARTITION P0 VALUES LESS THAN(101) DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data0/idx', PARTITION P1 VALUES LESS THAN(201) DIRECTORY = '/data1/data' INDEX DIRECTORY = '/data1/idx', PARTITION P2 VALUES LESS THAN(301) DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data2/idx', PARTITION P3 VALUES LESS THAN MAXVALUE DIRECTORY = '/data3/data' INDEX DIRECTORY = '/data3/idx' ); In the above example, the RANGE function is used to partition the job number into 4 partitions. Position numbers 1 to 100 are in partition P0, positions 101 to 200 are in partition P1, and so on. If the category number is greater than 300, you can use MAXVALUE to store all data greater than 300 in partition P3. Each partition has its own independent data and index file storage directory, and the physical disk partitions where these directories are located may also be completely independent, which can improve disk IO throughput. 4.3 、LIST(Predefined list)Similar to partitioning by RANGE, the difference is that LIST partitioning selects partitions based on column values matching a value in a discrete set of values. LIST partitioning is implemented by using "PARTITION BY LIST(expr)", where "expr" is a column value or an expression based on a column value that returns an integer value. Each partition is then defined using "VALUES IN (value_list)", where "value_list" is a comma-separated list of integers. Here is an example: /*List*/ drop table if EXISTS `t_userinfo`; CREATE TABLE `t_userinfo` ( `id` int(10) unsigned NOT NULL, `personcode` varchar(20) DEFAULT NULL, `personname` varchar(100) DEFAULT NULL, `depcode` varchar(100) DEFAULT NULL, `depname` varchar(500) DEFAULT NULL, `gwcode` int(11) DEFAULT NULL, `gwname` varchar(200) DEFAULT NULL, `gravalue` varchar(20) DEFAULT NULL, `createtime` DateTime NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY LIST(`gwcode`) ( PARTITION P0 VALUES IN (46,77,89) DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data0/idx', PARTITION P1 VALUES IN (106,125,177) DATA DIRECTORY = '/data1/data' INDEX DIRECTORY = '/data1/idx', PARTITION P2 VALUES IN (205,219,289) DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data2/idx', PARTITION P3 VALUES IN (302,317,458,509,610) DATA DIRECTORY = '/data3/data' INDEX DIRECTORY = '/data3/idx' ); In the above example, the list matching LIST function is used to partition the employee position numbers into 4 partitions. Numbers 46, 77, and 89 are in partition P0, 106, 125, and 177 are in partition P1, and so on. Unlike RANGE, the data in the LIST partition must match the job number in the list to be partitioned, so this method is only suitable for comparisons of small and certain interval values. Each partition has its own independent data and index file storage directory, and the physical disk partitions where these directories are located may also be completely independent, which can improve disk IO throughput. 4.4 KEYSimilar to HASH partitioning, the difference is that KEY partitioning only supports calculation of one or more columns, and the MySQL server provides its own hash function. One or more columns must contain integer values. Here is an example: /*key*/ drop table if EXISTS `t_userinfo`; CREATE TABLE `t_userinfo` ( `id` int(10) unsigned NOT NULL, `personcode` varchar(20) DEFAULT NULL, `personname` varchar(100) DEFAULT NULL, `depcode` varchar(100) DEFAULT NULL, `depname` varchar(500) DEFAULT NULL, `gwcode` int(11) DEFAULT NULL, `gwname` varchar(200) DEFAULT NULL, `gravalue` varchar(20) DEFAULT NULL, `createtime` DateTime NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY KEY(gwcode) PARTITIONS 4( PARTITION P0 DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data0/idx', PARTITION P1 DATA DIRECTORY = '/data1/data' INDEX DIRECTORY = '/data1/idx', PARTITION P2 DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data2/idx', PARTITION P3 DATA DIRECTORY = '/data3/data' INDEX DIRECTORY = '/data3/idx' ); Note: This partitioning algorithm is currently rarely used. There is uncertainty in using the hash function provided by the server, which will make later data statistics and sorting more complicated. Therefore, we prefer to use the Hash expression defined by us. Everyone just needs to know its existence and how to use it. 4.5 Nested Partitions (Subpartitions)Nested partitions (subpartitions) are subdivisions of each partition in a RANGE/LIST partition table. The further segmentation can be of type HASH/KEY etc. drop table if EXISTS `t_userinfo`; CREATE TABLE `t_userinfo` ( `id` int(10) unsigned NOT NULL, `personcode` varchar(20) DEFAULT NULL, `personname` varchar(100) DEFAULT NULL, `depcode` varchar(100) DEFAULT NULL, `depname` varchar(500) DEFAULT NULL, `gwcode` int(11) DEFAULT NULL, `gwname` varchar(200) DEFAULT NULL, `gravalue` varchar(20) DEFAULT NULL, `createtime` DateTime NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY RANGE (id) SUBPARTITION BY HASH (id% 4) SUBPARTITIONS 2( PARTITION p0 VALUES LESS THAN (5000000) DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data0/idx', PARTITION p1 VALUES LESS THAN MAXVALUE DATA DIRECTORY = '/data1/data' INDEX DIRECTORY = '/data1/idx' ); As above, the RANGE partition is further divided into sub-partitions, and the sub-partitions use the HASH type. 5 Partition Management5.1 Deleting a Partition/*Delete partition P1*/ 2 ALERT TABLE users_part DROP PARTITION P1; 5.2 Rebuilding Partitions5.2.1 RANGE Partition Reconstruction/*Here we merge the original P0 and P1 partitions into the new P0 partition and reset the condition to less than 5000000. */ ALTER TABLE users_part REORGANIZE PARTITION P0,P1 INTO (PARTITION P0 VALUES LESS THAN (5000000)); Used for merging situations where space is wasted. 5.2.2 LIST Partition Reconstruction/*Merge the original P0 and P1 partitions into the new P0 partition, which is similar to the previous one. */ ALTER TABLE users_part REORGANIZE PARTITION p0,p1 INTO (PARTITION p0 VALUES IN(1,4,5,8,9,12,13,101,555)); 5.2.3 HASH/KEY Partition Reconstruction/*The number of partitions rebuilt using REORGANIZE becomes 2. Here the number can only be reduced but not increased. If you want to add more, you can use the ADD PARTITION method. */ ALTER TABLE users_part REORGANIZE PARTITION COALESCE PARTITION 2; 5.3 Adding a new partition5.3.1 Adding RANGE Partition/*Add a new RANGE partition*/ ALTER TABLE category ADD PARTITION (PARTITION p4 VALUES IN (16,17,18,19) DATA DIRECTORY = '/data8/data' INDEX DIRECTORY = '/data8/idx'); 5.3.2 Add HASH/KEY partition/* Expand the total number of partitions to n. Please use a numerical value instead*/ ALTER TABLE users_part ADD PARTITION PARTITIONS n; 5.3.3 Adding partitions to an existing tablealter tableuser_part partition by RANGE (month(birth)) ( PARTITION p0 VALUES LESS THAN (1), PARTITION p1 VALUES LESS THAN (2) , PARTITION p2 VALUES LESS THAN (3) , PARTITION p3 VALUES LESS THAN (4) , PARTITION p4 VALUES LESS THAN (5) , PARTITION p5 VALUES LESS THAN (6) , PARTITION p6 VALUES LESS THAN (7) , PARTITION p7 VALUES LESS THAN (8) , PARTITION p8 VALUES LESS THAN (9) , PARTITION p9 VALUES LESS THAN (10) , PARTITION p10 VALUES LESS THAN (11), PARTITION p11 VALUES LESS THAN (12), PARTITION P12 VALUES LESS THAN (13) ); 6 Remove the partition primary key restrictionThe default partitioning restriction is that the partitioning field must be part of the primary key (PRIMARY KEY). This restriction needs to be removed. If a primary key is set in the table, the following prompt will be reported: A PRIMARY KEY must include all columns in the table's partitioning function (prefixed columns are not considered). One solution is to use the primary key as the partitioning condition: ALTER TABLE users_part PARTITION BY HASH(id) PARTITIONS 4; Another way is to add the partition condition field to the primary key to turn it into a joint primary key. As shown below, id and gwcode form a joint primary key: alter table users_part drop PRIMARY KEY; alter table users_part add PRIMARY KEY(id, gwcode); This is the end of this article about how to use the Partition function in MySQL to implement horizontal partitioning. For more information about MySQL horizontal partitioning, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Detailed steps to install Hadoop cluster under Linux
>>: Complete the search function in the html page
Use the FRAME property to control the style type ...
Record the installation of two MySQL5.6.35 databa...
This article records the installation graphic tut...
Connections can be used to query, update, and est...
What is HTTP? When we want to browse a website, w...
1: Differences in speed and loading methods The di...
1. The concept of css: (Cascading Style Sheet) Ad...
Table of contents Preface Creating a component li...
Table of contents 1. IDEA downloads the docker pl...
The upload form with image preview function, the ...
Table of contents Preface: System Requirements: I...
Table of contents Preface: Implementation steps: ...
Recently, I have a need to list all host names in...
Effect screenshots: Implementation code: Copy code...
What is an inode? To understand inode, we must st...