MySQL database table partitioning considerations [recommended]

MySQL database table partitioning considerations [recommended]

Table partitioning is different from database partitioning. So what should we pay attention to when using table partitioning? Today we will look at the details of MySQL database table partitioning.

1. Partition column index constraints

If the table has a primary key or unique key, the partition column of the partition table must be included in the primary key or unique key list. This is to ensure the efficiency of the primary key. Otherwise, it will be troublesome if the items in the same primary key area are in partition A and partition B respectively.

2. Conditions for each partition type

range Each partition contains those rows whose values ​​of the partitioning expression lie within a given contiguous interval. These intervals must be continuous and cannot overlap.

List only supports integer fields or expressions that return integers. The value list in each partition list must be an integer.

The hash type only supports integer fields or expressions that return integers.

The key type only supports column names (one or more column names), and does not support expressions

3. Partition available functions

ABS()

CEILING() (see CEILING() and FLOOR(), immediately following this list)

DAY()

DAYOFMONTH()

DAYOFWEEK()

DAYOFYEAR()

DATEDIFF()

EXTRACT()

FLOOR() (see CEILING() and FLOOR(), immediately following this list)

HOUR()

MICROSECOND()

MINUTE()

MOD()

MONTH()

QUARTER()

SECOND()

TIME_TO_SEC()

TO_DAYS()

WEEKDAY()

YEAR()

YEARWEEK()

Notice:

Because the partition function does not include the FROM_UNIXTIME function, it is not possible to partition by converting timestamps to time. You can only partition by date or datetime.

For example, by year we can use:

PARTITION BY RANGE (YEAR(date))

By month:

PARTITION BY RANGE(date div 100)  

#div will convert the date into an integer, for example: 2014-12-01 -> 20141201, 100 will remove two digits from the end, and the final result is 201412

An example of partitioning by order:

CREATE TABLE `order` (
 `order_id` bigint(19) NOT NULL DEFAULT '0' COMMENT 'Order ID: Year, month, day, hour, minute, second, 12-digit, 7-digit random number',
 `date` date NOT NULL DEFAULT '0000-00-00' COMMENT 'Order date',
 `amount` int(11) DEFAULT NULL COMMENT 'Payment amount, in cents',
 `status` tinyint(1) DEFAULT '0' COMMENT '0: Waiting for payment 1: Payment successful 2: Payment failed 3: Verification failed',
 `addtime` int(10) DEFAULT NULL COMMENT 'Order adding time',
 PRIMARY KEY (`order_id`,`date`)
)ENGINE=MyISAM DEFAULT CHARSET=utf8;

Because we cannot use timestamps to partition by time, we add a date field. This field and order_id are used as the primary key. We know that the partition column must be included in the primary key. Next we use date to calculate the year-month combination to partition

ALTER TABLE order PARTITION BY RANGE( date DIV 100)
(
  PARTITION p_2014_06 VALUES LESS THAN (201407),
  PARTITION p_2014_07 VALUES LESS THAN (201408),
  PARTITION p_2014_08 VALUES LESS THAN (201409),
  PARTITION p_2014_09 VALUES LESS THAN (201410),
  PARTITION p_2014_10 VALUES LESS THAN (201411),
  PARTITION p_catch_all VALUES LESS THAN MAXVALUE
);

In the above, LESS THAN MAXVALUE sets the last partition p_catch_all, so you cannot add partitions using the add method. The following statements are not available:
ALTER TABLE order ADD PARTITION (PARTITION p_2014_11 VALUES LESS THAN (201412));

You can only split the last p_catch_all partition into two. This has the advantage that the data in the p_catch_all partition will not be lost. The merging and splitting of data are performed using REORGANIZE PARTITION.

alter table order reorganize partition p_catch_all into 
(
  partition p_2014_11 values ​​less than (201412),
  partition p_catch_all values ​​less than maxvalue
);

Merge partitions:

alter table order reorganize partition p_2014_10,p_2014_11,p_catch_all into 
(
  partition p_catch_test values ​​less than MAXVALUE
);

Why not assign it to p_catch_all? Because the partition will be reported to exist.

Why do we need to include the last partition p_catch_all when merging? Because except for the last partition, the reorganized partition ranges cannot change the total range.
Delete the partition but not the data:

alter table table name remove partitioning

Note: The above statement can be executed in 5.5, but there seems to be a problem in 5.6. You need to test it first

After partitioning, if the where condition is a range, the partitioning will not work, such as where date >= '2014-01-01' And date <= '2014-01-31'
You must use = or in conditions where date = '2014-01-01' or where date in ('2014-01-01', '2014-01-02', '2014-01-03'...)

Supplement: Four partition types of MySQL tables

1. What is table partitioning

Generally speaking, table partitioning is to divide a large table into several small tables according to conditions. MySQL 5.1 began to support data table partitioning. For example, if a user table has more than 6 million records, the table can be partitioned according to the storage date or the location. Of course, partitioning can also be based on other conditions.

2. Why partition the table?

To improve scalability, manageability, and increase database efficiency for large tables and tables with varying access patterns.

Some advantages of partitioning include:

Can store more data than a single disk or file system partition.

For data that has lost its meaning of being saved, it is usually easy to delete the data by deleting the partitions related to the data. Conversely, in some cases, the process of adding new data can be conveniently implemented by adding a new partition specifically for that new data. Other advantages commonly associated with partitioning include those listed below. These features in MySQL Partitioning are not yet implemented, but are high on our priority list; we hope to include them in the production version of 5.1.

Some queries can be greatly optimized, mainly by saving the data that satisfies a given WHERE clause in one or more partitions, so that the search does not need to search other remaining partitions. Because partitions can be modified after a partitioned table is created, you can reorganize your data to improve the efficiency of commonly used queries if you did not do so when you first configured the partitioning scheme.

Queries involving aggregate functions such as SUM() and COUNT() can be easily parallelized. A simple example of such a query is "SELECT salesperson_id, COUNT (orders) as order_total FROM sales GROUP BY salesperson_id;". By "parallel", it means that the query can be performed on each partition simultaneously, and the final result is simply obtained by summing the results of all partitions.

By spreading data queries across multiple disks, you can achieve greater query throughput.

3. Partition Type

RANGE partitioning: Assign multiple rows to partitions based on column values ​​that fall within a given continuous interval.

LIST partitioning: Similar to RANGE partitioning, the difference is that LIST partitioning is based on the column value matching a value in a discrete value set.

HASH partitioning: A partition is selected based on the return value of a user-defined expression that is calculated using the column values ​​of the rows to be inserted into the table. This function can contain any expression valid in MySQL that produces a non-negative integer value.

KEY partitioning: Similar to HASH partitioning, the difference is that KEY partitioning only supports the calculation of one or more columns, and the MySQL server provides its own hash function. One or more columns must contain integer values.

RANGE partitioning

Assigns multiple rows to partitions based on column values ​​that fall within a given contiguous interval.

These intervals must be continuous and non-overlapping, and are defined using the VALUES LESS THAN operator. The following are examples.

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
)
partition BY RANGE (store_id) (
  partition p0 VALUES LESS THAN (6),
  partition p1 VALUES LESS THAN (11),
  partition p2 VALUES LESS THAN (16),
  partition p3 VALUES LESS THAN (21)
);

According to this partitioning scheme, all rows corresponding to employees working in stores 1 to 5 are stored in partition P0, employees working in stores 6 to 10 are stored in P1, and so on. Note that each partition is defined in order, from lowest to highest. This is a requirement of the PARTITION BY RANGE syntax; in this respect, it is similar to a "switch ... case" statement in C or Java. For a new row containing the data (72, 'Michael', 'Widenius', '1998-06-25′, NULL, 13), it is easy to determine that it will be inserted into the p2 partition, but what will happen if a store numbered 21 is added? In this scenario, since there is no rule to include stores with store_id greater than 20, the server will not know where to save the row, which will result in an error. You can avoid this error by using a "catchall" VALUES LESS THAN clause in the CREATE TABLE statement, which catches all values ​​greater than the highest value you explicitly specify:

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
  PARTITION p0 VALUES LESS THAN (6),
  PARTITION p1 VALUES LESS THAN (11),
  PARTITION p2 VALUES LESS THAN (16),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);

MAXVALUE represents the largest possible integer value. Now, all rows with store_id column values ​​greater than or equal to 16 (the highest value defined) will be stored in partition p3. At some point in the future, when the number of stores has grown to 25, 30, or more, you can use the ALTER TABLE statement to add new partitions for stores 21-25, 26-30, and so on. In almost the same structure, you can also partition the table based on the employee's job code, that is, based on consecutive intervals of job_code column values. For example - assuming that 2-digit job codes are used to represent general (store) workers, 3-digit codes represent office and support staff, and 4-digit codes represent management, you can create the partitioned table using the following statement:

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT NOT NULL,
  store_id INT NOT NULL
)
PARTITION BY RANGE (job_code) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (1000),
  PARTITION p2 VALUES LESS THAN (10000)
);

In this example, all rows related to store workers will be stored in partition p0, all rows related to office and support staff will be stored in partition p1, and all rows related to management will be stored in partition p2. It is also possible to use an expression in the VALUES LESS THAN clause. The most notable restriction here is that MySQL must be able to evaluate the expression returned as part of the LESS THAN (<) comparison; therefore, the expression cannot evaluate to NULL. For this reason, the hired, separated, job_code, and store_id columns of the employee table have been defined as NOT NULL. In addition to being able to split the table data based on store number, you can also use an expression based on one of two DATE (date) to split the table data. For example, suppose you want to partition the table based on the year each employee left the company, that is, the value of YEAR(separated). An example of a CREATE TABLE statement that implements this partitioning scheme is as follows:

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT,
  store_id INT
)
PARTITION BY RANGE (YEAR(separated)) (
  PARTITION p0 VALUES LESS THAN (1991),
  PARTITION p1 VALUES LESS THAN (1996),
  PARTITION p2 VALUES LESS THAN (2001),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);

In this scheme, records of all employees hired before 1991 are stored in partition p0, records of all employees hired between 1991 and 1995 are stored in partition p1, records of all employees hired between 1996 and 2000 are stored in partition p2, and information on all workers hired after 2000 is stored in partition p3.
RANGE partitioning is particularly useful in the following situations: 1) When you need to delete "old" data on a partition, just delete the partition. If you use the partitioning scheme given in the most recent example above, you can simply use "ALTER TABLE employees DROP PARTITION p0;" to delete all rows corresponding to employees who stopped working before 1991. For tables with a large number of rows, this is much more efficient than running a DELETE query such as "DELETE FROM employees WHERE YEAR (separated) <= 1990;". 2) You want to use a column that contains date or time values, or values ​​that increase from some other series. 3) Frequently run queries that directly depend on the columns used to partition the table. For example, when executing a query such as "SELECT COUNT(*) FROM employees WHERE YEAR(separated) = 2000 GROUP BY store_id;", MySQL can quickly determine that only partition p2 needs to be scanned because the remaining partitions cannot contain any records that meet the WHERE clause.

Note: This optimization is not yet enabled in the MySQL 5.1 source code; however, work is in progress.

LIST partition

Similar to partitioning by RANGE, the difference is that LIST partitioning is based on the column value matching a value in a discrete set of values.

LIST partitioning is implemented by using "PARTITION BY LIST(expr)", where "expr" is a column value or an expression based on a column value that returns an integer value, and then defining each partition by "VALUES IN (value_list)", where "value_list" is a comma-separated list of integers. Note: As of MySQL 5.1, when using LIST partitioning, it is possible to match only lists of integers.

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT,
  store_id INT
);

Assume that there are 20 audio-visual stores distributed in 4 regions with distribution rights, as shown in the following table:

====================
Regional Store ID Number North District 3, 5, 6, 9, 17
East District 1, 2, 10, 11, 19, 20
West District 4, 12, 13, 14, 18
Central District 7, 8, 15, 16
====================

To partition the table in such a way that rows belonging to stores in the same region are stored in the same partition, you can use the following "CREATE TABLE" statement:

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT,
  store_id INT
)
PARTITION BY LIST(store_id)
  PARTITION pNorth VALUES IN (3,5,6,9,17),
  PARTITION pEast VALUES IN (1,2,10,11,19,20),
  PARTITION pWest VALUES IN (4,12,13,14,18),
  PARTITION pCentral VALUES IN (7,8,15,16)
);

This makes it easy to add or delete employee records for a specific region in the table. For example, suppose all the video stores on the West Side were sold to other companies. Then all the records (rows) related to the employees working in the West District video store can be deleted using the query "ALTER TABLE employees DROP PARTITION pWest;", which is much more efficient than the DELETE query "DELETE query DELETE FROM employees WHERE store_id IN (4,12,13,14,18);" which has the same effect. [Key Points]: If you try to insert a row whose column value (or the return value of the partitioning expression) is not in the partition value list, the "INSERT" query will fail and report an error. For example, assuming the above scheme is used for LIST partitioning, the following query will fail:

Sql code:

INSERT INTO employees VALUES(224, 'Linus', 'Torvalds', '2002-05-01', '2004-10-12', 42, 21);

This is because the "store_id" column value 21 cannot be found in the list of values ​​used to define partitions pNorth, pEast, pWest, or pCentral. It is important to note that LIST partitions do not have definitions such as "VALUES LESS THAN MAXVALUE" that include other values. Any value that will match must be found in the list of values.
In addition to being able to combine LIST partitions with RANGE partitions to generate a composite subpartition, it is also possible to combine LIST partitions with HASH and KEY partitions to generate a composite subpartition.

HASH partition

The partitions are selected based on the return value of a user-defined expression that is evaluated using column values ​​for the rows to be inserted into the table. This function can contain any expression valid in MySQL that produces a non-negative integer value.

To partition a table using HASH partitioning, add a "PARTITION BY HASH (expr)" clause to the CREATE TABLE statement, where "expr" is an expression that returns an integer. It can just be the name of a column whose field type is a MySQL integer. Additionally, you will most likely need to add a "PARTITIONS num" clause after it, where num is a non-negative integer representing the number of partitions the table will be split into.

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT,
  store_id INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;

If you do not include a PARTITIONS clause, the number of partitions will default to 1. Exception: For NDB Cluster tables, the default number of partitions will be the same as the number of cluster data nodes, with this correction likely to take into account any MAX_ROWS setting in order to ensure that all rows can fit into the partitions.

LINER HASH

MySQL also supports linear hashing, which differs from regular hashing in that linear hashing uses a linear powers-of-two algorithm, whereas regular hashing uses the modulus of the hash function value. The only difference in syntax between linear hash partitioning and regular hash partitioning is the addition of the "LINEAR" keyword in the "PARTITION BY" clause.

Sql code:

CREATE TABLE employees (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT,
  store_id INT
)
PARTITION BY LINEAR HASH(YEAR(hired))
PARTITIONS 4;

Suppose an expression expr, when using the linear hash function, the partition to which the record will be saved is partition N of num partitions, where N is obtained according to the following algorithm: 1. Find the next power of 2 greater than num. We call this value V, which can be obtained by the following formula: 2. V = POWER(2, CEILING(LOG(2, num))) (For example, suppose num is 13. Then LOG(2,13) ​​is 3.7004397181411. CEILING(3.7004397181411) is 4, then V = POWER(2,4), which is equal to 16). 3. Set N = F(column_list) & (V – 1). 4. When N >= num: Set V = CEIL(V / 2) Set N = N & (V – 1) For example, suppose table t1, using linear hash partitioning and with 4 partitions, is created by the following statement: CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE) PARTITION BY LINEAR HASH( YEAR(col3) ) PARTITIONS 6; Now suppose that you want to insert two rows into table t1, one record with the col3 column value of '2003-04-14' and the other record with the col3 column value of '1998-10-19'. The partition to which the first record will be saved is determined as follows: V = POWER(2, CEILING(LOG(2,7))) = 8 N = YEAR('2003-04-14′) & (8 – 1) = 2003 & 7 = 3 (3 >= 6 is false: the record will be saved in partition #3) The partition number to which the second record will be saved is calculated as follows: V = 8 N = YEAR('1998-10-19′) & (8-1) = 1998 & 7 = 6 (6 >= 4 is true: additional steps are required) N = 6 & CEILING(5 / 2) = 6 & 3 = 2 (2 >= 4 is false: the record will be saved in partition #2) The advantage of linear hash partitioning is that adding, deleting, merging, and splitting partitions becomes faster, which is beneficial for processing tables with extremely large amounts of data (1000s of gigabytes). Its disadvantage is that the distribution of data between partitions is unlikely to be balanced compared to the data distribution obtained using conventional HASH partitioning.

KEY partition

Similar to HASH partitioning, the difference is that KEY partitioning only supports calculation of one or more columns, and the MySQL server provides its own hash function. One or more columns must contain integer values.

Sql code:

CREATE TABLE tk (
  col1 INT NOT NULL,
  col2 CHAR(5),
  col3 DATE
)
PARTITION BY LINEAR KEY (col1)
PARTITIONS 3;

Using the keyword LINEAR in KEY partitioning has the same effect as using it in HASH partitioning. The partition number is obtained by the powers-of-two algorithm instead of the modulus algorithm.

Summarize

The above is all the content of this article about the precautions for MySQL database table partitioning. Interested friends can refer to: How to delete MySQL table data, several important MySQL variables, MYSQL subquery and nested query optimization example analysis, etc. I hope it will be helpful to everyone. If you have any questions, please leave a message. Everyone is welcome to communicate and discuss.

You may also be interested in:
  • Detailed explanation of the use and underlying principles of MySQL table partitions
  • MySQL data table partitioning strategy and advantages and disadvantages analysis
  • MySQL advanced features - detailed explanation of the concept and mechanism of data table partitioning
  • Detailed analysis of table partitioning technology in MySQL
  • A Brief Analysis of MySQL Data Table Partition Technology
  • Detailed explanation of MySQL table partitioning
  • How to create mysql table partitions
  • Understanding MySQL table partitioning in one article

<<:  A brief discussion on Nginx10m+ high concurrency kernel optimization

>>:  Use JavaScript to create page effects

Recommend

Summary of MySQL database and table sharding

During project development, our database data is ...

Sample code for implementing DIV suspension with pure CSS (fixed position)

The DIV floating effect (fixed position) is imple...

Native js to achieve accordion effect

In actual web page development, accordions also a...

Implementation of MySQL custom list sorting by specified field

Problem Description As we all know, the SQL to so...

MySQL 8.0.21 free installation version configuration method graphic tutorial

Six steps to install MySQL (only the installation...

Example sharing of anchor tag usage in HTML

Anchor tag usage: Linking to a specific location i...

Simple usage example of vue recursive component

Preface I believe many students are already famil...

SQL statements in Mysql do not use indexes

MySQL query not using index aggregation As we all...

Detailed explanation of Json format

Table of contents A JSON is built on two structur...

How to choose transaction isolation level in MySQL project

introduction Let's start with our content. I ...

How to remove the blue box that appears when the image is used as a hyperlink

I recently used Dreamweaver to make a product pres...

How to implement responsive layout with CSS

Implementing responsive layout with CSS Responsiv...

The docker-maven-plugin plugin cannot pull the corresponding jar package

When using the docker-maven-plugin plug-in, Maven...

Pure CSS3 realizes the effect of div entering and exiting in order

This article mainly introduces the effect of div ...