MySQL Best Practices: Basic Types of Partition Tables

MySQL Best Practices: Basic Types of Partition Tables

Overview of MySQL Partitioned Tables

As MySQL becomes more and more popular, the data stored in MySQL is getting larger and larger. In our daily work, we often encounter a table that stores hundreds of millions or even billions of records. These tables store a large amount of historical records. Cleaning up these historical data is a headache because all the data is in a common table. Therefore, you can only enable one or more delete statements with where conditions to delete (usually the where condition is time). This puts a lot of pressure on the database. Even though we removed these, the underlying data files did not get any smaller. Facing this kind of problem, the most effective way is to use the partition table. The most common partitioning method is to partition by time. One of the biggest advantages of partitioning is that it can clean up historical data very efficiently.

Partition Type

Currently, MySQL supports four types of partitioning: range partitioning (RANGE), list partitioning (LIST), hash partitioning (HASH), and KEY partitioning. Let's take a look at each partition type:

RANGE partitioning

Assigns multiple rows to partitions based on column values ​​that fall within a given contiguous interval. The most common one is based on the time field. The partition-based column is preferably an integer. If it is a date type, a function can be used to convert it to an integer. In this example, the to_days function is used

CREATE TABLE my_range_datetime(
 id INT,
 hiredate DATETIME
) 
PARTITION BY RANGE (TO_DAYS(hiredate) ) (
 PARTITION p1 VALUES LESS THAN ( TO_DAYS('20171202') ),
 PARTITION p2 VALUES LESS THAN ( TO_DAYS('20171203') ),
 PARTITION p3 VALUES LESS THAN ( TO_DAYS('20171204') ),
 PARTITION p4 VALUES LESS THAN ( TO_DAYS('20171205') ),
 PARTITION p5 VALUES LESS THAN ( TO_DAYS('20171206') ),
 PARTITION p6 VALUES LESS THAN ( TO_DAYS('20171207') ),
 PARTITION p7 VALUES LESS THAN ( TO_DAYS('20171208') ),
 PARTITION p8 VALUES LESS THAN ( TO_DAYS('20171209') ),
 PARTITION p9 VALUES LESS THAN ( TO_DAYS('20171210') ),
 PARTITION p10 VALUES LESS THAN ( TO_DAYS('20171211') ),
 PARTITION p11 VALUES LESS THAN (MAXVALUE) 
);

p11 is a default partition, and all records greater than 20171211 will be in this partition. MAXVALUE is an infinite value. p11 is an optional partition. If this partition is not specified when defining the table, we will receive an error when inserting data greater than 20171211.

When we execute a query, we must include the partition field. This allows you to use the partition trimming feature

mysql> insert into my_range_datetime select * from test;                                  
Query OK, 1000000 rows affected (8.15 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

mysql> explain partitions select * from my_range_datetime where hiredate >= '20171207124503' and hiredate<='20171210111230'; 
+----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | my_range_datetime | p7,p8,p9,p10 | ALL | NULL | NULL | NULL | NULL | 400061 | Using where |
+----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.03 sec)

Note the content of partitions in the execution plan. Only three partitions, p7, p8, p9, and p10, are queried. From this, it can be seen that partition pruning can indeed be achieved using the to_days function.

The above is based on datetime. If it is timestamp type, what if we encounter the above problem?

In fact, MySQL provides a RANGE partitioning scheme based on the UNIX_TIMESTAMP function. In addition, only the UNIX_TIMESTAMP function can be used. If other functions, such as to_days, are used, the following error will be reported: "ERROR 1486 (HY000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed".

And the official documentation also mentions "Any other expressions involving TIMESTAMP values ​​are not permitted. (See Bug #42849.)".

Next, we will test the RANGE partitioning scheme based on the UNIX_TIMESTAMP function to see if it can achieve partition pruning.

Partitioning scheme for TIMESTAMP

The table creation statement is as follows:

CREATE TABLE my_range_timestamp (
  id INT,
  hiredate TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP ( hiredate ) ) (
  PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-02 00:00:00') ),
  PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-03 00:00:00') ),
  PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-04 00:00:00') ),
  PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-05 00:00:00') ),
  PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-06 00:00:00') ),
  PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-07 00:00:00') ),
  PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-08 00:00:00') ),
  PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-09 00:00:00') ),
  PARTITION p9 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-10 00:00:00') ),
  PARTITION p10 VALUES LESS THAN (UNIX_TIMESTAMP('2017-12-11 00:00:00') )
);

Insert data and view the execution plan of the above query

mysql> insert into my_range_timestamp select * from test;
Query OK, 1000000 rows affected (13.25 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

mysql> explain partitions select * from my_range_timestamp where hiredate >= '20171207124503' and hiredate<='20171210111230';
+----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | my_range_timestamp | p7,p8,p9,p10 | ALL | NULL | NULL | NULL | NULL | 400448 | Using where |
+----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

Partition pruning can also be achieved.

Before version 5.7, for columns of DATA and DATETIME types, if you want to implement partition pruning, you can only use the YEAR() and TO_DAYS() functions. In version 5.7, the TO_SECONDS() function is added.

LIST Partition

LIST partition

LIST partitioning is similar to RANGE partitioning, the difference is that LIST is a collection of enumeration value lists, and RANGE is a collection of continuous interval values. The two are very similar in syntax. It is also recommended that the LIST partition column is a non-null column, otherwise the insertion of a null value will fail if there is no null value in the enumeration list. This is different from other partitions. RANGE partitions will store it as the minimum partition value, and HASH\KEY partitions will convert it to 0 for storage. The main reason is that LIST partitions only support integers, and non-integer fields need to be converted to integers through functions.

create table t_list( 
  a int(11), 
  b int(11) 
  )(partition by list (b) 
  partition p0 values ​​in (1,3,5,7,9), 
  partition p1 values ​​in (2,4,6,8,0) 
  );

Hash Partitioning

We often encounter tables like membership tables in our actual work. There is no obvious feature field for partitioning. But the table data is very large. In order to partition this type of data, MySQL provides hash partitioning. Based on the given number of partitions, data is allocated to different partitions. HASH partitioning can only perform HASH on integers. Non-integer fields can only be converted into integers through expressions. The expression can be any valid function or expression in MySQL. For non-integer HASH, there will be an extra step of expression calculation when inserting data into the table. Therefore, it is not recommended to use complex expressions as this will affect performance.

The basic statements for the Hash partition table are as follows:

CREATE TABLE my_member (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  created DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT,
  store_id INT
)
PARTITION BY HASH(id)
PARTITIONS 4;

Notice:

  1. HASH partitioning does not require the PARTITIONS clause to be specified. For example, if PARTITIONS 4 is specified in the above text, the default number of partitions is 1.
  2. Writing PARTITIONS without specifying the number of partitions is not allowed.
  3. As with RANGE partitioning and LIST partitioning, the expr in the PARTITION BY HASH (expr) clause must return an integer value.
  4. The underlying implementation of HASH partitioning is actually based on the MOD function. For example, for the following table

CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE) PARTITION BY HASH( YEAR(col3) ) PARTITIONS 4; If you want to insert a record with col3 as "2017-09-15", the partition selection is determined by the following values:

MOD(YEAR('2017-09-01'),4) = MOD(2017,4) = 1

LINEAR HASH Partitioning

LINEAR HASH partitioning is a special type of HASH partitioning. Unlike HASH partitioning which is based on the MOD function, LINEAR HASH partitioning is based on another algorithm.

The format is as follows:

CREATE TABLE my_members (
  id INT NOT NULL,
  fname VARCHAR(30),
  lname VARCHAR(30),
  hired DATE NOT NULL DEFAULT '1970-01-01',
  separated DATE NOT NULL DEFAULT '9999-12-31',
  job_code INT,
  store_id INT
)
PARTITION BY LINEAR HASH( id )
PARTITIONS 4;

Note: Its advantage is that in scenarios with large amounts of data, such as TB level, adding, deleting, merging and splitting partitions will be faster. Its disadvantage is that compared with HASH partitions, it is more likely to have uneven data distribution.

KEY partition

KEY partitioning is actually similar to HASH partitioning, with the following differences:

  1. KEY partitioning allows multiple columns, while HASH partitioning allows only one column.
  2. If there is a primary key or unique key, the partition column in the key can be left unspecified. The default is the primary key or unique key. If not, the column must be explicitly specified.
  3. The KEY partition object must be a column, not an expression based on a column.
  4. The algorithms for KEY partitioning and HASH partitioning are different. In PARTITION BY HASH (expr), the MOD value is the value returned by expr, while in PARTITION BY KEY (column_list), it is based on the MD5 value of the column.

The format is as follows:

CREATE TABLE k1 (
  id INT NOT NULL PRIMARY KEY,  
  name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;

In the absence of a primary key or unique key, the format is as follows:

CREATE TABLE tm1 (
  s1 CHAR(32)
)
PARTITION BY KEY(s1)
PARTITIONS 10;

Summarize:

If there is a primary key or unique key in MySQL partitioning, the partitioning column must be included in it.

For native RANGE partitions, LIST partitions, and HASH partitions, the partition object can only return an integer value.

The partition field cannot be NULL, otherwise how to determine the partition range, so try to use NOT NULL

This is the end of this article about MySQL best practices and basic types of partitioned tables. For more information about basic types of MySQL partitioned tables, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MySQL partition table is classified by month
  • MySQL partitions existing tables in the data table
  • A performance bug about MySQL partition tables
  • Detailed explanation of the difference between Mysql temporary table and partition table
  • Detailed explanation of MySQL partition table
  • Why must the partition key in a MySQL partition table be part of the primary key?

<<:  Vue implements Tab tab switching

>>:  How to open port 8080 on Alibaba Cloud ECS server

Recommend

Javascript common higher-order functions details

Table of contents 1. Common higher-order function...

Implementation of webpack code fragmentation

Table of contents background CommonsChunkPlugin s...

Installation of Docker CE on Ubuntu

This article is used to record the installation o...

A brief discussion on the differences between FTP, FTPS and SFTP

Table of contents Introduction to FTP, FTPS and S...

How to delete a MySQL table

It is very easy to delete a table in MySQL, but y...

JavaScript to dynamically load and delete tables

This article shares the specific code of JavaScri...

Vue implements zip file download

This article example shares the specific code of ...

Build Maven projects faster in Docker

Table of contents I. Overview 2. Conventional mul...

Detailed explanation of the problems and solutions caused by floating elements

1. Problem Multiple floating elements cannot expa...

MySQL 8.0.20 Installation Tutorial with Pictures and Text (Windows 64-bit)

1: Download from mysql official website https://d...

How to use cc.follow for camera tracking in CocosCreator

Cocos Creator version: 2.3.4 Demo download: https...