Correct use of MySQL partition tables

Overview of MySQL Partitioned Tables

We often encounter a table that stores hundreds of millions or even billions of records. These tables store a large amount of historical records. Cleaning up these historical data is a headache because all the data is in a common table. Therefore, you can only enable one or more delete statements with where conditions to delete (usually the where condition is time). This puts a lot of pressure on the database. Even though we removed these, the underlying data files did not get any smaller. Facing this kind of problem, the most effective way is to use the partition table. The most common partitioning method is to partition by time.

One of the biggest advantages of partitioning is that it can clean up historical data very efficiently.

1. Confirm whether the MySQL server supports partition tables

Order:

show plugins;

2. Characteristics of MySQL partition tables

Logically it is one table, but physically it is stored in multiple files.

HASH partition (HASH)

Characteristics of HASH partition

Store data rows in different partitions of the table according to the value of MOD (partition key, number of partitions)
Data can be evenly distributed across partitions
The key value of the HASH partition must be an INT type value, or can be converted to INT type through a function

How to create a HASH partition table

Use the INT type field customer_id as the partition key

CREATE TABLE `customer_login_log` (
 `customer_id` int(10) unsigned NOT NULL COMMENT 'Login user ID',
 `login_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'User login time',
 `login_ip` int(10) unsigned NOT NULL COMMENT 'Login IP',
 `login_type` tinyint(4) NOT NULL COMMENT 'Login type: 0 unsuccessful 1 successful'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User login log table'

PARTITION BY HASH(customer_id) PARTITIONS 4;

Use the non-INT type field login_time as the partition key (need to be converted to INT type first)

CREATE TABLE `customer_login_log` (
 `customer_id` int(10) unsigned NOT NULL COMMENT 'Login user ID',
 `login_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'User login time',
 `login_ip` int(10) unsigned NOT NULL COMMENT 'Login IP',
 `login_type` tinyint(4) NOT NULL COMMENT 'Login type: 0 unsuccessful 1 successful'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User login log table'

PARTITION BY HASH(UNIX_TIMESTAMP(login_time)) PARTITIONS 4;

If the customer_login_log table is not partitioned, the file on the physical disk is

customer_login_log.frm # stores the original data information of the table customer_login_log.ibd # Innodb data file

If you build a HASH partition table as above, there are five files

customer_login_log.frm 
customer_login_log#P#p0.ibd
customer_login_log#P#p1.ibd
customer_login_log#P#p2.ibd
customer_login_log#P#p3.ibd

Demo

The usage is the same as without partition. It seems that there is only one database, but there are actually multiple partition files. For example, if we want to insert a piece of data, we don’t need to specify the partition. MySQL will automatically handle it for us.

Query

Range partitioning (RANGE)

RANGE partition characteristics

Store data rows into different partitions of the table based on the range of partition key values
The ranges of multiple partitions must be continuous but cannot overlap.
By default, the VALUES LESS THAN attribute is used, that is, each partition does not include the specified value.

How to create RANGE partition

If the p3 partition is not defined, an error will be reported when the customer_id inserted is greater than 29999. If it is defined, all data exceeding this limit will be stored in p3.

Applicable scenarios for RANGE partitioning

The partition key is of date or time type (this can make the data of each partition table more balanced. If the integer id is used as the partition key in the above example, and if the number of active users is concentrated between 10000 and 19999, the amount of data in p1 will be much larger than that in other partitions, which makes the partition meaningless. And if the partition is partitioned by time type, if you want to archive data in chronological order, you only need to archive one partition)
Include the partition key in all queries (avoid cross-partition queries)
Periodically clean up historical data by partition range

LIST partition

Features of LIST partition

Partition by a list of partition key values
As with range partitioning, the list values of each partition cannot be repeated
Each row of data must be able to find the corresponding partition list, otherwise the data insertion will fail.

How to create a LIST partition

If you insert a row with login_type 10, an error will be reported.

3. How to partition the login log table (customer_login_log)

Business Scenario

Every time a user logs in, the customer_login_log log is recorded
User login logs are kept for one year and can be deleted or archived after one year

Partition type and partition key of the login log table

Using RANGE partitioning
Use login_time as the partition key

Partitioned user login log table

The data is stored in partitions by year, so the YEAR function is used for conversion.

CREATE TABLE `customer_login_log` (
 `customer_id` int(10) unsigned NOT NULL COMMENT 'Login user ID',
 `login_time` DATETIME NOT NULL COMMENT 'User login time',
 `login_ip` int(10) unsigned NOT NULL COMMENT 'Login IP',
 `login_type` tinyint(4) NOT NULL COMMENT 'Login type: 0 unsuccessful 1 successful'
) ENGINE=InnoDB 
PARTITION BY RANGE (YEAR(login_time))(
PARTITION p0 VALUES LESS THAN (2017),
PARTITION p1 VALUES LESS THAN (2018),
PARTITION p2 VALUES LESS THAN (2019)
)

Insert and query data

Query the partition data in the specified table

SELECT table_name,partition_name,partition_description,table_rows FROM
information_schema.`PARTITIONS` WHERE table_name = 'customer_login_log';

Insert two more 18-year logs and store them in the p2 table

As mentioned before, when creating a partition table, it is best to create a MAXVALUE partition. The reason why it is not created here is for the convenience of data maintenance. If we create a MAXVALUE partition, it is easy to overlook a problem. When we insert some data in 2019, it will be automatically stored in that MAXVALUE partition, which will be inconvenient when doing data maintenance later. Therefore, the MAXVALUE partition is not created.

Instead, we add this partition at the end of each year through the planned task. For example, at the end of 2018, we need to create a log partition for 2019 in the log table, otherwise the logs for 2019 will fail to be inserted.

We can use the following statement

Adding a partition

ALTER TABLE customer_login_log ADD PARTITION (PARTITION p3 VALUES LESS THAN(2020))

Add partitions and insert data

Deleting a Partition

If we want to delete the data from 2016 to 2017, since we have already partitioned the data, we only need to delete the p0 partition with one statement.

ALTER TABLE customer_login_log DROP PARTITION p0;

It can be found that the p0 partition has been deleted and all the logs in 2016 have been cleared.

Archive partition historical data

We may have another need to archive data

MySQL version>=5.7, archiving partition historical data is very convenient, and a method for exchanging partitions is provided

Partition data archive migration conditions:

MySQL>=5.7
Same structure
The archived data table must be a non-partitioned table
Non-temporary table; cannot have foreign key constraints
The archive engine should be: archive

Create a table and exchange partitions

CREATE TABLE `arch_customer_login_log` (
 `customer_id` INT unsigned NOT NULL COMMENT 'Login user ID',
 `login_time` DATETIME NOT NULL COMMENT 'User login time',
 `login_ip` INT unsigned NOT NULL COMMENT 'Login IP',
 `login_type` TINYINT NOT NULL COMMENT 'Login type: 0 unsuccessful 1 successful'
) ENGINE=InnoDB ;

ALTER TABLE customer_login_log 
 exchange PARTITION p1 WITH TABLE arch_customer_login_log;

It can be found that the 2017 data in the original customer_login_log table (the data in the p1 partition) has been transferred to the arch_customer_login_log table, but the p1 partition has not been deleted, only the data has been transferred, so we also need to execute the DROP command to delete the partition to prevent data from being inserted into it.

Change the storage engine of archived data to the archive engine

Finally, we change the storage engine of the archived data to the archive engine. The command is

ALTER TABLE customer_login_log ENGINE=ARCHIVE;

The advantage of using the archive engine is that it takes up less space than Innodb, but the archive engine can only perform query operations, not write operations.

4. Main points of using partition table

Select partition keys based on business scenarios to avoid cross-partition queries
When querying a partitioned table, it is best to include the partition key in the WHERE clause.
For a table with a primary key or unique index, the primary key or unique index must be part of the partition key (this is why we removed the primary key login log id (login_id) when partitioning above, otherwise it would not be possible to partition by year as above, so the partition table is actually more suitable for the MyISAM engine)

Differences between MyISAM and Innodb indexes

1. About automatic growth

The auto-increment column of the MyISAM engine must be an index. If it is a composite index, the auto-increment column does not have to be the first column. It can be incremented after sorting the previous columns.

The automatic growth column of the InnoDB engine must be an index. If it is a composite index, it must also be the first column of the composite index.

2. About primary key

MyISAM allows tables without any indexes or primary keys to exist.

MyISAM indexes store row addresses.

If the innodb engine does not set a primary key or a non-empty unique index, it will automatically generate a 6-byte primary key (invisible to the user)

Innodb data is part of the primary index, and the additional index stores the value of the primary index.

3. About count() function

MyISAM stores the total number of rows in a table. If you select count(*) from table;, you will directly retrieve the value.

Innodb does not save the total number of rows in the table. If you use select count(*) from table;, the entire table will be traversed, which is quite consuming. However, after adding the wehre condition, MyISAM and Innodb handle it in the same way.

4. Full-text indexing

MyISAM supports full-text index of FULLTEXT type

InnoDB does not support full-text indexing of the FULLTEXT type, but InnoDB can use the sphinx plug-in to support full-text indexing, and the effect is better. (Sphinx is an open source software that provides API interfaces in multiple languages and can optimize various MySQL queries)

5.delete from table

When using this command, InnoDB will not recreate the table, but delete the data one by one. If you want to clear a table with a large amount of data on InnoDB, it is best not to use this command. (It is recommended to use truncate table, but the user needs to have the permission to drop this table)

6. Index storage location

MyISAM indexes are saved separately as table name + .MYI file.

Innodb indexes are stored together with data in the table space.

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. If you have any questions, you can leave a message to communicate. Thank you for your support for 123WORDPRESS.COM.

You may also be interested in: