MySQL compression usage scenarios and solutions

Introduction

This article describes the use cases and solutions for MySQL compression, covering the compressed transport protocol, column compression, and table compression.

When it comes to MySQL compression, the following compression-related scenarios usually come to mind:

1. The amount of data transmitted between the client and the server is large, and it needs to be compressed to save bandwidth.

2. A particular column in MySQL holds a large amount of data, and only that column needs to be compressed.

3. One or several MySQL tables hold too much data, and the table data needs to be compressed to reduce disk space usage.

All of these problems have good solutions on the MySQL side. The first can be solved with the MySQL compression protocol; the second can be solved neatly with the MySQL compression and decompression functions; and the third, the most complex, can be solved at the storage engine level: engines such as MyISAM, InnoDB, TokuDB, and MyRocks all support table compression. This article discusses these MySQL compression mechanisms in detail. The main contents are as follows:

1. Introduction to MySQL Compression Protocol

1. Applicable scenarios

The MySQL compression protocol is suitable for scenarios where the amount of data transmitted between the MySQL server and client is large, or the available bandwidth is low. Typical scenarios are as follows:

a. Insufficient bandwidth when querying large amounts of data (for example, when exporting data);

b. During replication, the binlog volume is too large; enable the slave_compressed_protocol parameter to replicate the logs in compressed form.

2. Introduction to Compression Protocol

The compression protocol is part of the MySQL client/server protocol. To transmit data compressed, both the MySQL server and the client must support the zlib algorithm, and enabling the protocol causes a slight increase in CPU load. To enable compression on the client, use the -C or --compress=true option. When either is set, the client sends the capability flag 0x0020 (CLIENT_COMPRESS) during the connection handshake, and once the server agrees in the negotiation, the compression protocol takes effect. Because of compression, the packet format changes. The specific changes are as follows:

Uncompressed packet format (the standard MySQL packet):

3 bytes: payload length | 1 byte: sequence id | n bytes: payload (request type + request content)

The compressed data packet format is:

3 bytes: compressed payload length | 1 byte: sequence id | 3 bytes: payload length before compression | n bytes: payload

You may have noticed that the compressed packet format allows both compressed and uncompressed payloads. This is an optimization MySQL makes to reduce CPU overhead: payloads smaller than 50 bytes are sent uncompressed, and payloads of 50 bytes or more are compressed. The specific rules are as follows:

When the third field (the payload length before compression) is 0x00, the current payload is not compressed, so the n bytes of payload can be read directly as 1 byte of request type followed by the request content.

When the third field is greater than 0x00, the payload has been compressed with zlib and must be decompressed before use; the decompressed content is again 1 byte of request type followed by the request content.

3. Solution practice

Add the -C or --compress=true option when the client connects. To enable compressed transmission for replication, configure slave_compressed_protocol=1. The following are examples of connecting to a MySQL server using the compressed protocol:

mysql -h hostip -uroot -p --compress

mysqldump -h hostip -uroot -p --default-character-set=utf8 --compress --single-transaction dbname tablename > tablename.sql

If you need to enable compressed transmission for master-slave replication, just enable the slave_compressed_protocol=1 parameter on the slave.
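To verify that a session is actually using the compressed protocol, the client's status output can be checked (a quick sketch; the exact output wording varies by client version):

mysql -h hostip -uroot -p --compress
mysql> \s          # the status output includes a "Protocol: Compressed" line when compression is active
mysql> SHOW VARIABLES LIKE 'slave_compressed_protocol';   # on a slave, 'ON' means compressed replication is enabled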

4. Compression effect

You can observe the effect of compressed transmission by using the --compress option with mysqldump, or by enabling the slave_compressed_protocol parameter in master-slave replication. The effect is easy to reproduce, so no screenshots are shown here.
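If you do want to quantify it, a rough comparison is to time the same dump over the network with and without the option (a sketch; the difference only shows when bandwidth, not the server, is the bottleneck):

time mysqldump -h hostip -uroot -p dbname tablename > /dev/null
time mysqldump -h hostip -uroot -p --compress dbname tablename > /dev/null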

2. MySQL column compression solution

Currently, MySQL has no built-in way to compress individual columns directly (Tencent's TMySQL can compress columns directly). Here we mainly introduce a workaround: use the compression and decompression functions provided by MySQL at the application level. That is, to compress a column, call the COMPRESS function when writing so that the column's content is stored compressed, and call the UNCOMPRESS function when reading to decompress the compressed content.

1. Applicable scenarios

The amount of data in one or several columns in MySQL is particularly large, usually data types such as varchar, text, and char.

2. Introduction to compression function

The MySQL compression function COMPRESS compresses a string and returns a binary string. Using this function requires a MySQL server built with compression support; otherwise it returns NULL. A compressed value is best stored in a varbinary or blob column. Use the UNCOMPRESS function to decompress compressed data. Note that this approach requires small modifications on the application side. The compressed content is stored as follows (a quick verification follows the list):

a. Empty strings are stored as empty strings

b. Non-empty strings are stored as a 4-byte little-endian length of the uncompressed string, followed by the compressed string.
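This layout is easy to verify with the functions themselves (a minimal sketch; the exact compressed length depends on the zlib version):

SELECT LENGTH(COMPRESS(''));                              -- 0: an empty string is stored as an empty string
SELECT LENGTH(COMPRESS(REPEAT('a',1000)));                -- far less than 1000: 4-byte length header + compressed bytes
SELECT UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('a',1000)));   -- 1000: read back from the 4-byte header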

3. Solution practice

Several related functions are involved in the column compression scheme:

COMPRESS() - compression function
UNCOMPRESS() - decompression function
LENGTH() - string length function (length of the stored, compressed value)
UNCOMPRESSED_LENGTH() - length of the string before compression

Practical steps:

a. Create a test table

CREATE TABLE IF NOT EXISTS `test`.`test_compress` (
  `id` int unsigned NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `content` blob NOT NULL COMMENT 'content column',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Compress test table';

b. Insert compressed data into the test table

insert into `test`.`test_compress`(content) values(COMPRESS(REPEAT('a',1000)));

c. Read compressed data

select UNCOMPRESS(content) from `test`.`test_compress`;

d. Query the corresponding length and content

SELECT UNCOMPRESSED_LENGTH(content) AS length, LENGTH(content) AS compress_length, UNCOMPRESS(content), content FROM `test`.`test_compress`;

4. Compression effect

From the query output above, we can see that the compression effect is quite good. For text, char, varchar, blob, and similar columns, the more repetitive the data, the better the compression.
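A quick way to see how much the data pattern matters (a sketch; RANDOM_BYTES() requires MySQL 5.6.17 or later, and exact lengths will vary):

SELECT LENGTH(COMPRESS(REPEAT('ab',500))) AS repetitive_len,    -- tiny: highly repetitive data compresses very well
       LENGTH(COMPRESS(RANDOM_BYTES(1000))) AS random_len;      -- close to (or above) 1004: random data barely compresses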

3. InnoDB table compression solution

1. Applicable scenarios

Compressed tables are generally used when the data volume is very large, disk space is tight, the load is mainly IO-bound, and the server has spare CPU capacity.

2. Introduction to table compression

a. Why is compression needed?

Currently, many storage engines support compression, such as MyISAM, InnoDB, TokuDB, and MyRocks. Since InnoDB compression requires no application changes, is completely transparent to the online environment, and is a very mature solution, only InnoDB is described in detail here. The compression schemes of TokuDB and MyRocks will be described in MySQL Compression Scheme (Part 2).

Before SSDs were widely used, databases were almost always IO-bound: the CPU had plenty of spare capacity while disk IO was the prominent bottleneck. Storing large amounts of data, especially log data and monitoring data, makes disk usage grow rapidly, and many businesses run short of disk space. A better approach emerged: sacrifice a small amount of CPU to compress the data, reducing disk space usage and optimizing IO and bandwidth. This is especially effective for read-heavy, write-light workloads.

After SSDs appeared, database IO load dropped, but the disk space problem was still not well solved, so compressed tables remain widely used; this is why so many engines support compression. InnoDB has supported compression since MySQL 5.5, but its compression ratio is relatively low, usually around 50%. TokuDB can reach about 80%, and MyRocks about 70%.

Note: The compression ratio has a lot to do with the composition of the data you store. Not all data can achieve the compression ratio mentioned above. If most of the data is strings and there is a lot of repeated data, the compression ratio will be very good.

b. Introduction to InnoDB Compression

The prerequisites for using InnoDB compression are that the innodb_file_per_table parameter must be enabled and the innodb_file_format parameter must be set to Barracuda.

You can enable compression for an InnoDB table by creating or altering it with ROW_FORMAT=COMPRESSED. If KEY_BLOCK_SIZE is not specified, it defaults to half of innodb_page_size. You can also enable compression by specifying KEY_BLOCK_SIZE=n, where n is 1, 2, 4, 8, or 16 (in KB). The smaller n is, the higher the compression ratio and the more CPU consumed. Note that compression is not supported for 32KB or 64KB pages. When compression is enabled, index data is compressed as well.

You can also adjust the compression level via innodb_compression_level, from 1 to 9, with 6 as the default. The higher the level, the higher the compression ratio, but also the more CPU resources required.
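The compression level can be changed at runtime, which makes it easy to experiment (a sketch, assuming a MySQL 5.6+ server where innodb_compression_level is a dynamic global variable):

SET GLOBAL innodb_compression_level = 9;   -- best ratio, most CPU
SET GLOBAL innodb_compression_level = 1;   -- fastest, lowest ratio
SHOW VARIABLES LIKE 'innodb_compression_level';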

c. Compression algorithm

InnoDB compression relies on the well-known zlib library, which implements the LZ77 compression algorithm; it is mature and efficient both in reducing data size and in CPU usage. The algorithm is also lossless, so the original uncompressed data can always be reconstructed from the compressed form. LZ77 works by finding repeated sequences in the data and replacing them with references to earlier occurrences, so the data pattern determines the compression efficiency. Generally speaking, user data can be compressed by more than 50%.

d. How to handle compressed tables in buffer_pool

In the buffer pool, compressed data is kept in pages of KEY_BLOCK_SIZE. To read compressed data, or to update columns stored in a compressed page, an uncompressed page is created and the data is decompressed into it; after an update completes, the data is rewritten back to the compressed page. When memory is insufficient, MySQL evicts the uncompressed copies. So with compression enabled, your buffer pool may contain both compressed and uncompressed pages, or only compressed pages; even so, you may need to enlarge the buffer pool so that it can hold both.
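The mix of compressed and uncompressed pages can be observed directly (a sketch; note that querying INNODB_BUFFER_PAGE is itself expensive and should be avoided on busy production servers):

SELECT IF(compressed_size IS NULL OR compressed_size = 0, 'uncompressed', 'compressed') AS page_kind,
       COUNT(*) AS pages
FROM information_schema.INNODB_BUFFER_PAGE
GROUP BY page_kind;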

MySQL uses a least-recently-used (LRU) algorithm to decide which pages to keep in memory and which to evict, so hot data tends to stay in memory. When compressed tables are accessed, MySQL uses an adaptive LRU algorithm to balance compressed and uncompressed pages in memory. When the system is IO-bound, the algorithm prefers to evict uncompressed pages, freeing space to cache more compressed pages. When the system is CPU-bound, MySQL prefers to evict both the compressed and uncompressed copies, leaving more memory for hot data and reducing decompression operations.

e. How to evaluate whether KEY_BLOCK_SIZE is appropriate

To better understand the impact of compressed tables on performance, the Information Schema provides tables for evaluating metrics such as memory usage and compression rate. INNODB_CMP collects overall status per KEY_BLOCK_SIZE, aggregated across all compressed tables of that page size. INNODB_CMP_PER_INDEX collects compression information per table and per index, which helps evaluate the compression efficiency of a table at a point in time or diagnose performance problems. However, collecting INNODB_CMP_PER_INDEX data affects system performance and must be enabled via the innodb_cmp_per_index_enabled option; it is best not to enable it in a production environment.

We can watch for compression failures in the INNODB_CMP table; if failures are frequent, KEY_BLOCK_SIZE needs to be increased. KEY_BLOCK_SIZE=8 is generally recommended.
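A sketch of the kind of check this implies, using the INNODB_CMP columns (compress_ops counts compression attempts, compress_ops_ok counts those that succeeded without a page reorganization):

SELECT page_size,
       compress_ops,
       compress_ops_ok,
       ROUND(100 * compress_ops_ok / NULLIF(compress_ops, 0), 2) AS success_pct
FROM information_schema.INNODB_CMP;
-- a success_pct well below 100% suggests KEY_BLOCK_SIZE is too small for this data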

3. Solution practice

a. Set the innodb_file_per_table and innodb_file_format parameters

SET GLOBAL innodb_file_per_table=1;
SET GLOBAL innodb_file_format=Barracuda;

b. Create the corresponding compression table

CREATE TABLE compress_test (c1 INT PRIMARY KEY, content varchar(255)) ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

If the table already exists, modify it with ALTER TABLE. The SQL is as follows:

ALTER TABLE compress_test ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
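To confirm that the change took effect, the table metadata can be checked (a sketch):

SELECT table_name, row_format, create_options
FROM information_schema.TABLES
WHERE table_name = 'compress_test';
-- row_format should show 'Compressed' and create_options should include key_block_size=8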

4. Compression effect

The compression effect was illustrated by converting an online monitoring table to the compressed format and comparing the data file size before and after compression.
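That comparison can be reproduced by checking the .ibd file size on disk before and after the ALTER (a sketch; the path, schema, and table name below are placeholders that depend on your datadir and database):

ls -lh /var/lib/mysql/dbname/monitor_table.ibd   # run before and after ALTER TABLE ... ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8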
