Optimize the storage efficiency of BLOB and TEXT columns in InnoDB tables

Optimize the storage efficiency of BLOB and TEXT columns in InnoDB tables

First, let's introduce a few key points about the MySQL InnoDB engine storage format:

1. InnoDB can choose to use a shared table space or an independent table space. It is recommended to use an independent table space for easy management and maintenance. Enable the innodb_file_per_table option. After 5.5, you can dynamically modify it online to take effect, and execute ALTER TABLE xx ENGINE = InnoDB to convert the existing table into an independent tablespace. For versions earlier than 5.5, you need to restart after modifying this option for it to take effect.

2. The default size of InnoDB data page is 16KB. After version 5.6, the new option innodb_page_size can be modified. In versions before 5.6, you can only modify the source code and recompile. However, it is not recommended to modify this configuration unless you are very clear about its advantages and disadvantages.

3. When new data is written to the InnoDB data page, 1/16 of the space will be reserved. The reserved space can be used for subsequent new record writing, reducing the overhead of frequently adding new data pages.

4. Each data page needs to store at least 2 rows of records. Therefore, the maximum length of a row record is theoretically 8KB, but in fact it should be smaller because there are some InnoDB internal data structures to store.

5. Limited by the InnoDB storage method, if the data is written sequentially, the ideal fill rate of the data page is 15/16, but it is generally impossible to guarantee complete sequential writing. Therefore, the fill rate of the data page is generally 1/2 to 15/16. Therefore, it is best for each InnoDB table to have an auto-increment column as the primary key so that new records can be written as sequentially as possible.

6. When the data page fill rate is less than 1/2, InnoDB will shrink it to release free space.

7. The InnoDB engine of MySQL 5.6 currently supports four formats: COMPACT , REDUNDANT , DYNAMIC , and COMPRESSED . The default format is COMPACT. COMPRESSED is rarely used and is not recommended (see the next item). If you need to use the compression feature, you can directly consider the TokuDB engine.

8. The COMPACT row format can save about 20% of storage space compared to REDUNDANT, and COMPRESSED can save about 50% of storage space compared to COMPACT, but it will cause TPS to drop by 90%. Therefore, the use of COMPRESSED row format is strongly discouraged.

9. When the row format is DYNAMIC or COMPRESSED, long columns such as TEXT/BLOB (long columns, which may also be other long columns, not necessarily only TEXT/BLOB types, depending on the specific situation) will be completely stored in an independent data page, and the clustered index page will only use a 20-byte pointer to point to the new page. This is the so-called off-page, similar to ORACLE's row migration, which wastes a lot of disk space and has poor I/O performance. Therefore, it is strongly recommended not to use BLOB, TEXT, or VARCHAR column types with a length exceeding 255.

10. When the InnoDB file format ( innodb_file_format ) is set to Antelope and the row format is COMPACT or REDUNDANT, BLOB, TEXT, or long VARCHAR columns only store the first 768 bytes in the clustered index page (the maximum 768 bytes is used to facilitate the creation of prefix indexes). The rest of the content is stored in additional pages, even if it is only one byte more. Therefore, all column lengths should be as short as possible.

11. The page of BLOB, TEXT, or long VARCHAR column stored in off-page is exclusive and cannot be shared. Therefore, it is strongly discouraged to use multiple long columns in a table.

In summary, if you really need to store BLOB, TEXT, or long VARCHAR columns in an InnoDB table in actual business, here are some suggestions:

1. Try to serialize and compress all data and store them in the same column to avoid multiple off-pages.

2. Convert columns with actual maximum storage length less than 255 to VARCHAR or CHAR type (if it is variable-length data, there is no difference between the two; if it is fixed-length data, use CHAR type).

3. If it is not possible to integrate all columns into one column, you can make a second choice and split them into multiple sub-tables according to the maximum length of each column. Try to make the total row length of each sub-table less than 8KB to reduce the frequency of off-page.

4. The above suggestions are based on the premise that the data page is the default 16KB. If you change it to 8KB or other sizes, please test it yourself based on the above theory to find the most appropriate value.

5. When the length of a character column is less than 255, whether CHAR or VARCHAR is used for storage, or the length of the VARCHAR column is defined as 255, the actual table space will not increase.

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. Thank you for your support of 123WORDPRESS.COM. If you want to learn more about this, please check out the following links

You may also be interested in:
  • MySQL database migration quickly exports and imports large amounts of data
  • Shell script to operate MySQL database to delete duplicate data
  • How to add, delete and modify columns in MySQL database
  • MySQL implements a solution similar to Oracle sequence
  • mysql code to implement sequence function
  • Can't connect to local MySQL through socket ''/tmp/mysql.sock'' solution
  • Detailed explanation of rpm installation in mysql
  • A complete list of commonly used MySQL functions (classified and summarized)
  • Use MySQL master-slave configuration to achieve read-write separation and reduce database pressure
  • mysql+spring+mybatis to realize code configuration of database read-write separation

<<:  How to solve the problem of zabbix monitoring causing Chinese garbled characters in the graphical interface due to PHP problems

>>:  A brief talk about Rx responsive programming

Recommend

Several ways to implement CSS height changing with width ratio

[Solution 1: padding implementation] principle: I...

Vue uses mockjs to generate simulated data case details

Table of contents Install mockjs in your project ...

Various problems encountered in sending emails on Alibaba Cloud Centos6.X

Preface: I have newly installed an Alibaba cloud ...

How to quickly install tensorflow environment in Docker

Quickly install the tensorflow environment in Doc...

XHTML introductory tutorial: Web page Head and DTD

Although head and DTD will not be displayed on th...

Detailed tutorial on installing and configuring MySql5.7 on Ubuntu 20.04

Table of contents 1. Ubuntu source change 2. Inst...

MySQL Basic Tutorial: Detailed Explanation of DML Statements

Table of contents DML statements 1. Insert record...

How to view and modify the time zone in MySQL

Today I found that a program inserted an incorrec...

Shell script nginx automation script

This script can satisfy the operations of startin...

Detailed explanation of screen command usage in Linux

GUN Screen: Official website: http://www.gnu.org/...

How to open MySQL binlog log

binlog is a binary log file, which records all my...

js object to achieve data paging effect

This article example shares the specific code of ...