Detailed explanation of data types and schema optimization in MySQL

Detailed explanation of data types and schema optimization in MySQL

I'm currently learning about MySQL optimization. This article introduces the optimization of data types and schema.

1. Choose the optimized data type

MySQL supports many data types, and how to choose the correct data type is crucial to performance. The following principles can help determine the data type:

  • Smaller is usually better

Whenever possible, use the smallest data type that can correctly store the data. This will use less disk, memory, and cache, and will take less time to process.

  • Keep it simple

When two data types are capable of storing the same field, choosing the simpler one is often the best option. For example, integers and strings. Since the operation cost of integers is lower than that of characters, when choosing between the two, choosing integers usually gives better performance.

  • Try to avoid NULL

When a column can be NULL, MySQL needs to do more work in terms of indexing and value comparison. Although the impact on performance is not significant, you should try to avoid designing it to be NULL.

In addition to the above principles, the steps to follow when choosing a data type are: first determine the appropriate large type, such as data, string, time, etc.; then select the specific type. We will now discuss some specific types under the larger category, starting with numbers, which come in two types: integers and real numbers.

1.1 Integer Types

The integer types and the space they occupy are as follows:

Integer Types Space size (bit)
TINYINT 8
SMALLINT 16
MEDIUMINT twenty four
INT 32
BIGINT 64

The storage range of an integer type is related to the size of the space: -2^(N-1) to 2^(N-1)-1, where N is the number of bits in the space.

The integer type has the optional attribute UNSIGNED. When declared, it means that negative numbers are not allowed. The storage range becomes: 0 to 2^(N)-1, which is doubled.

In MySQL, you can also specify the width for integer types, such as INT(1), but this is not very meaningful and does not limit the legal range of values. Values ​​from -2^31 to 2^31-1 can still be stored. What is affected is the number of characters displayed by the interactive tools that interact with MySQL.

1.2 Real number types

The comparison of real number types is as follows:

Real number types Space size (Byte) Value range Calculation accuracy
FLOAT 4 Negative numbers: -3.4E+38 to -1.17E-38; non-negative numbers: 0, 1.17E-38 to 3.4E+38 Approximate calculation
DOUBLE 8 Negative numbers: -1.79E+308 to -2.22E-308; non-negative numbers: 0, 2.22E-308 to 1.79E+308 Approximate calculation
DECIMAL It’s about precision Same as DOUBLE Precise calculation

As can be seen from the above, FLOAT and DOUBLE both have fixed space sizes, but at the same time, because they use standard floating-point operations, they can only be calculated approximately. DECIMAL can achieve precise calculations, but it takes up more space and consumes more computing overhead.

The space occupied by DECIMAL is related to the specified precision, for example, DECIMAL(M,D):

  • M is the maximum length of the entire number, the value range is [1, 65], and the default value is 10;
  • D is the length after the decimal point, the value range is [0, 30], and D <= M. The default value is 0.

When storing DECIMAL data, MySQL stores it as a binary string, storing 9 digits for every 4 bytes. When the number is less than 9 digits, the space occupied by the number is as follows:

Number of digits Space occupied (Byte)
1, 2 1
3.4 2
5, 6 3
7, 8 4

The digits before and after the decimal point will be stored separately, and the decimal point also occupies 1 byte. Here are two calculation examples:

  • DECIMAL(18, 9): The integer part is 9 in length and occupies 4 bytes. The decimal part is 9 in length and occupies 4 bytes. Adding 1 byte for the decimal point, a total of 9 bytes are occupied.
  • DECIMAL(20, 9): The integer part is 14 bytes long and occupies 7 (4+3) bytes. The decimal part is 9 in length and occupies 4 bytes. Adding 1 byte for the decimal point, a total of 12 bytes are occupied.

It can be seen that DECIMAL still takes up a lot of space, so DECIMAL is only needed when precise calculations of decimals are required. In addition, we can also use BIGINT instead of DECIMAL. For example, if you need to ensure calculations with 5 decimal places, you can multiply the value by 10 to the fifth power and store it as BIGINT. This can avoid the problems of inaccurate floating-point storage calculations and the high cost of precise DECIMAL calculations.

1.3 String Type

The most commonly used string types are VARCHAR and CHAR. As a variable-length string, VARCHAR uses 1 or 2 extra bytes to record the length of the string. When the maximum length does not exceed 255, only 1 byte is needed to record the length. If it exceeds 255, 2 bytes are required. VARCHAR is suitable for:

  • The maximum length is much greater than the average length;
  • Columns are updated less frequently to avoid fragmentation;
  • With complex character sets, such as UTF-8, each character can be stored using a different byte.

CHAR is a fixed-length string. Sufficient space is allocated according to the defined string length. Applicable scenarios:

  • Short length;
  • Similar length, such as MD5;
  • Updated frequently.

In addition to VARCHAR and CHAR, BLOB and TEXT types can be used to store large strings. The difference between BLOB and TEXT is that BLOB is stored in binary format, while TEXT is stored in character format. This also means that BLOB type data has no concept of character set and cannot be sorted by character, while TEXT type data has the concept of character set and can be sorted by character. The usage scenarios of the two are also determined by the storage format. When storing binary data, such as pictures, BLOB should be used, and when storing text, such as articles, the TEXT type should be used.

1.4 Date and Time Types

The minimum time granularity that can be stored in MySQL is seconds. Common date types include DATETIME and TIMESTAMP.

type Storage Content Space size (Byte) Time zone concept
DATETIME Integer in the format YYYYMMDDHHMMSS 8 none
TIMESTAMP The number of seconds since midnight on January 1, 1970 4 have

The value displayed by TIMESTAMP will depend on the time zone, which means that the value queried in different time zones will be different. In addition to the differences listed above, TIMESTAMP has a special property. When inserting and updating, if the value of the first TIMESTAMP column is not specified, the value of this column will be set to the current time.

During the development process, we should try to use TIMESTAMP, mainly because it only takes half the space of DATETIME and is more space efficient.

What if we want to store the date and time accurately to the second? Since MySQL does not provide it, we can use BIGINT to store timestamps at the microsecond level, or use DOUBLE to store the fractional part after the second.

1.5 Choosing an Identifier

Integers are generally the best choice for identifiers, primarily because they are simple, fast to compute, and can use AUTO_INCREMENT.

2. Paradigms and anti-paradigms

Simply put, a paradigm is the level of design standards that a data table's structure complies with. In the first normal form, attributes are inseparable. All tables built in current RDBMS systems conform to the first normal form. The second paradigm eliminates partial dependence on non-primary attribute codes (which can be understood as primary keys). The third paradigm eliminates transitive dependencies on non-primary attribute pairs.

In a strictly normalized database, each factual data will appear only once, and there will be no data redundancy. This can bring the following benefits:

  • Update operations are faster;
  • Modify less data;
  • Tables are smaller, fit better in memory, and perform operations faster;
  • Less need for DISTINCT or GROUP BY.

However, since the data is scattered in various tables, the tables need to be associated when querying. The advantage of the anti-normalization is that there is no need to perform associations and the data is stored redundantly.

In actual applications, there will not be complete normalization or complete denormalization. It is often necessary to mix normalization and denormalization. Using a partially normalized schema is often the best choice. Regarding database design, I saw this passage on the Internet and you can feel it.

Database design should be divided into three realms:

The first level: just getting started with database design, the importance of paradigms is not yet fully understood. The anti-paradigm design that appears at this time usually causes problems.

The second level: As you encounter and solve problems, you will gradually understand the true benefits of the paradigm, and thus be able to quickly design a low-redundancy, high-efficiency database.

The third level: After N years of training, you will definitely discover the limitations of the paradigm. At this time, we can break the paradigm and design a more reasonable anti-paradigm part.

Paradigms are like the moves in martial arts. If a beginner does not follow the moves, he will only die miserably. After all, the moves are the essence summarized and summarized by masters. As your martial arts skills improve and you become proficient in the moves, you will inevitably discover the limitations of the moves and either forget them or create your own moves.

As long as you work hard and persevere for a few more years, you will always reach the second level, and you will always feel that the paradigm is a classic. At this time, those who can not rely too much on the paradigm and quickly break through the limitations of the paradigm are naturally masters.

3. Cache tables and summary tables

In addition to the anti-normalization mentioned above, storing redundant data in the table, we can also create a completely independent summary table or cache table to meet the retrieval needs.

A cache table refers to a table that stores data that can be obtained from other tables in the schema, that is, logically redundant data. A summary table refers to a table that stores non-redundant data calculated by aggregating data using statements such as GROUP BY.

Cache tables can be used to optimize search and retrieval query statements. A technique that can be used here is to use different storage engines for cache tables. For example, the main table uses InnoDB, while the cache table can use MyISAM to obtain a smaller index space. You can even put the cache table into a dedicated search system, such as Lucene.

Summary tables are designed to avoid the high cost of real-time statistical calculations. The cost comes from two aspects: one is the need to scan most of the data in the table, and the other is to create specific indexes that will affect UPDATE operations. For example, to query the number of WeChat friends in the past 24 hours, you can scan the entire table every hour, write a record to the summary table after statistics, and when querying, you only need to query the latest 24 records in the summary table, without having to scan the entire table for statistics every time you query.

When using cache tables and summary tables, you must decide whether to maintain the data in real time or rebuild it periodically, depending on your needs. Compared with real-time maintenance, regular reconstruction can save more resources and reduce table fragmentation. During reconstruction, we still need to ensure that the data is available during operation, which needs to be achieved through "shadow tables". Create a shadow table behind the real table. After filling the data, switch the shadow table and the original table through an atomic rename operation.

4. Speed ​​up ALTER TABLE operations

When MySQL performs an ALTER TABLE operation, it often creates a new table, retrieves data from the old table and inserts it into the new table, and then deletes the old table. If the table is large, this will take a long time and cause MySQL service interruption. To avoid service interruptions, two techniques are usually used:

Execute the ALTER TABLE operation on a machine that is not providing services, and then switch to the primary database that is providing services;
"Shadow copy" creates a new table that is unrelated to the original table. After the data migration is completed, switch it through the renaming operation.
However, not all ALTER TABLE operations will cause table reconstruction. For example, when modifying the default value of a field, using MODIFY COLUMN will cause table reconstruction, while using ALTER COLUMN will not, and the operation speed is very fast. This is because when ALTER COLUMN changes the default value, it directly modifies the .frm file of the existing table (which stores the default value of the field) without rebuilding the table.

refer to

High Performance MySQL

MySQL DECIMAL data type

The above is a detailed explanation of data types and schema optimization in MySQL. For more information about MySQL data types and schema optimization, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Implementation of mysql decimal data type conversion
  • Implementation of mysql data type conversion
  • Detailed explanation of the usage of MySQL data type DECIMAL
  • Detailed explanation of the decimal padding problem of decimal data type in MySQL
  • MySQL data types full analysis
  • Compatibility comparison between PostgreSQL and MySQL data types
  • Description of the correspondence between MyBatis JdbcType and Oracle and MySql data types
  • MySQL data type selection principles

<<:  CentOS IP connection network implementation process diagram

>>:  The whole process record of vue3 recursive component encapsulation

Recommend

Summary of the use of TypeScript in React projects

Preface This article will focus on the use of Typ...

Detailed process of installing and configuring MySQL and Navicat prenium

Prerequisite: Mac, zsh installed, mysql downloade...

CSS3 achieves conic-gradient effect

grammar: background-image: conic-gradient(from an...

How to use the debouce anti-shake function in Vue

Table of contents 1. Anti-shake function 2. Use d...

Detailed explanation of MySQL Explain

In daily work, we sometimes run slow queries to r...

A brief discussion on React native APP updates

Table of contents App Update Process Rough flow c...

Detailed explanation of MYSQL stored procedure comments

Table of contents 1. Instructions for use 2. Prep...

MySql sharing of null function usage

Functions about null in MySql IFNULL ISNULL NULLI...

Detailed explanation of CSS3 text shadow text-shadow property

Text shadow text-shadow property effects: 1. Lowe...

Example code for implementing 3D text hover effect using CSS3

This article introduces the sample code of CSS3 t...

How to add Lua module to Nginx

Install lua wget http://luajit.org/download/LuaJI...

How to generate Hive table creation statement comment script in MySQL metadata

Preface This article mainly introduces the releva...

Implementation of docker-compose deployment of zk+kafka+storm cluster

Cluster Deployment Overview 172.22.12.20 172.22.1...