Summary of Mysql high performance optimization skills

Summary of Mysql high performance optimization skills

Database Command Specification

  1. All database object names must use lowercase letters and be separated by underscores.
  2. All database object names are prohibited from using MySQL reserved keywords (if the table name contains a keyword query, it needs to be enclosed in single quotes)
  3. The names of database objects should be self-explanatory and should not exceed 32 characters.
  4. Temporary database tables must be prefixed with tmp_ and suffixed with the date, and backup tables must be prefixed with bak_ and suffixed with the date (timestamp).
  5. All columns storing the same data must have the same name and type (generally used as associated columns. If the associated column types are inconsistent during query, data type implicit conversion will be automatically performed, which will cause the index on the column to fail and reduce query efficiency)

Basic database design specifications

1. All tables must use the Innodb storage engine. If there are no special requirements (i.e., functions that Innodb cannot meet, such as column storage, storage space data, etc.), all tables must use the Innodb storage engine (Myisam is used by default before MySQL 5.5, and Innodb is used by default after 5.6).
Innodb supports transactions, row-level locks, better recoverability, and better performance under high concurrency.

2. The character set of the database and table is unified to use UTF8
Better compatibility. A unified character set can avoid garbled characters caused by character set conversion. Different character sets need to be converted before comparison, which will cause index failure. If emoji expressions need to be stored in the database, the character set needs to use the utf8mb4 character set.

3. All tables and fields need to be commented Use the comment clause to add comments to tables and columns, and maintain the data dictionary from the beginning

4. Try to control the size of the data in a single table. It is recommended to control it within 5 million.
5 million is not the limit of the MySQL database. If the number is too large, it will cause big problems in modifying the table structure, backup, and recovery.

The amount of data can be controlled by archiving historical data (applied to log data), sub-library and sub-table (applied to business data), etc.

5. Use MySQL partition tables with caution. Partition tables are physically represented by multiple files, but logically represented as one table.
Choose partition keys carefully, as cross-partition queries may be less efficient;
It is recommended to use physical partitioning to manage big data.

6. Try to separate hot and cold data and reduce the width of the table
MySQL limits each table to a maximum of 4096 columns, and the size of each row of data cannot exceed 65535 bytes.
Reduce disk IO to ensure the memory cache hit rate of hot data (the wider the table, the more memory is occupied when loading the table into the memory buffer pool, and more IO is consumed);

Use cache more effectively to avoid reading useless cold data;

Put columns that are often used together into one table (to avoid more join operations).

7. It is forbidden to create reserved fields in the table. The naming of reserved fields is difficult to make the names clear and meaningful. The reserved field cannot confirm the data type stored, so the appropriate type cannot be selected. Modifications to reserved field types will lock the table.

8. It is forbidden to store large binary data such as pictures and files in the database. Usually, the files are very large, which will cause the data volume to grow rapidly in a short period of time. When the database reads the database, it usually performs a large number of random IO operations. When the file is large, the IO operation is very time-consuming.

Usually stored on a file server, the database only stores file address information

9. Do not perform database stress testing online
10. It is forbidden to directly connect to the production environment database from the development environment and test environment

Database field design specifications

1. Give priority to the smallest data type that meets storage needs

reason:

The larger the column field, the more space is required to create the index. The fewer index nodes can be stored in a page, the more IO times are required for traversal, and the worse the index performance is.

method:

Convert a string into a digital type for storage, such as converting an IP address into an integer
MySQL provides two methods to handle IP addresses

inet_aton converts ip to unsigned integer (4-8 bits)
inet_ntoa converts the integer IP address into an address. Before inserting data, use inet_aton to convert the IP address into an integer to save space. When displaying data, use inet_ntoa to convert the integer IP address into an address for display.

For non-negative data (such as auto-increment ID, integer IP), unsigned integers should be used for storage:

Unsigned data can take up twice as much storage space as signed data.

SIGNED INT -2147483648~2147483647
UNSIGNED INT 0~4294967295
The N in VARCHAR(N) represents the number of characters, not the number of bytes. Using UTF8 to store 255 Chinese characters, Varchar(255) = 765 bytes. * Too large a length will consume more memory.

2. Avoid using TEXT and BLOB data types. The most common TEXT type can store 64k of data.

It is recommended to separate BLOB or TEXT columns into separate extension tables.
Mysql memory temporary tables do not support large data types such as TEXT and BLOB. If the query contains such data, memory temporary tables cannot be used for sorting and other operations, and disk temporary tables must be used.

Moreover, for this kind of data, Mysql still needs to perform a second query, which will make the SQL performance very poor, but it does not mean that such data types cannot be used.

If you must use it, it is recommended to separate the BLOB or TEXT column into a separate extension table. When querying, do not use select * but only retrieve the necessary columns. Do not query the TEXT column if the data in the column is not needed.

TEXT or BLOB types can only use prefix indexes because MySQL has a limit on the length of index fields, so TEXT types can only use prefix indexes, and TEXT columns cannot have default values.

3. Avoid using ENUM types

To modify an ENUM value, you need to use the ALTER statement

The ORDER BY operation of ENUM type is inefficient and requires additional operations

It is forbidden to use numeric values ​​as ENUM enumeration values

4. Define all columns as NOT NULL whenever possible

reason:

Indexing NULL columns requires extra space to store, so more space is used for comparisons and calculations. NULL values ​​must be handled specially.

5. Use TIMESTAMP (4 bytes) or DATETIME type (8 bytes) to store time

TIMESTAMP stores the time range 1970-01-01 00:00:01 ~ 2038-01-19-03:14:07
TIMESTAMP occupies 4 bytes, the same as INT, but is more readable than INT. Values ​​outside the TIMESTAMP value range are stored using the DATETIME type.

People often use strings to store date data (incorrect practice)

Disadvantage 1: Cannot use date functions for calculations and comparisons Disadvantage 2: Using strings to store dates takes up more space

6. Financial-related monetary data must use the decimal type

Inaccurate floating point: float, double
Precision floating point: decimal
The Decimal type is a precise floating point number that does not lose precision during calculations. The space occupied is determined by the defined width. Every 4 bytes can store 9 digits, and the decimal point occupies one byte. It can be used to store integer data larger than bigint.

Index design specifications

1. Limit the number of indexes on each table. It is recommended that a single table should have no more than 5 indexes, but the more the better! Indexes can improve efficiency but can also reduce efficiency.

Indexes can increase query efficiency, but they can also reduce the efficiency of insertion and update, and even query efficiency in some cases.

Because when the MySQL optimizer chooses how to optimize a query, it will evaluate each available index based on unified information to generate the best execution plan. If many indexes can be used for queries at the same time, it will increase the time for the MySQL optimizer to generate an execution plan and also reduce query performance.

2. Do not create a separate index for each column in the table
Before version 5.6, one SQL statement could only use one index in a table. After version 5.6, although there is an optimization method of merging indexes, it is still far from being as good as the query method using a joint index.

3. Every Innodb table must have a primary key
Innodb is an index-organized table: the logical order in which data is stored is the same as the order in which it is indexed. Each table can have multiple indexes, but the table can only have one storage order.

Innodb organizes tables in the order of primary key indexes.

Do not use frequently updated columns as primary keys, and do not use multi-column primary keys (equivalent to joint indexes)
Do not use UUID, MD5, HASH, or string columns as primary keys (the sequential growth of data cannot be guaranteed).
It is recommended to use an auto-increment ID value as the primary key

Common index column recommendations

Columns that appear in the WHERE clause of SELECT, UPDATE, and DELETE statements. Fields that are included in ORDER BY, GROUP BY, and DISTINCT. Do not create an index for all columns that match the fields in 1 and 2. It is usually better to create a joint index for the fields in 1 and 2. Related columns of multi-table joins

How to choose the order of index columns

The purpose of creating an index is to use the index to search for data, reduce random IO, and increase query performance. The less data the index can filter out, the less data will be read from the disk.

The most discriminative column is placed on the left side of the joint index (discriminativeness = number of different values ​​in the column / total number of rows in the column)
Try to place columns with smaller field lengths on the left side of the joint index (because the smaller the field length, the more data can be stored on one page, and the better the IO performance)
Put the most frequently used columns on the left side of the joint index (this way you can create fewer indexes)

Avoid creating redundant and duplicate indexes (increases the time it takes for the query optimizer to generate an execution plan)

Duplicate index examples: primary key(id), index(id), unique index(id)
Redundant index examples: index(a,b,c), index(a,b), index(a)

For frequent queries, consider using covering indexes first.

Covering index: an index that includes all query fields (fields included in where, select, ordery by, group by). Benefits of covering index:

Avoid secondary query of Innodb table index

Innodb stores data in the order of clustered indexes. For Innodb, the secondary index stores the primary key information of the row in the leaf node. If you use the secondary index to query data, after finding the corresponding key value, you must perform a secondary query through the primary key to obtain the data we actually need.

In a covering index, all data can be obtained from the key value of the secondary index, avoiding secondary queries on the primary key, reducing IO operations, and improving query efficiency.

Random IO can be converted into sequential IO to speed up query efficiency. Since covering indexes are stored in the order of key values, for IO-intensive range searches, the data IO is much less than randomly reading each row from the disk. Therefore, using covering indexes during access, the random read IO of the disk can be converted into sequential IO for index searches.

Index SET Specification

Avoid using foreign key constraints as much as possible. It is not recommended to use foreign key constraints, but you must create indexes on the association keys between tables. Foreign keys can be used to ensure the referential integrity of data, but it is recommended to implement foreign keys on the business side. This will affect the write operations of the parent and child tables, thereby reducing performance.

Database SQL development specifications

1. It is recommended to use precompiled statements for database operations

Precompiled statements can reuse these plans, reducing the time required for SQL compilation and solving the SQL injection problem caused by dynamic SQL.

Passing only parameters is more efficient than passing SQL statements.

The same statement can be parsed once and used multiple times to improve processing efficiency.

2. Avoid implicit conversion of data types

Implicit conversion will cause the index to fail, such as:

select name,phone from customer where id = '111';

3. Make full use of existing indexes on the table

Avoid using double % signs in query conditions. For example: a like '%123%' (if there is no leading %, but only trailing %, the index on the column can be used)

A SQL statement can only use one column in a composite index for range query. For example, if there is a joint index on columns a, b, and c, and there is a range query on column a in the query condition, the indexes on columns b and c will not be used.

When defining a joint index, if column a is to be used for range search, column a should be placed on the right side of the joint index, and left join or not exists should be used to optimize the not in operation, because not in also usually causes index invalidation.

4. When designing a database, you should consider future expansion

5. The program connects to different databases using different accounts, and performs cross-database queries

Leave room for database migration and sub-database and sub-table to reduce business coupling and avoid security risks caused by excessive permissions

6. Do not use SELECT *. You must use SELECT <field list> to query.

reason:

Consumes more CPU and IO and network bandwidth resources. Cannot use covering indexes to reduce the impact of table structure changes.

7. Do not use INSERT statements without a field list

like:

insert into values ​​('a','b','c');

Should use:

insert into t(c1,c2,c3) values ​​('a','b','c');

8. Avoid using subqueries. You can optimize subqueries into join operations.

Usually, the subquery can be converted into a correlated query for optimization only when the subquery is in the in clause and the subquery is a simple SQL (excluding union, group by, order by, and limit clauses).

Reasons for poor subquery performance:

The result set of a subquery cannot use an index. Usually, the result set of a subquery will be stored in a temporary table. No matter it is a temporary table in memory or a temporary table on disk, there will be no index, so the query performance will be affected to a certain extent. Especially for subqueries that return larger result sets, the impact on query performance is greater.

Since subqueries will generate a large number of temporary tables without indexes, they will consume too many CPU and IO resources and generate a large number of slow queries.

9. Avoid using JOIN to join too many tables

For MySQL, there is an associated cache, and the size of the cache can be set by the join_buffer_size parameter.

In MySQL, if you join multiple tables in the same SQL statement, one more association cache will be allocated. The more tables are associated in one SQL statement, the more memory will be occupied.

If a large number of multi-table join operations are used in the program and the join_buffer_size is not set reasonably, it will easily cause server memory overflow, which will affect the stability of server database performance.

At the same time, for the association operation, temporary table operations will be generated, affecting the query efficiency. Mysql allows a maximum of 61 tables to be associated, and it is recommended not to exceed 5.

10. Reduce the number of interactions with the database

Databases are more suitable for processing batch operations. Combining multiple identical operations together can improve processing efficiency.

11. When performing or judgment on the same column, use in instead of or

The number of in values ​​should not exceed 500. The in operation can make more effective use of indexes, or rarely use indexes in most cases.

12. Do not use order by rand() for random sorting

order by rand() will load all the data that meets the conditions in the table into the memory, and then sort all the data in the memory according to the randomly generated values, and may generate a random value for each row. If the data set that meets the conditions is very large, it will consume a lot of CPU, IO and memory resources.

It is recommended to get a random value in the program and then get the data from the database.

13. Function conversion and calculation of columns are prohibited in the WHERE clause

When a column is converted or calculated using a function, the index cannot be used.

Not recommended:

where date(create_time)='20190101'

recommend:

where create_time >= '20190101' and create_time < '20190102'

14. Use UNION ALL instead of UNION when there are obviously no duplicate values

UNION will put all the data of the two result sets into a temporary table and then perform deduplication.
UNION ALL will no longer perform deduplication on the result set

15. Split complex large SQL into multiple small SQL

The big SQL is logically complex and requires a lot of CPU to calculate.
In MySQL, one SQL statement can only use one CPU for calculation.
SQL splitting can improve processing efficiency through parallel execution

Database Operation Behavior Specification

Batch write (UPDATE, DELETE, INSERT) operations of more than 1 million rows must be performed in batches multiple times

1. Large batch operations may cause serious master-slave delays

In a master-slave environment, large batch operations may cause serious master-slave delays. Large batch write operations generally take a long time to execute.

Only when the execution on the master database is completed will it be executed on other slave databases, so it will cause a long delay between the master database and the slave database.

2. When binlog logs are in row format, a large number of logs will be generated

Large batch write operations will generate a large number of logs, especially for row-format binary data. Since the modification of each row of data is recorded in the row format, the more data we modify at a time, the more logs will be generated, and the longer it will take to transmit and restore the logs. This is also a reason for the master-slave delay.

3. Avoid large transaction operations

Large batches of data must be modified in one transaction, which will cause large batches of data in the table to be locked, resulting in a large amount of blocking, which will have a very large impact on MySQL performance.

In particular, long-term blocking will occupy all available connections of the database, which will prevent other applications in the production environment from connecting to the database. Therefore, it is important to note that large batch write operations must be performed in batches.

For large tables, use pt-online-schema-change to modify the table structure

Avoiding master-slave delays caused by large table modifications

Avoid locking the table when modifying table fields. Be cautious when modifying the data structure of a large table, as this will cause serious table locking operations, which is intolerable, especially in a production environment.

pt-online-schema-change will first create a new table with the same structure as the original table, modify the table structure on the new table, then copy the data in the original table to the new table, and add some triggers to the original table.

Copy the newly added data in the original table to the new table. After all the row data is copied, name the new table as the original table and delete the original table. Break down the original DDL operation into multiple small batches.

It is forbidden to grant super permissions to the account used by the program

When the maximum number of connections is reached, another user with super privileges is run.
The super privilege can only be used by the DBA account to handle problems

For the program to connect to the database account, follow the principle of least privilege

The database account used by the program can only be used under one DB. The account used by the program across databases is not allowed to have drop permissions in principle.

You may also be interested in:
  • How to analyze MySQL query performance
  • Full steps to create a high-performance index in MySQL
  • Introduction to the use of MySQL performance stress benchmark tool sysbench
  • MySQL performance optimization index pushdown
  • Reasons for the sudden drop in MySQL performance
  • Solutions to Mysql index performance optimization problems
  • MySQL performance optimization tips
  • MySQL 20 high-performance architecture design principles (worth collecting)
  • Detailed explanation of GaussDB for MySQL performance optimization

<<:  Detailed explanation of the difference between arrow functions and normal functions in JavaScript

>>:  VMware vCenter 6.7 installation process (graphic tutorial)

Recommend

A brief comparison of Props in React

Table of contents Props comparison of class compo...

In-depth understanding of React Native custom routing management

Table of contents 1. Custom routing 2. Tab naviga...

Complete steps to build a Laravel development environment using Docker

Preface In this article, we will use Docker to bu...

Analysis of the principle and usage of MySQL custom functions

This article uses examples to illustrate the prin...

Vue implements the full selection function

This article example shares the specific code of ...

W3C Tutorial (7): W3C XSL Activities

A style sheet describes how a document should be ...

Detailed explanation of the use of Linux seq command

01. Command Overview The seq command is used to g...

vue perfectly realizes el-table column width adaptation

Table of contents background Technical Solution S...

Vue implements card flip carousel display

Vue card flip carousel display, while switching d...

Introduction and use of Javascript generator

What is a generator? A generator is some code tha...

How to redirect PC address to mobile address in Vue

Requirements: The PC side and the mobile side are...

CSS injection knowledge summary

Modern browsers no longer allow JavaScript to be ...

How to change the CentOS server time to Beijing time

1. I purchased a VPS and CentOS system, and found...

Three steps to solve the IE address bar ICON display problem

<br />This web page production skills tutori...