An article to master MySQL index query optimization skills

Preface

This article summarizes some common MySQL usage tips for reference by teams without a DBA. Unless otherwise specified, the storage engine is InnoDB.

Features of MySQL

Understanding MySQL's characteristics helps you use it well. What most distinguishes MySQL from other common databases is its pluggable storage engines, the component responsible for storing and reading data. Different storage engines have different characteristics, so you can choose one that suits your workload, or even develop a new engine. The logical architecture of MySQL is as follows:

The default storage engine of MySQL is InnoDB. The main features of this storage engine are:

  • Support transaction processing
  • Support row-level locks
  • Data is stored in a tablespace, which consists of a series of data files.
  • Use MVCC (Multi-version Concurrency Control) mechanism to achieve high concurrency
  • Tables are organized as a clustered index on the primary key
  • Support hot backup

Overview of other common storage engine features:

  • MyISAM: The default engine in older versions of MySQL. It supports neither transactions nor row-level locks; developers can control table locks manually. It supports full-text indexes. It cannot be recovered safely after a crash. It supports compressed tables, whose data cannot be modified but which take up less space and can improve query performance.
  • Archive: Supports only Insert and Select. Batch inserts are very fast. Queries scan the entire table.
  • CSV: Treats a CSV file as a table
  • Memory: Data is stored in memory

There are many more that are not listed here.

Data type optimization

Principles for selecting data types:

  • Choose a data type that takes up less space
  • Choose a simple type
  • Avoid unnecessary nullable columns

Types with a smaller footprint save hardware resources such as disk, memory, and CPU. Prefer simple types: for example, use int instead of char where possible, because sorting and comparing the latter involves character sets and collations, which is more complicated than comparing integers. Nullable columns use more storage space, and an index on a nullable column requires extra bytes per entry. Columns are nullable by default when a table is created, which developers easily overlook; if the data will never contain null values, declare the column NOT NULL explicitly.

Integer Types

Integer types include:

  • tinyint
  • smallint
  • mediumint
  • int
  • bigint

They use 8, 16, 24, 32, and 64 bits of storage respectively, and can represent values from -2^(N-1) to 2^(N-1)-1, where N is the number of bits.

Integer types can take the unsigned modifier, which roughly doubles the upper limit of the positive range but makes negative numbers unrepresentable. In addition, specifying a length for an integer, as in int(11), has no effect on storage or range: once the data type is chosen, its size is fixed.
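
The ranges follow directly from the bit widths. As a minimal sketch (the dictionary and function names are made up for illustration), the signed and unsigned limits of each type can be computed like this:

```python
# Storage width in bits for each MySQL integer type, as listed above.
INTEGER_BITS = {"tinyint": 8, "smallint": 16, "mediumint": 24, "int": 32, "bigint": 64}


def integer_range(type_name, unsigned=False):
    """Return (min, max) for an integer type of the given width."""
    bits = INTEGER_BITS[type_name]
    if unsigned:
        # All bits hold magnitude: 0 .. 2^N - 1
        return 0, 2 ** bits - 1
    # One bit is used for the sign: -2^(N-1) .. 2^(N-1) - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in INTEGER_BITS:
    print(name, integer_range(name), integer_range(name, unsigned=True))
```

For tinyint this gives -128 to 127 signed and 0 to 255 unsigned.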

Decimal Type

  • float
  • double
  • decimal

float and double are the usual single- and double-precision floating-point types. The former uses 32 bits of storage and the latter 64 bits. As with integers, specifying a length for them does not change their storage size.

The decimal type is more complex: it supports exact calculations but occupies more space. decimal packs every 9 digits into 4 bytes. For example, decimal(18,9) holds 18 digits in total, 9 before the decimal point and 9 after it, and takes 8 bytes: 4 for each side (the decimal point itself is not stored). Because decimal takes more space and exact arithmetic is expensive, consider using bigint instead when the data volume is large, scaling the real values by a fixed factor before persisting them and after reading them back.
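
The MySQL manual describes the packing rule as nine digits per four bytes, computed separately for the digits on each side of the decimal point, with leftover digits taking 1 to 4 bytes; the point itself is not stored. The sketch below applies that rule and also shows the bigint scaling idea. All function names are hypothetical helpers, not MySQL functions.

```python
from decimal import Decimal

# Bytes needed for 0..8 leftover digits (1-2 -> 1, 3-4 -> 2, 5-6 -> 3, 7-8 -> 4).
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def digits_to_bytes(digits):
    """Bytes used by one side of the decimal point."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage_bytes(precision, scale):
    """Storage for decimal(precision, scale): integer part plus fractional part."""
    return digits_to_bytes(precision - scale) + digits_to_bytes(scale)


def to_scaled_int(amount, scale):
    """Scale an exact decimal string up to an integer suitable for a bigint column."""
    return int(Decimal(amount).scaleb(scale))


def from_scaled_int(value, scale):
    """Undo to_scaled_int after reading the integer back."""
    return Decimal(value).scaleb(-scale)


print(decimal_storage_bytes(18, 9))
print(to_scaled_int("12.34", 2), from_scaled_int(1234, 2))
```

The scale factor has to be fixed per column and applied consistently in every piece of application code that reads or writes it.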

String Type

  • varchar
  • char
  • varbinary
  • binary
  • blob
  • text
  • enum

The actual space occupied by varchar data equals the length of the string plus 1 or 2 bytes that record the length (when row_format is not set to fixed): 1 byte if the column's maximum length is 255 bytes or less, 2 bytes otherwise. varchar therefore saves a lot of space, and it is appropriate when the lengths of the strings in a column vary greatly.

The space occupied by char is fixed. When the strings in a column are all nearly the same length, or very short, the char type is appropriate.

varbinary and binary are the counterparts of varchar and char, and store binary strings. Compared with the character types, the binary types are case-sensitive, do not involve character sets or collations, and are faster to compare.

Note that although varchar(5) and varchar(200) use the same storage space for the string "hello", this does not mean that an oversized varchar length is free. Some internal operations of MySQL, such as creating in-memory temporary tables (which some queries trigger automatically), allocate fixed-size space based on the declared length.

blob stores large data as binary strings, and text stores it as character strings. InnoDB may store large values of these types in a dedicated external storage area, keeping only a pointer in the data row. Such columns are poorly suited to indexing (if you need an index, it can only be on a prefix of the string), and it is rarely done.

If the strings in a column repeat heavily and come from a limited set of values, you can use an enum instead. MySQL maintains a "number-string" lookup table for the enum and stores only the number in each row, which can save a lot of storage space.

Time Type

  • year
  • date
  • time
  • datetime
  • timestamp

datetime can store values from year 1000 to year 9999, with a precision of seconds by default. timestamp stores the number of seconds since midnight, January 1, 1970 (UTC), and can represent times up to 2038. It takes up 4 bytes, less than datetime (8 bytes in older MySQL versions). The value displayed for a timestamp depends on the time zone. The timestamp column has another feature: when an insert or update statement does not supply a value, MySQL can automatically set the first timestamp column to the current time (in recent versions this usually has to be declared explicitly with DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP). Many tables are designed with a column called UpdateTime, and timestamp suits this column well because it is updated automatically, provided the system will no longer be in use by 2038.
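
The 2038 limit comes from storing the seconds count in a signed 32-bit integer. A small sketch of the arithmetic (the constant name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit seconds counter can hold.
TIMESTAMP_MAX_SECONDS = 2 ** 31 - 1

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_moment = epoch + timedelta(seconds=TIMESTAMP_MAX_SECONDS)

# Last instant representable in a 4-byte timestamp, in UTC.
print(last_moment.isoformat())
```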

Selection of primary key type

Use integers whenever possible. Integers take up little space and can be set to auto-increment. In particular, do not use hash-like strings such as GUIDs or MD5 values as primary keys. These strings are effectively random, and because the InnoDB primary key is the clustered index, new rows are inserted at scattered positions and the data ends up fragmented. In addition, every InnoDB secondary index includes the primary key columns, so a long primary key makes all secondary indexes larger.

Special types of data

It is best to store IPv4 addresses as 32-bit unsigned integers. MySQL provides the functions inet_aton() and inet_ntoa() to convert between the numeric and string representations of an IP address.
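
If the conversion is done in application code rather than in SQL, the same mapping is available in Python's standard ipaddress module. A minimal sketch, with hypothetical helper names:

```python
import ipaddress


def ip_to_int(address):
    """Dotted-quad IPv4 string -> 32-bit unsigned integer (what inet_aton() returns)."""
    return int(ipaddress.IPv4Address(address))


def int_to_ip(number):
    """32-bit unsigned integer -> dotted-quad string (what inet_ntoa() returns)."""
    return str(ipaddress.IPv4Address(number))


print(ip_to_int("192.168.1.1"))
print(int_to_ip(3232235777))
```

The largest address, 255.255.255.255, maps to 2^32 - 1, which is why the column must be unsigned.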

Index optimization

InnoDB implements indexes with B+ trees. For example, suppose there is a People table, created with the following statement:

CREATE TABLE `people` (
 `Id` int(11) NOT NULL AUTO_INCREMENT,
 `Name` varchar(5) NOT NULL,
 `Age` tinyint(4) NOT NULL,
 `Number` char(5) NOT NULL COMMENT 'Number',
 PRIMARY KEY (`Id`),
 KEY `i_name_age_number` (`Name`,`Age`,`Number`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8;

Insert data:

Its index structure is roughly as follows:

In other words, the order of the index columns is very important. If the Name columns of two rows of data are the same, the Age column is used to compare the sizes. If the Age columns are the same, the Number column is used to compare the sizes. Sort by the first column first, then the second column, and finally the third column.
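
This ordering can be mimicked with a sorted list of tuples. The rows below are made up for illustration, and a Python list is only a stand-in for the leaf level of a B+ tree, but it shows why a lookup by Name can use binary search while a lookup by Age alone cannot:

```python
from bisect import bisect_left, bisect_right

# Hypothetical (Name, Age, Number) entries, sorted the way the index sorts them:
# by Name, then Age, then Number.
index_entries = sorted([
    ("Abel", 2, "12312"),
    ("Abel", 30, "10001"),
    ("Andy", 11, "20002"),
    ("Andy", 18, "20003"),
    ("Bob", 18, "30001"),
])


def seek_name(name):
    """Entries with the same Name are contiguous, so binary search finds the run."""
    lo = bisect_left(index_entries, (name,))
    hi = bisect_right(index_entries, (name, float("inf")))
    return index_entries[lo:hi]


def scan_age(age):
    """Age values are scattered across the list, so every entry has to be checked."""
    return [entry for entry in index_entries if entry[1] == age]


print(seek_name("Andy"))
print(scan_age(18))
print([entry[1] for entry in index_entries])  # ages are not in sorted order overall
```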

Queries should match the index columns from left to right. If a range condition is applied to a column, the columns to its right cannot use the index. You also cannot skip a column, otherwise the columns after the gap cannot use the index. The following SQL statements are positive examples:

  • SELECT * from people where Name = 'Abel' and Age = 2 AND Number = '12312'
  • SELECT * from people where Name = 'Abel'
  • SELECT * from people where Name like 'Abel%'
  • SELECT * from people where Name = 'Andy' and Age BETWEEN 11 and 20
  • SELECT * from people ORDER BY NAME
  • SELECT * from people ORDER BY NAME, Age
  • SELECT * from people GROUP BY Name

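The following SQL statements are negative examples, in which the index cannot be used, or can be used only for the leading Name column:

  • SELECT * from people where Age = 2 (the leftmost column Name is missing)
  • SELECT * from people where NAME like '%B' (the pattern starts with a wildcard)
  • SELECT * from people where NAME = 'ABC' AND number = 3 (Age is skipped, so only Name uses the index)
  • SELECT * from people where NAME like 'B%' and age = 22 (Name is a range condition, so Age cannot use the index)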
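
The left-to-right rule can be written down as a small function. This is a deliberately simplified model, not how the MySQL optimizer is implemented: it ignores features such as index condition pushdown, and a leading-wildcard like is treated as having no usable condition on that column. The function name and the "eq"/"range" labels are made up for illustration.

```python
INDEX_COLUMNS = ("Name", "Age", "Number")


def usable_index_columns(predicates, index_columns=INDEX_COLUMNS):
    """Return the leading index columns that can narrow a B+ tree seek.

    predicates maps a column name to "eq" (equality) or "range"
    (between, <, >, or like with a fixed prefix).
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break  # gap: columns after a skipped column cannot be used
        used.append(column)
        if kind == "range":
            break  # columns to the right of a range condition cannot be used
    return used


print(usable_index_columns({"Name": "eq", "Age": "eq", "Number": "eq"}))
print(usable_index_columns({"Name": "eq", "Age": "range"}))
print(usable_index_columns({"Age": "eq"}))
print(usable_index_columns({"Name": "eq", "Number": "eq"}))
print(usable_index_columns({"Name": "range", "Age": "eq"}))
```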

A technique for creating indexes using Hash values

If a table has a column that stores long strings, say a column named URL, an index on that column will be large. One way to alleviate this is to index a numeric hash of the URL string instead. Add a new column, for example URL_CRC, to store the hash value of the URL, create an index on that column, and write queries like the following (the URL condition is kept so that rows with a colliding hash are filtered out):

select * from t where URL_CRC = 387695885 and URL = 'www.baidu.com'

If the amount of data is large, you can reduce hash collisions by using your own hash function, or by using part of the return value of the MD5 function as the hash value:

SELECT CONV(RIGHT(MD5('www.baidu.com'),16), 16, 10)
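
If the hash column is filled in by application code, the same values can be computed outside the database. The sketch below assumes the strings are encoded as UTF-8 and uses hypothetical helper names; the MD5-based value needs up to 64 bits, so the column holding it would have to be an unsigned bigint.

```python
import hashlib
import zlib


def crc32_of(text):
    """Standard CRC-32 of the UTF-8 bytes, as an unsigned 32-bit integer."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def md5_tail_as_int(text):
    """Counterpart of CONV(RIGHT(MD5(text), 16), 16, 10): the last 16 hex
    digits of the MD5 digest, read as an unsigned 64-bit integer."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest[-16:], 16)


url = "www.baidu.com"
print(crc32_of(url), md5_tail_as_int(url))
```

A 64-bit value makes collisions far less likely than a 32-bit CRC, but the query should still compare the full URL as well.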

Prefix Index

If the data stored in a string column is long, and an index on it would be large, you can use a prefix index, which indexes only the first few characters of the string. This reduces the index size. However, such an index cannot be used for order by or group by.

Choosing the prefix length matters. Pick the shortest prefix that still preserves most of the selectivity of the full column. For example, if most of the strings start with "abc" and the prefix length is limited to 4, the index will contain too many repeated "abcX" values.
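
One way to pick the length is to compare the selectivity of each candidate prefix with that of the full column. The sketch below does this on a made-up sample in which most strings start with "abc"; the function names and the 95% threshold are assumptions for illustration, and on a real table the same ratios would be computed from the column's actual data.

```python
def prefix_selectivity(values, prefix_length=None):
    """Distinct values (after truncating to the prefix) divided by total rows."""
    keys = values if prefix_length is None else [v[:prefix_length] for v in values]
    return len(set(keys)) / len(values)


def shortest_good_prefix(values, min_ratio=0.95):
    """Shortest prefix whose selectivity reaches min_ratio of the full column's."""
    full = prefix_selectivity(values)
    for length in range(1, max(len(v) for v in values) + 1):
        if prefix_selectivity(values, length) >= full * min_ratio:
            return length


# Hypothetical column contents.
sample = ["abcdef", "abcdxy", "abcx12", "abcx34", "abcyzz", "zebra"]

print(prefix_selectivity(sample))       # full column
print(prefix_selectivity(sample, 4))    # many repeated "abcX" prefixes
print(shortest_good_prefix(sample))
```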

Multi-column indexes

The index created on "People" mentioned above is a multi-column index, which is often better than multiple single-column indexes.

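  • When performing AND queries on multiple columns, you should create a multi-column index instead of multiple single-column indexes.
  • When performing OR queries on columns that each have their own single-column index, you can try rewriting the query as a union all so that each branch uses one index (this form assumes f1 is never NULL):
select * from t where f1 = 'v1' union all select * from t where f2 = 'v2' and f1 <> 'v1'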

The order of columns in a multi-column index is very important. Usually, when sorting and grouping are not a concern, columns with high selectivity should be placed first (selectivity is the number of distinct values in the indexed column divided by the total number of rows; high selectivity means few duplicates). But there are exceptions. If you know that certain queries are executed very frequently, give priority to the selectivity of the values those queries actually use. For example, if Name is more selective than Age in the People table above, then for a query like this:

select * from people where name = 'xxx' and age = xx

it is more appropriate to put the Name column on the left side of the index. But suppose one particular statement is executed far more often than any other, such as

select * from people where name = 'xxx' and age = 20

and very few records in the database have age = 20. Then putting age at the left of the index is more efficient for that statement, although it may penalize queries where age is not 20. Unless you are sure that age = 20 is the dominant query condition, weigh all the queries together; putting name on the left is usually the safer choice.

Clustered index

A clustered index is a way of storing data rather than a separate kind of index. InnoDB stores the full data rows directly in the leaf nodes of the primary key index, whereas a secondary index stores only the values of the indexed columns and the primary key values of the rows they point to. Because of this, a table can have only one clustered index. If a table defines no primary key, InnoDB uses a unique index on non-nullable columns as the clustered index; if there is none, it generates a hidden column for that purpose.

Covering Index

Simply put, when a query needs only columns that are contained in an index, it can be answered from the index alone, with no second lookup by the primary key value stored in the index's leaf nodes.
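
The effect is easy to observe in a query plan. The sketch below uses SQLite from Python's standard library purely as a stand-in, because it runs without a server; its planner and plan wording differ from MySQL's, where a covered query shows "Using index" in the Extra column of EXPLAIN. The City column is a hypothetical addition so that select * has something outside the index to fetch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        Id INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        Age INTEGER NOT NULL,
        Number TEXT NOT NULL,
        City TEXT NOT NULL          -- hypothetical column, not part of the index
    );
    CREATE INDEX i_name_age_number ON people (Name, Age, Number);
""")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Only indexed columns are requested: the index alone answers the query.
covered = plan("SELECT Name, Age FROM people WHERE Name = 'Abel'")
# City is not in the index, so each match needs a lookup in the table itself.
not_covered = plan("SELECT * FROM people WHERE Name = 'Abel'")

print(covered)
print(not_covered)
```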

Duplicate and redundant indexes

Creating the same index on a column more than once brings no benefit, only overhead, and should be avoided. For example, adding a unique index and a normal index on the primary key column is pointless duplication, because the InnoDB primary key is already the clustered index.

Redundant indexes are different from duplicate indexes. If one index is (A,B) and another is (A), the second is redundant because it is a left prefix of the first: (A,B) can serve in place of (A), but not the other way round. However, (A,B) and (B), or (A,B) and (B,A), are not redundant with each other, and neither can replace the other.

If index (A) already exists and you want an index on (A,B), extend the existing index rather than creating a new one. Likewise, if index (A) already exists there is no need to create (A,ID), where ID is the primary key: a secondary index on A already contains the primary key, so (A,ID) would be redundant.

However, redundant indexes are sometimes desirable. For example, if the existing index (A) is extended to (A,B) and column B is a very long type, queries on A alone will no longer be as fast as before. In that case, consider keeping (A) and creating (A,B) as an additional index.

Unused indexes

Unused indexes only reduce the efficiency of insert, update, and delete operations and should be dropped promptly.

Summary of index usage

Three-star principle of indexing:

  • An index that puts query-related records together in order gets one star.
  • One star if the order of data in the index matches the order of query results
  • One star if the index contains all the columns required by the query

The first principle means that the columns in the where conditions line up with the index order, that is, the index is used from left to right as described above.

Indexes are not a panacea. When the amount of data is huge, maintaining the indexes is itself costly, and you should consider partitioning or splitting tables.

Query Optimization

Reasons for slow query

Are extra rows being requested from the database?

For example, the application needs only 10 rows, but it requests all the data from the database and discards most of it before displaying the result in the UI.

Are extra columns being requested from the database?

For example, the application needs to display only 5 columns, but fetches every column with select *.

Is the same query executed multiple times?

Consider running the query once and caching the result, then reusing the cached records when they are needed later.

Is MySQL scanning additional records?

The execution plan gives a rough idea of the number of records that have to be scanned. If this number exceeds expectations, try adding indexes, optimizing the SQL (the focus of this section), or changing the table structure (for example, adding a separate summary table for a particular query).

Ways to restructure your query

  • Break a complex query into several simple queries
  • Split a large query into smaller ones that each do the same thing but process only a small part of the data
  • Decompose joins: turn one large join into separate single-table queries and combine the results in application code
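
As one illustration of splitting a large query, a big delete can be run in small batches so that no single statement holds locks for long. The sketch below uses an in-memory SQLite database as a stand-in so that it runs anywhere; the messages table, its columns, and the batch size are made up. In MySQL the inner select is unnecessary, because a single-table delete accepts a limit clause directly.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size):
    """Delete rows with created < cutoff a few at a time; return the batch count."""
    batches = 0
    while True:
        cursor = conn.execute(
            "DELETE FROM messages WHERE id IN ("
            " SELECT id FROM messages WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cursor.rowcount == 0:
            return batches
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany("INSERT INTO messages (created) VALUES (?)",
                 [(n,) for n in range(25)])

batches_run = delete_in_batches(conn, cutoff=20, batch_size=8)
remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(batches_run, remaining)
```

In a real system a short pause between batches spreads the load further.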

Miscellaneous

Optimizing count()

count has two functions: counting the non-NULL values of a column or expression, and counting rows. If the argument is a column name or an expression, count counts the rows for which the result is not NULL. If the argument is *, count counts all rows. Here is an example of passing an expression:

SELECT count(name like 'B%') from people

Note that the comparison evaluates to 1 or 0, never NULL, for a non-NULL name, so this statement counts every row with a non-NULL name. To count only the rows that match, use SELECT sum(name like 'B%') from people or SELECT count(name like 'B%' or null) from people.

Ways to optimize count():

  • Use an approximate value instead of an exact count(), such as the estimated row count in the execution plan.
  • Use a covering index scan
  • Add a summary table
  • Maintain the row count in an in-memory cache system
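
The difference between counting rows, counting non-NULL values, and counting matches is easy to check on a tiny table. The sketch below uses SQLite as a stand-in; these particular expressions behave the same way in MySQL. The table here allows NULL names, unlike the people table defined earlier, so that the difference is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("Bob",), ("Bill",), ("Abel",), (None,)])


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


all_rows = scalar("SELECT COUNT(*) FROM people")                   # every row
non_null_names = scalar("SELECT COUNT(name) FROM people")          # skips NULL names
expr_count = scalar("SELECT COUNT(name LIKE 'B%') FROM people")    # 0 and 1 both count
matching = scalar("SELECT SUM(name LIKE 'B%') FROM people")        # adds up the 1s only

print(all_rows, non_null_names, expr_count, matching)
```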

Optimizing join queries

  • The MySQL optimizer executes a join roughly like this: if tables A and B are joined on column c, MySQL traverses table A and, for each row, looks up the matching rows in table B by the value of column c. So in general, if no index exists yet, you only need to add an index on column c of table B.
  • Make sure that the columns in order by and group by belong to a single table, so that an index can be used.

Optimizing Subqueries

For MySQL 5.5 and below, try to use joins instead of subqueries.

Optimize group by and distinct

If possible, try to apply both operations to the primary key.

Optimizing limit

For example, given the SQL

SELECT * from sa_stockinfo ORDER BY StockAcc LIMIT 400, 5

MySQL fetches all columns for 405 rows and then discards the first 400. If the skipped rows can be handled by a covering index query, far less data has to be read. The statement can be rewritten as:

SELECT i.* FROM sa_stockinfo i JOIN (SELECT StockInfoID FROM sa_stockinfo ORDER BY StockAcc LIMIT 400,5) t ON i.StockInfoID = t.StockInfoID ORDER BY i.StockAcc

With an index on StockAcc, the subquery uses the covering index to quickly find the primary keys that meet the conditions, and the join then fetches only those rows. The effect is obvious when the amount of data is large.
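
The two forms return the same page of rows, which can be checked on generated data. The sketch below uses SQLite as a stand-in so that it runs without a server; it verifies only that the results are equal and says nothing about MySQL's timings. The Detail column, the index name, and the generated values are made up, and an outer order by is included so that the row order of the joined form is guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sa_stockinfo (
        StockInfoID INTEGER PRIMARY KEY,
        StockAcc TEXT NOT NULL,
        Detail TEXT NOT NULL        -- hypothetical wide column
    );
    CREATE INDEX i_stockacc ON sa_stockinfo (StockAcc);
""")
conn.executemany(
    "INSERT INTO sa_stockinfo (StockAcc, Detail) VALUES (?, ?)",
    [(f"ACC{n:05d}", "x" * 50) for n in range(1000)],
)

# Plain form: whole rows are read for the skipped part as well.
plain = conn.execute(
    "SELECT * FROM sa_stockinfo ORDER BY StockAcc LIMIT 400, 5"
).fetchall()

# Deferred join: the subquery pages through the index, then 5 rows are fetched.
deferred = conn.execute("""
    SELECT i.* FROM sa_stockinfo i
    JOIN (SELECT StockInfoID FROM sa_stockinfo
          ORDER BY StockAcc LIMIT 400, 5) t
      ON i.StockInfoID = t.StockInfoID
    ORDER BY i.StockAcc
""").fetchall()

print([row[1] for row in deferred])
```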

Optimizing union

Unless you need duplicate rows removed, be sure to use union all, so that MySQL does not perform a uniqueness check when putting the data into the temporary table.

Checking whether a record exists

To determine whether a record exists, the usual approach is

select count(*) from t where condition

It is better to write the following, which can stop at the first matching row:

SELECT IFNULL((SELECT 1 from t where condition LIMIT 1),0)
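
Wrapped in application code, the check becomes a function that returns true or false. The sketch below runs against an in-memory SQLite database as a stand-in (IFNULL and LIMIT behave the same way there for this query); the table contents, the status column, and the function name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (status TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("open",), ("open",), ("closed",)])


def exists(status):
    """True if at least one row has the given status; the scan can stop at the first hit."""
    sql = "SELECT IFNULL((SELECT 1 FROM t WHERE status = ? LIMIT 1), 0)"
    return conn.execute(sql, (status,)).fetchone()[0] == 1


print(exists("open"), exists("missing"))
```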
