Guide to Efficient Use of MySQL Indexes

Preface

I believe most people have used MySQL and indexes, but do you know how to create appropriate indexes? When the amount of data is small, inappropriate indexes will not have much impact on performance, but when the data gradually increases, performance will drop sharply.

This article is a summary of MySQL indexes. If there are any errors, please comment and point them out.

Index Basics

We all know how to look up a word in a dictionary: first find the word's page number in the index pages, then turn to that page to read the entry. MySQL indexes work in a similar way. MySQL first finds the value in the index, then uses the matching index record to locate the corresponding data row. Consider the following SQL statement:

select * from student where code='2333'

If an index is created on the code column, MySQL will use the index to find the row with the value '2333', and then read all the data in the row and return it.

Index Type

B-Tree Index

Most indexes are B-Tree indexes or a B-Tree variant, and this is the type we normally use. MySQL's storage engines build on this family of structures; InnoDB in particular uses a B+Tree. The differences between a B-tree and a B+Tree are worth looking up separately.

A tree-structured index speeds up data access because the storage engine no longer needs to scan the entire table to find the required rows. Instead, it starts at the root node and descends through the tree, comparing the search value with the keys in each node to choose the next branch, much as a binary search narrows its range. The number of nodes visited depends on the height of the tree rather than on the number of rows, so an index can greatly improve query speed. B-Tree indexes support the following types of queries:

Assume that there is only a multi-column index on the student table: name, age, weight. The following queries can all use this index:

  • Full value match

Every column in the index is matched with an equality condition. For example, a query on name='abc' and age=12 and weight=50 uses all three columns of the index.

  • Match the leftmost column

Only the beginning part of the index is used. For example, the query for name='ggg' only uses the first column of the index, and the query for name='ggg' and age=12 uses the first and second columns of the index.

  • Matching column prefix

You can also match only the beginning of a column's value. For example, name like 'g%' finds records whose name starts with g. Only the first column of the index is used here.

  • Matching range value

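The index can be used to match a range of values, such as name > 'abc' and name < 'bcd'. Only the first column of the index is used here.

  • Exactly match a column and range match another column

The index can match one column exactly and another by range, such as name='abc' and age > 12. Columns to the right of the range column (weight here) can no longer be used to narrow the index search.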

In general, a B-Tree index is suited to searches on a leftmost prefix of the index: the conditions must include the first index column and continue through the following columns without skipping any. The order in which the conditions are written in the WHERE clause does not matter, because the optimizer rearranges them. For example, the index can be used for queries on name, on name and age, or on name, age and weight, but it cannot be used for queries on age alone or on age and weight.
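The leftmost-prefix rule can be made concrete with a toy model. The following is a minimal Python sketch, not how MySQL is implemented: a sorted list of tuples stands in for the (name, age, weight) index, the sample rows are made up, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for a (name, age, weight) B-Tree index: entries kept in sorted order.
# The rows are made-up sample data.
index = sorted([
    ("abc", 12, 40),
    ("abc", 15, 50),
    ("bob", 12, 45),
    ("ggg", 12, 42),
    ("ggg", 30, 70),
])

def lookup_prefix(entries, prefix):
    """Return entries whose leading columns equal `prefix`, located by binary search."""
    n = len(prefix)
    # Leading columns only. They are still in sorted order, which is what makes
    # binary search possible. (A real index needs no such copy.)
    keys = [entry[:n] for entry in entries]
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix)
    return entries[lo:hi]

def scan_by_age(entries, age):
    """Without the leftmost column the matches are scattered, so every entry is checked."""
    return [entry for entry in entries if entry[1] == age]

by_name = lookup_prefix(index, ("ggg",))         # uses the first column
by_name_age = lookup_prefix(index, ("ggg", 12))  # uses the first and second columns
by_age = scan_by_age(index, 12)                  # cannot use the sort order at all
```

Entries with the same name sit next to each other, so a prefix lookup returns one contiguous slice. Entries with the same age are spread across the whole list, which is why a condition on age alone gains nothing from this index.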

Hash Index

The hash index is implemented based on the hash table and will only take effect if all columns of the index are exactly matched. In MySQL, only the Memory engine explicitly supports hash indexes, which is also its default index.

InnoDB cannot create hash indexes, but it has a feature called the adaptive hash index. When certain index values are used very frequently, the engine builds an additional hash index in memory on top of the B-Tree index, so that the B-Tree index gains some of the advantages of a hash index. This behavior is automatic and internal: you cannot create or tune the hash index for a particular table or index, although the feature as a whole can be switched on or off with the innodb_adaptive_hash_index system variable.
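Why a hash index helps only with exact matches on all indexed columns can be seen in a small model. This is a rough Python sketch, assuming a dict as the hash table and made-up rows; it does not reflect the Memory engine's internals.

```python
# Toy hash index on (name, age): a dict keyed by the full tuple of indexed columns.
rows = [
    {"id": 1, "name": "abc", "age": 12},
    {"id": 2, "name": "bob", "age": 12},
    {"id": 3, "name": "abc", "age": 30},
]

hash_index = {}
for row in rows:
    hash_index.setdefault((row["name"], row["age"]), []).append(row["id"])

def find_exact(name, age):
    """Equality on every indexed column: a single hash lookup."""
    return hash_index.get((name, age), [])

def find_by_name(name):
    """Only part of the key is known, so its hash cannot be computed; scan all rows."""
    return [row["id"] for row in rows if row["name"] == name]
```

Because the hash of ("abc", 12) says nothing about where ("abc", 30) is stored, neither a partial key nor a range condition can use the hash table.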

High-performance indexing strategy

Below are some common indexing strategies.

Independent columns

The rule is simple: the indexed column must stand alone on one side of the comparison. If it is part of an expression or passed as an argument to a function, the index cannot be used. For example:

select * from student where age+1=12

Even if the age column has an index, the query above cannot use it. Rewriting the condition as age=11 leaves the column on its own and lets the index be used.
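The difference can be sketched in Python, with a sorted list standing in for an index on age. The data and function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

ages = sorted([9, 11, 11, 12, 15, 20])  # stand-in for an index on age

def count_age_equals(value):
    """age = value: binary search on the sorted index entries."""
    return bisect_right(ages, value) - bisect_left(ages, value)

def count_age_plus_one_equals(value):
    """age + 1 = value: the expression has to be evaluated for every entry."""
    return sum(1 for age in ages if age + 1 == value)
```

Both functions return the same answer for age=11 and age+1=12, but only the first can jump straight to the matching entries.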

Prefix Indexes and Index Selectivity

If you need to index a very long string column, creating an index directly will cause the index to take up more space and be slower. One optimization strategy is to simulate a hash index: calculate a hash value for the column and create an index on the hash value column.
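The simulated hash index can be sketched as follows. This Python model assumes CRC32 as the hash function and uses made-up URLs as the long string column; the names are hypothetical. The point it illustrates is that a lookup must compare the full string as well as the hash, because different strings can share a hash value.

```python
import zlib
from collections import defaultdict

def crc(text):
    """Short integer hash of a long string (CRC32 is chosen here only as an example)."""
    return zlib.crc32(text.encode("utf-8"))

# Hypothetical rows with a long string column.
urls = [
    "https://example.com/a/very/long/path/one",
    "https://example.com/a/very/long/path/two",
]

# Stand-in for an index on the extra hash column: hash value -> row positions.
hash_column_index = defaultdict(list)
for pos, url in enumerate(urls):
    hash_column_index[crc(url)].append(pos)

def find_url(url):
    """Look up by hash, then compare the full string to rule out collisions."""
    candidates = hash_column_index.get(crc(url), [])
    return [pos for pos in candidates if urls[pos] == url]
```

In SQL terms, the query would filter on both the hash column and the original column, so that the small integer index narrows the search and the string comparison removes any false matches.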

Another way is to create a prefix index. Only the characters at the beginning of this field are indexed. This can greatly reduce the space occupied and the index creation speed will be much faster. But this also has the following disadvantages:

  • The index selectivity is reduced. If multiple strings have the same prefix, they cannot be distinguished and string comparison is required.
  • MySQL cannot use a prefix index for ORDER BY or GROUP BY. Only part of each value is indexed, so the index cannot fully order or distinguish the values.

The key here is to determine how many characters to index appropriately. It is necessary to avoid excessive length and to ensure sufficient index selectivity. There are two ways to help determine the number of index characters:

The first is to check that the prefix values are evenly distributed, that is, that each prefix is shared by a similar number of rows. For example, if we index the first 3 characters of the name field, the following result is reasonable (only 8 prefixes are shown):

count  prefix (first 3 characters)
500 abc
465 asd
455 acd
431 zaf
430 aaa
420 vvv
411 asv
512 pdf

If the counts are large, the prefix does not discriminate well enough, and the number of indexed characters should be increased until the selectivity of the prefix is close to that of the full column. In other words, the counts above should be as small as possible.

The second is to calculate the selectivity of the full column and choose a prefix whose selectivity is close to it. The following statement calculates the full-column selectivity:

-- Full-column selectivity: number of distinct strings / total number of rows
select count(distinct name)/count(*) from person;

The following statement calculates the selectivity of the first three characters of the column:

-- Prefix selectivity: number of distinct 3-character prefixes / total number of rows
select count(distinct left(name,3))/count(*) from person;

Keep increasing the number of index characters until the selectivity approaches the full column selectivity and further increasing the number of index characters does not significantly improve the selectivity of the data.
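The search for a suitable prefix length can be sketched in Python. This is an illustrative model only: the sample names, the 0.05 tolerance and the function names are assumptions, and on a real table the same numbers would come from the count(distinct ...) queries above.

```python
def selectivity(values, prefix_len=None):
    """Distinct values divided by total rows, optionally on the first prefix_len characters."""
    if not values:
        return 0.0
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)

def shortest_good_prefix(values, tolerance=0.05):
    """Smallest prefix length whose selectivity is within `tolerance` of the full column."""
    if not values:
        return 0
    full = selectivity(values)
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if full - selectivity(values, n) <= tolerance:
            return n
    return longest

# Made-up sample column values.
names = ["london", "london", "lonsdale", "lisbon", "lima", "lille", "lincoln", "lyon"]
```

For this sample the full-column selectivity is 7/8. Three characters give 6/8, and four characters already reach 7/8, so indexing more than four characters would add size without adding selectivity.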

Creation Method

-- Assume the optimal length is 4
alter table person add key (name(4));

Multi-column indexes

A common misunderstanding is that when a query filters on several columns joined by AND, giving each column its own index yields the best performance. This is not the case: MySQL will generally pick just one of those indexes for the lookup. In this situation you should create a multi-column index (also called a joint or composite index) so that several columns can be used in a single index search. Note that the query must use a leftmost prefix of the index columns, as described in the B-Tree section.

The "index merge" strategy was introduced in versions 5.0 and above. To some extent, multiple single-column indexes can also be used, such as the following query:

-- MySQL uses both the name index and the age index to find the rows, then merges the results
-- With and instead of or, it would find the rows in each index and take the intersection
select * from person where name = "bob" or age=12

However, relying on this is not recommended. With many AND or OR conditions, the caching, sorting, and merging steps of the algorithm consume a lot of CPU and memory.
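What the merge step does can be modeled in a few lines of Python. This is a simplified sketch with made-up rows and hypothetical function names: each single-column index maps a value to a set of row ids, OR becomes a set union, and AND becomes a set intersection.

```python
from collections import defaultdict

# Made-up rows keyed by row id.
people = {
    1: {"name": "bob", "age": 12},
    2: {"name": "bob", "age": 30},
    3: {"name": "amy", "age": 12},
    4: {"name": "eve", "age": 40},
}

def build_index(rows, column):
    """Single-column index: column value -> set of row ids."""
    index = defaultdict(set)
    for row_id, row in rows.items():
        index[row[column]].add(row_id)
    return index

name_index = build_index(people, "name")
age_index = build_index(people, "age")

def merge_or(name, age):
    """name = ... or age = ...: union of the two index results."""
    return name_index.get(name, set()) | age_index.get(age, set())

def merge_and(name, age):
    """name = ... and age = ...: intersection of the two index results."""
    return name_index.get(name, set()) & age_index.get(age, set())
```

Both intermediate result sets have to be built and held before they can be combined, which is where the extra CPU and memory cost comes from. A multi-column index would find the AND result in a single search.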

Choose the appropriate index column order

In a multi-column B-Tree index, the column order determines the sort order: entries are sorted by the leftmost column first, then by the second column, and so on. A common rule of thumb is to put the most selective column first and the remaining columns in decreasing order of selectivity, so that the index narrows the search as quickly as possible. Selectivity can be calculated as described in the section on prefix indexes.

Clustered Index

A clustered index is not a separate index type, but a data storage method. The specific details depend on its implementation.

InnoDB's clustered index stores the index values and the data rows in the same structure. Because a row cannot be stored in two places at once, a table can have only one clustered index. InnoDB clusters the data by the primary key.

If no primary key is defined, InnoDB uses a unique index whose columns are all NOT NULL instead. If there is no such index, InnoDB implicitly defines a hidden primary key to serve as the clustered index.

The main advantage of clustered indexes is that they can store related data together, reduce disk IO, and improve query efficiency. But there are also disadvantages:

  • Insert speed depends heavily on insertion order. Inserting rows in primary key order is fastest; otherwise page splits may occur, which take up more disk space and slow down scans. The table can be reorganized with OPTIMIZE TABLE.
  • Updating clustered index columns is expensive because when the index value changes, the row data will move to the new location along with the index.
  • Accessing row data using a secondary index (non-clustered index) requires two index lookups, because the leaf nodes of the secondary index do not store the physical location of the row data, but the primary key value of the row, and then the row data is retrieved from the clustered index using the primary key value.
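The two-step lookup through a secondary index can be sketched in Python. This is a toy model with made-up rows: one dict plays the clustered index (primary key to full row) and another plays a secondary index on name (value to primary keys).

```python
# Clustered index: primary key -> full row (the rows live inside this structure).
clustered = {
    1: {"id": 1, "name": "abc", "age": 12},
    2: {"id": 2, "name": "bob", "age": 30},
    3: {"id": 3, "name": "abc", "age": 45},
}

# Secondary index on name: indexed value -> primary keys, not row locations.
secondary_name = {}
for pk, row in clustered.items():
    secondary_name.setdefault(row["name"], []).append(pk)

def find_by_name(name):
    primary_keys = secondary_name.get(name, [])    # lookup 1: secondary index
    return [clustered[pk] for pk in primary_keys]  # lookup 2: clustered index, once per key
```

Storing primary keys rather than physical locations means the secondary index does not need to change when a row moves, at the cost of the second lookup on every read.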

Covering Index

Simply put, a covering index contains every column the query needs, so the required data can be read from the secondary index alone, without using the primary key for a second lookup in the clustered index.

InnoDB indexes store the indexed values in their leaf nodes, so if all the columns a query needs are contained in one index and that index is used, the query can be much faster. For example:

-- If name is indexed, the name value is read directly from the index leaf node, with no second lookup.
select name from person where name = 'abc'
-- If there is a composite index on (name, age), the data is returned directly, with no second lookup.
select name,age from person where name='abc' and age=12
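The saving can be illustrated with a small Python model. It is a sketch under simplifying assumptions: the rows are made up, the secondary index is a sorted list of (name, age, primary key) entries, the index is searched with a plain loop rather than a tree descent, and each function also returns how many clustered-index reads it needed.

```python
# Clustered index: primary key -> full row (made-up data).
clustered = {
    1: {"id": 1, "name": "abc", "age": 12, "city": "Oslo"},
    2: {"id": 2, "name": "abc", "age": 40, "city": "Rome"},
    3: {"id": 3, "name": "bob", "age": 12, "city": "Lima"},
}

# Secondary index on (name, age): each entry also carries the primary key.
name_age_index = sorted((row["name"], row["age"], pk) for pk, row in clustered.items())

def select_name_age(name, age):
    """select name, age: both columns are in the index entry, so no clustered read."""
    rows = [(n, a) for n, a, _pk in name_age_index if (n, a) == (name, age)]
    return rows, 0

def select_star(name, age):
    """select *: the remaining columns exist only in the clustered index."""
    pks = [pk for n, a, pk in name_age_index if (n, a) == (name, age)]
    return [clustered[pk] for pk in pks], len(pks)
```

The first function answers the query from index entries alone, while the second needs one clustered-index read per matching row to fetch city.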

Sorting using an index

MySQL can also use an index to satisfy ORDER BY. The index can be used for sorting only when its column order matches the ORDER BY column order and all columns are sorted in the same direction (all ascending or all descending). The ORDER BY may list fewer columns than the index, but they must form a leftmost prefix of it. For example:

-- Assume there is a joint index on (name, age, sex)
-- These can use the index for sorting
select ... order by name desc, age desc
select ... order by name desc,age desc,sex desc
-- These cannot use the index for sorting
select ... order by name desc,sex desc
select ... order by name desc,age asc
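The rule as stated here can be written as a small check. This Python sketch encodes only the two conditions described in this section (leftmost prefix, single direction). The function name is hypothetical, and the real optimizer handles further cases that are not modeled, such as leading columns fixed by equality conditions in WHERE, or descending indexes in MySQL 8.0.

```python
def can_sort_with_index(index_columns, order_by):
    """order_by is a list of (column, direction) pairs, direction being 'asc' or 'desc'."""
    if not order_by or len(order_by) > len(index_columns):
        return False
    columns = [column for column, _ in order_by]
    directions = {direction.lower() for _, direction in order_by}
    # The ORDER BY columns must be the first N index columns, in the same order.
    is_leftmost_prefix = columns == list(index_columns[:len(columns)])
    # All columns must be sorted in the same direction.
    return is_leftmost_prefix and len(directions) == 1

index = ("name", "age", "sex")
```

Applied to the four statements above, the check accepts the first two and rejects the last two: one skips age, and the other mixes desc with asc.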

Finish

This article is based on MySQL 5.5. Newer versions may have different strategies.
