Summary of some tips on MySQL index knowledge

Summary of some tips on MySQL index knowledge

1. Basic knowledge of indexing

1.1 Advantages of Indexes

  1. Greatly reduces the amount of data that the server needs to scan and speeds up database retrieval
  2. Helps the server avoid sorts and temporary tables
  3. Convert random io to sequential io

1.2 Usefulness of Indexes

  1. Quickly find rows matching the WHERE clause
  2. Eliminate rows from consideration. If you can choose between multiple indexes, MySQL usually uses the index that finds the fewest rows.
  3. If the table has a multicolumn index, the optimizer can use any leftmost prefix of the index to find rows.
  4. When there is a table join, retrieve row data from other tables
  5. Find the min or max value of a specific index column
  6. If sorting or grouping is done on the leftmost prefix of the available index, the table is sorted and grouped
  7. In some cases, queries can be optimized to retrieve values ​​without querying the data rows.

1.3 Classification of Indexes

The database will create an index by default, but it does not create an index for the primary key, but for the unique key, because the primary key is unique and non-empty.

  • Primary key index: It is a special unique index that does not allow null values. (Primary key constraint is a primary key index)
  • Unique index: The values ​​in the index column must be unique, but null values ​​are allowed.
  • Normal index: The basic index type in MySQL, which has no restrictions and allows duplicate values ​​and null values ​​to be inserted into the column where the index is defined, purely for faster data query.
  • Full-text index: It can only be used on the MyISAM engine and can only be used on CHAR, VARCHAR, and TEXT type fields.
    >What is full-text indexing? It means that in a pile of text, you can find the record row to which the field belongs by using a certain keyword. For example, if there is "LOL LPL 牧小农", you may be able to find the record by using 牧小农. This is a possibility, because the use of full-text indexing involves many details, we only need to know the general idea. In general development, full-text indexing is not used because it takes up a lot of physical space and reduces the modifiability of records, so it is rarely used.
  • Composite index: An index created on a combination of multiple fields in a table. The index will only be used when the left fields of these fields are used in the query conditions. When using a composite index, the leftmost prefix set is followed.

For example, the index here consists of three fields: id, name, and age. The index rows are stored in the order of id/name/age. The index can index the following field combinations (id, name, age), (id, name), or (id). If the fields to be queried do not constitute the leftmost prefix of the index, the index will not be used. For example, age or the combination of (name, age) will not be queried using the index.

1.4 Interview Technical Terms

Table return: After the database finds the row where the specified record is located based on the index (not the primary key), it needs to retrieve the data from the data block again based on the primary key. This is called table return.

Covering index: Read an article I wrote: I failed the three interviews on a SQL question - SQL performance optimization

Leftmost match: In a combined index, if your SQL statement uses the leftmost index in the combined index, then this SQL statement can use this combined index for matching. If a range query (>, <, between, like) is encountered, matching will stop.

select * from t where a=1 and b=1 and c =1; #This way you can use the defined index (a,b,c) and use a,b,c
select * from t where a=1 and b=1; #This way you can use the defined index (a,b,c) and use a,b
select * from t where b=1 and a=1; #This way you can use the defined indexes (a, b, c) and use a and c (MySQL has a query optimizer)
select * from t where a=1; #This way we can also use the defined index (a, b, c), using a
select * from t where b=1 and c=1; #This does not allow the use of defined indexes (a, b, c)
select * from t where a=1 and c=1; #This way, the defined indexes (a, b, c) can be used, but only index a is used, indexes b and c are not used

Index pushdown: Also called Index Condition Pushdown (ICP), this is a method provided by MySQL to use a certain index to obtain tuples from a specific table. Note that we have deliberately emphasized "one" here, because such index optimization is not used for multi-table connections but for single-table scans. To be precise, it is a way to scan a single table using an index to obtain data.

1.5 Data Structure Used by Index

1.5.1 Hash Table

shortcoming︰

1. If hash storage is used, all data files need to be added to the memory, which consumes more memory space.

2. If all queries are equal value queries, then hash is indeed very fast. However, in enterprises or actual working environments, more data is searched within a range, rather than equal value queries, so hash is not very suitable.

1.5.2 Binary Tree

Disadvantages:

Whether it is a binary tree or a red-black tree, the depth of the tree will increase the number of IO operations, affecting the efficiency of data reading.

1.5.3 B+ Tree

B-tree features:

1. All key values ​​are distributed throughout the tree

2. The search may end at a non-leaf node, and a search is performed in the entire set of keywords, with performance close to that of a binary search.

3. Each node has at most m subtrees

4. The root node has at least 2 subtrees

5. A branch node has at least m/2 subtrees (all nodes except the root node and leaf node are branch nodes)

6. All leaf nodes are on the same layer, each node can have at most m-1 keys, and they are arranged in ascending order

Example diagram description:

Each node occupies a disk block. A node has two ascending keywords and three pointers to the root node of the subtree. The pointer stores the address of the disk block where the child node is located. The three range domains divided by the two keywords correspond to the range domains of the data of the subtrees pointed to by the three pointers. Taking the root node as an example, the keywords are 16 and 34, the data range of the subtree pointed to by the P1 pointer is less than 16, the data range of the subtree pointed to by the P2 pointer is 16~34, and the data range of the subtree pointed to by the P3 pointer is greater than 34.

Keyword search process:

  1. Find disk block 1 based on the root node and read it into memory. [Disk I/O operation 1st time]
  2. Compare keyword 28 in the interval (16,34) and find pointer P2 of disk block 1.
  3. Find disk block 3 according to the P2 pointer and read it into memory. [Disk I/O operation 2nd time]
  4. Compare keyword 28 in the interval (25,31) and find pointer P2 of disk block 3.
  5. Find disk block 8 according to the P2 pointer and read it into memory. [Disk I/O operation 3rd time]
  6. Find keyword 28 in the keyword list in disk block 8.

shortcoming:

  • Each node has a key and also contains data. However, the storage space of each page is limited. If the data is large, the number of keys stored in each node will become smaller.
  • When the amount of stored data is large, the depth will be large, which will increase the number of disk IO times during query, thus affecting query performance.

1.6 Index matching method

Full value match: Full value match refers to matching all columns in the index.

explain select * from staffs where name = 'July' and age = '23' and pos = 'dev';

Match the leftmost prefix: only match the first few columns

explain select * from staffs where name = 'July' and age = '23';

explain select * from staffs where name = 'July';

Match column prefix: You can match the beginning of a column value.

explain select * from staffs where name like 'J%';

explain select * from staffs where name like '%y';

Match range value: You can search for data within a certain range.

explain select * from staffs where name > 'Mary';

Exactly match a column and range match another column: You can query all of the first column and part of the second column

explain select * from staffs where name = 'July' and age > 25;

Index-only queries: When querying, only the index needs to be accessed, and no data rows need to be accessed. This is essentially a covering index.

explain select name,age,pos from staffs where name = 'July' and age = 25 and pos = 'dev';

2. Hash Index

Based on the implementation of hash tables, only queries that exactly match all columns of the index are valid

In MySQL, only the memory storage engine explicitly supports hash indexes.

The hash index itself only needs to store the corresponding hash value, so the index structure is very compact, which makes the hash index search very fast.

2.1 Limitations of Hash Indexes

  1. A hash index contains only hash values ​​and row pointers, but does not store field values. The index cannot use the values ​​in the index to avoid reading rows.
  2. Hash index data is not stored in the order of index values, so it cannot be sorted.
  3. Hash indexes do not support partial column matching. Hash indexes use the entire content of the index column to calculate the hash value.
  4. Hash indexes do not support equality comparison queries, nor do they support any range queries.
  5. Accessing hash indexed data is very fast unless there are many hash conflicts. When a hash conflict occurs, the storage engine must traverse all row pointers in the linked list and compare them row by row until all qualifying rows are found.
  6. If there are many hash conflicts, the maintenance cost will be very high

2.2 Examples

When you need to store a large number of URLs and select id from url where url=""

You can also select id fom url where url="" and url_crc=CRC32("")

This query has a high performance because it uses a very small index to complete the search

3. Composite Index

When including multiple columns as indexes, it is important to note that the correct order depends on the query of the index, and you also need to consider how to better meet the needs of sorting and grouping.

Example: Create a composite index a, b, c and use the index in different SQL statements


Statements Does the index work?
where a=3 Yes, only a
where a=3 and b=5 Yes, a and b are used.
where a =3 and b = 5 and c = 4 Yes, a, b, c are used
where a = 3 or c = 4 no
where a = 3 and c = 4 Yes, only a
where a = 3 and b > 10 and c = 7 Yes, a and b are used.
where a = 3 and b like '%mxn%' and c=7 Used a

4. Clustered index and non-clustered index

4.1 Clustered Index

It is not a separate index type, but a data storage method, which means that data rows and adjacent key values ​​are compactly stored together, putting data storage and indexes together. Finding the index also finds the data.

If no primary key is defined, InnoDB chooses a unique, non-empty index instead. If there is no unique index, InnoDB implicitly defines a primary key as the clustered index. InnoDB only clusters records within the same page. Pages containing adjacent key values ​​may be far apart.

4.2 Nonclustered Indexes

The data file is stored separately from the index file, and the data is stored in a separate structure from the index. The leaf node of the index structure points to the corresponding row of the data. MyISAM caches the index in memory through key_buffer. When the data needs to be accessed (accessed through the index), the index is directly searched in the memory, and then the corresponding data on the disk is found through the index. This is why the speed is slow when the index is not hit in the key buffer.

The data in the data page is found through the leaf node pointer, so the non-clustered index is in logical order

5. Covering Index

5.1 Basic Introduction

  1. If an index contains the values ​​of all fields that need to be queried, we call it a covering index.
  2. Not all types of indexes can be called covering indexes. Covering indexes must store the values ​​of the index columns.
  3. Different storage implements covering indexes in different ways. Not all engines support covering indexes. Memory does not support covering indexes.

5.2 Advantages

1. Index entries are usually much smaller than the size of data rows. If you only need to read the index, MySQL will greatly reduce the amount of data access.

2. Because the index is stored in the order of column values, the IO for IO-intensive range queries will be much less than randomly reading each row of data from disk.

3. Some storage engines such as MYISAM only cache indexes in memory, and rely on the operating system to cache data. Therefore, a system call is required to access data, which may cause serious performance problems.

4. Covering indexes are particularly useful for INNODB tables due to INNODB's clustered indexes

5.3 Case Demonstration

1. When a query covered by an index is initiated, the using index information can be seen in the extra column of explain, and the covering index is used at this time

2. In most storage engines, covering indexes can only cover queries that access only some columns in the index. However, further optimization can be performed by using InnoDB's secondary index to cover the query.

For example, actor uses the innodb storage engine and has a secondary index on the last_name field. Although the columns of this index do not include the primary key actor_id, it can also be used to perform a covering query on actor_id.

6. Optimize small details

When using index columns for querying, try not to use expressions and put the calculations in the business layer instead of the database layer.

Try to use primary key queries instead of other indexes, because primary key queries will not trigger table queries.

Using prefix index

>Sometimes you need to index a very long string, which will make the index large and slow. Usually, you can use a partial string at the beginning of a column, which greatly saves index space and improves index efficiency. However, this will reduce the selectivity of the index. The selectivity of the index refers to the ratio of non-repeated index values ​​to the total number of records in the data table, ranging from 1/#T to 1. The higher the selectivity of the index, the higher the query efficiency, because a more selective index allows MySQL to filter out more rows when searching.

In general, the selectivity of a column prefix is ​​high enough to meet the query performance, but for columns of BLOB, TEXT, and VARCHAR types, a prefix index must be used because MySQL does not allow indexing the full length of these columns. The trick to using this method is to choose a prefix that is long enough to ensure high selectivity, but not too long.

--Create a data table create table citydemo(city varchar(50) not null);
insert into citydemo(city) select city from city;

-- Repeat the following SQL statement 5 times insert into citydemo(city) select city from citydemo;

--Update the name of the city table update citydemo set city=(select city from city order by rand() limit 1);

--Find the most common city list and find that each value appears 45-65 times,
select count(*) as cnt,city from citydemo group by city order by cnt desc limit 10;

--Find the most frequently appearing city prefix, starting with the 3 prefix letters, and finding that it appears more frequently than before. You can intercept multiple characters to see how many times the city appears. select count(*) as cnt,left(city,3) as pref from citydemo group by pref order by cnt desc limit 10;
select count(*) as cnt,left(city,7) as pref from citydemo group by pref order by cnt desc limit 10;
--At this point, the selectivity of the prefix is ​​close to the selectivity of the complete column--You can also use another method to calculate the selectivity of the complete column. You can see that when the prefix length reaches 7, the selectivity improvement is very small if the prefix length is increased. select count(distinct left(city,3))/count(*) as sel3,
count(distinct left(city,4))/count(*) as sel4,
count(distinct left(city,5))/count(*) as sel5,
count(distinct left(city,6))/count(*) as sel6,
count(distinct left(city,7))/count(*) as sel7,
count(distinct left(city,8))/count(*) as sel8 
from citydemo;

--After the calculation is completed, you can create a prefix index alter table citydemo add key (city (7));

--Note: Prefix index is an effective way to make the index smaller and faster, but it also has disadvantages: MySQL cannot use prefix index for order by and group by. 

Use index scan to sort

MySQL has two ways to generate ordered results: by sorting or by scanning in index order. If the value of the type column in explain is index, it means that MySQL uses index scanning to do the sorting.
Scanning the index itself is fast because only one has to move from one index record to the immediately next record. However, if the index does not cover all the columns required for the query, then each time an index record is scanned, the corresponding row has to be queried back in the table. This is basically random IO, so reading data in index order is usually slower than a sequential full table scan.
MySQL can use the same index to sort and find rows. If possible, the index should be designed to satisfy both tasks.
MySQL can use the index to sort the results only when the order of the index columns is exactly the same as the order of the order by clause and all columns are sorted in the same way. If the query needs to associate multiple tables, the index can be used for sorting only when all the fields referenced by the order by clause are from the first table. The restrictions of the order by clause are the same as those of the search query. The leftmost prefix of the index must be met. Otherwise, MySQL needs to perform sequential operations and cannot use the index to sort.

union all, in, or can use indexes, but in is recommended.

Range columns can use indexes. The range conditions are: <, >. Range columns can use indexes, but columns following the range column cannot use indexes. An index can be used for at most one range column.

Forced type conversion will scan the entire table

create table user(id int,name varchar(10),phone varchar(11));
alter table user add index idx_1(phone);
explain select * from user where phone=13800001234; (will not trigger the index)
explain select * from user where phone='13800001234'; (trigger index)

It is not advisable to create indexes on fields that are updated frequently and have low data differentiation.

Updates will change the B+ tree. Indexing is recommended for fields that are frequently updated, which will greatly reduce database performance.
For attributes like gender that are not very different, it is meaningless to create an index, and it cannot effectively filter data.
Generally, an index can be created when the discrimination is above 80%. The discrimination can be calculated using count(distinct(column name))/count(*)

The column for creating an index does not allow null values, which may result in unexpected results.

When you need to join tables, it is best not to have more than three tables, because the fields that need to be joined must have the same data type.

Use limit whenever possible

It is recommended to limit the number of indexes on a single table to 5 or less.

The number of single index fields is not allowed to exceed 5 (combined index)

When creating an index, you should avoid the following misconceptions:

> The more indexes the better (wrong)

> Premature optimization, optimizing without understanding the system (error)

Summarize

This is the end of this article about MySQL index knowledge tips. For more relevant MySQL index knowledge content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • How to maintain MySQL indexes and data tables
  • Full steps to create a high-performance index in MySQL
  • What you need to know about creating MySQL indexes
  • MySQL query redundant indexes and unused index operations
  • Detailed explanation of the difference between MySQL normal index and unique index
  • A brief discussion on which fields in Mysql are suitable for indexing
  • In-depth study of MySQL composite index
  • mysql add index mysql how to create index
  • MySQL index type summary and usage tips and precautions
  • MySQL Create Index method, syntax structure and examples
  • MySQL performance optimization index optimization
  • Analysis of the connection and difference between MySQL primary key and index
  • How to construct a table index in MySQL

<<:  Detailed explanation of javascript event bubbling, event capture and event delegation

>>:  Building FastDFS file system in Docker (multi-image tutorial)

Recommend

MySQL common backup commands and shell backup scripts sharing

To back up multiple databases, you can use the fo...

Excel export always fails in docker environment

Excel export always fails in the docker environme...

A brief discussion on docker-compose network settings

Networks usage tutorial Official website docker-c...

MySQL 8.0.12 decompression version installation tutorial personal test!

Mysql8.0.12 decompression version installation me...

Docker container introduction

1. Overview 1.1 Basic concepts: Docker is an open...

Detailed explanation of dynamic Christmas tree through JavaScript

Table of contents 1. Animated Christmas Tree Made...

JavaScript code to implement Weibo batch unfollow function

A cool JavaScript code to unfollow Weibo users in...

MySQL Series Database Design Three Paradigm Tutorial Examples

Table of contents 1. Knowledge description of the...

Example of how to optimize MySQL insert performance

MySQL Performance Optimization MySQL performance ...

Analysis and solution of Chinese garbled characters in HTML hyperlinks

A hyperlink URL in Vm needs to be concatenated wit...

An example of how Tomcat manages Session

Learned ConcurrentHashMap but don’t know how to a...

vue-cli introduction and installation

Table of contents 1. Introduction 2. Introduction...

JS realizes simple picture carousel effect

This article shares the specific code of JS to ac...

How to enable MySQL remote connection in Linux server

Preface Learn MySQL to reorganize previous non-MK...