Table of contents- 1. Index Basics
- 1. Types of Indexes
- 1.1 B-Tree Index
- 1.2 Hash Index
- 1.3 Spatial Data Index (R-Tree)
- 1.4 Full-text indexing
- 2. Advantages and disadvantages of indexes
- 3. High-performance indexing strategy
- 1. Independent columns
- 2. Prefix Index
- 3. Multi-column index
- 4. Appropriate index column order
- 5. Clustered Index
- 6. Covering Index
- 3. Query performance optimization
- 1. Explain performance analysis
- 1.1 id: table reading order
- 1.2 select_type: query operation type
- 1.3 table: the source of the table
- 1.4 type: access type
- 1.5 possible_key: possible indexes
- 1.6 key: the index actually used
- 1.7 key_len: Number of bytes used by the index
- 1.8 ref: Displays detailed information about the index being used
- 1.9 rows: the number of rows queried
- 1.10 Extra: Additional important information
- Summarize
1. Index Basics
1. Types of Indexes
1.1 B-Tree Index
Most MySQL storage engines use B+ tree indexes by default. Different storage engines use B+ tree indexes in different ways. MyISAM uses prefix compression technology to make the index smaller, but InnoDB stores it in metadata format. MyISAM indexes reference the indexed rows by the physical location of the data, while InnoDB references the indexed rows based on the primary key. B-Tree and B+Tree B-Tree: 
B+ Tree: 
the difference: - The keywords and records of the B-tree are placed together. The leaf nodes can be regarded as external nodes and do not contain any information. The non-leaf nodes of the B+ tree only have keywords and indexes pointing to the next node. Records are only placed in leaf nodes.
- In a B-tree, the closer a record is to the root node, the faster it is to find the record. As long as the keyword is found, the existence of the record can be determined. However, in a B+ tree, the search time for each record is basically the same. It is necessary to go from the root node to the leaf node, and the keyword must be compared again in the leaf node. From this perspective, the performance of B-tree seems to be better than that of B+-tree, but in actual applications, the performance of B+-tree is better. Because the non-leaf nodes of the B+ tree do not store actual data, each node can accommodate more elements than the B-tree, and the tree height is smaller than the B-tree. The benefit of this is that the number of disk accesses is reduced. Although the B+ tree needs more comparisons to find a record than the B tree, the time for a disk access is equivalent to hundreds or thousands of memory comparisons, so the performance of the B+ tree may be better in practice. In addition, the leaf nodes of the B+ tree are connected together using pointers to facilitate sequential traversal (for example, viewing all files in a directory, all records in a table, etc.), which is why many databases and file systems use B+ trees.
Why is B+ tree more suitable than B-tree for file indexing and database indexing in operating systems in practical applications? - B+ tree has lower disk read and write costs
- The internal nodes of the B+ tree do not have pointers to the specific information of the keyword. Therefore, its internal nodes are smaller than those of the B-tree. If all keywords of the same internal node are stored in the same disk block, the disk block can accommodate more keywords. The more keywords that need to be searched are read into the memory at one time. Relatively speaking, the number of IO reads and writes is reduced.
- The query efficiency of B+ tree is more stable
- Since the non-terminal point is not the node that ultimately points to the file content, it is just an index of the keyword in the leaf node. Therefore, any keyword search must take a path from the root node to the leaf node. The path lengths of all keyword queries are the same, resulting in the same query efficiency for each data point.
Why not use red-black trees? - B+ tree has fewer search times
- The time complexity of a balanced tree search operation is related to the tree height h, O(h)=O(logdN), where d is the out-degree of each node.
- The out-degree of a red-black tree is 2, while the out-degree of a B+ tree is generally very large, so the tree height h of a red-black tree is obviously much larger than that of a B+ tree, and the number of searches is also greater.
- B+ tree uses disk pre-reading feature
- In order to reduce disk I/O operations, the disk is often not read strictly on demand, but reads in advance every time. During the pre-read process, the disk performs sequential reading. Sequential reading does not require disk seek and only requires a short disk rotation time, so the speed is very fast.
- The operating system generally divides memory and disk into blocks of fixed size, each of which is called a page, and memory and disk exchange data in units of pages. The database system sets the size of an index node to the size of a page so that a node can be fully loaded in one I/O. And the pre-reading feature can be used, and adjacent nodes can also be pre-loaded
1.2 Hash Index
Hash indexes are implemented based on hash tables. For each row of data, the storage engine calculates a hash code for all index columns. The hash code can be used for search in O(1) time, but cannot be used for sorting and grouping. It only supports exact search and cannot be used for partial search or range search. In MySQL, only the Memory engine explicitly supports hash indexes. The InnoDB storage engine has a special feature called "adaptive hash index". When an index value is used very frequently, a hash index will be created on top of the B+Tree index. This gives the B+Tree index some of the advantages of the hash index, such as fast hash search. 1.3 Spatial Data Index (R-Tree)
The MyISAM storage engine supports spatial data indexes (R-Tree) and can be used for geographic data storage. Spatial data indexes index data from all dimensions and can effectively use any dimension for combined queries. GIS related functions must be used to maintain data. 1.4 Full-text indexing
The MyISAM storage engine supports full-text indexing, which is used to find keywords in text instead of directly comparing them for equality. The search condition uses MATCH AGAINST instead of the normal WHERE. The full-text index is implemented using an inverted index, which records the mapping of keywords to the documents in which they are located. The InnoDB storage engine also began to support full-text indexing in MySQL version 5.6.4. 2. Advantages and disadvantages of indexes
advantage - Indexes greatly reduce the amount of data that the server needs to scan
- Indexes can help the server avoid sorting and temporary tables, reducing CPU consumption
- Random IO can be converted into sequential IO to speed up IO
shortcoming - Although indexes greatly increase query speed, they also reduce the speed of updating tables, such as INSERT, UPDATE, and DELETE. Because when updating a table, MySQL not only saves the data, but also saves the index file. Every time you update a field with an index column, it will adjust the index information after the key value changes caused by the update.
- In fact, the index is also a table that stores the primary key and index fields and points to the records of the entity table, so the index column also takes up space.
3. High-performance indexing strategy
1. Independent columns
If the columns in the MySQL query are not independent, the index will not be used. "Independent columns" means that the index column cannot be part of an expression or a function parameter. For example
mysql> SELECT id, name FROM t_user WHERE id + 1 = 5;
MySQL cannot parse this id + 1 equation. We should develop the habit of simplifying the WHERE conditions. 2. Prefix Index
Sometimes you need to index very long character columns, which makes the index large and slow. For example, for columns of BLOB, TEXT, and VARCHAR types, you must use a prefix index to index only the beginning characters. The selection of prefix length needs to be determined based on index selectivity 3. Multi-column index
Many people do not fully understand multi-column indexes. A common mistake is to create a separate index for each column or to create multi-column indexes in the wrong order. In most cases, creating independent single-column indexes on multiple columns does not improve MySQL's query performance, so the "index merge" strategy is introduced, which can use multiple single-column indexes on the table to locate the specified row to a certain extent. For example, in the following statement, it is best to set username and password as multi-column indexes.
SELECT username, password FROM t_user WHERE username = 'Aiguodala' AND password = 'Aiguodala'; 4. Appropriate index column order
Put the most selective index columns first. The selectivity of an index refers to the ratio of unique index values to the total number of records. The maximum value is 1, in which case each record has a unique index corresponding to it. The higher the selectivity, the more distinguishable each record is and the higher the query efficiency is. 5. Clustered Index
A clustered index is not a separate index type, but a way of storing data. The term "clustered" means that data rows and adjacent key values are stored compactly together. InnoDB clusters data by primary key. If no primary key is defined, InnoDB selects a unique non-empty index instead. If there is no such index, InnoDB implicitly defines a primary key as the clustered index. Advantages and Disadvantages of Aggregated Data advantage: - You can save related data together
- For example, when implementing an email mailbox, data is clustered according to the user ID, so that only a small amount of data needs to be read from the disk to obtain all the emails of a certain user. If there is no clustered index, obtaining each email will result in a disk IO.
- Faster data access. Clustered indexes store indexes and data in the same B+ tree, making it easier to find data.
- Queries using covering index scans can directly use the primary key values in the page nodes
shortcoming: - Clustering data maximizes the performance of IO-intensive applications, but if all data is placed in memory, the order of access is not important and clustered indexes have no advantage.
- The insertion speed is heavily dependent on the insertion order. If the data is not loaded in the order of the primary key, it is best to use the OPTIMIZE TABLE command to reorganize the table after loading. Therefore, it is recommended to choose an auto-increment primary key.
- Updating clustered index columns is expensive because it forces InnoDB to move each updated row to a new location.
- Tables based on clustered indexes may face the problem of "page splitting" when new rows are inserted or when the primary key is updated and rows need to be moved. When the primary key value of a row requires that the row be inserted into a full page, the storage engine splits the page into two pages to accommodate the row. This is a split operation. Page splits cause the table to occupy more disk space.
- Clustered indexes can cause full table scans to slow down, especially when rows are sparse or when data is not stored contiguously due to page splits.
Nonclustered index The data is stored in an index-separated structure. The leaf nodes of the index structure point to the corresponding rows of the data. MyISAM caches the index in memory through key_buffer. When the data needs to be accessed (accessed through the index), the index is directly searched in the memory, and then the corresponding data on the disk is found through the index. This is why the speed is slow when the index is not hit in the key buffer. 6. Covering Index
The index covers the values of all fields that need to be queried benefit: - Index entries are much smaller than data rows, so they can greatly reduce the amount of data access and make it easier to put all of them into memory.
- Indexes are stored in order of column values, so range queries with IO-intensive types will require much less IO than randomly reading each row of data from disk.
- Some storage engines (such as MyISAM) cache only indexes in memory and rely on the operating system to cache the data. Therefore, accessing only the index can be done without using system calls (which are usually time-consuming).
- InnoDB's secondary index (non-clustered index) stores the primary key value of the row in the leaf node. If the secondary primary key can cover the query, the secondary query of the primary key index can be avoided.
3. Query performance optimization
1. Explain performance analysis
Use the EXPLAIN keyword to simulate the optimizer to execute SQL query statements, so as to know how MySQL processes your SQL statements. Analyze the performance bottleneck of your query statement or table structure Example: 
1.1 id: table reading order
id is the sequence number of the select query, which contains a set of numbers that indicate the order in which the select clauses or operation tables are executed in the query. The id is the same: the execution order is from top to bottom
EXPLAIN SELECT * FROM t1, t2, t3 WHERE t1.id = t2.id AND t2.id = t3.id;

Different ids: The execution order is that the one with the larger id is executed first
EXPLAIN SELECT t2.id FROM t2 WHERE t2.id =
(SELECT t1.id FROM t1 WHERE t1.id =
(SELECT t3.id FROM t3)
);

1.2 select_type: query operation type
select_type represents the type of query, which is mainly used to distinguish between common queries, joint queries, subqueries and other complex queries select_type attribute | meaning |
---|
SIMPLE | Simple select query, the query does not contain subqueries or UNION | PRIMARY | If the query contains any complex sub-parts, the outermost query is marked as Primary | DERIVED | Subqueries included in the FROM list are marked as DERIVED. MySQL will recursively execute these subqueries and store the results in a temporary table. | SUBQUERY | A subquery is included in the SELECT or WHERE list, and WHERE is followed by a single value (=) | DEPEDENT SUBQUERY | A subquery is included in the SELECT or WHERE list. The subquery is based on the outer layer. WHERE is followed by a set of values (IN) | UNCACHEABLE SUBQUERY | Cannot use cached subquery | UNION | If the second SELECT appears after a UNION, it is marked as a UNION; if a UNION is included in a subquery in the FROM clause, the outer SELECT will be marked as DERIVED | UNION RESULT | SELECT to get results from UNION tables |
1.3 table: the source of the table
table indicates which table this data is based on 1.4 type: access type
type is the access type of the query. It is a more important indicator. The results are from best to worst:
system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > all
--The common order is system > const > eq_ref > ref > range > index > all
Generally speaking, you need to ensure that the query reaches at least the range level, preferably the ref level. Type Name | meaning |
---|
SYSTEM | The table has only one row of records (equal to the system table). This is a special column of const type. It does not appear normally and can be ignored. | CONST | Indicates that the index is found once, and const is used to compare primary key or unique index. Because only one row of data is matched, it is very fast. If you put the primary key in the where list, MySQL can convert the query into a constant | EQ_REF | Unique index scan, for each index key, there is only one record matching it in the table. Commonly used for primary key or unique index scans | REF | A nonunique index scan returns all rows that match a single value. Essentially an index access, it returns all rows that match a single value. However, it may find multiple rows that meet the criteria, so it should be considered a hybrid of search and scan. | RANGE | Retrieve only a given range of rows, using an index to select the rows. The key column shows which index is used. This is usually a query that contains between, <, >, in, etc. in your where clause. This range scan is better than a full table scan because it only needs to start at one point in the index and end at another point, without scanning the entire index. | INDEX | The index appears when SQL uses the index but does not filter through the index. Generally, a covering index is used or the index is used for sorting and grouping. | ALL | Full Table Scan, which traverses the entire table to find matching rows |
1.5 possible_key: possible indexes
Shows one or more indexes that may apply to this table. If there is an index on the field involved in the query, the index will be listed, but it may not be actually used by the query. 1.6 key: the index actually used
The actual index used. If NULL, no index is used. 1.7 key_len: Number of bytes used by the index
Indicates the number of bytes used in the index. This column can be used to calculate the length of the index used in the query. The key_len field can help you check whether the index is fully utilized. The longer ken_len is, the more fully the index is used. 1.8 ref: Displays detailed information about the index being used
ref shows which column of the index is used and can be a constant if possible. Which columns or constants are used to look up values in the index column 1.9 rows: the number of rows queried
The rows column shows the number of rows that MySQL thinks it must examine to execute the query. The less the better! 1.10 Extra: Additional important information
Other important additional information - Using filesort: Use external index sorting (no user-created index is used)
- This means that MySQL will use an external index to sort the data instead of reading it in the order of the index in the table. Sorting operations in MySQL that cannot be completed using indexes are called "file sorts"
- The Using filesort message indicates that the SQL statement is not well designed and is not sorted according to the created index or in the order specified by the index.
- Using temporary
- Temporary tables are used to store intermediate results. MySQL uses temporary tables when sorting query results. Commonly used in sorting order by and grouping query group by
- The occurrence of Using temporary indicates that the SQL statement is poorly designed, possibly because the composite index is not used in order.
- Using index
- Using index means that a covering index is used in the corresponding select operation, which avoids accessing the data rows of the table, and has good efficiency!
- If using where appears at the same time, it means that the index is used to perform the search of the index key value.
- If using where is not present, the index is only used to read data instead of performing a search.
- Using where
- Indicates that where filtering is used
- Using join buffer
- Connection caching is used
- impossible where
- The value of the where clause is always false and cannot be used to retrieve any tuples.
- select tables optimized away
- In the absence of a GROUP BY clause, optimization of MIN/MAX operations based on indexes or COUNT(*) operations for the MyISAM storage engine does not need to wait until the execution phase to perform the calculations. The optimization is completed during the query execution plan generation phase.
Summarize This is the end of this article about creating high-performance indexes in MySQL. For more relevant MySQL high-performance index content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:- The usage strategy and optimization behind MySQL indexes (high-performance index strategy)
- MySQL uses indexes to optimize performance
- A simple introduction to MySQL prefix index
- Detailed analysis and principle of MySQL index mechanism
- MySQL optimization and index analysis
- Do you know MySQL index?
- Analyzing the role of MySQL indexes
- Detailed analysis of MySQL index structure
- Creating high-performance indexes for MySQL
|