MySQL database optimization: index implementation principle and usage analysis

MySQL database optimization: index implementation principle and usage analysis

This article uses examples to illustrate the principles and usage of index implementation for MySQL database optimization. Share with you for your reference, the details are as follows:

index

What is an index

Indexes are used to quickly find records with specific values. All MySQL indexes are stored in the form of B-trees. If there is no index, when executing a query, MySQL must scan all records in the entire table starting from the first record until it finds a record that meets the requirements. The more records there are in the table, the more expensive this operation will be. If an index has been created on the column used as the search condition, MySQL can quickly obtain the location of the target record without scanning any records. If the table has 1000 records, searching for a record using an index is at least 100 times faster than scanning the records sequentially.

Index Classification

Primary key index

A primary key is a unique index, but it must be specified as "PRIMARY KEY". If you have ever used AUTO_INCREMENT columns, you may already be familiar with concepts such as primary keys. The primary key is usually specified when creating a table, for example "CREATE TABLE tablename ([…], PRIMARY KEY (column list));". However, we can also add a primary key by modifying the table, such as "ALTER TABLE tablename ADD PRIMARY KEY (column list);". Each table can have only one primary key.

Create a primary key index

A primary key is a unique index, but it must be specified as "PRIMARY KEY". If you have ever used AUTO_INCREMENT columns, you may already be familiar with concepts such as primary keys. The primary key is usually specified when creating a table, for example "CREATE TABLE tablename ([…], PRIMARY KEY (column list));". However, we can also add a primary key by modifying the table, such as "ALTER TABLE tablename ADD PRIMARY KEY (column list);". Each table can have only one primary key.

When a table sets a column as the primary key, the column is the primary key index.

create table aaa
(id int unsigned primary key auto_increment,
name varchar(32) not null default '');

This is the id column which is the primary key index.

create table bbb (id int , name varchar(32) not null default '');

If you did not specify a primary key index when you created the table, you can also add it after creating the table, command:

Example:

alter table table name add primary key (column name);

Drop the primary key index

alter table articles drop primary key;

Query Index

desc table name; index name cannot be displayed show index from table name show keys from table name

Full-text index

Create table structure

CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT (title,body)
   )engine=myisam charset utf8;
INSERT INTO articles (title,body) VALUES
   ('MySQL Tutorial','DBMS stands for DataBase ...'),
   ('How To Use MySQL Well','After you went through a ...'),
   ('Optimizing MySQL','In this tutorial we will show ...'),
   ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
   ('MySQL vs. YourSQL','In the following database comparison ...'),
   ('MySQL Security','When configured properly, MySQL ...');

Incorrect usage:

select * from articles where body like '%mysql%'; incorrect usage index will not take effect

Correct usage:

select * from articles where match(title,body) against ( 'database')

illustrate:

1. In MySQL, fulltext index is only valid for MyISAM
2. The fulltext provided by MySQL is effective for English -> Sphinx (coreseek) technology processes Chinese
3. The usage method is match (field name...) against ('keyword')
4. Full-text index: stop words. Because creating an index in a text is an infinite number, some common words and characters will not be created. These words are called stop words. For example (a, b, mysql, the)

mysql> select match(title,body) against ('database') from articles; (The output is the match degree between each row and database)

Unique Index

This type of index is basically the same as the previous "normal index", but with one difference: all values ​​in the index column can only appear once, that is, they must be unique. Unique indexes can be created in the following ways:

Create an index, for example, CREATE UNIQUE INDEX <index name> ON tablename (column list);

Modify the table, for example, ALTER TABLE tablename ADD UNIQUE [index name] (column list);

Specify the index when creating a table, for example, CREATE TABLE tablename ([…], UNIQUE [index name] (column list));

Create table structure

create table ddd(id int primary key auto_increment , name varchar(32) unique);

Notice

A unique field can be NULL, and can have multiple NULLs, but if it is specific content, it cannot be repeated.

But there cannot be repeated empty strings''

Normal index

The only task of a normal index (an index defined by the keyword KEY or INDEX) is to speed up access to data. Therefore, you should create indexes only for those columns that appear most frequently in query conditions (WHEREcolumn=) or sort conditions (ORDERBYcolumn). Whenever possible, you should choose a column with the neatest and most compact data (such as an integer column) to create an index.

create table ccc(
id int unsigned,
name varchar(32)
)
create index index name on table (column 1, column name 2);

How the index works

A database index is a sorted data structure in a database management system to assist in quickly querying and updating data in database tables. Indexes are usually implemented using B-trees and their variant B+ trees.
In addition to the data, the database system also maintains data structures that satisfy specific search algorithms. These data structures reference (point to) the data in some way, so that advanced search algorithms can be implemented on these data structures. This data structure is an index.
There is a price to pay for setting indexes for tables: first, it increases the storage space of the database, and second, it takes more time to insert and modify data (because the index also needs to change accordingly).

The figure above shows one possible indexing approach. On the left is the data table, which has two columns and seven records. The one on the far left is the physical address of the data record (note that logically adjacent records are not necessarily physically adjacent on the disk). In order to speed up the search of Col2, we can maintain a binary search tree as shown on the right. Each node contains an index key value and a pointer to the physical address of the corresponding data record. In this way, we can use binary search to obtain the corresponding data within the complexity of O(log2n).

Creating indexes can greatly improve system performance.

First, by creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.
Second, it can greatly speed up data retrieval, which is also the main reason for creating an index.
Third, it can speed up the connection between tables, which is especially meaningful in implementing the referential integrity of data.
Fourth, when using grouping and sorting clauses for data retrieval, the time for grouping and sorting in the query can also be significantly reduced.
Fifth, by using indexes, you can use optimized queries during the query process to improve system performance.

Some people may ask: Since adding indexes has so many advantages, why not create an index for every column in the table? Because, adding indexes also has many disadvantages.

First, creating and maintaining indexes takes time, and this time increases as the amount of data increases.
Second, indexes require physical space. In addition to the data space occupied by the data table, each index also occupies a certain amount of physical space. If you want to create a clustered index, the space required will be even greater.
Third, when adding, deleting, and modifying data in the table, the index must also be maintained dynamically, which reduces the speed of data maintenance.

Indexes are created on certain columns in a database table. When creating an index, you should consider which columns can be indexed and which columns cannot be indexed. Generally speaking, indexes should be created on these columns: on columns that are frequently searched to speed up searches; on columns that serve as primary keys to enforce the uniqueness of the column and organize the data structure in the table; on columns that are frequently used in joins, which are mainly foreign keys, to speed up joins; create indexes on columns that are frequently searched based on ranges, because the index is already sorted and the specified range is continuous; create indexes on columns that are frequently sorted, because the index is already sorted, so queries can take advantage of the index sorting to speed up sorting query time; create indexes on columns that are frequently used in WHERE clauses to speed up condition determination.

Likewise, there are some columns for which indexes should not be created. Generally speaking, columns that should not be indexed have the following characteristics:

First, you should not create indexes on columns that are rarely used or referenced in queries. This is because, since these columns are rarely used, indexing them or not will not improve query speed. On the contrary, the addition of indexes reduces the system maintenance speed and increases the space requirements.
Second, you should not add indexes to columns that have very few data values. This is because these columns have very few values, such as the gender column of the personnel table. In the query result, the data rows in the result set account for a large proportion of the data rows in the table, that is, the proportion of data rows that need to be searched in the table is very large. Adding indexes does not significantly speed up retrieval.
Third, indexes should not be added to columns defined as text, image, and bit data types. This is because the data volume of these columns is either quite large or has very few values.
Fourth, when the modification performance is much greater than the retrieval performance, you should not create an index. This is because modification performance and retrieval performance are contradictory to each other. When adding indexes, retrieval performance will be improved, but modification performance will be reduced. When you reduce the number of indexes, modification performance improves and retrieval performance decreases. Therefore, when modification performance is much greater than retrieval performance, indexes should not be created.

Depending on the capabilities of the database, three types of indexes can be created in the database designer: unique index, primary key index, and clustered index.

Unique Index

A unique index is one that does not allow any two rows in it to have the same index value.
Most databases do not allow a newly created unique index to be saved with a table when there are duplicate key values ​​in the existing data. The database might also prevent the addition of new data that would create duplicate key values ​​in the table. For example, if a unique index is created on the last name (lname) of an employee in the employee table, no two employees can have the same last name. Primary Key Index A database table often has a column or combination of columns whose value uniquely identifies each row in the table. This column is called the primary key of the table. Defining a primary key for a table in a database diagram will automatically create a primary key index, which is a specific type of unique index. This index requires that each value in the primary key be unique. It also allows fast access to data when the primary key index is used in queries. Clustered Index In a clustered index, the physical order of the rows in the table is the same as the logical (index) order of the key values. A table can contain only one clustered index.
If an index is not a clustered index, the physical order of the rows in the table does not match the logical order of the key values. Clustered indexes generally provide faster access to data than nonclustered indexes.

Locality Principle and Disk Pre-reading

Due to the characteristics of storage media, disk access itself is much slower than main memory. In addition to the mechanical movement consumption, the disk access speed is often a few hundredths of the main memory. Therefore, in order to improve efficiency, disk I/O should be minimized. To achieve this goal, the disk is often not read strictly on demand, but will pre-read each time. Even if only one byte is needed, the disk will start from this position and read a certain length of data sequentially backwards into the memory. The theoretical basis for this is the famous locality principle in computer science: when a piece of data is used, the nearby data will usually be used immediately. The data required during program execution is usually concentrated.
Since disk sequential reads are very efficient (no seek time is required and only a small amount of rotation time is required), pre-reading can improve I/O efficiency for programs with locality.
The length of the pre-read is generally an integer multiple of a page. A page is a logical block of computer management memory. Hardware and operating systems often divide main memory and disk storage areas into continuous blocks of equal size. Each storage block is called a page (in many operating systems, the page size is usually 4k). Main memory and disk exchange data in units of pages. When the data that the program wants to read is not in the main memory, a page fault exception will be triggered. At this time, the system will send a read signal to the disk. The disk will find the starting position of the data and read one or several pages continuously and load them into the memory. Then the exception will be returned and the program will continue to run.

Performance Analysis of B-/+Tree Indexes

Now we can finally analyze the performance of the B-/+Tree index.
As mentioned above, the number of disk I/Os is generally used to evaluate the quality of index structures. Let's start with the B-Tree analysis. According to the definition of B-Tree, a search requires accessing at most h nodes. The designers of the database system cleverly used the disk pre-reading principle to set the size of a node equal to a page, so that each node can be fully loaded with only one I/O. In order to achieve this goal, the following techniques are needed in the actual implementation of B-Tree:
Each time a new node is created, a page of space is directly requested, which ensures that a node is physically stored in a page. In addition, computer storage allocation is aligned by page, so a node only needs one I/O.
A search in a B-Tree requires at most h-1 I/Os (the root node resides in memory), and the asymptotic complexity is O(h)=O(logdN). In general practical applications, the out-degree d is a very large number, usually exceeding 100, so h is very small (usually not more than 3).
For a red-black tree structure, h is obviously much deeper. Since logically close nodes (parent and child) may be physically far away and cannot take advantage of locality, the asymptotic I/O complexity of the red-black tree is also O(h), which is significantly less efficient than the B-Tree.

In summary, using B-Tree as an index structure is very efficient.

You should take the time to learn about B-tree and B+ tree data structures.

1) B-tree

Each node in a B-tree contains a key value and a pointer to the address of the data object corresponding to the key value, so a successful search for an object does not require reaching the leaf node of the tree.
A successful search includes searching within a node and searching along a path. The time required for a successful search depends on the level at which the key is located and the number of keys within the node.
The method to search for a given keyword in a B-tree is: first take the root node, and search for the given keyword in the keywords K1,…,kj contained in the root node (sequential search or binary search can be used). If a keyword equal to the given value is found, the search is successful; otherwise, it can be determined that the keyword to be searched is between a certain Ki or Ki+1, so take the next level index node block pointed to by Pi and continue searching until it is found, or the search fails when the pointer Pi is empty.

2) B+ Tree

The key code stored in the non-leaf node of the B+ tree does not indicate the address pointer of the data object. The non-leaf node is only the index part. All leaf nodes are on the same layer, containing all key codes and storage address pointers of corresponding data objects, and the leaf nodes are linked in ascending order according to the key code. If the actual data objects are stored in the order they are added rather than by key number, the leaf node index must be a dense index. If the actual data is stored in key order, the leaf node index is a sparse index.
The B+ tree has two head pointers, one is the root node of the tree, and the other is the leaf node with the minimum key code.
So there are two search methods for B+ trees:
One is to search in the order of the linked list pulled up by the leaf node itself.
One is to start the search from the root node, which is similar to the B-tree. However, if the key code of a non-leaf node is equal to a given value, the search does not stop, but continues along the right pointer until the key code on the leaf node is found. So regardless of whether the search is successful or not, all levels of the tree will be walked.
In a B+ tree, data objects are inserted and deleted only on leaf nodes.
The difference between these two data structures for handling indexes:
a. The same key value will not appear multiple times in a B-tree, and it may appear in a leaf node or a non-leaf node. The keys of a B+ tree must appear in leaf nodes and may be repeated in non-leaf nodes to maintain the balance of the B+ tree.
b. Because the position of the B-tree key is not fixed and it only appears once in the entire tree structure, although it can save storage space, it significantly increases the complexity of insertion and deletion operations. B+ tree is a better compromise.
c. The query efficiency of the B-tree is related to the position of the key in the tree. The maximum time complexity is the same as that of the B+ tree (at the leaf node), and the minimum time complexity is 1 (at the root node). In the case of B+ trees, the complexity is fixed for a certain built tree. Can scan powers of 2.

The cost of indexing

Disk space occupied

Impact on the efficiency of DML (update, delete, insert) statements

Additions, deletions, and modifications will affect the index because the index needs to be reorganized.

Storage Engine Allowed Index Types
myisam btree
innodb btree
memory/yeap Hash,btree

Which columns are suitable for adding indexes?

① The query field used as the query condition should be indexed ② The field with poor uniqueness is not suitable for creating an index alone, even if it is frequently used.

Select * from emp where sex='male'

③Update fields frequently and do not define indexes.
④ Do not create indexes for fields that will not appear in the where statement

Summary: Indexes should be created only for fields that meet the following conditions:

① It must be frequently used in the where condition ② The content of the field is not a few unique values ​​③ The field content does not change frequently

Notes on indexing

Create a table

Add dept data

create PROCEDURE insert_dept(in start int(10),in max_num int(10))
BEGIN
 declare i int DEFAULT 0;
 set autocommit=0;
 REPEAT
 set i=i+1;
 insert into dept values ​​((start+i),rand_string(10),rand_string(8));
 UNTIL i = max_num
 end REPEAT;
 commit;
END
Execute call insert_dept(100,10);

Create a primary key index

alter table table name add primary key (column name);

Create a joint index

alter table dept add index my_ind (dname,loc); // dname is the column on the left, loc is the column on the right

Notice:

1. For a created multi-column index, if the first part is not used, the index will not be created.
explain select * from dept where loc='aaa'\G
The index will not be used
2. Fuzzy search will fail if there is a percentage sign before "like".
3. If there is an or in the condition, it will not be used even if the condition has an index. In other words, all fields that are required to be used must be indexed. We recommend that you avoid using the or keyword as much as possible.
4. If the column type is a string, be sure to quote the data in the condition. Otherwise the index is not used. (When adding, the string must be ''), that is, if the column is a string type, it must be enclosed in ''.
5. If MySQL estimates that a full table scan is faster than an index, the index is not used.

Query the usage rate

show status like 'handler_read%';

Everyone can pay attention to:

handler_read_key: The higher the value, the better. A higher value indicates the number of times the index is used for query.
handler_read_rnd_next: The higher this value is, the less efficient the query is.

Readers who are interested in more MySQL-related content can check out the following topics on this site: "Summary of MySQL Index Operation Skills", "Summary of MySQL Common Functions", "Summary of MySQL Log Operation Skills", "Summary of MySQL Transaction Operation Skills", "Summary of MySQL Stored Procedure Skills" and "Summary of MySQL Database Lock-Related Skills".

I hope this article will be helpful to everyone's MySQL database design.

You may also be interested in:
  • Several methods to solve the problem of MySQL fuzzy query index failure
  • Detailed explanation of the difference between MySQL normal index and unique index
  • Detailed introduction to MySQL database index

<<:  Summary of the 10 most frequently asked questions in Linux interviews

>>:  JavaScript to achieve fireworks effects (object-oriented)

Recommend

How to optimize MySQL group by statement

In MySQL, create a new table with three fields, i...

Use Typescript configuration steps in Vue

Table of contents 1. TypeScript is introduced int...

A simple way to implement all functions of shopping cart in Vue

The main functions are as follows: Add product in...

How to restore a single database or table in MySQL and possible pitfalls

Preface: The most commonly used MySQL logical bac...

Solve the problem of mysql's int primary key self-increment

Introduction When we use the MySQL database, we a...

Detailed explanation of the difference between uniapp and vue

Table of contents 1. Simple page example 2.uni-ap...

JavaScript typing game

This article shares the specific code of JavaScri...

HTML tag default style arrangement

html, address,blockquote,body, dd, div,dl, dt, fie...

Detailed explanation of the use of Vue Smooth DnD, a draggable component of Vue

Table of contents Introduction and Demo API: Cont...

MySQL online log library migration example

Let me tell you about a recent case. A game log l...

Detailed explanation of the difference between $router and $route in Vue

We usually use routing in vue projects, and vue-r...

RHCE installs Apache and accesses IP with a browser

1. at is configured to write "This is a at t...

Detailed explanation of the calculation method of flex-grow and flex-shrink in flex layout

Flex(彈性布局) in CSS can flexibly control the layout...

Simple example of adding and removing HTML nodes

<br />Simple example of adding and removing ...

Detailed tutorial for installing mysql 8.0.12 under Windows

This article shares with you a detailed tutorial ...