A summary of the knowledge points of database indexing. Everything you need to know is here.


I believe everyone is familiar with database indexes.

An index is a structure that sorts the values of one or more columns in a database table so that specific rows can be located quickly. As a query aid, a well-designed index can greatly reduce the load on the database. As we all know, the database is the core of a project and also its weakest link. If it comes under too much pressure, failures follow easily, with unpredictable consequences. Therefore, whether for daily development or for interviews, you need to master indexing.

Of course, there are many points to master about indexing, and beginners often miss some of them. That is why I wrote this summary. It is something to share with readers and also a thorough review for myself. I hope it helps you.

Okay, without further ado, let’s get to the point.

First of all, note that everything about indexes in this article is based on the MySQL database.

Pros and Cons of Indexes

Advantages:

1. Greatly speed up data queries

2. A unique index guarantees that each row in the table is unique

3. Speed up joins between tables

Disadvantages:

1. Creating and maintaining indexes takes time, so a table should not have too many indexes.

2. An index is a data structure that takes up disk space.

3. When the table is updated, its indexes must be maintained as well, which slows down writes (inserts, updates, and deletes).

Types of Indexes

The purpose of indexing is to improve query efficiency, but there are many ways to implement an index, which is where the concept of an index model comes in. Here we introduce three data structures commonly used for indexes: hash tables, ordered arrays, and search trees.

Hash Index

A hash table stores key-value pairs. A hash function maps each key to the position where its value is stored. On a read, the same hash function computes the position from the key, and the value is fetched from there directly. The average lookup complexity of this data structure is O(1).

For example, if we maintain a table of ID card information and user names, and need to query the name based on the ID card number, the hash index might look like this:

The advantage of this structure is that adding or deleting a single element is fast. The disadvantage is that the elements of a hash table are not stored in key order, so interval (range) queries are very slow.

Suppose I want to find all users whose ID numbers fall in the range [ID_card_n1, ID_card_n3] in the figure. I have to scan every entry.

Therefore, the hash table structure is suitable for scenarios that only need equality queries.
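To make the trade-off concrete, here is a minimal Python sketch that uses a built-in `dict` as a stand-in for the hash index. The keys `ID_card_n1` to `ID_card_n4` and the user names are made-up sample data, not the figure's actual contents.

```python
# A dict stands in for the hash index: key = ID card number, value = user name.
# Sample data is hypothetical.
hash_index = {
    "ID_card_n2": "User2",
    "ID_card_n4": "User4",
    "ID_card_n1": "User1",
    "ID_card_n3": "User3",
}

def lookup(index: dict, id_card: str):
    """Equality query: one hash computation, O(1) on average."""
    return index.get(id_card)

def range_lookup(index: dict, low: str, high: str) -> list:
    """Interval query: the hash layout says nothing about key order,
    so every entry must be examined (a full scan)."""
    matches = [(key, name) for key, name in index.items() if low <= key <= high]
    return [name for _, name in sorted(matches)]  # sort only for stable output

print(lookup(hash_index, "ID_card_n3"))                       # User3
print(range_lookup(hash_index, "ID_card_n1", "ID_card_n3"))   # ['User1', 'User2', 'User3']
```

The equality lookup touches one slot, while the range lookup has to walk the whole table no matter how narrow the range is.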

Ordered Array Index

Ordered array indexes are efficient for both equality and range queries. Taking the figure above as an example, an ordered-array implementation would look like this:

The elements of the array are sorted by ID number. To query data, binary search finds it quickly, with time complexity O(log N). And because the array is sorted, range queries are also fast.

Of course, the disadvantages of ordered arrays are also obvious. As with ArrayList, search is fast, but inserting or deleting an element may require shifting all the elements after it, which is an inherent drawback of arrays. Therefore, ordered array indexes are only suitable for static storage engines, for example one that stores a city's population data for 2017, which will never be modified again.
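Here is a minimal sketch of an ordered-array index in Python. It assumes a small hypothetical `OrderedArrayIndex` class built on the standard `bisect` module. Equality and range lookups use binary search, while `insert` has to shift every later element.

```python
from bisect import bisect_left, bisect_right

class OrderedArrayIndex:
    """Hypothetical ordered-array index: parallel sorted lists of keys and values."""

    def __init__(self, pairs):
        pairs = sorted(pairs)
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def lookup(self, key):
        """Equality query via binary search: O(log N)."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def range_lookup(self, low, high):
        """Interval query: binary-search both ends, then slice the contiguous run."""
        lo, hi = bisect_left(self.keys, low), bisect_right(self.keys, high)
        return self.values[lo:hi]

    def insert(self, key, value):
        """Keeps the array sorted, but list.insert shifts every later element: O(N)."""
        pos = bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.values.insert(pos, value)

idx = OrderedArrayIndex([("ID_card_n3", "User3"), ("ID_card_n1", "User1"),
                         ("ID_card_n2", "User2")])
print(idx.lookup("ID_card_n2"))                      # User2
print(idx.range_lookup("ID_card_n1", "ID_card_n2"))  # ['User1', 'User2']
```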

Search Tree Index

When it comes to search trees, the one we know best is probably the binary search tree. In a binary search tree, every key in a node's left subtree is smaller than the node's key, every key in its right subtree is larger, and the left and right subtrees are themselves binary search trees. The average time complexity of a search is O(log n).

It combines the fast insertion and deletion of a linked list with the fast search of an array. And because a binary search tree is ordered, it also supports range queries.

At first glance, a binary search tree seems like a good choice for an index, but it is not.

First of all, keep in mind that the index tree lives on disk, and each node has to be read from disk. The nodes of a binary search tree are scattered randomly through the file, so reading each node may cost one disk I/O. And binary search trees are tall: a balanced binary tree with one million elements is about 20 levels high, since 2^20 ≈ 1.05 million. In other words, in most cases a single lookup needs around 20 disk I/Os. That cost is far too high, so binary search trees are generally not used as indexes.

To make a query read as little from disk as possible, it must touch as few data blocks as possible. That means keeping the tree as short as possible, which in turn means using a multi-way search tree, where each node has many children. The InnoDB storage engine uses such a multi-way search tree, which is what we usually call a B+ tree.
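The height argument can be checked with a little arithmetic. The helper below (`min_levels`, a hypothetical name) gives a rough estimate of how many levels a completely full tree needs to reach a given number of keys. The fanout of 1200 for the multi-way tree is an assumed, illustrative figure for how many children a 16 KB page might hold. It is not a measured InnoDB value.

```python
def min_levels(n_keys: int, fanout: int) -> int:
    """Rough estimate of tree height: the fewest levels h such that a
    completely full tree with this fanout reaches n_keys entries
    (fanout ** h >= n_keys). Partially filled nodes make real trees taller."""
    levels, reach = 0, 1
    while reach < n_keys:
        reach *= fanout
        levels += 1
    return levels

# Binary tree: fanout 2. Multi-way tree: fanout 1200 is an assumption.
print(min_levels(1_000_000, 2))     # 20
print(min_levels(1_000_000, 1200))  # 2
```

Each extra level can mean one more disk read when the page is not cached, so cutting about 20 levels down to a handful is the whole point of the multi-way tree.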

InnoDB index structure

InnoDB is the most commonly used storage engine in MySQL. Its indexes are B+ trees, and all table data is stored in B+ trees. In InnoDB, each index corresponds to one B+ tree.

The characteristics of B+ tree are:

  • The leaf nodes together hold all keys, along with pointers to (or the contents of) the records with those keys, and the leaf nodes are linked to each other in ascending key order.
  • Every key in an intermediate node also appears in its child nodes, as the largest (or smallest) key of that child.

This structure has two advantages:

  • A single node can hold more entries. Only leaf nodes store values, and all other nodes contain keys only. This effectively lowers the height of the tree and therefore the number of I/Os per query.
  • Because each leaf node holds a pointer to the next leaf node, a range query only needs to locate the first matching leaf. The rest of the data is then read by following the pointers, without starting again from the root. Relatedly, many experts recommend an auto-increment primary key. New rows are then always appended at the end of the leaf chain, which avoids page splits in the middle of the tree and keeps rows with adjacent keys in the same or neighboring pages. This also benefits range scans.
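The range-scan behavior can be sketched with a tiny read-only B+ tree in Python. This is a simplified illustration that assumes a two-level tree bulk-loaded from sorted data, with an arbitrary `leaf_size` of 4. A real InnoDB tree uses 16 KB pages and more levels, and it handles inserts and page splits. The names `Leaf`, `bulk_load`, `find_leaf`, and `range_query` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Leaf:
    keys: List[int]
    values: List[str]
    next: Optional["Leaf"] = None  # pointer to the next leaf, in key order

def bulk_load(items, leaf_size=4):
    """Build a read-only, two-level B+ tree from (key, value) pairs.
    Returns (separators, leaves); the root level holds only separator keys."""
    items = sorted(items)
    leaves = []
    for i in range(0, len(items), leaf_size):
        chunk = items[i:i + leaf_size]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right  # link leaves in ascending key order
    separators = [leaf.keys[0] for leaf in leaves[1:]]  # smallest key of each later leaf
    return separators, leaves

def find_leaf(tree, key):
    """Descend from the root: pick the leaf whose key range contains `key`."""
    separators, leaves = tree
    return leaves[bisect_right(separators, key)]

def range_query(tree, low, high):
    """Locate the first leaf once, then follow `next` pointers without returning to the root."""
    leaf, result = find_leaf(tree, low), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return result
            if k >= low:
                result.append((k, v))
        leaf = leaf.next
    return result

tree = bulk_load([(i, f"row{i}") for i in range(1, 21)])
print(range_query(tree, 6, 10))
# [(6, 'row6'), (7, 'row7'), (8, 'row8'), (9, 'row9'), (10, 'row10')]
```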

Index Classification

According to the structure, database indexes can be divided into clustered indexes and non-clustered indexes.

A clustered index (also called a clustering index) builds a B+ tree on the primary key of each table, and its leaf nodes store the complete row data of the table. Simply put, it is what we usually call the primary key index. Indexes built in addition to the clustered index are called secondary indexes, and reading row data through a secondary index usually requires a second lookup.

A non-clustered index, also called a secondary index, stores the index and the data separately. The leaf nodes of the index structure point to the locations of the corresponding data.

Clustered index

InnoDB uses a clustered index to organize the primary key into a B+ tree, and the row data is stored in the leaf nodes. Let's assume a user table that contains the fields id, name, and company.

The picture shows the index structure of InnoDB as follows:

As can be seen from the figure, if we use the condition "where id = 14" to search for the primary key, we can find the corresponding leaf node according to the B+ tree retrieval algorithm and then obtain the row data.

If you search on the Name column, two steps are required. First, search the secondary index B+ tree for Name and reach its leaf node to get the corresponding primary key. Second, use that primary key to search the primary index B+ tree, reaching the leaf node that holds the entire row. (This assumes a secondary index has been created on Name. Searches on columns other than the primary key go through such secondary indexes.)
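The two lookup paths can be modeled in a few lines of Python. This is a hypothetical sketch. Plain dicts stand in for the two B+ trees, the rows are sample data, and names are assumed unique so that the secondary index can map each name to a single primary key.

```python
# Hypothetical model of the user table (id, name, company).
clustered = {  # primary key -> full row (leaf nodes hold the row data)
    10: {"id": 10, "name": "Alice", "company": "Acme"},
    14: {"id": 14, "name": "Bob", "company": "Initech"},
    20: {"id": 20, "name": "Carol", "company": "Globex"},
}
# Secondary index on name: name -> primary key (not the row itself).
secondary_name = {row["name"]: pk for pk, row in clustered.items()}

def find_by_id(pk):
    """where id = ?: one search in the clustered index returns the row.
    Returns (row, number_of_index_searches)."""
    return clustered.get(pk), 1

def find_by_name(name):
    """where name = ?: search the secondary index for the primary key,
    then search the clustered index again to get the row."""
    pk = secondary_name.get(name)
    if pk is None:
        return None, 1
    return clustered[pk], 2
```

For example, `find_by_id(14)` needs one search, while `find_by_name("Carol")` needs two: one per B+ tree.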

This is the structure of a clustered index. The representative of non-clustered indexes is MyISAM, another common storage engine in MySQL.

Non-clustered index

With non-clustered indexes, the primary key B+ tree and the secondary index B+ tree look the same. Their node structures are identical, and only their contents differ: the nodes of the primary key tree store primary keys, and those of the secondary index tree store secondary keys. The indexes themselves store no row data. The data is kept in a separate place, and the leaf nodes of both B+ trees point to the actual table data by address.

Non-clustered indexes may seem more efficient than clustered ones, since they never need to search a B+ tree twice. So why does InnoDB, the most commonly used engine, still use the clustered structure? What are its advantages?

1. In a clustered index, row data is stored together with the leaf nodes, so a single page holds many rows. Once a page has been loaded into the buffer pool, later accesses to other rows on the same page are served from memory without touching the disk. Because the primary key and the row data are loaded into memory together, the row can be returned as soon as the leaf node is found. So when data is organized by primary key ID, it can be fetched faster.

2. Secondary indexes use the primary key, rather than a physical address, as their "pointer". This reduces the maintenance work on secondary indexes when rows move or data pages split. Using the primary key value as the pointer makes secondary indexes larger, but InnoDB does not need to update the pointers in them when rows move. In other words, the position of a row (located by its 16 KB page in the implementation) changes as data is modified, through B+ tree node splits and page splits. With a clustered index, the secondary index trees are not affected however the nodes of the primary key B+ tree change.

3. Clustered indexes are well suited to sorting and range queries, because rows with adjacent keys are stored next to each other. Non-clustered indexes are less suited to them.

Covering Index

Speaking of auxiliary indexes, we can also extend another special index, which is the covering index.

As mentioned above, querying through a secondary index requires a second lookup. First, the leaf node of the secondary index gives the primary key. Then the primary key index is searched for the data. This is relatively slow. If the fields we need can already be obtained from the first lookup, there is no need to search the primary key index a second time. In other words, there is no need to "go back to the table".

For example, the table above has three fields: id, name, and company. I added an index to name. When querying data, I write the following statement:

select name from user where name like '张%';

Because the query uses the index on name, and the only field it returns is stored in that index's leaf nodes, it never has to go back to the table. That is a real win.

Therefore, if the fields you need are all index columns, select them explicitly instead of using statements such as select *.
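Here is a hypothetical Python model of the covering-index idea. The secondary index on name is a sorted list of (name, primary key) pairs, since an InnoDB secondary index leaf stores the primary key alongside the indexed column. The function counts how often it has to go back to the clustered index. Sample rows are made up, and the prefix "Zhang" plays the role of '张%'.

```python
from bisect import bisect_left

# Hypothetical sample table: primary key -> full row (the clustered index).
clustered = {
    1: {"id": 1, "name": "Zhang San", "company": "A"},
    2: {"id": 2, "name": "Li Si", "company": "B"},
    3: {"id": 3, "name": "Zhang Wei", "company": "C"},
}
# Secondary index on name: sorted (name, primary key) pairs.
name_index = sorted((row["name"], pk) for pk, row in clustered.items())
COVERED = {"name", "id"}  # columns available directly in the index leaf

def query_by_prefix(prefix, columns):
    """Emulates: select <columns> from user where name like '<prefix>%'.
    Returns (rows, table_lookups)."""
    rows, table_lookups = [], 0
    i = bisect_left(name_index, (prefix,))  # first entry >= prefix
    while i < len(name_index) and name_index[i][0].startswith(prefix):
        name, pk = name_index[i]
        if set(columns) <= COVERED:
            source = {"name": name, "id": pk}  # answered from the index alone
        else:
            source = clustered[pk]             # go back to the table
            table_lookups += 1
        rows.append({c: source[c] for c in columns})
        i += 1
    return rows, table_lookups
```

Selecting only `name` returns both "Zhang" rows with zero table lookups. Adding `company` forces one lookup per matching row.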

Index Type

The index classification mentioned above is based on structure. If classified by scope, indexes can also be divided into the following categories:

Normal index: This is the most basic index type, with no restrictions such as uniqueness.

CREATE INDEX INDEX_NAME ON TABLE_NAME(PROPERTY_NAME)

Unique index: basically the same as a normal index, except that each value of the indexed column(s) may appear only once, which guarantees uniqueness.

CREATE UNIQUE INDEX INDEX_NAME ON TABLE_NAME(PROPERTY_NAME)

Primary key: like a unique index, it cannot contain duplicate values. Strictly speaking, though, a primary key is not an index but a constraint, and it must be declared with PRIMARY KEY. It differs from a unique index as follows:

  • Creating a primary key always creates a unique index, but a unique index is not necessarily a primary key.
  • Unique index columns allow NULL values, while primary key columns do not.
  • When a primary key column is created, it is implicitly NOT NULL and gets a unique index.
  • A primary key is what other tables typically reference through foreign keys (MySQL also allows a foreign key to reference a unique index).
  • A table can have only one primary key, but can have multiple unique indexes.
  • The primary key is more suitable for unique identifiers that are not easily changed, such as auto-increment columns, ID numbers, etc.

Full-text index: The index type of a full-text index is FULLTEXT and can be created on a column of type VARCHAR or TEXT. In versions prior to MySQL 5.6, only the MyISAM storage engine supports full-text indexing. In versions 5.6 and later, both the MyISAM and InnoDB storage engines support full-text indexing.

CREATE FULLTEXT INDEX INDEX_NAME ON TABLE_NAME(PROPERTY_NAME)

Joint index: a joint (composite) index is not a separate index type but an ordinary index that covers multiple fields. For example, with a joint index index(a, b), you can search using a and b as conditions.

Leftmost matching principle

In a joint index, matching starts from the leftmost column, and any consecutive run of columns starting from the leftmost one can be used. Matching stops at the first range condition (>, <, between, like).

Going back to index(a, b): using both a and b, or a alone, as query conditions will use the index, but using b alone will not.

Likewise, if you create an index on (a, b, c, d) and search with a = 1 and b = 2 and c > 3 and d = 4, the index is not used for d. Because c is a range condition, matching stops after it.
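The rule can be expressed as a small function. This is a simplified, hypothetical model (`usable_index_columns` is not a MySQL API). It only tracks equality versus range conditions and ignores newer optimizer features such as the skip scan added in MySQL 8.0.

```python
def usable_index_columns(index_columns, conditions):
    """Simplified leftmost-prefix rule.

    index_columns: column order of the joint index, e.g. ("a", "b", "c", "d").
    conditions: dict mapping column -> "eq" or "range" from the WHERE clause
                (order in the query text does not matter).
    Returns the index columns that can narrow the search.
    """
    used = []
    for col in index_columns:
        kind = conditions.get(col)
        if kind is None:      # gap in the prefix: matching stops here
            break
        used.append(col)
        if kind == "range":   # the range column is used, later columns are not
            break
    return used

print(usable_index_columns(("a", "b", "c", "d"),
                           {"a": "eq", "b": "eq", "c": "range", "d": "eq"}))
# ['a', 'b', 'c']
```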

When will the index become invalid?

1. Using functions or expressions on index columns, for example:

select * from test where num + 1 = 5

MySQL does not solve this equation for num, so that rewrite is up to the user. Keep the index column on its own, for example where num = 4, so that the index can be used.

2. Conditions involving NULL values, which can make the optimizer skip the index

select * from user where user_id is not null;

When designing a database table, we should try our best to avoid NULL values. If the data is empty, we can give a default value, such as 0 or -1 for numeric types and an empty string for character types.

3. Using OR between conditions. If one of the columns has no index, the indexes on the other columns are not used either.

select * from user where user_id = 700 or user_name = "老薛";

In this case, if user_id is indexed but user_name is not, the index on user_id is not used during execution. This is why OR should be used sparingly in development unless both fields are indexed.

4. Comparing two columns. Even if id and c_id each have their own index, the following condition does not use them:

select * from test where id = c_id;

5. Data type conversion. If the column is a string type, the value in the condition must be quoted, otherwise the index is not used.

create index `idx_user_name` ON user(user_name)
select * from user where user_name = 123;

In the above example, although user_name is indexed, the value 123 in the condition is not a string, so the index is not used.
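Why does the missing quote matter? When a string column is compared with a number, MySQL converts the strings to numbers, so several different strings (for example '123' and '0123') equal 123. In the sketch below, Python's `int()` stands in for MySQL's conversion, whose rules are more lenient, and the sorted list stands in for the index on user_name. The numeric matches are not adjacent in string order, so the index order cannot be used to find them. Function names and values are hypothetical.

```python
from bisect import bisect_left

# The index on a string column keeps values in string order.
user_name_index = sorted(["0123", "1", "123", "2", "99"])
# -> ['0123', '1', '123', '2', '99']

def seek_string(value: str) -> bool:
    """where user_name = '123': binary search works because the
    index order and the comparison agree."""
    i = bisect_left(user_name_index, value)
    return i < len(user_name_index) and user_name_index[i] == value

def scan_numeric(value: int) -> list:
    """where user_name = 123: every entry must be converted and compared,
    because numerically equal strings are scattered across the index.
    int() is only a stand-in for MySQL's string-to-number conversion."""
    return [s for s in user_name_index if int(s) == value]

print(scan_numeric(123))  # ['0123', '123'], at positions 0 and 2, not contiguous
```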

6. NOT Condition

When the query condition is a negation, it is hard to use the index for positioning, and the execution plan may lean toward a full table scan. Such conditions include <>, NOT, NOT IN, and NOT EXISTS. (A plain IN list, by contrast, is usually executed as a range scan on the index.)

select * from user where user_id<>500;
select * from user where user_id not in (6,7,8,9,0);
select * from user where not exists (select 1 from user_record where user_record.user_id = user.user_id);

7. LIKE patterns that start with %

When using fuzzy search, put the wildcard at the end of the pattern. For example, to find users whose surname is Zhang, use user_name like '张%'. The index can then match the column value from the beginning. With user_name like '%张', however, the whole table is scanned.
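The difference between the two patterns can be sketched over a sorted list that stands in for the index on user_name. Function names and sample names are hypothetical. Each function returns the matches and the number of index entries it examined.

```python
from bisect import bisect_left

# Sorted list standing in for the index on user_name (sample data).
names = sorted(["Li Si", "Wang Wu", "Zhang San", "Zhang Wei", "Zhao Liu"])

def like_prefix(prefix):
    """user_name like 'prefix%': matches form one contiguous run,
    so binary search jumps straight to the first candidate."""
    i = bisect_left(names, prefix)
    matches, examined = [], 0
    while i < len(names):
        examined += 1
        if not names[i].startswith(prefix):
            break  # left the contiguous run: stop
        matches.append(names[i])
        i += 1
    return matches, examined

def like_suffix(suffix):
    """user_name like '%suffix': no usable starting point, so every entry is checked."""
    return [n for n in names if n.endswith(suffix)], len(names)

print(like_prefix("Zhang"))  # (['Zhang San', 'Zhang Wei'], 3)
print(like_suffix("Wei"))    # (['Zhang Wei'], 5)
```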

8. Multi-column indexes follow the leftmost matching principle described above

When to use indexes

As mentioned above, indexes speed up queries, but they also take up space, so more indexes are not always better. To apply indexes effectively, reserve them for the most useful query fields. Generally speaking, indexes should be created on these fields:

  • Primary key field, needless to say;
  • Columns that are frequently searched, such as fields that are frequently used in where conditions;
  • Foreign key fields that reference other tables. They serve as join conditions, so indexing them speeds up joins;
  • Fields used for sorting, statistics, or grouping in queries;

Likewise, some columns should not be indexed. These include:

  • Frequently updated fields, because every update must modify the index as well as the record;
  • Fields not used in WHERE conditions;
  • Tables with very few records, which do not need an index;
  • Columns of text or image types, because they either hold very large values or have very few distinct values, and neither suits an index;
  • Fields with many repeated, evenly distributed values. Indexes pay off on fields that are frequently queried and sorted, but indexing a field that contains lots of duplicate data is not very effective. For example, a gender field with only male and female values is not suitable for indexing.
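One common way to put a number on "lots of duplicate data" is selectivity: the number of distinct values divided by the number of rows. The closer it is to 1, the better the column suits an index. Here is a minimal sketch with made-up sample columns (`selectivity` is a hypothetical helper):

```python
def selectivity(values) -> float:
    """Distinct values / total rows: 1.0 is ideal, values near 0 are poor."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0

gender = ["male", "female"] * 500  # 1,000 rows, 2 distinct values
user_id = range(1000)              # 1,000 rows, all distinct

print(selectivity(gender))   # 0.002, a poor index candidate
print(selectivity(user_id))  # 1.0, a good index candidate
```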

explain keyword

explain is a MySQL keyword, through which we can view the performance of the search statement.

The table queried here has more than 30 million rows in total. With this much data, we must use indexes when searching, and we can use this keyword to check whether the index actually takes effect.

Look: the row count dropped to 16 instantly, and the index used was index_user_id, which proves that our index is effective.

It is necessary to understand several important parameters of explain:

id: the serial number of the query

select_type: The type of query, which mainly distinguishes between ordinary queries and complex queries such as union queries and subqueries.

type:

Type shows the access type and is one of the more important indicators. From best to worst, the possible values are:

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

system is the most efficient, and ALL means a full table scan. Generally speaking, a query should reach at least the range level.
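To make this guideline easy to apply, here is a small hypothetical helper that checks whether an EXPLAIN type value meets the range level. The list simply encodes the ordering given above. It is not read from MySQL.

```python
# EXPLAIN `type` values, best first, as listed above.
ACCESS_TYPES = ["system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
                "index_merge", "unique_subquery", "index_subquery",
                "range", "index", "ALL"]

def meets_range_level(access_type: str) -> bool:
    """True if the access type is range or better according to the ordering above."""
    return ACCESS_TYPES.index(access_type) <= ACCESS_TYPES.index("range")

print(meets_range_level("ref"))  # True
print(meets_range_level("ALL"))  # False
```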

key:

Shows the key that MySQL actually decided to use. If no index is chosen, key is NULL.

If key=primary, it means the primary key is used;

key=null means no index is used.

possible_keys:

Indicates which indexes MySQL could use to find rows in this table. If it is empty, there is no relevant index. In that case, check whether something in the statement prevents the index from being used.

rows:

Indicates the estimated number of rows scanned in the execution plan, which is an estimated value.

Extra:

If it shows Using index, the information is retrieved from the index tree alone without reading the rows, which is faster than scanning the whole table.

If it shows Using where, a WHERE condition is used to filter rows.

If it shows Impossible WHERE, the WHERE condition can never be true, so no rows are returned.

Seeing Using index means our index is effective.

Summary

Well, that’s all the knowledge points about indexes. Finally, let’s summarize the precautions for indexes.

1. Create indexes based on how the table's data is used, and do not create too many. In general, a table should not have more than 6 indexed fields.

2. Use the sharpest knife where it matters most: index fields that are queried often and have few duplicate values. An index works best when a query examines no more than about 4% of the table's rows.

3. When creating a joint index, mind the leftmost matching principle. Remember that the leftmost field must appear in the query conditions. I learned this the hard way.

4. Use EXPLAIN to check the execution plan and performance of query statements.


Finally

Although this is all basic knowledge, it took me a day to organize. At more than 5,000 words, it is a fairly substantial article. If you found it useful, I hope you will share it or give it a like. Your small gesture is what keeps me writing!
