I believe everyone is familiar with database indexes. An index is a structure that sorts the values of one or more columns in a database table, and it can be used to quickly access specific rows in that table. As a query aid, a well-designed index can greatly reduce the query load on the database. As we all know, the database is both the core and the most fragile part of a project: if the load is too heavy, failures follow easily and their impact is hard to predict. So whether for daily development or for interviews, you need to master indexing. There are many knowledge points about indexes, though, and beginners often miss some of them. That is why I wrote this summary: it is something to share with readers and also a thorough review for myself. I hope it helps. Without further ado, let's get to the point.
Pros and Cons of Indexes

Advantages:
1. Greatly speed up data queries.
2. A unique index guarantees the uniqueness of each row in the table.
3. Speed up joins between tables.

Disadvantages:
1. Creating and maintaining indexes takes time, so a table should not have too many of them.
2. An index is a data structure that takes up disk space.
3. When the table is updated, the index must be maintained as well, which slows down writes.

Types of Indexes

The purpose of an index is to improve query efficiency, but there are many ways to implement one, so the concept of an index model is introduced here. Three data structures are commonly used for indexes: hash tables, ordered arrays, and search trees.

Hash Index

A hash table (also called a hash map) uses a hash function to map a key to the location where its value is stored. On a read, the key is hashed to find that location and the value is fetched directly. The average lookup complexity of this data structure is O(1).

For example, if we maintain a table of ID card numbers and user names and need to look up a name by ID card number, the hash index might look like this:

The advantage of this index structure is that adding or deleting a single element at random is efficient. The disadvantage is that the elements in a hash table are not stored in key order, so range queries are very slow. Suppose I want to find all users whose ID numbers are in the range [ID_card_n1, ID_card_n3] in the figure: I have to scan every entry. Therefore, the hash table structure suits scenarios that only need equality queries.

Ordered Array Index

Ordered array indexes are very efficient for both equality queries and range queries. Let's take the figure above as an example.
If we implement it with an ordered array, it would look like this:

The elements of the array are sorted by ID number. To query a record, you can use binary search, with a time complexity of O(log N). And because the array is sorted, querying all records within a range is also very fast.

Of course, the disadvantage of an ordered array is just as obvious. As with ArrayList, lookups are fast, but adding or deleting an element may require moving every element after it, which is an inherent weakness of arrays. Ordered array indexes are therefore only suitable for static storage engines. For example, if you want to store the population records of a city for 2017, that kind of data will never be modified again.

Search Tree Index

When it comes to search trees, the one we know best is the binary search tree. In a binary search tree, the left child of each node is smaller than its parent, the parent is smaller than its right child, and the left and right subtrees are themselves binary search trees. The average lookup time is O(log N). It combines the fast insertion and deletion of a linked list with the fast search of an array, and because the tree is ordered it also supports range queries.

A binary search tree looks like a good choice for an index, but it is not. First of all, the tree lives on disk, and the relevant nodes have to be read from disk on every lookup. The nodes of a binary search tree are scattered through the file, so reading each node may cost one disk I/O. Binary search trees are also tall: a balanced binary tree with one million elements is about 20 levels high (log2 of 1,000,000 is roughly 20). In other words, a single lookup may need up to about 20 disk I/Os.
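To put rough numbers on the height argument, the sketch below counts how many levels of branching are needed to single out one key among a million, first for two children per node and then for nodes with many children. The helper name `levels_needed` and the fanout values 100 and 1000 are illustrative assumptions, not figures taken from any storage engine.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Return how many levels of branching are needed to single out one key
    among n_keys when every node has `fanout` children.

    Hypothetical helper for illustration; each level stands for one node read.
    """
    if n_keys < 1 or fanout < 2:
        raise ValueError("n_keys must be >= 1 and fanout must be >= 2")
    levels = 0
    reachable = 1  # number of keys that `levels` branching steps can tell apart
    while reachable < n_keys:
        reachable *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    n = 1_000_000
    # fanout=2 models a balanced binary tree; the larger values are assumptions.
    for fanout in (2, 100, 1000):
        print(f"fanout={fanout:>4}: {levels_needed(n, fanout)} levels")
```

If every level costs one disk read, the number of reads per lookup falls quickly as the fanout grows.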
This cost is too high, so binary search trees are generally not used as indexes. For a query to read from disk as little as possible, it must touch as few data blocks as possible, which means keeping the tree as low as possible, which in turn means using a multi-way search tree. The InnoDB storage engine uses such a multi-way search tree: the B+ tree we so often hear about.

InnoDB Index Structure

InnoDB is the most commonly used storage engine in MySQL. Its underlying index structure is the B+ tree, and all data is stored in B+ trees. Each index corresponds to one B+ tree in InnoDB. The characteristics of the B+ tree are:
This structure has two advantages:
Index Classification

By structure, database indexes can be divided into clustered indexes and non-clustered indexes.

A clustered index (also called a clustering index) builds a B+ tree on the primary key of the table, and its leaf nodes store the row data of the whole table. Put simply, it is what we usually call the primary key index. An index created on top of the clustered index is called a secondary index, and reading data through a secondary index always requires a second lookup.

A non-clustered index (also called a secondary or auxiliary index) stores the data and the index separately, and the leaf nodes of the index point to the location of the corresponding data.

Clustered Index

InnoDB uses a clustered index: the primary key is organized into a B+ tree, and the row data is stored in the leaf nodes. Assume a user table with the fields id, name, and company. The picture shows the InnoDB index structure as follows:

As the figure shows, if we search on the primary key with the condition "where id = 14", the B+ tree search algorithm finds the corresponding leaf node and the row data is read from it. A conditional search on the name column takes two steps. The first step searches for the name in the secondary index B+ tree and reaches its leaf node to obtain the corresponding primary key. The second step uses that primary key to run another B+ tree search in the primary index and finally reaches the leaf node that holds the entire row. (The key point is that a secondary index is built on a key other than the primary key.)

That is the structure of a clustered index. The typical representative of the non-clustered index is MyISAM, another common storage engine in MySQL.

Non-clustered Index

The two B+ trees of non-clustered indexes look no different.
The node structures are exactly the same; only the stored contents differ. The nodes of the primary key index B+ tree store the primary key, and the nodes of the secondary index B+ tree store the secondary key. The index itself does not store data. The data is kept in a separate place, and the leaf nodes of both B+ trees hold an address that points to the actual table data.

It seems that non-clustered indexes are more efficient than clustered indexes, because there is no need to search a B+ tree twice. Then why does InnoDB, the most commonly used engine, still use the clustered structure? What are its advantages?

1. In a clustered index, row data is stored in the leaf nodes, so one page holds several rows. When different rows of the same data page are accessed, the page is already in the buffer, and later accesses are served from memory without touching the disk. The primary key and the row data are loaded into memory together, so the row can be returned as soon as the leaf node is found. If the data is organized by primary key ID, it can therefore be fetched faster.
2. Using the primary key as the "pointer" in a secondary index, instead of an address, reduces the maintenance work on secondary indexes when rows move or data pages split. Storing the primary key value makes the secondary index larger, but in exchange InnoDB does not need to update the "pointer" in the secondary index when it moves a row. That is to say, the position of a row (located by a 16K page in the implementation) changes as data in the database is modified (through the B+ tree node splits and page splits described earlier). With a clustered index, no matter how the nodes of the primary key B+ tree change, the secondary index trees are not affected.
3. Clustered indexes are well suited to sorting and range queries, while non-clustered indexes are less suitable.

Covering Index

While on the subject of secondary indexes, there is one more special case worth mentioning: the covering index. As mentioned above, reading data through a secondary index on a clustered-index table requires two lookups: first find the leaf node for the secondary key and obtain the primary key stored there, then use the primary key index to fetch the row. This is relatively slow. If the fields we need can be obtained in the first lookup, there is no need to search the primary key index a second time, that is, no need to "return to the table".

For example, the table above has three fields: id, name, and company. I added an index on name and query the data with the following statement:

select name from user where name like '张%';

Because the statement uses the index on name and the returned field is stored in that index's leaf nodes, no return to the table is needed. Therefore, if the required fields happen to be index columns, try to use this form of query, selecting only the indexed columns, instead of using statements such as

Index Type

The classification above is based on structure. Classified by scope, indexes can also be divided into the following categories:

Normal index: the most basic index type, with no restrictions such as uniqueness.

CREATE INDEX INDEX_NAME ON TABLE_NAME(PROPERTY_NAME)

Unique index: basically the same as a normal index, but each value in the indexed columns may appear only once, which maintains uniqueness.

CREATE UNIQUE INDEX INDEX_NAME ON TABLE_NAME(PROPERTY_NAME)

Primary key: like a unique index, it cannot contain duplicate values. Strictly speaking, though, a primary key is a constraint (backed by a unique index) and must be declared as "PRIMARY KEY". It differs from a unique index in that:
Full-text index: the index type is FULLTEXT, and it can be created on columns of type CHAR, VARCHAR, or TEXT. In versions before MySQL 5.6, only the MyISAM storage engine supports full-text indexes. In 5.6 and later, both MyISAM and InnoDB support them.

CREATE FULLTEXT INDEX INDEX_NAME ON TABLE_NAME(PROPERTY_NAME)

Joint index: a joint index is not a separate category of index, but a normal index that contains multiple fields. For example, if there is a joint index called index(a, b), you can use

Leftmost Matching Principle

In a joint index, the leftmost column takes precedence, and any consecutive run of columns that starts from the leftmost one can be matched. Matching stops at the first range condition (>, <, between, like). As mentioned above, a query on (a, b) or on a alone will use index(a, b), but a query on b alone will not. Likewise, if you create an index in the order (a, b, c, d) and search with a = 1 and b = 2 and c > 3 and d = 4, the index is not used for d, because c is a range condition and the columns after it stop matching.

When Will the Index Become Invalid?

1. Using functions or expressions on an index column, such as:

select * from test where num + 1 = 5

MySQL will not solve this equation for num; that rewrite is up to the user (for example, num = 4). The index column must stand alone on one side of the comparison for the index to take effect.

2. Conditions on NULL values.

select * from user where user_id is not null;

The optimizer may skip the index for a condition like this, depending on how many rows match. When designing a table, try to avoid NULL values. If a value is missing, give it a default, such as 0 or -1 for numeric types and an empty string for character types.

3. Using an OR expression as a condition. If one of the columns has no index, the indexes on the other columns will not be used.
select * from user where user_id = 700 or user_name = "老薛";

In this case, if user_id is indexed but user_name is not, the index on user_id is not used during execution. This is why OR should be used sparingly during development, unless both fields are indexed.

4. Comparison between columns. Suppose two columns of a table (id and c_id) each have their own index. The following query condition will not use either index:

select * from test where id = c_id;

5. Data type conversion. If the column type is a string, the value in the condition must be quoted, otherwise the index will not be used.

create index `idx_user_name` ON user(user_name);
select * from user where user_name = 123;

In the example above, an index exists on user_name, but the value in the condition is not written as a string, so the index is not used.

6. NOT conditions. When the condition is a negation, it is hard to position within the index, and the execution plan tends toward a full table scan. Such conditions include <>, NOT IN, and NOT EXISTS. An IN list that matches a large share of the table can have the same effect.

select * from user where user_id<>500;
select * from user where user_id in (1,2,3,4,5);
select * from user where user_id not in (6,7,8,9,0);
select * from user where not exists (select 1 from user_record where user_record.user_id = user.user_id);

7. LIKE patterns that start with %. When using fuzzy search, try to put the wildcard at the end of the pattern. For example, if you want to find people with the last name Zhang, you can use

8. A multi-column index follows the leftmost matching principle, as described above.

When to Use Indexes

As mentioned above, indexes speed up queries but also take up space, so more indexes are not always better. To apply indexes effectively, we should reserve them for the most useful query fields. Generally speaking, indexes should be created on these fields:
Likewise, some columns should not be indexed. These columns include
The explain Keyword

explain is a MySQL keyword that lets us inspect the performance of a query statement.

The table queried here has more than 30 million rows in total. With that much data, searches must use an index. Whether the index takes effect can also be checked with this keyword.

Look: the number of rows searched dropped to 16 at once, and the index used was

It is necessary to understand several important fields of the explain output:

id: the serial number of the query.

select_type: the type of query, which mainly distinguishes ordinary queries from complex ones such as union queries and subqueries.

type: the access type, one of the more important indicators. From best to worst, the values are:

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

system is the most efficient, and ALL is a full table scan. Generally speaking, a query should reach at least the range level.

key: the key that MySQL actually decided to use. key=PRIMARY means the primary key is used; key=NULL means no index was chosen.

possible_keys: indicates which indexes MySQL can use to find rows in this table. If it is empty, there is no relevant index. In that case, check whether something in the statement is causing the index to fail.

rows: the number of rows the execution plan expects to scan. This is an estimate.

Extra: Using index means the result is retrieved using only the information in the index tree, which is faster than scanning the entire table. Using where means a where restriction is applied. Impossible WHERE means the where condition can never be true, so no rows can match. Seeing Using index means the query is served from the index alone, so our index is effective.

Summary

Well, that's all the knowledge points about indexes. Finally, let's summarize the precautions for indexes.
1. Create indexes based on how the table data is actually used, and do not create too many. Generally, more than 6 indexed fields in one table is not recommended.
2. A good knife should be used where it is needed most. Indexes work best on fields that are queried often, contain little duplicate data, and where a search returns no more than about 4% of the rows in the table.
3. When creating a joint index, pay attention to the leftmost matching principle. Remember, the leftmost field is required in the query. I have suffered a great loss in this regard.
4. Use the explain execution plan to check the performance of query statements.

Refer to:
https://www.jianshu.com/p/fa8192853184
MySQL Practice 45 Lectures

At Last

Although this is all basic knowledge, it took me a day to organize. At more than 5,000 words, it can be regarded as a solid article. If you feel you have gained something, I hope you will forward it or give it a like. I don't ask for much: even one or two likes will satisfy me. Your small effort is the motivation for my continued writing!