What are your principles for designing indexes? How to avoid index failure?

We have previously covered the principles of indexes and index queries in detail. As the saying goes, to do good work you must first sharpen your tools. While learning, take this material step by step: don't be too ambitious, be patient, master each knowledge point before moving on, and then apply it.

As discussed in the previous article, an index should be designed around the fields in WHERE conditions and the fields after ORDER BY and GROUP BY. The specific reasons are covered in detail in my previous article, The Principles of MySQL Indexes. Here is a brief outline.

MySQL (with the InnoDB engine) maintains a B+ tree for the primary key index, which we call the clustered index. For non-primary-key indexes (generally joint indexes, that is, indexes on several fields), entries are sorted by the index fields in order: the comparison starts from the first field value, and if the first field values are the same, the next field value is compared, and so on.

If all the field values in the joint index are the same, entries are sorted by primary key. In addition, the B+ tree of the clustered index (primary key index) stores the complete row for each record, while a non-clustered index (non-primary-key index) stores only the index field values and the primary key value.
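To make the two kinds of B+ tree concrete, here is a minimal sketch. The table student_demo and its columns are hypothetical names chosen only for illustration; they do not come from the earlier article.

```sql
-- Hypothetical table for illustration only (InnoDB engine assumed).
CREATE TABLE student_demo (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    age  INT NOT NULL,
    city VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),              -- clustered index: leaf nodes hold the full row
    KEY idx_name_age (name, age)   -- joint index: entries ordered by name, then age, then id
) ENGINE = InnoDB;

-- Needs only name, age and id, all of which are stored in idx_name_age.
EXPLAIN SELECT id, name, age FROM student_demo WHERE name = 'Tom';

-- Also needs city, which exists only in the clustered index.
EXPLAIN SELECT * FROM student_demo WHERE name = 'Tom';
```

For the first query, the Extra column of EXPLAIN would normally report Using index. The second query has to go back to the clustered index for every matching entry to fetch city.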

That is all for the review of index principles. In this article, we continue with the basic principles of MySQL index design, that is, which principles to follow when designing and creating indexes so that they are built according to the "standards". Today we will go through all the principles of index design in one pass.

Let me say a few more words about this knowledge point. During interviews, I often ask candidates this question to determine whether they really understand the index, rather than simply reciting jargon!

Primary key index

The primary key index is actually the simplest, but there are some things to pay attention to here that I would like to explain again.

When designing the primary key, make it auto-incrementing. It is strongly recommended not to use a UUID as the primary key.

Why? Because UUIDs are unordered, while MySQL keeps data in primary key order when maintaining the clustered index. That is, the records in each data page are sorted from small to large by primary key and are linked to each other through a singly linked list. The largest primary key in one data page must be smaller than the smallest primary key in the next data page, and the data pages themselves are linked through a doubly linked list.

As usual, we draw a picture to help everyone understand.

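If the primary key is auto-incrementing, every new record has the largest key so far, so MySQL only needs to use the primary key directory to locate the last data page, append the record there, and allocate a new page when that one is full. If the primary key is not ordered (a UUID, for example), MySQL must find the page where the key belongs and insert the record in the middle of existing data; when that page is already full, it has to be split and records moved, which costs extra I/O and leaves the pages fragmented. This seriously affects efficiency, so the primary key should be designed to be auto-incrementing.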

In addition, a unique index is similar to the primary key index, but its values are not necessarily increasing, so the cost of maintaining a unique index is generally greater than that of an auto-increment primary key index.

However, the values in a unique index are unique (apart from NULL, which a unique index may contain), so a record can be pinned down quickly through the index field. You may still need a table lookup, that is, a second lookup in the clustered index by primary key (I will not go into details, as table lookups were explained in the previous article).
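A minimal sketch of how the two pieces of advice can be combined when the business still needs a UUID-style identifier: keep an auto-increment primary key and put the identifier in a unique index. The table orders_demo, its columns and the literal values are hypothetical.

```sql
-- Hypothetical table: auto-increment primary key plus a unique business key.
CREATE TABLE orders_demo (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    order_no CHAR(36) NOT NULL,          -- UUID-style identifier, not the primary key
    amount   DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (id),                    -- rows are appended in id order
    UNIQUE KEY uk_order_no (order_no)    -- uniqueness enforced by a secondary index
) ENGINE = InnoDB;

-- New rows always land at the end of the clustered index.
INSERT INTO orders_demo (order_no, amount) VALUES (UUID(), 19.90);

-- Found through uk_order_no, then one lookup in the clustered index to read amount.
SELECT id, amount
FROM orders_demo
WHERE order_no = '6ccd780c-baba-1026-9564-5b8c656024db';
```

The random insert positions now affect only the smaller unique index, not the pages that hold the full rows.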

Create indexes for frequently queried fields

When we create indexes, we need to create indexes for fields that are often used as query conditions, which can improve the query speed of the entire table.

However, the query condition is usually not a single field, so more joint indexes are usually created.

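In addition, query conditions often contain fuzzy matches such as LIKE. For a fuzzy query, it is best to follow the leftmost prefix principle: write the pattern with a fixed beginning ('abc%') rather than a leading wildcard ('%abc'), so that the sorted order of the index can still be used.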
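The following sketch shows a joint index together with the two kinds of LIKE pattern. The table member_demo, its columns and the sample values are hypothetical, and the plans described in the comments are what one would normally expect, not guaranteed output.

```sql
-- Hypothetical table with a joint index on the two columns that are queried together.
CREATE TABLE member_demo (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    city VARCHAR(50) NOT NULL,
    name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_city_name (city, name)
) ENGINE = InnoDB;

-- Fixed beginning: within one city the names are sorted, so 'Zh%' is a contiguous range.
EXPLAIN SELECT * FROM member_demo
WHERE city = 'Beijing' AND name LIKE 'Zh%';

-- Leading wildcard: only the city part narrows the search;
-- every name in that city has to be checked one by one.
EXPLAIN SELECT * FROM member_demo
WHERE city = 'Beijing' AND name LIKE '%ang';
```

Comparing the key_len and rows columns of the two plans on a table with realistic data shows how much of the index each query can actually use.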

Avoid indexing large fields

In other words: try to use fields with small values as indexes.

For example, suppose there are two fields, one varchar(5) and the other varchar(200). In this case, prefer creating the index on the varchar(5) field, because the index stores a copy of the field values. Long values make the index take up more space, and they take more time to compare when sorting.

What if you do need an index on the varchar(200) field? Then index only the first part of the value. For example, if the address field of a table (called tbl here as a placeholder for your own table name) is varchar(200), you can index its first 20 characters like this:

CREATE INDEX idx_address ON tbl (address(20));
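One way to choose the prefix length is to measure how selective each candidate prefix is compared with the full column. This is a sketch under assumed names: customer_demo and the candidate lengths 10 and 20 are hypothetical.

```sql
-- Hypothetical table holding the long column.
CREATE TABLE customer_demo (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    address VARCHAR(200) NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Ratio of distinct values to total rows; the closer to sel_full, the less the prefix loses.
SELECT
    COUNT(DISTINCT LEFT(address, 10)) / COUNT(*) AS sel_10,
    COUNT(DISTINCT LEFT(address, 20)) / COUNT(*) AS sel_20,
    COUNT(DISTINCT address)           / COUNT(*) AS sel_full
FROM customer_demo;

-- Pick the shortest prefix whose selectivity is close to sel_full.
CREATE INDEX idx_address_prefix ON customer_demo (address(20));
```

Run the SELECT against real data; on an empty table the ratios are NULL and say nothing.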

Choose a column with high discrimination as the index

What does this mean? I believe everyone will understand it immediately if I give you an example.

Suppose there is a "gender" field, and the value of the data stored in it is either male or female, then such a field is not suitable as an index.

The main feature of the value of such a field is that the discrimination is not high enough, and fields with low discrimination are not suitable for indexing. Why?

Because if each value is about equally likely, then no matter which value you search for, you are likely to get back half of the data.

In such cases it is better not to have an index at all, because MySQL also has a query optimizer. When the optimizer finds that a certain value appears in a high percentage of the rows in the table, it generally ignores the index and performs a full table scan.

A commonly cited rule of thumb for this percentage is about 30%, although the optimizer decides by estimated cost rather than by a fixed cutoff. When the amount of matching data exceeds that limit, the query gives up on the index (this is also one of the scenarios in which an index fails).

That is the reason. After reading this, everyone should understand why we should avoid indexing fields with small cardinality. This involves a MySQL term: cardinality (index cardinality), the number of distinct values in an index, which is a very important concept for MySQL indexes.
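A rough way to check discrimination before creating an index is to compare the number of distinct values with the number of rows. The table user_demo and its columns are hypothetical.

```sql
-- Hypothetical table with one low-cardinality and one high-cardinality column.
CREATE TABLE user_demo (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    gender CHAR(1) NOT NULL,
    email  VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_email (email)
) ENGINE = InnoDB;

-- Selectivity near 0 (gender) means a poor index candidate; near 1 (email) means a good one.
SELECT
    COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity,
    COUNT(DISTINCT email)  / COUNT(*) AS email_selectivity
FROM user_demo;

-- Refresh the statistics, then read the Cardinality column (an estimate, not an exact count).
ANALYZE TABLE user_demo;
SHOW INDEX FROM user_demo;
```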

Try to create indexes for fields following ORDER BY and GROUP BY

Create an index on the field after ORDER BY so that no extra sort is needed at query time, because, as we already know, the records in the B+ tree of an index are kept in sorted order.

GROUP BY and ORDER BY are actually similar, so I will discuss them together.

When GROUP BY is used, the rows must first be ordered by the GROUP BY fields so that equal values sit together, and then the aggregation is performed.

If the rows are not already ordered by the GROUP BY fields, MySQL has to order them first, which generates a sorted temporary table in which the aggregation is then performed. This is of course very inefficient. If the GROUP BY fields are indexed, MySQL does not need to sort again, and no temporary table is generated.

However, the tricky part is that if the GROUP BY column differs from the ORDER BY column, a temporary table is generated even if both have indexes. I searched online and there seem to be many such cases, which I list here. To be honest, although these are the standards, they are hard to follow strictly, because real scenarios are never this simple and pure.

1. If the GROUP BY column has no index, a temporary table is created.
2. If the SELECT statement groups by more than one column and the GROUP BY columns are not the primary key, a temporary table is generated.
3. If the GROUP BY column has an index and the ORDER BY column does not, a temporary table is created.
4. If the GROUP BY column is different from the ORDER BY column, a temporary table will be generated even if both have indexes.
5. If the column in GROUP BY or ORDER BY does not come from the first table in the JOIN statement, a temporary table will be generated.
6. If the DISTINCT and ORDER BY columns have no index, a temporary table is created.
7. If the columns in GROUP BY and ORDER BY are the same and are primary keys, but the SELECT column contains columns other than GROUP BY columns, a temporary table will also be generated.
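Since these cases are collected from experience, it is safer to check a specific query with EXPLAIN than to rely on the list. A minimal sketch, with a hypothetical table sales_demo; the Extra values mentioned in the comments are what one would typically look for, not guaranteed output.

```sql
-- Hypothetical table: region is indexed, amount is not.
CREATE TABLE sales_demo (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    region VARCHAR(20) NOT NULL,
    amount DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_region (region)
) ENGINE = InnoDB;

-- Grouping on the indexed column: rows can be read in region order from idx_region.
EXPLAIN SELECT region, COUNT(*) FROM sales_demo GROUP BY region;

-- Grouping on the unindexed column: look for "Using temporary" in the Extra column.
EXPLAIN SELECT amount, COUNT(*) FROM sales_demo GROUP BY amount;

-- Grouping and ordering by different columns: check Extra for
-- "Using temporary" and "Using filesort".
EXPLAIN SELECT region, SUM(amount) AS total
FROM sales_demo
GROUP BY region
ORDER BY total;
```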

Don't use functions in conditionals

If a function operation is performed on a field of an established index, the index cannot be used.

Why is that?

Because the B+ tree that MySQL maintains for the index is built from the original values of the field. Once a function is applied, MySQL is no longer comparing the original field values, so it cannot use the index.

But what if someone insists on using the function? You can't change your business just for the sake of an index, right? If the index is invalid because a MySQL built-in function is applied to the field, you can build the index on the function expression itself.

What does this mean? Suppose there is a field called age with an index on it, but the query looks like this:

SELECT * FROM student WHERE round(age) = 2;

The index on age is not used here. If you really want round(age) to be indexed, MySQL 8.0.13 and later let you create a functional index on the expression (note the extra pair of parentheses around it):

CREATE INDEX stu_age_round ON student ((ROUND(age)));

At this time, when you query using the above method, the index will take effect. I believe everyone can understand this.
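Indexing an expression directly requires MySQL 8.0.13 or later. On older servers (5.7 and later), one possible workaround is an indexed generated column. This is a sketch: student_fn_demo, the DECIMAL type of age and the column age_rounded are assumptions made for the example.

```sql
-- Hypothetical table; age is assumed to be a decimal so that ROUND() changes the value.
CREATE TABLE student_fn_demo (
    id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    age DECIMAL(5, 2) NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Store the function result in a generated column and index that column.
ALTER TABLE student_fn_demo
    ADD COLUMN age_rounded INT GENERATED ALWAYS AS (ROUND(age)) STORED,
    ADD INDEX idx_age_rounded (age_rounded);

-- Query the generated column directly so that idx_age_rounded can be used.
EXPLAIN SELECT * FROM student_fn_demo WHERE age_rounded = 2;
```

MySQL keeps age_rounded in sync with age automatically, so the application only has to change the WHERE clause.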

Don’t create too many indexes

Maintaining indexes takes space and costs performance, because MySQL maintains a separate B+ tree for each index.

So if there are too many indexes, this will undoubtedly increase the burden on MySQL.
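To find candidates for removal, the sys schema that ships with MySQL 5.7 and later offers two views. This is a sketch that assumes the Performance Schema is enabled; mydb is a placeholder schema name.

```sql
-- Indexes with no recorded use since the server started.
-- A short uptime can make a useful index look unused, so check after a full business cycle.
SELECT *
FROM sys.schema_unused_indexes
WHERE object_schema = 'mydb';

-- Indexes that are made redundant by another index on the same table.
SELECT *
FROM sys.schema_redundant_indexes
WHERE table_schema = 'mydb';
```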

Do not create indexes for fields that are frequently added, deleted, or modified

This is easy to understand: as introduced before, MySQL has to maintain the index again whenever the field value changes.

If a field is modified frequently, its index entries must be removed and re-inserted at their new sorted positions just as often, which inevitably affects the performance of MySQL. I won't say more here.

Most of what we have covered here are principles to keep in mind at design time. In practice they still need to be adapted to the actual business; there is no so-called "formula". The design that suits your actual business scenario is the best one. So don't over-pursue "optimization", because that often backfires. After all, talking about technology without considering the business is just being a hooligan.

Okay, let's take a closer look at the situations in which the index will fail. (PS: This article is basically all theory. I wanted to draw a picture to express it, but I found that I couldn’t start at all. I hope everyone can stick with it. It will be finished soon.)

Common scenarios for index failure

  1. Using the OR keyword can cause the index to become invalid: if any column in the OR condition has no index, MySQL falls back to a full table scan. If you want to use OR and still use indexes, every column in the OR condition needs an index (MySQL can then combine them with an index merge). This obviously conflicts with the earlier advice not to create too many indexes.
  2. If a query on a joint index does not follow the leftmost prefix principle, the index will also be invalid.
  3. In a fuzzy query, a pattern starting with % will also cause the index to fail (I will not repeat the reason, because the previous article covered it; this is just a reminder).
  4. If an implicit type conversion is applied to the index column, the index will also become invalid.

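Assuming the field age is stored as a string type such as varchar, we should query like this

SELECT * FROM student WHERE age='15';

The above can use the index, but if you write

SELECT * FROM student WHERE age=15;

the index cannot be used: MySQL has to convert every stored age value to a number before comparing, so the index on the age column is invalid. The reverse case is harmless: if age is an int column, WHERE age='15' only converts the constant '15' once, and the index is still used.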
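Implicit conversion is easiest to see on a string column that happens to hold digits. A minimal sketch with a hypothetical table conv_demo and column student_no:

```sql
-- Hypothetical table: student_no looks numeric but is stored as a string.
CREATE TABLE conv_demo (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    student_no VARCHAR(20) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_student_no (student_no)
) ENGINE = InnoDB;

-- Types match: idx_student_no can be used for a direct lookup.
EXPLAIN SELECT * FROM conv_demo WHERE student_no = '1001';

-- String column compared with a number: each stored value must be converted first,
-- so idx_student_no cannot be used to look the value up.
EXPLAIN SELECT * FROM conv_demo WHERE student_no = 1001;
```

Compare the type, key and rows columns of the two plans; the fix is simply to quote the literal (or pass a string parameter) so that it matches the column type.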

  5. If the cardinality of the field is small, the index may also fail. This was explained in detail earlier in this article, and it is a decision made by the MySQL query optimizer.
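The OR and leftmost prefix scenarios from the list above can be checked the same way. This sketch uses a hypothetical table fail_demo; on a tiny or empty table the optimizer may pick different plans, so load representative data before drawing conclusions.

```sql
-- Hypothetical table: name and (city, age) are indexed, note is not.
CREATE TABLE fail_demo (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    city VARCHAR(50) NOT NULL,
    age  INT NOT NULL,
    note VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name (name),
    KEY idx_city_age (city, age)
) ENGINE = InnoDB;

-- OR with an unindexed column: the note branch forces a scan of the whole table.
EXPLAIN SELECT * FROM fail_demo WHERE name = 'Tom' OR note = 'x';

-- OR where each column leads an index: MySQL may combine both with an index merge.
EXPLAIN SELECT * FROM fail_demo WHERE name = 'Tom' OR city = 'Beijing';

-- Skips city, the leftmost column of idx_city_age: the index cannot narrow the search.
EXPLAIN SELECT * FROM fail_demo WHERE age = 18;

-- Starts from the leftmost column: idx_city_age can be used.
EXPLAIN SELECT * FROM fail_demo WHERE city = 'Beijing' AND age = 18;
```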

For the other principles, please read the earlier articles on index principles and the basic principles of queries. Without that groundwork, these points may seem a bit abstract, so please learn indexes step by step. This is basically the core knowledge we need when using MySQL.
