The easiest way to understand the concept of indexing is through an example. The following is such an example. Suppose we need to design an online dating website whose user profiles have many columns, such as nationality, province, city, gender, age, eye color, etc. The site must support searching for user profiles in multiple combinations. At the same time, it is also necessary to support sorting and returning limited results based on the user's recent online time and other users' comments, etc. How do we design indexes for such complex scenarios? A bit strange, the first thing to do is to decide whether we must use the index sort, or whether sorting after retrieval is acceptable. Index ordering constrains the way indexes and queries are built. For example, we cannot use the same index for a query like WHERE age BETWEEN 18 AND 25 and a sort based on other user reviews. If MySQL uses one index for a range query, it cannot use another index for a sort. Assume that this is the most commonly used WHERE condition, and we also need to support sorting for most queries. Supports multiple types of filteringNow we need to see which columns have more dispersed values and which columns appear most frequently in the WHERE condition. The filtering performance is good for data columns with dispersed values. This is usually a good thing, because it allows MySQL to efficiently filter out irrelevant rows. The nationality column may not be as filterable, but it is likely to be the most frequently searched. The gender column is usually not filterable, but is often used in queries. Based on this understanding, we created a series of indexes for many different combinations of columns, and these indexes began with (sex, country). The traditional perception is that it is useless to build indexes on columns with low filtering properties. So why do we add non-filterable columns at the beginning of each index? We have two reasons for doing this. The first reason is that, as mentioned earlier, gender is used in almost every query. We even designed it so that users can only search for one gender at a time. But more importantly, there isn’t much downside to adding such a column, because we use a little trick. Here’s our trick: even without restricting the query to gender, we can ensure that the index takes effect by adding AND sex IN('m', 'f') to the WHERE clause. This does not filter out the rows we want, so it has the same effect as not including gender in the WHERE clause. However, because MySQL will prepend this column to any index with more columns, we need to include this column. This trick works in this scenario, but it doesn't work if the column has many different values, because it would result in too many columns in the IN(). This example illustrates a basic principle: keep all options open in your data table design. When you design indexes, don't just think about which index is best for which query, also think about optimizing the query. When you need an index but find that other queries might be affected by it, you should first ask yourself whether you can change the queries. You should optimize both queries and indexes to find the solution. You don't necessarily need to design a perfect index. Next, we need to think about other possible combinations of WHERE conditions, and then consider which of those combinations will be slow without proper indexing. An index like (sex, country, age) is the obvious choice, but we might also want indexes like (sex, country, region, age) and (sex, country, region, city, age). This will result in the need to create a lot of indexes. If we can reuse indexes, we won't have too many combinations. We can use the IN() trick to remove the (sex, country, age) and (sex, country, region, age) indexes. If these columns are not specified in the search form, we can use the list of countries and the list of regions to ensure that the constraints in the index prepend are met (there may be many combinations of all countries, all regions and all genders). These indexes will satisfy most given search queries, but how do we design for less common filters such as uploaded pictures (has_pictures), eye color (eye_color), hair color (hair_color), and education level (education)? If these columns are not so selective and not so commonly used, we can just skip them and let MySQL scan some extra rows. Accordingly, we can add them before the age column, and use the IN() trick to prepend the description to handle the case where these columns are not specified. You may have noticed that we put age at the end of the index. Why is this column treated specially? We try to ensure that MySQL can take advantage of index columns as much as possible. Because MySQL uses the leftmost matching rule until it encounters the first range query condition. All the columns we mentioned can be used in the WHERE clause for equality conditions, but age is most likely to be a range query. We could also change the range query to a list using an IN query, such as age IN(18, 19, 20, 21, 22, 23, 24, 25) instead of age BETWEEN 18 AND 25, but this is not always possible. The general principle is that we try to put the range decision conditions at the end of the index, so the optimizer will use the index as much as possible. We mentioned that you can use IN queries with as many columns as necessary to cover index conditions that are not specified in the WHERE clause. But you can overdo it and create new problems. Using more such IN query lists causes the optimizer to evaluate a larger number of combinations, which may in turn slow down the query. Consider the following query: WHERE eye_color IN('brown', 'blue', 'hazel') AND hair_color IN('black', 'red', 'blonde', 'brown') AND sex IN('M', 'F') The optimizer will convert this into 432 = 24 combinations, and the WHERE condition will check each case. 24 is not yet a very large number of combinations, but if the number reaches several thousand it will be. Older versions of MySQL may have more problems with large numbers in IN queries. The query optimizer will execute more slowly and consume more memory. Newer versions of MySQL stop evaluating when there are too many combinations, but this affects MySQL's ability to use indexes. Avoid multiple range queriesLet's assume there is a last_online column, and we need to show users who were online in the past week: WHERE eye_color IN('brown', 'blue', 'hazel') AND hair_color IN('black', 'red', 'blonde', 'brown') AND sex IN('M', 'F') AND last_online > DATE_SUB(NOW(), INTERVAL 7 DAY) AND age BETWEEN 18 AND 25 The problem with this query is that it has two range queries. MySQL can use either the last_online or the age condition, but not both. If the last_online constraint appears without an age constraint, or if last_online is more selective than age, we might want to add another set of indexes, putting last_online last. But what if we can't convert age to an IN query, and we also want to be able to improve the query speed when there are both last_oinline and age range queries? At this time, we have no direct method. But we can convert a range into an equality comparison. To do this, we add a precomputed active column that we maintain periodically. If the user logs in, we mark it as 1, and if he does not log in continuously within 7 days, we mark it back to 0. This method allows MySQL to use an index such as (active, sex, country, age). This column may not be that precise, but this type of query may not require very high precision. If we need precise search, we can keep last_online in the WHERE condition but not add an index. This technique is similar to the case of URL lookup. This condition will not use any indexes, since it is unlikely to filter out rows that would be hit by the index. Adding indexes may not necessarily benefit queries. Now, you can see the pattern: If the user wants to find both active and inactive results, we can use an IN query. We added a lot of these list queries, and a workaround is to create separate indexes for each combination of queries. For example, we could have indexes on: (active, sex, country, age), (active, country, age), (sex, country, age), and (country, age). While such an index may be a better choice for certain queries, the negative impact of maintaining these combinations and the additional storage space required for the combinations may make this a weak strategy. This is a case where a change to the optimizer can really impact index optimization. If index scans can really be dropped in future versions of MySQL, it might be possible to use multiple range conditions on an index, in which case we no longer need to solve this problem with IN queries. Optimize sortingThe last topic is sorting. The results of small amounts of data can be sorted quickly using filesort, but what if there are millions of rows of data? For example, if only gender is specified in the WHERE condition. For such low-filtering scenarios, we can add specific indexes for sorting. For example, an index on (sex, rating) could be used for the following query: SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 10; This query has both a sort and a LIMIT clause and can be slow without an index. Even with an index, this query can be slow if the UI has paginated queries and the page number is not near the beginning. The following example ORDER BY and LIMIT make a bad combination: SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10; Even with indexes, queries like this can cause serious problems. This is because a high skew will result in a large amount of data being discarded, taking a long time to scan. De-normalization, pre-calculation and caching may be able to solve the problem of such queries. A better strategy is to limit the pages that users can query. This is unlikely to degrade the user experience, since no one actually cares about page 10,000 of search results. Another good strategy is to use inferred join queries, which is a way for us to use covering indexes to get the primary key columns and then get the data rows. You can combine all the columns you need to get, which will reduce MySQL's work of collecting the data that needs to be discarded. Here is an example: SELECT <cols> FROM profiles INNER JOIN ( SELECT <primary key cols> FROM profiles WHERE x.sex='M' ORDER BY rating LIMIT 100000, 10 AS x USING(<primary key cols>); The above is the details of how MySQL builds data table indexes. For more information about MySQL building data table indexes, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: The magic of tbody tag speeds up the display of table content
>>: Thinking about grid design of web pages
The frame and rules attributes of the table tag c...
To back up multiple databases, you can use the fo...
MyISAM and InnoDB are the most common storage eng...
The process of installing MySQL database and conf...
Table of contents Introduction Install Display Fi...
Export: docker save -o centos.tar centos:latest #...
This article shares the specific code of react to...
An optimization solution when a single MYSQL serv...
Table of contents Initial Vue Building a Vue deve...
Hyperlinks are the most frequently used HTML elem...
First, the server environment information: Reason...
Here I use samba (file sharing service) v4.9.1 + ...
1. Inline styles To add inline styles to the virt...
Table of contents 1: Introduction to galera-clust...
[LeetCode] 183.Customers Who Never Order Suppose ...