MySQL index cardinality concept and usage examples

MySQL index cardinality concept and usage examples

This article uses examples to explain the concept and usage of MySQL index cardinality. Share with you for your reference, the details are as follows:

Cardinality (index cardinality) is a very important concept in MySQL index

Index cardinality is the number of distinct values ​​a column contains. For example, if a data column contains the values ​​1, 2, 3, 4, 5, 1, then its cardinality is 5. An index works best when its cardinality is high relative to the number of rows in the table (that is, the column contains many distinct values ​​and few duplicate values). If a column contains many different ages, the index can quickly distinguish the rows. If a column is used to record gender (with only two values, "M" and "F"), then an index would not be very useful. If the values ​​appear with almost equal probability, then searching for either value will yield half the rows. In these cases, it is best not to use an index at all, because when the query optimizer finds that a certain value appears in a high percentage of the data rows in the table, it will generally ignore the index and perform a full table scan. The commonly used percentage cutoff is "30%"

Another concept is called index selectivity

Index selectivity = index cardinality/total data. The cardinality can be viewed through "show index from table name".
The closer it is to 1, the more likely it is to use the index. It can also be understood that 1 is 100% high. The benefit of index selectivity is that MySQL can filter more rows when searching for matches. The selectivity of a unique index is the best, with a value of 1.

Here is a table of my own. The amount of data is not high, just to test the index selectivity

select * from articles
id Title Name15 Title 0 Big Bear16 Title 1 Big Bear17 Title 2 Big Bear18 Title 3 Big Bear19 Title 4 Big Bear20 Title 5 Big Bear21 Title 6 Big Bear22 Title 7 Big Bear23 Title 8 Two Bears24 Title 9 Two Bears

id is the default integer auto-increment primary key

Now add the author column as an index and view all indexes of the articles table

ALTER TABLE `articles` ADD INDEX (`author`)
SHOW INDEX FROM articles
table non_unique key_name seq_in_index column_name collation cardinality sub_part packed null index_type
articles 0 PRIMARY 1 id A 10 NULL NULL BTREE
articles 1 author 1 author A 2 NULL NULL BTREE

Explain the meaning of each field

  • table table name
  • non_unique is 0 if the index cannot include duplicate words. 1 if it can.
  • key_name index name
  • seq_in_index The column sequence number in the index, starting from 1
  • column_name column name
  • How collation columns are stored in the index. In MySQL SHOW INDEX syntax, there is a value of 'A' (ascending) or NULL (no sorting)
  • cardinality index cardinality
  • sub_part If the column is only partially indexed, the number of characters indexed. NULL if the entire column is indexed.
  • packed indicates how keywords are packed. If not compressed, this is NULL.
  • null Contains YES if the column contains NULL. If not, the column contains NO.
  • Index storage method used by index_type (BTREE, FULLTEXT, HASH, RTREE)

As you can see, the articles table already has two indexes.

The id index does not contain duplicate words. The primary key column name is id. The index cardinality is 10 in ascending order. There is no partial index. There is no compression. There is no null. The storage method is btree.

The author index contains repeated words. The index name is author. The column name is author. The index is sorted in ascending order. The index cardinality is 2. There is no partial index. There is no compression. There are no nulls. The storage method is btree.

According to the index selectivity algorithm, the id index selectivity is 10/10 = 1 and the author index selectivity is 2/10 = 0.2. Let's test it and use explain to see the statement analysis.

explain select * from articles where id = 15
id select_type table partitions type possible_keys key key_len ref rows filtered extra
1 SIMPLE articles NULL const PRIMARY PRIMARY 4 const 1 100.00 NULL

Let me talk about the explain command again.

EXPLAIN shows how MySQL uses indexes to process select statements and join tables. It can help choose better indexes and write more optimized query statements. Explain the meaning of each field

1. id SELECT identifier. This is the SELECT query sequence number. This is not important. The query sequence number is the order in which the SQL statements are executed.

2. select_type select type

2.1. SIMPLE When performing a simple select query that does not require a Union operation or does not contain a subquery, the select_type of the response query statement is simple. No matter how complex the query statement is, there must be only one unit query with select_type simple in the execution plan.
2.2. PRIMARY In a select query execution plan that requires a Union operation or contains a subquery, the outermost select_type is primary. Like simple, there is only one unit select query with select_type as primary.
2.3. union In the unit select queries formed by the union operation, except the first one, the select_type of all the unit select queries after the second one is union. The select_type of the first unit select of union is not union, but DERIVED. It is a temporary table used to store the query results after the union.
2.4. DEPENDENT UNION dependent Like UNION select_type , dependent union appears in the set query formed by union or union all. The word "dependent" here means that the unit query formed by union or union all is affected by external factors.
2.5. Union result Union result is a data table containing the union result

3. table table name

4. Type connection type, there are multiple parameters, first from the best type to the worst type is also the focus of this article

4.1 const , the table has at most one matching row, const is used to compare primary keys or unique indexes. Because only one row of data is matched, it is very fast, which can also be understood as the most optimized index and constant search.
4.2 eq_ref Regarding the explanation of eq_ref, the MySQL manual says: "For each row combination from the previous table, read a row from this table. Except for const types, this may be the best join type"
4.3 ref For each combination of rows from the previous table, all rows with matching index values ​​will be read from this table. If the join uses only a leftmost prefix of the key, or if the key is not UNIQUE or PRIMARY KEY (in other words, if the join cannot select a single row based on the key), then ref is used. This join type is good if the keys used match only a small number of rows.
4.4 ref_or_null This join type is like ref, but adds that MySQL can specifically search for rows containing NULL values. This join type optimization is often used in resolving subqueries.
4.5 index_merge This join type indicates that the index merge optimization method is used. In this case, the key column contains the list of indexes used, and key_len contains the longest key element of the used indexes.
4.6 unique_subquery
4.7 index_subquery
4.8 range Search within a given range, using an index to check rows
4.9 index This join type is the same as ALL, except that only the index tree is scanned. This is usually faster than ALL because index files are usually smaller than data files. (That is, although all and Index both read the entire table, index reads from the index, while all reads from the hard disk)
4.10 ALL For each combination of rows from the previous tables, a full table scan is performed. This is usually bad if the table is the first table not marked const, and is usually very bad in other cases. Often you can add more indexes instead of using ALL so that rows can be retrieved based on constant values ​​or column values ​​in the previous table.

5. possible_keys hints which index will be used to find the row in the table, which is not very important

6. keys indicates the index used by MYSQL query

7. key_len MYSQL index length

8. ref shows which column or constant is used together with key to select rows from the table

9. rows shows the number of rows that MYSQL executes the query. The larger the value, the worse it is, indicating that the index is not used well.

10. Extra This column contains detailed information about how MySQL solves the query.

You can see that the id query uses the id index. The query type is the optimal constant query. Then let's try another query, this time using the author index.

explain select * from articles where author = "Big Bear"
1 SIMPLE articles NULL ALL author NULL NULL NULL 10 80.00 Using where

It can be clearly seen that the author index is used for a simple query. The query type is the worst full table scan. Let's not rush to explain it. Let's use the same statement instead.

explain select * from articles where author = "二熊"
1 SIMPLE articles NULL ref author author 1022 const 2 100.00 NULL

You can see that the query type this time is ref

That is to say, because the number of data rows written by the author Daxiong exceeds 30% of the total data, MySQL believes that full table scanning is faster than using indexes. This is the meaning of the concepts of index cardinality and index selectivity. Therefore, when creating an index, you should pay attention to creating the index on a column with a high index cardinality.

Readers who are interested in more MySQL-related content can check out the following topics on this site: "Summary of MySQL Index Operation Skills", "Summary of MySQL Common Functions", "Summary of MySQL Log Operation Skills", "Summary of MySQL Transaction Operation Skills", "Summary of MySQL Stored Procedure Skills" and "Summary of MySQL Database Lock-Related Skills".

I hope this article will be helpful to everyone's MySQL database design.

You may also be interested in:
  • Solve MySQL deadlock routine by updating different indexes
  • Understanding MySQL deadlock routines through unique index S lock and X lock
  • Share some key interview questions about MySQL index
  • Index in MySQL
  • MySQL index for beginners

<<:  Why TypeScript's Enum is problematic

>>:  Use Shell scripts to batch start and stop Docker services

Recommend

Vue uses mixins to optimize components

Table of contents Mixins implementation Hook func...

Install Docker on CentOS 7

If you don't have a Linux system, please refe...

How to use axios to filter multiple repeated requests in a project

Table of contents 1. Introduction: In this case, ...

Use the CSS border-radius property to set the arc

Phenomenon: Change the div into a circle, ellipse...

JS implementation of carousel example

This article shares the specific code of JS to im...

CentOS 7 cannot access the Internet after modifying the network card

Ping www.baidu.com unknown domain name Modify the...

About the garbled problem caused by HTML encoding

Today a junior student asked a question. The HTML...

XHTML Getting Started Tutorial: XHTML Hyperlinks

It is no exaggeration to say that hyperlinks conne...

HTML form component example code

HTML forms are used to collect different types of...

In-depth understanding of umask in new linux file permission settings

Preface The origin is a question 1: If your umask...

Example analysis of the page splitting principle of MySQL clustered index

This article uses an example to illustrate the pa...

vue+element custom query component

This article mainly introduces the Vue project. O...

Detailed explanation of the solution to docker-compose being too slow

There is only one solution, that is to change the...

When is it appropriate to use dl, dt, and dd?

dl:Definition list Definition List dt:Definition t...