mysql IS NULL using index case explanation

mysql IS NULL using index case explanation

Introduction

The use of is null, is not null, and != in MySQL's SQL query statements has no effect on the index, and the index will not be invalidated and the full table will not be scanned due to the use of is null, is not null, and != in the where condition.

The official MySQL documentation has also clearly stated that is null does not affect the use of the index.

MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.

In fact, the reason why the index fails and the full table scan is usually because there are too many tables returned in one query. MySQL calculates that the time cost of using an index is higher than that of a full table scan, so MySQL would rather scan the full table than use an index.

Case

CREATE TABLE `user_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(11) DEFAULT NULL,
  `age` int(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_name` (`name`) USING BTREE
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('1', 'tom', '18');
INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('2', null, '19');
INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('3', 'cat', '20');

When executing SQL queries, I used is null and is not null, and found that the index query was still used, and there was no index invalidation problem.

insert image description here

insert image description here

analyze

To analyze the above phenomenon, you need to understand the working principle of MySQL index and index data structure in detail. Next, we analyze the MySQL index data structure by using two methods: tool parsing and directly viewing the binary file.

Tool Analysis

innodb_ruby is a very powerful MySQL analysis tool that can be used to easily parse MySQL's .ibd files and gain a deeper understanding of MySQL's data structure.

First install the innodb_ruby tool:

yum install -y rubygems ruby-deve
gem install innodb_ruby

innodb_ruby has many functions. Here we only need to use it to parse the index structure of MySQL, so only the following command is needed. For more functions and commands, see the wiki.

innodb_space -s ibdata1 -T sakila/film -I PRIMARY index-recurse

Parsing the primary key index:

$ innodb_space -s /usr/soft/mysql-5.6.31/data -T test/user_info -I PRIMARY index-recurse
ROOT NODE #3: 3 records, 89 bytes
  RECORD: (id=1) → (name="tom", age=18)
  RECORD: (id=2) → (name=:NULL, age=19)
  RECORD: (id=3) → (name="cat", age=20)

Parse the common index index_name:

$ innodb_space -s /usr/soft/mysql-5.6.31/data -T test/user_info -I index_name index-recurse
ROOT NODE #4: 3 records, 38 bytes
  RECORD: (name=:NULL) → (id=2)
  RECORD: (name="cat") → (id=3)
  RECORD: (name="tom") → (id=1)

By analyzing the index structure of mysql, we can find that null values ​​are also stored in the index tree, and null values ​​are processed into the smallest value and placed on the left side of the index_name index tree.

Binary Files

Find the physical file user_info.ibd corresponding to the user_info table, open it with software such as UltraEdit, and directly locate the 5th data page (MySQL defaults to a data page of 16KB).

insert image description here

As shown in the figure, these binary data are the index page data corresponding to the index_name index. Only the index records are selected and expanded as follows:

Minimum record 0x00010063

01 B2 01 00 02 00 29 Record header information 69 6E 66 69 6D 75 6D Minimum record (fixed value infimum)

Maximum record 0x00010070

00 04 00 0B 00 00 Record header information 73 75 70 72 65 6D 75 6D Maximum record (fixed value supremum)

ID 1 index 0x0001007f

03 00 00 00 10 FF F1 Record header information 74 6F 6D The value of the field name: tom
80 00 00 01 RowID: The value of the primary key id is 1

ID 2 Index 0x0001008c

01 00 00 18 00 0B Record header information Field name value: null
80 00 00 02 RowID: The value of the primary key id is 2

ID 3 Index 0x00010097

03 00 00 00 20 FF E8 Record header information 63 61 74 Field name value: cat
80 00 00 03 RowID: The value of the primary key id is 3

The last 2 bytes of the record header information of the minimum record are 00 29 -> 0x00010063 offset 0x0029 -> 0x0001008C, which is the index position of ID 2;

The last 2 bytes of the record header information of ID 2 are 00 0B -> 0x0001008C offset 0x000B -> 0x00010097, which is the index position of ID 3;

The last 2 bytes of the record header information of ID 3 are FF E8 -> 0x00010097 offset 0xFFE8 -> 0x0001007F, which is the index position of ID 1;

The last 2 bytes of the record header information of ID 1 are FF F1 -> 0x0001007F, offset 0xFFF1 -> 0x00010070, the record position of the largest record;

It can be seen that the index records are connected in series through a one-way linked list and sorted by index values, and the null value is processed into the smallest value and placed at the beginning of the index linked list, which is the leftmost position of the index tree. The result is consistent with the result parsed by the innodb_ruby tool.

Misunderstanding reasons

Why do people misunderstand that is null, is not null, != and other judgment conditions will cause the index to fail and the whole table to be scanned?

The reason why the index fails and the full table scan is usually because too many tables are returned in one query. MySQL calculates that the time cost of using an index is higher than that of a full table scan, so MySQL would rather scan the full table than use an index. The time cost of using an index is higher than the critical value of a full table scan, which can be simply memorized as about 20%.

For a detailed analysis process, please refer to another blog post by the author: MySQL table return causes index failure.

That is, if the table return range caused by a query statement exceeds 20% of all records, the index will become invalid. The judgment conditions such as is null, is not null, and != often appear in the scenarios where the range of the returned table is large, and people may misunderstand that these judgment conditions cause the index to be invalid.

Recurring index failure

To reproduce index failure, you only need to return a table range that exceeds 20% of all records. Insert 1000 non-null records as follows.

delimiter //
CREATE PROCEDURE init_user_info() 
BEGIN 
	DECLARE indexNo INT;
	SET indexNo = 0;
	WHILE indexNo < 1000 DO
		START TRANSACTION; 
			insert into user_info(name,age) values ​​(concat(floor(rand()*1000000000)),floor(rand()*100));
			SET indexNo = indexNo + 1;
		COMMIT; 
	END WHILE;
END //
delimiter ;
call init_user_info();

At this time, there are a total of 1003 records in the user_info table, of which only one record has a name value of null. Then, only 1/1003 of the returned records caused by the is null judgment statement will not exceed the critical value, while 1002/1003 of the returned records caused by the is not null judgment statement will far exceed the critical value, resulting in index failure.

As can be seen from the following two figures, is null still uses the index normally, while is not null, as expected, prefers a full table scan rather than using the index due to the high table return rate.

insert image description here

insert image description here

Use MySQL's optimizer tracing (supported by MySQL 5.6) to analyze the SQL execution plan:

SET optimizer_trace="enabled=on";
explain select * from user_info where name is not null;
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

The execution plan output by optimizer tracing shows that for this query, the time cost of using a full table scan is 206.9, while the time cost of using an index is 1203.4, which is much higher than a full table scan. Therefore, MySQL finally chooses to scan the entire table, resulting in index failure.

{
    "rows_estimation": [
        {
            "table": "`user_info`",
            "range_analysis": {
                "table_scan": {
                    "rows": 1004, // Full table scan requires scanning 1004 records "cost": 206.9 // Full table scan requires a cost of 206.9
                },
                "potential_range_indices": [
                    {
                        "index": "PRIMARY",
                        "usable": false,
                        "cause": "not_applicable"
                    },
                    {
                        "index": "index_name",
                        "usable": true,
                        "key_parts": [
                            "name",
                            "id"
                        ]
                    }
                ],
                "setup_range_conditions": [],
                "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                },
                "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                        {
                            "index": "index_name",
                            "ranges": [
                                "NULL < name"
                            ],
                            "index_dives_for_eq_ranges": true,
                            "rowid_ordered": false,
                            "using_mrr": false,
                            "index_only": false,
                            "rows": 1002, // The index needs to scan 1002 records "cost": 1203.4, // The cost of the index is 1203.4
                            "chosen": false,
                            "cause": "cost"
                        }
                    ],
                    "analyzing_roworder_intersect": {
                        "usable": false,
                        "cause": "too_few_roworder_scans"
                    }
                }
            }
        }
    ]
}

This is the end of this article about the use of mysql IS NULL index case. For more relevant mysql IS NULL usage content, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MySQL process control IF(), IFNULL(), NULLIF(), ISNULL() functions
  • This article takes you to explore NULL in MySQL
  • MySQL series of experience summary and analysis tutorials on NUll values
  • MySql sharing of null function usage
  • Storing NULL values ​​on disk in MySQL

<<:  Detailed process record of nginx installation and configuration

>>:  Detailed steps for installing nodejs environment and path configuration in Linux

Recommend

MySQL service and database management

Table of contents 1. Start and stop service instr...

Two implementation codes of Vue-router programmatic navigation

Two ways to navigate the page Declarative navigat...

Using react-beautiful-dnd to implement drag and drop between lists

Table of contents Why choose react-beautiful-dnd ...

Sample code for achieving small triangle border effect with pure CSS3+DIV

The specific code is as follows: The html code is...

Typical cases of MySQL index failure

Table of contents Typical Cases Appendix: Common ...

Perfect solution for vertical centering of form elements

Copy code The code is as follows: <!DOCTYPE ht...

What scenarios are not suitable for JS arrow functions?

Table of contents Overview Defining methods on an...

Steps to install MySQL on Windows using a compressed archive file

Recently, I need to do a small verification exper...

Detailed explanation of the process of modifying Nginx files in centos7 docker

1. Install nginx in docker: It is very simple to ...

Detailed explanation of TypeScript's basic types

Table of contents Boolean Type Number Types Strin...

Nginx/Httpd load balancing tomcat configuration tutorial

In the previous blog, we talked about using Nginx...

How to change the encoding of MySQL database to utf8mb4

The utf8mb4 encoding is a superset of the utf8 en...

Vue.set() and this.$set() usage and difference

When we use Vue for development, we may encounter...