mysql IS NULL using index case explanation

mysql IS NULL using index case explanation

Introduction

The use of is null, is not null, and != in MySQL's SQL query statements has no effect on the index, and the index will not be invalidated and the full table will not be scanned due to the use of is null, is not null, and != in the where condition.

The official MySQL documentation has also clearly stated that is null does not affect the use of the index.

MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.

In fact, the reason why the index fails and the full table scan is usually because there are too many tables returned in one query. MySQL calculates that the time cost of using an index is higher than that of a full table scan, so MySQL would rather scan the full table than use an index.

Case

CREATE TABLE `user_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(11) DEFAULT NULL,
  `age` int(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_name` (`name`) USING BTREE
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('1', 'tom', '18');
INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('2', null, '19');
INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('3', 'cat', '20');

When executing SQL queries, I used is null and is not null, and found that the index query was still used, and there was no index invalidation problem.

insert image description here

insert image description here

analyze

To analyze the above phenomenon, you need to understand the working principle of MySQL index and index data structure in detail. Next, we analyze the MySQL index data structure by using two methods: tool parsing and directly viewing the binary file.

Tool Analysis

innodb_ruby is a very powerful MySQL analysis tool that can be used to easily parse MySQL's .ibd files and gain a deeper understanding of MySQL's data structure.

First install the innodb_ruby tool:

yum install -y rubygems ruby-deve
gem install innodb_ruby

innodb_ruby has many functions. Here we only need to use it to parse the index structure of MySQL, so only the following command is needed. For more functions and commands, see the wiki.

innodb_space -s ibdata1 -T sakila/film -I PRIMARY index-recurse

Parsing the primary key index:

$ innodb_space -s /usr/soft/mysql-5.6.31/data -T test/user_info -I PRIMARY index-recurse
ROOT NODE #3: 3 records, 89 bytes
  RECORD: (id=1) → (name="tom", age=18)
  RECORD: (id=2) → (name=:NULL, age=19)
  RECORD: (id=3) → (name="cat", age=20)

Parse the common index index_name:

$ innodb_space -s /usr/soft/mysql-5.6.31/data -T test/user_info -I index_name index-recurse
ROOT NODE #4: 3 records, 38 bytes
  RECORD: (name=:NULL) → (id=2)
  RECORD: (name="cat") → (id=3)
  RECORD: (name="tom") → (id=1)

By analyzing the index structure of mysql, we can find that null values ​​are also stored in the index tree, and null values ​​are processed into the smallest value and placed on the left side of the index_name index tree.

Binary Files

Find the physical file user_info.ibd corresponding to the user_info table, open it with software such as UltraEdit, and directly locate the 5th data page (MySQL defaults to a data page of 16KB).

insert image description here

As shown in the figure, these binary data are the index page data corresponding to the index_name index. Only the index records are selected and expanded as follows:

Minimum record 0x00010063

01 B2 01 00 02 00 29 Record header information 69 6E 66 69 6D 75 6D Minimum record (fixed value infimum)

Maximum record 0x00010070

00 04 00 0B 00 00 Record header information 73 75 70 72 65 6D 75 6D Maximum record (fixed value supremum)

ID 1 index 0x0001007f

03 00 00 00 10 FF F1 Record header information 74 6F 6D The value of the field name: tom
80 00 00 01 RowID: The value of the primary key id is 1

ID 2 Index 0x0001008c

01 00 00 18 00 0B Record header information Field name value: null
80 00 00 02 RowID: The value of the primary key id is 2

ID 3 Index 0x00010097

03 00 00 00 20 FF E8 Record header information 63 61 74 Field name value: cat
80 00 00 03 RowID: The value of the primary key id is 3

The last 2 bytes of the record header information of the minimum record are 00 29 -> 0x00010063 offset 0x0029 -> 0x0001008C, which is the index position of ID 2;

The last 2 bytes of the record header information of ID 2 are 00 0B -> 0x0001008C offset 0x000B -> 0x00010097, which is the index position of ID 3;

The last 2 bytes of the record header information of ID 3 are FF E8 -> 0x00010097 offset 0xFFE8 -> 0x0001007F, which is the index position of ID 1;

The last 2 bytes of the record header information of ID 1 are FF F1 -> 0x0001007F, offset 0xFFF1 -> 0x00010070, the record position of the largest record;

It can be seen that the index records are connected in series through a one-way linked list and sorted by index values, and the null value is processed into the smallest value and placed at the beginning of the index linked list, which is the leftmost position of the index tree. The result is consistent with the result parsed by the innodb_ruby tool.

Misunderstanding reasons

Why do people misunderstand that is null, is not null, != and other judgment conditions will cause the index to fail and the whole table to be scanned?

The reason why the index fails and the full table scan is usually because too many tables are returned in one query. MySQL calculates that the time cost of using an index is higher than that of a full table scan, so MySQL would rather scan the full table than use an index. The time cost of using an index is higher than the critical value of a full table scan, which can be simply memorized as about 20%.

For a detailed analysis process, please refer to another blog post by the author: MySQL table return causes index failure.

That is, if the table return range caused by a query statement exceeds 20% of all records, the index will become invalid. The judgment conditions such as is null, is not null, and != often appear in the scenarios where the range of the returned table is large, and people may misunderstand that these judgment conditions cause the index to be invalid.

Recurring index failure

To reproduce index failure, you only need to return a table range that exceeds 20% of all records. Insert 1000 non-null records as follows.

delimiter //
CREATE PROCEDURE init_user_info() 
BEGIN 
	DECLARE indexNo INT;
	SET indexNo = 0;
	WHILE indexNo < 1000 DO
		START TRANSACTION; 
			insert into user_info(name,age) values ​​(concat(floor(rand()*1000000000)),floor(rand()*100));
			SET indexNo = indexNo + 1;
		COMMIT; 
	END WHILE;
END //
delimiter ;
call init_user_info();

At this time, there are a total of 1003 records in the user_info table, of which only one record has a name value of null. Then, only 1/1003 of the returned records caused by the is null judgment statement will not exceed the critical value, while 1002/1003 of the returned records caused by the is not null judgment statement will far exceed the critical value, resulting in index failure.

As can be seen from the following two figures, is null still uses the index normally, while is not null, as expected, prefers a full table scan rather than using the index due to the high table return rate.

insert image description here

insert image description here

Use MySQL's optimizer tracing (supported by MySQL 5.6) to analyze the SQL execution plan:

SET optimizer_trace="enabled=on";
explain select * from user_info where name is not null;
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

The execution plan output by optimizer tracing shows that for this query, the time cost of using a full table scan is 206.9, while the time cost of using an index is 1203.4, which is much higher than a full table scan. Therefore, MySQL finally chooses to scan the entire table, resulting in index failure.

{
    "rows_estimation": [
        {
            "table": "`user_info`",
            "range_analysis": {
                "table_scan": {
                    "rows": 1004, // Full table scan requires scanning 1004 records "cost": 206.9 // Full table scan requires a cost of 206.9
                },
                "potential_range_indices": [
                    {
                        "index": "PRIMARY",
                        "usable": false,
                        "cause": "not_applicable"
                    },
                    {
                        "index": "index_name",
                        "usable": true,
                        "key_parts": [
                            "name",
                            "id"
                        ]
                    }
                ],
                "setup_range_conditions": [],
                "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                },
                "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                        {
                            "index": "index_name",
                            "ranges": [
                                "NULL < name"
                            ],
                            "index_dives_for_eq_ranges": true,
                            "rowid_ordered": false,
                            "using_mrr": false,
                            "index_only": false,
                            "rows": 1002, // The index needs to scan 1002 records "cost": 1203.4, // The cost of the index is 1203.4
                            "chosen": false,
                            "cause": "cost"
                        }
                    ],
                    "analyzing_roworder_intersect": {
                        "usable": false,
                        "cause": "too_few_roworder_scans"
                    }
                }
            }
        }
    ]
}

This is the end of this article about the use of mysql IS NULL index case. For more relevant mysql IS NULL usage content, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MySQL process control IF(), IFNULL(), NULLIF(), ISNULL() functions
  • This article takes you to explore NULL in MySQL
  • MySQL series of experience summary and analysis tutorials on NUll values
  • MySql sharing of null function usage
  • Storing NULL values ​​on disk in MySQL

<<:  Detailed process record of nginx installation and configuration

>>:  Detailed steps for installing nodejs environment and path configuration in Linux

Recommend

SQL implements LeetCode (180. Continuous numbers)

[LeetCode] 180. Consecutive Numbers Write a SQL q...

Detailed explanation of setting resource cache in nginx

I have always wanted to learn about caching. Afte...

Summary of the use of Datetime and Timestamp in MySQL

Table of contents 1. How to represent the current...

Several reasons for not compressing HTML

The reason is simple: In HTML documents, multiple ...

CSS3 realizes the childhood paper airplane

Today we are going to make origami airplanes (the...

Analysis of the reasons why MySQL field definitions should not use null

Why is NULL so often used? (1) Java's null Nu...

How to start a Java program in docker

Create a simple Spring boot web project Use the i...

How to understand the difference between computed and watch in Vue

Table of contents Overview computed watch monitor...

JavaScript implements draggable progress bar

This article shares the specific code of JavaScri...

HTML tbody usage

Structured Table (IExplore Only) 1) Group by rows ...

Docker beginners' first exploration of common commands practice records

Before officially using Docker, let's first f...

In-depth explanation of binlog in MySQL 8.0

1 Introduction Binary log records SQL statements ...

How to declare a cursor in mysql

How to declare a cursor in mysql: 1. Declare vari...

What knowledge systems do web designers need?

Product designers face complex and large manufactu...