IntroductionThe use of is null, is not null, and != in MySQL's SQL query statements has no effect on the index, and the index will not be invalidated and the full table will not be scanned due to the use of is null, is not null, and != in the where condition. The official MySQL documentation has also clearly stated that is null does not affect the use of the index.
In fact, the reason why the index fails and the full table scan is usually because there are too many tables returned in one query. MySQL calculates that the time cost of using an index is higher than that of a full table scan, so MySQL would rather scan the full table than use an index. CaseCREATE TABLE `user_info` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(11) DEFAULT NULL, `age` int(4) DEFAULT NULL, PRIMARY KEY (`id`), KEY `index_name` (`name`) USING BTREE )ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('1', 'tom', '18'); INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('2', null, '19'); INSERT INTO `user_info` (`id`, `name`, `age`) VALUES ('3', 'cat', '20'); When executing SQL queries, I used is null and is not null, and found that the index query was still used, and there was no index invalidation problem. analyzeTo analyze the above phenomenon, you need to understand the working principle of MySQL index and index data structure in detail. Next, we analyze the MySQL index data structure by using two methods: tool parsing and directly viewing the binary file. Tool Analysisinnodb_ruby is a very powerful MySQL analysis tool that can be used to easily parse MySQL's .ibd files and gain a deeper understanding of MySQL's data structure. First install the innodb_ruby tool: yum install -y rubygems ruby-deve gem install innodb_ruby innodb_ruby has many functions. Here we only need to use it to parse the index structure of MySQL, so only the following command is needed. For more functions and commands, see the wiki. innodb_space -s ibdata1 -T sakila/film -I PRIMARY index-recurse Parsing the primary key index: $ innodb_space -s /usr/soft/mysql-5.6.31/data -T test/user_info -I PRIMARY index-recurse ROOT NODE #3: 3 records, 89 bytes RECORD: (id=1) → (name="tom", age=18) RECORD: (id=2) → (name=:NULL, age=19) RECORD: (id=3) → (name="cat", age=20) Parse the common index index_name: $ innodb_space -s /usr/soft/mysql-5.6.31/data -T test/user_info -I index_name index-recurse ROOT NODE #4: 3 records, 38 bytes RECORD: (name=:NULL) → (id=2) RECORD: (name="cat") → (id=3) RECORD: (name="tom") → (id=1) By analyzing the index structure of mysql, we can find that null values are also stored in the index tree, and null values are processed into the smallest value and placed on the left side of the index_name index tree. Binary FilesFind the physical file user_info.ibd corresponding to the user_info table, open it with software such as UltraEdit, and directly locate the 5th data page (MySQL defaults to a data page of 16KB). As shown in the figure, these binary data are the index page data corresponding to the index_name index. Only the index records are selected and expanded as follows: Minimum record 0x00010063 01 B2 01 00 02 00 29 Record header information 69 6E 66 69 6D 75 6D Minimum record (fixed value infimum) Maximum record 0x00010070 00 04 00 0B 00 00 Record header information 73 75 70 72 65 6D 75 6D Maximum record (fixed value supremum) ID 1 index 0x0001007f 03 00 00 00 10 FF F1 Record header information 74 6F 6D The value of the field name: tom 80 00 00 01 RowID: The value of the primary key id is 1 ID 2 Index 0x0001008c 01 00 00 18 00 0B Record header information Field name value: null 80 00 00 02 RowID: The value of the primary key id is 2 ID 3 Index 0x00010097 03 00 00 00 20 FF E8 Record header information 63 61 74 Field name value: cat 80 00 00 03 RowID: The value of the primary key id is 3 The last 2 bytes of the record header information of the minimum record are 00 29 -> 0x00010063 offset 0x0029 -> 0x0001008C, which is the index position of ID 2; The last 2 bytes of the record header information of ID 2 are 00 0B -> 0x0001008C offset 0x000B -> 0x00010097, which is the index position of ID 3; The last 2 bytes of the record header information of ID 3 are FF E8 -> 0x00010097 offset 0xFFE8 -> 0x0001007F, which is the index position of ID 1; The last 2 bytes of the record header information of ID 1 are FF F1 -> 0x0001007F, offset 0xFFF1 -> 0x00010070, the record position of the largest record; It can be seen that the index records are connected in series through a one-way linked list and sorted by index values, and the null value is processed into the smallest value and placed at the beginning of the index linked list, which is the leftmost position of the index tree. The result is consistent with the result parsed by the innodb_ruby tool. Misunderstanding reasonsWhy do people misunderstand that is null, is not null, != and other judgment conditions will cause the index to fail and the whole table to be scanned? The reason why the index fails and the full table scan is usually because too many tables are returned in one query. MySQL calculates that the time cost of using an index is higher than that of a full table scan, so MySQL would rather scan the full table than use an index. The time cost of using an index is higher than the critical value of a full table scan, which can be simply memorized as about 20%. For a detailed analysis process, please refer to another blog post by the author: MySQL table return causes index failure. That is, if the table return range caused by a query statement exceeds 20% of all records, the index will become invalid. The judgment conditions such as is null, is not null, and != often appear in the scenarios where the range of the returned table is large, and people may misunderstand that these judgment conditions cause the index to be invalid. Recurring index failureTo reproduce index failure, you only need to return a table range that exceeds 20% of all records. Insert 1000 non-null records as follows. delimiter // CREATE PROCEDURE init_user_info() BEGIN DECLARE indexNo INT; SET indexNo = 0; WHILE indexNo < 1000 DO START TRANSACTION; insert into user_info(name,age) values (concat(floor(rand()*1000000000)),floor(rand()*100)); SET indexNo = indexNo + 1; COMMIT; END WHILE; END // delimiter ; call init_user_info(); At this time, there are a total of 1003 records in the user_info table, of which only one record has a name value of null. Then, only 1/1003 of the returned records caused by the is null judgment statement will not exceed the critical value, while 1002/1003 of the returned records caused by the is not null judgment statement will far exceed the critical value, resulting in index failure. As can be seen from the following two figures, is null still uses the index normally, while is not null, as expected, prefers a full table scan rather than using the index due to the high table return rate. Use MySQL's optimizer tracing (supported by MySQL 5.6) to analyze the SQL execution plan: SET optimizer_trace="enabled=on"; explain select * from user_info where name is not null; SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE; The execution plan output by optimizer tracing shows that for this query, the time cost of using a full table scan is 206.9, while the time cost of using an index is 1203.4, which is much higher than a full table scan. Therefore, MySQL finally chooses to scan the entire table, resulting in index failure. { "rows_estimation": [ { "table": "`user_info`", "range_analysis": { "table_scan": { "rows": 1004, // Full table scan requires scanning 1004 records "cost": 206.9 // Full table scan requires a cost of 206.9 }, "potential_range_indices": [ { "index": "PRIMARY", "usable": false, "cause": "not_applicable" }, { "index": "index_name", "usable": true, "key_parts": [ "name", "id" ] } ], "setup_range_conditions": [], "group_index_range": { "chosen": false, "cause": "not_group_by_or_distinct" }, "analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "index_name", "ranges": [ "NULL < name" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 1002, // The index needs to scan 1002 records "cost": 1203.4, // The cost of the index is 1203.4 "chosen": false, "cause": "cost" } ], "analyzing_roworder_intersect": { "usable": false, "cause": "too_few_roworder_scans" } } } } ] } This is the end of this article about the use of mysql IS NULL index case. For more relevant mysql IS NULL usage content, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Detailed process record of nginx installation and configuration
>>: Detailed steps for installing nodejs environment and path configuration in Linux
[LeetCode] 180. Consecutive Numbers Write a SQL q...
I have always wanted to learn about caching. Afte...
Table of contents 1. How to represent the current...
The reason is simple: In HTML documents, multiple ...
Today we are going to make origami airplanes (the...
Why is NULL so often used? (1) Java's null Nu...
Create a simple Spring boot web project Use the i...
Table of contents Overview computed watch monitor...
This article shares the specific code of JavaScri...
Structured Table (IExplore Only) 1) Group by rows ...
When using docker-compose for deployment, the out...
Before officially using Docker, let's first f...
1 Introduction Binary log records SQL statements ...
How to declare a cursor in mysql: 1. Declare vari...
Product designers face complex and large manufactu...