IntroductionWhen the MySQL InnoDB engine queries records and cannot use index coverage, it is necessary to perform a table return operation to obtain the required fields of the record. Before executing SQL, MySQL will perform SQL optimization, index selection and other operations. MySQL will estimate the query cost required for each index and the query cost required without using the index, and choose a method that MySQL thinks has the lowest cost to perform the SQL query operation. When the amount of data in the table is large, MySQL often estimates that the query cost of the table return operation is too high, resulting in incorrect index usage. CaseThe following example creates a test table in MySQL version 5.6 and a Linux environment with 1CPU and 2G memory, and creates nearly 2 million records for testing. CREATE TABLE `salary_static` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key', `school_id` int(11) NOT NULL COMMENT 'school id', `student_id` int(11) NOT NULL COMMENT 'Graduate id', `salary` int(11) NOT NULL DEFAULT '0' COMMENT 'Graduation salary', `year` int(11) NOT NULL COMMENT 'Graduation year', PRIMARY KEY (`id`), KEY `school_id_key` (`school_id`) USING BTREE, KEY `year_school_key` (`year`,`school_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Graduate salary statistics'; delimiter // CREATE PROCEDURE init_salary_static() BEGIN DECLARE year INT; DECLARE schid INT; DECLARE stuid INT; SET year = 2000; WHILE year < 2020 DO START TRANSACTION; SET schid = 1; WHILE schid < 100 DO SET stuid = 1; WHILE stuid < 1000 DO insert into salary_static(school_id,student_id,salary,year) values (schid,stuid,floor(rand()*10000),year); SET stuid = stuid + 1; END WHILE; SET schid = schid + 1; END WHILE; SET year = year + 1; COMMIT; END WHILE; END // delimiter ; call init_salary_static(); After the test data is created, execute the following SQL statement for statistical query. select school_id,avg(salary) from salary_static where year between 2016 and 2019 group by school_id; It is expected that the SQL should use the year_school_key index for querying. However, through the explain command, it can be found that the SQL uses the school_id_key index. And because the wrong index is used, the SQL performs a full table scan, resulting in a query time of 7 seconds. After forcing the use of the year_school_key index for querying, it was found that the query time of the SQL was sharply reduced to 0.6 seconds, which is 10 times shorter than the time of the school_id_key index. select school_id,avg(salary) from salary_static force index(year_school_key) where year between 2015 and 2019 group by school_id; analyzeUse MySQL's optimizer tracing (supported by MySQL 5.6) to analyze the SQL execution plan: SET optimizer_trace="enabled=on"; select school_id,avg(salary) from salary_static where year between 2016 and 2019 group by school_id; SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE; The output result is a JSON file, which shows the execution plan of the SQL optimization process and index selection process in MySQL. Focus on the content under range_analysis in the json of the execution plan, which shows the index selection during the where range query process. table_scan indicates a full table scan, which estimates that 1,973,546 records need to be scanned. However, since a full table scan uses a clustered index and is a sequential IO read, the query cost of each record is very small, and the final calculated query cost is 399,741. range_scan_alternatives indicates a range query using an index. The year_school_key index estimates that 812,174 records need to be scanned. However, random IO reads are required due to the need to return to the table. The final calculated query cost is 974,610. Therefore, for the where query process, the full table scan is finally chosen instead of the index. "range_analysis": { "table_scan": { "rows": 1973546, "cost": 399741 }, "potential_range_indices": [ { "index": "PRIMARY", "usable": false, "cause": "not_applicable" }, { "index": "school_id_key", "usable": true, "key_parts": [ "school_id", "id" ] }, { "index": "year_school_key", "usable": true, "key_parts": [ "year", "school_id", "id" ] } ], "setup_range_conditions": [ ], "group_index_range": { "chosen": false, "cause": "not_applicable_aggregate_function" }, "analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "year_school_key", "ranges": [ "2016 <= year <= 2019" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 812174, "cost": 974610, "chosen": false, "cause": "cost" } ], "analyzing_roworder_intersect": { "usable": false, "cause": "too_few_roworder_scans" } } } The query cost value here can be calculated manually, cost = I/O cost (one cost for each read of the record page, each cost is 1.0) + CPU cost (one cost for each record, each cost is 0.2). Full table scan query costWhen table_scan is used for full table scanning, it is estimated that 1973546 records need to be scanned. The show table status like "salary_static" command shows that the total number of records in the table is 82411520 bytes (Data_length). Each record page in innodb is 16KB, which means that a full table scan needs to read 82411520/1024/16 = 5030 record pages.
5030 * 1.0 = 5030
1973546 * 0.2 = 394709.2
5030 + 394709.2 = 399739.2 Index query costIt is estimated that 812,174 records need to be scanned when indexing year_school_key. To use this index, you need to first query the rowId through the index and then return to the table through the rowId. MySQL considers that each table return requires a separate I/O cost
812174 * 0.2 = 162434.8
812174 * 1.0 = 812174
162434.8 + 812174 = 974608.8 Next, pay attention to reconsidering_access_paths_for_index_ordering, which means that the sorting will be optimized again in the end. The school_id_key index is selected here and vetoes the full table scan selected in the above where condition: "plan_changed": true. For details, see group-by-optimization. { "reconsidering_access_paths_for_index_ordering": { "clause": "GROUP BY", "index_order_summary": { "table": "`salary_static`", "index_provides_order": true, "order_direction": "asc", "index": "school_id_key", "plan_changed": true, "access_type": "index_scan" } } } In fact, there is also a bug in sort index optimization, see Bug # 93845 for details. optimizationBy analyzing the SQL execution process, we can find that the wrong index is selected because there are too many records in the year_school_key index back table, which causes the estimated query cost to be greater than the full table scan and ultimately selects the wrong index. Therefore, to reduce the execution time of the SQL, the next optimization solution is to reduce the table return operation of the SQL, that is, to make the SQL perform index coverage. The SQL statement involves only three fields: school_id, salary, and year. Therefore, a joint index of these three indexes is created, and attention is paid to the order of these three fields in the joint index: the where filter statement is executed first, so the year field is in the first place in the joint index; the group by statement is essentially the same as the order by statement, so it is placed after the where statement, that is, in the second place in the joint index; salary is placed at the end of the joint index only to reduce table returns. CREATE INDEX year_school_salary_key ON salary_static (year, school_id, salary); After creating the joint index, the results after executing the SQL statement are as follows. It only took 0.2 seconds to complete the query, which is 35 times less than the time of the school_id_key index. Calculation of return rateThe above problem is that there are too many SQL queries at one time, resulting in too high a cost for returning to the table. In fact, the critical value of the above phenomenon can be calculated: Assume that the size of a row of records is a bytes, the number of records in the table is b, and the critical number of records is c, then the number of record pages in the table is b*a/1024/16 Query cost of full table scan = I/O cost + CPU cost = b*a/1024/16 * 1.0 + b * 0.2 Query cost of index scan = I/O cost + CPU cost = c * 1.0 + c * 0.2 = c * 1.2 b*a/1024/16 * 1.0 + b * 0.2 = c * 1.2 Critical ratio = c/b = (a/1024/16 + 0.2)/1.2 = a * 5E-5 + 0.1667 That is, when a SQL query exceeds approximately 17% of the records in the table and a covering index cannot be used, the cost of returning the index to the table is too high and a full table scan is chosen. And this ratio increases slightly as the byte size of a single row increases. This is the end of this article about the case of index invalidation caused by MySQL table return. For more relevant content about index invalidation caused by MySQL table return, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Several ways to solve CSS style conflicts (summary)
>>: JavaScript history object explained
What is a memory leak? A memory leak means that a...
To execute a shell command in Docker, you need to...
Table of contents Explanation of v-text on if for...
Use auto.js to automate daily check-in Due to the...
This article example shares the specific code of ...
MySQL (5.6 and below) parses json #json parsing f...
The party that creates a new connection is equiva...
I have encountered the Nginx 502 Bad Gateway erro...
In this blog, I will walk you through the process...
Table of contents 1. Create HTML structure 2. Cre...
The command pattern is a behavioral design patter...
Preface Everyone knows how to run a jar package o...
cause I once set up WordPress on Vultr, but for w...
1.watch listener Introducing watch import { ref, ...
Bugs As shown in the figure, I started to copy th...