MySQL table return causes index invalidation case explanation

MySQL table return causes index invalidation case explanation

Introduction

When the MySQL InnoDB engine queries records and cannot use index coverage, it is necessary to perform a table return operation to obtain the required fields of the record.

Before executing SQL, MySQL will perform SQL optimization, index selection and other operations. MySQL will estimate the query cost required for each index and the query cost required without using the index, and choose a method that MySQL thinks has the lowest cost to perform the SQL query operation. When the amount of data in the table is large, MySQL often estimates that the query cost of the table return operation is too high, resulting in incorrect index usage.

Case

The following example creates a test table in MySQL version 5.6 and a Linux environment with 1CPU and 2G memory, and creates nearly 2 million records for testing.

CREATE TABLE `salary_static` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
  `school_id` int(11) NOT NULL COMMENT 'school id',
  `student_id` int(11) NOT NULL COMMENT 'Graduate id',
  `salary` int(11) NOT NULL DEFAULT '0' COMMENT 'Graduation salary',
  `year` int(11) NOT NULL COMMENT 'Graduation year',
  PRIMARY KEY (`id`),
  KEY `school_id_key` (`school_id`) USING BTREE,
  KEY `year_school_key` (`year`,`school_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Graduate salary statistics';
delimiter //
CREATE PROCEDURE init_salary_static() 
BEGIN 
	DECLARE year INT;
	DECLARE schid INT;
	DECLARE stuid INT;
	SET year = 2000;
	WHILE year < 2020 DO
		START TRANSACTION; 
		SET schid = 1;
		WHILE schid < 100 DO
			SET stuid = 1;
			WHILE stuid < 1000 DO
				insert into salary_static(school_id,student_id,salary,year) values ​​(schid,stuid,floor(rand()*10000),year);
				SET stuid = stuid + 1;
			END WHILE;
			SET schid = schid + 1;
		END WHILE;
		SET year = year + 1;
		COMMIT; 
	END WHILE;
END //
delimiter ;
call init_salary_static();

After the test data is created, execute the following SQL statement for statistical query.

select school_id,avg(salary) from salary_static where year between 2016 and 2019 group by school_id;

It is expected that the SQL should use the year_school_key index for querying. However, through the explain command, it can be found that the SQL uses the school_id_key index. And because the wrong index is used, the SQL performs a full table scan, resulting in a query time of 7 seconds.

insert image description here

insert image description here

After forcing the use of the year_school_key index for querying, it was found that the query time of the SQL was sharply reduced to 0.6 seconds, which is 10 times shorter than the time of the school_id_key index.

select school_id,avg(salary) from salary_static force index(year_school_key) where year between 2015 and 2019 group by school_id;

insert image description here

insert image description here

analyze

Use MySQL's optimizer tracing (supported by MySQL 5.6) to analyze the SQL execution plan:

SET optimizer_trace="enabled=on";
select school_id,avg(salary) from salary_static where year between 2016 and 2019 group by school_id;
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

The output result is a JSON file, which shows the execution plan of the SQL optimization process and index selection process in MySQL.

Focus on the content under range_analysis in the json of the execution plan, which shows the index selection during the where range query process. table_scan indicates a full table scan, which estimates that 1,973,546 records need to be scanned. However, since a full table scan uses a clustered index and is a sequential IO read, the query cost of each record is very small, and the final calculated query cost is 399,741. range_scan_alternatives indicates a range query using an index. The year_school_key index estimates that 812,174 records need to be scanned. However, random IO reads are required due to the need to return to the table. The final calculated query cost is 974,610. Therefore, for the where query process, the full table scan is finally chosen instead of the index.

"range_analysis": {
  "table_scan": {
	"rows": 1973546,
	"cost": 399741
  },
  "potential_range_indices": [
	{
	  "index": "PRIMARY",
	  "usable": false,
	  "cause": "not_applicable"
	},
	{
	  "index": "school_id_key",
	  "usable": true,
	  "key_parts": [
		"school_id",
		"id"
	  ]
	},
	{
	  "index": "year_school_key",
	  "usable": true,
	  "key_parts": [
		"year",
		"school_id",
		"id"
	  ]
	}
  ],
  "setup_range_conditions": [
  ],
  "group_index_range": {
	"chosen": false,
	"cause": "not_applicable_aggregate_function"
  },
  "analyzing_range_alternatives": {
	"range_scan_alternatives": [
	  {
		"index": "year_school_key",
		"ranges": [
		  "2016 <= year <= 2019"
		],
		"index_dives_for_eq_ranges": true,
		"rowid_ordered": false,
		"using_mrr": false,
		"index_only": false,
		"rows": 812174,
		"cost": 974610,
		"chosen": false,
		"cause": "cost"
	  }
	],
	"analyzing_roworder_intersect": {
	  "usable": false,
	  "cause": "too_few_roworder_scans"
	}
  }
}

The query cost value here can be calculated manually, cost = I/O cost (one cost for each read of the record page, each cost is 1.0) + CPU cost (one cost for each record, each cost is 0.2).

Full table scan query cost

When table_scan is used for full table scanning, it is estimated that 1973546 records need to be scanned. The show table status like "salary_static" command shows that the total number of records in the table is 82411520 bytes (Data_length). Each record page in innodb is 16KB, which means that a full table scan needs to read 82411520/1024/16 = 5030 record pages.

  • I/O Cost
5030 * 1.0 = 5030
  • CPU Cost
1973546 * 0.2 = 394709.2
  • Total query cost
5030 + 394709.2 = 399739.2

Index query cost

It is estimated that 812,174 records need to be scanned when indexing year_school_key. To use this index, you need to first query the rowId through the index and then return to the table through the rowId. MySQL considers that each table return requires a separate I/O cost

  • CPU Cost
812174 * 0.2 = 162434.8
  • I/O Cost
812174 * 1.0 = 812174
  • Total query cost
162434.8 + 812174 = 974608.8

Next, pay attention to reconsidering_access_paths_for_index_ordering, which means that the sorting will be optimized again in the end. The school_id_key index is selected here and vetoes the full table scan selected in the above where condition: "plan_changed": true. For details, see group-by-optimization.

{
    "reconsidering_access_paths_for_index_ordering": {
      "clause": "GROUP BY",
      "index_order_summary": {
        "table": "`salary_static`",
        "index_provides_order": true,
        "order_direction": "asc",
        "index": "school_id_key",
        "plan_changed": true,
        "access_type": "index_scan"
      }
    }
}

In fact, there is also a bug in sort index optimization, see Bug # 93845 for details.

optimization

By analyzing the SQL execution process, we can find that the wrong index is selected because there are too many records in the year_school_key index back table, which causes the estimated query cost to be greater than the full table scan and ultimately selects the wrong index.

Therefore, to reduce the execution time of the SQL, the next optimization solution is to reduce the table return operation of the SQL, that is, to make the SQL perform index coverage. The SQL statement involves only three fields: school_id, salary, and year. Therefore, a joint index of these three indexes is created, and attention is paid to the order of these three fields in the joint index: the where filter statement is executed first, so the year field is in the first place in the joint index; the group by statement is essentially the same as the order by statement, so it is placed after the where statement, that is, in the second place in the joint index; salary is placed at the end of the joint index only to reduce table returns.

CREATE INDEX year_school_salary_key ON salary_static (year, school_id, salary);

After creating the joint index, the results after executing the SQL statement are as follows. It only took 0.2 seconds to complete the query, which is 35 times less than the time of the school_id_key index.

insert image description here

insert image description here

Calculation of return rate

The above problem is that there are too many SQL queries at one time, resulting in too high a cost for returning to the table. In fact, the critical value of the above phenomenon can be calculated:

Assume that the size of a row of records is a bytes, the number of records in the table is b, and the critical number of records is c, then the number of record pages in the table is b*a/1024/16

Query cost of full table scan = I/O cost + CPU cost = b*a/1024/16 * 1.0 + b * 0.2


Query cost of index scan = I/O cost + CPU cost = c * 1.0 + c * 0.2 = c * 1.2


b*a/1024/16 * 1.0 + b * 0.2 = c * 1.2
Critical ratio = c/b 
= (a/1024/16 + 0.2)/1.2
= a * 5E-5 + 0.1667

That is, when a SQL query exceeds approximately 17% of the records in the table and a covering index cannot be used, the cost of returning the index to the table is too high and a full table scan is chosen. And this ratio increases slightly as the byte size of a single row increases.

This is the end of this article about the case of index invalidation caused by MySQL table return. For more relevant content about index invalidation caused by MySQL table return, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Detailed explanation of table return and index coverage examples in MySQL
  • MySQL uses covering index to avoid table return and optimize query
  • How to use MySQL covering index and table return
  • How much does MySQL table return performance damage

<<:  Several ways to solve CSS style conflicts (summary)

>>:  JavaScript history object explained

Recommend

Vue: Detailed explanation of memory leaks

What is a memory leak? A memory leak means that a...

How to execute Linux shell commands in Docker

To execute a shell command in Docker, you need to...

Parsing the commonly used v-instructions in vue.js

Table of contents Explanation of v-text on if for...

Use auto.js to realize the automatic daily check-in function

Use auto.js to automate daily check-in Due to the...

Implementing a simple calculator with javascript

This article example shares the specific code of ...

Detailed example of MySQL (5.6 and below) parsing JSON

MySQL (5.6 and below) parses json #json parsing f...

How to remotely connect to MySQL database with Navicat Premium

The party that creates a new connection is equiva...

Nginx 502 Bad Gateway Error Causes and Solutions

I have encountered the Nginx 502 Bad Gateway erro...

The process of installing Docker in Linux system

In this blog, I will walk you through the process...

JavaScript exquisite snake implementation process

Table of contents 1. Create HTML structure 2. Cre...

JavaScript Design Pattern Command Pattern

The command pattern is a behavioral design patter...

The best way to start a jar package project under Centos7 server

Preface Everyone knows how to run a jar package o...

How to use Docker Swarm to build WordPress

cause I once set up WordPress on Vultr, but for w...

The use and difference between vue3 watch and watchEffect

1.watch listener Introducing watch import { ref, ...

Mybatis mysql delete in operation can only delete the first data method

Bugs As shown in the figure, I started to copy th...