Reasons why MySQL 8.0 statistics are inaccurate

Reasons why MySQL 8.0 statistics are inaccurate

Preface

Whether it is Oracle or MySQL, the new features introduced in the new version, on the one hand, bring improvements to the product's functionality, performance, user experience, etc., but on the other hand, may also bring some problems, such as code bugs, problems caused by incorrect customer usage, etc.

Case Study

MySQL 5.7 scenario

(1) First, create two tables and insert data

mysql> select version();
+------------+
| version() |
+------------+
| 5.7.30-log |
+------------+
1 row in set (0.00 sec)

mysql> show create table test\G
*************************** 1. row ***************************
    Table: test
Create Table: CREATE TABLE `test` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `k` int(10) unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=101 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> show create table sbtest1\G
*************************** 1. row ***************************
    Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `k` int(10) unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.14 sec)

(2) Check the statistical information of the two tables, both are relatively accurate

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| table_schema | table_name | table_rows |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| table_schema | table_name | table_rows |
+--------------+------------+------------+
| test | sbtest1 | 947263 |
+--------------+------------+------------+
1 row in set (0.00 sec)

(3) We continue to insert 10 million records into the test table and check the statistics again. The statistics are still relatively accurate because by default, if the data change exceeds 10%, the statistics will be updated.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 10000100 |
+----------+
1 row in set (1.50 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| table_schema | table_name | table_rows |
+--------------+------------+------------+
| test | test | 9749036 |
+--------------+------------+------------+
1 row in set (0.00 sec)

MySQL 8.0 scenario

(1) Next, let’s look at the situation under 8.0. Similarly, we create two tables and insert the same records.

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.20 |
+-----------+
1 row in set (0.00 sec)

mysql> show create table test\G
*************************** 1. row ***************************
    Table: test
Create Table: CREATE TABLE `test` (
 `id` int unsigned NOT NULL AUTO_INCREMENT,
 `k` int unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=101 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> show create table sbtest1\G
*************************** 1. row ***************************
    Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
 `id` int unsigned NOT NULL AUTO_INCREMENT,
 `k` int unsigned NOT NULL DEFAULT '0',
 `c` char(120) NOT NULL DEFAULT '',
 `pad` char(60) NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci MAX_ROWS=1000000
1 row in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.02 sec)

(2) Check the statistical information of the two tables, both are relatively accurate

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | sbtest1 | 947468 |
+--------------+------------+------------+
1 row in set (0.01 sec)

(3) Similarly, we continue to insert 10 million records into the test table and check the statistics again. We find that table_rows still shows 100 records, which is a large deviation.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 10000100 |
+----------+
1 row in set (0.33 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

Cause Analysis

So what causes inaccurate statistics? In fact, MySQL 8.0 caches the statistical information in the view tables and statistics in order to improve the query efficiency of information_schema. The cache expiration time is determined by the parameter information_schema_stats_expiry, which defaults to 86400s. If you want to obtain the latest statistical information, you can use the following two methods:

(1) analyze table

(2) Set information_schema_stats_expiry=0

Keep exploring

So what are the consequences of inaccurate statistical information? Will it affect the execution plan? Next, we test it again

Test 1: The number of records in table test is 100, and the number of records in table sbtest1 is 1 million.

Execute the following SQL and check the execution plan. It uses NLJ. The small table test is used as the driving table (full table scan), and the large table sbtest1 is used as the driven table (primary key association). The execution efficiency is very fast.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.02 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | sbtest1 | 947468 |
+--------------+------------+------------+
1 row in set (0.01 sec)

mysql> select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977';
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 1 | 501885 | 08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977 | 63188288836-92351140030-06390587585-66802097351-49282961843 |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-4664
+----+-------------+-------+------------+--------+---------------+--------+---------+-----------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+--------+---------+-----------+----------+----------+-------------+
| 1 | SIMPLE | t | NULL | ALL | PRIMARY | NULL | NULL | NULL | 100 | 10.00 | Using where |
| 1 | SIMPLE | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.t.id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+--------+---------+-----------+----------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

Test 2: Table test has about 10 million records, and table sbtest1 has 1 million records

Execute SQL again and check the execution plan. It also follows NLJ. The small table sbtest1 is used as the driving table and the large table test is used as the driven table. This is also the correct execution plan.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 10000100 |
+----------+
1 row in set (0.33 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.02 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='test';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | test | 100 |
+--------------+------------+------------+
1 row in set (0.00 sec)

mysql> select table_schema,table_name,table_rows from tables where table_name='sbtest1';
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ROWS |
+--------------+------------+------------+
| test | sbtest1 | 947468 |
+--------------+------------+------------+
1 row in set (0.01 sec)

mysql> select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977';
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 1 | 501885 | 08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977 | 63188288836-92351140030-06390587585-66802097351-49282961843 |
+----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.37 sec)

mysql> explain select t.* from test t inner join sbtest1 t1 on t.id=t1.id where tc='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977' and t1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977';
+----+-------------+-------+------------+--------+---------------+--------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+--------+---------+------------+--------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 947468 | 10.00 | Using where |
| 1 | SIMPLE | t | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.t1.id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+--------+---------+------------+--------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

Why didn't the optimizer choose the wrong execution plan? As mentioned in the previous article, MySQL 8.0 stores metadata information in the data dictionary table under the mysql library. The information_schema library only provides relatively convenient views for users to query. Therefore, when the optimizer selects an execution plan, it will obtain statistical information from the data dictionary table and generate a correct execution plan.

Summarize

To improve the query efficiency of information_schema, MySQL 8.0 caches the statistical information in the views tables and statistics. The cache expiration time is determined by the parameter information_schema_stats_expiry (it is recommended to set the parameter value to 0). This may cause users to be unable to obtain the latest and accurate statistical information when querying the corresponding views, but it does not affect the selection of the execution plan.

The above are the details of the reasons why MySQL 8.0 statistics are inaccurate. For more information about inaccurate MySQL 8.0 statistics, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Gearman + MySQL to achieve persistence operation example
  • Detailed explanation of deploying MySQL using Docker (data persistence)
  • Detailed explanation of Java emoji persistence in MySQL
  • MySQL 8 new features: how to modify persistent global variables
  • MySQL 8 new features: detailed explanation of persistence of auto-increment primary key
  • Overview of MySQL Statistics
  • Detailed explanation of MySQL persistent statistics

<<:  Javascript basics about built-in objects

>>:  5 Commands to Use the Calculator in Linux Command Line

Recommend

Vue basic instructions example graphic explanation

Table of contents 1. v-on directive 1. Basic usag...

30 free high-quality English ribbon fonts

30 free high-quality English ribbon fonts for down...

Vue Virtual DOM Quick Start

Table of contents Virtual DOM What is virtual dom...

How to use Linux commands in IDEA

Compared with Windows system, Linux system provid...

Vue implements the right slide-out layer animation

This article example shares the specific code of ...

Solution to ERROR 1054 (42S22) when changing password in MySQL 5.7

I have newly installed MySQL 5.7. When I log in, ...

Implementation of Docker private library

Installing and deploying a private Docker Registr...

How to install mysql via yum on centos7

1. Check whether MySQL is installed yum list inst...

Ubuntu 16.04 64-bit compatible with 32-bit programs in three steps

Step 1: Confirm the architecture of your system d...

Vue+echart realizes double column chart

This article shares the specific code of vue+echa...

When MySQL is upgraded to 5.7, WordPress reports error 1067 when importing data

I recently upgraded MySQL to 5.7, and WordPress r...

11 ways to remove duplicates from js arrays

In actual work or interviews, we often encounter ...