Detailed explanation of count without filter conditions in MySQL

Detailed explanation of count without filter conditions in MySQL

count(*)

accomplish

1. MyISAM: Stores the total number of rows in the table on disk, and can directly return the number of rows for queries without filtering conditions.

If there is a count(*) filter condition, MyISAM cannot return quickly.

2. InnoDB: Read data row by row from the storage engine and then accumulate the count

Due to MVCC, it is uncertain how many rows InnoDB should return at the same time.

Example

Assume that table t has 10,000 records

session A session B session C
BEGIN;
SELECT COUNT(*) FROM t; (returns 10000)
INSERT INTO t; (Insert a row)
BEGIN;
INSERT INTO t (insert a row);
SELECT COUNT(*) FROM t; (returns 10000) SELECT COUNT(*) FROM t; (returns 10002) SELECT COUNT(*) FROM T; (returns 10001)

At the last moment, three sessions query the total number of rows of t at the same time, but the results are different

The default transaction isolation level of InnoDB is RR, which is implemented through MVCC

  • Each transaction needs to determine whether each row of records is visible to itself

optimization

1. InnoDB is an index-organized table

  • Clustered index tree: leaf nodes are data
  • Secondary index tree: leaf nodes are primary key values

2. The space occupied by the secondary index tree is much smaller than that of the clustered index tree

3. The optimizer will traverse the smallest index tree to minimize the amount of data scanned while ensuring the correct logic.

  • For count operations without filtering conditions, the effect is the same no matter which index tree is traversed
  • The optimizer will choose the best index tree for count(*)

show table status

mysql> SHOW TABLE STATUS\G;
*************************** 1. row ***************************
 Name: t
 Engine: InnoDB
 Version: 10
 Row_format: Dynamic
 Rows: 100256
 Avg_row_length: 47
 Data_length: 4734976
Max_data_length: 0
 Index_length: 5275648
 Data_free: 0
 Auto_increment: NULL
 Create_time: 2019-02-01 17:49:07
 Update_time: NULL
 Check_time: NULL
 Collation: utf8_general_ci
 Checksum: NULL
 Create_options:
 Comment:

SHOW TABLE STATUS also estimates through sampling (very imprecise), with an error of 40%~50%

Maintenance count

cache

plan

  • Use Redis to save the total number of rows in the table (no filtering conditions)
  • Each time a row is inserted into this table, Redis counts +1, and each time a row is deleted, Redis counts -1

shortcoming

Lost Updates

1. Redis may lose updates

2. Solution: After Redis restarts abnormally, execute count(*) once in the database

  • Abnormal restarts are uncommon, so the cost of a full table scan is acceptable

Imprecise logic – fatal

1. Scenario: Display the total number of operation records and the 100 most recent operation records

2. Redis and MySQL are two different storage systems and do not support distributed transactions, so it is impossible to get an accurate and consistent view.

Timing A

At time T3, session B finds the 100 rows that contain the most recently inserted record, but Redis has not yet added +1, which is a logical inconsistency.

time session A session B
T1
T2 Insert a row of data R;
T3 Read Redis count;
Query the latest 100 records;
T4 Redis count + 1;

Timing B

At time T3, session B finds that the 100 rows do not contain the most recently inserted record, but Redis has added 1, which is a logical inconsistency.

time session A session B
T1
T2 Redis count + 1;
T3 Read Redis count;
Query the latest 100 records;
T4 Insert a row of data R;

database

  • Put the count value into a separate count table C in the database
  • Using the crash-safe feature of InnoDB, the problem of crash loss is solved
  • Using InnoDB's transaction support feature, the problem of consistent view is solved
  • At time T3, session B's transaction of session A has not been committed yet, so the count value +1 of table C is not visible to itself, and the logic is consistent.

time session A session B
T1
T2 BEGIN;
The count value in Table C + 1;
T3 BEGIN;
Read the count value of meter C;
Query the latest 100 records;
COMMIT;
T4 Insert a row of data R;
COMMIT;

Performance of count

Semantics

1. count() is an aggregate function that makes judgments on the returned result set row by row.

If the parameter value of the count function is not NULL, the cumulative value is added by 1, otherwise it is not added and the cumulative value is returned.

2. count (field F)

  • Field F may be NULL
  • Indicates the total number of non-NULL fields in the result set that meet the conditions.

3. count(primary key ID), count(1), count(*)

  • Cannot be NULL
  • Indicates the total number of result sets that meet the conditions returned

4. The InnoDB engine returns whatever fields the server layer requires.

  • count(*) is an exception, it does not return the entire row, only the empty row

Performance comparison

count(field F)

1. If field F is defined as not allowing NULL, read this field from the record line by line, and accumulate it by line after passing the judgment.

  • Judging from the table structure, it is impossible for this field to be NULL

2. If field F is defined to allow NULL, read this field from the record line by line, and accumulate it by line after passing the judgment.

  • Judging from the table structure, this field may be NULL
  • Determine whether the field value is actually NULL

3. If there is no secondary index on field F, you can only traverse the entire table (clustered index)

4. Since InnoDB must return field F, the optimizer can make fewer optimization decisions

  • For example, you cannot choose the best index to traverse

count(primary key ID)

  • InnoDB will traverse the entire table (clustered index), take out the id value of each row, and return it to the server layer
  • After the server layer gets the id, it determines that it cannot be NULL, and then accumulates it by row
  • The optimizer may choose the best index to traverse

count(1)

  1. The InnoDB engine will traverse the entire table (clustered index), but will not take values
  2. The server layer puts a number 1 into each returned row to determine whether it is NULL and accumulates it row by row.
  3. count(1) is faster than count(primary key ID) because count(primary key ID) involves two operations.
  • Parsing Data Rows
  • Copy field value

count(*)

  1. count(*) does not retrieve all values, but is specially optimized to not retrieve values, because "*" is definitely not NULL, and accumulates by row
  2. No value: InnoDB returns a blank row to tell the server layer that it is not NULL and can be counted

Efficiency ranking

  1. count(field F) < count(primary key ID) < count(1) ≈ count(*)
  2. Try to use count(*)

Example

mysql> SHOW CREATE TABLE prop_action_batch_reward\G;
*************************** 1. row ***************************
 Table: prop_action_batch_reward
Create Table: CREATE TABLE `prop_action_batch_reward` (
 `id` bigint(20) NOT NULL,
 `source` int(11) DEFAULT NULL,
 `serial_id` bigint(20) NOT NULL,
 `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `user_ids` mediumtext,
 `serial_index` tinyint(4) DEFAULT '0',
 PRIMARY KEY (`id`),
 UNIQUE KEY `uniq_serial_id_source_index` (`serial_id`,`source`,`serial_index`),
 KEY `idx_create_time` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

count(field F)

No Index

There is no index on user_ids, but InnoDB must return the user_ids field, so it can only traverse the clustered index.

mysql> EXPLAIN SELECT COUNT(user_ids) FROM prop_action_batch_reward;
+----+-------------+--------------------------+------+---------------+------+------+------+------+------+------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+------+---------------+------+------+------+------+------+------+
| 1 | SIMPLE | prop_action_batch_reward | ALL | NULL | NULL | NULL | NULL | 16435876 | NULL |
+----+-------------+--------------------------+------+---------------+------+------+------+------+------+------+

mysql> SELECT COUNT(user_ids) FROM prop_action_batch_reward;
+-----------------+
| count(user_ids) |
+-----------------+
|17689788 |
+-----------------+
1 row in set (10.93 sec)

With index

1. There is an index on serial_id, which can be traversed by uniq_serial_id_source_index

2. However, since InnoDB must return the serial_id field, it will not traverse the logically equivalent idx_create_time.

  • If idx_create_time is selected and the serial_id field is returned, this means that the table must be returned.
mysql> EXPLAIN SELECT COUNT(serial_id) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------------------+--------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------------------+--------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL | uniq_serial_id_source_index | 15 | NULL | 16434890 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------------------+--------+------+----------+-------------+

mysql> SELECT COUNT(serial_id) FROM prop_action_batch_reward;
+------------------+
| count(serial_id) |
+------------------+
|17705069 |
+------------------+
1 row in set (5.04 sec)

count(primary key ID)

The optimizer chooses the optimal index idx_create_time to traverse instead of the clustered index

mysql> EXPLAIN SELECT COUNT(id) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL | idx_create_time | 5 | NULL | 16436797 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+

mysql> SELECT COUNT(id) FROM prop_action_batch_reward;
+-----------+
| count(id) |
+-----------+
|17705383 |
+-----------+
1 row in set (4.54 sec)

count(1)

mysql> EXPLAIN SELECT COUNT(1) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL | idx_create_time | 5 | NULL | 16437220 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+

mysql> SELECT COUNT(1) FROM prop_action_batch_reward;
+----------+
| count(1) |
+----------+
|17705808 |
+----------+
1 row in set (4.12 sec)

count(*)

mysql> EXPLAIN SELECT COUNT(*) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL | idx_create_time | 5 | NULL | 16437518 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+

mysql> SELECT COUNT(*) FROM prop_action_batch_reward;
+----------+
| count(*) |
+----------+
|17706074|
+----------+
1 row in set (4.06 sec)

References

"MySQL Practice 45 Lectures"

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. If you have any questions, you can leave a message to communicate. Thank you for your support for 123WORDPRESS.COM.

You may also be interested in:
  • Optimized implementation of count() for large MySQL tables
  • Usage and performance optimization techniques of aggregate function count in MySQL
  • Sharing on count optimization issues in innodb in mysql
  • Let's talk about the performance of MySQL's COUNT(*)
  • Detailed explanation of the correct use of the count function in MySQL
  • A brief discussion on MySQL count of rows
  • Summary of methods to improve mysql count
  • Summary of the differences between count(*), count(1) and count(col) in MySQL
  • Detailed explanation of mySQL count data examples in multiple tables
  • Use and optimization of MySQL COUNT function

<<:  The whole process record of introducing Vant framework into WeChat applet

>>:  Linux exposes Sudo privilege escalation vulnerability, any user can also run root commands

Recommend

Solution to the problem of MySQL thread in Opening tables

Problem Description Recently, there was a MySQL5....

WeChat Mini Program User Authorization Best Practices Guide

Preface When developing WeChat applets, you often...

Design Reference Beautiful and Original Blog Design

All blogs listed below are original and uniquely ...

Implementation of JavaScript downloading linked images and uploading them

Since we are going to upload pictures, the first ...

Tomcat multi-instance deployment and configuration principles

1. Turn off the firewall and transfer the softwar...

HTML realizes hotel screening function through form

<!doctype html> <html xmlns="http:/...

Linux kernel device driver character device driver notes

/******************** * Character device driver**...