Let's talk about the performance of MySQL's COUNT(*)

Let's talk about the performance of MySQL's COUNT(*)

Preface

Basically, programmers in the workplace use count(*), count(1) or count(primary key) to count the number of rows in a database table. Do you understand the differences and performance between them?

In fact, during the development process, it is a very time-consuming operation for programmers to count the total number of rows in a large table. So which method should we use to count faster?

Next, let’s talk about the method and performance of counting the total number of rows in MySQL.

Which is faster: count(*), count(1), or count(primary key)?

1. Create a table and insert 10 million records for experimental testing:

# Create a test table CREATE TABLE `t6` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(50) NOT NULL,
 `status` tinyint(4) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `idx_status` (`status`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

# Create a stored procedure to insert 1000w data CREATE PROCEDURE insert_1000w()
BEGIN
  DECLARE i INT;
  SET i=1;
  WHILE i<=10000000 DO
    INSERT INTO t6(name,status) VALUES('god-jiang-666',1);
    SET i=i+1;
  END WHILE;
END;

#Call the stored procedure to insert 10 million rows of data call insert_1000w();

2. Analyze the experimental results

# It took 0.572 seconds select count(*) from t6; 

insert image description here

# It took 0.572 seconds select count(1) from t6; 

insert image description here

# It took 0.580 seconds select count(id) from t6; 

insert image description here

# It took 0.620 seconds select count(*) from t6 force index (primary); 

insert image description here

From the above experiment, we can conclude that count(*) and count(1) are the fastest, followed by count(id), and the slowest is when count uses a mandatory primary key .

Let's continue to test their respective execution plans:

explain select count(*) from t6;
show warnings; 

insert image description here

insert image description here

explain select count(1) from t6;
show warnings; 

insert image description here

insert image description here

explain select count(id) from t6;
show warnings; 

insert image description here

insert image description here

explain select count(*) from t6 force index (primary);
show warnings; 

insert image description here

insert image description here

From the above experiment, we can draw the following three points:

  1. The MySQL query optimizer rewrites count(*) to count(0) and selects the idx_status index.
  2. Both count(1) and count(id) select the idx_statux index
  3. After adding force index (primary), the forced index is used.

This idx_status is equivalent to a secondary auxiliary index tree. Its purpose is to illustrate that when InnoDB processes count(*), if there is an auxiliary index tree, it will give priority to the auxiliary index tree to count the total number of rows.

To verify the conclusion that count(*) will give priority to the auxiliary index tree, let's continue with the following experiment:

# Delete the idx_status index and continue to execute count(*)
alter table t6 drop index idx_status;

explain select count(*) from t6; 

insert image description here

From the above experiment, we can conclude that if the auxiliary index tree idx_status is deleted, count(*) will choose to use the primary key index. So the conclusion is: count(*) will give priority to the auxiliary index. If there is no auxiliary index, the primary key index will be used.

Why does count(*) prefer secondary indexes?

Prior to MySQL 5.7.18, InnoDB processed count(*) statements by scanning the clustered index.

Starting with MySQL 5.7.18, InnoDB handles COUNT(*) statements by traversing the smallest available secondary index. If no secondary index exists, the clustered index is scanned.

Why does the new version use secondary indexes to process count(*)?

Because the leaf nodes of the InnoDB secondary index tree store the primary key, and the leaf nodes of the primary key index tree store the entire row of data, the secondary index tree is smaller than the primary key index tree. Therefore, the query optimizer gives priority to secondary indexes based on cost considerations. So index count(*) is faster than count(primary key).

Summarize

The conclusion of this article is count(*)=count(1)>count(id) .

Why is count(id) slower without the primary key index? Because count(id) needs to retrieve the primary key, then check whether it is empty, and then accumulate it, which is more expensive.

Count(*) will count all NOT NULL and NULL fields, while count(id) will not count NULL fields, so we should try to use NOT NULL when building tables and give it a default of empty.

Finally, when you want to total the number of rows in a database table in the future, you can boldly use count(*) or count(1).

References

  • Chapter 6 of "High Performance MySQL" (3rd Edition) Optimizing COUNT() Queries
  • "MySQL Practice 45 Lectures" by Lin Xiaobin

This is the end of this article about the performance of MySQL COUNT(*). For more information about MySQL COUNT(*), please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Optimized implementation of count() for large MySQL tables
  • Usage and performance optimization techniques of aggregate function count in MySQL
  • Sharing on count optimization issues in innodb in mysql
  • Detailed explanation of the correct use of the count function in MySQL
  • A brief discussion on MySQL count of rows
  • Summary of methods to improve mysql count
  • Detailed explanation of count without filter conditions in MySQL
  • Summary of the differences between count(*), count(1) and count(col) in MySQL
  • Detailed explanation of mySQL count data examples in multiple tables
  • Use and optimization of MySQL COUNT function

<<:  Limit HTML text box input to only numbers and decimal points

>>:  Use non-root users to execute script operations in docker containers

Recommend

Two ways to enable firewall in Linux service

There are two ways: 1. Service method Check the f...

Python writes output to csv operation

As shown below: def test_write(self): fields=[] f...

Several ways to easily traverse object properties in JS

Table of contents 1. Self-enumerable properties 2...

The phenomenon of margin-top collapse and the specific solution

What is margin-top collapse Margin-top collapse i...

Application of CSS3 animation effects in activity pages

background Before we know it, a busy year is comi...

A brief discussion on the role of Vue3 defineComponent

Table of contents defineComponent overload functi...

Solve the problem of blocking positioning DDL in MySQL 5.7

In the previous article "MySQL table structu...

iframe multi-layer nesting, unlimited nesting, highly adaptive solution

There are three pages A, B, and C. Page A contains...

Solutions to MySQL OOM (memory overflow)

OOM stands for "Out Of Memory", which m...

The three new indexes added in MySQL 8 are hidden, descending, and functions

Table of contents Hidden, descending, and functio...

What is Nginx load balancing and how to configure it

What is Load Balancing Load balancing is mainly a...

How to create LVM for XFS file system in Ubuntu

Preface lvm (Logical Volume Manager) logical volu...

Building FastDFS file system in Docker (multi-image tutorial)

Table of contents About FastDFS 1. Search for ima...

Introducing the code checking tool stylelint to share practical experience

Table of contents Preface text 1. Install styleli...