Usage and performance optimization techniques of aggregate function count in MySQL

Usage and performance optimization techniques of aggregate function count in MySQL

The environment of this article is Windows 10, and the MySQL version is 5.7.12-log

1. Basic Use

There are two basic functions of count:

  • Count the number of data in a column;
  • Count the number of rows in the result set;

Used to obtain the number of data that meet the conditions. However, there are some situations that are different from the impression in use, such as when count is applied to one column, multiple columns, and when * is used to express the entire row, the effect is different.

The example table is as follows:

CREATE TABLE `NewTable` (
`id` int(11) NULL DEFAULT NULL ,
`name` varchar(30) NULL DEFAULT NULL ,
`country` varchar(50) NULL DEFAULT NULL ,
`province` varchar(30) NULL DEFAULT NULL ,
`city` varchar(30) NULL DEFAULT NULL 
)ENGINE=InnoDB 

這里寫圖片描述

1.1 Do not count NULL values

If there are NULL values, they will be filtered out in the returned results.

select count(country) from person;

The returned results are as follows:

這里寫圖片描述

If the data item that meets the conditions does not exist, the structure returns 0. This method is often used to determine whether data that meets the conditions exists; the returned data type is bigint.

1.2 Processing of count(*)

The processing of count(*) is a little different. It will return the number of all data, but will not filter out NULL values. It is not equivalent to expanding into all columns, but will directly ignore all columns and count all rows directly. The statement is as follows:

select count(*) from person;

The returned results are as follows:

這里寫圖片描述

When you want to return the number of all data, but don't want to include columns that are all NULL, it is impossible to use count(*). However, in 1.1, it is mentioned that count will filter out NULL when acting on a column, so is it correct to write it directly like this?

select count(id, `name`, country, province, city) from person;

That's wrong. Count can only work on a single column, not multiple columns, so the above writing is wrong.

In addition, the count(*) statement has been optimized in the MyISAM storage engine. The number of data rows in each table is stored in the storage engine and can be obtained quickly. However, in transactional storage engines, such as InnoDB, multiple transactions are involved.

1.3 Processing of count(distinct …)

count(distinct ...) returns the number of rows that are distinct but not NULL. This is different from using only distinct because distinct does not filter NULL values. For details, see How to use distinct in MySQL.

- If there is no data that meets the conditions, it returns 0;
- This statement can be applied to multiple columns. When there is a difference between the columns, the entire row of data is considered different, which has the same effect as distinct applied to multiple columns.

select count(DISTINCT country) from person;

The returned results are as follows:

這里寫圖片描述

But for the combination of count(*) and count(distinct), as follows:

select count(DISTINCT *) from person;

This statement is wrong and cannot be executed, so it is different from select count(DISTINCT *) from person.

2. Performance Optimization

Usually, the count(*) operation requires scanning a large number of rows in the data table. Avoiding scanning a large amount of data becomes the key to optimizing the statement. This issue can be considered from the following two perspectives.

2.1 Optimization at the database level

2.1.1 For count(*)

MySQL has been optimized for count(*) internally. The query using explain is as follows:

EXPLAIN select count(*) from person;

這里寫圖片描述

It can be seen that the query does not use a full table scan or an index, and does not even need to query the data table. In the example database above, the storage engine of the library is InnoDB, and there is neither a primary key nor an index.

2.2 Counting a single column

The query is as follows:

EXPLAIN select count(country) from person where id > 2;

這里寫圖片描述

It was found that the entire table was scanned without a primary key and index. One of the most direct ways to avoid scanning a large number of rows in the data is to use indexes:

When id is set as a general index: INDEX abc (id) USING BTREE .

Execute the query as follows:

EXPLAIN select count(country) from person where id > 2;

The results are as follows:

這里寫圖片描述

At this time, it is found that the index is not used and the full table scan is still performed. When the following is executed:

EXPLAIN select count(country) from person where id > 4;

The results are as follows:

這里寫圖片描述

This is a range query using an index, which is obviously better than the above.

But the question is, why do we use indexes sometimes and not sometimes? In the first query above, possible keys were detected but not used? If anyone knows please explain!

Set id as the primary key and execute the query as follows:

EXPLAIN select count(country) from person where id > 2;

The results are as follows:

這里寫圖片描述

2.2 Optimizing at the application level

To optimize at the application level, you can consider introducing a cache subsystem into the system architecture, such as Memcached, which was commonly used in the past, or Redis, which is very popular now, but this will increase the complexity of the system.

MySQL group by and aggregate functions (sum, count, etc.) examples

First, let's take a look at MySQL aggregate functions.

A special function in MySQL: aggregate function, SUM, COUNT, MAX, MIN, AVG, etc. The fundamental difference between these functions and other functions is that they generally act on multiple records. For example:

SELECT SUM(score) FROM table

This sql means to query the sum of all score columns in the table table.

Next, we use an example to explain how to use aggregate functions in the group by statement.

The book table is as follows:

id first_name last_name city
1 Jason Martin Toronto
2 Alison Mathews Vancouver
3 James Mathews Vancouver
4 Celia Rice Vancouver
5 David Larry New York

Now we need to group the cities and get the number of records in each group. We need the count aggregation function.

SELECT *,count(*) FROM book GROUP BY city

The result is:

id first_name last_name city count(*)
1 Jason Martin Toronto 1
2 Alison Mathews Vancouver 3
5 David Larry New York 1

First, divide the returned records into multiple groups based on city. This is the literal meaning of GROUP BY. After grouping, aggregate functions are then used to operate on different fields (one or more records) in each group.

You may also be interested in:
  • MySQL aggregate function sorting
  • MySQL grouping queries and aggregate functions
  • Analysis of MySQL query sorting and query aggregation function usage
  • How to add conditional expressions to aggregate functions in MySql
  • MySQL essential basics: grouping function, aggregate function, grouping query detailed explanation

<<:  Method to detect whether ip and port are connectable

>>:  Solve the problem of yum installation error Protected multilib versions

Recommend

How to add website icon?

The first step is to prepare an icon making softwa...

Example of implementing todo application with Vue

background First of all, I would like to state th...

Use non-root users to execute script operations in docker containers

After the application is containerized, when the ...

Use three.js to achieve cool acid style 3D page effects

This article mainly introduces how to use the Rea...

Detailed tutorial on how to install mysql8.0 using Linux yum command

1. Do a good job of cleaning before installation ...

A brief discussion on the $notify points of element

My original intention was to encapsulate the $not...

How to implement second-level scheduled tasks with Linux Crontab Shell script

1. Write Shell script crontab.sh #!/bin/bash step...

Detailed steps to modify MySQL stored procedures

Preface In actual development, business requireme...

Example of converting JavaScript flat array to tree structure

Table of contents 10,000 pieces of data were lost...

Summary of Problems in Installation and Usage of MySQL 5.7.19 Winx64 ZIP Archive

Today I learned to install MySQL, and some proble...

How to implement parallel downloading of large files in JavaScript

Table of contents 1. HTTP Range Request 1.1 Range...