Usage and performance optimization techniques of aggregate function count in MySQL

Usage and performance optimization techniques of aggregate function count in MySQL

The environment of this article is Windows 10, and the MySQL version is 5.7.12-log

1. Basic Use

There are two basic functions of count:

  • Count the number of data in a column;
  • Count the number of rows in the result set;

Used to obtain the number of data that meet the conditions. However, there are some situations that are different from the impression in use, such as when count is applied to one column, multiple columns, and when * is used to express the entire row, the effect is different.

The example table is as follows:

CREATE TABLE `NewTable` (
`id` int(11) NULL DEFAULT NULL ,
`name` varchar(30) NULL DEFAULT NULL ,
`country` varchar(50) NULL DEFAULT NULL ,
`province` varchar(30) NULL DEFAULT NULL ,
`city` varchar(30) NULL DEFAULT NULL 
)ENGINE=InnoDB 

這里寫圖片描述

1.1 Do not count NULL values

If there are NULL values, they will be filtered out in the returned results.

select count(country) from person;

The returned results are as follows:

這里寫圖片描述

If the data item that meets the conditions does not exist, the structure returns 0. This method is often used to determine whether data that meets the conditions exists; the returned data type is bigint.

1.2 Processing of count(*)

The processing of count(*) is a little different. It will return the number of all data, but will not filter out NULL values. It is not equivalent to expanding into all columns, but will directly ignore all columns and count all rows directly. The statement is as follows:

select count(*) from person;

The returned results are as follows:

這里寫圖片描述

When you want to return the number of all data, but don't want to include columns that are all NULL, it is impossible to use count(*). However, in 1.1, it is mentioned that count will filter out NULL when acting on a column, so is it correct to write it directly like this?

select count(id, `name`, country, province, city) from person;

That's wrong. Count can only work on a single column, not multiple columns, so the above writing is wrong.

In addition, the count(*) statement has been optimized in the MyISAM storage engine. The number of data rows in each table is stored in the storage engine and can be obtained quickly. However, in transactional storage engines, such as InnoDB, multiple transactions are involved.

1.3 Processing of count(distinct …)

count(distinct ...) returns the number of rows that are distinct but not NULL. This is different from using only distinct because distinct does not filter NULL values. For details, see How to use distinct in MySQL.

- If there is no data that meets the conditions, it returns 0;
- This statement can be applied to multiple columns. When there is a difference between the columns, the entire row of data is considered different, which has the same effect as distinct applied to multiple columns.

select count(DISTINCT country) from person;

The returned results are as follows:

這里寫圖片描述

But for the combination of count(*) and count(distinct), as follows:

select count(DISTINCT *) from person;

This statement is wrong and cannot be executed, so it is different from select count(DISTINCT *) from person.

2. Performance Optimization

Usually, the count(*) operation requires scanning a large number of rows in the data table. Avoiding scanning a large amount of data becomes the key to optimizing the statement. This issue can be considered from the following two perspectives.

2.1 Optimization at the database level

2.1.1 For count(*)

MySQL has been optimized for count(*) internally. The query using explain is as follows:

EXPLAIN select count(*) from person;

這里寫圖片描述

It can be seen that the query does not use a full table scan or an index, and does not even need to query the data table. In the example database above, the storage engine of the library is InnoDB, and there is neither a primary key nor an index.

2.2 Counting a single column

The query is as follows:

EXPLAIN select count(country) from person where id > 2;

這里寫圖片描述

It was found that the entire table was scanned without a primary key and index. One of the most direct ways to avoid scanning a large number of rows in the data is to use indexes:

When id is set as a general index: INDEX abc (id) USING BTREE .

Execute the query as follows:

EXPLAIN select count(country) from person where id > 2;

The results are as follows:

這里寫圖片描述

At this time, it is found that the index is not used and the full table scan is still performed. When the following is executed:

EXPLAIN select count(country) from person where id > 4;

The results are as follows:

這里寫圖片描述

This is a range query using an index, which is obviously better than the above.

But the question is, why do we use indexes sometimes and not sometimes? In the first query above, possible keys were detected but not used? If anyone knows please explain!

Set id as the primary key and execute the query as follows:

EXPLAIN select count(country) from person where id > 2;

The results are as follows:

這里寫圖片描述

2.2 Optimizing at the application level

To optimize at the application level, you can consider introducing a cache subsystem into the system architecture, such as Memcached, which was commonly used in the past, or Redis, which is very popular now, but this will increase the complexity of the system.

MySQL group by and aggregate functions (sum, count, etc.) examples

First, let's take a look at MySQL aggregate functions.

A special function in MySQL: aggregate function, SUM, COUNT, MAX, MIN, AVG, etc. The fundamental difference between these functions and other functions is that they generally act on multiple records. For example:

SELECT SUM(score) FROM table

This sql means to query the sum of all score columns in the table table.

Next, we use an example to explain how to use aggregate functions in the group by statement.

The book table is as follows:

id first_name last_name city
1 Jason Martin Toronto
2 Alison Mathews Vancouver
3 James Mathews Vancouver
4 Celia Rice Vancouver
5 David Larry New York

Now we need to group the cities and get the number of records in each group. We need the count aggregation function.

SELECT *,count(*) FROM book GROUP BY city

The result is:

id first_name last_name city count(*)
1 Jason Martin Toronto 1
2 Alison Mathews Vancouver 3
5 David Larry New York 1

First, divide the returned records into multiple groups based on city. This is the literal meaning of GROUP BY. After grouping, aggregate functions are then used to operate on different fields (one or more records) in each group.

You may also be interested in:
  • MySQL aggregate function sorting
  • MySQL grouping queries and aggregate functions
  • Analysis of MySQL query sorting and query aggregation function usage
  • How to add conditional expressions to aggregate functions in MySql
  • MySQL essential basics: grouping function, aggregate function, grouping query detailed explanation

<<:  Method to detect whether ip and port are connectable

>>:  Solve the problem of yum installation error Protected multilib versions

Recommend

jQuery+swiper component realizes the timeline sliding year tab switching effect

Result: Implementation code: Need to be used with...

Telnet is moved to busybox-extras in Alpine image

The telnet in the Alpine image has been moved to ...

jQuery custom magnifying glass effect

This article example shares the specific code of ...

How to configure Nginx load balancing

Table of contents Nginx load balancing configurat...

Example of using @media responsive CSS to adapt to various screens

Definition and Use Using @media queries, you can ...

PHP-HTMLhtml important knowledge points notes (must read)

1. Use frameset, frame and iframe to realize mult...

MySql development of automatic synchronization table structure

Development Pain Points During the development pr...

Implementation of Docker deployment of Django+Mysql+Redis+Gunicorn+Nginx

I. Introduction Docker technology is very popular...

Introduction to the common API usage of Vue3

Table of contents Changes in the life cycle react...

Detailed explanation of galera-cluster deployment in cluster mode of MySQL

Table of contents 1: Introduction to galera-clust...

Example of horizontal arrangement of li tags in HTMl

Most navigation bars are arranged horizontally as...

How to simplify Redux with Redux Toolkit

Table of contents Problems Redux Toolkit solves W...