Optimizing the slow query of MySQL aggregate statistics data

Preface

In day-to-day database work, tables such as the order table, access log table, and product table constantly raise statistical questions: summing a column, counting rows, and so on.

As the business grows, these tables become larger and larger. If they are not handled properly, statistical queries get slower and slower until the business can no longer tolerate them.

Therefore, we should understand these scenarios up front and leave some room for optimization at design time, so the schema can keep supporting the business as it grows.

SQL Aggregate Functions

MySQL, like most other databases, supports aggregate functions to make this kind of calculation easy.

The most common ones are:

AVG(): average value
SUM(): sum of a column
MAX(): maximum value
MIN(): minimum value
COUNT(): number of rows

A few simple examples:

Query the total number of orders for u_id 100

select count(id) from orders where u_id = 100;

Query the total order spending for u_id 100

select sum(order_amount) from orders where u_id = 100;

Find the highest sales count among all products

select max(sell_num) from goods;

Count the number and total amount of paid orders in July

select count(id) as count, sum(order_amount) as total_amount 
from orders where order_date between 20190701 and 20190731 and is_pay = 1;

Now suppose the order table holds 100 million rows and this SQL statement runs very slowly. How should we troubleshoot and optimize it?

Some readers may say: there are a lot of rows, so add an index on the date field and the filtering will be fast.
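
For reference, adding such an index is a one-line change (the index name idx_order_date is just an illustrative choice):

-- Let the WHERE clause filter the date range via the index instead of a full table scan
ALTER TABLE orders ADD INDEX idx_order_date (order_date);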

Out of 100 million rows in total, suppose 10 million orders fall in July. Filtering will indeed be much faster once the index is added, but is the problem really solved?

Aggregate functions have to compute their result by traversing every matching row. For example, to get the total order amount, the database must read the order amount of every matching row and add them up.

In other words, this statistical SQL first filters 10 million rows out of 100 million, and then still has to walk through those 10 million rows to do the calculation. That is still very slow.

Adding an index alone does not solve the slowness of aggregate statistics.

Options for optimizing aggregate statistics

Pre-compute in advance

Create a statistics table keyed by date, storing, for example, how many orders were placed on 20190801 and what the total amount was.
When an order is generated (it should only be counted once payment has completed), add its amount and count to the row for that date in the statistics table.

Note that if there are scenarios such as refunds that reduce the numbers, remember to handle them as well by subtracting from the corresponding row.

When we need the statistics for August, we only have to read and sum the thirty or so rows for that month, as sketched below.
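
Below is a minimal sketch of what this could look like; the table and column names (order_stats_daily, stat_date, order_count, order_amount) and the column types are illustrative assumptions, not taken from the original schema.

-- Daily statistics table: one row per day (illustrative names)
CREATE TABLE order_stats_daily (
    stat_date    DATE NOT NULL,
    order_count  INT UNSIGNED NOT NULL DEFAULT 0,
    order_amount DECIMAL(14,2) NOT NULL DEFAULT 0,
    PRIMARY KEY (stat_date)
);

-- When an order is paid, add it to that day's row
-- (for a refund, subtract in the same way)
INSERT INTO order_stats_daily (stat_date, order_count, order_amount)
VALUES ('2019-08-01', 1, 99.00)
ON DUPLICATE KEY UPDATE
    order_count  = order_count + 1,
    order_amount = order_amount + VALUES(order_amount);

-- Monthly statistics now scan only ~31 rows instead of millions
SELECT SUM(order_count) AS count, SUM(order_amount) AS total_amount
FROM order_stats_daily
WHERE stat_date BETWEEN '2019-08-01' AND '2019-08-31';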

Scheduled aggregation

We can use EasySwoole, cron jobs, or other scheduled tasks to recalculate the totals at a fixed interval (for example, every 20 minutes) and then write them into the statistics table.

Advantage: there is less code to touch. The order and refund APIs do not need to change; the job only relies on the data in the original order table and periodically recounts and refreshes the statistics.

Note that the refresh frequency should follow how likely the data is to change. Orders from the last week are fairly likely to change, so that window may be refreshed every 20 minutes; data from a year ago is unlikely to change, so it can be synchronized once a day, or never refreshed at all if you are sure it will not change.
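
As a rough sketch, the scheduled job could rebuild just the recent rows directly from the order table. The 7-day window and the assumption that order_date is a DATE or DATETIME column are illustrative choices, not from the original:

-- Recompute the last 7 days from the order table and overwrite the stats rows
REPLACE INTO order_stats_daily (stat_date, order_count, order_amount)
SELECT DATE(order_date), COUNT(id), SUM(order_amount)
FROM orders
WHERE is_pay = 1
  AND order_date >= CURDATE() - INTERVAL 7 DAY
GROUP BY DATE(order_date);

A cron entry, an EasySwoole timer, or a MySQL scheduled event could run a statement like this every 20 minutes, while a separate daily job refreshes older periods with a wider window.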

Summary

Indexes alone cannot make statistical aggregation queries fast.

Use aggregate functions over large tables with caution, or avoid them, because as the data grows we cannot predict how many rows each calculation will have to scan.

Either optimization relies on a statistics table: the results computed for each period must be persisted.

