Optimizing the slow query of MySQL aggregate statistics data

Preface

In day-to-day database work, tables such as the order table, access log table, and product table constantly raise statistical questions: summing a column, counting rows, and so on.

As the business grows, these tables become larger and larger. If they are not handled properly, statistical queries get slower and slower until the business can no longer tolerate them.

Therefore, we should understand these scenarios up front and leave some room for optimization at design time, so the schema can keep supporting the business as it grows.

SQL Aggregate Functions

MySQL, like most other databases, supports aggregate functions to make this kind of calculation easy.

The most common ones are:

AVG(): average value
SUM(): sum of a column
MAX(): maximum value
MIN(): minimum value
COUNT(): number of rows

A few simple examples:

Query the total number of orders for u_id 100

select count(id) from orders where u_id = 100;

Query the total order spending for u_id 100

select sum(order_amount) from orders where u_id = 100;

Find the highest sales count among all products

select max(sell_num) from goods;

Count the number and total amount of paid orders in July

select count(id) as count, sum(order_amount) as total_amount 
from orders where order_date between 20190701 and 20190731 and is_pay = 1;

Now suppose the order table holds 100 million rows and this SQL statement runs very slowly. How should we troubleshoot and optimize it?

Some readers may say: there are a lot of rows, so add an index on the date field and the filtering will be fast.
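
For reference, adding such an index is a one-line change (the index name idx_order_date is just an illustrative choice):

-- Let the WHERE clause filter the date range via the index instead of a full table scan
ALTER TABLE orders ADD INDEX idx_order_date (order_date);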

Out of 100 million rows in total, suppose 10 million orders fall in July. Filtering will indeed be much faster once the index is added, but is the problem really solved?

Aggregate functions have to compute their result by traversing every matching row. For example, to get the total order amount, the database must read the order amount of every matching row and add them up.

In other words, this statistical SQL first filters 10 million rows out of 100 million, and then still has to walk through those 10 million rows to do the calculation. That is still very slow.

Adding an index alone does not solve the slowness of aggregate statistics.

Options for optimizing aggregate statistics

Pre-compute in advance

Create a statistics table keyed by date, storing, for example, how many orders were placed on 20190801 and what the total amount was.
When an order is generated (it should only be counted once payment has completed), add its amount and count to the row for that date in the statistics table.

Note that if there are scenarios such as refunds that reduce the numbers, remember to handle them as well by subtracting from the corresponding row.

When we need the statistics for August, we only have to read and sum the thirty or so rows for that month, as sketched below.
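
Below is a minimal sketch of what this could look like; the table and column names (order_stats_daily, stat_date, order_count, order_amount) and the column types are illustrative assumptions, not taken from the original schema.

-- Daily statistics table: one row per day (illustrative names)
CREATE TABLE order_stats_daily (
    stat_date    DATE NOT NULL,
    order_count  INT UNSIGNED NOT NULL DEFAULT 0,
    order_amount DECIMAL(14,2) NOT NULL DEFAULT 0,
    PRIMARY KEY (stat_date)
);

-- When an order is paid, add it to that day's row
-- (for a refund, subtract in the same way)
INSERT INTO order_stats_daily (stat_date, order_count, order_amount)
VALUES ('2019-08-01', 1, 99.00)
ON DUPLICATE KEY UPDATE
    order_count  = order_count + 1,
    order_amount = order_amount + VALUES(order_amount);

-- Monthly statistics now scan only ~31 rows instead of millions
SELECT SUM(order_count) AS count, SUM(order_amount) AS total_amount
FROM order_stats_daily
WHERE stat_date BETWEEN '2019-08-01' AND '2019-08-31';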

Scheduled aggregation

We can use EasySwoole, cron jobs, or other scheduled tasks to recalculate the totals at a fixed interval (for example, every 20 minutes) and then write them into the statistics table.

Advantage: there is less code to touch. The order and refund APIs do not need to change; the job only relies on the data in the original order table and periodically recounts and refreshes the statistics.

Note that the refresh frequency should follow how likely the data is to change. Orders from the last week are fairly likely to change, so that window may be refreshed every 20 minutes; data from a year ago is unlikely to change, so it can be synchronized once a day, or never refreshed at all if you are sure it will not change.
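
As a rough sketch, the scheduled job could rebuild just the recent rows directly from the order table. The 7-day window and the assumption that order_date is a DATE or DATETIME column are illustrative choices, not from the original:

-- Recompute the last 7 days from the order table and overwrite the stats rows
REPLACE INTO order_stats_daily (stat_date, order_count, order_amount)
SELECT DATE(order_date), COUNT(id), SUM(order_amount)
FROM orders
WHERE is_pay = 1
  AND order_date >= CURDATE() - INTERVAL 7 DAY
GROUP BY DATE(order_date);

A cron entry, an EasySwoole timer, or a MySQL scheduled event could run a statement like this every 20 minutes, while a separate daily job refreshes older periods with a wider window.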

Summary

Indexes alone cannot make statistical aggregation queries fast.

Use aggregate functions over large tables with caution, or avoid them, because as the data grows we cannot predict how many rows each calculation will have to scan.

Either optimization relies on a statistics table: the results computed for each period must be persisted.

