Drawing on a problem from my day-to-day work, I would like to share the process of a LEFT JOIN optimization, in the hope that it gives you some new ideas.

【Function Background】We need to count the number of purchased items and the number of after-sales items per buyer order number and merchant order number. The tables and relationships involved are shown in the figure below. Unfortunately, when the engineers initially designed the table structure, they recorded neither the total number of purchased items in the merchant order table nor the number of after-sales items in the merchant order's after-sales table.

【Raw SQL】
select o.no, s_order.no, sum(s_item.count), sum(after_sale_item.count)
from buyer_order o
left join seller_order s_order on o.id = s_order.buyer_order_id
left join seller_order_item s_item on s_order.id = s_item.seller_order_id
left join seller_order_after_sale after_sale on s_order.id = after_sale.seller_order_id
left join seller_order_after_sale_item after_sale_item on after_sale.id = after_sale_item.after_sale_id
where o.add_time >= '2019-05-01'
group by o.id, s_order.id
order by o.id
limit 0, 10

The key fields used in the SQL above are all indexed.

【Raw SQL Analysis】This is a very common SQL statement, and there is nothing wrong with its logic. However, it contains many joins, and as the number of after-sales records grows, the joined result set grows with it. All qualifying rows are first loaded into memory, then grouped and summed by o.id and s_order.id, then sorted by o.id, and only then are the first 10 rows returned. What happens when there are a million rows? How would you implement such statistics yourself in application code?
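To appreciate the cost, here is a minimal Python sketch of the work the raw SQL forces the database to do: join everything, aggregate everything, and only then paginate. The row data and structure here are made up for illustration.

```python
from collections import defaultdict

# Hypothetical joined result set, one tuple per row produced by the joins:
# (buyer_order_id, seller_order_id, item_count, after_sale_count).
# In the real case this list held hundreds of thousands of rows.
joined_rows = [
    (1, 10, 2, 0),
    (1, 10, 3, 1),
    (2, 11, 5, 0),
]

# Step 1: group and sum over the ENTIRE joined result set,
# mirroring "group by o.id, s_order.id" with the two sum() columns.
totals = defaultdict(lambda: [0, 0])
for buyer_id, seller_id, item_count, after_count in joined_rows:
    totals[(buyer_id, seller_id)][0] += item_count
    totals[(buyer_id, seller_id)][1] += after_count

# Step 2: only after aggregating everything do we sort and take one page,
# mirroring "order by o.id limit 0, 10".
page = sorted(totals.items())[:10]
print(page)
```

Note that the pagination happens last: whether you ultimately want 10 rows or all of them, the full grouping pass has already been paid for.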
【Analysis Steps】Based on the problems above, we optimized the query. As outsiders, we do not initially know what data the function needs to output, so we first need to understand what each table stores and how the tables relate to one another. Then we set the original SQL aside and reason again from the data we actually need, so that we do not get pulled back into the vortex of the original query. Given the problems above, how can we load less data? Could we paginate first, and then compute the statistics separately for just the paginated rows? If so, do we still need the GROUP BY at all? We need to find a way to paginate first. Have you thought of a way? Here is ours:

【Optimized SQL】
select o.id, o.no, s_order.no,
  (select sum(sot.count)
   from seller_order so
   left join seller_order_item sot on so.id = sot.seller_order_id
   where so.id = s_order.id),
  (select sum(osat.count)
   from seller_order_after_sale osa
   left join seller_order_after_sale_item osat on osa.id = osat.after_sale_id
   where osa.seller_order_id = s_order.id)
from buyer_order o
left join seller_order s_order on o.id = s_order.buyer_order_id
where o.add_time >= '2019-05-01'
order by o.id
limit 0, 10

【Optimized SQL Analysis】
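The paginate-first idea behind the optimized SQL can be sketched in Python as well (again, all names and data here are hypothetical): take one page from the small buyer/seller join first, then run the two aggregate "subqueries" only for the rows on that page.

```python
# Hypothetical small join of buyer_order and seller_order:
# (buyer_order_id, seller_order_id) pairs.
orders = [(1, 10), (2, 11), (3, 12)]

# Hypothetical detail data the two subqueries would scan,
# keyed by seller_order_id.
item_counts = {10: [2, 3], 11: [5], 12: [4]}   # seller_order_item.count
after_counts = {10: [1], 11: [], 12: [2]}      # seller_order_after_sale_item.count

# Step 1: paginate FIRST ("order by o.id limit 0, 10") before any aggregation.
page = sorted(orders)[:10]

# Step 2: run the two aggregations only for the (at most) 10 rows on the page,
# mirroring the two correlated subqueries in the optimized SQL.
result = [
    (buyer_id, seller_id,
     sum(item_counts.get(seller_id, [])),
     sum(after_counts.get(seller_id, [])))
    for buyer_id, seller_id in page
]
print(result)
```

The expensive per-order summing is now bounded by the page size instead of the total row count, which is exactly why the optimized query no longer cares how many after-sales records exist overall.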
The effect of this optimization may exceed your expectations. In real work, the number of joined tables was even larger than in our example. When the unoptimized SQL was executed without pagination, it turned out there were about 700,000 rows in total, and it took more than 10 seconds to paginate and fetch 10 of them. That data volume is not large; most of the time was spent on grouping and aggregation. You can try writing code to group and count this data yourself to appreciate the complexity. In fact, whether you fetch 10 rows or all of them, the time is roughly the same (ignoring IO), because the aggregation is performed before the LIMIT. After the optimization, only about 20,000 rows are loaded into memory, and no up-front aggregation is performed: the first 10 rows are taken, and the statistics are computed only for those 10. The logic is much simpler than before, and the optimized SQL executes within 20 milliseconds. In fact, if the corresponding quantities had been recorded in both the merchant order table and the after-sales table, even fewer tables would be involved and no subqueries would be needed. Sometimes you still need to consider statistical needs when designing a table. This concludes this article on MySQL LEFT JOIN optimization (from 10 seconds to 20 milliseconds). For more on MySQL LEFT JOIN optimization, search previous articles on 123WORDPRESS.COM.