MySQL cleverly uses sum, case and when to optimize statistical queries

MySQL cleverly uses sum, case and when to optimize statistical queries

I was recently working on a project at the company that involved developing statistical reports. Due to the relatively large amount of data, the query statement I wrote before took about ten seconds to query 500,000 pieces of data. Later, with the guidance of my boss, I used sum, case...when... to rewrite the SQL and the performance was immediately improved to one second. In order to explain the problem and solution clearly and concisely, I will simplify the demand model here.

The database now has an order table (a simplified intermediate table) with the following structure:

CREATE TABLE `statistic_order` (
 `oid` bigint(20) NOT NULL,
 `o_source` varchar(25) DEFAULT NULL COMMENT 'Source number',
 `o_actno` varchar(30) DEFAULT NULL COMMENT 'Activity number',
 `o_actname` varchar(100) DEFAULT NULL COMMENT 'Participation activity name',
 `o_n_channel` int(2) DEFAULT NULL COMMENT 'Shopping Mall',
 `o_clue` varchar(25) DEFAULT NULL COMMENT 'Clue category',
 `o_star_level` varchar(25) DEFAULT NULL COMMENT 'Order star rating',
 `o_saledep` varchar(30) DEFAULT NULL COMMENT 'Marketing Department',
 `o_style` varchar(30) DEFAULT NULL COMMENT 'Car type',
 `o_status` int(2) DEFAULT NULL COMMENT 'Order status',
 `syctime_day` varchar(15) DEFAULT NULL COMMENT 'Format date by day',
 PRIMARY KEY (`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

The project requirements are as follows:

Count the number of source numbers for each day within a certain period of time. The source number corresponds to the o_source field in the data table, and the field value may be CDE, SDE, PDE, CSE, SSE.

Source classification flows over time

At the beginning, I wrote this SQL:

select S.syctime_day,
 (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = 'CDE') as 'CDE',
 (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = 'CDE') as 'SDE',
 (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = 'CDE') as 'PDE',
 (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = 'CDE') as 'CSE',
 (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = 'CDE') as 'SSE'
 from statistic_order S where S.syctime_day > '2016-05-01' and S.syctime_day < '2016-08-01' 
 GROUP BY S.syctime_day order by S.syctime_day asc;

This writing method uses a subquery. Without adding an index, this SQL statement was executed on 550,000 data. It took nearly ten minutes to wait in the workbench, and finally a connection interruption was reported. Through the explain interpreter, you can see that the SQL execution plan is as follows:

Each query performs a full table scan. The five subqueries DEPENDENT SUBQUERY indicate that they depend on external queries. This query mechanism first performs an external query to obtain the date results after group by, and then the subqueries query the number of CDEs, SDEs, etc. in the corresponding dates. Its efficiency can be imagined.

After adding indexes on o_source and syctime_day, the efficiency is greatly improved, and the query results are obtained in about five seconds:

Looking at the execution plan, we find that the number of rows scanned has been greatly reduced, and the full table scan is no longer performed:

This is certainly not fast enough. If the amount of data reaches millions, the query speed will definitely be intolerable. I have been wondering if there is a way to query all the results directly by traversing once, similar to traversing the list collection in Java, counting once when encountering a certain condition, so that a full table scan can be performed to query the result set, the result index, the efficiency should be very high. Under the guidance of the boss, we used the sum aggregation function and the "strange" usage of case...when...then... to effectively solve this problem.
The specific SQL is as follows:

 select S.syctime_day,
 sum(case when S.o_source = 'CDE' then 1 else 0 end) as 'CDE',
 sum(case when S.o_source = 'SDE' then 1 else 0 end) as 'SDE',
 sum(case when S.o_source = 'PDE' then 1 else 0 end) as 'PDE',
 sum(case when S.o_source = 'CSE' then 1 else 0 end) as 'CSE',
 sum(case when S.o_source = 'SSE' then 1 else 0 end) as 'SSE'
 from statistic_order S where S.syctime_day > '2015-05-01' and S.syctime_day < '2016-08-01' 
 GROUP BY S.syctime_day order by S.syctime_day asc;

I won't explain too much about the usage of case...when...then in MySQL. This SQL is easy to understand. First, we traverse the records one by one, group by classifies the dates, and the sum aggregate function sums the values ​​of a certain date. The key point is that case...when...then cleverly adds conditions to the sum. When o_source = 'CDE', the count is 1, otherwise it is 0; when o_source = 'SDE'...

The execution of this statement took only a little over a second, which is ideal for performing statistics of this dimension on more than 500,000 data points.

The execution plan shows that although the number of scanned rows has increased, only one full table scan is performed, and it is a SIMPLE query, so the execution efficiency is naturally high:

If you have a better solution or idea for this problem, please leave a message.

Summarize

This is the end of this article about how to use sum, case, and when in MySQL to optimize statistical queries. For more information about optimizing statistical queries in MySQL, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Example of using judgment statements (IF ELSE/CASE WHEN) in SQL Server
  • Solve the problem of mybatis case when error
  • Oracle uses decode function or CASE-WHEN to implement custom sorting
  • MySQL case when usage example analysis
  • This article will show you how to use SQL CASE WHEN in detail

<<:  Getting Started with CSS3 Animation in 10 Minutes

>>:  Prototype and prototype chain prototype and proto details

Recommend

How to modify the ssh port number in Centos8 environment

Table of contents Preface start Preface The defau...

How to implement a multi-terminal bridging platform based on websocket in JS

Table of contents 1. What to debug 2. Features of...

HTML basic structure_Powernode Java Academy

Many times when learning web page development, th...

MySQL multi-table query detailed explanation

Eating well and getting enough rest sounds simple...

Linux lossless expansion method

Overview The cloud platform customer's server...

Detailed explanation of the usage of the ESCAPE keyword in MySQL

MySQL escape Escape means the original semantics ...

Advantages and disadvantages of common MySQL storage engines

Table of contents View all storage engines InnoDB...

Web developers are concerned about the coexistence of IE7 and IE8

I installed IE8 today. When I went to the Microso...

Mysql backup multiple database code examples

This article mainly introduces the Mysql backup m...

echars 3D map solution for custom colors of regions

Table of contents question extend Solving the pro...