Specific use of MySQL window functions

Specific use of MySQL window functions

I have previously held a poll for my fans to find the最熟悉的陌生人in MySQL~~Which technical points in MySQL are both familiar and unfamiliar to you?

The top three are pretty much what I expected, they are:

1. Cursor 2. Window function 3. Clustered index

Although these three points are rarely used in daily life,面試中卻常被問到. It is worth mentioning that many interviewers only have a vague understanding of the questions. .

Today I want to talk to you about window functions. MySQL從8.0開始支持窗口函數. Maybe your company's MySQL version cannot satisfy you yet, but I suggest you try one locally. It's really great!

insert image description here

Okay, without further ado, as usual, let’s start with the appetizer and look at today’s test table data.

The test table used for demonstration in this article is chh_baozipu , which means emmm...Chen Haha’s Baozi Shop. Let me tell you secretly, Brother Ha opened a steamed bun shop this year and this table shows the profit of the steamed bun shop in the past six months~

mysql> SELECT * from chh_baozipu;
+----+--------------------+-------+---------+
| id | product | sales | month |
+----+--------------------+-------+---------+
| 1 | Pork and Scallion Buns | 600 | 2021-11 |
| 2 | Pork and Scallion Buns | 1600 | 2021-10 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 |
| 4 | Pork and Scallion Buns | 800 | 2021-08 |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 |
| 6 | Pork and Scallion Buns | 1000 | 2021-06 |
| 7 | Wheat stuffed buns | 700 | 2021-11 |
| 8 | Wheat stuffed buns | 200 | 2021-10 |
| 9 | Wheat stuffed buns | 300 | 2021-09 |
| 10 | Wheat stuffed buns | 0 | 2021-08 |
| 11 | Wheat stuffed buns | 100 | 2021-07 |
| 12 | Wheat stuffed buns | 200 | 2021-06 |
+----+--------------------+-------+---------+
12 rows in set (0.00 sec)

How to say it? Come to my shop sometime and I’ll treat you all to some buns stuffed with wheat.

1. What is a window function?

1. How to understand the window?

In fact, the concept of窗口is very important. If we want to learn window functions, we can't just know one thing but not the other; we have to figure out what窗口represents so that we know when to use it.

Let’s take the test table as an example and make some statistics:包子鋪的豬肉大蔥包子這半年截至每月累計利潤.

SELECT *,SUM(sales) over(ORDER BY `month`) as cumulative profit from chh_baozipu where product='Pork and green onion buns';
mysql> SELECT *,SUM(sales) over(ORDER BY `month`) as cumulative profit from chh_baozipu where product='Pork and green onion buns';
+----+--------------------+-------+---------+--------------+
| id | product | sales | month | cumulative profit|
+----+--------------------+-------+---------+--------------+
| 6 | Pork and Scallion Buns | 1000 | 2021-06 | 1000 |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 | 2600 |
| 4 | Pork and Scallion Buns | 800 | 2021-08 | 3400 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 | 4400 |
| 2 | Pork and Scallion Buns | 1600 | 2021-10 | 6000 |
| 1 | Pork and Scallion Buns | 600 | 2021-11 | 6600 |
+----+--------------------+-------+---------+--------------+
6 rows in set (0.00 sec)

From this SQL statement, we can see that窗口for the first row with id=6 is the first row,窗口for the second row with id=5 is the first two rows, and so on (as shown in the figure below).

insert image description here

It can be seen窗口就是范圍的意思, which can be understood as a collection of records (rows); window function is a special function that滿足某種條件的記錄集合上執行計算.

For each record, a function must be executed within this window. For some functions, the window size is fixed depending on the record, which is靜態窗口. For some functions, on the contrary, different records correspond to different windows. This dynamically changing window is called滑動窗口. I believe you will understand this sentence more thoroughly after reading this article [dog head].

2. What is a window function?

Window functions are also called OLAP函數(Online Anallytical Processing), which can perform real-time analysis and processing on data.

In what scenarios are window functions mostly used? There are two main categories:

  • Ranking issues, for example: check the monthly profit ranking of a steamed bun shop;
  • TOPN questions, for example: Find the two months with the highest profits for each type of bun;

Our common window functions and aggregate functions are:

  • Specialized window functions: rank() , dense_rank() , row_number()
  • Aggregate functions: max() , min() , count() , sum() , avg()

Because aggregate functions can also be used in window functions, window functions and ordinary aggregate functions are easily confused. The differences between the two are as follows:

  • 聚合函數多條記錄聚合為一條; whereas window functions每條記錄都會執行,有幾條記錄執行完還是幾條.
  • Aggregate functions can also be used in window functions, which I will illustrate with an example.

2. Usage of Window Functions

Basic syntax:

<Window function> OVER (PARTITION BY <column name for grouping> ORDER BY <column name for sorting>);
-- The over keyword is used to specify the window range of the function.
-- partition by is used to group tables.
-- The order by clause is used to sort the grouped results.

Note: The window function performs a secondary operation on the result of the where or group by clause. Therefore, it will be executed in the order of SQL statements. The window function is usually placed in the select clause (before the from clause). For example, you can drag the previous SQL statement upwards to see it~

What are the window functions? I'm too lazy to draw, so I'll borrow lulin916 's map~~

insert image description here

  • Ordinal function: row_number() / rank() / dense_rank()
  • Distribution function: percent_rank() / cume_dist()
  • Before and after functions: lag() / lead()
  • Head and tail functions: first_val() / last_val()
  • Other functions: nth_value() / nfile()

Let's take a look at some examples:

1. Ordinal function: row_number() / rank() / dense_rank()

ROW_NUMBER(): Sequential sorting - 1, 2, 3

RANK(): Sort in parallel, skipping duplicate numbers - 1, 1, 3

DENSE_RANK(): parallel sorting, without skipping duplicate numbers - 1, 1, 2

mysql> SELECT *,ROW_NUMBER() over(ORDER BY sales desc) as pro_ROW_NUMBER,rank() over(ORDER BY sales desc) as pro_rank,DENSE_RANK() over(ORDER BY sales desc) as pro_DENSE_RANK from chh_baozipu where product='Pork and Scallion Buns';
+----+--------------------+-------+---------+----------------+----------+----------------+
| id | product | sales | month | pro_ROW_NUMBER | pro_rank | pro_DENSE_RANK |
+----+--------------------+-------+---------+----------------+----------+----------------+
| 2 | Pork and Scallion Buns | 1600 | 2021-10 | 1 | 1 | 1 |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 | 2 | 1 | 1 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 | 3 | 3 | 2 |
| 6 | Pork and Scallion Buns | 1000 | 2021-06 | 4 | 3 | 2 |
| 4 | Pork and Scallion Buns | 800 | 2021-08 | 5 | 5 | 3 |
| 1 | Pork and Scallion Buns | 600 | 2021-11 | 6 | 6 | 4 |
+----+--------------------+-------+---------+----------------+----------+----------------+
6 rows in set (0.00 sec)

As can be seen from the above examples, the three window functions serve three different typical business needs, which are sufficient to handle our sorting statistics.

In the future, when students are asked in interviews or written tests, please don’t talk about low-level solutions such as self-query nesting, otherwise don’t say you know me~ Dogs

2. Distribution function: percent_rank() / cume_dist()

This distribution function is basically not used, so we won’t talk about it. Students who are interested can search Baidu by themselves~

3. Before and after function: lag(expr,n) / lead(expr,n)

expr will be mentioned later, so let me explain it in a unified way: expr can be表達式or列名

The before and after functions are often used to return the value of expr in前n行(LAG(expr,n))或后n行(LEAD(expr,n)) current row.

Application scenario: Query the difference between the scores of the top n students and the current student's score

The inner SQL first uses the LAG() function to get the score of the previous student, and the outer SQL then subtracts the scores of the current student and the previous student to get the score difference diff.

It would be a bit awkward to use Ha Ge's test table here. . But you definitely know what I mean. Let’s check it out:

mysql> SELECT *,lag(sales,1) over win as pro_lag,lead(sales,1) over win as pro_lead from chh_baozipu WINDOW win as (PARTITION BY product ORDER BY sales desc);
+----+--------------------+-------+---------+---------+----------+
| id | product | sales | month | pro_lag | pro_lead |
+----+--------------------+-------+---------+---------+----------+
| 2 | Pork and Scallion Buns | 1600 | 2021-10 | NULL | 1600 |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 | 1600 | 1000 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 | 1600 | 1000 |
| 6 | Pork and Scallion Buns | 1000 | 2021-06 | 1000 | 800 |
| 4 | Pork and Scallion Buns | 800 | 2021-08 | 1000 | 600 |
| 1 | Pork and Scallion Buns | 600 | 2021-11 | 800 | NULL |
| 7 | Wheat stuffed buns | 700 | 2021-11 | NULL | 300 |
| 9 | Wheat stuffed buns | 300 | 2021-09 | 700 | 200 |
| 8 | Wheat stuffed buns | 200 | 2021-10 | 300 | 200 |
| 12 | Wheat stuffed buns | 200 | 2021-06 | 200 | 100 |
| 11 | Wheat stuffed buns | 100 | 2021-07 | 200 | 0 |
| 10 | Wheat stuffed buns | 0 | 2021-08 | 100 | NULL |
+----+--------------------+-------+---------+---------+----------+
12 rows in set (0.00 sec)

Here I would like to ask the students if they have noticed that this SQL is different from the previous SQL? What are the differences?

SELECT *,
lag(sales,1) over win as pro_lag,
lead(sales,1) over win as pro_lead
from chh_baozipu where product='Pork and green onion buns' 
WINDOW win as (PARTITION BY product ORDER BY sales desc);

1.把窗口提取出來設置了別名

In fact, this method brings up the window and設置別名為:win , just like we use aliases when writing SQL. This will look simpler and more comfortable, right?

Someone asked programmers what simplicity they want? Others will think your code is awesome if they can’t understand it. It's obvious that this kind of students have never been beaten by the society. When you encounter the ancestral code that appears once in a hundred years, you will understand what simplicity is (borrowing the picture from Fat Brother).

insert image description here

2. Added PARTITION BY product to the window

This keyword in the over clause means controlling the contents of the window. In the basic syntax above, I told you that there are two keywords in over:

  • partition by is to group the window contents;
  • order by is to sort the window contents after grouping;

In fact, there are more interesting ways to control the window range~~

There are two ways滑動窗口的范圍指定: row-based and range-based. I will focus on the commonly used基于行to control the window range.

The BETWEEN frame_start AND frame_end syntax is usually used to indicate a row range. frame_start and frame_end can support the following keywords to identify different dynamic row records:

  • CURRENT ROW The boundary is the current row, usually used with other range keywords
  • UNBOUNDED PRECEDING The boundary is the first row in the partition
  • UNBOUNDED FOLLOWING The boundary is the last row in the partition
  • expr PRECEDING The boundary is the current row minus the value of expr
  • expr FOLLOWING The boundary is the current row plus the value of expr

Let’s look at a few examples:

① Calculate the aggregate window function of the current row and the previous n rows (a total of n+1 rows)

In the following example, the control window size is the sum of the profits of the current month + the previous two months. Let's take a look at the effect:

SELECT *,SUM(sales) OVER win as 'Sum of profits for the past three months'
FROM chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month` ROWS 2 PRECEDING);
mysql> SELECT *,SUM(sales) OVER win as 'Sum of profits for the past three months'
    -> FROM chh_baozipu 
	-> WINDOW win as (PARTITION BY product ORDER BY `month` ROWS 2 PRECEDING);
+----+--------------------+-------+---------+--------------------------+
| id | product | sales | month | Total profit for the past three months |
+----+--------------------+-------+---------+--------------------------+
| 6 | Pork and Scallion Buns | 1000 | 2021-06 | 1000 |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 | 2600 |
| 4 | Pork and Scallion Buns | 800 | 2021-08 | 3400 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 | 3400 |
| 2 | Pork and Scallion Buns | 1600 | 2021-10 | 3400 |
| 1 | Pork and Scallion Buns | 600 | 2021-11 | 3200 |
| 12 | Wheat stuffed buns | 200 | 2021-06 | 200 |
| 11 | Wheat stuffed buns | 100 | 2021-07 | 300 |
| 10 | Wheat stuffed buns | 0 | 2021-08 | 300 |
| 9 | Wheat stuffed buns | 300 | 2021-09 | 400 |
| 8 | Wheat stuffed buns | 200 | 2021-10 | 500 |
| 7 | Wheat stuffed buns | 700 | 2021-11 | 1200 |
+----+--------------------+-------+---------+--------------------------+
12 rows in set (0.00 sec)

② Calculate the aggregate window function of the current row, the first n1 rows, and the last n2 rows

In the following example, the control window size is the sum of profits from the month before and the month after the current month. Let's take a look at the effect:

SELECT *,SUM(sales) OVER win as 'Sum of profits for the first three months' 
FROM chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month` ROWS BETWEEN n1 PRECEDING AND n2 FOLLOWING);
mysql> SELECT *,SUM(sales) OVER win as 'Sum of profits from the previous month to the next month' FROM chh_baozipu WINDOW win as (PARTITION BY product ORDER BY `month` ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING);
+----+--------------------+-------+---------+--------------------------+
| id | product | sales | month |Sum of profits from the previous month to the next month|
+----+--------------------+-------+---------+--------------------------+
| 6 | Pork and Scallion Buns | 1000 | 2021-06 | 2600 |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 | 3400 |
| 4 | Pork and Scallion Buns | 800 | 2021-08 | 3400 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 | 3400 |
| 2 | Pork and Scallion Buns | 1600 | 2021-10 | 3200 |
| 1 | Pork and Scallion Buns | 600 | 2021-11 | 2200 |
| 12 | Wheat stuffed buns | 200 | 2021-06 | 300 |
| 11 | Wheat stuffed buns | 100 | 2021-07 | 300 |
| 10 | Wheat stuffed buns | 0 | 2021-08 | 400 |
| 9 | Wheat stuffed buns | 300 | 2021-09 | 500 |
| 8 | Wheat stuffed buns | 200 | 2021-10 | 1200 |
| 7 | Wheat stuffed buns | 700 | 2021-11 | 900 |
+----+--------------------+-------+---------+--------------------------+
12 rows in set (0.00 sec)

4. Head and tail functions: FIRST_VALUE(expr), LAST_VALUE(expr)

The head and tail functions are used to: return the first or last value of expr;

Application scenario: As of now, query當前最大月收入and當前最小月收入by date sorting.

SELECT *,
FIRST_VALUE(sales) over win as 'Current maximum monthly income',
LAST_VALUE(sales) over win as 'Current minimum monthly income' 
from chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month`);
mysql> SELECT *,FIRST_VALUE(sales) over win as 'Current maximum monthly income',LAST_VALUE(sales) over win as 'Current minimum monthly income' from chh_baozipu WINDOW win as (PARTITION BY product ORDER BY `month`);
+----+--------------------+-------+---------+-----------------------+-----------------------+
| id | product | sales | month | Current maximum monthly income | Current minimum monthly income |
+----+--------------------+-------+---------+-----------------------+-----------------------+
| 6 | Pork and Scallion Buns | 1000 | 2021-06 | 1000 | 1000 |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 | 1000 | 1600 |
| 4 | Pork and Scallion Buns | 800 | 2021-08 | 1000 | 800 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 | 1000 | 1000 |
| 2 | Pork and Scallion Buns | 1600 | 2021-10 | 1000 | 1600 |
| 1 | Pork and Scallion Buns | 600 | 2021-11 | 1000 | 600 |
| 12 | Wheat stuffed buns | 200 | 2021-06 | 200 | 200 |
| 11 | Wheat stuffed buns | 100 | 2021-07 | 200 | 100 |
| 10 | Wheat stuffed buns | 0 | 2021-08 | 200 | 0 |
| 9 | Wheat stuffed buns | 300 | 2021-09 | 200 | 300 |
| 8 | Wheat stuffed buns | 200 | 2021-10 | 200 | 200 |
| 7 | Wheat stuffed buns | 700 | 2021-11 | 200 | 700 |
+----+--------------------+-------+---------+-----------------------+-----------------------+
12 rows in set (0.00 sec)

5. Other functions: nth_value() / nfile()

nfile() is not commonly used, so we will not go into details here; here we only mention NTH_VALUE(expr,n) function;

NTH_VALUE Purpose: Returns the value of the nth expr in the window.

Application scenario: As of now, the profits of the second and third place in the monthly profit list of Chen Haha Steamed Bun Shop are displayed.

SELECT *,
nth_value(sales,2) over win as 'currently the second highest monthly income',
nth_value(sales,3) over win as 'Currently ranked third in monthly income' 
from chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month`);
mysql> SELECT *,nth_value(sales,2) over win as 'Currently ranked second in monthly income',nth_value(sales,3) over win as 'Currently ranked third in monthly income' from chh_baozipu WINDOW win as (PARTITION BY product ORDER BY `month`);
+----+--------------------+-------+---------+--------------------------------+--------------------------------+
| id | product | sales | month | Current second-ranked monthly income | Current third-ranked monthly income |
+----+--------------------+-------+---------+--------------------------------+--------------------------------+
| 6 | Pork and Scallion Buns | 1000 | 2021-06 | NULL | NULL |
| 5 | Pork and Scallion Buns | 1600 | 2021-07 | 1600 | NULL |
| 4 | Pork and Scallion Buns | 800 | 2021-08 | 1600 | 800 |
| 3 | Pork and Scallion Buns | 1000 | 2021-09 | 1600 | 800 |
| 2 | Pork and Scallion Buns | 1600 | 2021-10 | 1600 | 800 |
| 1 | Pork and Scallion Buns | 600 | 2021-11 | 1600 | 800 |
| 12 | Wheat stuffed buns | 200 | 2021-06 | NULL | NULL |
| 11 | Wheat stuffed buns | 100 | 2021-07 | 100 | NULL |
| 10 | Wheat stuffed buns | 0 | 2021-08 | 100 | 0 |
| 9 | Wheat stuffed buns | 300 | 2021-09 | 100 | 0 |
| 8 | Wheat stuffed buns | 200 | 2021-10 | 100 | 0 |
| 7 | Wheat stuffed buns | 700 | 2021-11 | 100 | 0 |
+----+--------------------+-------+---------+--------------------------------+--------------------------------+
12 rows in set (0.00 sec)

Chapter Summary

That’s all I have to say about window functions. Window functions are something new that I discovered after I came into contact with MySQL 8. Suddenly I feel that the MySQL development team is still very creative. Each version adds some new ways of playing, which are of course very practical. I hope that MySQL 9.0 will bring us more surprises.

This is the end of this article about the specific use of MySQL window functions. For more relevant MySQL window function content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Mysql8.0 uses window functions to solve sorting problems
  • MySQL 8.0 Window Function Introduction and Summary
  • Case analysis of SQL window functions to achieve efficient paging query
  • Quickly learn about SQL window functions

<<:  Analysis of the principles of several ways to imitate the magnifying glass effect in CSS3

>>:  Vue+ElementUI implements paging function-mysql data

Recommend

A practical record of an accident caused by MySQL startup

Table of contents background How to determine whe...

JavaScript implements front-end countdown effect

This article shares the specific code of JavaScri...

Database backup in docker environment (postgresql, mysql) example code

Table of contents posgresql backup/restore mysql ...

Detailed explanation of various HTTP return status codes

When a request is sent to your server to display ...

Detailed explanation of the middleman mode of Angular components

Table of contents 1. Middleman Model 2. Examples ...

Customization Method of Linux Peripheral File System

Preface Generally speaking, when we talk about Li...

Html Select option How to make the default selection

Adding the attribute selected = "selected&quo...

HTML code that can make IE freeze

We simply need to open any text editor, copy the f...

Detailed explanation of nginx-naxsi whitelist rules

Whitelist rule syntax: BasicRule wl:ID [negative]...

Eight examples of how Vue implements component communication

Table of contents 1. Props parent component ---&g...

CSS code abbreviation div+css layout code abbreviation specification

Using abbreviations can help reduce the size of yo...

CentOS 6.5 i386 installation MySQL 5.7.18 detailed tutorial

Most people compile MySQL and put it in the syste...

Vue uses vue meta info to set the title and meta information of each page

title: vue uses vue-meta-info to set the title an...