Recently, I was adding a series of statistical functions to the backend of a certain website, and encountered many situations where counting by conditions was required. I have tried several methods, and I have briefly recorded them below for your reference. Problem Description To make the discussion easier to understand, I have simplified the issue a bit and removed a lot of the background. Once upon a time there was an emperor who had 50 concubines. These concubines gave birth to 100,000 sons for him unjustly. The emperor was very distressed. It was difficult to manage such a large number of sons. Moreover, he wanted to know how many sons each concubine had given birth to so that he could reward them according to their merits. This was very difficult to do. So the emperor hired a programmer to help him write a program to use a database to store the information of all his sons, so that he could use the program to count and manage them. The structure of the database is as follows:
The emperor divided his concubines into two levels, the Concubines in the Celestial Palace (numbered less than 25) and the Concubines in the Underground Palace (numbered greater than or equal to 25). He wanted to know which one had stronger fertility, the Concubines in the Celestial Palace or the Concubines in the Underground Palace. So, programmers started writing SQL Queries. Method 1: Using GROUP BY SQL Query Execution Results
Running time on 100,000 rows: 0.0335 seconds analyze The biggest problem with this GROUP BY approach is that it is impossible to distinguish the results obtained. Which of these two numbers is the number of princes born to the ladies in the Heavenly Palace, and which one is the number of princes born to the ladies in the Underground Palace? have no idea. So, even though it gives a total, it's meaningless. Therefore, in order to distinguish the statistical results, the condition mother > 24 must also be reflected as a field in the result set. The modified SQL is as follows: Execution Results
When the conditional expression is used as a field, the value of the field is the value of the conditional expression. Therefore, in our example, type = 1 means that the value of mother > 24 is 1. Therefore, the numbers in the second row represent the number of princes born to the ladies in the underground palace. After the modifications, we can see that the goddesses in the heavenly palace are slightly better. Pros and Cons The disadvantages are obvious. Since conditional expressions are used as the basis for grouping, it can only make binary divisions and is not suitable for situations where multiple categories need to be divided for statistical purposes. For example, it would be impossible to count the number of children born by concubines No. 1 to 10, No. 11 to 24, and No. 25 to 50 respectively. In addition, since GROUP BY is used, sorting is involved and the execution time is longer. I haven't found any advantages to this approach yet. Method 2: Using nested SELECT The goal can also be achieved by using nested SELECTs, counting the data under a condition in each SELECT clause, and then integrating these statistical data with a main SELECT. SQL Query SELECT ( SELECT COUNT( * ) FROM `prince` WHERE `mother` >24 ) AS `digong`, ( SELECT COUNT( * ) FROM `prince` WHERE `mother` <=24 ) AS `tiangong` Execution Results
Running time on 100,000 rows: 0.0216 seconds analyze This nested SELECT method is very intuitive. It counts the values under each condition separately and summarizes them at the end. It is easy to understand and is no different from natural language. Pros and Cons The advantage is that it is intuitive and faster than GROUP BY. Although there are three SELECT statements, which seems to be two more than the GROUP BY solution, it does not involve sorting, which saves a lot of time. The disadvantage may be that there are too many sentences, and students who are obsessed with the number of sentences may feel uncomfortable. Method 3: Using CASE WHEN The CASE WHEN statement is very powerful and can define flexible query conditions, making it very suitable for classification statistics. SQL Query SELECT COUNT( CASE WHEN `mother` >24 THEN 1 ELSE NULL END ) AS `digong`, COUNT( CASE WHEN `mother` <= 24 THEN 1 ELSE NULL END ) AS `tiangong` FROM prince Execution Results
Running time on 100,000 rows: 0.02365825 seconds analyze The key to this method is Here, COUNT and CASE WHEN are used together to achieve categorical counting. First use CASE WHEN. When the condition is met, set the field value to 1. When the condition is not met, set the field value to NULL. Then the COUNT function only counts non-NULL fields. Then the problem is solved. Pros and Cons As for the advantage, this method does not involve sorting, so the running time is comparable to that of method 2, and the number of SELECT statements is reduced to 1. The disadvantage is that the sentences are relatively long, and students who are obsessed with sentence length may feel uncomfortable. Summarize For conditional counting of certain categories, try not to use GROUP BY to avoid sorting and speed up the execution of the query. If you need to classify based on the value of a field, and the value of the field is variable, for example, the emperor wants to count the number of children born by each concubine, and he may keep marrying many concubines. In this case, using methods 2 and 3 is not very effective, and it is simpler and more convenient to use a GROUP BY. The above is the full content of this article. I hope it will be helpful for everyone’s study. I also hope that everyone will support 123WORDPRESS.COM. You may also be interested in:
|
<<: Detailed explanation of the process of setting up nvidia-docker environment on Linux server
>>: Vue realizes the sliding cross effect of the ball
What is an index? An index is a data structure th...
1. What problems did we encounter? In standard SQ...
The domestic market still has a certain demand fo...
This article shares the specific code of js to ac...
1. Environmental Preparation 1.MySQL installation...
Table of contents 1. Download 2. Deployment 3. Ng...
In the previous article, we explained how nginx r...
Introduction Recently I found that there is an AR...
I just started working a few days ago and install...
This article shares the specific code of Javascri...
The task of concurrency control in a database man...
Table of contents Image capture through svg CSS p...
The load is generally estimated during system des...
The effect to be achieved: When the mouse is plac...
After the user logs out, if the back button on the...