How to fix a LEFT JOIN that does not take effect in MySQL, and precautions for its use

MySQL LEFT JOIN does not take effect, and how to use it correctly

While writing SQL today, I found that a LEFT JOIN did not return all the rows from the left table, which frustrated me for a while. After studying it carefully, I realized I had made a basic mistake (I am a beginner).

This is the original SQL:

This query does not return all the rows in the tb_line table, which is embarrassing...

Later I learned that a multi-table query conceptually builds a temporary table before the WHERE clause is applied.

The ON condition is applied while that temporary table is being built. With LEFT JOIN, every row of the left table is kept whether or not the condition is true; where it is false, the right table's columns are filled with NULL.

The WHERE condition filters rows after the temporary table has been generated.

In the original query, I put the filter on the tb_vehicle table in the WHERE clause. The NULL-filled rows produced by the LEFT JOIN are then removed by the WHERE condition v.del_flag='0' (because del_flag is NULL in those rows).

So although I wrote LEFT JOIN, the result of writing it this way is the same as with INNER JOIN.

The correct SQL is as follows:

Here the filter on tb_vehicle is applied in the ON clause, while the temporary table is being built, so all the rows of the left table are still returned.
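The difference can be reproduced with a minimal sketch. Only the table names tb_line and tb_vehicle, the alias v, and the del_flag column come from the text above; the other columns (id, name, line_id) and all the rows are hypothetical. Python's built-in sqlite3 module is used as a stand-in for MySQL, on the assumption that ON and WHERE behave the same way for a LEFT JOIN in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: only the table names and del_flag come from the article.
    CREATE TABLE tb_line (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tb_vehicle (id INTEGER PRIMARY KEY, line_id INTEGER, del_flag TEXT);
    INSERT INTO tb_line VALUES (1, 'Line A'), (2, 'Line B'), (3, 'Line C');
    -- Line A has an active vehicle, Line B only a deleted one, Line C none.
    INSERT INTO tb_vehicle VALUES (10, 1, '0'), (11, 2, '1');
""")

# Right-table filter in WHERE: the NULL-filled rows are discarded afterwards.
where_rows = conn.execute("""
    SELECT l.name, v.id
    FROM tb_line l LEFT JOIN tb_vehicle v ON v.line_id = l.id
    WHERE v.del_flag = '0'
    ORDER BY l.id
""").fetchall()

# The same query written as an INNER JOIN, for comparison.
inner_rows = conn.execute("""
    SELECT l.name, v.id
    FROM tb_line l INNER JOIN tb_vehicle v ON v.line_id = l.id
    WHERE v.del_flag = '0'
    ORDER BY l.id
""").fetchall()

# Right-table filter moved into ON: every row of tb_line survives.
on_rows = conn.execute("""
    SELECT l.name, v.id
    FROM tb_line l LEFT JOIN tb_vehicle v
        ON v.line_id = l.id AND v.del_flag = '0'
    ORDER BY l.id
""").fetchall()

print(where_rows)  # [('Line A', 10)]
print(inner_rows)  # [('Line A', 10)]
print(on_rows)     # [('Line A', 10), ('Line B', None), ('Line C', None)]
```

With the filter in WHERE, the LEFT JOIN and the INNER JOIN return exactly the same rows; only the ON version keeps Line B and Line C.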

Summary:

When a LEFT JOIN needs a filter condition, consider carefully whether the condition belongs after ON or after WHERE.

MySQL LEFT JOIN pitfall guide

Phenomenon

LEFT JOIN is very common in MySQL queries: how many comments an article in a blog has, how many reviews a product in a shop has, how many likes a comment has, and so on. However, if you are unfamiliar with how JOIN, ON, and WHERE interact, the query results may not match expectations, so today I will summarize the pitfalls to help you avoid them.

Here I will give a scenario and ask two questions. If you can answer them correctly, you don’t need to read this article.

Suppose there is a class management application with a table classes, which stores all classes, and a table students, which stores all students. The specific data is as follows:

SELECT * FROM classes;

SELECT * FROM students;

Now there are two requirements:

1. Find the name of each class and its number of female students

2. Find the total number of students in Class 1

For requirement 1, most people quickly come up with one of the following two SQL statements. Which one is correct?

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    and s.gender = 'F'
    group by c.name

or

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    where s.gender = 'F'
    group by c.name

For requirement 2, most people quickly come up with one of the following two SQL statements. Which one is correct?

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    where c.name = 'Class 1' 
    group by c.name

or

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    and c.name = 'Class 1' 
    group by c.name

Please don't scroll down! Give your own answer first; the correct answer is below.

~

~

~

The answer is that the first statement is correct for both requirements. To understand why, you need to understand how MySQL executes a left join, which is covered in the next section.

Root cause

MySQL uses a nested loop approach to handle left joins. Take the following statement as an example:

SELECT * FROM LT LEFT JOIN RT ON P1(LT,RT) WHERE P2(LT,RT)

P1 is the ON filter condition, which is treated as TRUE if absent. P2 is the WHERE filter condition, which is also treated as TRUE if absent. The execution logic of this statement can be described as:

FOR each row lt in LT {  // Traverse each row of the left table
  BOOL b = FALSE;
  FOR each row rt in RT such that P1(lt, rt) {  // Traverse each row of the right table that meets the join condition
    IF P2(lt, rt) { t := lt||rt; }  // If the where filter condition is satisfied, merge the rows and output the result
    b = TRUE;  // lt has a corresponding row in RT
  }
  IF (!b) {  // After traversing RT, lt has no corresponding row in RT, so try to pad it with NULL
    IF P2(lt, NULL) { t := lt||NULL; }  // If the where filter condition still holds after padding, output lt padded with NULL
  }
}

Of course, in practice MySQL uses buffers to optimize this and reduce the number of row comparisons, but that does not change the key execution flow and is outside the scope of this article.

From this pseudocode, we can see two points:

1. To restrict the right table, you must do it in the ON condition. Doing it in the WHERE condition can drop data: rows of the left table that have no matching row in the right table disappear from the final result, which contradicts what we expect from a left join. For such a left-table row, b=FALSE after the right table has been traversed, so we try to pad it with NULL. But P2 now restricts the right table's columns; if NULL does not satisfy P2 (it generally does not, unless the condition is IS NULL), the row is not added to the final result, and the result is incomplete. The same happens to a left-table row whose matching rows all fail P2: b=TRUE, so the NULL padding is never even attempted.

2. If there is no WHERE condition, then whatever restriction the ON condition places on the left table, every row of the left table still produces at least one result row. A left-table row rejected by the ON condition has no corresponding row in the right table, so b=FALSE after the traversal and a NULL-padded row is output, which is redundant. Therefore, the left table must be filtered in WHERE.
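Both points can be checked with a small Python transcription of the pseudocode. This is only a sketch of the logical flow, not of MySQL's actual implementation; the function name left_join, the predicate helpers, and the two-class sample data are all hypothetical.

```python
def left_join(lt_rows, rt_rows, p1=None, p2=None):
    """Nested-loop LEFT JOIN following the pseudocode: p1 is ON, p2 is WHERE."""
    p1 = p1 or (lambda lt, rt: True)  # a missing ON condition counts as TRUE
    p2 = p2 or (lambda lt, rt: True)  # a missing WHERE condition counts as TRUE
    out = []
    for lt in lt_rows:
        matched = False                    # the flag b in the pseudocode
        for rt in rt_rows:
            if not p1(lt, rt):
                continue
            if p2(lt, rt):
                out.append((lt, rt))
            matched = True
        if not matched and p2(lt, None):   # pad with NULL, then apply WHERE
            out.append((lt, None))
    return out


# Hypothetical data: a class is its name, a student is (name, class, gender).
classes = ["Class 1", "Class 2"]
students = [("Ann", "Class 1", "F"), ("Bob", "Class 2", "M")]

def same_class(c, s):
    return s[1] == c

def is_female(c, s):
    # A NULL student row never satisfies gender = 'F'.
    return s is not None and s[2] == "F"

def is_class_1(c, s):
    return c == "Class 1"

def names(rows):
    return [(c, s[0] if s else None) for c, s in rows]

# Point 1: a right-table filter in WHERE loses Class 2; in ON it is kept.
right_in_where = names(left_join(classes, students, same_class, is_female))
right_in_on = names(left_join(
    classes, students, lambda c, s: same_class(c, s) and is_female(c, s)))

# Point 2: a left-table filter in ON leaves a redundant Class 2 row.
left_in_on = names(left_join(
    classes, students, lambda c, s: same_class(c, s) and is_class_1(c, s)))
left_in_where = names(left_join(classes, students, same_class, is_class_1))

print(right_in_where)  # [('Class 1', 'Ann')]
print(right_in_on)     # [('Class 1', 'Ann'), ('Class 2', None)]
print(left_in_on)      # [('Class 1', 'Ann'), ('Class 2', None)]
print(left_in_where)   # [('Class 1', 'Ann')]
```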

The following are the results and error causes of the incorrect statements for the two requirements:

Requirement 1: because the WHERE condition restricts the right table, data is missing (Class 4 should appear with a result of 0).

Requirement 2: because the ON condition restricts the left table, there is redundant data (the other classes also appear in the result, and their values are wrong).
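The four statements can also be run side by side. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL and made-up sample rows, chosen so that Class 4 has no female students; the column layout of classes and students is inferred from the queries (id, name, class_id, gender). An ORDER BY is appended only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; Class 4 has no female students.
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (name TEXT, class_id INTEGER, gender TEXT);
    INSERT INTO classes VALUES
        (1, 'Class 1'), (2, 'Class 2'), (3, 'Class 3'), (4, 'Class 4');
    INSERT INTO students VALUES
        ('Amy', 1, 'F'), ('Ben', 1, 'M'), ('Cara', 2, 'F'), ('Gwen', 2, 'F'),
        ('Dina', 3, 'F'), ('Evan', 3, 'M'), ('Finn', 4, 'M');
""")

BASE = """SELECT c.name, count(s.name) AS num
          FROM classes c LEFT JOIN students s ON s.class_id = c.id """

def run(tail):
    # tail is the part of each statement that differs between the variants.
    return conn.execute(BASE + tail + " GROUP BY c.name ORDER BY c.name").fetchall()

req1_on = run("AND s.gender = 'F'")        # requirement 1, first statement
req1_where = run("WHERE s.gender = 'F'")   # requirement 1, second statement
req2_where = run("WHERE c.name = 'Class 1'")  # requirement 2, first statement
req2_on = run("AND c.name = 'Class 1'")       # requirement 2, second statement

print(req1_on)     # [('Class 1', 1), ('Class 2', 2), ('Class 3', 1), ('Class 4', 0)]
print(req1_where)  # [('Class 1', 1), ('Class 2', 2), ('Class 3', 1)]
print(req2_where)  # [('Class 1', 2)]
print(req2_on)     # [('Class 1', 2), ('Class 2', 0), ('Class 3', 0), ('Class 4', 0)]
```

With this data, the second statement for requirement 1 loses Class 4, and the second statement for requirement 2 returns three extra classes with a count of 0.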

Summary

From the phenomenon and analysis above, we can conclude: in a LEFT JOIN statement, filters on the left table must go in the WHERE condition and filters on the right table must go in the ON condition, so that the result has neither too many rows nor too few.

SQL may seem simple, but it contains many details and principles. A small misunderstanding can make the results differ from what you expect, so pay attention to these details and principles to avoid mistakes at critical moments.

The above is my personal experience. I hope it can serve as a useful reference.
