How to fix a MySQL LEFT JOIN that seems not to work, and precautions for using it

Why a MySQL LEFT JOIN seems not to work, and how to use it correctly

While writing SQL today, I found that a LEFT JOIN did not return all the rows of the left table, which frustrated me for a while. After looking into it carefully, I realized I had made a basic mistake (I am still a rookie).

This was the original SQL:

This query could not retrieve all the rows of the tb_line table, which was embarrassing.

Later I learned that in a multi-table query, the join conceptually produces an intermediate (temporary) table first, and the WHERE clause is applied only after that.

ON is the condition applied while this temporary table is being built. With a LEFT JOIN, every row of the left table is returned whether or not the ON condition is satisfied; when no right-table row satisfies it, the right-table columns are filled with NULL.

WHERE is the filter applied after the temporary table has been generated.

In the first figure, I put the filter on the tb_vehicle table in the WHERE clause. The rows that the LEFT JOIN padded with NULL (because no vehicle matched) are then removed by WHERE v.del_flag = '0', since their del_flag is NULL and NULL never equals '0'.

So although I wrote a LEFT JOIN, the result was in fact the same as an INNER JOIN.
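The original query is not reproduced in the text, so the following is only a minimal sketch of the same pattern under assumed names: Python's built-in sqlite3 module stands in for MySQL, and the tb_vehicle.line_id and tb_line.name columns, as well as all rows, are hypothetical. It prints the intermediate LEFT JOIN result (including the NULL-padded row) and then shows that adding WHERE v.del_flag = '0' returns the same rows as an INNER JOIN.

```python
import sqlite3

# SQLite stands in for MySQL. Hypothetical schema and rows: the original
# query is not shown, so tb_vehicle.line_id and tb_line.name are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_line (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tb_vehicle (id INTEGER PRIMARY KEY, line_id INTEGER, del_flag TEXT);
    INSERT INTO tb_line VALUES (1, 'Line A'), (2, 'Line B'), (3, 'Line C');
    -- Line A: one active and one deleted vehicle; Line B: one deleted; Line C: none
    INSERT INTO tb_vehicle VALUES (1, 1, '0'), (2, 1, '1'), (3, 2, '1');
""")

# The intermediate ("temporary") table produced by the LEFT JOIN before WHERE runs.
intermediate = conn.execute("""
    SELECT l.name, v.id, v.del_flag
    FROM tb_line l LEFT JOIN tb_vehicle v ON v.line_id = l.id
    ORDER BY l.id, v.id
""").fetchall()

# WHERE is applied afterwards: Line B's only row has del_flag '1', and
# Line C's NULL-padded row fails because NULL = '0' is not TRUE.
left_join_where = conn.execute("""
    SELECT l.name, v.id
    FROM tb_line l LEFT JOIN tb_vehicle v ON v.line_id = l.id
    WHERE v.del_flag = '0'
    ORDER BY l.id, v.id
""").fetchall()

# The same filter with an INNER JOIN returns exactly the same rows.
inner_join_where = conn.execute("""
    SELECT l.name, v.id
    FROM tb_line l INNER JOIN tb_vehicle v ON v.line_id = l.id
    WHERE v.del_flag = '0'
    ORDER BY l.id, v.id
""").fetchall()

print(intermediate)
# [('Line A', 1, '0'), ('Line A', 2, '1'), ('Line B', 3, '1'), ('Line C', None, None)]
print(left_join_where)                      # [('Line A', 1)]
print(left_join_where == inner_join_where)  # True
```

Line B (only a deleted vehicle) and Line C (no vehicles at all) both vanish once WHERE runs, which is exactly the "missing left-table rows" symptom.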

The correct SQL is as follows:

Putting the filter inside ON restricts which right-table rows are joined while the temporary table is being built, so every row of the left table is still returned.
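A minimal sketch of the fix, using Python's sqlite3 module as a stand-in for MySQL, with hypothetical tables tb_line(id, name) and tb_vehicle(id, line_id, del_flag) and made-up rows: the del_flag check moves from WHERE into the ON clause.

```python
import sqlite3

# SQLite stands in for MySQL; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_line (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tb_vehicle (id INTEGER PRIMARY KEY, line_id INTEGER, del_flag TEXT);
    INSERT INTO tb_line VALUES (1, 'Line A'), (2, 'Line B'), (3, 'Line C');
    -- Line A: one active and one deleted vehicle; Line B: one deleted; Line C: none
    INSERT INTO tb_vehicle VALUES (1, 1, '0'), (2, 1, '1'), (3, 2, '1');
""")

# The del_flag check is part of ON, so it only decides which vehicles are
# joined; a line without an active vehicle is kept with NULL vehicle columns.
fixed = conn.execute("""
    SELECT l.name, v.id
    FROM tb_line l LEFT JOIN tb_vehicle v
        ON v.line_id = l.id AND v.del_flag = '0'
    ORDER BY l.id
""").fetchall()

print(fixed)  # [('Line A', 1), ('Line B', None), ('Line C', None)]
```

All three lines survive; Line A's deleted vehicle is simply not joined.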

Summary:

When you use a LEFT JOIN with filter conditions, think carefully about whether each condition belongs after ON or after WHERE.

MySQL LEFT JOIN pitfall avoidance guide

The problem

LEFT JOIN is very common in MySQL queries: counting the comments on a blog article, the reviews of a product in an online store, the likes on a comment, and so on. However, if you are not familiar with how JOIN, ON, and WHERE interact, the results may not be what you expect, so here is a summary to help you avoid the pitfalls.

Here is a scenario with two questions. If you can answer both correctly, you don't need to read the rest of this article.

Suppose there is a class management application with a table classes, which stores all classes, and a table students, which stores all students. The data is as follows:

SELECT * FROM classes;

SELECT * FROM students;

Now there are two requirements:

Find the name of each class and the number of female students in it

Find the total number of students in one class (Class 1)

For requirement 1, most people would write one of the following two queries without a second thought. Which one is correct?

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    and s.gender = 'F'
    group by c.name

or

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    where s.gender = 'F'
    group by c.name

For requirement 2, most people would write one of the following two queries without a second thought. Which one is correct?

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    where c.name = 'Class 1' 
    group by c.name

or

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    and c.name = 'Class 1' 
    group by c.name

Please don't scroll down yet! Decide on your own answers first; the correct answers are below.

~

~

~

The answer is that the first statement is correct for both requirements. To understand why, you need to know how MySQL executes a LEFT JOIN, which is explained in the next section.

Principle

MySQL uses a nested loop approach to handle left joins. Take the following statement as an example:

SELECT * FROM LT LEFT JOIN RT ON P1(LT,RT) WHERE P2(LT,RT)

P1 is the ON condition and P2 is the WHERE condition; either one is treated as TRUE if it is omitted. The execution logic of this statement can be described as:

FOR each row lt in LT {                          // traverse each row of the left table
  BOOL b = FALSE;
  FOR each row rt in RT such that P1(lt, rt) {   // right-table rows that satisfy the ON condition
    IF P2(lt, rt) { t := lt || rt; }             // WHERE satisfied: merge the rows and output them
    b = TRUE;                                    // lt has at least one matching row in RT
  }
  IF (!b) {                                      // lt had no matching row in RT
    IF P2(lt, NULL) { t := lt || NULL; }         // pad with NULLs; output only if WHERE still holds
  }
}

In practice, MySQL uses join buffers and other optimizations to reduce the number of row comparisons, but these do not change the key execution logic above and are outside the scope of this article.
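The pseudocode translates almost line for line into Python. The sketch below is an illustration only, not MySQL's actual implementation: rows are plain dicts, the hypothetical function nested_loop_left_join takes P1 and P2 as Python functions, and None plays the role of the all-NULL padding row. The usage at the end runs it on two made-up classes, where Class 2 has only a male student.

```python
from typing import Callable, Optional

Row = dict
# A predicate receives the left row and the right row; rt is None for the
# all-NULL row that LEFT JOIN uses as padding.
Predicate = Callable[[Row, Optional[Row]], bool]


def nested_loop_left_join(lt_rows: list[Row], rt_rows: list[Row],
                          p1: Predicate, p2: Predicate) -> list[tuple[Row, Optional[Row]]]:
    """Mirror the pseudocode: p1 is the ON condition, p2 the WHERE condition."""
    result = []
    for lt in lt_rows:                      # traverse each row of the left table
        b = False
        for rt in rt_rows:
            if not p1(lt, rt):              # keep only right rows satisfying ON
                continue
            if p2(lt, rt):                  # WHERE satisfied: output the merged row
                result.append((lt, rt))
            b = True                        # lt has at least one match in RT
        if not b and p2(lt, None):          # no match: pad with NULL, output only if WHERE holds
            result.append((lt, None))
    return result


def always(lt: Row, rt: Optional[Row]) -> bool:
    return True                             # an omitted ON/WHERE condition counts as TRUE


def same_class(lt: Row, rt: Optional[Row]) -> bool:
    return rt is not None and rt["class_id"] == lt["id"]


def is_female(lt: Row, rt: Optional[Row]) -> bool:
    # Like SQL's NULL = 'F', the NULL-padded row never satisfies this test
    return rt is not None and rt["gender"] == "F"


def summary(rows: list[tuple[Row, Optional[Row]]]) -> list[tuple[str, bool]]:
    # (class name, whether the right side is NULL padding)
    return [(lt["name"], rt is None) for lt, rt in rows]


classes = [{"id": 1, "name": "Class 1"}, {"id": 2, "name": "Class 2"}]
students = [{"class_id": 1, "gender": "F"}, {"class_id": 2, "gender": "M"}]

# Right-table filter in WHERE: Class 2 is lost entirely.
right_in_where = nested_loop_left_join(classes, students, same_class, is_female)
# Right-table filter in ON: Class 2 is kept, padded with None.
right_in_on = nested_loop_left_join(
    classes, students, lambda lt, rt: same_class(lt, rt) and is_female(lt, rt), always)
# Left-table filter in ON: Class 2 still appears, padded with None.
left_in_on = nested_loop_left_join(
    classes, students, lambda lt, rt: same_class(lt, rt) and lt["name"] == "Class 1", always)

print(summary(right_in_where))  # [('Class 1', False)]
print(summary(right_in_on))     # [('Class 1', False), ('Class 2', True)]
print(summary(left_in_on))      # [('Class 1', False), ('Class 2', True)]
```

Note that Class 2 is lost in the first case even though it has a matching student: b becomes TRUE, so no padded row is tried, and the only joined row fails WHERE.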

From this pseudocode, we can see two points:

First, if you want to restrict the right table, you must do it in the ON condition. Doing it in WHERE can lose rows: left-table rows that have no matching row in the right table disappear from the final result, which contradicts what a LEFT JOIN is supposed to do. For such a row, b is still FALSE after the right table has been traversed, so MySQL tries to pad the right side with NULL. But P2 now restricts right-table columns, and NULL generally does not satisfy such a condition (unless it is an IS NULL test), so the padded row is not added to the final result. The same happens when a left-table row does have matches but none of them satisfies P2: b is already TRUE, so no NULL-padded row is produced either.

Second, if there is no WHERE condition, then no matter how the ON condition restricts the left table, every row of the left table produces at least one row in the result. When a left-table row fails the ON condition, no right-table row matches it, so b is still FALSE after the traversal and a NULL-padded row is generated, which is unwanted here. Therefore, filters on the left table must go in WHERE.

Below are the results of the incorrect statements for the two requirements, and the reasons they are wrong:

Requirement 1

Requirement 2

Requirement 1: because the WHERE clause restricts the right table, rows are missing (Class 4 should appear with a count of 0).

Requirement 2: because the ON condition restricts the left table, the result contains extra rows (the other classes also appear, each with a count of 0, even though only Class 1 was requested).
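To reproduce these two errors, the following sketch runs all four queries against hypothetical data, using Python's sqlite3 module as a stand-in for MySQL (both evaluate ON while joining and WHERE afterwards, which is all these queries depend on). The rows are made up, and Class 4 is given only a male student so that it should report 0 female students. ORDER BY c.name is added solely to make the output order deterministic.

```python
import sqlite3

# SQLite stands in for MySQL; all rows are hypothetical.
# Class 4 deliberately has no female students.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (name TEXT, gender TEXT, class_id INTEGER);
    INSERT INTO classes VALUES (1, 'Class 1'), (2, 'Class 2'), (3, 'Class 3'), (4, 'Class 4');
    INSERT INTO students VALUES
        ('Alice', 'F', 1), ('Bob', 'M', 1), ('Carol', 'F', 1),
        ('Dave', 'M', 2), ('Eve', 'F', 2), ('Fay', 'F', 3), ('Gus', 'M', 4);
""")


def run(sql: str) -> list[tuple]:
    return conn.execute(sql).fetchall()


# Requirement 1 (right-table filter): correct version puts it in ON.
req1_on = run("""SELECT c.name, count(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id AND s.gender = 'F'
    GROUP BY c.name ORDER BY c.name""")
req1_where = run("""SELECT c.name, count(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id
    WHERE s.gender = 'F'
    GROUP BY c.name ORDER BY c.name""")

# Requirement 2 (left-table filter): correct version puts it in WHERE.
req2_where = run("""SELECT c.name, count(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id
    WHERE c.name = 'Class 1'
    GROUP BY c.name ORDER BY c.name""")
req2_on = run("""SELECT c.name, count(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id AND c.name = 'Class 1'
    GROUP BY c.name ORDER BY c.name""")

print(req1_on)     # [('Class 1', 2), ('Class 2', 1), ('Class 3', 1), ('Class 4', 0)]
print(req1_where)  # [('Class 1', 2), ('Class 2', 1), ('Class 3', 1)]  Class 4 missing
print(req2_where)  # [('Class 1', 3)]
print(req2_on)     # [('Class 1', 3), ('Class 2', 0), ('Class 3', 0), ('Class 4', 0)]  extra rows
```

count(s.name) counts only non-NULL values, which is why the NULL-padded rows show up as 0 rather than 1.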

Summary

From the symptoms and analysis above, we can conclude: in a LEFT JOIN, filters on the left table belong in the WHERE clause and filters on the right table belong in the ON clause. That way the result has neither too many rows nor too few.

SQL looks simple, but it involves many details and principles, and a small misunderstanding can make the results differ from what you expect. Paying attention to these details helps you avoid mistakes at critical moments.

The above is my personal experience; I hope it serves as a useful reference. I also hope that you will support 123WORDPRESS.COM.
