Detailed explanation of MySQL three-value logic and NULL

Detailed explanation of MySQL three-value logic and NULL

What is NULL

NULL is used to represent missing values ​​or unknown data, not a value of a specific type. The NULL value in the data table means that the field where the value is located is empty. A field with a NULL value has no value. It is especially important to understand that the NULL value is different from 0 or an empty string.

Two kinds of NULL

This statement may seem strange to you, because there is only one kind of NULL in SQL. However, when discussing NULL, we generally think of it as two types: "unknown" and "not applicable, inapplicable".

Take the example of “not knowing what color the eyes of a person wearing sunglasses are”. This person’s eyes definitely have color, but if he doesn’t take off his glasses, others won’t know what color his eyes are. This is called the unknown. And “I don’t know what color the refrigerator’s eyes are” is “Not applicable”. Since refrigerators don't have eyes, the property of "eye color" doesn't apply to refrigerators. Statements like “the color of the refrigerator’s eyes” are as meaningless as statements like “the volume of a circle” or “the number of births a man gives birth to.” Normally, we are used to saying “I don’t know”, but there are many kinds of “I don’t know”. NULL in this case of "not applicable" is closer to "meaningless" than "indeterminate" in semantics. To summarize here: "Unknown" means "although we don't know it now, we can know it after certain conditions are met"; and "Not applicable" means "we can't know it no matter how hard we try."

EF Codd, the inventor of the relational model, first gave this classification. The following is his classification of "lost information"

Why must it be written as "IS NULL" instead of "= NULL"?

I believe many people have this confusion, especially those who have just learned SQL. Let's look at a specific case, assuming we have the following table and data

DROP TABLE IF EXISTS t_sample_null;
CREATE TABLE t_sample_null (
    id INT(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
    name VARCHAR(50) NOT NULL COMMENT 'Name',
    remark VARCHAR(500) COMMENT 'Remarks',
    primary key(id)
) COMMENT 'NULL example';

INSERT INTO t_sample_null(name, remark)
VALUES('zhangsan', '张三'),('李四', NULL);

We want to query the records with NULL comments (NULL is not a correct term, but we are used to it in daily life, see below for details). How to query? Many novices will write the following SQL:

-- SQL does not report an error, but no results can be found SELECT * FROM t_sample_null WHERE remark = NULL; 

There is no error when executing, but we cannot find the result we want. Why is that? Let's put this question aside for now and look at the next

Three-valued logic

This three-valued logic is not a ternary operation, but refers to three logical values. Some people may have doubts. Aren’t there only true and false logical values? Where does the third one come from? When saying this, we need to pay attention to the environment we are in. In mainstream programming languages ​​​​(C, JAVA, Python, JS, etc.), there are indeed only 2 logical values ​​​​but in SQL there is a third logical value: unknown. This is somewhat similar to what we usually say: right, wrong, I don’t know.

The logical value unknown and UNKNOWN, which is a form of NULL, are different things. The former is a clear Boolean logical value, while the latter is neither a value nor a variable. To distinguish them easily, the former is represented by lowercase letters unknown and the latter by uppercase letters UNKNOWN. To help you understand the difference between the two, let's look at a simple equation like x=x. When x is a logical value unknown, x=x is considered true, and when x is UNKNOWN, it is considered unknown.

-- This is a clear logical value comparison unknown = unknown → true

-- This is equivalent to NULL = NULL
UNKNOWN = UNKNOWN → unknown

Logical value table of three-valued logic

NOT

AND

OR

The blue part in the figure is the operation unique to three-valued logic, which does not exist in two-valued logic. All other SQL predicates can be composed of these three logical operations. In this sense, these logical tables can be said to be the matrix of SQL.

For NOT, it is easy to remember because the logical value table is relatively simple; however, for AND and OR, since there are many logical values ​​that can be combined, it is very difficult to remember them all. To make it easier to remember, please note that there is the following priority order between these three logical values.

AND case: false > unknown > true

OR situation: true > unknown > false

The logical value with higher priority will determine the calculation result. For example, true AND unknown, because unknown has a higher priority, the result is unknown. If it is true OR unknown, since true has a higher priority, the result is true. Remembering this order will make it easier to perform three-valued logic operations. It is important to remember that when an AND operation includes unknown, the result will definitely not be true (conversely, if the result of the AND operation is true, both parties involved in the operation must be true).

-- Assume a = 2, b = 5, c = NULL, the logical value of the following expression is as follows: a < b AND b > c → unknown
a > b OR b < c → unknown
a < b OR b < c → true
NOT (b <> c) → unknown

“IS NULL” instead of “= NULL”

Let's go back to the question: Why must we write "IS NULL" instead of "= NULL"?

The result of a comparison predicate on NULL is always unknown. The query results will only include rows where the judgment result in the WHERE clause is true, and will not include rows where the judgment result is false or unknown. Not only the equal sign, but also other comparison predicates for NULL will give the same result. So no matter whether remark is NULL or not, the comparison result is unknown, and no result will ever be returned. The following formulas will be judged as unknown

-- The following formulas will be judged as unknown
= NULL
> NULL
< NULL
<> NULL
NULL = NULL

So why can't a comparison predicate on NULL ever evaluate to true? This is because NULL is neither a value nor a variable. NULL is just a token that means "no value", and comparison predicates only apply to values. Therefore, it is meaningless to use a comparison predicate on NULL which is not a value. Statements like "the value of the column is NULL" or "NULL value" are wrong in themselves. Since NULL is not a value, it is not in the domain. If, on the other hand, one considers NULL to be a value, then we can think about it the other way around: what type of value is it? The values ​​in a relational database must be of a certain type, such as character or numeric. So, if NULL is a value, it must be of some type.

There are two reasons why NULL is easily considered a value. The first is that in high-level programming languages, NULL is defined as a constant (many languages ​​define it as the integer 0), which leads to our confusion. However, NULL in SQL is completely different from NULL in other programming languages. The second reason is that a predicate like IS NULL is made up of two words, so it is easy for us to treat IS as a predicate and NULL as a value. Especially since there are predicates like IS TRUE and IS FALSE in SQL, it is not unreasonable to think so by analogy. But as books on standard SQL remind people, we should think of IS NULL as a predicate. Therefore, it might be more appropriate to write IS_NULL.

Gentle Trap

Comparing predicates with NULL

The law of the excluded middle does not hold. The law of the excluded middle states that in the same thinking process, two contradictory ideas cannot be false at the same time, and one must be true, that is, "either A or not A"

Suppose we have a student table: t_student

DROP TABLE IF EXISTS t_student;
CREATE TABLE t_student (
    id INT(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
    name VARCHAR(50) NOT NULL COMMENT 'Name',
    age INT(3) COMMENT 'age',
    remark VARCHAR(500) NOT NULL DEFAULT '' COMMENT 'Remarks',
    primary key(id)
) COMMENT 'student information';

INSERT INTO t_student(name, age)
VALUE('zhangsan', 25),('wangwu', 60),('bruce', 32),('yzb', NULL),('boss', 18);

SELECT * FROM t_student;

The age of the data yzb in the table is NULL, which means that the age of yzb is unknown. In the real world, yzb is either 20 years old or not 20 years old, one of the two must be the case. This is undoubtedly a true proposition. So in the world of SQL, does the law of excluded middle still apply? Let's look at a SQL

SELECT * FROM t_student
WHERE age = 20 OR age <> 20;

At first glance, isn't this all the records in the query table? Let’s look at the actual results

yzb didn't find out, why? Let's analyze it. The age of yzb is NULL. Then the judgment steps for this record are as follows:

-- 1. John's age is NULL (unknown NULL!)
SELECT *
FROM t_student
WHERE age = NULL
OR age <> NULL;

-- 2. After using the comparison predicate on NULL, the result is unknown
SELECT *
FROM t_student
WHERE unknown
OR unknown;

-- 3. The result of unknown OR unknown is unknown (refer to the logical value table of three-valued logic)
SELECT *
FROM t_student
WHERE unknown;

The query result of the SQL statement contains only the rows where the judgment result is true. To make yzb appear in the results, you need to add the following "third condition"

-- Add 3 conditions: age is 20, or not 20, or age is unknown SELECT * FROM t_student
WHERE age = 20 
    OR age <> 20
    OR age IS NULL;

CASE Expressions and NULL

The simple CASE expression is as follows

CASE col_1
    WHEN = 1 THEN 'o'
    WHEN NULL THEN 'x'
END

This CASE expression will never return ×. This is because the second WHEN clause is an abbreviation for col_1 = NULL. As we know, the logical value of this expression is always unknown, and the judgment method of the CASE expression is the same as that of the WHERE clause, which only recognizes conditions with a logical value of true. The correct way to write it is to use a search CASE expression like this:

CASE WHEN col_1 = 1 THEN 'o'
    WHEN col_1 IS NULL THEN 'x'
END

NOT IN and NOT EXISTS are not equivalent

When we optimize the performance of SQL statements, a trick we often use is to rewrite IN into EXISTS. This is an equivalent rewrite and there is no problem. However, when NOT IN is rewritten as NOT EXISTS, the results may not be the same.

Let's take an example. We have two tables: t_student_A and t_student_B, which represent students in class A and class B respectively.

DROP TABLE IF EXISTS t_student_A;
CREATE TABLE t_student_A (
    id INT(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
    name VARCHAR(50) NOT NULL COMMENT 'Name',
    age INT(3) COMMENT 'age',
    city ​​VARCHAR(50) NOT NULL COMMENT 'City',
    remark VARCHAR(500) NOT NULL DEFAULT '' COMMENT 'Remarks',
    primary key(id)
) COMMENT 'student information';

INSERT INTO t_student_A(name, age, city)
VALUE
('zhangsan', 25, 'Shenzhen City'),('wangwu', 60, 'Guangzhou City'),
('bruce', 32, 'Beijing'),('yzb', NULL, 'Shenzhen'),
('boss', 43, 'Shenzhen City');

DROP TABLE IF EXISTS t_student_B;
CREATE TABLE t_student_B (
    id INT(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
    name VARCHAR(50) NOT NULL COMMENT 'Name',
    age INT(3) COMMENT 'age',
    city ​​VARCHAR(50) NOT NULL COMMENT 'City',
    remark VARCHAR(500) NOT NULL DEFAULT '' COMMENT 'Remarks',
    primary key(id)
) COMMENT 'student information';

INSERT INTO t_student_B(name, age, city)
VALUE
('Ma Huateng', 45, 'Shenzhen City'),('Ma San', 25, 'Shenzhen City'),
('Jack Ma', 43, 'Hangzhou'),('Robin Li', 41, 'Shenzhen'),
('young people', 25, 'Shenzhen');

 * FROM t_student_B;

Requirement: Query the students in Class B who are different in age from the students in Class A who live in Shenzhen, that is, query: Ma Huateng and Li Yanhong. How should this SQL be written, like this?

-- Find students in class B who are different in age from students in class A who live in Shenzhen?
SELECT * FROM t_student_B
WHERE age NOT IN (
    SELECT age FROM t_student_A 
    WHERE city = 'Shenzhen'
);

Let's look at the execution results

We found that the result was empty and no data could be found. Why is this? Here NULL starts to cause trouble again. Let's take a look at what happens step by step.

-- 1. Execute a subquery to obtain the age list SELECT * FROM t_student
WHERE age NOT IN(43, NULL, 25);

-- 2. Rewrite NOT IN using NOT and IN equivalently
SELECT * FROM t_student
WHERE NOT age IN (43, NULL, 25);

-- 3. Rewrite the predicate IN using OR equivalence
SELECT * FROM t_student
WHERE NOT ( (age = 43) OR (age = NULL) OR (age = 25) );

-- 4. Use De Morgan's laws to rewrite SELECT * FROM t_student
WHERE NOT (age = 43) AND NOT (age = NULL) AND NOT (age = 25);

-- 5. Use <> to rewrite NOT and =
SELECT * FROM t_student
WHERE (age <> 43) AND (age <> NULL) AND (age <> 25);

-- 6. After using <> for NULL, the result is unknown
SELECT * FROM t_student
WHERE (age <> 43) AND unknown AND (age <> 25);

-- 7. If the AND operation contains unknown, the result is not true (refer to the logical value table of three-valued logic)
SELECT * FROM t_student
WHERE false or unknown;

It can be seen that after a series of transformations, no record is judged as true in the WHERE clause. That is to say, if there is NULL in the selected column of the table used in the NOT IN subquery, the query result of the entire SQL statement will always be empty. This is a terrible phenomenon!

To get the correct result, we need to use the EXISTS predicate

-- Correct SQL statement: Ma Huateng and Li Yanhong will be queried in SELECT * FROM t_student_B B
WHERE NOT EXISTS ( 
    SELECT * FROM t_student_A A
    WHERE B.age = A.age
    AND A.city = 'Shenzhen' 
);

The execution results are as follows

Similarly, let's look at how this SQL statement handles rows where age is NULL.

-- 1. Perform a comparison operation with NULL in the subquery, and A.age is NULL
SELECT * FROM t_student_B B
WHERE NOT EXISTS (
    SELECT * FROM t_student_A A
    WHERE B.age = NULL
    AND A.city = 'Shenzhen' 
);

-- 2. After using "=" for NULL, the result is unknown
SELECT * FROM t_student_B B
WHERE NOT EXISTS (
    SELECT * FROM t_student_A A
    WHERE unknown
    AND A.city = 'Shenzhen' 
);

-- 3. If the AND operation contains unknown, the result will not be true
SELECT * FROM t_student_B B
WHERE NOT EXISTS (
    SELECT * FROM t_student_A A
    WHERE false or unknown
);

-- 4. The subquery returns no results, so conversely, NOT EXISTS is true
SELECT * FROM t_student_B B
WHERE true;

In other words, yzb was treated as "a person whose age is different from anyone else's." EXISTS will only return true or false, never unknown. Therefore, there is the confusing phenomenon that IN and EXISTS can be used interchangeably, but NOT IN and NOT EXISTS cannot be used interchangeably.

There are some other pitfalls, such as: qualified predicates and NULL, qualified predicates and extreme value functions are not equivalent, aggregate functions and NULL, etc.

Summarize

1. NULL is used to represent missing values ​​or unknown data. It is not a value of a specific type and cannot be used with predicates.

2. The result of using a predicate on NULL is unknown. When unknown is involved in logical operations, SQL will not run as expected. 3. IS NULL is a predicate, not: IS is the predicate, NULL is the value; similar to IS TRUE and IS FALSE. 4. To solve the various problems caused by NULL, the best way should be to add NOT NULL constraints to the table to try to exclude NULL.

The above is a detailed explanation of MySQL three-valued logic and NULL. For more information about MySQL three-valued logic and NULL, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Why should MySQL fields use NOT NULL?
  • A small problem about null values ​​in MySQL
  • Some common mistakes with MySQL null
  • Detailed explanation of the difference between MySQL null and not null and null and empty value''''''''
  • Solution to MySQL IFNULL judgment problem
  • Distinguish between null value and empty character ('''') in MySQL
  • Detailed explanation of how to write mysql not equal to null and equal to null
  • Mysql NULL caused the pit
  • Summary of knowledge points related to null (IFNULL, COALESCE and NULLIF) in MySQL

<<:  Detailed description of common events and methods of html text

>>:  Understanding of the synchronous or asynchronous problem of setState in React

Recommend

Use personalized search engines to find the personalized information you need

Many people now live on the Internet, and searchin...

CocosCreator Getting Started Tutorial: Making Your First Game with TS

Table of contents premise TypeScript vs JavaScrip...

How to set static IP in CentOS7 on VirtualBox6 and what to note

Install CentOS 7 after installing VirtualBox. I w...

Detailed explanation of Nginx version smooth upgrade solution

Table of contents background: Nginx smooth upgrad...

A pitfall and solution of using fileReader

Table of contents A pitfall about fileReader File...

How to check where the metadata lock is blocked in MySQL

How to check where the metadata lock is blocked i...

One question to understand multiple parameters of sort command in Linux

The sort command is very commonly used, but it al...

A brief discussion on Linux signal mechanism

Table of contents 1. Signal List 1.1. Real-time s...

33 ice and snow fonts recommended for download (personal and commercial)

01 Winter Flakes (Individual only) 02 Snowtop Cap...

Detailed explanation of the basic use of react-navigation6.x routing library

Table of contents react-native project initializa...