Detailed explanation of the usage of NULL and NOT NULL when creating tables in MySQL

Some companies' MySQL table-creation standards require that every field be NOT NULL, meaning a default value is stored when there is no real value. Strictly speaking, the rule is not that all fields are absolutely non-nullable but that they are non-nullable wherever possible, since in some cases no sensible default value exists.
So what is the reasoning behind this requirement? Storage space? The performance of insert, delete, query, and update operations? Or something else? Based on my personal understanding, this article gives a rough analysis of whether the rule is reasonable or practical.

1. Storage-based considerations

Analyzing storage requires a clear understanding of how MySQL stores data rows. Here we can borrow some conclusions directly from another article that analyzes this very clearly (it in turn refers to the book "MySQL Technical Insider: InnoDB Storage Engine").
For the default Dynamic or Compact row format, a data row is laid out as follows:
|Variable length field length list (1~2 bytes) |NULL flag (1 byte) |Record header information (5 bytes) |RowID (6 bytes) |Transaction ID (6 bytes) |Rollback pointer (7 bytes) |Row content

1. For a variable-length field that allows NULL, a NULL value is not stored and takes up no space, but the row header needs a NULL flag (one per row).
2. For a variable-length field declared NOT NULL, an empty value is stored as '', which also takes up no space. If all fields in a table are NOT NULL, the row header does not need a NULL flag.
3. If all fields are fixed-length, there is no need to store the lengths of variable-length columns. A NULL flag is still needed only if some field allows NULL.
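
As a rough illustration of the header cost in the three cases above, the sketch below computes the size of the NULL flag from the number of nullable columns. It assumes the rule given in the MySQL documentation for the COMPACT row format (one bit per nullable column, rounded up to whole bytes). The function name null_bitmap_bytes is made up for this example.

```python
import math


def null_bitmap_bytes(nullable_columns: int) -> int:
    """Bytes taken by the NULL flag in a row header.

    Assumption: one bit per nullable column, rounded up to whole
    bytes, and nothing at all when every column is NOT NULL.
    """
    if nullable_columns < 0:
        raise ValueError("nullable_columns must be >= 0")
    return math.ceil(nullable_columns / 8)


# A table whose columns are all NOT NULL needs no NULL flag.
print(null_bitmap_bytes(0))
# A table with two nullable columns needs one byte per row.
print(null_bitmap_bytes(2))
```

Under this assumption, a table with one to eight nullable columns pays exactly the 1 byte per row mentioned above, and the cost only grows for tables with more than eight nullable columns.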

Compare the two cases, nullable versus NOT NULL DEFAULT '': if a field has no content to store, the former stores NULL and the latter stores the empty string ''. The storage space taken by the field content itself is the same.
However, if a table has nullable fields, the header of every data row needs a 1-byte NULL flag. So when storing the same data, each row takes one more byte when NULL is allowed than when all fields are NOT NULL.
This may be one of the reasons why some companies or individuals insist that "nullable fields are prohibited in all tables" (I personally disagree; try setting every field in a database to NOT NULL with a default value and see whether chaos follows).
We will not do a "micro" analysis here, but will look at the difference directly from a "macro" perspective.

Test demo

Create two tables with the same structure, except that the fields of one are NOT NULL and the fields of the other are nullable. Then use the stored procedure below to write data to both tables at the same time, with one empty row in every 10. In total, 6 million rows are written, and in every tenth row the fields are empty ('' in table a, NULL in table b).

CREATE TABLE a
(
 id INT AUTO_INCREMENT,
 c2 VARCHAR(50) NOT NULL DEFAULT '',
 c3 VARCHAR(50) NOT NULL DEFAULT '',
 PRIMARY KEY (id)
);

CREATE TABLE b
(
 id INT AUTO_INCREMENT,
 c2 VARCHAR(50),
 c3 VARCHAR(50),
 PRIMARY KEY (id)
);


CREATE DEFINER=`root`@`%` PROCEDURE `create_test_data`(
 IN `loop_cnt` INT
)
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT ''
BEGIN
 DECLARE v2 , v3 VARCHAR(36);
 
 START TRANSACTION;
 
 while loop_cnt>0 do
 SET v2 = UUID();
 SET v3 = UUID();

 
 if (loop_cnt MOD 10) = 0 then
 INSERT INTO a (c2,c3) VALUES(DEFAULT,DEFAULT);
 INSERT INTO b (c2,c3) VALUES(DEFAULT,DEFAULT);
 else
 INSERT INTO a (c2,c3) VALUES (v2,v3);
 INSERT INTO b (c2,c3) VALUES (v2,v3);
 END if ;
 
 SET loop_cnt=loop_cnt-1;
 END while;
 COMMIT;
END

Tables a and b end up with the same data, except that every tenth row holds '' in table a and NULL in table b.

Check the storage space by querying the storage information of the two tables from information_schema.TABLES.
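
The original query is not shown, so the following is only a sketch of how such a check might look. It assumes a pymysql connection like the one used later in this article. The names SIZE_QUERY and fetch_table_sizes are made up for this example. The columns are standard columns of information_schema.TABLES.

```python
# Query the size-related columns for two tables in one schema.
# For InnoDB, table_rows and avg_row_length are estimates.
SIZE_QUERY = """
SELECT table_name, table_rows, avg_row_length, data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = %s AND table_name IN (%s, %s)
ORDER BY table_name
"""


def fetch_table_sizes(conn, schema, first_table, second_table):
    """Return one row per table, using an open pymysql connection."""
    with conn.cursor() as cursor:
        cursor.execute(SIZE_QUERY, (schema, first_table, second_table))
        return cursor.fetchall()
```

With the test setup described here, a call might look like fetch_table_sizes(conn, 'db01', 'a', 'b').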

1. The one-byte difference shows up in avg_row_length: because all fields in table a are NOT NULL, each row takes one byte less than in table b.
2. The difference in total space: table a takes 662683648/1024/1024 = 631.98MB and table b takes 666877952/1024/1024 = 635.98MB.
In this case the difference is 4MB over 6 million rows, which is under 1%. In real tables, with more fields per row and larger tables, the difference will be much smaller than 1%.
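
The arithmetic behind these figures can be reproduced with a few lines. The two byte counts are the ones quoted above. The variable and function names are chosen for this example only.

```python
TABLE_A_BYTES = 662_683_648  # all fields NOT NULL
TABLE_B_BYTES = 666_877_952  # nullable fields


def to_mb(size_in_bytes):
    """Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return size_in_bytes / 1024 / 1024


diff_bytes = TABLE_B_BYTES - TABLE_A_BYTES
diff_mb = to_mb(diff_bytes)
diff_percent = diff_bytes / TABLE_A_BYTES * 100

print(f"table a: {to_mb(TABLE_A_BYTES):.2f} MB")   # table a: 631.98 MB
print(f"table b: {to_mb(TABLE_B_BYTES):.2f} MB")   # table b: 635.98 MB
print(f"difference: {diff_mb:.2f} MB ({diff_percent:.2f}% of table a)")
# difference: 4.00 MB (0.63% of table a)
```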

As for storage space, if you tell me you care about 1GB in a 1TB database: data and index fragmentation, reserved space, junk files, unused indexes and so on all take far more space than the extra overhead of nullability.

2. Efficiency of insert, delete, query, and update operations

Compare tables a and b by repeatedly reading and writing data within a range.
2.1.) The buffer pool is much larger than the tables, so both tables sit entirely in the buffer pool and differences caused by physical IO can be ruled out.
2.2.) The test runs on the same machine as the MySQL instance, so differences caused by network instability can be ignored.

The difference in insert, delete, query, and update performance is similar to the difference in storage space, or even smaller: each row differs by only 1 byte, and it takes 6 million+ rows for that to add up to a difference of roughly 4MB. Across various tests, no obvious difference was found for any of these operations.

#!/usr/bin/env python3
import pymysql
import time

mysql_conn_conf = {'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'password': '******', 'db': 'db01'}


def mysql_read(table_name):
    # Read a fixed id range from the given table.
    conn = pymysql.connect(host=mysql_conn_conf['host'], port=mysql_conn_conf['port'], database=mysql_conn_conf['db'], user=mysql_conn_conf['user'], password=mysql_conn_conf['password'])
    cursor = conn.cursor()
    try:
        cursor.execute(''' select id,c2,c3 from {0} where id>3888888 and id<3889999;'''.format(table_name))
        row = cursor.fetchall()
    except pymysql.Error as e:
        print("mysql execute error:", e)
    cursor.close()
    conn.close()


def mysql_write(loop, table_name):
    # Insert one row; every tenth row uses the column defaults ('' or NULL).
    conn = pymysql.connect(host=mysql_conn_conf['host'], port=mysql_conn_conf['port'], database=mysql_conn_conf['db'], user=mysql_conn_conf['user'], password=mysql_conn_conf['password'])
    cursor = conn.cursor()
    try:
        if loop % 10 == 0:
            cursor.execute(''' insert into {0} (c2,c3) values(DEFAULT,DEFAULT)'''.format(table_name))
        else:
            cursor.execute(''' insert into {0} (c2,c3) values(uuid(),uuid())'''.format(table_name))
    except pymysql.Error as e:
        print("mysql execute error:", e)
    cursor.close()
    conn.commit()
    conn.close()


if __name__ == '__main__':
    table_name = 'a'  # change to 'b' to test the nullable table
    time_start = time.time()
    loop = 10
    while loop > 0:
        # Call mysql_read(table_name) here instead to time reads.
        mysql_write(loop, table_name)
        loop = loop - 1

    time_end = time.time()
    time_c = time_end - time_start
    print('time cost', time_c, 's')

3. Semantic analysis and logical considerations on related fields

Opinions differ most on this point, and it is also the one most likely to spark argument.

1. For character types, NULL means the value does not exist, while '' means the value is empty. Non-existent and empty are not the same thing, so I do not agree that we must use NOT NULL with a default value.
2. For character types, NULL is not equal to NULL in any database, so when joining or filtering on such fields there is no need to worry about both sides of the comparison being NULL. Once NULL is replaced by '', '' is equal to '', which produces semantics completely different from storing NULL.
3. For character types, once a field defaults to '', how do you tell a missing value from a genuinely empty string? For example, if a remarks field does not allow NULL and defaults to '', there is no way to distinguish "no value" (what NULL would express) from an empty string that was actually entered.
4. For queries, if NULL is allowed, filtering out missing values is written as where *** is not null, which is clear and intuitive. Once the field is NOT NULL with a default of '', you end up with the ugly where *** <> '', and what it is meant to express becomes vague.
5. For time types, no default value is suitable in most cases. What would a suitable default be? The current time? The year 2000? The opening time of the 2008 Beijing Olympics?
6. For numeric types such as int and decimal, if NULL is prohibited, what is an appropriate default value? 0? -1? -9999999…? 10086? 1024? To be honest, no default is appropriate; NULL itself is the most appropriate.
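
The comparison, join, and filter behavior in points 2 to 4 can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server. This is an assumption of convenience, relying on the fact that these operators follow the same three-valued logic in MySQL. The tables t_null and t_empty and the remark column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# NULL = NULL evaluates to NULL (unknown), while '' = '' is true.
null_eq, empty_eq = cur.execute("SELECT NULL = NULL, '' = ''").fetchone()

# Two hypothetical tables: each has one real remark and one missing remark.
cur.execute("CREATE TABLE t_null (id INTEGER PRIMARY KEY, remark TEXT)")
cur.execute(
    "CREATE TABLE t_empty (id INTEGER PRIMARY KEY, remark TEXT NOT NULL DEFAULT '')"
)
cur.executemany("INSERT INTO t_null (remark) VALUES (?)", [("hello",), (None,)])
cur.executemany("INSERT INTO t_empty (remark) VALUES (?)", [("hello",), ("",)])


def self_join_matches(table):
    """Count the row pairs that match when a table is joined to itself on remark."""
    sql = f"SELECT COUNT(*) FROM {table} x JOIN {table} y ON x.remark = y.remark"
    return cur.execute(sql).fetchone()[0]


null_join = self_join_matches("t_null")    # the NULL row matches nothing
empty_join = self_join_matches("t_empty")  # the '' row matches itself

# Filtering out missing values in each design.
not_null_count = cur.execute(
    "SELECT COUNT(*) FROM t_null WHERE remark IS NOT NULL"
).fetchone()[0]
not_empty_count = cur.execute(
    "SELECT COUNT(*) FROM t_empty WHERE remark <> ''"
).fetchone()[0]

print(null_eq, empty_eq)      # None 1
print(null_join, empty_join)  # 1 2
print(not_null_count, not_empty_count)  # 1 1
```

The join counts differ because the two '' rows are treated as equal, whereas the NULL row never joins to anything, which is the semantic change point 2 warns about.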

My personal opinion is very clear: unless there is a specific requirement that a field must never be NULL, under normal circumstances a value that is NULL should be stored as NULL.
If NULL had no meaning, databases would simply not have it. In fact, which database does not support NULL?
Of course, it cannot be ruled out that some DBAs put forward unfounded rules in order to appear professional. There is a trend now that the more restrictions one can impose on a database, the more superior one feels.

This reminds me of an interesting story about default values. In a video I watched on Bilibili, an uploader mentioned that because Bilibili set the gender of registered users to male and the date of birth to a fixed date by default, the uploader got some baffling results when analyzing user click data.
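
How a default value can distort this kind of analysis is easy to show on made-up data. The sketch below again uses Python's built-in sqlite3 module as a stand-in for MySQL. The table names, the birth_year column, and the default of 1900 are all hypothetical and are not taken from the story above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical user tables: two users gave a birth year, one did not.
cur.execute("CREATE TABLE users_null (id INTEGER PRIMARY KEY, birth_year INTEGER)")
cur.execute(
    "CREATE TABLE users_default "
    "(id INTEGER PRIMARY KEY, birth_year INTEGER NOT NULL DEFAULT 1900)"
)
cur.executemany(
    "INSERT INTO users_null (birth_year) VALUES (?)", [(1995,), (2001,), (None,)]
)
cur.executemany(
    "INSERT INTO users_default (birth_year) VALUES (?)", [(1995,), (2001,)]
)
# The third user provides nothing, so the column default is stored.
cur.execute("INSERT INTO users_default DEFAULT VALUES")

# Aggregates skip NULL but cannot tell a default from a real value.
avg_null, known_null = cur.execute(
    "SELECT AVG(birth_year), COUNT(birth_year) FROM users_null"
).fetchone()
avg_default, known_default = cur.execute(
    "SELECT AVG(birth_year), COUNT(birth_year) FROM users_default"
).fetchone()

print(avg_null, known_null)                  # 1998.0 2
print(round(avg_default, 2), known_default)  # 1965.33 3
```

With NULL, the unknown user is left out of both the average and the count. With a default, the placeholder is silently treated as real data.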

My personal knowledge is limited, but the data does not lie. I would really like to know what other benefits "all fields NOT NULL" brings and how to measure them. Also, has anyone really done it? Can you actually prohibit nullable fields in every table of an entire instance?
