Why should MySQL fields use NOT NULL?

Why should MySQL fields use NOT NULL?

I recently joined a new company and found some minor problems in the database design. Many database fields do not have NOT NULL, which is simply unbearable for a patient with late-stage obsessive-compulsive disorder, so I wrote this article.

Based on most of the current development status, we will set all fields to NOT NULL and give default values.

  • Usually, the default value is set like this:
  • Integer, we generally use 0 as the default value.
  • String, default empty string

Time can be the default 1970-01-01 08:00:01 , or the default 0000-00-00 00:00:00 , but the connection parameters need to add zeroDateTimeBehavior=convertToNull . It is recommended not to use this default time format.

But, think about the reason, why should it be set to NOT NULL?

There is a passage from High Performance MySQL:

Try to avoid NULL

Many tables contain columns that can be NULL (empty values), even if the application does not need to store NULL values, because NULL is the default property of the column. It is usually best to specify that a column is NOT NULL unless you really need to store NULL values.

Queries involving NULL columns are more difficult for MySQL to optimize because NULL columns make indexes, index statistics, and value comparisons more complicated. Columns that can be NULL use more storage space and require special handling in MySQL. When a NULL-capable column is indexed, one extra byte is required for each index record, which in MyISAM can even cause a fixed-size index (such as an index on a single integer column) to become a variable-size index.

Generally, the performance improvement of changing a NULLable column to NOT NULL is small, so there is no need to find and modify this situation in an existing schema first (when tuning) unless it is certain that this will cause problems. However, if you plan to create an index on a column, you should try to avoid designing it to be NULL-able.

Of course there are exceptions. For example, it is worth mentioning that InnoDB uses a separate bit to store NULL values, so it has good space efficiency for sparse data. This does not apply to MyISAM.

The description in the book mentions several major issues. I will leave aside the MyISAM issue for now and focus on InnoDB as a consideration here.

  • If NOT NULL is not set, NULL is the default value of the column. If it is not necessary, try not to use NULL.
  • Using NULL brings more problems, such as more complicated indexes, index statistics, and value calculations. If you use indexes, you should avoid setting columns to NULL.
  • If it is an index column, it will cause storage space problems, require additional special processing, and will also cause more storage space to be occupied.
  • It has better space efficiency for sparse data. Sparse data refers to the situation where many values ​​are NULL and only a few rows have non-NULL values ​​in the columns.

default value

For MySql, if it is not actively set to NOT NULL, the default value when inserting data is NULL.

The meanings of the empty values ​​used by NULL and NOT NULL are different. NULL can be considered that the value of this column is unknown, while the empty value can be considered that we know the value, but it is empty.

For example, if a name field in a table is NULL, we can assume that we don’t know what the name is. Conversely, if it is an empty string, we can assume that we know there is no name, and it is a null value.

For most programs, there is no special need for fields to be NULL. On the contrary, NULL values ​​will cause problems such as null pointers in the program.

For most of the current situations where MyBatis is used, I recommend using the default generated insertSelective method or a purely manual insert method to avoid the problem of default values ​​not taking effect or insertion errors caused by adding NOT NULL fields.

Value calculation

Inaccurate aggregate functions

For columns with NULL values, NULL values ​​are ignored when using aggregate functions.

Now we have a table. name field is NULL by default. At this time, the result of count name is 1, which is wrong.

count(*) counts the number of rows in the table, while count(name) counts the non-NULL columns in the table.

= Invalid

For columns with NULL values, you cannot use the = expression to make judgments. The following query on name is invalid and you must use is NULL .

Operations with other values

NULL and any other value operation is NULL, including the value of the expression is also NULL.

age of the second record in user table is NULL, so it is still NULL after +1 . name is NULL, so the result is still NULL after the concat operation.

You can look at the following example again. Any operation with NULL will result in NULL. Imagine if a field you designed is NULL and you accidentally perform various operations on it, what will be the result. . .

distinct, group by, order by

For distinct and group by , all NULL values ​​are considered equal, and for order by , ascending NULL values ​​are sorted first.

Other issues

There is only one record with a name in the table. At this time, the expected result of querying name !=a should be to find the remaining two records, but it will be found that the expected result does not match.

Indexing issues

To verify the impact of NULL fields on indexes, add indexes to name and age respectively.

There are many online opinions that say that indexes cannot be used if the query is NULL. This is not accurate. According to the official document [3], indexes can be used normally when using is NULL and range queries. The actual verification results seem to be the same. See the following example.

Then we continued to insert some data into the database for testing. When the number of NULL column values ​​increased, we found that the index became invalid.

We know that the execution process of a query SQL is roughly like this:

First, the connector is responsible for connecting to the specified database, and then checking whether there is this statement in the query cache. If so, it returns the result directly.

If the cache does not hit, the analyzer is needed to perform syntax and lexical analysis on the SQL statement to determine whether the SQL statement is legal.

Now we come to the optimizer, which will choose which index to use more reasonably, and the specific execution plan of the SQL statement will be determined.

Finally, the executor is responsible for executing the statement, querying whether there is permission, and returning the execution result.

From the simple test results above, we can see that if there are NULLs in the index column, the optimizer will be more complicated and more difficult to optimize when making index selection, as mentioned in the book.

Storage Space

A row of records in the database is also stored in the final disk file as a row. For InnoDB, there are four row storage formats: REDUNDANT , COMPACT , DYNAMIC , and COMPRESSED .

The default row storage format of InnoDB is COMPACT . The storage format is shown below. The dotted part may not exist.

Variable-length field length list: If there are multiple fields, they are stored in reverse order. We only have one field, so we don’t need to consider that much. The storage format is hexadecimal. If there is no variable-length field, this part is not needed.

NULL value list: used to store the cases where the value in our record is NULL. If there are multiple NULL values, they are also stored in reverse order, and must be an integer multiple of 8 bits. If it is less than 8 bits, the high bits are filled with 0. 1 represents NULL, and 0 represents not NULL. If both are NOT NULL then this exists.

ROW_ID: The unique identifier of a row of records. When no primary key is specified, the automatically generated ROW_ID serves as the primary key.

TRX_ID: transaction ID.

ROLL_PRT: Rollback pointer.

Finally, the value of each column.

To illustrate the storage format issue, I'll create a table for testing. In this table, only the c1 field is NOT NULL, and the others can be NULL.

Variable field length list: The lengths of c1 and c3 fields are 1 and 2 respectively, so the length converted to hexadecimal is 0x01 0x02 , and after reverse order it is 0x02 0x01 .

NULL value list: Because there are columns that allow NULL, c2,c3,c4 are 010 respectively. The same is true after reversing. At the same time, the high bits are filled with 0 to make 8 bits. The result is 00000010 .

We will ignore the other fields for now. The result of the first record is, of course, we will not consider the result after encoding here.

This is the format of a complete data row. On the contrary, if we set all fields to NOT NULL and insert a data a,bb,ccc,dddd , the storage format should be like this:

Although we find that NULL itself does not occupy storage space, if NULL exists, it will occupy an extra byte of flag space.

Article reference documents:

https://dev.mysql.com/doc/refman/8.0/en/problems-with-null.html
https://dev.mysql.com/doc/refman/8.0/en/working-with-null.html
https://dev.mysql.com/doc/refman/5.6/en/is-null-optimization.html
https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format.html
https://www.cnblogs.com/zhoujinyi/articles/2726462.html

This is the end of this article about why MySQL fields use NOT NULL. For more information about using NOT NULL in MySQL fields, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Solve the problem of MySQL using not in to include null values
  • Solve the problem of not finding NULL from set operation to mysql not like
  • Detailed explanation of the difference between MySQL null and not null and null and empty value''''''''
  • Detailed explanation of the usage of NULL and NOT NULL when creating tables in MySQL
  • Solution to the problem of null column in NOT IN filling pit in MySQL
  • Should nullable fields in MySQL be set to NULL or NOT NULL?
  • MySQL query empty fields or non-empty fields (is null and not null)
  • mysql not in, left join, IS NULL, NOT EXISTS efficiency problem record
  • MySQL not null constraint case explanation

<<:  Detailed process of using nginx to build a webdav file server in Ubuntu

>>:  Web Design Summary

Recommend

Small details of web front-end development

1 The select tag must be closed <select><...

Detailed explanation of Linux kernel macro Container_Of

Table of contents 1. How are structures stored in...

How to expand the disk size of a virtual machine

After Vmvare sets the disk size of the virtual ma...

WeChat applet learning notes: page configuration and routing

I have been studying and reviewing the developmen...

Mini Programs enable product attribute selection or specification selection

This article shares the specific code for impleme...

JS cross-domain solution react configuration reverse proxy

Cross-domain solutions jsonp (simulate get) CORS ...

The latest 36 high-quality free English fonts shared

01. Infinity Font Download 02. Banda Font Download...

The principle and basic use of Vue.use() in Vue

Table of contents Preface 1. Understanding with e...

How to use Vue to implement CSS transitions and animations

Table of contents 1. The difference between trans...

Introduction to the usage of common XHTML tags

There are many tags in XHTML, but only a few are ...

MySQL Server 8.0.3 Installation and Configuration Methods Graphic Tutorial

This document records the installation and config...