Preface

The string types in MySQL are CHAR, VARCHAR, BINARY, BLOB, TEXT, ENUM, and SET. The type you choose affects both the business design and the performance of the database. CHAR and VARCHAR are the most commonly used, so today we take a closer look at how to apply them.

Definition of CHAR and VARCHAR

CHAR(N) stores fixed-length strings. N ranges from 0 to 255, and it counts characters, not bytes. VARCHAR(N) stores variable-length strings. N ranges from 0 to 65535 and also counts characters, although the effective maximum is lower under a multi-byte character set, because a row is limited to 65,535 bytes.

If a value can exceed 65,535 bytes, consider a larger type from the TEXT or BLOB families. Their largest members, LONGTEXT and LONGBLOB, store up to 4 GB. The difference between the two families is that BLOB has no character set attribute and is pure binary storage.

Unlike traditional relational databases such as Oracle and SQL Server, a MySQL VARCHAR column can hold up to 65,535 bytes. In MySQL, therefore, VARCHAR is sufficient in most scenarios.

Character Set

When designing a table structure, defining a column as CHAR or VARCHAR is not enough. You also need to define the character set, because the same character maps to a different binary value under different character set encodings. Common character sets include GBK and UTF8. UTF8 is the usual recommendation, but with the rapid growth of the mobile Internet it is better to set MySQL's default character set to UTF8MB4. MySQL's UTF8 stores at most 3 bytes per character, so 4-byte characters such as some emoji cannot be stored in it.
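To make the character-versus-byte distinction concrete, here is a small Python sketch, used only as a calculator and not as part of MySQL. It encodes a few characters and then estimates how many characters a single VARCHAR column can declare. The helper names and the overhead figures (2 length bytes, 1 byte for the NULL flag, a table that contains only this one column) are assumptions of the sketch.

```python
# Row-size limit shared by all columns of a MySQL table, in bytes.
MAX_ROW_BYTES = 65535


def encoded_length(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


def max_varchar_chars(max_bytes_per_char: int, nullable: bool = True) -> int:
    """Estimate the largest N for VARCHAR(N) in a single-column table.

    Assumes 2 bytes for the length prefix and 1 byte for the NULL flag
    when the column is nullable (illustrative helper, not a MySQL API).
    """
    overhead = 2 + (1 if nullable else 0)
    return (MAX_ROW_BYTES - overhead) // max_bytes_per_char


if __name__ == "__main__":
    emoji = "\U0001F60E"   # encoded in UTF-8 as F0 9F 98 8E
    han = "\u4e2d"         # a common Chinese character
    print(encoded_length("a", "utf-8"))    # 1
    print(encoded_length(emoji, "utf-8"))  # 4, too wide for a 3-byte character set
    print(encoded_length(han, "gbk"))      # 2
    print(max_varchar_chars(4))            # 16383 for a 4-byte character set
    print(max_varchar_chars(3))            # 21844 for a 3-byte character set
```

Under these assumptions, a 4-byte character set leaves room for 16383 characters, far fewer than the 65535 that the syntax allows.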
For example, the emoji smiley face is encoded as 0xF09F988E. If you try to insert this emoji into a column whose character set is UTF8, MySQL raises the following error:

    mysql> SHOW CREATE TABLE emoji_test\G
    *************************** 1. row ***************************
           Table: emoji_test
    Create Table: CREATE TABLE `emoji_test` (
      `a` varchar(100) CHARACTER SET utf8,
      PRIMARY KEY (`a`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    1 row in set (0.01 sec)

    mysql> INSERT INTO emoji_test VALUES (0xF09F988E);
    ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'a' at row 1

Starting with MySQL 8.0, the default character set is UTF8MB4. Before version 8.0, the default was Latin1. Because the default differs between versions, you should set the relevant parameter explicitly in the configuration file:

    [mysqld]
    character-set-server = utf8mb4
    ...

In addition, the maximum number of bytes that CHAR(N) and VARCHAR(N) can occupy depends on the character set. In GBK a character takes at most 2 bytes, and in UTF8MB4 a character takes at most 4 bytes. So from the storage engine's point of view, under a multi-byte character set CHAR and VARCHAR are implemented in exactly the same way: both are variable-length storage.

A CHAR(1) column can hold both the 1-byte 'a' and a 4-byte emoji smiley face, so CHAR is also variable-length in nature. Since the recommended default character set is now UTF8MB4, you can replace every CHAR with VARCHAR when designing a table structure. The underlying storage is essentially the same.

Sorting rules

A collation is the set of rules for comparing and sorting strings.
Each character set has a default collation, which you can view with the command SHOW CHARSET:

    mysql> SHOW CHARSET LIKE 'utf8%';
    +---------+---------------+--------------------+--------+
    | Charset | Description   | Default collation  | Maxlen |
    +---------+---------------+--------------------+--------+
    | utf8    | UTF-8 Unicode | utf8_general_ci    |      3 |
    | utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci |      4 |
    +---------+---------------+--------------------+--------+
    2 rows in set (0.01 sec)

    mysql> SHOW COLLATION LIKE 'utf8mb4%';
    +----------------------------+---------+-----+---------+----------+---------+---------------+
    | Collation                  | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
    +----------------------------+---------+-----+---------+----------+---------+---------------+
    | utf8mb4_0900_ai_ci         | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
    | utf8mb4_0900_as_ci         | utf8mb4 | 305 |         | Yes      |       0 | NO PAD        |
    | utf8mb4_0900_as_cs         | utf8mb4 | 278 |         | Yes      |       0 | NO PAD        |
    | utf8mb4_0900_bin           | utf8mb4 | 309 |         | Yes      |       1 | NO PAD        |
    | utf8mb4_bin                | utf8mb4 |  46 |         | Yes      |       1 | PAD SPACE     |
    ......

A collation name ending in _ci is case-insensitive, one ending in _cs is case-sensitive, and one ending in _bin compares the binary representation of the stored characters.

Note that with the default collation, MySQL compares strings without regard to case:

    mysql> SELECT 'a' = 'A';
    +-----------+
    | 'a' = 'A' |
    +-----------+
    |         1 |
    +-----------+
    1 row in set (0.00 sec)

    mysql> SELECT CAST('a' as char) COLLATE utf8mb4_0900_as_cs = CAST('A' as CHAR) COLLATE utf8mb4_0900_as_cs as result;
    +--------+
    | result |
    +--------+
    |      0 |
    +--------+
    1 row in set (0.00 sec)

Keep in mind that the table design for most businesses does not need a case-sensitive collation. Do not switch to one unless you understand exactly what your business requires.
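The difference between a _ci collation and a _bin collation can be approximated outside the database. The Python sketch below is only an analogy: `str.casefold` stands in for a case-insensitive collation and raw UTF-8 bytes stand in for a binary one. MySQL's real collations use their own weight tables, and the function names here are made up for illustration.

```python
from typing import List


def equal_ci(left: str, right: str) -> bool:
    """Case-insensitive comparison, loosely analogous to a _ci collation."""
    return left.casefold() == right.casefold()


def equal_bin(left: str, right: str) -> bool:
    """Byte-wise comparison, loosely analogous to a _bin collation."""
    return left.encode("utf-8") == right.encode("utf-8")


def sort_ci(values: List[str]) -> List[str]:
    """Order strings while ignoring case (ties keep their input order)."""
    return sorted(values, key=str.casefold)


def sort_bin(values: List[str]) -> List[str]:
    """Order strings by their encoded bytes, so uppercase ASCII sorts first."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


if __name__ == "__main__":
    print(equal_ci("a", "A"))    # True, like SELECT 'a' = 'A' returning 1
    print(equal_bin("a", "A"))   # False
    print(sort_ci(["b", "A", "a", "B"]))   # ['A', 'a', 'b', 'B']
    print(sort_bin(["b", "A", "a", "B"]))  # ['A', 'B', 'a', 'b']
```

The two orderings differ for the same input, which is the kind of behavior change to expect in ORDER BY and in unique indexes when a column's collation is switched.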
Correctly modify the character set

Many businesses do not consider the impact of the character set on data storage at design time, so the character set has to be converted later. However, many developers find that after running the following statement they still cannot insert UTF8MB4 characters such as emoji:

    ALTER TABLE emoji_test CHARSET utf8mb4;

In fact, this statement only changes the default character set of the table to UTF8MB4. A column added afterwards without an explicit character set will use UTF8MB4, but the character set of existing columns is not changed. You can confirm this by running SHOW CREATE TABLE:

    mysql> SHOW CREATE TABLE emoji_test\G
    *************************** 1. row ***************************
           Table: emoji_test
    Create Table: CREATE TABLE `emoji_test` (
      `a` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
      PRIMARY KEY (`a`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
    1 row in set (0.00 sec)

The character set of column a is still UTF8, not UTF8MB4. The correct command for changing the character set of existing columns is ALTER TABLE ... CONVERT TO ..., which changes column a from UTF8 to UTF8MB4:

    mysql> ALTER TABLE emoji_test CONVERT TO CHARSET utf8mb4;
    Query OK, 0 rows affected (0.94 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    mysql> SHOW CREATE TABLE emoji_test\G
    *************************** 1. row ***************************
           Table: emoji_test
    Create Table: CREATE TABLE `emoji_test` (
      `a` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
      PRIMARY KEY (`a`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
    1 row in set (0.00 sec)

Practical Design of Business Table Structure

User gender design

When designing a table structure, you will encounter fields with a fixed set of valid values. For example, the gender field (Sex) can only be male or female, and a state field (State) has a limited set of valid values such as running, stopped, and restarted. Such a field is often stored as a number, for example:

    CREATE TABLE `User` (
      `id` bigint NOT NULL AUTO_INCREMENT,
      `sex` tinyint DEFAULT NULL,
      ......
      PRIMARY KEY (`id`)
    )ENGINE=InnoDB;

Here the tinyint column sex represents the user's gender, but the problems with this design are obvious: the meaning of each number is not self-explanatory, and nothing stops an invalid value from being inserted.
Prior to MySQL 8.0, you could use the ENUM string enumeration type, which allows only a limited set of defined values to be inserted. If the parameter SQL_MODE is set to strict mode, inserting an undefined value results in an error:

    mysql> SHOW CREATE TABLE User\G
    *************************** 1. row ***************************
           Table: User
    Create Table: CREATE TABLE `User` (
      `id` bigint NOT NULL AUTO_INCREMENT,
      `sex` enum('M','F') COLLATE utf8mb4_general_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB
    1 row in set (0.00 sec)

    mysql> SET sql_mode = 'STRICT_TRANS_TABLES';
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    mysql> INSERT INTO User VALUES (NULL,'F');
    Query OK, 1 row affected (0.08 sec)

    mysql> INSERT INTO User VALUES (NULL,'A');
    ERROR 1265 (01000): Data truncated for column 'sex' at row 1

However, ENUM is not a standard SQL data type but a string type specific to MySQL, and the error message it raises is not intuitive. This approach is always somewhat unsatisfying, mainly because MySQL versions before 8.0.16 do not enforce CHECK constraints. Since MySQL 8.0.16, the database natively supports CHECK constraints, which make it easy to design columns with a finite set of states:

    mysql> SHOW CREATE TABLE User\G
    *************************** 1. row ***************************
           Table: User
    Create Table: CREATE TABLE `User` (
      `id` bigint NOT NULL AUTO_INCREMENT,
      `sex` char(1) COLLATE utf8mb4_general_ci DEFAULT NULL,
      PRIMARY KEY (`id`),
      CONSTRAINT `user_chk_1` CHECK (((`sex` = _utf8mb4'M') or (`sex` = _utf8mb4'F')))
    ) ENGINE=InnoDB
    1 row in set (0.00 sec)

    mysql> INSERT INTO User VALUES (NULL,'M');
    Query OK, 1 row affected (0.07 sec)

    mysql> INSERT INTO User VALUES (NULL,'Z');
    ERROR 3819 (HY000): Check constraint 'user_chk_1' is violated.

In this output, the constraint definition user_chk_1 restricts the value range of the sex column, which can only be M or F.
At the same time, when the second INSERT tries to store the illegal value Z, MySQL explicitly reports that the constraint was violated.

Account password storage design

Remember: when designing a database table structure, never store passwords in plain text. Once a malicious user gets into the system, there is a great risk of user data leakage. In the financial industry, for example, compliance requires all user privacy fields to be encrypted, so that even the business itself cannot read what the user stored (private data such as login passwords, mobile phone numbers, and credit card information).

Many developers use the MD5 function to hash private data before storing it. This is reasonable up to a point, because MD5 is a one-way function that cannot be reversed directly. However, the MD5 value of a given input is fixed. For example, the password 12345678 always has the MD5 value 25d55ad283aa400af464c76d713c07ad.

MD5 can therefore be attacked by brute force, by precomputing the MD5 values of all candidate strings. Where enumerating every combination is not feasible, an attacker can still precompute the values of common passwords such as 111111 and 12345678. There are websites that look up the original string for an MD5 value online.

For this reason, password storage needs a salt. Each company uses a different salt value, so the computed values differ as well. If the salt value is psalt, the value stored in the database for the password 12345678 is:

    password = MD5('psalt12345678')

This design uses a fixed salt value, and it has three main problems:

1. If the salt value is leaked by a (former) employee, an external attacker can still crack the stored passwords, and the potential payoff is large.
2. The same password always produces the same stored value. Once one user's password is leaked, every other user with the same password is exposed as well.
3. The MD5 algorithm is hard-coded. Once MD5 is broken, the impact is severe.

A truly good password storage design is therefore: dynamic salt + non-fixed encryption algorithm.

I recommend storing the password column in the following format:

    $salt$cryption_algorithm$value

Here $salt is the dynamic salt, $cryption_algorithm identifies the version of the encryption algorithm, and $value is the resulting string.
With this format, the User table is designed as follows:

    CREATE TABLE User (
        id BIGINT NOT NULL AUTO_INCREMENT,
        name VARCHAR(255) NOT NULL,
        sex CHAR(1) NOT NULL,
        password VARCHAR(1024) NOT NULL,
        regDate DATETIME NOT NULL,
        CHECK (sex = 'M' OR sex = 'F'),
        PRIMARY KEY(id)
    );

    SELECT * FROM User\G
    *************************** 1. row ***************************
          id: 1
        name: David
         sex: M
    password: $fgfaef$v1$2198687f6db06c9d1b31a030ba1ef074
     regDate: 2020-09-07 15:30:00
    *************************** 2. row ***************************
          id: 2
        name: Amy
         sex: F
    password: $zpelf$v2$0x860E4E3B2AA4005D8EE9B7653409C4B133AF77AEF53B815D31426EC6EF78D882
     regDate: 2020-09-07 17:28:00

In the above example, users David and Amy both have the password 12345678. However, because both the salt and the encryption algorithm are dynamic, the two stored values are completely different. Even if a user with ulterior motives learns the current password encryption algorithm, the stored passwords can be upgraded to a newer algorithm through the $cryption_algorithm version, which further guards against malicious attacks.
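The $salt$cryption_algorithm$value format could be produced on the application side roughly as follows. This is a minimal Python sketch under stated assumptions: the version table maps v1 to MD5 and v2 to SHA-256 purely for illustration (the article does not say which algorithms its sample rows use), and the helper names `make_password_field` and `verify_password` are hypothetical. Prepending the salt to the password mirrors the MD5('psalt12345678') example.

```python
import hashlib
import hmac
import secrets

# Hypothetical version table; a new algorithm is added under a new version tag.
ALGORITHMS = {
    "v1": lambda data: hashlib.md5(data).hexdigest(),
    "v2": lambda data: hashlib.sha256(data).hexdigest(),
}


def make_password_field(password: str, version: str = "v2") -> str:
    """Build a '$salt$version$value' string with a fresh random salt."""
    salt = secrets.token_hex(8)  # dynamic salt, different on every call
    value = ALGORITHMS[version]((salt + password).encode("utf-8"))
    return f"${salt}${version}${value}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the value with the salt and version kept in `stored`."""
    _, salt, version, expected = stored.split("$")
    actual = ALGORITHMS[version]((salt + password).encode("utf-8"))
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    david = make_password_field("12345678", "v1")
    amy = make_password_field("12345678", "v2")
    print(david)  # same password as Amy, different salt and algorithm
    print(amy)
    print(verify_password("12345678", amy))
```

Because the version tag travels with each value, a later login can verify against the old algorithm and then re-store the password under a newer one. For a real deployment, a deliberately slow function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` would likely be a sturdier choice for a new version than a single fast hash.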