Never use utf8 in MySQL, and always use utf8mb4. Introduction to utf8mb4 In MySQL/MariaDB, the utf8 character set is not a true implementation of Unicode, that is, it is not a true UTF-8 encoding because it supports characters of up to 3 bytes. The UTF-8 encoding can represent every symbol in the Unicode character set, ranging from U+000000 to U+10FFFF. That's 1,114,112 possible symbols. (Not all of these Unicode code points are assigned characters, but that doesn't stop UTF-8 from encoding them. Unicode actually maps characters to 1,112,064 code points in total.) UTF-8 is a variable-width encoding; it uses one to four (8-bit) bytes to encode each symbol. This saves space for storing low-code values. And it can fully represent all characters. UTF-8 is arguably the most popular encoding implementation of Unicode. The real UTF-8 encoding implemented in MySQL/MariaDB is utf8mb4, which supports up to 4 bytes of storage.
UTF8 byte count exceeded error If utf8 is used, when storing 4-byte characters such as emoji, an error Error: 1366 is usually reported:
utf8mb4 supportThe utf8mb4 in MySQL/MariaDB is a supplement to the original utf8 that can only store characters of 3 bytes in size. It is a true UTF-8 encoding. Supported since MySQL 5.5.3+. Starting from MySQL 8.0, the default character set has changed to utf8mb4. That is, if you use MySQL 8.0 and above, you don't have to worry about the byte length issue. The commonly used sorting rules corresponding to utf8mb4 are: utf8mb4_general_ci and utf8mb4_unicode_ci. Set the default character encoding to utf8mb4 and the corresponding sorting rules. View the current codeAfter logging in to MySQL/MariaDB, use SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; to view the character set and collation. View the character set: MariaDB [(none)]> show variables like 'character%'; +--------------------------+------------------------------------------------+ | Variable_name | Value | +--------------------------+------------------------------------------------+ | character_set_client | gbk | | character_set_connection | gbk | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | gbk | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | C:\Program Files\MariaDB 10.3n\share\charsets\ | +--------------------------+------------------------------------------------+ 8 rows in set (0.043 sec) Since it is installed under Chinese Windows, it contains gbk. Modify the MySQL/Mariadb configuration file and change the utf8 encoding to utf8mb4
my.cnf under Linux: > vim /etc/my.cnf # # This group is read both by the client and the server # use it for options that affect everything # [client-server] [mysqld] character-set-server = utf8mb4 init_connect = 'SET NAMES utf8mb4' collation-server=utf8mb4_unicode_ci character-set-client-handshake=FALSE [client] default-character-set=utf8mb4 [mysql] default-character-set=utf8mb4 # # include all files from the config directory # !includedir /etc/my.cnf.d my.ini under Windows: [mysqld] datadir=C:/Program Files/MariaDB 10.3n/data port=3306 character_set_server=utf8mb4 # Using unique option prefix 'character_set_client' is error-prone and can break in the future. Please use the full name 'character-set-client-handshake' instead. character-set-client-handshake=utf8mb4 # character_set_client=utf8mb4 collation-server=utf8mb4_unicode_ci init_connect = 'SET NAMES utf8mb4' skip-character-set-client-handshake=false character_set_filesystem = binary lower_case_table_names=2 [mysqldump] loose_character_set_client=utf8mb4 [mysql] default-character-set=utf8mb4 [client] port=3306 plugin-dir=C:/Program Files/MariaDB 10.3n/lib/plugin default-character-set=utf8mb4 Several setting items are described:
The pure and concise settings for the utf8mb4 character set and sorting rules are as follows, and only 5 settings are required: [client] default-character-set = utf8mb4 [mysql] default-character-set = utf8mb4 [mysqld] character-set-client-handshake = FALSE character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci Restart MySQL/MariaDBIn Linux, use systemctl restart mariadb or systemctl restart mysqld. Must use restart (not reload). The service started or restarted by earlier versions of MariaDB is named mysqld (mysqld.service). systemctl restart mysqld Open "Services" in Windows, find the corresponding service of MariaDB or MySQL in the services, right-click and restart it. Check the character set and collation again After the above changes take effect after restart, check again, as shown below, it has become utf8mb4 and utf8mb4_unicode_ci. MariaDB [(none)]> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+--------------------+ | Variable_name | Value | +--------------------------+--------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | | collation_connection | utf8mb4_unicode_ci | | collation_database | utf8mb4_unicode_ci | | collation_server | utf8mb4_unicode_ci | +--------------------------+--------------------+ 10 rows in set (0.008 sec) Explanation of character_set_filesystem and character_set_systemFrom the character set and sorting rules shown above, you can see that two items are not utf8mb4.
Other references about character set settings The above settings are already the correct ones to modify utf8mb4. Several possible related settings are listed below for possible reference (although they are rarely used): [client] default-character-set = utf8mb4 [mysql] default-character-set = utf8mb4 [mysqld] explicit_defaults_for_timestamp = 1 # Posting it here as a tip to disable the Timestamp message, maybe it can help someone :) character-set-client-handshake = 0 # FALSE init_connect = 'SET character_set_system = utf8mb4' init_connect = 'SET character_set_connection = utf8mb4' init_connect = 'SET character_set_database = utf8mb4' init_connect = 'SET character_set_results = utf8mb4' init_connect = 'SET collation_database = utf8mb4_unicode_ci' init_connect = 'SET collation_connection = utf8mb4_unicode_ci' init_connect = 'SET NAMES utf8mb4' character-set-server = utf8mb4 #character_set_client = utf8mb4 collation-server = utf8mb4_unicode_ci collation_connection = utf8mb4_unicode_ci collation_database = utf8mb4_unicode_ci Multiple init_connect can also be set like this: init_connect = 'SET collation_connection = utf8mb4_unicode_ci,NAMES utf8mb4'. Unverified The complete process of switching the character set of an existing database to utf8mb4 If the database is created with utf8mb4, there will be no such troubles! ! ! Step 1: Create a backup Create a backup of all databases on the server to be upgraded. Safety first! Step 2: Upgrade MySQL server Upgrade MySQL server to v5.5.3+. When MySQL is upgraded to 8.0, there is no need to modify the MySQL server character set in step 5. The default is utf8mb4. MariaDB also needs to modify the character set. Step 3: Modify the database, tables, and columns Change the character set and collation properties of the database, tables, and columns to use utf8mb4. # For each database: ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci; # For each table: ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; # For each column: ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; # (Don't copy and paste blindly! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.) # Or modefy statement ALTER TABLE table_name MODIFY column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; Step 4: Check the maximum length of column and index keys This is probably the most tedious part of the entire upgrade process. When converting from utf8 to utf8mb4, the maximum length of a column or index key in bytes is unchanged. Therefore, it is smaller in terms of characters, since the maximum length of a character is now four bytes instead of three. For example, a TINYTEXT column can hold up to 255 bytes, which correlates to 85 three-byte characters or 63 four-byte characters. Suppose you have a TINYTEXT column that uses utf8 but must be able to contain more than 63 characters. Given this requirement, you can't convert this column to utf8mb4 unless you also change the data type to a longer type, such as TEXT — because if you try to fill it with four-byte characters, you'll only be able to enter 63 characters, but no more. Step 5: Modify the connection, client, and server character sets In your application code, set the connection character set to utf8mb4. This can be done by simply replacing SET NAMES utf8 with SET NAMES utf8mb4. At the same time, the collation rules must also be modified accordingly, for example, SET NAMES utf8 COLLATE utf8_unicode_ci becomes SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci. Make sure to set both the client and server character sets. The MySQL configuration file (/etc/my.cnf) contains the following: [client] default-character-set = utf8mb4 [mysql] default-character-set = utf8mb4 [mysqld] character-set-client-handshake = FALSE character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci Step 6: Repair and optimize all tables # For each table REPAIR TABLE table_name; OPTIMIZE TABLE table_name; This can be easily done in one go using the command-line mysqlcheck utility: $ mysqlcheck -u root -p --auto-repair --optimize --all-databases This will prompt for the root user's password, after which all tables in all databases will be repaired and optimized. refer toThe main reference is translated from: How to support full Unicode in MySQL databases, welcome to read the original text. This is the end of this article about how to support full Unicode in MySQL/MariaDB. For more information about MySQL/MariaDB supporting Unicode, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Several ways to implement image adaptive container with CSS (summary)
>>: Solution to the blank page after vue.js packaged project
load Request Success Request failed Click cmd and...
This article shares the specific code of vue unia...
SVG (Scalable Vector Graphics) is an image format...
student.xml <?xml version="1.0" enco...
Table of contents 1. Introduction 2. Component De...
Table of contents The first step of optimization:...
1. Always close HTML tags In the source code of p...
1. Add the following dependencies in pom.xml <...
Table of contents What is Flattening recursion to...
What is the input type="file"? I don'...
1. Download, install and configure mysql-8.0.15 1...
1. Command Introduction The read command is a bui...
Preface After this blog post was published, some ...
Requirement: Sometimes, when the page content is ...
The CSS position attribute specifies the element&...