MySQL encodings utf8 and utf8mb4, and collations utf8mb4_unicode_ci and utf8mb4_general_ci

Reference: MySQL character set summary

utf8mb4 has become the default character set in MySQL 8.0, with utf8mb4_0900_ai_ci as the default collation in MySQL 8.0.1 and later.

For new projects, use only utf8mb4.

UTF-8 encoding is a variable-length encoding mechanism that can store characters using 1 to 4 bytes.

For historical reasons, MySQL's utf8 character set (an alias for utf8mb3) is not full UTF-8: it stores at most 3 bytes per character. Characters that need 4 bytes in UTF-8, such as emoji or rare Chinese characters outside the Basic Multilingual Plane, cannot be stored in it. Depending on the SQL mode, the insert either fails with an error or the value is truncated with a warning.
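The byte counts can be checked outside MySQL. The following is a minimal Python sketch, not MySQL code: the helper names utf8_length and fits_legacy_utf8 are made up for this illustration, and the 3-byte limit is the one described above.

```python
# Show how many bytes UTF-8 needs for a few sample characters, and whether
# each one would fit in MySQL's legacy 3-byte utf8 character set.
MAX_LEGACY_UTF8_BYTES = 3  # per-character limit of MySQL's old "utf8" (utf8mb3)


def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_legacy_utf8(text: str) -> bool:
    """True if every character in text needs at most 3 bytes in UTF-8."""
    return all(utf8_length(ch) <= MAX_LEGACY_UTF8_BYTES for ch in text)


if __name__ == "__main__":
    samples = [
        "A",           # ASCII letter
        "\u00e9",      # e with acute accent
        "\u4e2d",      # common Chinese character
        "\U0001F600",  # emoji (grinning face)
        "\U00020000",  # rare CJK character outside the BMP
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X}", utf8_length(ch), fits_legacy_utf8(ch))
```

The first three samples need 1, 2, and 3 bytes and fit; the last two need 4 bytes each and do not.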

Starting from version 5.5.3, MySQL provides the utf8mb4 character set, which implements full UTF-8. The mb4 suffix is commonly read as "most bytes 4", meaning each character occupies at most 4 bytes. Since MySQL 8.0, utf8mb4 is the default character set.

Set the server default character set to utf8mb4

When creating a database, if no character set is specified, the server's default character set is used. Setting the server's default to utf8mb4 means new databases use it without your having to specify it each time.

Edit the MySQL configuration file

You only need to care about five system variables. Once all of them show utf8mb4, the change has taken effect:
character_set_client
character_set_connection
character_set_results
character_set_server
character_set_database

my.cnf is the configuration file of MySQL. Remember to back it up before modifying it:

vi /etc/my.cnf

Adding default-character-set=utf8 under [mysqld] prevented the server from starting: default-character-set is a client option, and mysqld has not accepted it since MySQL 5.5. The following configuration worked instead (MySQL 5.7):

[mysqld]
# A repeated option overrides the earlier one, so set the character set
# and the collation in a single init_connect statement.
init_connect = 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
skip-character-set-client-handshake
...
[client]
default-character-set=utf8mb4

MySQL 8.0 already defaults to utf8mb4, so no change is needed. If you want to set it explicitly anyway, the configuration file looks like this:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
character-set-server = utf8mb4
[client]
default-character-set=utf8mb4

Restart and confirm

After restarting MySQL, the client, connection, database, results, and server character sets should all be utf8mb4 (character_set_system stays utf8, which is expected):

mysql> show variables like "%char%";
+--------------------------------------+--------------------------------+
| Variable_name                        | Value                          |
+--------------------------------------+--------------------------------+
| character_set_client                 | utf8mb4                        |
| character_set_connection             | utf8mb4                        |
| character_set_database               | utf8mb4                        |
| character_set_filesystem             | binary                         |
| character_set_results                | utf8mb4                        |
| character_set_server                 | utf8mb4                        |
| character_set_system                 | utf8                           |
| character_sets_dir                   | /usr/share/mysql-8.0/charsets/ |
| validate_password.special_char_count | 1                              |
+--------------------------------------+--------------------------------+
9 rows in set (0.00 sec)
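If you want to automate this check, the table printed by the mysql client can be parsed as text. The following Python sketch is one possible approach, not an official tool: parse_variables and mismatched are hypothetical helper names, and SAMPLE is invented output in which character_set_database is deliberately left at latin1.

```python
EXPECTED = "utf8mb4"

# The five variables that should all report utf8mb4.
CHARSET_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_server",
    "character_set_database",
)


def parse_variables(table_text: str) -> dict:
    """Parse mysql client table rows ("| name | value |") into a dict."""
    variables = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, the prompt line, and the row-count footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            variables[cells[0]] = cells[1]
    return variables


def mismatched(variables: dict) -> list:
    """Return the names of the five variables that are not utf8mb4."""
    return [name for name in CHARSET_VARIABLES if variables.get(name) != EXPECTED]


# Hypothetical output for illustration only.
SAMPLE = """
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | latin1  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8    |
+--------------------------+---------+
"""

if __name__ == "__main__":
    print(mismatched(parse_variables(SAMPLE)))  # ['character_set_database']
```

An empty list means all five variables are set as intended.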

Character set related variables in MySQL

character_set_client: The character set of the statements sent by the client.
character_set_connection: The character set into which the server converts statements after receiving them from the client.
character_set_database: The character set of the default (currently selected) database. The server updates this variable whenever the default database changes; if there is no default database, it has the same value as character_set_server. Let the server manage this variable rather than setting it manually.
character_set_filesystem: The character set used to interpret file names given as string literals, for example in LOAD DATA. File names are converted from character_set_client to character_set_filesystem before the file is opened. The default, binary, means no conversion is performed.
character_set_results: The character set in which query results are returned to the client.
character_set_server: The default character set of the database server.
character_set_system: The character set the server uses to store metadata such as identifiers. It is always utf8 and does not need to be set.
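To see why character_set_client has to match what the client actually sends, the effect of a mismatch can be imitated in Python. This is only an illustration of the principle, not MySQL's conversion code: misread is a hypothetical helper, and Python's latin-1 codec stands in for a server that was told the client sends latin1 while the bytes are really UTF-8.

```python
def misread(text: str, actual: str = "utf-8", declared: str = "latin-1") -> str:
    """Encode text as `actual`, then decode the same bytes as `declared`."""
    return text.encode(actual).decode(declared)


if __name__ == "__main__":
    original = "caf\u00e9"  # "café"
    garbled = misread(original)
    print(garbled)  # cafÃ©

    # The bytes themselves are intact, so reversing the wrong step recovers the text.
    print(garbled.encode("latin-1").decode("utf-8") == original)  # True
```

Plain ASCII text survives the mismatch unchanged, which is why this kind of misconfiguration often goes unnoticed until accented or Chinese characters appear.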

When creating a database, specify the character set as utf8mb4

If the server's default character set is not utf8mb4, you can specify the character set when creating the database:

CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

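Collation

Besides being stored, characters also need to be sorted and compared, and the collation defines the rules for that. utf8mb4_unicode_ci is recommended because it follows the Unicode Collation Algorithm and sorts more accurately across languages. utf8mb4_general_ci is a simplified, slightly faster collation and is also acceptable.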

The default collation in MySQL 8.0 is utf8mb4_0900_ai_ci, which, like utf8mb4_unicode_ci, is based on the Unicode Collation Algorithm. The parts of the name mean the following:

  • utf8mb4 means the UTF-8 encoding scheme is used, with each character occupying a maximum of 4 bytes.
  • 0900 refers to the Unicode Collation Algorithm version, 9.0.0. (The Unicode Collation Algorithm is a method for comparing two Unicode strings that conforms to the requirements of the Unicode Standard.)
  • ai means accent-insensitive. That is, there is no difference between e, è, é, ê, and ë when sorting.
  • ci means case-insensitive. That is, there is no difference between p and P when sorting.
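The naming scheme in the list above can be sketched as a small Python function. It is an illustration only: describe_collation is a hypothetical helper that splits the name on underscores, and it assumes that a numeric part encodes the UCA version with its last two digits as minor and patch level (0900 read as 9.0.0).

```python
SENSITIVITY = {
    "ai": "accent-insensitive",
    "as": "accent-sensitive",
    "ci": "case-insensitive",
    "cs": "case-sensitive",
}


def describe_collation(name: str) -> dict:
    """Split a collation name such as utf8mb4_0900_ai_ci into labeled parts."""
    charset, *parts = name.split("_")
    info = {"charset": charset, "uca_version": None, "flags": [], "other": []}
    for part in parts:
        if part.isdigit() and len(part) >= 3:
            # Assumption: "0900" -> "9.0.0" (major, then minor and patch digits)
            info["uca_version"] = f"{int(part[:-2])}.{part[-2]}.{part[-1]}"
        elif part in SENSITIVITY:
            info["flags"].append(SENSITIVITY[part])
        else:
            info["other"].append(part)  # e.g. "unicode", "general"
    return info


if __name__ == "__main__":
    for collation in ("utf8mb4_0900_ai_ci", "utf8mb4_0900_as_cs", "utf8mb4_general_ci"):
        print(collation, describe_collation(collation))
```

For utf8mb4_general_ci there is no version part, so uca_version stays None and "general" is reported under "other".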

In MySQL 8.0.1 and later, utf8mb4 is the default character set and utf8mb4_0900_ai_ci is its default collation. Previously, the default character set was latin1, and utf8mb4_general_ci was the default collation for utf8mb4. Because utf8mb4 is now the default character set, new tables can store characters outside the Basic Multilingual Plane, including emoji, by default. If you need accent sensitivity and case sensitivity, use utf8mb4_0900_as_cs instead.
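The difference between an _ai_ci and an _as_cs comparison can be approximated in Python with the standard unicodedata module. This is a rough stand-in, not the Unicode Collation Algorithm that MySQL implements: equal_ai_ci and equal_as_cs are hypothetical helpers that only cover equality, not sort order.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_ai_ci(a: str, b: str) -> bool:
    """Rough stand-in for an _ai_ci comparison: ignore accents and case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


def equal_as_cs(a: str, b: str) -> bool:
    """Rough stand-in for an _as_cs comparison: accents and case both matter."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(equal_ai_ci("R\u00e9sum\u00e9", "resume"))  # True
    print(equal_as_cs("R\u00e9sum\u00e9", "resume"))  # False
    print(equal_ai_ci("p", "P"), equal_as_cs("p", "P"))  # True False
```

In practice this means a unique index on a column with an _ai_ci collation would treat the two spellings as the same value, while one with an _as_cs collation would keep them apart.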
