MySQL encodings utf8 and utf8mb4, and the collations utf8mb4_unicode_ci and utf8mb4_general_ci

Reference: MySQL character set summary

utf8mb4 has become the default character set in MySQL 8.0, with utf8mb4_0900_ai_ci as the default collation in MySQL 8.0.1 and later.

Use only utf8mb4 for new projects

UTF-8 is a variable-length encoding that stores each character in 1 to 4 bytes.

For historical reasons, the utf8 character set in MySQL (an alias for utf8mb3) is not full UTF-8: it stores at most 3 bytes per character and therefore covers only the Basic Multilingual Plane. Characters that need 4 bytes in UTF-8, such as emoji or rare Chinese characters, cannot be stored in it; depending on the SQL mode, the insert fails or the value is truncated.

Starting from 5.5.3, MySQL supports the utf8mb4 character set, which implements full UTF-8. The mb4 suffix is commonly read as "most bytes 4", meaning that a character occupies at most 4 bytes. As of MySQL 8.0, utf8mb4 is the default character set.
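The difference is easy to reproduce. The following is a minimal sketch, assuming a scratch database, a client that can send 4-byte characters, and strict SQL mode (the default since MySQL 5.7). The table names demo_utf8mb3 and demo_utf8mb4 are made up for illustration.

```sql
-- Hypothetical tables, one per character set
CREATE TABLE demo_utf8mb3 (txt VARCHAR(20)) CHARACTER SET utf8;
CREATE TABLE demo_utf8mb4 (txt VARCHAR(20)) CHARACTER SET utf8mb4;

-- The connection must use utf8mb4 too, otherwise the emoji is
-- already damaged before it reaches the table
SET NAMES utf8mb4;

-- U+1F600 needs 4 bytes in UTF-8. In strict mode this insert is
-- expected to fail with an "Incorrect string value" error.
INSERT INTO demo_utf8mb3 (txt) VALUES ('😀');

-- The same value fits into the utf8mb4 table
INSERT INTO demo_utf8mb4 (txt) VALUES ('😀');

-- LENGTH() counts bytes, CHAR_LENGTH() counts characters
SELECT txt, LENGTH(txt) AS bytes, CHAR_LENGTH(txt) AS chars
FROM demo_utf8mb4;
```

In non-strict mode the first insert may instead succeed with a warning and store a truncated or replaced value, which is harder to notice.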

Set the server default character set to utf8mb4

When a database is created without an explicit character set, it uses the server's default character set. Setting the server default to utf8mb4 saves you from specifying it every time.

Edit the MySQL configuration file

Only five system variables matter here. Once all of them are utf8mb4, the change has taken effect:
character_set_client
character_set_connection
character_set_results
character_set_server
character_set_database
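To see where a server stands before editing anything, these variables can be queried directly. This is a small sketch that relies on the WHERE clause of SHOW VARIABLES; without the GLOBAL keyword the values are those of the current session.

```sql
-- Session values of the five variables listed above
SHOW VARIABLES
WHERE Variable_name IN (
    'character_set_client',
    'character_set_connection',
    'character_set_results',
    'character_set_server',
    'character_set_database'
);

-- Server-wide defaults that new connections start from
SHOW GLOBAL VARIABLES LIKE 'character_set%';
```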

my.cnf is the configuration file of MySQL. Remember to back it up before modifying it:

vi /etc/my.cnf

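After adding default-character-set=utf8 under [mysqld], the server would not start. The likely cause is that default-character-set is a client option; it was removed as a server option in MySQL 5.5, so mysqld rejects it, and the server-side equivalent is character-set-server. I ended up with the following (MySQL 5.7):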

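[mysqld]
# Only one init_connect value takes effect, so set the character set
# and the collation in a single statement
init_connect = 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
# Ignore the character set requested by the client and use the server's
skip-character-set-client-handshake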
...
[client]
default-character-set=utf8mb4

MySQL 8.0 already defaults to utf8mb4, so nothing needs to change. If you want to set it explicitly anyway, the configuration file looks like this:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
character-set-server = utf8mb4
[client]
default-character-set=utf8mb4

Restart and confirm

After restarting MySQL, confirm that the client, connection, database, results, and server character sets are all utf8mb4 (character_set_system stays utf8, which is expected):

mysql> show variables like "%char%";
+--------------------------------------+--------------------------------+
| Variable_name                        | Value                          |
+--------------------------------------+--------------------------------+
| character_set_client                 | utf8mb4                        |
| character_set_connection             | utf8mb4                        |
| character_set_database               | utf8mb4                        |
| character_set_filesystem             | binary                         |
| character_set_results                | utf8mb4                        |
| character_set_server                 | utf8mb4                        |
| character_set_system                 | utf8                           |
| character_sets_dir                   | /usr/share/mysql-8.0/charsets/ |
| validate_password.special_char_count | 1                              |
+--------------------------------------+--------------------------------+
9 rows in set (0.00 sec)
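The output above covers character sets only. As a complementary sketch, the collation variables and the properties of the utf8 family can be inspected the same way; the exact rows returned depend on the MySQL version, so none are shown here.

```sql
-- collation_connection, collation_database and collation_server
SHOW VARIABLES LIKE 'collation%';

-- The Maxlen column shows the maximum bytes per character
-- for each utf8 variant the server knows about
SHOW CHARACTER SET LIKE 'utf8%';

-- The default collation the server assigns to utf8mb4
SHOW COLLATION WHERE Charset = 'utf8mb4' AND `Default` = 'Yes';
```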

Character set related variables in MySQL

character_set_client: The character set in which the client sends statements to the server.
character_set_connection: The character set the server converts incoming statements to (from character_set_client) before processing them.
character_set_database: The character set of the default (currently selected) database. The server updates this variable whenever the default database changes. If there is no default database, it has the same value as character_set_server. Leave this variable to the server and do not set it manually.
character_set_filesystem: The character set used to interpret string literals that refer to file names. File names are converted from character_set_client to character_set_filesystem before the file is opened. The default value, binary, means that no conversion takes place.
character_set_results: The character set in which the server returns query results to the client.
character_set_server: The default character set of the server.
character_set_system: The character set the server uses to store identifiers (metadata). It is always utf8 (utf8mb3) and does not need to be set.
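Within a session, the three client-facing variables are usually set together. As a sketch of how they relate, the MySQL manual describes SET NAMES as equivalent to three individual assignments:

```sql
-- One statement for client, results and connection ...
SET NAMES utf8mb4;

-- ... which is equivalent to these three assignments
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;

-- Optionally choose the connection collation at the same time
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Check the result for the current session
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results,
       @@collation_connection;
```

character_set_server and character_set_database are not affected by SET NAMES; they describe server and database defaults rather than the connection.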

When creating a database, specify the character set as utf8mb4

If the database default character set is not utf8mb4, you can specify the character set when creating the database:

CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
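Whether the setting took effect can be checked afterwards. A minimal sketch, assuming the mydb database from the statement above; the notes table is a made-up example.

```sql
-- Confirm the defaults the database was created with
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'mydb';

-- Tables inherit the database defaults unless they declare their own
CREATE TABLE mydb.notes (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    body VARCHAR(255)
);

-- The table definition shows the inherited character set and collation
SHOW CREATE TABLE mydb.notes;
```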

Collation

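Besides being stored, characters also need to be sorted and compared, and the collation defines the rules for that. utf8mb4_unicode_ci is recommended because it follows the Unicode Collation Algorithm and compares more accurately across languages. utf8mb4_general_ci uses simplified rules that are somewhat faster but less accurate; it is still fine when that precision does not matter.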
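A commonly cited difference between the two collations is the German sharp s. The sketch below assumes a utf8mb4 connection; the expected values in the comments follow the behavior the MySQL manual describes for the general_ci and unicode_ci families.

```sql
SET NAMES utf8mb4;

SELECT
    -- unicode_ci supports expansions: one character can equal two
    'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ci_ss,  -- expected 1
    -- general_ci compares character by character, so this differs
    'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_ci_ss,  -- expected 0
    -- general_ci instead treats the sharp s like a single s
    'ß' = 's'  COLLATE utf8mb4_general_ci AS general_ci_s;   -- expected 1
```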

The default in MySQL 8.0 is utf8mb4_0900_ai_ci. Like utf8mb4_unicode_ci, it is based on the Unicode Collation Algorithm, but on a newer version of it. The parts of the name mean the following:

  • utf8mb4 means the UTF-8 encoding scheme, with each character occupying a maximum of 4 bytes.
  • 0900 refers to version 9.0.0 of the Unicode Collation Algorithm. (The Unicode Collation Algorithm is a method for comparing two Unicode strings that conforms to the requirements of the Unicode Standard.)
  • ai means accent-insensitive. That is, there is no difference between e, è, é, ê, and ë when sorting or comparing.
  • ci means case-insensitive. That is, there is no difference between p and P when sorting or comparing.

utf8mb4 is the default character set in MySQL 8.0, and utf8mb4_0900_ai_ci is its default collation in MySQL 8.0.1 and later. Before that, the default character set was latin1, and the default collation of utf8mb4 was utf8mb4_general_ci. Because utf8mb4 is now the default character set, new tables can store characters outside the Basic Multilingual Plane, including emoji, without extra configuration. If you need accent sensitivity and case sensitivity, use utf8mb4_0900_as_cs instead.
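The effect of the ai/ci and as/cs suffixes can be checked with a few comparisons. This is a sketch that assumes MySQL 8.0 (the 0900 collations do not exist in 5.7) and a utf8mb4 connection.

```sql
SET NAMES utf8mb4;

SELECT
    'e' = 'é' COLLATE utf8mb4_0900_ai_ci AS accent_ai,  -- expected 1
    'e' = 'é' COLLATE utf8mb4_0900_as_cs AS accent_as,  -- expected 0
    'p' = 'P' COLLATE utf8mb4_0900_ai_ci AS case_ci,    -- expected 1
    'p' = 'P' COLLATE utf8mb4_0900_as_cs AS case_cs;    -- expected 0
```

The same choice matters for unique indexes: under an ai_ci collation, values that differ only in accents or case count as duplicates.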
