Example of utf8mb4 collation in MySQL

Common utf8mb4 collations in MySQL are:

  • utf8mb4_0900_ai_ci
  • utf8mb4_unicode_ci
  • utf8mb4_general_ci

When the default character set of a table is set to utf8mb4 but the collation is not explicitly specified:

  • In MySQL version 5.7, the default collation is utf8mb4_general_ci.
  • In MySQL version 8.0, the default collation is utf8mb4_0900_ai_ci.
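
To confirm which default applies on a particular server, the collation metadata can be queried directly. The following is a minimal sketch using standard statements; the output depends on the server version.

```sql
-- Show the utf8mb4 character set together with its default collation.
SHOW CHARACTER SET LIKE 'utf8mb4';

-- List the utf8mb4 collation that is flagged as the default.
SELECT COLLATION_NAME, IS_DEFAULT
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4'
  AND IS_DEFAULT = 'Yes';
```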

Because the utf8mb4_0900_ai_ci collation was introduced in MySQL 8.0, importing a MySQL 8.0 table into MySQL 5.7 or MySQL 5.6 may fail because the collation is not recognized:

[Err] 1273 - Unknown collation: 'utf8mb4_0900_ai_ci'
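
One possible workaround, assuming it is acceptable to change the collation of the affected tables, is to convert them on the MySQL 8.0 side before exporting. This is only a sketch: tb_demo is a hypothetical table name, and the target collation should be chosen to suit the data.

```sql
-- On the MySQL 8.0 source: find tables that use the 8.0-only collation.
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_COLLATION = 'utf8mb4_0900_ai_ci';

-- Convert one table (tb_demo is a hypothetical name) to a collation
-- that MySQL 5.6 and 5.7 also recognize, before exporting it.
ALTER TABLE tb_demo
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```

Converting rebuilds the table and changes how its strings compare, so unique indexes on character columns should be checked for new conflicts first.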

Parameter Control

In MySQL 5.6, the collation_server parameter sets the default collation at the server level.

  • If collation_database is not specified when the server starts, it takes the value of collation_server by default.
  • If no character set or collation is specified when a database is created, the server-level defaults (character_set_server and collation_server) are used.
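
To see which defaults a server actually applies, the variables and the resulting database metadata can be inspected. A minimal sketch; demo_db is a hypothetical database name used only for illustration.

```sql
-- Inspect the collation-related variables currently in effect.
SHOW VARIABLES LIKE 'collation%';

-- Create a database without a collation (demo_db is a hypothetical name)
-- and check which default it received.
CREATE DATABASE demo_db;

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'demo_db';

-- Clean up the demo database.
DROP DATABASE demo_db;
```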

The character_set_database and collation_database parameters are deprecated in MySQL 5.7 and will be removed in a later version.
MySQL 8.0 adds a new parameter, default_collation_for_utf8mb4, to control the default collation used with the utf8mb4 character set. Its value is either utf8mb4_0900_ai_ci or utf8mb4_general_ci.
The default_collation_for_utf8mb4 parameter takes effect in the following cases:

  • When the SHOW COLLATION and SHOW CHARACTER SET statements are executed.
  • When a database is created or altered with utf8mb4 specified but no collation.
  • When a table is created or altered with utf8mb4 specified but no collation.
  • When a column is added or modified with utf8mb4 specified but no collation.
  • In other statements that use utf8mb4 without specifying a collation.
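
The effect on table creation can be sketched as follows. This assumes MySQL 8.0.11 or later and an account with enough privileges to set the variable; tb_demo_default is a hypothetical table name. The MySQL documentation describes this variable as intended for internal use by replication, so the example is an illustration, not a recommended configuration.

```sql
-- Assumes MySQL 8.0.11+ and sufficient privileges to set the variable.
SET @@session.default_collation_for_utf8mb4 = 'utf8mb4_general_ci';

-- utf8mb4 is specified but no collation, so the variable decides the default.
-- tb_demo_default is a hypothetical table name.
CREATE TABLE tb_demo_default (
  id INT PRIMARY KEY,
  c1 VARCHAR(100)
) DEFAULT CHARSET = utf8mb4;

-- Inspect the collation that was actually applied.
SHOW CREATE TABLE tb_demo_default;

-- Clean up the demo table.
DROP TABLE tb_demo_default;
```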

Comparison between utf8mb4_unicode_ci and utf8mb4_general_ci

1. Accuracy

  • The utf8mb4_unicode_ci collation sorts and compares according to the Unicode standard, handles special characters, and sorts accurately across many languages.
  • The utf8mb4_general_ci collation does not follow the Unicode standard and cannot handle some special characters correctly.

2. Performance

  • The utf8mb4_general_ci collation has relatively good sorting performance.
  • The utf8mb4_unicode_ci collation implements a more complex algorithm to handle special characters, so its performance is slightly worse.
  • In most scenarios, there is no significant performance difference between the two.
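
A commonly cited example of the accuracy difference is the German sharp s (ß). The sketch below compares it under both collations, using a hex literal so the result does not depend on the client encoding. According to the MySQL documentation, utf8mb4_unicode_ci treats ß as equal to 'ss', while utf8mb4_general_ci treats it as equal to 's'.

```sql
-- 0xC39F is the UTF-8 encoding of the German sharp s.
SELECT
  _utf8mb4 0xC39F = _utf8mb4 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ci_ss,
  _utf8mb4 0xC39F = _utf8mb4 'ss' COLLATE utf8mb4_general_ci AS general_ci_ss,
  _utf8mb4 0xC39F = _utf8mb4 's'  COLLATE utf8mb4_general_ci AS general_ci_s;
```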

Supplement: utf8mb4 and equality query test in MySQL

Test Scripts

## Delete the test tables
DROP TABLE IF EXISTS tb2001;
DROP TABLE IF EXISTS tb2002;
DROP TABLE IF EXISTS tb2003;

## Create the test tables
CREATE TABLE tb2001(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_unicode_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4 ;

CREATE TABLE tb2002(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_general_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4;

CREATE TABLE tb2003(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_0900_ai_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4;

## Insert test data
INSERT INTO tb2001(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);
INSERT INTO tb2002(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);
INSERT INTO tb2003(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);

## Equality query test
SELECT * FROM tb2001 WHERE c1=0xF09F8D83;
SELECT * FROM tb2002 WHERE c1=0xF09F8D83;
SELECT * FROM tb2003 WHERE c1=0xF09F8D83;

SELECT * FROM tb2001 WHERE c2=0xF09F8D83;
SELECT * FROM tb2002 WHERE c2=0xF09F8D83;
SELECT * FROM tb2003 WHERE c2=0xF09F8D83;

Test Results

mysql> SELECT * FROM tb2001 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
|  2 | 🦊   | 🦊   |
|  3 | 🌠   | 🌠   |
+----+------+------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM tb2002 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
|  2 | 🦊   | 🦊   |
|  3 | 🌠   | 🌠   |
+----+------+------+
3 rows in set (0.01 sec)

mysql> SELECT * FROM tb2003 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)

mysql> 
mysql> SELECT * FROM tb2001 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2002 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2003 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)
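
One way to see why the first two queries return all three rows is to look at the weights each collation assigns to the test characters. The MySQL documentation states that utf8mb4_general_ci and utf8mb4_unicode_ci give every supplementary character (outside the Basic Multilingual Plane) the same weight, so they compare as equal. The sketch below assumes the tb2001 table from the test script still exists.

```sql
-- Compare the weights assigned to the three test emoji under each collation.
SELECT
  HEX(c2) AS utf8_bytes,
  HEX(WEIGHT_STRING(c2 COLLATE utf8mb4_general_ci)) AS general_ci_weight,
  HEX(WEIGHT_STRING(c2 COLLATE utf8mb4_unicode_ci)) AS unicode_ci_weight,
  HEX(WEIGHT_STRING(c2 COLLATE utf8mb4_bin))        AS bin_weight
FROM tb2001;
```

If the weights in the general_ci and unicode_ci columns are identical across rows while the bin column differs, that matches the query results shown above.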

Test Summary

  • The utf8mb4 character set can store special characters such as emoji.
  • The utf8mb4_unicode_ci and utf8mb4_general_ci collations do not "correctly match" such characters: as the results show, an equality query on one emoji also returns rows containing other emoji.
  • The utf8mb4_0900_ai_ci collation can "correctly match" special characters, but it is available only in MySQL 8.0 and later.
  • The utf8mb4_bin collation compares characters by their binary values, can "correctly match" special characters, and is available in all MySQL versions that support utf8mb4.
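
When a column must keep a case-insensitive collation, a binary comparison can still be requested for a single query. A minimal sketch, assuming the tb2001 table from the test script; with this predicate only the row whose c1 holds exactly these bytes is expected to match.

```sql
-- Force a binary comparison for this query only; the column definition
-- of c1 (utf8mb4_unicode_ci in tb2001) is left unchanged.
SELECT *
FROM tb2001
WHERE c1 COLLATE utf8mb4_bin = _utf8mb4 0xF09F8D83;
```

Applying COLLATE to the column may prevent an index on c1 from being used, so for frequent exact-match lookups, defining the column itself with utf8mb4_bin may be the better choice.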
