Example of utf8mb4 collation in MySQL

Common utf8mb4 collations in MySQL include:

  • utf8mb4_0900_ai_ci
  • utf8mb4_unicode_ci
  • utf8mb4_general_ci

When the default character set of a table is set to utf8mb4 but the collation is not explicitly specified:

  • In MySQL version 5.7, the default collation is utf8mb4_general_ci.
  • In MySQL version 8.0, the default collation is utf8mb4_0900_ai_ci.
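
To confirm which default applies on a particular server, one option is to ask information_schema directly. The following is a minimal sketch; the table name t_default_demo is hypothetical and exists only for this illustration, and a default database is assumed to have been selected with USE.

```sql
-- Show the collation this server marks as the default for utf8mb4.
SELECT COLLATION_NAME
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4'
  AND IS_DEFAULT = 'Yes';

-- Create a table that names only the character set (no COLLATE clause),
-- then inspect which collation it was given.
-- t_default_demo is a hypothetical name used only here.
CREATE TABLE t_default_demo (
    c1 VARCHAR(100)
) DEFAULT CHARSET = utf8mb4;

SELECT TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't_default_demo';

-- Clean up the illustration table.
DROP TABLE t_default_demo;
```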

Because utf8mb4_0900_ai_ci was introduced in MySQL 8.0, importing a MySQL 8.0 table into MySQL 5.7 or MySQL 5.6 can fail because the older server does not recognize the collation:

[Err] 1273 - Unknown collation: 'utf8mb4_0900_ai_ci'
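
One possible workaround is to move the affected tables to a collation that both versions support before exporting. The sketch below is an assumption about how that might be done, not a prescribed procedure; my_table is a hypothetical table name. Converting a table rebuilds it and changes how its strings compare and sort, so it is worth trying on a copy first.

```sql
-- Run on the MySQL 8.0 source: list columns that use a utf8mb4_0900_* collation.
-- In LIKE, "_" matches any single character, which is sufficient here.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE COLLATION_NAME LIKE 'utf8mb4_0900%'
  AND TABLE_SCHEMA NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema');

-- Convert one table (my_table is hypothetical) to a collation
-- that also exists in MySQL 5.6 and 5.7.
ALTER TABLE my_table
    CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```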

Parameter Control

In MySQL version 5.6, the collation_server parameter sets the default collation at the server level.

  • The collation_database parameter reflects the collation of the current default database. When no default database is selected, it has the same value as collation_server.
  • If neither a character set nor a collation is specified when creating a database, the server-level values (character_set_server and collation_server) are used.
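
To see which values are currently in effect, and to avoid relying on defaults at all, something like the following can be used. This is a sketch only; demo_db is a hypothetical database name.

```sql
-- Inspect the server, database, and connection collation settings.
SHOW VARIABLES LIKE 'collation%';
SHOW VARIABLES LIKE 'character_set%';

-- Name the character set and collation explicitly so the result
-- does not depend on server defaults (demo_db is hypothetical).
CREATE DATABASE demo_db
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

-- Confirm what the database was created with.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'demo_db';
```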

The global character_set_database and collation_database parameters are deprecated in MySQL 5.7 and will be removed in a future version.
MySQL 8.0 adds a new parameter, default_collation_for_utf8mb4, to control the default collation when the utf8mb4 character set is used. Its value is either utf8mb4_0900_ai_ci or utf8mb4_general_ci.
The default_collation_for_utf8mb4 parameter takes effect under the following conditions:

  • When executing the SHOW COLLATION and SHOW CHARACTER SET commands.
  • When creating or altering a database that specifies utf8mb4 but no collation.
  • When creating or altering a table that specifies utf8mb4 but no collation.
  • When adding or modifying a column that specifies utf8mb4 but no collation.
  • Any other syntax that uses utf8mb4 without specifying a collation.
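
The effect on table creation can be observed with a short session-level experiment. This is a sketch under assumptions: it targets MySQL 8.0, t_utf8mb4_demo is a hypothetical table name, and depending on the exact version, setting this variable may require elevated privileges or raise a warning, since the MySQL manual describes it as intended for internal use by replication.

```sql
-- Change the utf8mb4 default collation for the current session only.
SET @@session.default_collation_for_utf8mb4 = 'utf8mb4_general_ci';

-- SHOW CHARACTER SET reports the default collation for utf8mb4.
SHOW CHARACTER SET LIKE 'utf8mb4';

-- A table that names utf8mb4 without a COLLATE clause
-- (t_utf8mb4_demo is hypothetical).
CREATE TABLE t_utf8mb4_demo (
    c1 VARCHAR(100)
) DEFAULT CHARSET = utf8mb4;

-- Inspect the collation the table actually received.
SHOW CREATE TABLE t_utf8mb4_demo;

-- Clean up.
DROP TABLE t_utf8mb4_demo;
```

In application schemas it is usually simpler to write the COLLATE clause explicitly than to depend on this variable.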

Comparison between utf8mb4_unicode_ci and utf8mb4_general_ci

1. Accuracy

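  • The utf8mb4_unicode_ci collation is based on the standard Unicode Collation Algorithm (UCA). It handles cases such as one character comparing equal to a sequence of characters, and sorts accurately across many languages.
  • The utf8mb4_general_ci collation is a simplified implementation that is not based on the UCA and compares character by character, so it cannot handle such cases. For example, ß compares equal to ss under utf8mb4_unicode_ci but equal to s under utf8mb4_general_ci.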
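
The difference can be checked with a pair of comparisons on the German sharp s. The statements below are a small sketch; the _utf8mb4 introducer is used so that they do not depend on the connection character set. According to the MySQL documentation, ß = ss holds under utf8mb4_unicode_ci and ß = s holds under utf8mb4_general_ci, so both queries are expected to return 1.

```sql
-- utf8mb4_unicode_ci treats ß as equal to the two-character sequence "ss".
SELECT _utf8mb4'ß' = _utf8mb4'ss' COLLATE utf8mb4_unicode_ci AS unicode_ci_eq_ss;

-- utf8mb4_general_ci compares one character at a time and treats ß as equal to "s".
SELECT _utf8mb4'ß' = _utf8mb4's' COLLATE utf8mb4_general_ci AS general_ci_eq_s;
```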

2. Performance

  • The utf8mb4_general_ci collation is somewhat faster for comparisons and sorting.
  • The utf8mb4_unicode_ci collation implements a more complex algorithm to handle special characters, so it is slightly slower.
  • In most scenarios, there is no significant performance difference between the two.

Supplement: utf8mb4 equality query test in MySQL

Test Scripts

## Delete the test tables
DROP TABLE IF EXISTS tb2001;
DROP TABLE IF EXISTS tb2002;
DROP TABLE IF EXISTS tb2003;

## Create the test tables
CREATE TABLE tb2001(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_unicode_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4 ;

CREATE TABLE tb2002(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_general_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4;

CREATE TABLE tb2003(
id INT AUTO_INCREMENT PRIMARY KEY,
c1 VARCHAR(100) COLLATE utf8mb4_0900_ai_ci,
c2 VARCHAR(100) COLLATE utf8mb4_bin
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4;

## Insert test data
INSERT INTO tb2001(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);
INSERT INTO tb2002(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);
INSERT INTO tb2003(c1,c2)VALUES(0xF09F8D83,0xF09F8D83),(0xF09FA68A,0xF09FA68A),(0xF09F8CA0,0xF09F8CA0);

## Equality query test
SELECT * FROM tb2001 WHERE c1=0xF09F8D83;
SELECT * FROM tb2002 WHERE c1=0xF09F8D83;
SELECT * FROM tb2003 WHERE c1=0xF09F8D83;

SELECT * FROM tb2001 WHERE c2=0xF09F8D83;
SELECT * FROM tb2002 WHERE c2=0xF09F8D83;
SELECT * FROM tb2003 WHERE c2=0xF09F8D83;

Test Results

mysql> SELECT * FROM tb2001 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
|  2 | 🦊   | 🦊   |
|  3 | 🌠   | 🌠   |
+----+------+------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM tb2002 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
|  2 | 🦊   | 🦊   |
|  3 | 🌠   | 🌠   |
+----+------+------+
3 rows in set (0.01 sec)

mysql> SELECT * FROM tb2003 WHERE c1=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2001 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2002 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM tb2003 WHERE c2=0xF09F8D83;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 🍃   | 🍃   |
+----+------+------+
1 row in set (0.00 sec)

Test Summary

  • The utf8mb4 character set can store special characters such as emoji.
  • The utf8mb4_unicode_ci and utf8mb4_general_ci collations do not "correctly match" these special characters: in the test above, an equality query for one emoji returns all three rows because the emoji compare as equal to one another.
  • The utf8mb4_0900_ai_ci collation can "correctly match" special characters, but it is available only in MySQL 8.0 and later.
  • The utf8mb4_bin collation compares characters by their binary values, can "correctly match" special characters, and is available in every MySQL version that supports utf8mb4.
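
When a column has to keep utf8mb4_unicode_ci or utf8mb4_general_ci, one possible approach is to override the collation for a single comparison. The sketch below reuses tb2001 from the test script. The COLLATE clause is applied to the column and the hex literal is given the _utf8mb4 introducer, so both sides of the comparison are utf8mb4 strings.

```sql
-- Force a binary comparison on c1 for this query only,
-- even though the column itself is utf8mb4_unicode_ci.
SELECT *
FROM tb2001
WHERE c1 COLLATE utf8mb4_bin = _utf8mb4 0xF09F8D83;
```

Overriding the collation in the WHERE clause may prevent an index on the column from being used, so for frequently queried columns it may be preferable to define the column with utf8mb4_bin (or utf8mb4_0900_ai_ci on MySQL 8.0) in the first place.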
