Solution to the problem of failure to insert emoji expressions into MySQL

Preface

I always thought that UTF-8 was a universal solution to character set problems, until I ran into this one. I was recently writing a crawler for Sina Weibo, and when saving data I found that whenever the content contained emoji, the following exception was thrown:

Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...'

MySQL's utf8 character set stores at most 3 bytes per character, which covers most of the characters we see in daily life. But 3 bytes are not enough to represent every character, so utf8mb4 was introduced. utf8mb4 is a superset of utf8 that uses up to 4 bytes per character and is backward compatible with utf8. The emoji we use every day take 4 bytes in UTF-8.
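
To make the byte counts concrete, here is a minimal Java sketch that measures how many bytes a string occupies in UTF-8. The class and method names (Utf8ByteLength, utf8Bytes, fromUtf8) are made up for illustration. The byte sequence F0 90 8D 83 is the one that begins the error message above.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ByteLength {
    private Utf8ByteLength() {}

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Decodes raw UTF-8 bytes, e.g. the bytes shown in a MySQL error message. */
    static String fromUtf8(int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

With this helper, utf8Bytes("A") is 1, utf8Bytes("\u4E2D") (a CJK character) is 3, and utf8Bytes("\uD83D\uDE00") (the emoji U+1F600) is 4. The bytes F0 90 8D 83 decode to the single code point U+10343, which also needs 4 bytes and therefore does not fit in a utf8 column.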

So inserting such data into a utf8 table produces the Incorrect string value error.

A quick Google search turned up the solution, which is as follows:

1. Change the character set of the data table to utf8mb4

This is straightforward, and plenty of modification statements can be found online. However, I recommend rebuilding the table instead. First back up the table with mysqldump:

mysqldump -uusername -ppassword database_name table_name > table.sql

Then change the character set in the table creation statement inside table.sql to utf8mb4, and re-import the file to complete the change:

mysql -uusername -ppassword database_name < table.sql
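
For comparison with the dump-and-reload route, the in-place modification statement mentioned above could be issued from Java roughly as follows. This is only a sketch: the class and method names are hypothetical, the collation utf8mb4_unicode_ci is an assumption taken from the configuration shown later in this article, and the table name is restricted to simple identifiers so that it can be quoted safely.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class TableCharsetConverter {
    private TableCharsetConverter() {}

    /** Builds an ALTER TABLE statement that converts the table and its text columns to utf8mb4. */
    static String buildConvertSql(String table) {
        // Only plain identifiers are accepted, because the name is concatenated into the SQL text.
        if (table == null || !table.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "ALTER TABLE `" + table + "` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
    }

    /** Runs the conversion on an existing connection. */
    static void convert(Connection conn, String table) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(buildConvertSql(table));
        }
    }
}
```

For example, buildConvertSql("tb_message") yields an ALTER TABLE statement for the tb_message table used later in this article. Converting in place rewrites the table, so taking the mysqldump backup first is still a sensible precaution.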

2. MySQL database version must be 5.5.3 or above

Articles online all state that utf8mb4 is supported only in MySQL 5.5.3 and later. The database I was using was 5.5.18, which already meets that requirement, and I was able to solve the problem in the end. So check your version before asking the operations team to upgrade the database, and try to solve the problem yourself first.

3. Modify the database configuration file /etc/my.cnf and restart the mysql service

The main goal is to change the database's default character set, along with the character sets used for connections and queries. The article [MySQL supports emoji emoticons and upgrades the encoding to UTF8MB4][1] describes the settings in detail, and [In-depth MySQL character set settings][2] explains what each character set variable does. Both are worth reading.

4. Upgrade MySQL Connector to 5.1.21 and above

Of all the steps above, the most critical is step 3, modifying the database configuration file. The changes are roughly as follows:

[client]
# The default character set for client source data
default-character-set = utf8mb4
[mysqld]
# The default character set on the server
character-set-server = utf8mb4
# Connection layer default collation
collation-server = utf8mb4_unicode_ci
[mysql]
# Database default character set
default-character-set = utf8mb4

These settings specify the character set used at each stage that data passes through on its way from the client to the server. A problem at any stage can cause the insert to fail or produce garbled characters.

But in many cases the configuration of a production database cannot be changed at will, and our operations colleagues flatly rejected my request to modify the configuration file (T_T)

So the only option left was to solve it in code. My first idea was to start with the character set specified in the JDBC connection URL:

jdbc:mysql://localhost:3306/ding?characterEncoding=UTF-8

Surely changing UTF-8 in this Java-style charset string to utf8mb4 would solve the problem?

Unfortunately, Java has no charset name corresponding to utf8mb4, so it cannot be passed as characterEncoding. With UTF-8, the driver is compatible with utf8mb4 and converts the character set automatically:

For example, to use 4-byte UTF-8 character sets with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting. – [MySQL: Using Character Sets and Unicode][3]

Later I learned that the character set can be specified explicitly from within a session. Executing set names utf8mb4 sets the character set of the current connection to utf8mb4, but the setting is lost once the connection is released.

The current workaround is to explicitly execute set names utf8mb4 whenever you need to insert 4-byte characters, for example:

jdbcTemplate.execute("set names utf8mb4");
jdbcTemplate.execute("...");
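
To decide when the extra statement is actually needed, one option is to check whether the text contains any character outside the Basic Multilingual Plane, since those are exactly the characters UTF-8 encodes in 4 bytes. The sketch below uses plain JDBC rather than jdbcTemplate, and the class and method names (Utf8mb4Guard, needsUtf8mb4, prepare) are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class Utf8mb4Guard {
    private Utf8mb4Guard() {}

    /** True if the text contains a code point above U+FFFF, which takes 4 bytes in UTF-8. */
    static boolean needsUtf8mb4(String text) {
        return text != null
                && text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    /** Issues SET NAMES utf8mb4 on the given connection only when the text requires it. */
    static void prepare(Connection conn, String text) throws SQLException {
        if (!needsUtf8mb4(text)) {
            return;
        }
        try (Statement st = conn.createStatement()) {
            st.execute("SET NAMES utf8mb4");
        }
    }
}
```

The insert must then run on the same connection that was passed to prepare, because the setting applies to that connection only.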

Note that an ORM framework delays committing for performance reasons. Until the transaction ends or the user explicitly forces a commit, the set names utf8mb4 statement that was executed will not take effect.

Here I use MyBatis, taking MessageDao as an example:

// MessageDao
import org.apache.ibatis.annotations.Insert;
import org.apache.ibatis.annotations.Update;

public interface MessageDao {
    // Switch the current connection to utf8mb4
    @Update("set names utf8mb4")
    void setCharsetToUtf8mb4();

    @Insert("insert into tb_message ......")
    void insert(Message msg);
}

// test code
SqlSession sqlSession = sqlSessionFactory.openSession();
MessageDao messageDao = sqlSession.getMapper(MessageDao.class);
messageDao.setCharsetToUtf8mb4();
// Force commit
sqlSession.commit();
messageDao.insert(message);

At this point, the problem was solved.

If only things had gone that smoothly. In the actual project the MyBatis instance is managed by Spring, so I cannot get hold of the sqlSession and therefore cannot force a commit. And because of the limitations of Spring's transaction framework, users are not allowed to call commit explicitly. I am still wrestling with this issue.

There are two possible approaches:

  • Use AOP: wherever 4-byte UTF-8 characters may be inserted, a before advice executes set names utf8mb4. However, with this approach it is unclear whether the advice method will be managed by Spring transactions, and whether the connection obtained in the advice belongs to the same session as the connection obtained afterwards.
  • Study how Spring JDBC creates connections and write a hook that executes set names utf8mb4 each time a new database connection is created, so that every connection is guaranteed to have the character set applied.
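
The second approach can be sketched without any Spring-specific API by wrapping the DataSource itself. The code below is an untested-in-production illustration under that assumption: it uses a JDK dynamic proxy so that every getConnection call runs SET NAMES utf8mb4 before the connection is handed out. The class name Utf8mb4DataSources and the method wrap are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

final class Utf8mb4DataSources {
    private Utf8mb4DataSources() {}

    /** Returns a DataSource that runs SET NAMES utf8mb4 on every connection it hands out. */
    static DataSource wrap(DataSource target) {
        return (DataSource) Proxy.newProxyInstance(
                Utf8mb4DataSources.class.getClassLoader(),
                new Class<?>[] {DataSource.class},
                (proxy, method, args) -> {
                    Object result;
                    try {
                        // Delegate every call to the real DataSource.
                        result = method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause();
                    }
                    // Covers both getConnection() and getConnection(user, password).
                    if ("getConnection".equals(method.getName())) {
                        Connection conn = (Connection) result;
                        try (Statement st = conn.createStatement()) {
                            st.execute("SET NAMES utf8mb4");
                        } catch (SQLException e) {
                            conn.close(); // do not leak a connection that failed to initialise
                            throw e;
                        }
                    }
                    return result;
                });
    }
}
```

The wrapped DataSource would be given to Spring and MyBatis in place of the original one. If a connection pool sits on top of the wrapper, the statement runs once per physical connection. If the wrapper sits on top of the pool, it runs on every checkout. Note that the Connector/J documentation advises against issuing SET NAMES because the driver does not detect the change, so whether this hook is sufficient should be verified against the driver version in use.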

Summary

That is all for this article. I will experiment with the two approaches above when I have time. I hope this article is of some help in your study or work. If you have any questions, feel free to leave a comment.
