MySQL full-text search usage examples

1. Environment Preparation

Before MySQL 5.7.6, full-text indexing supported only English; Chinese text had to be pre-segmented into words by an external tokenizer before being stored in the database. Starting with MySQL 5.7.6, MySQL ships with a built-in ngram full-text parser that supports Chinese, Japanese, and Korean word segmentation. This article uses MySQL 5.7.22 with the InnoDB storage engine.

So you need a MySQL version of 5.7.6 or later.

-- Check the MySQL version
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.33 |
+-----------+
1 row in set (0.02 sec)

Add the tokenizer size and minimum word length settings to the MySQL configuration file (skip this step if they are already configured).

ft_min_word_len: the minimum token length. The default is 4, which is reasonable for English but too long for Chinese, so it needs to be lowered.

ngram_token_size: the size of the tokens produced by the ngram parser. For example, the Chinese phrase 你好世界 ("hello world") is tokenized differently for different sizes:

n=1: '你', '好', '世', '界'
n=2: '你好', '好世', '世界'
n=3: '你好世', '好世界'
n=4: '你好世界'
# /etc/mysql/mysql.conf.d/mysqld.cnf
ft_min_word_len = 2
ngram_token_size = 2

# If the settings are not present yet, append them to the configuration file
echo 'ft_min_word_len = 2
ngram_token_size = 2' >> mysqld.cnf

# Restart the service
/etc/init.d/mysql restart

-- View the configuration
mysql> SHOW VARIABLES LIKE 'ft_min_word_len';
mysql> SHOW VARIABLES LIKE 'ngram_token_size';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| ft_min_word_len | 2 |
+-----------------+-------+
1 row in set (0.02 sec)
 
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| ngram_token_size | 2 |
+------------------+-------+
1 row in set (0.03 sec)
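As a side note, ft_min_word_len only applies to MyISAM full-text indexes; InnoDB tables use the separate variable innodb_ft_min_token_size, and indexes built with the ngram parser use ngram_token_size instead. A quick, hedged check:

-- InnoDB's own minimum token size (used by non-ngram InnoDB full-text indexes)
mysql> SHOW VARIABLES LIKE 'innodb_ft_min_token_size';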


2. Data Preparation

-- Demo table for MySQL full-text search

mysql> CREATE TABLE `articles` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(50) DEFAULT NULL COMMENT 'Subject',
  `content` longtext NOT NULL COMMENT 'content',
  PRIMARY KEY (`id`),
  FULLTEXT KEY `title_content_index` (`content`,`title`) /*!50100 WITH PARSER `ngram` */ 
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.20 sec)
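If the table already exists, the ngram full-text index can also be added afterwards. A minimal sketch, assuming the same articles table without the index:

-- Add an ngram full-text index to an existing table
mysql> ALTER TABLE articles ADD FULLTEXT INDEX title_content_index (content, title) WITH PARSER ngram;
-- Equivalent form
mysql> CREATE FULLTEXT INDEX title_content_index ON articles (content, title) WITH PARSER ngram;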
 

mysql> INSERT INTO articles (`title`, `content`) VALUES
        ('If', 'I will never think of you again in this life, except on some nights wet with tears, if you want'),
        ('Love', 'One day the road sign is moved, I hope you can be calm. One day the bridge pier is broken, I hope you can cross. One day the pillar falls, I hope you can be strong. One day the expectation fades, I hope you can understand'),
        ('Far and near', 'You look at me for a while and then at the clouds. I feel that when you look at me, I am very far away, and when you look at the clouds, you are very close'),
        ('Fragment', 'You stand on the bridge and look at the scenery,
The people who are watching the scenery are watching you from upstairs.
The bright moon adorns your window,
You decorate other people''s dreams.'),
        ('Soliloquy', 'I tell you I miss you like a stone statue, silence is not right. If silence is your sadness, you know this sadness hurts me the most');

Query OK, 5 rows affected (0.08 sec)
Records: 5 Duplicates: 0 Warnings: 0
 
mysql> SELECT * from articles where match(content, title) against('风景' in NATURAL LANGUAGE MODE) LIMIT 10;
+----+--------+--------------------------------------------------------------------------------------------------------------------------+
| id | title | content |
+----+--------+--------------------------------------------------------------------------------------------------------------------------+
| 10 | Fragment | You stand on the bridge and look at the scenery.
The people who are watching the scenery are watching you from upstairs.
The bright moon adorns your window,
You decorate other people's dreams. |
+----+--------+--------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
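To see the tokens the ngram parser actually generated for this table, InnoDB's full-text diagnostic tables can be queried. A rough sketch, assuming the table lives in a database named test (the database name is an assumption) and the account has the required privileges:

-- Point the diagnostic variable at the table
mysql> SET GLOBAL innodb_ft_aux_table = 'test/articles';
-- Tokens for recently inserted rows that are still in the full-text index cache
mysql> SELECT word, doc_id, position FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE LIMIT 20;
-- After OPTIMIZE TABLE articles; the merged index can be inspected via INNODB_FT_INDEX_TABLE instead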

3. The Three Search Modes

  • NATURAL LANGUAGE MODE

Natural language mode is MySQL's default full-text search mode. It cannot use operators, so it cannot express complex requirements such as a keyword that must appear or must not appear.

  • BOOLEAN MODE

Boolean mode supports operators, so it can express complex queries such as requiring a keyword to appear, excluding a keyword, or giving a keyword a higher or lower weight.

  • Query Expansion

Query expansion returns not only the directly matching rows but also related rows. (It is similar to an associative query; the documentation recommends using it only with short search phrases, otherwise a lot of noisy results will appear.)

-- Natural language mode (NATURAL LANGUAGE MODE) query and get the relevance score
mysql> SELECT id, title, MATCH ( content, title ) against ( '风景' IN NATURAL LANGUAGE MODE ) AS score FROM articles;
+----+-----------+--------------------+
| id | title | score |
+----+-----------+--------------------+
| 7 | If | 0 |
| 8 | Love | 0 |
| 9 | Far and Near | 0 |
| 10 | Fragment | 0.9771181344985962 |
| 11 | Soliloquy | 0 |
+----+-----------+--------------------+
5 rows in set (0.02 sec)
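To filter the result set and sort it by relevance at the same time, the same MATCH expression can be reused in the WHERE and ORDER BY clauses:

-- Return only matching rows, highest score first
mysql> SELECT id, title, MATCH ( content, title ) AGAINST ( '风景' IN NATURAL LANGUAGE MODE ) AS score
       FROM articles
       WHERE MATCH ( content, title ) AGAINST ( '风景' IN NATURAL LANGUAGE MODE )
       ORDER BY score DESC LIMIT 10;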

-- Boolean mode (BOOLEAN MODE) can combine operators in one query
mysql> SELECT id, title FROM articles where MATCH ( content, title ) against ( '+风景-爱情' IN BOOLEAN MODE );
+----+--------+
| id | title |
+----+--------+
| 10 | Fragment |
+----+--------+
1 row in set (0.01 sec)
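Boolean mode supports more operators than + and -; for example, double quotes match an exact phrase, and > / < raise or lower a term's contribution to the score. A hedged sketch on the same data (which rows come back depends on the stored content):

-- Exact phrase search
mysql> SELECT id, title FROM articles WHERE MATCH ( content, title ) AGAINST ( '"风景"' IN BOOLEAN MODE );
-- Raise the weight of 风景 and lower the weight of 爱情
mysql> SELECT id, title FROM articles WHERE MATCH ( content, title ) AGAINST ( '>风景 <爱情' IN BOOLEAN MODE );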

-- Query expansion (WITH QUERY EXPANSION) can also return related results
mysql> SELECT id, title FROM articles where MATCH ( content, title ) against ( '风景' WITH QUERY EXPANSION );
+----+--------+
| id | title |
+----+--------+
| 10 | Fragment |
| 11 | Soliloquy|
+----+--------+
2 rows in set (0.02 sec)
 

4. Word segmentation engine

Currently, MySQL officially provides the MeCab Full-Text Parser plugin, a word segmentation plugin for Japanese that can segment text with an understanding of its semantics.
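As a hedged illustration only (the shared-library name and whether the plugin ships with your build depend on the platform, and the table articles_jp and column body are hypothetical), loading MeCab and using it for an index looks roughly like this:

-- Load the MeCab full-text parser plugin
mysql> INSTALL PLUGIN mecab SONAME 'libpluginmecab.so';
-- Create a full-text index that uses the MeCab parser
mysql> CREATE FULLTEXT INDEX idx_body ON articles_jp (body) WITH PARSER mecab;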

Built-in full-text parser: English words are delimited by spaces, so English text can simply be split on whitespace. Chinese has no such delimiters, so it requires a different approach to segmentation. For Chinese, Japanese, and Korean, MySQL therefore provides the ngram full-text parser (the configuration in this article is based on ngram Chinese segmentation).

Summary

Advantages

  • Compared with LIKE queries, efficiency is improved (no specific benchmark was done here)
  • A full-text index can cover multiple columns at once, whereas LIKE can only match against a single column per expression (see the comparison sketch below)
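A rough comparison on the demo table (the LIKE version has to enumerate every column and cannot use the full-text index):

-- LIKE: one column per predicate, scans the table
mysql> SELECT id, title FROM articles WHERE title LIKE '%风景%' OR content LIKE '%风景%';
-- Full-text: one MATCH expression covers both indexed columns
mysql> SELECT id, title FROM articles WHERE MATCH ( content, title ) AGAINST ( '风景' IN BOOLEAN MODE );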

Chinese word segmentation may require understanding the semantics before the text can be split effectively: the English phrase hello world can be segmented by spaces, but the Chinese 你好世界 has to be understood before it can be segmented into 你好 / 世界.

Here I share the jieba tokenizer in Python, which helps to show what Chinese word segmentation involves.

Jieba uses a Chinese vocabulary to calculate the association probabilities between the characters that make up words; from these probabilities it derives the segmentation result.

In [1]: import jieba

In [2]: jieba.lcut("你好世界")
Building prefix dict from the default dictionary ...
Dumping model to file cache /var/folders/st/b16fyn3s57x_5vszjl599njw0000gn/T/jieba.cache
Loading model cost 0.937 seconds.
Prefix dict has been built successfully.
Out[2]: ['你好', '世界']

In [3]: jieba.lcut("hello world")
Out[3]: ['hello', ' ', 'world']

For typical projects, MySQL's full-text index can cover 80% of the requirements. It supports Chinese search, automatic word segmentation, result sorting, combined queries, and other features, but performance is likely to become the bottleneck; Elasticsearch implements full-text search in a more friendly way.

A full-text index cannot achieve exactly the same effect as LIKE, because a continuous sentence is split into multiple tokens by the segmenter.

References

Mysql fulltext

