1. Environment Preparation

Before MySQL 5.7.6, full-text indexing supported only English; Chinese text had to be pre-segmented by an external tokenizer before being stored in the database. Starting with MySQL 5.7.6, MySQL ships a built-in ngram full-text parser that supports Chinese, Japanese, and Korean word segmentation. This article uses MySQL 5.7.33 with the InnoDB storage engine, so you need a MySQL version of at least 5.7.6.

```sql
-- Check the MySQL version
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.33    |
+-----------+
1 row in set (0.02 sec)
```

Add the tokenizer settings and minimum word length to the MySQL configuration file (skip this if already configured):

ft_min_word_len: the minimum token length, 4 by default. That is reasonable for English but must be lowered for Chinese.

ngram_token_size: the ngram length. For example, the Chinese string "你好世界" ("hello world") is segmented at different lengths as:
n=1: '你', '好', '世', '界'
n=2: '你好', '好世', '世界'
n=3: '你好世', '好世界'
n=4: '你好世界'

```shell
# /etc/mysql/mysql.conf.d/mysqld.cnf
ft_min_word_len = 2
ngram_token_size = 2

# If the settings are absent, append them
echo 'ft_min_word_len = 2
ngram_token_size = 2' >> mysqld.cnf

# Restart the service
/etc/init.d/mysql restart
```

```sql
-- Check the configuration
mysql> SHOW VARIABLES LIKE 'ft_min_word_len';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| ft_min_word_len | 2     |
+-----------------+-------+
1 row in set (0.02 sec)

mysql> SHOW VARIABLES LIKE 'ngram_token_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| ngram_token_size | 2     |
+------------------+-------+
1 row in set (0.03 sec)
```
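The segmentation behavior described above can be sketched in plain Python. This is only an illustration of how overlapping n-character tokens are produced, not MySQL's actual implementation:

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Split a string into overlapping n-character tokens,
    mimicking how MySQL's ngram parser tokenizes CJK text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# ngram_token_size = 2 turns "你好世界" into its three bigrams
print(ngram_tokens("你好世界", 2))  # ['你好', '好世', '世界']
print(ngram_tokens("你好世界", 1))  # ['你', '好', '世', '界']
print(ngram_tokens("你好世界", 4))  # ['你好世界']
```

A larger `ngram_token_size` produces fewer, longer tokens, which makes the index smaller but makes short queries miss; `2` is a common compromise for Chinese.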
2. Data Preparation

```sql
-- Demo table for MySQL full-text search
mysql> CREATE TABLE `articles` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(50) DEFAULT NULL COMMENT 'Subject',
  `content` longtext NOT NULL COMMENT 'content',
  PRIMARY KEY (`id`),
  FULLTEXT KEY `title_content_index` (`content`,`title`) /*!50100 WITH PARSER `ngram` */
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.20 sec)

mysql> INSERT INTO articles (`title`, `content`) VALUES
('If', 'I will never think of you again in this life, except on some nights wet with tears, if you want'),
('Love', 'One day the road sign is moved, I hope you can be calm. One day the bridge pier is broken, I hope you can cross. One day the pillar falls, I hope you can be strong. One day the expectation fades, I hope you can understand'),
('Far and near', 'You look at me for a while and then at the clouds. I feel that when you look at me, I am very far away, and when you look at the clouds, you are very close'),
('Fragment', 'You stand on the bridge and look at the scenery. The people watching the scenery are watching you from upstairs. The bright moon adorns your window. You decorate other people\'s dreams.'),
('Solo', 'I tell you I miss you like a stone statue, silence is not right. If silence is your sadness, you know this sadness hurts me the most');
Query OK, 5 rows affected (0.08 sec)
Records: 5  Duplicates: 0  Warnings: 0
```

(The rows in the original article are Chinese text; the search term 风景 means "scenery".)

```sql
mysql> SELECT * FROM articles WHERE MATCH(content, title) AGAINST('风景' IN NATURAL LANGUAGE MODE) LIMIT 10;
+----+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | title    | content                                                                                                                                                                                    |
+----+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 10 | Fragment | You stand on the bridge and look at the scenery. The people watching the scenery are watching you from upstairs. The bright moon adorns your window. You decorate other people's dreams.    |
+----+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
```

3. Search Modes
Natural language mode (NATURAL LANGUAGE MODE) is MySQL's default full-text search mode. It does not support operators, so it cannot express complex queries such as requiring that a keyword must or must not appear.
Boolean mode (BOOLEAN MODE) supports operators, so it can express complex queries: a keyword that must appear, a keyword that must not appear, or a keyword whose weight should be raised or lowered.
Query expansion (WITH QUERY EXPANSION) returns not only the rows that match the search term but also rows related to them, similar to an associative search. (The MySQL documentation recommends using it only for short phrase queries; otherwise many irrelevant rows may be returned.)

```sql
-- Natural language mode (NATURAL LANGUAGE MODE): query and return the relevance score
mysql> SELECT id, title, MATCH(content, title) AGAINST('风景' IN NATURAL LANGUAGE MODE) AS score FROM articles;
+----+--------------+--------------------+
| id | title        | score              |
+----+--------------+--------------------+
|  7 | If           |                  0 |
|  8 | Love         |                  0 |
|  9 | Far and near |                  0 |
| 10 | Fragment     | 0.9771181344985962 |
| 11 | Solo         |                  0 |
+----+--------------+--------------------+
5 rows in set (0.02 sec)

-- Boolean mode (BOOLEAN MODE): combine operators
mysql> SELECT id, title FROM articles WHERE MATCH(content, title) AGAINST('+风景 -爱情' IN BOOLEAN MODE);
+----+----------+
| id | title    |
+----+----------+
| 10 | Fragment |
+----+----------+
1 row in set (0.01 sec)

-- Query expansion (QUERY EXPANSION): also returns related rows
mysql> SELECT id, title FROM articles WHERE MATCH(content, title) AGAINST('风景' WITH QUERY EXPANSION);
+----+----------+
| id | title    |
+----+----------+
| 10 | Fragment |
| 11 | Solo     |
+----+----------+
2 rows in set (0.02 sec)
```

4. Tokenizers

MySQL currently offers the MeCab full-text parser plugin for Japanese, which segments text with some understanding of the language's semantics. As for the built-in parsers: English words are delimited by spaces, so English text can simply be split on whitespace. Chinese has no such delimiter, and effective segmentation requires understanding the semantics, so for Chinese, Japanese, and Korean MySQL provides the ngram full-text parser (the configuration in this article uses ngram for Chinese segmentation).

Summary
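The effect of the BOOLEAN MODE operators above can be modeled with a small Python sketch. This is a deliberate simplification of MySQL's matching (it treats a document and each search term as sets of bigrams), meant only to show the +/- semantics:

```python
def ngrams(text: str, n: int = 2) -> set:
    """All overlapping n-character tokens of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def boolean_match(doc: str, required=(), excluded=(), n: int = 2) -> bool:
    """Rough model of '+word -word' in BOOLEAN MODE:
    every required term's ngrams must appear in the document,
    and no excluded term's ngrams may appear."""
    tokens = ngrams(doc, n)
    ok_required = all(ngrams(w, n) <= tokens for w in required)
    ok_excluded = all(not (ngrams(w, n) & tokens) for w in excluded)
    return ok_required and ok_excluded

doc = "你站在桥上看风景"  # "You stand on the bridge and look at the scenery"
print(boolean_match(doc, required=["风景"]))                     # True
print(boolean_match(doc, required=["风景"], excluded=["爱情"]))  # True: 爱情 is absent
print(boolean_match(doc, required=["爱情"]))                     # False
```

The real parser also handles weighting operators (> <), phrase quoting, and relevance ranking, which this sketch ignores.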
For Chinese, effective word segmentation may require understanding the semantics first: the English text "hello world" can be split on its space, but the Chinese 你好世界 must be understood before it can be split into 你好 / 世界. The jieba tokenizer for Python illustrates the charm of Chinese word segmentation:
```python
In [1]: import jieba

In [2]: jieba.lcut("你好世界")
Building prefix dict from the default dictionary ...
Dumping model to file cache /var/folders/st/b16fyn3s57x_5vszjl599njw0000gn/T/jieba.cache
Loading model cost 0.937 seconds.
Prefix dict has been built successfully.
Out[2]: ['你好', '世界']

In [3]: jieba.lcut("hello world")
Out[3]: ['hello', ' ', 'world']
```

Advantages: for typical projects, MySQL's full-text index covers about 80% of search needs. It supports Chinese search, automatic word segmentation, result ranking, combined queries, and more.

Disadvantages: performance is likely to become the bottleneck; Elasticsearch implements full-text search in a more scalable way. Also, a full-text index cannot reproduce the effect of LIKE, because a connected phrase is broken into multiple tokens by word segmentation.

References: Mysql fulltext
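The LIKE limitation above can be made concrete with a sketch. The two documents below are invented for illustration (the second is not from the demo table); the point is that a natural-language-style match fires on any shared ngram token, while LIKE requires the literal substring:

```python
def ngrams(text: str, n: int = 2) -> set:
    """All overlapping n-character tokens of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

docs = {
    1: "明月装饰了你的窗子",  # "The bright moon adorns your window"
    2: "你的梦装饰了别人",    # invented example sharing the bigram 你的
}
query = "明月你的"  # a query whose bigrams are scattered across both docs

# LIKE-style substring match: only exact containment counts
like_hits = [i for i, d in docs.items() if query in d]

# Natural-language-style match: any shared ngram token counts
ft_hits = [i for i, d in docs.items() if ngrams(query) & ngrams(d)]

print(like_hits)  # [] - neither document contains the literal substring
print(ft_hits)    # [1, 2] - both share at least one bigram with the query
```

So a full-text query can return rows a LIKE '%…%' scan would reject; when exact substring semantics are required, full-text indexing is the wrong tool.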