1. Introduction Today a colleague asked me how to use MySQL to implement a full-text search function similar to ElasticSearch and score the search keywords? I immediately had a question in my mind? Why not just use es? Simple, easy to use and fast. But he said that the amount of data is not large, the time given by the client is very limited, and there is no time to build es, so let's take a look at the full-text search function of MySQL! Starting from version 5.7.6, MySQL has a built-in ngram full-text parser to support Chinese, Japanese, and Korean word segmentation. Prior to MySQL 5.7.6, full-text indexing only supported English full-text indexing, not Chinese full-text indexing. It was necessary to use a word segmenter to pre-process the Chinese paragraphs into words and then store them in the database. This article was tested using Mysql 5.7.6, InnoDB database engine. mysql full text search 2. Full-text parser ngram An ngram is a sequence of n consecutive words in a text. The ngram full-text parser can segment text into words, where each word is a sequence of n consecutive words. n=1: 'you', 'good', 'world', 'world' n=2: 'Hello', 'Good World', 'World' n=3: 'Hello world', 'Good world' n=4: 'Hello World' MySQL uses the global variable ngram_token_size to configure the size of n in ngram. Its value range is 1 to 10, and the default value is 2. Usually ngram_token_size is set to the minimum number of words to be queried. If you need to search for single words, set ngram_token_size to 1. With the default value of 2, searching for a single word will yield no results. Because a Chinese word consists of at least two characters, the default value of 2 is recommended. Let's take a look at the default ngram_token_size of MySQL: show variables like 'ngram_token_size' There are two ways to set the ngram_token_size variable: 1. Specify when starting the mysqld command mysqld --ngram_token_size=2 2. Modify the MySQL configuration file [mysqld] ngram_token_size=2 3. Full-text index Taking a document data as an example, a new data table t_wenshu is created, and a full-text index is created for the document content field, and 100,000 test data are imported. 1. Create a full-text index when building a table CREATE TABLE `t_wenshu` ( `province` varchar(255) DEFAULT NULL, `caseclass` varchar(255) DEFAULT NULL, `casenumber` varchar(255) DEFAULT NULL, `caseid` varchar(255) DEFAULT NULL, `types` varchar(255) DEFAULT NULL, `title` varchar(255) DEFAULT NULL, `content` longtext, `updatetime` varchar(255) DEFAULT NULL, FULLTEXT KEY `content` (`content`) WITH PARSER `ngram` )ENGINE=InnoDB DEFAULT CHARSET=utf8; 2. Through alter table method ALTER TABLE t_wenshu ADD FULLTEXT INDEX content_index (content) WITH PARSER ngram; 3. Through create index method CREATE FULLTEXT INDEX content_index ON t_wenshu (content) WITH PARSER ngram; 4. Search Mode Natural Language Retrieval (IN NATURAL LANGUAGE MODE) Natural language mode is the default full-text search mode of MySQL. The natural language mode cannot use operators and cannot specify complex queries such as keywords must appear or must not appear. Boolean search (IN BOOLEAN MODE) Eliminate words that appear in more than half of the matching rows. For example, if every row contains the word "this", no results will be found when searching with "this". This is very useful when there are a large number of records, because the database believes that it is meaningless to find all the rows. In this case, "this" is almost regarded as a stopword. Operators can be used in Boolean search mode to support complex queries such as specifying that a keyword must appear or must not appear or whether the keyword weight is high or low.
Query expansion search Note: (WITH QUERY EXPANSION) Since query expansion may bring many irrelevant queries, use it with caution! 5. Search Query 1) Query the records containing "theft" in the content. The query statement is as follows select caseid,content, MATCH ( content) AGAINST ('Theft') as score from t_wenshu where MATCH ( content) AGAINST ('Theft' IN NATURAL LANGUAGE MODE) 2) Search for records containing "挑交惹事" in the content. The query statement is as follows select caseid,content, MATCH ( content) AGAINST ('寻怨惹事') as score from t_wenshu where MATCH ( content) AGAINST ('寻怨惹事' IN NATURAL LANGUAGE MODE) ; 3) Single Chinese character, search for records containing "我" in the content, the query statement is as follows select caseid,content, MATCH ( content) AGAINST ('我') as score from t_wenshu where MATCH ( content) AGAINST ('我' IN NATURAL LANGUAGE MODE) ; Note: Because the value of the global variable ngram_token_size is set to 2. If you want to query a single Chinese character, you need to modify ngram_token_size = 1 in the configuration file my.ini and restart the mysqld service. I will not try this here. 4) The query field content contains "dangerous driving" and "picking quarrels and provoking trouble" as follows: select caseid,content, MATCH (content) AGAINST ('+dangerous driving+picking quarrels and provoking troubles') as score from t_wenshu where MATCH (content) AGAINST ('+dangerous driving+picking quarrels and provoking troubles' IN BOOLEAN MODE); 5) The query field content contains "dangerous driving" but does not contain "picking quarrels and provoking trouble" as follows: select caseid,content, MATCH (content) AGAINST ('+dangerous driving-picking quarrels and provoking troubles') as score from t_wenshu where MATCH (content) AGAINST ('+dangerous driving-picking quarrels and provoking troubles' IN BOOLEAN MODE); 6) The query field conent contains "dangerous driving" or "picking quarrels and provoking trouble" as follows: select caseid,content, MATCH (content) AGAINST ('Dangerous driving and provoking trouble') as score from t_wenshu where MATCH (content) AGAINST ('Dangerous driving and provoking trouble' IN BOOLEAN MODE); VI. Conclusion 1) Before using MySQL full-text indexing, find out the support status of each version; 2) Full-text indexing is N times faster than like + %, but there may be accuracy issues; 3) If a large amount of data needs to be fully indexed, it is recommended to add the data first and then create the index; 4) For Chinese, you can use MySQL 5.7.6 or later versions, or third-party plug-ins such as Sphinx and Lucene; 5) The field name used by the MATCH() function must be consistent with the field name specified when creating the full-text index, and can only be fields of the same table and cannot cross tables; This is the end of this article about MySQL full-text search and keyword scoring. For more relevant MySQL full-text search and keyword scoring content, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Common array operations in JavaScript
>>: Ubuntu starts the SSH service remote login operation
1. Introduction to Nginx Nginx is a web server th...
Table of contents 1. Create HTML structure 2. Cre...
Setting min-width and max-width properties in tab...
You may not have had any relevant needs for this ...
This article describes the mysql show operation w...
This article example shares the specific code of ...
1. Purpose Through this article, everyone can und...
MySQL Query Cache is on by default. To some exten...
1. Cancel the dotted box when the button is press...
In this article, I will show you how to develop a...
Table of contents 1. Prototype 2. Prototype chain...
Error occurs: When exporting the database from My...
Preface According to the project needs, Vue-touch...
#include <linux/moduleparam.h> 1. Module pa...
Using the CSS3 border-image property, you can set...