Preface

Some time ago, a project required searching chat records by keyword. That is exactly what a search engine does, so the first thing that came to mind was the distributed search engine ElasticSearch. However, the company's server resources were tight and there was no spare machine on which to deploy an ElasticSearch service. The release schedule was also tight and the amount of data was small, so I turned to MySQL's full-text index.

Introduction

MySQL has supported full-text indexing for a long time, but originally only for English-style text whose words are separated by spaces. Starting from version 5.7.6, MySQL has a built-in ngram full-text parser that supports Chinese, Japanese, and Korean word segmentation.

The MySQL full-text index is an inverted index. In an inverted index, the keyword is the key, and each keyword maps to the list of documents in which it appears. When a user searches for a keyword, the search program looks up the keyword in the inverted index and immediately finds all the documents that contain it.

The tests in this article were run on MySQL 8.0 with the InnoDB storage engine.

ngram full-text parser

An ngram is a sequence of n consecutive characters in a text. The ngram full-text parser tokenizes text into tokens, each of which is a sequence of n consecutive characters.

For example, segmenting "你好靓仔" with the ngram full-text parser gives:

    n=1: '你', '好', '靓', '仔'
    n=2: '你好', '好靓', '靓仔'
    n=3: '你好靓', '好靓仔'
    n=4: '你好靓仔'

In MySQL, the global variable ngram_token_size sets the value of n. It ranges from 1 to 10 and defaults to 2. You can view the current value with:

    show variables like 'ngram_token_size';

There are two ways to set the value of this global variable, both of which take effect at server startup:

1. Specify it when starting mysqld:

    mysqld --ngram_token_size=2

2. Edit the MySQL configuration file my.ini and add the following line under the [mysqld] section:

    ngram_token_size=2

Create a full-text index

1. Create a full-text index when creating the table

    CREATE TABLE `article` (
      `id` bigint NOT NULL,
      `url` varchar(1024) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
      `title` varchar(256) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
      `source` varchar(32) COLLATE utf8mb4_general_ci DEFAULT '',
      `keywords` varchar(32) COLLATE utf8mb4_general_ci DEFAULT NULL,
      `publish_time` timestamp NULL DEFAULT NULL,
      PRIMARY KEY (`id`),
      FULLTEXT KEY `title_index` (`title`) WITH PARSER `ngram`
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

2. With ALTER TABLE

    ALTER TABLE article ADD FULLTEXT INDEX title_index(title) WITH PARSER ngram;

3. With CREATE INDEX

    CREATE FULLTEXT INDEX title_index ON article (title) WITH PARSER ngram;

Search methods

1. Natural language search (NATURAL LANGUAGE MODE)

Natural language mode is MySQL's default full-text search mode. It does not support operators, so it cannot express complex queries such as requiring that a keyword must or must not appear.

Example

    select * from article where MATCH(title) AGAINST ('北京旅游' IN NATURAL LANGUAGE MODE);

    -- If no mode is specified, natural language mode is used by default
    select * from article where MATCH(title) AGAINST ('北京旅游');

In this mode, searching for "北京旅游" (Beijing travel) returns rows that contain "北京" (Beijing) or "旅游" (travel), because the ngram parser splits the search string into tokens (北京, 京旅, and 旅游 with ngram_token_size=2) and a row matches if it contains any of them.

In the example above, the results are automatically sorted by relevance, with the best match first. The relevance score is a non-negative floating point number.

Example

    -- View the relevance score
    select *, MATCH(title) AGAINST ('北京旅游') as score from article where MATCH(title) AGAINST ('北京旅游' IN NATURAL LANGUAGE MODE);

2. Boolean search (BOOLEAN MODE)

Boolean search mode supports operators, which allow complex queries such as requiring that a keyword must or must not appear, or raising or lowering the weight of a keyword.

Example

    -- No operator
    -- Contains "dating" or "guide"
    select * from article where MATCH(title) AGAINST ('dating guide' IN BOOLEAN MODE);

    -- With an operator
    -- Must contain "dating", may contain "guide"
    select * from article where MATCH(title) AGAINST ('+dating guide' IN BOOLEAN MODE);

More operator examples:

    'dating guide'
        No operator means OR: the row contains "dating" or "guide".
    '+dating +guide'
        Must contain both words.
    '+dating guide'
        Must contain "dating"; the score is higher if the row also contains "guide".
    '+dating -guide'
        Must contain "dating" and must not contain "guide".
    '+dating ~guide'
        Must contain "dating"; if the row also contains "guide", its score is lower than that of a row without "guide".
    '+dating +(>guide <tips)'
        Must contain "dating" and "guide", or "dating" and "tips"; rows matching "dating" and "guide" score higher than rows matching "dating" and "tips".
    'dating*'
        Matches rows containing words that begin with "dating".
    '"dating guide"'
        Double quotes make the enclosed words a phrase; the effect is similar to like '%dating guide%'. For example, "dating guide for beginners" is matched, but a title in which the two words are not adjacent, such as "guide to dating", is not.

Compare with LIKE

Compared with a LIKE query, a full-text index offers relevance ranking and the boolean operators described above. Its search performance is also better. The following test was run on about 500,000 rows:

    -- LIKE query
    select * from article where title like '%北京%';

    -- Full-text index query
    select * from article where MATCH(title) AGAINST ('北京' IN BOOLEAN MODE);

The LIKE query took 1.536s and the full-text index query took 0.094s, which is about 16 times faster.

Summary

Full-text indexing enables fast searches, but maintaining the index has a cost. The longer the indexed field, the larger the full-text index, which reduces the throughput of DML statements. If the amount of data is small, a full-text index is a simple and convenient way to implement search. If the amount of data is large, a dedicated search engine such as ElasticSearch is the better choice.
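The ngram tokenization and inverted-index lookup described in the introduction can be sketched in a few lines of Python. This is only an illustration of the principle, not MySQL's implementation: the function names and the sample titles are hypothetical, and details of the real parser such as stopword and whitespace handling are left out.

```python
from collections import defaultdict


def ngram_tokens(text: str, n: int = 2) -> list:
    """Split text into overlapping tokens of n consecutive characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_inverted_index(docs: dict, n: int = 2) -> dict:
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in ngram_tokens(text, n):
            index[token].add(doc_id)
    return index


def search_any(index: dict, query: str, n: int = 2) -> set:
    """Return ids of documents that share at least one token with the query."""
    hits = set()
    for token in ngram_tokens(query, n):
        hits |= index.get(token, set())
    return hits


# Hypothetical sample titles, for illustration only
titles = {1: "北京旅游攻略", 2: "上海旅游", 3: "北京美食", 4: "你好靓仔"}
index = build_inverted_index(titles)

print(ngram_tokens("你好靓仔", 2))              # ['你好', '好靓', '靓仔']
print(sorted(search_any(index, "北京旅游")))    # [1, 2, 3]
```

Titles 1, 2, and 3 each share at least one bigram with the query, which mirrors the OR-style matching of natural language mode; a real engine would additionally rank the hits by relevance.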