Implementing a search engine based on MySQL Preface: In fact, MySQL has supported full-text indexing for a long time, but it has only supported English searches. Starting from version 5.7.6, MySQL has a built-in ngram full-text parser to support Chinese, Japanese, and Korean word segmentation. Mysql full-text index adopts the principle of inverted index. In the inverted index, the keyword is the primary key, and each keyword corresponds to a series of files in which the keyword appears. In this way, when a user searches for a keyword, the sorting program locates the keyword in the inverted index and can immediately find all the files containing the keyword.
1. ngram full-text parserAn ngram is a sequence of n consecutive words in a text. The ngram full-text parser is able to tokenize text, where each word is a sequence of n consecutive words. For example, use the ngram full-text parser to segment "你好靓仔": n=1: 'you', 'good', 'pretty', 'boy' n=2: 'Hello', 'Very pretty', 'Handsome boy' n=3: 'You are so pretty', 'You are so pretty' n=4: 'Hello handsome boy' In MySQL, the global variable You can view the default show variables like 'ngram_token_size' There are two ways to set the value of the global variable (1) When starting the mysqld command, specify: mysqld --ngram_token_size=2 (2) Modify the MySQL configuration file my.ini and add a line of parameters at the end: ngram_token_size=2 2. Create a full-text index1. Create a full-text index when building a tableCREATE TABLE `article` ( `id` bigint NOT NULL, `url` varchar(1024) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '', `title` varchar(256) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '', `source` varchar(32) COLLATE utf8mb4_general_ci DEFAULT '', `keywords` varchar(32) COLLATE utf8mb4_general_ci DEFAULT NULL, `publish_time` timestamp NULL DEFAULT NULL, PRIMARY KEY (`id`), FULLTEXT KEY `title_index` (`title`) WITH PARSER `ngram` ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci; 2. Through alter table methodALTER TABLE article ADD FULLTEXT INDEX title_index(title) WITH PARSER ngram; 3. Through create index methodCREATE FULLTEXT INDEX title_index ON article (title) WITH PARSER ngram; 3. Search method1. Natural Language Search (NATURAL LANGUAGE MODE)Natural language mode is the default full-text search mode of MySQL. The natural language mode cannot use operators and cannot specify complex queries such as keywords must appear or must not appear. Example: select * from article where MATCH(title) AGAINST ('Beijing Tourism' IN NATURAL LANGUAGE MODE); // If no mode is specified, the natural language mode is used by default select * from article where MATCH(title) AGAINST ('北京旅游'); It can be seen that in this mode, searching for "Beijing Travel" can search for content containing "Beijing" or "Travel", because it is divided into two keywords based on natural language. In the above example, the results returned are automatically sorted by matching degree, with the highest matching degree at the front. The matching degree is a non-negative floating point number. Example: // Check the matching degree select * , MATCH(title) AGAINST ('Beijing Tourism') as score from article where MATCH(title) AGAINST ('Beijing Tourism' IN NATURAL LANGUAGE MODE); 2. Boolean search (BOOLEAN MODE)Boolean search mode can use operators to support complex queries such as specifying that a keyword must appear or must not appear or whether the keyword weight is high or low. Example: // No operator // Contains "Dating" or "Strategy" select * from article where MATCH(title) AGAINST ('Dating Guide' IN BOOLEAN MODE); // Use operator // Must contain "Dating", can contain "Strategy" select * from article where MATCH(title) AGAINST ('+Dating Guide' IN BOOLEAN MODE); More operator examples: 'Dating Tips' No operator, means OR, either contains "Dating" or "Strategy" '+Dating+Strategies' Must contain both words '+Dating Tips' It must include "Dating", but the match is higher if it also includes "Strategy". '+Dating-Strategy' It must contain "Date" and cannot contain "Strategy". '+Dating~Strategy' "Dating" must be included, but if "Strategy" is also included, the match score is lower than the record without "Strategy". '+Dating+(>Strategies<Tips)' The query must contain records for "dating" and "strategies" or "dating" and "skills", but "dating strategies" has a higher match than "dating skills". 'Dating*' The query includes records that begin with "Appointment". '"Dating Tips"' Use double quotes to enclose the words to be searched, the effect is similar to like '%Dating Guide%', For example, "Dating strategies for beginners" will be matched, but "Dating strategies" will not be matched. 4. Comparison with LikeCompared with like query, full-text index has the following advantages:
And the performance of full-text search is better than that of like query The following is a test based on about 50w data: // like query select * from article where title like '%北京%'; // Full-text index query select * from article where MATCH(title) AGAINST ('北京' IN BOOLEAN MODE); It can be seen that the like query is 1.536s, and the full-text index query is 0.094s, which is about 16 times faster. Full-text indexing enables fast searches, but there is also the overhead of maintaining the index. The larger the field length, the larger the full-text index created, which affects the throughput of DML statements. If the amount of data is not large, you can use full-text indexing for searching, which is simple and convenient. However, if the amount of data is large, it is recommended to use a dedicated search engine ElasticSearch to do this. The above is the details of implementing a simple search engine based on MySQL. For more information about implementing search engines with MySQL, please pay attention to other related articles on 123WORDPRESS.COM! I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Implementing custom radio and check box functions with pure CSS
>>: How to customize more beautiful link prompt effect with CSS
Table of contents 1. fill() syntax 2. Use of fill...
Preface In WeChat applet, you can use globalData ...
Table of contents 1. TypeScript is introduced int...
Table of contents Short Introduction 1. Check the...
Solution to the problem that there is no unzip co...
Preface After a failover occurs, a common problem...
one. wget https://dev.mysql.com/get/mysql57-commu...
Table of contents Preface Introduction to Closure...
With the development of Internet technology, user...
1. After creating the web project, you now need t...
In Nginx, there are some advanced scenarios where...
Prototype chain inheritance Prototype inheritance...
During the configuration of Jenkins+Tomcat server...
This article describes the MySQL transaction mana...
The connection method in MySQL table is actually ...