Preface
use Search with MATCH() ... AGAINST match() indicates that column is being searched, against indicates that string is being searched View the default word segmentation (use these words to distinguish different keywords); you can also customize the word segmentation to distinguish different keywords SELECT * FROM information_schema.INNODB_FT_DEFAULT_STOPWORD; like +-------+ | value | +-------+ | a | | about | | an | | are | | as | | at | | be | | by | | com | | de | | en | | for | | from | Three types of full-text search methods natural language search Pass a specific string to check through MATCH AGAINST. The default method boolean search Add operators to the search string, such as "+" means must contain, "-" means not contain, "*" means wildcard, even if the passed string is small or appears in the stop word, it will not be filtered out query expansion search The search string is used to perform a natural language search, then the words from the most relevant rows returned by the search are added to the search string and the search is done again, the query will return the rows from the second search Related parameters Configure related parameters innodb_ft_min_token_size The default value is 3, which means that at least 3 characters can be used as a keyword. Increasing this value can reduce the size of the full-text index. innodb_ft_max_token_size The default value is 84, which means that a keyword can contain up to 84 characters. Limiting this value can reduce the size of the full-text index. ngram_token_size The default value is 2, which means that two characters are used as a keyword for the built-in word segmentation parser. For example, to create a full-text index for "abcd", the keywords are 'ab', 'bc', 'cd' Note that these three parameters cannot be modified dynamically. If you modify these parameters, you need to restart the MySQL service and re-establish the full-text index. Test the innodb engine using full-text index Prepare 1. Goal
2. Set the following parameters to reduce disk IO pressure SET GLOBAL sync_binlog=100; SET GLOBAL innodb_flush_log_at_trx_commit=2; 3. Import 1kw data to test full-text indexing The data comes from online search Extraction code: iyip 4. The structure of an article table CREATE TABLE `article` ( `id` bigint(10) NOT NULL, `url` varchar(1024) CHARACTER SET latin1 NOT NULL DEFAULT '', `title` varchar(256) NOT NULL DEFAULT '', `source` varchar(32) DEFAULT '' COMMENT 'True source', `keywords` varchar(32) DEFAULT NULL, `publish_time` timestamp NULL DEFAULT NULL, PRIMARY KEY (`id`), KEY `title_idx` (`title`) ) ENGINE=InnoDB Use myloader to import test data in multiple threads First decompress the test data tar -zxf mydumper_dump_article.tar.gz time myloader -u $user -p $passwd -S $socket -t 32 -d /datas/dump_article -v 3 5. Total data volume and data file and index file size after importing data SELECT COUNT(*) FROM `article`; +----------+ | COUNT(*) | +----------+ | 10000000 | +----------+ 1 row in set (7.85 sec) SELECT table_name, CONCAT(FORMAT(SUM(data_length) / 1024 / 1024,2),'M') AS dbdata_size, CONCAT(FORMAT(SUM(index_length) / 1024 / 1024,2),'M') AS dbindex_size, CONCAT(FORMAT(SUM(data_length + index_length) / 1024 / 1024 / 1024,2),'G') AS `db_size(G)`, AVG_ROW_LENGTH,table_rows,update_time FROM information_schema.tables WHERE table_schema = DATABASE() and table_name='article'; +------------+-------------+--------------+------------+----------------+------------+---------------------+ | table_name | dbdata_size | dbindex_size | db_size(G) | AVG_ROW_LENGTH | table_rows | update_time | +------------+-------------+--------------+------------+----------------+------------+---------------------+ | article | 3,710.00M | 1,003.00M | 4.60G | 414 | 9388739 | 2019-07-05 15:31:37 | +------------+-------------+--------------+------------+----------------+------------+---------------------+ Create a full-text index using the default method 1. The table already has a keyword field (a brief description of the article content), and "," is used as the word separator select keywords from article limit 10; +-------------------------------------------------+ | keywords | +-------------------------------------------------+ | NULL | | NULL | | ,marriage,love| | Hairstyle, Side Parting, Makeup, Fashion | | Xiao A, | | , clothing matching, women, fashion | | beautiful,female| | Valentine's Day, Dongguan, Women | | Skin, skin care, skin care, food nutrition, beauty, health care | | Sanlitun, Beijing, fashion | +-------------------------------------------------+ 2. Search for a keyword without full-text indexing A full table scan is required select count(*) from article where keywords like '%Fashion%'; +----------+ | count(*) | +----------+ | 163 | +----------+ 1 row in set (7.56 sec) 3. Create a full-text index for the keyword field (with, as the word segment) Set innodb_ft_min_token_size in the my.cnf configuration file and restart the MySQL service (the minimum number of characters used as a keyword is two, and the default number of characters used as a keyword is three) [mysqld] innodb_ft_min_token_size=2 3.1 Setting custom stopwords (i.e. word segmentation) USE mysql; CREATE TABLE my_stopwords(VALUE VARCHAR(30)) ENGINE = INNODB; INSERT INTO my_stopwords(VALUE) VALUE (','); SET GLOBAL innodb_ft_server_stopword_table = 'mysql/my_stopwords'; SHOW GLOBAL VARIABLES WHERE Variable_name IN('innodb_ft_min_token_size','innodb_ft_server_stopword_table'); +---------------------------------+--------------------+ | Variable_name | Value | +---------------------------------+--------------------+ | innodb_ft_min_token_size | 2 | | innodb_ft_server_stopword_table | mysql/my_stopwords | +---------------------------------+--------------------+ 3.2 Create a full-text index alter table article add fulltext index idx_full_keyword(keywords); * [ ] Query OK, 0 rows affected, 1 warning (1 min 27.92 sec) * [ ] Records: 0 Duplicates: 0 Warnings: 1 3.3 The remaining disk space must be sufficient. The original table is 4.6G, and the remaining disk space is 5.7G. Adding a full-text index will also fail. df -h Filesystem Size Used Avail Use% Mounted on /dev/vda1 7.8G 6.3G 1.2G 85% / tmpfs 1.9G 0 1.9G 0% /dev/shm /dev/mapper/vg_opt-lvol0 19G 12G 5.7G 68% /datas A temporary file of the original table size will be created 8.6K Jul 5 16:19 #sql-5250_3533.frm 4.4G Jul 5 16:20 #sql-ib117-1768830977.ibd alter table article add fulltext index idx_full_keyword(keywords); ERROR 1114 (HY000): The table 'article' is full 3.4 Use the created full-text index to query the number of times a keyword appears The query response time has been greatly improved, only 0.05s; using where keywords like '%时尚%' takes 7.56s select count(*) from article where match(keywords) against('%时尚%'); +----------+ | count(*) | +----------+ | 163 | +----------+ 1 row in set (0.05 sec) 3.5 If you need to fully match multiple keywords at the same time, use Boolean full-text search Indicates the number of records that completely match "Sanlitun, Beijing" select count(*) from article where match(keywords) against('+Sanlitun, Beijing' in boolean mode); +----------+ | count(*) | +----------+ | 1 | +----------+ 1 row in set (0.06 sec) Indicates the number of records matching "Sanlitun" or "Beijing" select count(*) from article where match(keywords) against('Sanlitun,Beijing'); +----------+ | count(*) | +----------+ | 8 | +----------+ 1 row in set (0.06 sec) 3.6 After creating a full-text index, some other files will be created
Creating a full-text index using the ngram parser 1. Create a full-text index for the title field (this field does not have fixed stopwords, and uses the ngram parser) You need to set ngram_token_size in the my.cnf configuration file (the default is 2, 2 characters as the keywords of ngram), and restart the MySQL service. Here we use the default value of 2. select title from article limit 10; +------------------------------------------------------------------------------+ | title | +------------------------------------------------------------------------------+ | worth IT | |Launchpad Jiangnan Leather Factory small show | |Rare moment behind the scenes of Raw: "Mad Man" is carried backstage| |Raw: The son yelled at his father, "You are a green tea son." He fought four people alone.| |Four groups of 30 square meters of finely decorated small apartments, a large number of pictures, and floor plans | |Sexy smoky cat eye makeup by the nightclub queen| |Big Show slams The Rock Johnson | |Girls' Generation Choi Sooyoung's clothing knowledge Lin Yoona, Huang Miying, Kim Taeyeon, and Jung Soojung| |Outdoor outing in Deyang, barbecue in the flower fields| +------------------------------------------------------------------------------+ 2. Create a full-text index for the title field alter table article add fulltext index ft_index_title(title) with parser ngram; Query OK, 0 rows affected (3 min 29.22 sec) Records: 0 Duplicates: 0 Warnings: 0 3. An inverted index will be created (the longer the title field is, the larger the inverted index created)
4. Search for a keyword in the title without creating a full-text index select count(*) from article where title like '%Outdoor%'; +----------+ | count(*) | +----------+ | 22058 | +----------+ 1 row in set (8.60 sec) select count(*) from article where title like '%Background%'; +----------+ | count(*) | +----------+ | 1142 | +----------+ 5. Use full-text indexing to search for a keyword Response time has been greatly improved select count(*) from article where match(title) against('Outdoor'); +----------+ | count(*) | +----------+ | 22058 | +----------+ 1 row in set (0.07 sec) select count(*) from article where title like '%Background%'; +----------+ | count(*) | +----------+ | 1142 | +----------+ 1 row in set (8.31 sec) 6. Note that when the number of characters in the search keyword is greater than 2 (ngram_token_size defines the size), inconsistency will occur In normal search, the number of records that actually contain this keyword is 6 select count(*) from article where title like '%公子大%'; +----------+ | count(*) | +----------+ | 6 | +----------+ 1 row in set (8.40 sec) Full text search, the number of records with the keyword is 9443 select count(*) from article where match(title) against('公子大'); +----------+ | count(*) | +----------+ |9443| +----------+ 1 row in set (0.06 sec) The actual number of records that contain this keyword is 1 select count(*) from article where title like '%花田自服%'; +----------+ | count(*) | +----------+ | 1 | +----------+ 1 row in set (8.33 sec) The number of records that contain this keyword in the full text search is 3202 select count(*) from article where match(title) against('花田自服'); +----------+ | count(*) | +----------+ | 3202 | +----------+ 1 row in set (0.06 sec) in conclusion
refer to InnoDB FULLTEXT Indexes Summarize The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. Thank you for your support of 123WORDPRESS.COM. You may also be interested in:
|
<<: WeChat applet implementation anchor positioning function example
>>: Ubuntu Docker installation in vmware (container building)
By turning on the Recycle Bin function, you can r...
This article records the detailed tutorial for in...
This article example shares the specific code for...
In more and more websites, the use of XHTML is rep...
1. Check whether the existing nginx supports ipv6...
1. Download and decompress 1. Introduction to Zoo...
In order to handle a large number of concurrent v...
(1) Server configuration: [root@localhost ~]# cd ...
The implementation principle of Vue2.0/3.0 two-wa...
Linux grep command The Linux grep command is used...
1. Introduction Our real servers should not be di...
We often see ads appear after a few seconds and t...
1. Nginx installation steps 1.1 Official website ...
Table of contents Preface 1. Get the length of a ...
Table of contents 1. React combined with Antd to ...