MySQL fuzzy query usage (regular, wildcard, built-in function)

MySQL fuzzy query usage (regular, wildcard, built-in function)
SELECT * from table where username like '%陈哈哈%' and hobby like '%牛逼'

This is a fuzzy query method that we often use in MySQL, which uses the wildcard % for matching. In fact, this is just the tip of the iceberg. In MySQL, there are many methods that support fuzzy matching, and each has its own advantages. Well, today, let me take you to lift up the skirt of MySQL and see how many little-known good things are hidden under fuzzy query.

1. MySQL wildcard fuzzy query (%,_)

1-1. Classification of wildcards

  • "%" Percent wildcard: represents any number of occurrences of any character (can be 0 times).
  • "_" underscore wildcard: indicates that only a single character can be matched, no more and no less, just one character. Of course, you can also like "陈____", no limit on the number.
  • LIKE operator: LIKE is used to instruct MySQL to use wildcards instead of direct equality matching for the subsequent search mode. However, if there is no wildcard after LIKE, LIKE will be executed as "=" by default during SQL execution optimization.

Note: If the like operator is not followed by a universal matcher (% or _), the effect is the same as "=". When SQL is optimized, the query optimizer will default to "=" for like. SELECT * FROM movies WHERE movie_name like '唐伯虎'; can only match the result of movie_name="唐伯虎", but not the result of "唐伯虎点秋香" or "唐伯虎点烟".

1-2. Use of wildcards

1) % Wildcard:

-- Fuzzy matching of data containing the word "网"

SELECT * from app_info where appName like '%网%';

-- Fuzzy matching of data ending with "网"

SELECT * from app_info where appName like '%网'; 

-- Fuzzy matching of data starting with "网"

SELECT * from app_info where appName like '网%'; 

-- Exact match, appName like '网' is equivalent to: appName = '网'

SELECT * from app_info where appName = '网';
-- Equivalent to SELECT * from app_info where appName like '网';

-- Fuzzy matching of data containing "xxx net xxx car xxx", such as "TuTu online car-hailing driver terminal, online car-hailing platform"

SELECT * from app_info where appName like '%网%车%';

2) _ Wildcard:

-- Query data ending with "网" and with a length of three characters, such as "联家网".

SELECT * from app_info where appName like '__网'; 

Note: '%__网、__%网' is equivalent to '%网'

-- The first three characters of the query are XX.com, and the following characters can be matched arbitrarily, such as "Chengtong network disk, mold network platform"

SELECT * from app_info where appName like '__网%'; 

-- Fuzzy matching of data containing "xx网x车xxx", such as: "Ctrip Online Car-hailing Client"

SELECT * from app_info where appName like '__网_车%'; 

Note:

Pay attention to case. When using fuzzy matching, that is, matching text, the default configuration of MySQL is not case-sensitive. When you use someone else's MySQL database, pay attention to whether it is case-sensitive. Whether it is case-sensitive depends on how the user configures MySQL. If it is case-sensitive, then a record like Test12 cannot be matched by a matching condition like "test__".
Note the trailing space, "%test" cannot match records like "test".
Note that the NULL and % wildcard characters can match any character, but cannot match NULL. That is to say, SELECT * FROM blog where title_name like '%'; will not match records where title_name is NULL.

1-3. Tips and suggestions:

As you can see, MySQL wildcards are quite useful. This functionality comes at a price, however: wildcard searches generally take longer to process than the other searches discussed previously, and consume more memory and other resources. Here are some tips to remember when using wildcards.

  • Don't overuse wildcards. If other operators can achieve the same purpose, you should use other operators.
  • When you do need to use wildcards, don't use them at the beginning of a search pattern unless absolutely necessary. Because MySQL executes from left to right after where, if the wildcard is placed at the beginning of the search pattern (the leftmost side), the search will be the slowest (because the entire database needs to be scanned).
  • Note carefully the placement of the wildcard characters. If placed in the wrong place, it may not return the data you want.

Careful friends will find that if there are symbols such as "%" and "_" in the data, wouldn't it conflict with the wildcard?

SELECT * from app_info where appName LIKE '%%%';
SELECT * from app_info where appName LIKE '%_%';

Indeed, the above two SQL statements query the entire table data, rather than the specified data with "%" and "_". The ESCAPE keyword is needed here to escape.

As shown below, ESCAPE is followed by a character with something in it. MySQL treats that symbol as an escape character. I usually write it as "/". Then, just like the escape characters in C language, such as '\n' and '\t', write this character before the % sign you need to escape.

SELECT * from app_info where appName LIKE '%/_%' ESCAPE '/'; 

But is there a higher-end solution to this situation? The kind that can make colleagues or leaders who check your code look at you with new eyes~~

Of course, let's take a look at the second type of MySQL fuzzy matching method - built-in function query

2. MySQL built-in function retrieval (locate, position, instr)

Continuing from the previous article, matching is performed through the built-in functions locate, position, and instr, which is equivalent to the str.contains() method in Java. It returns the position of the matching content in the string, which is better than wildcard matching in terms of efficiency and usability.

SELECT * from app_info where INSTR(`appName`, '%') > 0;
SELECT * from app_info where LOCATE('%', `appName`) > 0;
SELECT * from app_info where POSITION( '%' IN `appName`) > 0;

As shown above, the default value for the three built-in functions is: > 0, so the following > 0 can be added or not, and adding it makes it more readable.

OK, let’s take a look at how to use these three built-in functions.

First of all, let's make it clear that the subscripts in MySQL start from 1 from left to right, unlike Java where the leftmost subscript is 0. Therefore, in MySQL, when the subscript is 0, it means it does not exist.

2-1. LOCATE() function

Syntax: LOCATE(substr,str)

Returns the position of the first occurrence of substr in str. If substr does not exist in str, the return value is 0. If substr exists in str, the return value is: the position where substr first appears in str.

Note: LOCATE(substr, str) and POSITION(substr IN str) are synonyms and have the same functionality.

Syntax: LOCATE(substr, str, [pos])

The position of the first occurrence of substring substr in string str starting at position pos. If substr is not in str, returns 0. If substr or str is NULL, returns NULL.

SELECT locate('a', 'banana'); -- 2
SELECT locate('a', 'banana', 3); -- 4
SELECT locate('z', 'banana'); -- 0
SELECT locate(10, 'banana'); -- 0
SELECT locate(NULL , 'banana'); -- null
SELECT locate('a' , NULL ); -- null

Examples:

-- Use the LOCATE keyword for fuzzy matching, equivalent to: "like '%网%'"

SELECT * from app_info where LOCATE('网', `appName`) > 0;

-- Use the LOCATE keyword for fuzzy matching, and match "网" from the second character, so data such as "NetEase Cloud Games, Wanglai Merchants" will be filtered out

SELECT * from app_info where LOCATE('网', `appName`, 2) > 0;

2-2. POSITION() method

Syntax: POSITION(substr IN substr)

This method can be understood as an alias for the locate(substr, str) method, because it has the same function as the locate(substr, str) method.

Examples:

-- Use the POSITION keyword for fuzzy matching, equivalent to: "like '%网%'"

SELECT * from app_info where POSITION('网' IN `appName`);

2-3. INSTR() method

Syntax: INSTR(str,substr)

Returns the position of the first occurrence of substring substr within string str. The two-argument form of INSTR() is identical to that of LOCATE(), except that the order of the arguments is reversed.

Examples:

-- Use the INSTR keyword for fuzzy matching, which has the same function as LIKE, equivalent to: "like '%网%'"

SELECT * from app_info where INSTR(`appName`, '网');

-- The instr function is generally used to retrieve the position of a character in a string, which is equivalent to: "like '%网%'"

SELECT * from app_info where INSTR(`appName`, '网') > 0;

3. MySQL regular matching query based on regexp and rlike

The regexp and rlike keywords in MySQL are synonyms and have the same functions. This article is based on regexp.

REGEXP does not support wildcards "%, _", but supports regular matching rules . It is a more detailed and elegant matching method. Let's take a look.

-- Here are the parameter types contained in regexp

Parameter Type effect
(^) Matches the starting position of a string. For example, "^a" means a string that starts with the letter a.
($) Matches the end position of a string. For example, "X^" means a string ending with the letter X.
(.) This character is the dot in English, which matches any character, including carriage return, line feed, etc.
(*) The asterisk matches 0 or more characters, and there must be something before it. For example: select * from table where name regexp 'ba*' (can match "baaa")
(+)

The plus sign matches 1 or more characters, and must also be preceded by something. The plus sign is used similarly to the asterisk, except that the asterisk is allowed to appear 0 times, while the plus sign must appear at least once.

(?) The question mark matches 0 or 1 time.
{n} Matches the specified n
{n,} Matches no less than n
{n,m} Match nm

-- REGEXP '网' is equivalent to like '%网%'

SELECT * from app_info where appName REGEXP '网';
-- Equivalent to SELECT * from app_info where appName like '%网%';

3-1. OR in regexp: |

Function: You can search for one of multiple strings, equivalent to or

-- Supports "|" 'or' symbols, matching data containing "China" or "Internet" or "University", and supports overlapping multiple

SELECT * from app_info where appName REGEXP 'China|Internet|University'; 

-- To match data that hits "中国" and "网" at the same time, you can use ".+" to connect them, which means China xxxx network. Any number of characters are allowed in the middle, and the order cannot be reversed.

SELECT * from app_info where appName REGEXP '中国.+网'; 

3-2. Regular expression matching in REGEXP: []

Function: Match one of the characters in the [] symbol, support parsing regular expressions

-- Matches data containing English characters, case-insensitive by default

SELECT * from app_info where appName REGEXP '[az]'; 

-- Just like like, just negate the set and add "not REGEXP", I won't go into details below

SELECT * from app_info where appName not REGEXP '[az]'; 

--Matches data containing uppercase English characters. By default, case is ignored and the "BINARY" keyword needs to be added . For example, where appName REGEXP BINARY 'Hello'

-- Regarding case sensitivity: Regular expression matching in MySQL (since version 3.23.4) is not case sensitive.

SELECT * from app_info where appName REGEXP BINARY '[AZ]'; 

-- Matches data containing numbers

SELECT * from app_info where appName REGEXP '[0-9]';

-- Matches data containing numbers or English.

SELECT * from app_info where appName REGEXP '[a-z0-9]';

az, 0-9 are all considered as one unit, do not add extra symbols . A few days ago, I found a special case, a very interesting bug, share it with him

-- I added an extra "|" symbol when writing the query statement, thinking it was "or", and didn't pay attention to it, but I never expected that the number of results would be different.

SELECT * from app_info where appName REGEXP '[567]'; -- 87 results SELECT * from app_info where appName REGEXP '[5|6|7]'; -- 88 results 

I'm confused. Let's see which one is wrong.

-- It turns out that the "|" symbol also participates in the matching and is recognized as a unit. Coincidentally, there is a data: "Wireless Mixer" which matches exactly. What the hell is this DJ?

SELECT * from app_info where appName REGEXP '[5|6|7]' and pid not in (SELECT pid from app_info where appName REGEXP '[567]'); 

-- Query data starting with one of 5, 6, or 7

SELECT * from app_info where appName REGEXP '^[5|6|7]'; 

-- Query data ending with one of 5, 6, or 7

SELECT * from app_info where appName REGEXP '[5|6|7]$'; 

Tips: In MySQL, UTF-8 Chinese characters = 3 bytes; GBK Chinese characters = 2 bytes

-- Query data with appName byte length of 10 and any content

SELECT * from app_info where appName REGEXP '^.{10}$'; 

-- Query data where the length of appName bytes is 10 and all are in English

SELECT * from app_info where appName REGEXP '^[az]{10}$' ; 

-- Query the data whose appName byte length is 10 and all are in uppercase English, and add BINARY

SELECT * from app_info where appName REGEXP BINARY '^[AZ]{10}$'; 

-- Query data whose version_name byte length is 6 and all of them are numbers or "."

SELECT * from app_info where version_name REGEXP '^[0-9.]{6}$'; 

-- Query data whose version_name byte length is 6 and all of them are numbers or "."; the first digit is required to be 1

SELECT * from app_info where version_name REGEXP '^1[0-9.]{5}$' ; 

-- Query data whose version_name byte length is 6 and all of them are numbers or "."; the first digit is 1 and the last digit is 7

SELECT * from app_info where version_name REGEXP '^1[0-9.]{4}7$' ; 

-- Query data whose version_name byte length is more than 6 characters and all of them are numbers or "."; the first character must be 1 and the last character must be 7

SELECT * from app_info where version_name REGEXP '^1[0-9.]{4,}7$' ; 

-- Query the data whose version_name byte length is 6 to 8 characters and all of them are numbers or "."; the first digit must be 1 and the last digit must be 7

SELECT * from app_info where version_name REGEXP '^1[0-9.]{4,6}7$' ; 

-- The first character is not Chinese

SELECT * from app_info where appName REGEXP '^[ -~]'; 

-- The first character is Chinese

SELECT * from app_info where appName REGEXP '^[^ -~]'; 

-- Query data that does not contain Chinese characters

SELECT * from app_info where appName REGEXP '^([az]|[0-9]|[AZ])+$'; 

-- Data starting with 5 or F and containing English

SELECT * from app_info where appName REGEXP BINARY '^[5F][a-zA-Z].'; 

To match special symbols, such as ., you need to add \\ (note that there are two slashes), but if it is in [], you don’t need to add it:

-- Match name containing . select * from app_info where appName regexp '\\.';
-- Match name containing . select * from app_info where appName regexp '[.]'; 

3-3. Character class matching (posix)

There are some special symbols in MySQL that can represent different types of matches:

-- Matches names containing numbers

select * from app_info where appName regexp '[[:digit:]]';

Other character classes of this type include:

Character Classes effect
[:alnum:] Matches literal and numeric characters. (Equivalent to [A~Za~z0~9])
[:alpha:] Matches an alphabetic character. (Equivalent to [A~Za~z])
[:blank:] Matches space or tab (same as [\\\t])
[:cntrl:] Matches a control character (ASCII 0 to 37 and 127)
[:digit:] Matches a decimal digit. (Equivalent to [0-9])
[:graph:] Matches characters in the ASCII code range 33 to 126. Similar to [:print:], but excludes space characters.
[:print:] Any printable character
[:lower:] Matches lowercase letters, equivalent to [az]
[:upper:] Matches uppercase letters, equivalent to [AZ]
[:space:] Matches a whitespace character (same as [\\f\\n\\r\\t\\v])
[:xdigit:] Matches a hexadecimal digit. Equivalent to [0-9A-Fa-f]

This type of character class requires an additional layer of [] around the main character class.

3-4. [:<:] and [:>:]

There are two special ones in the character class above. These two are about position. [:<:] matches the beginning of the word, and [:>:] matches the end of the word. They are different from ^ and $.

The latter matches the beginning and end of the entire thing, while the former matches the beginning and end of a word.

-- Only matches the string that starts with a, such as abcd

select * from app_info where appName regexp '^a';

-- It can match the whole word starting with a, and it can also match the word in the middle starting with a, such as: dance after.

select * from app_info where appName regexp '[[:<:]]a'; 

[[:<:]] and [[:>:]] match an empty string at the beginning and end of a word respectively. The beginning and end of this word are not characters contained in alnum and cannot be an underscore.

select "a word a" REGEXP "[[:<:]]word[[:>:]]"; -- 1 (match) 
 
select "a xword a" REGEXP "[[:<:]]word[[:>:]]"; -- 0 (meaning no match) 
 
select "weeknights" REGEXP "^(wee|week)(knights|nights)$"; -- 1 (match)

IV. Conclusion

Well, this article ends here. Those who can see this are destined to be here. I hope this article can help you further understand MySQL. For more relevant MySQL fuzzy query content, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Self-study of MySql built-in functions knowledge points summary
  • Detailed explanation of the use of the built-in function locate instr position find_in_set in MySQL efficient fuzzy search
  • PHP built-in function to generate random numbers example
  • PHP example of converting strings to integers without using built-in functions
  • Detailed explanation of how to use built-in functions in PHP to generate images
  • Several practical guides for using PHP built-in functions
  • PHP obtains memory usage through the built-in function memory_get_usage()
  • MySQL and PHP Basics and Applications: Built-in Functions

<<:  Solutions to common problems using Elasticsearch

>>:  HTML table tag tutorial (7): background color attribute BGCOLOR

Recommend

Core skills that web front-end development engineers need to master

The content involved in Web front-end development...

Introduction to installing and configuring JDK under CentOS system

Table of contents Preface Check and uninstall Ope...

Detailed explanation of samba folder sharing server configuration under centos

1. Introduction Recently I found that there are m...

How to monitor mysql using zabbix

Zabbix deployment documentation After zabbix is ​...

SQL implementation of LeetCode (197. Rising temperature)

[LeetCode] 197.Rising Temperature Given a Weather...

Solve the problem of not finding NULL from set operation to mysql not like

An interesting discovery: There is a table with a...

When should a website place ads?

I recently discussed "advertising" with...

JS code to achieve page switching effect

This article example shares the specific code of ...

How to ensure transaction characteristics of MySQL InnoDB?

Preface If someone asks you "What are the ch...

Detailed example of MySQL subquery

Subquery Classification Classification by returne...

Why developers must understand database locks in detail

1.Lock? 1.1 What is a lock? The real meaning of a...

How familiar are you with pure HTML tags?

The following HTML tags basically include all exis...