The MySQL version is 5.7.28. Table A has 3.9 million records and uses the InnoDB engine; its varchar column mac has a B-tree index. Table B has only a little over 5,000 records. The SQL statement in question looks like this:

SELECT * FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb",...more than 900 entries omitted)

The query took 294.428s to return its results. Yes, nearly 5 minutes.

Analyzing the statement with EXPLAIN shows that the access type is range, the index is used, and the estimated row count is only 587776. So why does the query take so long?

The mac index uses B-tree. Let's briefly compare it with HASH: a B-tree index can be used for =, >, >=, <, <= and BETWEEN comparisons, while a HASH index supports only equality comparisons and cannot serve range searches. IN is an equality comparison, so in principle both index methods apply. I therefore changed the index method of mac to HASH, but the same query still took about as long. (Note that InnoDB does not support explicit HASH indexes: USING HASH is accepted, but the index is still built as a B-tree, which would explain why nothing changed.)

Since changing the index method does not noticeably improve the query, the only thing left to work on is the statement itself. It is obvious at first glance that SELECT * is expensive, so we query only the fields the business needs and adjust the statement to:

SELECT id,mileage FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb",...more than 900 entries omitted)

There is no significant improvement in the query time.

Since IN is so hard to optimize, can we drop it and use LEFT JOIN instead? The statement becomes:

SELECT a.id,a.mileage FROM A a LEFT JOIN B b ON b.mac = a.mac WHERE b.create_time >= '2020-01-01'

This also took more than 5 minutes, so I gave up on it.

The usual rule of thumb is that when the number of conditions is small, there is no obvious difference between EXISTS and IN, but when there are many conditions, IN is more efficient than EXISTS.
Let's try EXISTS:

SELECT id,mileage FROM A a WHERE EXISTS(SELECT mac FROM B WHERE create_time >= '2020-01-01' AND mac = a.mac)

It also took more than 5 minutes. IN is indeed more efficient than EXISTS here, so I gave up on this approach as well.

The final conclusion: be cautious when IN is followed by a large list of string values. In my project, I treat mac as a unique identifier and create a separate table that maps each mac to an id. Table A then stores mac_id instead of mac, and queries use IN(1,2,3...), which improves efficiency. Using NoSQL is another option.
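The mac-to-id mapping described in the conclusion could look roughly like the following. This is only a sketch under assumptions: the table name mac_dict, the index names, and the VARCHAR(17) length are hypothetical and do not come from the original project, and the backfill UPDATE on a table of 3.9 million rows would probably need to be run in batches in production.

```sql
-- Hypothetical lookup table: one row per distinct MAC address.
-- VARCHAR(17) assumes the "aa:aa:aa:aa:aa:aa" format; adjust to the real column.
CREATE TABLE mac_dict (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  mac VARCHAR(17)  NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_mac (mac)
) ENGINE=InnoDB;

-- Fill the lookup table from the MAC addresses already stored in A.
INSERT INTO mac_dict (mac)
SELECT DISTINCT mac
FROM A
WHERE mac IS NOT NULL;

-- Add an integer column to A and index it.
ALTER TABLE A
  ADD COLUMN mac_id INT UNSIGNED NULL,
  ADD INDEX idx_mac_id (mac_id);

-- Backfill mac_id by joining on the original string column.
UPDATE A a
JOIN mac_dict d ON d.mac = a.mac
SET a.mac_id = d.id;

-- Queries then filter on integers instead of strings.
SELECT id, mileage
FROM A
WHERE mac_id IN (1, 2, 3);
```

The application resolves the MAC strings to ids once (for example with a query against mac_dict) and then passes the integer list to the IN clause, so the large table is searched through a compact integer index rather than a string index.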