Optimized record of using IN data volume in Mysql

Optimized record of using IN data volume in Mysql

The MySQL version number is 5.7.28. Table A has 3.9 million records and uses the InnoDB engine. The varchar type field mac has been indexed using B-tree. Table B has only 5000+ records.

There is a SQL instruction written like this:

SELECT * FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb:b",...more than 900 entries omitted)

It took 294.428s to get the results through the query. Yes, nearly 5 minutes.

Use EXPLAIN to analyze:

The access type is range, the index has been hit, and there are only 587776 rows. Why does the query take so long?

The indexing method of mac uses B-tree. Let's compare it with HASH and summarize it briefly: B-tree index can be used for =, >, >=, <, <= and between calculations, while HASH can only perform equality operations and cannot perform range searches. Then IN is an equivalent operation, and both indexing methods are applicable. In this case, change the index method of mac to HASH, and the same query takes time.

Since adjusting the index method cannot significantly improve the query performance of the statement, it can only be processed from the statement itself. In fact, anyone with a discerning eye can see at first glance that SELECT * is very performance-intensive. So we only query the fields required by the business and adjust the statement to:

SELECT id,mileage FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb:b",...more than 900 entries omitted)

There is no significant increase in time consumption.

Since the IN method is so difficult to optimize, can we give up using LEFT JOIN? The statement is adjusted to:

SELECT a.id,a.mileage FROM A a LEFT JOIN B b ON b.mac = a.mac WHERE b.create_time >= '2020-01-01'

If it takes more than 5 minutes, give up.

We know that when the number of conditions is small, there is no obvious difference between the effects of EXISTS and IN. But when there are many conditions, IN is more efficient than EXISTS. Let's try EXISTS:

SELECT id,mileage FROM A a WHERE EXISTS(SELECT mac FROM B WHERE create_time >= '2020-01-01' AND mac = a.mac)

It also takes more than 5 minutes. IN is indeed more efficient than EXISTS, so I gave up.

So the final conclusion is that if IN is followed by a String with large amounts of data, you should be cautious.

In the project, I use mac as a unique identifier to create a corresponding table with id. In table A, mac_id is used instead of mac, and IN(1,2,3...) is used when querying. The efficiency will be improved. Currently, using NoSQL is also a way.

Summarize

This is the end of this article about optimizing MySQL using IN for large amounts of data. For more information about optimizing MySQL using IN for large amounts of data, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MYSQL IN and EXISTS optimization examples
  • Optimization of not in and minus in MySQL
  • How to use not in to optimize MySql
  • Detailed Analysis of or, in, union and Index Optimization in MySQL
  • Implementation of MySQL select in subquery optimization

<<:  Web project development VUE mixing and inheritance principle

>>:  How to use libudev in Linux to get USB device VID and PID

Recommend

JavaScript to achieve mouse tailing effect

Mouse effects require the use of setTimeout to ge...

MySQL cursor principle and usage example analysis

This article uses examples to explain the princip...

...

Chrome plugin (extension) development guide (complete demo)

Table of contents Written in front Preface What i...

Solutions to the problem of table nesting and border merging

【question】 When the outer table and the inner tab...

How to use a field in one table to update a field in another table in MySQL

1. Modify 1 column update student s, city c set s...

Implementing timed page refresh or redirect based on meta

Use meta to implement timed refresh or jump of th...

Detailed installation tutorial of mysql 5.7 under CentOS 6 and 7

You always need data for development. As a server...

Sharing some details about MySQL indexes

A few days ago, a colleague asked me a question a...

Detailed explanation of the platform bus of Linux driver

Table of contents 1. Introduction to platform bus...

Basic usage of UNION and UNION ALL in MySQL

In the database, both UNION and UNION ALL keyword...

Tomcat server security settings method

Tomcat is an HTTP server that is the official ref...

Native JS to implement real-time clock

Share a real-time clock effect implemented with n...