Optimized record of using IN data volume in Mysql

The MySQL version number is 5.7.28. Table A has 3.9 million records and uses the InnoDB engine. The varchar type field mac has been indexed using B-tree. Table B has only 5000+ records.

There is a SQL instruction written like this:

SELECT * FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb:b",...more than 900 entries omitted)

It took 294.428s to get the results through the query. Yes, nearly 5 minutes.

Use EXPLAIN to analyze:

The access type is range, the index has been hit, and there are only 587776 rows. Why does the query take so long?

The indexing method of mac uses B-tree. Let's compare it with HASH and summarize it briefly: B-tree index can be used for =, >, >=, <, <= and between calculations, while HASH can only perform equality operations and cannot perform range searches. Then IN is an equivalent operation, and both indexing methods are applicable. In this case, change the index method of mac to HASH, and the same query takes time.

Since adjusting the index method cannot significantly improve the query performance of the statement, it can only be processed from the statement itself. In fact, anyone with a discerning eye can see at first glance that SELECT * is very performance-intensive. So we only query the fields required by the business and adjust the statement to:

SELECT id,mileage FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb:b",...more than 900 entries omitted)

There is no significant increase in time consumption.

Since the IN method is so difficult to optimize, can we give up using LEFT JOIN? The statement is adjusted to:

SELECT a.id,a.mileage FROM A a LEFT JOIN B b ON b.mac = a.mac WHERE b.create_time >= '2020-01-01'

If it takes more than 5 minutes, give up.

We know that when the number of conditions is small, there is no obvious difference between the effects of EXISTS and IN. But when there are many conditions, IN is more efficient than EXISTS. Let's try EXISTS:

SELECT id,mileage FROM A a WHERE EXISTS(SELECT mac FROM B WHERE create_time >= '2020-01-01' AND mac = a.mac)

It also takes more than 5 minutes. IN is indeed more efficient than EXISTS, so I gave up.

So the final conclusion is that if IN is followed by a String with large amounts of data, you should be cautious.

In the project, I use mac as a unique identifier to create a corresponding table with id. In table A, mac_id is used instead of mac, and IN(1,2,3...) is used when querying. The efficiency will be improved. Currently, using NoSQL is also a way.

Summarize

This is the end of this article about optimizing MySQL using IN for large amounts of data. For more information about optimizing MySQL using IN for large amounts of data, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in: