Optimized record of using IN data volume in Mysql

Optimized record of using IN data volume in Mysql

The MySQL version number is 5.7.28. Table A has 3.9 million records and uses the InnoDB engine. The varchar type field mac has been indexed using B-tree. Table B has only 5000+ records.

There is a SQL instruction written like this:

SELECT * FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb:b",...more than 900 entries omitted)

It took 294.428s to get the results through the query. Yes, nearly 5 minutes.

Use EXPLAIN to analyze:

The access type is range, the index has been hit, and there are only 587776 rows. Why does the query take so long?

The indexing method of mac uses B-tree. Let's compare it with HASH and summarize it briefly: B-tree index can be used for =, >, >=, <, <= and between calculations, while HASH can only perform equality operations and cannot perform range searches. Then IN is an equivalent operation, and both indexing methods are applicable. In this case, change the index method of mac to HASH, and the same query takes time.

Since adjusting the index method cannot significantly improve the query performance of the statement, it can only be processed from the statement itself. In fact, anyone with a discerning eye can see at first glance that SELECT * is very performance-intensive. So we only query the fields required by the business and adjust the statement to:

SELECT id,mileage FROM A WHERE mac IN("aa:aa:aa:aa:aa:aa","bb:bb:bb:bb:bb:bb:b",...more than 900 entries omitted)

There is no significant increase in time consumption.

Since the IN method is so difficult to optimize, can we give up using LEFT JOIN? The statement is adjusted to:

SELECT a.id,a.mileage FROM A a LEFT JOIN B b ON b.mac = a.mac WHERE b.create_time >= '2020-01-01'

If it takes more than 5 minutes, give up.

We know that when the number of conditions is small, there is no obvious difference between the effects of EXISTS and IN. But when there are many conditions, IN is more efficient than EXISTS. Let's try EXISTS:

SELECT id,mileage FROM A a WHERE EXISTS(SELECT mac FROM B WHERE create_time >= '2020-01-01' AND mac = a.mac)

It also takes more than 5 minutes. IN is indeed more efficient than EXISTS, so I gave up.

So the final conclusion is that if IN is followed by a String with large amounts of data, you should be cautious.

In the project, I use mac as a unique identifier to create a corresponding table with id. In table A, mac_id is used instead of mac, and IN(1,2,3...) is used when querying. The efficiency will be improved. Currently, using NoSQL is also a way.

Summarize

This is the end of this article about optimizing MySQL using IN for large amounts of data. For more information about optimizing MySQL using IN for large amounts of data, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MYSQL IN and EXISTS optimization examples
  • Optimization of not in and minus in MySQL
  • How to use not in to optimize MySql
  • Detailed Analysis of or, in, union and Index Optimization in MySQL
  • Implementation of MySQL select in subquery optimization

<<:  Web project development VUE mixing and inheritance principle

>>:  How to use libudev in Linux to get USB device VID and PID

Recommend

Detailed example of using useState in react

useState useState adds some internal state to a c...

CentOS 7 installation and configuration method graphic tutorial

This article records the detailed installation tu...

JavaScript code to implement Weibo batch unfollow function

A cool JavaScript code to unfollow Weibo users in...

Detailed explanation of encoding issues during MySQL command line operations

1. Check the MySQL database encoding mysql -u use...

Drawing fireworks effect of 2021 based on JS with source code download

This work uses the knowledge of front-end develop...

Detailed explanation of MySql data type tutorial examples

Table of contents 1. Brief Overview 2. Detailed e...

How to install openjdk in docker and run the jar package

Download image docker pull openjdk Creating a Dat...

Vue.js implements simple timer function

This article example shares the specific code of ...

Manually implement the two-way data binding principle of Vue2.0

In one sentence: Data hijacking (Object.definePro...

Solve the group by query problem after upgrading Mysql to 5.7

Find the problem After upgrading MySQL to MySQL 5...

JS uses clip-path to implement dynamic area clipping function

background Today, I was browsing CodePen and saw ...

The 6 Most Effective Ways to Write HTML and CSS

This article shares the 6 most effective methods,...