MySQL big data query optimization experience sharing (recommended)

Serious MySQL optimization!

If the amount of MySQL data is small, optimization is unnecessary. If the amount of data is large, optimization is indispensable. If a query is not optimized, it will take 10 seconds. If it is properly optimized, the same query will take 10 milliseconds.

What a painful realization this is!

MySQL optimization, in programmer's terms, means: index optimization and where condition optimization.

Experimental environment: MacBook Pro MJLQ2CH/A, MySQL 5.7, data volume: 2.12 million+

ONE:

 select * from article
 INNER JOIN (
 SELECT id
 FROM article
 WHERE
  length(content_url) > 0 and
  (select status from source where id = article.source_id)=1 and
  (select status from category where id = article.category_id)=1 and
  status = 1 and id < 2164931
 order by stick desc,pub_time desc
 limit 240,15
 ) AS t
USING(id);

At first glance, the boss will definitely want to kill me. Why do I need to do a self-association or an inner join? People on the XX floor, bring me my butcher knife, I want to kill the blogger! ! !

To be honest, my head wasn't pressed against the door when I went out in the morning, and I didn't want that to happen.

1. When the amount of data is large, you need to do paging query with a large offset, and this really speeds up the query. The reason is ---> Use the id in the join subtable to cover the entire table to avoid full table scan.

Look at my order by (whisper: it’s just an order by, who can’t write it), replace this order by with the field desc or explain in your own table and see what happens. Extra ---> filesort ! shit !

2. For this kind of order by with multiple conditions, we usually directly add indexes to the two fields respectively, but we still use Extra ---> filesort. Take another approach and add a joint index to all the conditions after order by. Note that the order must be consistent with your order by order. This way, the only thing left for Extra is where.

Let’s take a look at where，(select status from source where id = article.source_id)=1 and ... what a weird way of writing this!

3. I thought about using the join+index method, but after testing it, it turned out to be almost the same as this method. The production environment is written like this, so let's just leave it as it is. We can also save two indexes (source_id, category_id). No one can stop us from being lazy. If we suffer losses in the future, we can come back and continue to optimize.

4. I just got this point last night. The order of satisfying the where conditions is to satisfy the last condition first, from right to left. After deleting the index for testing, it is indeed effective, reducing the time from 6 seconds to 4 seconds. After optimizing the index, I tested again and found that the effect of the order on the time consumption is almost negligible, 0.X milliseconds.

TWO:

 select * from article
 INNER JOIN (
 SELECT id FROM article WHERE INSTR(ifnull(title,''),'战狼') > 0 and status != 9
 order by pub_time desc
 limit 100,10

 ) AS t USING(id);

Hmm——another inner join......

INSTR(ifnull(title,''),'Wolf Warrior') > 0, why not use like......

1. Considering that this is a search on the management platform, the search engine was not searched. The search engine only synchronizes data once an hour, so the data is incomplete. When searching, managers only care about the results they want. Like %XX% cannot use the index, and the efficiency is 5 times lower than instr. I also tested regexp '.*XX*.', which still takes a little longer than instr. So...

desc or explain, filesort.....add an index to pub_time and see if it works, or filesort...

2. There is another solution for this situation, SELECT id FROM article force index(pub_time) to specify the use of this index. But this way of writing lacks flexibility, OUT! I searched on Baidu and someone gave me some advice: create a joint index of status and pub_time (pub_time_status, with the order condition in front), and force this index automatically when performing a where query.

THREE:

select * from article where status != 9 order by pub_time desc limit 100000,25;
desc or explain, or filesort..... Didn't we create a joint index for status and pub_time before? Tell me why...

Well, I don’t know either. Create another joint index status_pub_time for status and pub_time. This time the where condition is in front, and there is no filesort in explain, but this index is not used. It hooks out pub_time_status . I don't understand

At the same time, I explained the SQL of TWO, which is as follows:

Deleting any one of these two will not work. If you delete one, sql will filesort!

FOUR:

SELECT * from follow
 where (((SELECT status FROM source WHERE id=follow.source_id)=1 and follow.type=1) or ((select status from topic WHERE id=follow.source_id)=1 and follow.type=2)) AND user_id=10054
 ORDER BY sort limit 15,15;
 SELECT * from follow inner join(
 SELECT id from follow
 where (((SELECT status FROM source WHERE id=follow.source_id)=1 and follow.type=1) or ((select status from topic WHERE id=follow.source_id)=1 and follow.type=2)) AND user_id=10054
 ORDER BY sort limit 15,15
 ) as t using(id);
 (SELECT id, source_id, user_id, temporary, sort, follow_time, read_time,type from follow where (SELECT status FROM source WHERE id=follow.source_id)=1 and follow.type=1 and user_id=10054)
 union all
 (SELECT id, source_id, user_id, temporary, sort, follow_time, read_time,type from follow where (select status from topic WHERE id=follow.source_id)=1 and follow.type=2 and user_id=10054)
 ORDER BY sort limit 15,15;

Look at these three SQL statements, interesting, aren’t they!

To be fair, I have optimized the index, user_id_sort(user_id,sort), so that where uses user_id to force this index.

First sentence: 0.48ms

Second sentence: 0.42ms

The third sentence: 6ms. The reason why it takes so long is that after the union (querying the table twice and merging it into a sub-table), the index cannot be used to cover the sort of the order by.

Sometimes union is not necessarily faster than or.

Summarize

The above is the MySQL big data query optimization experience shared by the editor. I hope it will be helpful to everyone. If you have any questions, please leave me a message and the editor will reply to you in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website!

You may also be interested in:

MySQL and PHP basics and applications: data query
A brief discussion on whether too many MySQL data queries will cause OOM
Detailed explanation of MySQL database tens of millions of data query and storage
Optimizing the slow query of MySQL aggregate statistics data
MySQL json format data query operation
MySQL and PHP basics and applications: data query statements

<<: Detailed explanation of Nginx log customization and enabling log buffer

>>: How to configure SSL for koa2 service

Do not start CSS pseudo-class names with numbers

MySQL big data query optimization experience sharing (recommended)

Do not start CSS pseudo-class names with numbers

Mysql implements master-slave configuration and multi-master-multi-slave configuration

Detailed tutorial on deploying Django project under CentOS

RHCE installs Apache and accesses IP with a browser

Things to note when migrating MySQL to 8.0 (summary)

Detailed explanation of the idea of implementing dynamic columns in angularjs loop object properties

Solution to occasional crash of positioning background service on Linux

Sample code for making a drop-down menu using pure CSS

Sample code for deploying Spring-boot project with Docker

Detailed explanation of browser negotiation cache process based on nginx

Recommend

How to install Docker on Windows Server 2016

How to display TIF format images in browser

What does href=# mean in a link?

Detailed explanation of 10 common HTTP status codes

Two ways to manage volumes in Docker

Web page creation basic declaration document type description (DTD

MySQL 5.7 generated column usage example analysis

Ideas and codes for implementing Vuex data persistence

A brief discussion on the correct approach to MySQL table space recovery

How to build Git service based on http protocol on VMware+centOS 8

Eight common SQL usage examples in MySQL

Tutorial on installing DAMO database on Centos7

Docker MQTT installation and use tutorial

Detailed explanation of how to use Vue+element to implement the tag at the top of the page

Web data storage: Cookie, UserData, SessionStorage, WebSqlDatabase