MySQL big data query optimization experience sharing (recommended)

MySQL big data query optimization experience sharing (recommended)

Serious MySQL optimization!

If the amount of MySQL data is small, optimization is unnecessary. If the amount of data is large, optimization is indispensable. If a query is not optimized, it will take 10 seconds. If it is properly optimized, the same query will take 10 milliseconds.

What a painful realization this is!

MySQL optimization, in programmer's terms, means: index optimization and where condition optimization.

Experimental environment: MacBook Pro MJLQ2CH/A, MySQL 5.7, data volume: 2.12 million+

ONE:

 select * from article
 INNER JOIN (
 SELECT id
 FROM article
 WHERE
  length(content_url) > 0 and
  (select status from source where id = article.source_id)=1 and
  (select status from category where id = article.category_id)=1 and
  status = 1 and id < 2164931
 order by stick desc,pub_time desc
 limit 240,15
 ) AS t
USING(id);

At first glance, the boss will definitely want to kill me. Why do I need to do a self-association or an inner join? People on the XX floor, bring me my butcher knife, I want to kill the blogger! ! !

To be honest, my head wasn't pressed against the door when I went out in the morning, and I didn't want that to happen.

1. When the amount of data is large, you need to do paging query with a large offset, and this really speeds up the query. The reason is ---> Use the id in the join subtable to cover the entire table to avoid full table scan.

Look at my order by (whisper: it’s just an order by, who can’t write it), replace this order by with the field desc or explain in your own table and see what happens. Extra ---> filesort ! shit !

2. For this kind of order by with multiple conditions, we usually directly add indexes to the two fields respectively, but we still use Extra ---> filesort. Take another approach and add a joint index to all the conditions after order by. Note that the order must be consistent with your order by order. This way, the only thing left for Extra is where.

Let’s take a look at where,(select status from source where id = article.source_id)=1 and ... what a weird way of writing this!

3. I thought about using the join+index method, but after testing it, it turned out to be almost the same as this method. The production environment is written like this, so let's just leave it as it is. We can also save two indexes (source_id, category_id). No one can stop us from being lazy. If we suffer losses in the future, we can come back and continue to optimize.

4. I just got this point last night. The order of satisfying the where conditions is to satisfy the last condition first, from right to left. After deleting the index for testing, it is indeed effective, reducing the time from 6 seconds to 4 seconds. After optimizing the index, I tested again and found that the effect of the order on the time consumption is almost negligible, 0.X milliseconds.

TWO:

 select * from article
 INNER JOIN (
 SELECT id FROM article WHERE INSTR(ifnull(title,''),'战狼') > 0 and status != 9
 order by pub_time desc
 limit 100,10

 ) AS t USING(id);

Hmm——another inner join......

INSTR(ifnull(title,''),'Wolf Warrior') > 0, why not use like......

1. Considering that this is a search on the management platform, the search engine was not searched. The search engine only synchronizes data once an hour, so the data is incomplete. When searching, managers only care about the results they want. Like %XX% cannot use the index, and the efficiency is 5 times lower than instr. I also tested regexp '.*XX*.', which still takes a little longer than instr. So...

desc or explain, filesort.....add an index to pub_time and see if it works, or filesort...

2. There is another solution for this situation, SELECT id FROM article force index(pub_time) to specify the use of this index. But this way of writing lacks flexibility, OUT! I searched on Baidu and someone gave me some advice: create a joint index of status and pub_time (pub_time_status, with the order condition in front), and force this index automatically when performing a where query.

THREE:

select * from article where status != 9 order by pub_time desc limit 100000,25;
desc or explain, or filesort..... Didn't we create a joint index for status and pub_time before? Tell me why...

Well, I don’t know either. Create another joint index status_pub_time for status and pub_time. This time the where condition is in front, and there is no filesort in explain, but this index is not used. It hooks out pub_time_status . I don't understand

At the same time, I explained the SQL of TWO, which is as follows:

Deleting any one of these two will not work. If you delete one, sql will filesort!

FOUR:

SELECT * from follow
 where (((SELECT status FROM source WHERE id=follow.source_id)=1 and follow.type=1) or ((select status from topic WHERE id=follow.source_id)=1 and follow.type=2)) AND user_id=10054
 ORDER BY sort limit 15,15;
 SELECT * from follow inner join(
 SELECT id from follow
 where (((SELECT status FROM source WHERE id=follow.source_id)=1 and follow.type=1) or ((select status from topic WHERE id=follow.source_id)=1 and follow.type=2)) AND user_id=10054
 ORDER BY sort limit 15,15
 ) as t using(id);
 (SELECT id, source_id, user_id, temporary, sort, follow_time, read_time,type from follow where (SELECT status FROM source WHERE id=follow.source_id)=1 and follow.type=1 and user_id=10054)
 union all
 (SELECT id, source_id, user_id, temporary, sort, follow_time, read_time,type from follow where (select status from topic WHERE id=follow.source_id)=1 and follow.type=2 and user_id=10054)
 ORDER BY sort limit 15,15;

Look at these three SQL statements, interesting, aren’t they!

To be fair, I have optimized the index, user_id_sort(user_id,sort), so that where uses user_id to force this index.

First sentence: 0.48ms

Second sentence: 0.42ms

The third sentence: 6ms. The reason why it takes so long is that after the union (querying the table twice and merging it into a sub-table), the index cannot be used to cover the sort of the order by.

Sometimes union is not necessarily faster than or.

Summarize

The above is the MySQL big data query optimization experience shared by the editor. I hope it will be helpful to everyone. If you have any questions, please leave me a message and the editor will reply to you in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website!

You may also be interested in:
  • MySQL and PHP basics and applications: data query
  • A brief discussion on whether too many MySQL data queries will cause OOM
  • Detailed explanation of MySQL database tens of millions of data query and storage
  • Optimizing the slow query of MySQL aggregate statistics data
  • MySQL json format data query operation
  • MySQL and PHP basics and applications: data query statements

<<:  Detailed explanation of Nginx log customization and enabling log buffer

>>:  How to configure SSL for koa2 service

Recommend

How to install Docker on Windows Server 2016

Recently Microsoft released Windows Server 2016, ...

How to display TIF format images in browser

The browser displays TIF format images Copy code T...

What does href=# mean in a link?

Links to the current page. ------------------- Com...

Detailed explanation of 10 common HTTP status codes

The HTTP status code is a 3-digit code used to in...

Two ways to manage volumes in Docker

In the previous article, I introduced the basic k...

Web page creation basic declaration document type description (DTD

Using CSS layout to create web pages that comply w...

MySQL 5.7 generated column usage example analysis

This article uses examples to illustrate the usag...

Ideas and codes for implementing Vuex data persistence

What is vuex vuex: is a state manager developed s...

A brief discussion on the correct approach to MySQL table space recovery

Table of contents Preliminary Notes Problem Repro...

How to build Git service based on http protocol on VMware+centOS 8

Table of contents 1. Cause 2. Equipment Informati...

Eight common SQL usage examples in MySQL

Preface MySQL continued to maintain its strong gr...

Tutorial on installing DAMO database on Centos7

1. Preparation After installing the Linux operati...

Docker MQTT installation and use tutorial

Introduction to MQTT MQTT (Message Queuing Teleme...

Web data storage: Cookie, UserData, SessionStorage, WebSqlDatabase

Cookie It is a standard way to save the state of ...