4 ways to optimize MySQL queries on tables with millions of rows

1. Why LIMIT gets slower as the offset grows

When we use LIMIT to page through data, the first few pages come back quickly: limit 200,25, for example, returns almost instantly. But the deeper we page, the slower the query gets, and past a million rows it becomes painfully slow. What is the principle behind this? Let's first look at what the query SQL looks like once we have paged far into the result set:

select * from t_name where c_name1='xxx' order by c_name2 limit 2000000,25;

The slowness of this query is caused by the large offset after LIMIT. The limit 2000000,25 above means the database scans 2,000,025 rows, discards the first 2,000,000, and returns the remaining 25 to the user. Scanning two million rows only to throw them away is obviously wasteful.
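The cost can be made concrete outside MySQL. The minimal Python sketch below is an analogy for the scan, not MySQL internals: paging with an offset still walks every one of the offset rows before returning the page.

```python
from itertools import islice

def offset_page(rows, offset, limit):
    """Mimic LIMIT offset,limit: walk and discard `offset` rows,
    then return the next `limit` rows, counting the total work."""
    it = iter(rows)
    for _ in range(offset):   # rows scanned only to be thrown away
        next(it)
    page = list(islice(it, limit))
    return page, offset + len(page)

page, scanned = offset_page(range(5_000_000), 2_000_000, 25)
print(scanned)  # 2000025 rows touched for a 25-row page
```

Two million rows of work for twenty-five rows of result is exactly the ratio the SQL above pays.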

2. Simulating Millions of Rows

1. Create the employee and department tables, and write stored procedures to insert data

/*Department table, delete if it exists*/
drop table if EXISTS dep;
create table dep(
    id int unsigned primary key auto_increment,
    depno mediumint unsigned not null default 0,
    depname varchar(20) not null default "",
    memo varchar(200) not null default ""
);

/*Employee table, delete if exists*/
drop table if EXISTS emp;
create table emp(
    id int unsigned primary key auto_increment,
    empno mediumint unsigned not null default 0,
    empname varchar(20) not null default "",
    job varchar(9) not null default "",
    mgr mediumint unsigned not null default 0,
    hiredate datetime not null,
    sal decimal(7,2) not null,
    comn decimal(7,2) not null,
    depno mediumint unsigned not null default 0
);
/* Function to generate a random string.
   Note: if binary logging is enabled, creating this non-deterministic
   function may require: set global log_bin_trust_function_creators = 1; */
DELIMITER $
drop FUNCTION if EXISTS rand_string;
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
    DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    DECLARE return_str VARCHAR(255) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
    SET return_str = CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
    SET i = i+1;
    END WHILE;
    RETURN return_str;
END $
DELIMITER ;


/*Function to generate random department number*/
DELIMITER $
drop FUNCTION if EXISTS rand_num;
CREATE FUNCTION rand_num() RETURNS INT(5)
BEGIN
    DECLARE i INT DEFAULT 0;
    SET i = FLOOR(100+RAND()*10);
    RETURN i;
END $
DELIMITER ;
/*Create a stored procedure: insert data into the emp table*/
DELIMITER $
drop PROCEDURE if EXISTS insert_emp;
CREATE PROCEDURE insert_emp(IN START INT(10),IN max_num INT(10))
BEGIN
    DECLARE i INT DEFAULT 0;
    /* set autocommit = 0 to turn off automatic commits; we commit once at the end */
    SET autocommit = 0;
    REPEAT
    SET i = i + 1;
    INSERT INTO emp(empno,empname,job,mgr,hiredate,sal,comn,depno) VALUES ((START+i),rand_string(6),'SALESMAN',0001,now(),2000,400,rand_num());
    UNTIL i = max_num
    END REPEAT;
    COMMIT;
END $
DELIMITER ;

/*Create a stored procedure: insert data into the dep table*/
DELIMITER $
drop PROCEDURE if EXISTS insert_dept;
CREATE PROCEDURE insert_dept(IN START INT(10), IN max_num INT(10))
BEGIN
    DECLARE i INT DEFAULT 0;
    SET autocommit = 0;
    REPEAT
    SET i = i+1;
    INSERT INTO dep( depno,depname,memo) VALUES((START+i),rand_string(10),rand_string(8));
    UNTIL i = max_num
    END REPEAT;
    COMMIT;
END $
DELIMITER ;

2. Execute the stored procedures

/*Insert 120 records*/
call insert_dept(1,120);
/*Insert 5 million rows*/
call insert_emp(0,5000000);

Inserting 5 million rows may take a while.

3. Four Query Methods

1. Ordinary limit paging

/*Offset is 100, take 25*/
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno order by a.id desc limit 100,25;
/*Offset is 4800000, take 25*/
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno order by a.id desc limit 4800000,25;

Execution Results

[SQL]
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno order by a.id desc limit 100,25;
Affected rows: 0
Time: 0.001s
[SQL]
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno order by a.id desc limit 4800000,25;
Affected rows: 0
Time: 12.275s

The deeper the page, the slower the query.

2. Covering index + subquery optimization

Because id is the primary key and has an index, we can first locate the id of the starting position in the index tree, and then fetch the row data starting from that id.

/*Subquery gets the id of the position offset by 100, and gets 25 after this position*/
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id >= (select id from emp order by id limit 100,1)
order by a.id limit 25;

/*Subquery gets the id of the position offset by 4800000, and gets 25 after this position*/
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id >= (select id from emp order by id limit 4800000,1)
order by a.id limit 25;

Execution Results

[SQL]
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id >= (select id from emp order by id limit 100,1)
order by a.id limit 25;
Affected rows: 0
Time: 0.106s

[SQL]
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id >= (select id from emp order by id limit 4800000,1)
order by a.id limit 25;
Affected rows: 0
Time: 1.541s
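The shape of this plan can be sketched in Python. The stand-in below is hypothetical (a sorted list plays the primary-key index, a dict plays the table rows): the starting id is found using only the slim index, and full rows are fetched just for the 25 ids that follow.

```python
import bisect

# Hypothetical stand-in: pk_index is the ordered primary-key index,
# table maps id -> full row (the clustered data).
pk_index = list(range(1, 1001))
table = {i: {"id": i, "empname": f"emp{i}"} for i in pk_index}

def page_via_index(offset, limit):
    """Two steps, mirroring the subquery plan:
    1) walk only the index to find the starting id (LIMIT offset,1);
    2) fetch full rows only for the `limit` ids from there onward."""
    start_id = pk_index[offset]                   # subquery result
    pos = bisect.bisect_left(pk_index, start_id)  # WHERE id >= start_id
    return [table[i] for i in pk_index[pos:pos + limit]]

page = page_via_index(100, 25)
print(page[0]["id"], len(page))  # 101 25
```

The offset walk still happens, but over narrow index entries instead of full rows, which is why the 4,800,000-offset query drops from 12s to about 1.5s.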

3. Redefine the starting position

This applies to tables whose primary key is auto-incrementing.

/*The last row of the previous page had id 100, so skip past 100 and read from id 101*/
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id > 100 order by a.id limit 25;

/*The last row of the previous page had id 4800000, so skip past 4800000 and read from id 4800001*/
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id > 4800000
order by a.id limit 25;
[SQL]
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id > 100 order by a.id limit 25;
Affected rows: 0
Time: 0.001s

[SQL]
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno
where a.id > 4800000
order by a.id limit 25;
Affected rows: 0
Time: 0.000s

This is the most efficient method. No matter how deep the page, the time consumed is essentially constant, because after the WHERE condition is applied only 25 rows are scanned.
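The same keyset idea in a minimal Python sketch (assumed in-memory data; bisect plays the role of the index range scan on id):

```python
import bisect

ids = list(range(1, 201))  # sorted auto-increment primary keys

def next_page(last_id, limit=25):
    """Keyset pagination: seek directly past last_id, the way
    WHERE id > last_id uses an index range scan, then read only
    `limit` rows no matter how deep the page is."""
    pos = bisect.bisect_right(ids, last_id)  # WHERE id > last_id
    return ids[pos:pos + limit]              # ORDER BY id LIMIT limit

page1 = next_page(0)
page2 = next_page(page1[-1])  # the client remembers the last id
print(page2[0], page2[-1])  # 26 50
```

The one requirement is that the client carries the last seen id forward, which is why this does not support jumping to an arbitrary page number.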

4. Downgrade strategy (Baidu’s approach)

This strategy is the simplest and most effective. Real-world big-data queries usually come with search conditions, and hardly anyone looks past page 100, so when the requested page number is too large, simply return an error. Baidu, for example, only lets you browse up to around page 76 of its search results.
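Enforcing the cap is a one-line guard in application code. A sketch, assuming a hypothetical MAX_PAGE of 76 borrowed from the Baidu example:

```python
MAX_PAGE = 76    # assumed cap, echoing the Baidu example
PAGE_SIZE = 25

def safe_offset(page):
    """Refuse deep pages outright instead of paying for a huge offset."""
    if page < 1 or page > MAX_PAGE:
        raise ValueError(f"page must be between 1 and {MAX_PAGE}")
    return (page - 1) * PAGE_SIZE  # worst case: offset 1875

print(safe_offset(76))  # 1875
```

With the cap in place, even a plain LIMIT query never pays for more than a 1,875-row offset.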

Those are the details of the 4 query optimization methods for MySQL tables with millions of rows.
