MySQL optimization: how to write high-quality SQL statements

MySQL optimization: how to write high-quality SQL statements

Preface

There are a lot of information and methods on the Internet about database optimization, but the quality of many of them is uneven, some of the summaries are not thorough and the content is redundant. This article will give you a detailed introduction to 26 optimization suggestions. Let’s take a look at them.

1. When querying SQL, try not to use full search select *, but select + specific fields.

Counterexample:

select * from student;

Positive example:

select id,name,age from student;

reason:

  • Only the required fields are taken to save resources and reduce CPU, IO and network overhead.
  • When querying with select *, the covering index cannot be used, which will result in a table return query.
  • Using specific fields can reduce the impact of changes in table structure.

2. Use prepared statements for database operations

reason:

  • Precompiled statements can reuse plans and reduce the time required for SQL compilation
  • Can solve the SQL injection problem caused by dynamic SQL
  • Passing only parameters is more efficient than passing SQL statements
  • The same statement can be parsed once and used multiple times to improve processing efficiency

3. Do not use insert statements without a field list

Counterexample:

insert into values ​​('a', 'b', 'c');

Positive example:

insert into t(a, b, c) values ​​('a','b','c');

reason:

  • Insert statements without field names make it difficult to distinguish which fields they correspond to, and only full values ​​can be inserted, which makes readability poor.
  • Once the table structure changes, it is difficult to modify.

4. Try to avoid using or to connect conditions in the where clause

Example: Create a new user table with a common index userId. The table structure is as follows:

CREATE TABLE `user` (  
`id` int(11) NOT NULL AUTO_INCREMENT,  
`user_id` int(11) NOT NULL,  
`age` int(11) NOT NULL,  
`name` varchar(30) NOT NULL,  
PRIMARY KEY (`id`),  
KEY `idx_userId` (`userId`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

Query users whose userid is 1 or whose age is 18 years old

Counterexample:

select id, user_id, age, name from user where userid=1 or age =18

Positive example:

# Using union all 
select id, user_id, age, name from user where userid=1 union all select * from user where age = 18
# Or write two separate SQL statements: select id, user_id, age, name from user where userid=1; select * from user where age = 18

reason:

  • Using or may invalidate the index, resulting in a full table scan.
  • For or + fields without indexes, such as age above, suppose it uses the userId index, but when it comes to the age query condition, it still has to scan the entire table, which requires three steps: full table scan + index scan + merge. If it starts with a full table scan, it can be done in one scan.
  • MySQL has an optimizer. For efficiency and cost considerations, the index may become invalid when encountering an OR condition, which seems reasonable.

5. When using where conditions to query, limit the data to be queried to avoid returning redundant rows and avoid implicit conversion of data types.

Assuming id is of type int, query the data with id = 1

Counterexample:

select id, name from student where id = '1';

Positive example:

select id, name from student where id = 1;

reason:

  • Just query the data you need, avoid returning unnecessary data, and save costs.
  • Implicit conversion will cause index invalidation

6. Perform expression operations or function conversions on fields in the where clause, which will cause the system to abandon the use of indexes and perform a full table scan

Assume that the age field of the user table is indexed and query its data

Counterexample:

select name, age from user where age - 1 = 20;

Positive example:

select name, age from user where age = 21;

reason:

  • An index is added to age, but because of the operation query on it, the index is not effective, which greatly reduces efficiency.

7. Try to avoid using != or <> operators in the where clause, otherwise the engine will give up using the index and perform a full table scan.

(Applicable in MySQL)

Counterexample:

select age,name from user where age <> 18;

Positive example:

# You can consider writing two separate SQL statements: select age,name from user where age < 18;
select age,name from user where age > 18;

reason:

  • Using != and <> is likely to invalidate the index

8. For query optimization, you should consider creating indexes on the columns involved in where and order by to avoid full table scans as much as possible.

Counterexample:

select name, age, address from user where address ='深圳' order by age;

Positive example: Add an index and then query

alter table user add index idx_address_age (address,age)

9. Consider using default values ​​instead of null in where clauses

Counterexample: (This will check all data)

select user_id, name, age from user where age is not null;

Positive example:

# Set the table field age to 0 as the default value instead of null
select user_id, name, age from user where age > 0;
1
2

reason:

  • It does not mean that the index will not be used if is null or is not null is used. This is related to the MySQL version and query cost.
  • If the MySQL optimizer finds that the cost of using the index is higher than not using the index, it will definitely abandon the index. These conditions !=, > isnull, is not null often make the index invalid. In fact, it is because in general, the query cost is high and the optimizer automatically abandons the index.
  • If the null value is replaced with the default value, it is often possible to use the index, and the meaning will be relatively clear.

10. If the query result has only one record or only one record is needed (possibly the maximum/minimum value), it is recommended to use limit 1

Suppose there is a student table, and we want to find a person named Tom.

CREATE TABLE `student` (  
`id` int(11) NOT NULL,  
`name` varchar(50) DEFAULT NULL,  
`age` int(11) DEFAULT NULL,  
`date` datetime DEFAULT NULL,  
`sex` int(1) DEFAULT NULL,  
PRIMARY KEY (`id`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Counterexample:

select id,name from student where name='Tom '

Positive Example

select id,name from employee where name='Tom ' limit 1;

reason:

After adding limit 1 paging, once a corresponding record is found, the scan will not continue, and the efficiency will be greatly improved.
If name is the only index, there is no need to add limit 1, because the existence of limit is mainly to prevent full table scan and thus improve performance. If a statement itself can predict that it does not need a full table scan, there is little difference in performance with or without limit.

11. Optimize limit paging statements

When we do paging in daily life, we usually use limit to achieve it, but when the offset is particularly large, the query efficiency becomes low.

Counterexample:

select id, name, age from student limit 10000,10

Positive example:

# Solution 1: Return the maximum record (offset) of the last query
select id, name from student where id > 10000 limit 10;
# Solution 2: order by + index select id, name from student order by id limit 10000,10;
# Option 3: Limit the number of pages if the business allows:

reason:

  • When the offset is large, the query efficiency will be lower, because Mysql does not skip the offset and directly fetch the subsequent data, but first adds the offset + the number of entries to be fetched, and then discards the data in the previous offset section before returning.
  • If you use optimization solution 1, the last maximum query record (offset) is returned, so that the offset can be skipped, which greatly improves efficiency.
  • Solution 2 uses order by + index, which can also improve query efficiency.
  • For option 3, it is recommended to discuss with the business whether it is necessary to check the paging later. Because most users won't scroll too many pages back.

12. Try to avoid returning too much data to the client and use limit paging

Assume that the business requirement is that the user requests to view the movie data that he has watched in the past year.

Counterexample:

# Query all data at once select * from LivingInfo 
where watchId = useId 
and watchTime >= Date_sub(now(),Interval 1 Y)

Positive example:

# Pagination query select * from LivingInfo 
where watchId = useId 
and watchTime>= Date_sub(now(),Interval 1 Y) 
limit offset, pageSize

# If it is front-end paging, you can query the first 200 records first, because the average user should not scroll down too many pages select * from LivingInfo 
where watchId = useId 
and watchTime>= Date_sub(now(),Interval 1 Y) 
limit 200 ;

13. Optimize like statements

When using fuzzy keyword search using LIKE, LIKE is likely to make the index invalid.

Counterexample:

SELECT * FROM student
WHERE name LIKE '%strive_day';
-- Or use % to wrap SELECT * FROM student
WHERE name LIKE '%strive_day%';

Positive example:

SELECT * FROM student
WHERE name LIKE 'strive_day%';

reason:

  • Putting % in front will not use index search.
  • Putting % after a keyword will search using the index.
  • Wrapping keywords with % will not result in index search.
  • Without leading %, only trailing % will result in index query

14. Try to avoid using MySQL built-in functions on index columns

Example: Query users who have logged in in the last seven days (assuming the loginTime field is indexed)

Counterexample:

SELECT * FROM system_user user
WHERE Date_ADD(user.loginTime,Interval 7 DAY) >= now();

Positive example:

SELECT * FROM system_user user
WHERE user.loginTime >= Date_ADD(NOW(),INTERVAL - 7 DAY);

reason:

  • If you use MySQL built-in functions on the index column, the index will become invalid
  • If the index column does not have a built-in function, the index query will be performed

15. When using a joint index, pay attention to the order of the index columns, generally follow the leftmost matching principle

Suppose there is a joint index (user_id, age), with user_id in the front and age in the back.

Counterexample:

select user_id, name, age from user where age = 10;

Positive example:

# In accordance with the leftmost matching principle select user_id, name, age from user where userid = 1 and age = 21;
# In accordance with the leftmost matching principle select user_id, name, age from user where userid = 1;

reason:

  • When we create a joint index, such as (k1, k2, k3), it is equivalent to creating three indexes: (k1), (k1, k2) and (k1, k2, k3). This is the leftmost matching principle.
  • The joint index does not meet the leftmost principle, and the index will generally become invalid, but this is also related to the MySQL optimizer.

16. Use covering indexes when appropriate.

Covering indexes can make your SQL statements not need to return to the table. You can get all the required data by simply accessing the index, which greatly improves query efficiency.

Counterexample:

# Like fuzzy query, no index selection user_id, name, age from user where user_id like '%123%'
# If id is the primary key, then it is a normal index, that is, a covering index.
select user_id, name, age from user where userid like '%123%';

17. Remove redundant and duplicate indexes

Counterexample:

  KEY `idx_userId` (`userId`)
  KEY `idx_userId_age` (`userId`,`age`)

Positive example:

  KEY `idx_userId_age` (`userId`,`age`)
# Delete the index of userId (KEY `idx_userId_age` (`userId`,`age`))
# Because the combined index (A, B) is equivalent to creating both the (A) and (A, B) indexes.

reason:

  • Duplicate indexes need to be maintained, and the optimizer needs to consider them one by one when optimizing queries, which will affect performance.

18. Inner join, left join, right join, Inner join is preferred. If it is left join, the result of the left table should be as small as possible.

Inner join: When two tables are joined for query, only the result sets that fully match the two tables are retained.

When left join performs a join query on two tables, all rows in the left table will be returned, even if there are no matching records in the right table.

When right join performs a join query on two tables, all rows in the right table will be returned, even if there are no matching records in the left table.

Under the premise that SQL requirements are met, Inner join is preferred. If left join is used, the data results of the left table should be as small as possible, and if conditions permit, they should be placed on the left for processing.

Counterexample:

select name, age from tab1 t1 left join tab2 t2 on t1.age = t2.age where t1.id = 2;

Positive example:

select name, age from (select * from tab1 where id = 2) t1 left join tab2 t2 on t1.age = t2.age;

reason:

  • If the inner join is an equal join, the number of rows returned may be relatively small, so the performance will be relatively better.
  • When using a left join, the data result of the left table is as small as possible, and the conditions are processed on the left as much as possible, which means that the number of rows returned may be relatively small.

19. If you insert too much data, consider batch inserting

Counterexample:

for(User u :list)
{ INSERT into user(name,age) values(name, age)}

Positive example:

//Insert 500 values ​​at a time, insert into user(name,age) values ​​in batches
<foreach collection="list" item="item" index="index" separator=",">
 (#{item.name},#{item.age})
</foreach>

reason:

  • Batch insertion has good performance and reduces time loss.

20. Use the distinct keyword as little as possible

The distinct keyword is generally used to filter duplicate records to return non-duplicate records. When used when querying one field or a few fields, it brings optimization effects to the query. However, when there are many fields, it will greatly reduce the query efficiency.

Counterexample:

# Remove duplicate fields SELECT DISTINCT * from user;

Positive example:

select DISTINCT name from user;

reason:

  • The CPU time and occupied time of the statement with distinct are higher than those without distinct.
  • Because when querying many fields, if distinct is used, the database engine will compare the data and filter out duplicate data. However, this comparison and filtering process will take up system resources and CPU time.

21. Do not have more than 5 table joins

reason:

  • The more linked tables there are, the greater the compilation time and overhead.
  • The readability of the linked table is poor. Splitting the linked table into several smaller executions will improve readability.

22. How to optimize update statements when the amount of data is large.

When the amount of data is large, it is necessary to avoid modifying or deleting too much data at the same time, which will cause excessive CPU utilization and affect other people's access to the database.

Counterexample:

# Delete 100,000 or 1 million+ records at a time delete from user where id < 1000000;
# Or use a single loop operation, which is inefficient and time-consuming for (User user: list) {delete from user;}

Positive example:

# Delete in batches, such as 500 each time   
delete user where id < 500
delete user where id >= 500 and id < 1000;
...
delete user where id >= 999500 and id < 1000000;

reason:

  • If you delete/update too much data at one time, there may be a lock wait timeout exceed error, so it is recommended to operate in batches.

23. Use exist and in appropriately

Assume that table A represents the employee table of a company, and table B represents the department table. The SQL statement to query all employees in all departments is:

Counterexample:

select * from A where deptId in (select deptId from B);

This is equivalent to writing:

First query department table B
select deptId from B
Then query A's employees by department deptId
select * from A where A.deptId = B.deptId

It can be abstracted into a loop statement like this:

List<> resultSet ;    
for(int i = 0; i < B.length; i ++) {
 for(int j = 0; j < A.length; j ++) {
     if(A[i].id == B[j].id) {
         resultSet.add(A[i]);
            break;          
        }       
     }    
 }

We can also use exists to achieve the same query function

select * from A where exists (select 1 from B where A.deptId = B.deptId);

The above code is equivalent to:

select * from A, loop from table A first
select * from B where A.deptId = B.deptId, and then loop through table B.

Because the understanding of the exists query is to execute the main query first, obtain the data, and then put it into the subquery for condition verification. Based on the verification result (true or false), it is determined whether the data result of the main query can be retained.

Similarly, it can be abstracted into such a cycle:

List<> resultSet;    
for(int i = 0; i < A.length; i ++) {
 for(int j = 0; j < B.length; j ++) {
     if(A[i].deptId == B[j].deptId) {
         resultSet.add(A[i]);
            break;          
            }       
        }    
    }

reason:

  • The most difficult part of the database is linking and releasing it with the program. Suppose the link is used twice, and each time a query is made for a dataset of millions of times, and then it is done only twice. On the contrary, if a separate query is made each time, millions of links are established, and the link release application is repeated repeatedly.
  • The MySQL optimization principle is that small tables drive large tables, and small data sets drive large data sets, thereby achieving better performance.
  • We need to choose the one with the smallest outer loop. That is, if the amount of data in B is smaller than that in A, it is suitable to use in. If the amount of data in B is larger than that in A, it is suitable to choose exist.

24. Use numeric fields as much as possible. If the field only contains numerical information, try not to design it as character type.

Counterexample:

`king_id` varchar(20) NOT NULL COMMENT '123'

Positive example:

 `king_id` int(11) NOT NULL COMMENT '123'

reason:

  • Compared with numeric fields, character fields will reduce query and connection performance and increase storage overhead.

25. Try to replace union with union all

If there are no duplicate records in the search results, it is recommended to replace union with union all.

Counterexample:

select * from user where userid = 1
union
select * from user where age = 20

Positive example:

select * from user where userid = 1
union all
select * from user where age = 20

reason:

  • If you use union, regardless of whether the search results are repeated or not, an attempt will be made to merge them, and then sort them before outputting the final result.
  • If you know that there are no duplicate records in the search results, use union all instead of union to improve efficiency.

26. If the field type is a string, be sure to enclose it in quotation marks when using where, otherwise the index will be invalid.

Counterexample:

select * from user where userid = 1;

Positive example:

select * from user where userid ='1';

reason:

The first statement will not use the index if it is not enclosed in single quotes. This is because without single quotes, the string is compared with the number, and their types do not match. MySQL will perform implicit type conversion and convert them to floating-point numbers for comparison.

Summarize

This is the end of this article about MySQL optimization and how to write high-quality SQL statements. For more relevant MySQL optimization SQL statement content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • A brief discussion of 30 common methods for optimizing SQL query in MySQL
  • Mysql query the most recent record of the sql statement (optimization)
  • 10 SQL statement optimization techniques to improve MYSQL query efficiency
  • 10 tips for optimizing MySQL SQL statements
  • MySQL SQL statement analysis and query optimization detailed explanation
  • Analyze the sql statement efficiency optimization issues of Mysql table reading, writing, indexing and other operations
  • Tips for optimizing MySQL SQL statements
  • 19 common and effective methods for MySQL optimization (recommended!)

<<:  9 ways to show and hide CSS elements

>>:  Several methods of deploying multiple front-end projects with nginx

Recommend

Vue implements sending emoticons in chat box

The specific code for sending emoticons in the vu...

How to display JSON data in HTML

background: Sometimes we need to display json dat...

MySQL database implements OLTP benchmark test based on sysbench

Sysbench is an excellent benchmark tool that can ...

Vue improves page response speed through lazy loading

Table of contents Overview What is lazy loading? ...

Introduction to the process of building your own FTP and SFTP servers

FTP and SFTP are widely used as file transfer pro...

How to use css variables in JS

How to use css variables in JS Use the :export ke...

MySQL 8.0.18 deployment and installation tutorial under Windows 7

1. Preliminary preparation (windows7+mysql-8.0.18...

Detailed tutorial on docker-compose deployment and configuration of Jenkins

Docker-compose deployment configuration jenkins 1...

Implementation of Nginx hot deployment

Table of contents Semaphore Nginx hot deployment ...

W3C Tutorial (16): Other W3C Activities

This section provides an overview of some other i...

Detailed explanation of selinux basic configuration tutorial in Linux

selinux ( Security-Enhanced Linux) is a Linux ker...

Several methods of deploying multiple front-end projects with nginx

I have summarized 3 methods to deploy multiple fr...

Analysis of pitfalls in rounding operation of ROUND function in MySQL

This article uses examples to illustrate the pitf...

Vue implements small search function

This article example shares the specific code of ...