Some methods to optimize query speed when MySQL processes massive data

In projects I have worked on, I found that once a MySQL table reaches millions of rows, the efficiency of ordinary SQL queries drops sharply, and when the where clause contains many conditions the query speed becomes intolerable. I once tested a conditional query on an indexed table containing more than 4 million records, and it took as long as 40 seconds. Any user would be driven crazy by that kind of delay, so improving the efficiency of SQL statements matters a great deal. The following are 30 SQL query optimization tips that circulate widely on the Internet. Several of them were written for SQL Server and use its syntax, so check each one against your own MySQL version before relying on it.

1. Try to avoid using the != or <> operator in the where clause. These conditions usually match most of the rows, so the engine often abandons the index and performs a full table scan.

2. To optimize the query, try to avoid full table scans. First, consider creating indexes on the columns involved in where and order by.
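
As a minimal sketch of this tip, assume the table t used throughout this article has the columns num and createdate. The index name idx_num_createdate is made up for illustration. EXPLAIN can then be used to check whether the index is actually chosen:

```sql
-- Hypothetical index: the filter column first, then the sort column.
CREATE INDEX idx_num_createdate ON t (num, createdate);

-- EXPLAIN shows the chosen access path without running the query.
-- A "type" of ALL means a full table scan; "ref" or "range" with
-- idx_num_createdate in the "key" column means the index is used.
EXPLAIN
SELECT id
FROM t
WHERE num = 10
ORDER BY createdate;
```

Putting the equality column before the sort column lets MySQL read the matching rows already in createdate order, which usually avoids a separate sort step.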

3. Be careful with null value checks on fields in the where clause, such as:
select id from t where num is null
MySQL can use an index for an is null condition, but nullable columns cost extra storage and make comparisons harder to reason about. If the data allows it, set a default value of 0 on num so that the num column contains no null values, and then query it like this:
select id from t where num=0

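4. Try to avoid using or to connect conditions in the where clause, because the engine may abandon the index and perform a full table scan, especially when the conditions are on different columns. For example:
select id from t where num=10 or num=20
can be written as:
select id from t where num=10
union all
select id from t where num=20
Note that union all returns a row twice if it matches both branches. That cannot happen here, because num cannot be both 10 and 20.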

5. The following query will also result in a full table scan, because a pattern with a leading percent sign cannot use the index:
select id from t where name like '%abc%'
To improve efficiency, you can consider full-text retrieval.
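
A rough sketch of the full-text alternative, assuming name is a text column on an InnoDB or MyISAM table. The index name ft_name is hypothetical:

```sql
-- Hypothetical full-text index on the name column
-- (InnoDB supports FULLTEXT indexes from MySQL 5.6 onward).
ALTER TABLE t ADD FULLTEXT INDEX ft_name (name);

-- Natural-language search for the word "abc".
SELECT id
FROM t
WHERE MATCH(name) AGAINST('abc' IN NATURAL LANGUAGE MODE);
```

Full-text search matches whole words rather than arbitrary substrings, and very short words may be ignored depending on the server's minimum token length settings, so it is not a drop-in replacement for every like '%...%' query.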

6. Use in and not in with caution, because they can lead to a full table scan (not in especially), such as:
select id from t where num in(1,2,3)
For continuous values, use between instead of in:
select id from t where num between 1 and 3

7. Using parameters in the where clause can also cause a full table scan. SQL Server resolves local variables only at run time, but the optimizer must choose an access plan at compile time, when the value of the variable is still unknown and cannot be used as an input for index selection. The following statement can therefore end up as a full table scan:
select id from t where num=@num
In SQL Server, you can force the query to use the index instead:
select id from t with(index(index name)) where num=@num
The with(index(...)) hint is SQL Server syntax. The MySQL equivalent is force index:
select id from t force index (index name) where num=@num

8. Try to avoid expression operations on fields in the where clause, as this will cause the engine to abandon the index and perform a full table scan. For example:
select id from t where num/2=100
should be changed to:
select id from t where num=100*2

9. Try to avoid applying functions to fields in the where clause, as this will cause the engine to abandon the index and perform a full table scan. For example:
select id from t where substring(name,1,3)='abc' -- ids whose name starts with abc
select id from t where datediff(day,createdate,'2005-11-30')=0 -- ids created on 2005-11-30
should be changed to:
select id from t where name like 'abc%'
select id from t where createdate>='2005-11-30' and createdate<'2005-12-01'
The three-argument datediff is SQL Server syntax. MySQL's datediff takes two dates, but the range rewrite is the same.

10. Do not perform functions, arithmetic operations, or other expression operations on the left side of the "=" in the where clause, otherwise the system may not be able to use the index correctly.

11. When an indexed field is used as a condition and the index is a composite index, the condition must include the first field of the index for the system to use it; otherwise the index will not be used. Keep the order of the fields in the condition consistent with the index order as much as possible.
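
The leading-column rule can be sketched as follows, assuming table t has the columns num and name. The index name idx_num_name is made up for illustration:

```sql
-- Hypothetical composite index on (num, name).
CREATE INDEX idx_num_name ON t (num, name);

-- Can seek on the index: the leading column num is constrained.
EXPLAIN SELECT id FROM t WHERE num = 10 AND name = 'abc';
EXPLAIN SELECT id FROM t WHERE num = 10;

-- The leading column is missing, so MySQL generally cannot seek on
-- the index; expect a scan of the whole table or of the whole index.
EXPLAIN SELECT id FROM t WHERE name = 'abc';
```

Comparing the "type" and "key" columns of the three EXPLAIN results shows the difference. The exact output depends on the MySQL version and on the table statistics.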

12. Do not write meaningless queries, such as one that exists only to generate an empty table structure:
select col1,col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources. It should be changed to this:
create table #t(…)
The #t temporary table names are SQL Server syntax. In MySQL the same idea is create temporary table with an explicit column list.

13. In many cases, using exists instead of in is a good choice:
select num from a where num in(select num from b)
Replace it with the following:
select num from a where exists(select 1 from b where num=a.num)

14. Not all indexes are effective for queries. The optimizer chooses a plan based on the data in the table, and when the indexed column contains many repeated values, the query may not use the index. For example, if a table has a sex field that is almost half male and half female, an index on sex will do little for query efficiency.
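
One way to spot such low-selectivity indexes is to look at the statistics MySQL keeps for each index. In this sketch, t stands for whatever table holds the column in question:

```sql
-- The Cardinality column is the optimizer's estimate of the number of
-- distinct values in the index. A value near 2 on a table with
-- millions of rows marks a low-selectivity index such as one on sex.
SHOW INDEX FROM t;

-- Refresh the statistics if they look stale.
ANALYZE TABLE t;
```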

15. More indexes are not always better. Although indexes can improve the efficiency of the corresponding selects, they also reduce the efficiency of inserts and updates, because every index has to be maintained when rows are inserted or updated. How to build indexes therefore requires careful consideration of the specific situation. As a rule of thumb, a table should not have more than 6 indexes; if it has more, consider whether the indexes on rarely used columns are necessary.

16. Avoid updating clustered index data columns as much as possible, because the order of clustered index data columns is the physical storage order of table records. Once the column value changes, the order of the entire table records will be adjusted, which will consume considerable resources. If the application system needs to frequently update clustered index data columns, you need to consider whether the index should be built as a clustered index.

17. Try to use numeric fields. If the field contains only numerical information, try not to design it as character type, as this will reduce the performance of queries and connections and increase storage overhead. This is because the engine compares each character in the string one by one when processing queries and connections, but for numeric types, only one comparison is enough.

18. Use varchar/nvarchar instead of char/nchar whenever possible. First, variable-length fields take up less storage space. Second, searching a relatively small field is obviously more efficient.

19. Do not use select * from t anywhere. Replace "*" with a specific field list and do not return any unused fields.

20. Try to use table variables instead of temporary tables. If the table variable contains a lot of data, be aware that its indexes are very limited (only the primary key index). Table variables are a SQL Server feature; MySQL does not have them.

21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

22. Temporary tables are not unusable. Using them appropriately can make certain routines more efficient, for example, when you need to repeatedly reference a data set from a large table or a commonly used table. However, for one-time events, it is better to use a derived table.

23. When creating a new temporary table, if a large amount of data is inserted at one time, use select into instead of create table to avoid generating a large amount of log and to increase speed. If the amount of data is small, create table first and then insert, to ease the load on the system tables.

24. If temporary tables are used, be sure to explicitly delete all of them at the end of the stored procedure: first truncate table, then drop table. This avoids locking the system tables for a long time.

25. Try to avoid using cursors because of their poor efficiency. If the data operated by the cursor exceeds 10,000 rows, you should consider rewriting it.

26. Before using cursor-based methods or temporary table methods, you should first look for set-based solutions to solve the problem. Set-based methods are usually more effective.

27. Like temporary tables, cursors are not unusable. Using a FAST_FORWARD cursor with small data sets is often superior to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "aggregates" in the result set will generally execute faster than using cursors. If development time permits, try both the cursor-based approach and the set-based approach to see which one works better.

28. Put SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement in a stored procedure or trigger is executed.

29. Try to avoid returning large amounts of data to the client. If the amount of data is too large, consider whether the corresponding demand is reasonable.

30. Try to avoid large transaction operations and improve the system's concurrency capabilities.
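
One common way to keep transactions small is to split a bulk change into batches. The sketch below assumes autocommit is on; the cutoff date and the batch size of 1000 are arbitrary values chosen for illustration:

```sql
-- Delete at most 1000 matching rows per statement. With autocommit on,
-- each statement is its own short transaction.
DELETE FROM t
WHERE createdate < '2005-01-01'
LIMIT 1000;

-- Number of rows the DELETE removed. Repeat both statements from the
-- application until this returns 0.
SELECT ROW_COUNT();
```

Each batch holds its locks only briefly, so other sessions can work on the table between batches.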
