Understanding the MySQL query optimization process

Understanding the MySQL query optimization process

MySQL query optimization requires three steps: parsing, preprocessing, and optimization. During any of these processes, errors may occur. This article will not discuss error handling in depth, but will help you understand how MySQL executes queries so that you can write better queries.

Parsers and preprocessors

Initially, MySQL's parser breaks the query into a series of instructions and builds a "parse tree" from them. The parser uses MySQL's SQL syntax to translate and validate query statements. For example, the parser ensures that the instructions in the query are valid and in the correct order, and checks for errors such as unmatched quotes in strings.

The preprocessor checks the constructed parse tree for semantic information that the parser cannot handle. For example, the existence of tables and columns is checked, and field names and aliases are processed to ensure that column references are unambiguous. Next, the preprocessor checks permissions, which is usually pretty fast (unless your server has a bunch of permissions configured).

Query Optimizer

After passing through the parser and preprocessor, the parse tree is determined to be valid and can be processed by the optimizer and ultimately transformed into a query plan. There are often many ways to execute a query that produces the same result, and the optimizer's job is to find the best option.

MySQL uses an optimizer based on cost estimation, which means that it tries to predict the cost of multiple execution plans and chooses the one with the lowest cost. The original unit cost was a random 4KB data page read, but now it has become more complex and includes the cost of performing WHERE comparison conditions. You can view the query optimizer's estimate of the cost of a query statement by displaying the Last_query_cost session variable.

SELECT SQL_NO_CACHE COUNT(*) FROM sakila.film_actor;
SHOW STATUS LIKE 'Last_query_cost';

The Last_query_cost displayed means that the optimizer estimates that it will need to perform the corresponding number of random data page accesses to complete the query. This is based on the following statistical estimates:

  • The number of data pages occupied by the data table or index;
  • Candidate values ​​for the index;
  • The data length corresponding to the data row, key, and key value distribution.

The optimizer does not take into account cache estimates - it assumes that the result is read from disk I/O every time. The optimizer may not always choose the best execution plan for the following reasons:

  • The statistics themselves may be wrong. The statistical results on the server side depend on the storage engine, which may be very accurate or very inaccurate. For example, InnoDB does not keep the exact row count of a table due to its MVCC architecture.
  • The estimated cost is not equivalent to the actual running cost, so even if the statistics are accurate, the query cost will deviate from MySQL's estimate to a greater or lesser extent. A query plan that reads more data pages might also be cheaper, for example if the disk I/O access is faster in sequential order or if the results are already in cache. Therefore, the optimizer itself does not know how many I/O operations a query will incur.
  • MySQL artificial optimization may be different from what we expect. What we want may be faster execution time, but MySQL does not only pursue speed, it seeks to minimize cost. Therefore, passing the cost is not necessarily scientific.
  • MySQL does not take into account concurrent queries, which may affect the speed at which queries run.
  • MySQL does not always optimize based on cost estimates. Sometimes it is just a matter of following some rules, such as using the full-text index if there is a full-text match condition (MATCH method). Even if there is a faster alternative index and non-full-text query condition, MySQL will not execute the query faster.
  • The optimizer does not consider the cost of operations that are not under its control, such as executing stored procedures or custom functions.
  • The optimizer is not always able to estimate every execution plan, and sometimes it will overlook a more optimal plan.

The MySQL query optimizer is a very complex part that uses many optimization methods to convert a query statement into a query execution plan. There are usually two types of optimization: static optimization and dynamic optimization. Static optimization can be performed simply by inspecting the parse tree. For example, the optimizer can convert the WHERE condition into an equation through mathematical operation rules. Static optimization is not related to specific values, such as constant values ​​in WHERE conditions. They are executed once and remain valid even if the query is executed again with different values. It can be understood as "compile-time optimization".

In contrast, dynamic optimization is context-specific and depends on a variety of factors. For example, the value in the WHERE condition or the corresponding number of data rows in the index. This process needs to be re-estimated for each query and can be understood as "runtime optimization". Here are some typical optimization methods for MySQL:

  • Reordering of union queries: Data tables do not necessarily need to be united in the order of the query statements. Determining the best order for joining queries is a very important optimization.
  • Convert outer joins to inner joins: An outer join does not necessarily need to be queried as an outer join. Some factors, such as WHERE conditions and data table structure can make an outer join query equivalent to an inner join. MySQL can recognize these cases and rewrite the join query.
  • Apply mathematical equivalent formulas: MySQL applies mathematical equivalent conversions to simplify expressions. It can be done by expanding and reducing constants, eliminating impossible cases and constant expressions. For example, the expression (5=5 AND a>5) is simplified to (a>5). Similarly, (a 5 AND b=c AND a=5. These rules are very useful for queries with conditions.
  • COUNT(), MIN(), and MAX() optimizations: Indexes and columns with null values ​​can often help MySQL optimize these functions. For example, when looking for the minimum value in the leftmost column of a binary tree, MySQL can request only the first row of data in the index. This can even be done during the query optimization phase, and treated as a constant value for the rest of the query. The same is true for querying the maximum value, you only need to read the last u rows. If the server uses this optimization, you can see "Select tables optimized away" in EXPLAIN. This means that the optimizer removed the table from the query plan and replaced it with a constant. Similarly, COUNT(*) queries can also be optimized in some storage engines (such as MyISAM, which always stores the exact number of rows in the table) when no WHERE condition is specified.
  • Evaluating and simplifying constant expressions: Once MySQL detects that an expression can be simplified to a constant, it does so during the optimization phase. For example, a user-defined variable can be converted to a constant if it does not change during the query. Surprisingly, during the optimization phase, some statements that you think are a query are also converted to constants. An example is MIN() on an index. This situation can also be extended to constant queries on primary keys or independent indexes. If the WHERE condition specifies a constant for such an index, the optimizer knows that MySQL should look for the corresponding value at the beginning of the query. This value is then treated as a constant in the rest of the query. Here is an example:
EXPLAIN SELECT film.film_id, film_actor.actor_id
FROM sakila.film
	INNER JOIN sakila.film_actor USING(film_id)
WHERE film.film_id = 1;

MySQL will split this query into two steps, so the analysis result will have two rows. The first step is to find the corresponding data row in the film table. Because the query is based on the primary key film_id, MySQL knows that there is only one row of data. Therefore, the ref of the query analysis result at this time is a constant. In the second step, MySQL treats film_id as a known value, so the ref for the query on film_actor is also a constant. Other similar scenarios include when the constraint in the WHERE, USING, or ON condition is an equality. In this example, MySQL knows that the film_id in the USING condition is the same value in all queries, and this value must be the same as the film_id in the WHERE condition.

  • Covering index: MySQL sometimes uses index data to avoid reading row data if the index contains all the columns required by the query.
  • Subquery optimization: MySQL can transform some types of subqueries into more efficient variants, simplifying them into index queries rather than independent queries.
  • Early abort: MySQL can abort the query process early after satisfying the query result. The most obvious example is the LIMIT condition. There are also some other situations where early termination is required. For example, MySQL can abort the entire query after detecting a possible condition, as shown in the following example:
EXPLAIN SELECT film.film_id FROM sakila.film WHERE film_id=1;

In the Extra field of the analysis results, you will see "Impossible WHERE noticed after reading const tables". There are other situations where early termination may occur, for example:

SELECT film.film_id
FROM sakila.film
	LEFT OUTER JOIN sakila.film_actor USING (film_id)
WHERE sakila.film_actor.film_id IS NULL;

This query excludes movies that have actors in them. Each movie may have multiple actors, but once an actor is found, MySQL stops processing the current movie and moves on to the next one. A similar situation occurs for DISTINCT and NOT EXISTS.

  • Equivalence propagation: MySQL recognizes whether the columns held in a query are equivalent. For example, in a JOIN condition, the WHERE condition affects the same columns as in the following query:
SELECT film.film_id
FROM sakila.film
	INNER JOIN sakila.film_actor USING(film_id)
WHERE film.film_id > 500;

MySQL will know that the WHERE constraint applies not only to the film table, but also to the film_actor table. However, this optimization effect may not be achieved for other databases.

  • IN query comparison: For many database servers, IN query is equivalent to multiple OR conditions, and the two are logically equivalent. But this is not the case in MySQL. MySQL sorts the list values ​​of the IN query and uses binary search to check whether the query value is in the list. This reduces the complexity of the algorithm from O(n) to O(log n).

In fact, MySQL uses many more optimization methods than those listed above, and it is impossible to list them all here. Just remember the complexity of MySQL's optimizer and how smart it is. Therefore, the optimizer should be allowed to play its role instead of optimizing the query statement indefinitely until the MySQL optimizer has no room for improvement. Of course, although MySQL's optimizer is very smart, it does not necessarily give the best results. Sometimes you know the best result, but MySQL may not know it. In this case, you can optimize the query statement to help MySQL complete the optimization work, but sometimes you need to add query hints, or rewrite the query, modify the data table design or add indexes.

The above is the details of understanding the MySQL query optimization process. For more information about MySQL query optimization, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • MySQL index failure principle
  • The principles and defects of MySQL full-text indexing
  • MySQL index principle and usage example analysis
  • MySQL paging query optimization techniques
  • MySQL group query optimization method
  • MySQL index principle and query optimization detailed explanation

<<:  Implementation of Jenkins+Docker continuous integration

>>:  Recommend 60 paging cases and good practices

Recommend

ffmpeg Chinese parameter description and usage examples

1. When ffmpeg pushes video files, the encoding f...

How to elegantly implement WeChat authorized login in Vue3 project

Table of contents Preface Prepare Implementation ...

Mac VMware Fusion CentOS7 configuration static IP tutorial diagram

Table of contents Install CentOS7 Configuring Sta...

Implementing access control and connection restriction based on Nginx

Preface Nginx 's built-in module supports lim...

How to embed flash video format (flv, swf) files in html files

Flash file formats: .FLV and .SWF There are two ex...

MySQL high availability cluster deployment and failover implementation

Table of contents 1. MHA 1. Concept 2. Compositio...

Native JS to achieve draggable login box

This article shares a draggable login box impleme...

Detailed explanation of common usage of MySQL query conditions

This article uses examples to illustrate the comm...

How does MySQL connect to the corresponding client process?

question For a given MySQL connection, how can we...

Detailed explanation of nginx forward proxy and reverse proxy

Table of contents Forward Proxy nginx reverse pro...

Tutorial on installing MySQL with Docker and implementing remote connection

Pull the image docker pull mysql View the complet...

MySQL Series 3 Basics

Table of contents Tutorial Series 1. Introduction...

Why the CSS attribute value clear:right does not work in detail

Using the clear property to clear floats is a comm...

How to maintain MySQL indexes and data tables

Table of contents Find and fix table conflicts Up...