A brief discussion on whether too many MySQL data queries will cause OOM

My host has only 100GB of memory. If I want to perform a full table scan on a 200GB table, will the memory of the DB host be used up?

A logical backup is itself a scan of the entire database. If a scan like that consumed all the memory, logical backups would have been failing long ago, wouldn't they?
So a full table scan of a large table is evidently not a problem. Why is that?

Impact of full table scan on the server layer

Suppose we now want to perform a full table scan on a 200G InnoDB table db1.t. Of course, if you want to save the scan results on the client, you would use a command like this:

mysql -h$host -P$port -u$user -p$pwd -e \
	"select * from db1.t" > $target_file

InnoDB data is stored in the primary key index, so a full table scan actually scans the primary key index of table t directly. Since this query has no filter conditions, every row found can be put straight into the result set and returned to the client.

So where is this "result set" stored?
The server does not need to keep a complete result set. Data is fetched and sent as follows:

  • Fetch a row and write it to net_buffer. The size of this memory area is set by the parameter net_buffer_length, which defaults to 16k.
  • Keep fetching rows until net_buffer is full, then call the network interface to send it.
  • If the send succeeds, clear net_buffer, then fetch the next row and write it to net_buffer.
  • If the send function returns EAGAIN or WSAEWOULDBLOCK, the local network stack (the socket send buffer) is full, so the thread waits until the network stack is writable again before continuing to send.

Figure: query result sending process

From this process we can see:

  • While a query's results are being sent, the most memory MySQL uses for them is net_buffer_length, which is nowhere near 200G.
  • The socket send buffer cannot reach 200G either (its default size is defined in /proc/sys/net/core/wmem_default). If the socket send buffer is full, reading of data is suspended.
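
The send loop described above can be sketched in a few lines of Python. This is a simplified model rather than MySQL's actual code: the function name stream_rows, the send callback, and the 100-byte rows are assumptions made for illustration, and the sketch assumes every row is smaller than the buffer.

```python
NET_BUFFER_LENGTH = 16 * 1024  # default net_buffer_length (16k), per the text


def stream_rows(rows, send, buffer_size=NET_BUFFER_LENGTH):
    """Copy rows into a bounded buffer and flush it whenever it fills up.

    Returns the largest number of bytes ever held in the buffer.
    Assumes each row is smaller than buffer_size.
    """
    buffer = bytearray()
    peak = 0
    for row in rows:
        # Flush first if this row would overflow the buffer.
        if buffer and len(buffer) + len(row) > buffer_size:
            send(bytes(buffer))  # in MySQL this step may wait on a full socket
            buffer.clear()
        buffer.extend(row)
        peak = max(peak, len(buffer))
    if buffer:
        send(bytes(buffer))  # flush the final partial buffer
    return peak


def demo():
    sent = []
    # Hypothetical table: 100,000 rows of 100 bytes each (about 10 MB in total).
    rows = (b"x" * 100 for _ in range(100_000))
    peak = stream_rows(rows, sent.append)
    total = sum(len(chunk) for chunk in sent)
    return peak, total
```

Calling demo() pushes about 10 MB through the loop, yet the buffer itself never grows past NET_BUFFER_LENGTH, which is the point the two bullets above make.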

So MySQL actually "sends while it reads". This means that if the client receives data slowly, the MySQL server cannot send the results out, and the transaction takes longer to execute.

For example, when the client does not read the contents of its socket receive buffer, show processlist on the server shows the thread in the following state.

Figure: the server is blocked while sending

If the State stays at "Sending to client", the network stack on the server is full.

If the client uses the --quick parameter, the mysql_use_result method is used: it reads one row and processes one row at a time. Suppose some piece of business logic is complex. If the processing done after each row is slow, the client takes a long time to fetch the next row, which can leave the server thread stuck in the "Sending to client" state.

Therefore, for normal online business, if a query returns only a few results, it is recommended to use the mysql_store_result interface, which saves the query results directly to local memory.

Of course, this only holds when the query returns few results. If a large query returns so many that the client would hold a huge amount of memory (nearly 20G, for example), use the mysql_use_result interface instead.
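
The difference between the two interfaces can be illustrated with a small Python sketch. It does not call the MySQL C API: fetch_rows, store_result_style and use_result_style are hypothetical stand-ins that only model how many rows the client holds at once.

```python
def fetch_rows(n):
    """Hypothetical stand-in for the server: yields n rows one at a time."""
    for i in range(n):
        yield (i, "row-%d" % i)


def store_result_style(rows):
    """Like mysql_store_result: pull every row into client memory first."""
    cached = list(rows)  # the whole result set now lives on the client
    rows_held = len(cached)
    checksum = sum(row[0] for row in cached)
    return rows_held, checksum


def use_result_style(rows):
    """Like mysql_use_result: process each row as it arrives."""
    rows_held = 0
    checksum = 0
    for row in rows:
        rows_held = 1  # only the current row is in client memory
        checksum += row[0]
    return rows_held, checksum
```

Both functions compute the same checksum, but the first holds every row at once while the second holds one. The trade-off is the one described above: the streaming style keeps the server waiting whenever per-row processing is slow.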

If you see many threads in the "Sending to client" state in the MySQL database that you are responsible for maintaining, it means that you should ask your business development colleagues to optimize the query results and evaluate whether so many returned results are reasonable.

If you want to quickly reduce the number of threads in this state, you can set net_buffer_length to a larger value.

Sometimes you will see many query statements on an instance in the "Sending data" state even though there is no network problem. Why does sending data take so long?
The state of a query statement changes as follows:

  • After a MySQL query statement enters the execution phase, its state is first set to "Sending data".
  • Then the column information (metadata) of the result is sent to the client.
  • The statement then continues through the rest of its execution.
  • When execution completes, the state is set to an empty string.

That is, "Sending data" does not necessarily mean data is being sent; it can mean any stage of the executor. For example, you can construct a lock wait scenario and still see the Sending data state.

A full-table read blocked by a lock:

session1                                  session2
begin;
select * from t where id=1 for update;
                                          (start a transaction)
                                          select * from t lock in share mode;
                                          (blocked)

Figure: Sending data state

Here session2 is waiting for the lock, yet its state is displayed as Sending data.

  • "Sending to client" is displayed only when a thread is in the "waiting for the client to receive the result" state.
  • If it says "Sending data", it just means "executing"
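
As a rough sketch, a monitoring script might separate these two states as shown here. The rows are made-up sample data shaped like show processlist output, and the function names are hypothetical.

```python
from collections import Counter

# Made-up sample rows shaped like SHOW PROCESSLIST output (hypothetical data).
SAMPLE_PROCESSLIST = [
    {"Id": 11, "State": "Sending to client"},
    {"Id": 12, "State": "Sending data"},
    {"Id": 13, "State": "Sending to client"},
    {"Id": 14, "State": ""},
]


def count_states(rows):
    """Count how many threads are in each State."""
    return Counter(row["State"] for row in rows)


def waiting_on_client(rows):
    """Ids of threads blocked until the client reads more of the result."""
    return [row["Id"] for row in rows if row["State"] == "Sending to client"]
```

Only the threads returned by waiting_on_client point at a slow consumer; a thread in "Sending data" may simply be executing or waiting for a lock.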

So query results are sent to the client in segments, and scanning an entire table, even one that returns a large amount of data, will not exhaust memory.

The above is the processing logic of the server layer. How is it handled in the InnoDB engine?

Impact of full table scan on InnoDB

One of the functions of InnoDB memory is to save update results and cooperate with redo log to avoid random disk writes.

The data pages in memory are managed in the Buffer Pool (BP for short). In WAL, BP plays the role of accelerating updates.
BP can also speed up queries.

Because of WAL, when a transaction commits, the data page on disk is still the old version. If a query needs to read that data page right away, must the redo log be applied to the page first?

No. At this point the data page in memory already holds the latest result, so the query can read the memory page directly. It does not need to touch the disk, and fetching the result straight from memory is very fast. That is why the Buffer Pool speeds up queries.

How much the BP speeds up queries depends on one important indicator: the memory hit rate.
You can check a system's current BP hit rate in the output of show engine innodb status. Generally speaking, for a stable online system to keep its response times within requirements, the memory hit rate needs to be above 99%.

In that output, look for the words "Buffer pool hit rate", which show the current hit rate. A value of 1000 / 1000, for example, means a hit rate of 100%.


If all data pages required by the query can be obtained directly from the memory, that is the best, corresponding to a hit rate of 100%.
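
A minimal sketch of pulling that figure out of the status text follows. It assumes the line has the form "Buffer pool hit rate N / M"; the function name is hypothetical, and the code returns None when no such line is present.

```python
import re

# Matches e.g. "Buffer pool hit rate 1000 / 1000" in the status text.
HIT_RATE_RE = re.compile(r"Buffer pool hit rate\s+(\d+)\s*/\s*(\d+)")


def buffer_pool_hit_rate(status_text):
    """Return the hit rate as a fraction between 0 and 1, or None if absent."""
    match = HIT_RATE_RE.search(status_text)
    if match is None:
        return None
    hits, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        return None
    return hits / total


def below_target(status_text, target=0.99):
    """True if a hit rate was found and it is under the 99% guideline."""
    rate = buffer_pool_hit_rate(status_text)
    return rate is not None and rate < target
```

The 0.99 default mirrors the 99% guideline mentioned above; it is a parameter so that a different threshold can be passed in.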

The size of the InnoDB Buffer Pool is determined by the parameter innodb_buffer_pool_size . It is generally recommended to set it to 60%~80% of the available physical memory.

About ten years ago, a single machine held hundreds of GB of data while physical memory was only a few GB. Today, although many servers have 128 GB of memory or more, the data on a single machine has grown to the terabyte level.

So it is common for innodb_buffer_pool_size to be smaller than the amount of data on disk. If the Buffer Pool is full and a data page must be read from disk, an old data page has to be evicted.

InnoDB Memory Management

InnoDB uses the Least Recently Used (LRU) algorithm, which evicts the data that has gone unused the longest.

Figure: basic LRU algorithm

InnoDB's LRU management of the BP is implemented with a linked list:

  • In state 1, the head of the linked list is P1, meaning P1 is the most recently accessed data page.
  • A read request then accesses P3, so the list changes to state 2, with P3 moved to the head.
  • In state 3, the data page being accessed is not in the linked list, so a new data page Px must be allocated in the BP and added to the head of the list. Because memory is full, no new memory can be allocated, so the memory of data page Pm at the tail of the list is cleared, filled with the contents of Px, and placed at the head of the list.

The net effect is that Pm, the data page that has gone unaccessed the longest, is evicted.
What happens if a full table scan runs at this point? Suppose you scan a 200G table that holds historical data and is not normally accessed by any business.

Scanning under this algorithm evicts all the data currently in the BP and replaces it with the data pages touched during the scan. In other words, the BP ends up holding mostly data from the historical table.

For a database that is serving live business, this is unacceptable. You will see the BP memory hit rate drop sharply, disk pressure rise, and SQL statements respond more slowly.
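
The pollution effect described above can be reproduced with a toy model of plain LRU. This is a sketch built on Python's OrderedDict, not InnoDB's implementation; the class name, the capacity of 100 pages, and the page names are assumptions chosen for illustration.

```python
from collections import OrderedDict


class BasicLRU:
    """Plain LRU page cache: the most recently used page sits at the head."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # first key = head, last key = tail

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id, last=False)  # move to the head
            return True  # hit
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=True)  # evict the page at the tail
        self.pages[page_id] = True
        self.pages.move_to_end(page_id, last=False)  # new page goes to the head
        return False  # miss: the page had to be loaded


def hot_pages_left_after_scan(capacity=100, scan_pages=1000):
    cache = BasicLRU(capacity)
    hot = ["hot-%d" % i for i in range(capacity)]
    for page in hot:
        cache.access(page)  # warm the cache with business data
    for i in range(scan_pages):
        cache.access("scan-%d" % i)  # one-off full table scan
    return sum(1 for page in hot if page in cache.pages)
```

In this model hot_pages_left_after_scan() returns 0: every page the business was using has been pushed out by pages the scan touched only once.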

InnoDB therefore cannot use the basic LRU algorithm as it stands, and uses an optimized version instead.

Improved LRU algorithm

InnoDB divides the linked list into a New area and an Old area in a 5:3 ratio. LRU_old points to the first position of the Old area, which is 5/8 of the way down the list. That is, the 5/8 nearest the head of the list is the New area, and the 3/8 nearest the tail is the Old area.

Improved LRU algorithm execution process:

1. In state 1, P3 is accessed. Since P3 is in the New area, it is moved to the head of the linked list, just as in the unoptimized LRU => state 2.
2. Next, a new data page that is not in the linked list is accessed. The data page Pm is still evicted, but the newly inserted data page Px is placed at LRU_old.
3. Each time a data page in the Old area is accessed, the following check is made:

  • If the data page has been in the LRU list for more than 1 second, move it to the head of the list.
  • If the data page has been in the LRU list for less than 1 second, leave its position unchanged. The 1 second is controlled by the parameter innodb_old_blocks_time, whose default value is 1000, in ms.

This strategy is tailored to operations such as full table scans. Take the scan of the 200G historical data table again:

1. During the scan, all newly inserted data pages are placed in the Old area.
2. A data page holds multiple records, so it is accessed several times. Because the scan is sequential, however, the interval between the first and last access to the page is under 1 second, so the page stays in the Old area.
3. The scan moves on to later data and never touches the earlier page again, so that page never gets a chance to move to the head of the linked list (the New area) and is soon evicted.

The biggest benefit of this strategy is that although scanning the large table still uses the BP, it has no impact on the New area, which preserves the Buffer Pool's hit rate for normal business queries.
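
The improved strategy can be modeled in the same toy style. The sketch below is a deliberate simplification and not InnoDB's code: the class name, the logical clock passed in as now_ms, the 80-page capacity, and the assumption that the scan spends 1 ms per page are all illustrative. It caps the New area at 5/8 of the list and treats everything else as the Old area.

```python
from collections import OrderedDict


class MidpointLRU:
    """Simplified New/Old LRU list; the head of each area is its first key."""

    def __init__(self, capacity, old_blocks_time_ms=1000):
        self.capacity = capacity
        self.new_capacity = capacity * 5 // 8  # New area: 5/8 of the list
        self.old_blocks_time_ms = old_blocks_time_ms
        self.new = OrderedDict()  # page_id -> time of first access (ms)
        self.old = OrderedDict()  # page_id -> time of first access (ms)

    @staticmethod
    def _push_head(area, page_id, first_seen):
        area[page_id] = first_seen
        area.move_to_end(page_id, last=False)

    def access(self, page_id, now_ms):
        if page_id in self.new:
            self.new.move_to_end(page_id, last=False)
            return True
        if page_id in self.old:
            first_seen = self.old[page_id]
            # Promote only if the page has been in the list long enough.
            if now_ms - first_seen >= self.old_blocks_time_ms:
                del self.old[page_id]
                self._push_head(self.new, page_id, first_seen)
                if len(self.new) > self.new_capacity:
                    victim, seen = self.new.popitem(last=True)
                    self._push_head(self.old, victim, seen)  # demote New tail
            return True
        # Miss: evict from the tail of the list, then insert at LRU_old.
        if len(self.new) + len(self.old) >= self.capacity:
            (self.old or self.new).popitem(last=True)
        self._push_head(self.old, page_id, now_ms)
        return False


def hot_pages_left_after_scan(capacity=80, scan_pages=1000):
    cache = MidpointLRU(capacity)
    hot = ["hot-%d" % i for i in range(cache.new_capacity)]
    for page in hot:
        cache.access(page, now_ms=0)     # first read: lands in the Old area
    for page in hot:
        cache.access(page, now_ms=2000)  # re-read 2 s later: promoted to New
    for i in range(scan_pages):
        now = 3000 + i                   # assume the scan spends 1 ms per page
        for _ in range(3):               # several rows per page, same instant
            cache.access("scan-%d" % i, now)
    return sum(1 for page in hot if page in cache.new)
```

With these assumptions hot_pages_left_after_scan() returns 50, the full size of the New area: the scan pages only cycle through the Old area, whereas the plain LRU model loses every hot page under the same kind of scan.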

Summary

MySQL computes results and sends them at the same time, so even for a query with a very large result, the server never holds the complete result set. If the client does not read the results promptly, the MySQL query is blocked, but memory is not exhausted.

In the InnoDB engine, thanks to the eviction strategy, a large query does not cause memory usage to surge. And because InnoDB uses an improved LRU algorithm, the impact of a full table scan over cold data on the Buffer Pool stays under control.

Full table scans are still IO-intensive, so they should not be run directly on the online master database during business peak hours.
