My host has only 100GB of memory. If I perform a full table scan on a 200GB table, will the memory of the DB host be used up? And isn't a logical backup essentially a scan of the entire database? If a full scan really exhausted memory, logical backups would have failed long ago, wouldn't they?

Impact of a full table scan on the server layer

Suppose we want to perform a full table scan on a 200GB InnoDB table db1.t. To save the scan results on the client, you would use a command like this:

mysql -h$host -P$port -u$user -p$pwd -e "select * from db1.t" > $target_file

InnoDB data is stored in the primary key index, so a full table scan actually scans the primary key index of table t directly. Since this query has no other filter conditions, every row found is put into the result set and returned to the client. So, where does this "result set" live?
In fact, the server does not need to keep a complete result set. The process of fetching and sending data works like this:

1. Fetch a row and write it into net_buffer. The size of this memory area is controlled by the parameter net_buffer_length, which defaults to 16k.
2. Keep fetching rows until net_buffer is full, then call the network interface to send the buffer out.
3. If the send succeeds, clear net_buffer and continue with the next row.
4. If the send function returns EAGAIN or WSAEWOULDBLOCK, the local network stack (socket send buffer) is full, so wait until the stack becomes writable again, then resume sending.
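This fetch-and-flush loop can be illustrated with a toy Python sketch. It is not MySQL's actual implementation: the function name, buffer size, and row format are invented for illustration; the point is only that the server holds at most one buffer-sized chunk at a time.

```python
# Toy model of MySQL's "read and send" loop: the server never holds the
# full result set, only one net_buffer-sized chunk at a time.
# NET_BUFFER_LENGTH stands in for net_buffer_length (real default: 16k).

NET_BUFFER_LENGTH = 64  # bytes, deliberately tiny for the demo

def send_result(rows):
    """Yield the result set as chunks no larger than NET_BUFFER_LENGTH."""
    buf = b""
    for row in rows:
        encoded = row.encode() + b"\n"
        if len(buf) + len(encoded) > NET_BUFFER_LENGTH:
            yield buf          # buffer full: "flush to the socket"
            buf = b""
        buf += encoded
    if buf:
        yield buf              # flush the final partial chunk

# "Full table scan": rows are produced lazily, never all in memory at once.
rows = (f"row-{i}" for i in range(1000))
chunks = list(send_result(rows))
assert all(len(c) <= NET_BUFFER_LENGTH for c in chunks)
print(len(chunks), "chunks sent; peak server-side buffer:",
      max(len(c) for c in chunks), "bytes")
```

However many rows the scan produces, the generator only ever buffers one chunk, which is why the server side stays within a small, bounded amount of memory.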
So MySQL is "reading and sending at the same time". This means that if the client reads slowly, the MySQL server cannot send the results out, and the transaction's execution time becomes longer.

For example, the state below is what show processlist shows on the server when the client does not read from the socket receive buffer. If you see that the State is always "Sending to client", it means the network stack on the server side is full.

If the client uses the --quick parameter, the mysql_use_result method is used: read one row, process one row. Suppose the logic of some business is fairly complex, and the processing after reading each row is slow; then the client takes a long time to fetch the next row, and you may end up in the state shown above.

Therefore, for a normal online business, if a query returns only a small result set, it is recommended to use the mysql_store_result interface and save the whole result to client-side local memory. The premise, of course, is that the result set is small: if it is too large, the client may eat up close to 20GB of memory just because it executed one big query, and in that case you should switch to the mysql_use_result interface instead.

On the other hand, if you see many threads in the "Sending to client" state in a MySQL instance you maintain, you should ask your business development colleagues to optimize their queries and evaluate whether returning so many results is reasonable. If you just want to quickly reduce the number of threads in this state, setting net_buffer_length to a larger value is an option.

Sometimes you may see many queries on an instance whose state is "Sending data", yet there is no network problem at all. Why does sending data take so long?
In fact, the state of a query statement changes like this: after the query enters the execution phase, MySQL first sets its state to "Sending data"; it then sends the metadata of the result columns to the client, continues executing the statement, and only resets the state to an empty string once execution finishes.

That is, "Sending data" does not necessarily mean "sending data to the client"; it can cover any stage of the executor's work. For example, you can construct a lock-wait scenario and observe the Sending data state. Here the read of the entire table is blocked by a lock:
As you can see, session 2 is waiting for the lock, yet its state is displayed as Sending data.
Therefore, query results are sent to the client in segments, so scanning an entire table and returning a large amount of data will not blow up the server's memory.

That covers the processing logic at the server layer. How is it handled inside the InnoDB engine?

Impact of a full table scan on InnoDB

One function of InnoDB's memory is to hold update results, cooperating with the redo log to avoid random disk writes. In memory, data pages are managed in the Buffer Pool (BP for short). In the WAL mechanism, the BP plays the role of accelerating updates.

Because of WAL, when a transaction commits, the data page on disk is stale. If a query needs to read that page right away, must the redo log be applied to the disk page first? No: the in-memory data page is already up to date, so the page can be read from memory directly. Such a query needs no disk access at all and gets its result straight from memory, which is very fast. So the Buffer Pool also accelerates queries.

How much the BP accelerates queries depends on one important metric: the memory hit rate. Run show engine innodb status and look for the words "Buffer pool hit rate", which show the current hit rate; the hit rate in the figure, for example, is 100%. If every data page a query needs can be fetched directly from memory, that is the ideal case, corresponding to a 100% hit rate.

The size of the InnoDB Buffer Pool is set by the parameter innodb_buffer_pool_size, and it is generally recommended to be 60%~80% of the available physical memory. About ten years ago, a single machine held a few hundred GB of data with only a few GB of physical memory; today, although many servers have 128GB or more of memory, a single machine's data has grown to the TB level. So it is common for innodb_buffer_pool_size to be smaller than the amount of data on disk.
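The read path and the hit-rate metric just described can be sketched with a toy buffer pool. This is a deliberate simplification (no eviction yet, and the class and field names are invented); it only shows "serve from memory on a hit, read disk and cache on a miss", and how the hit rate falls out of the two counters.

```python
# Toy buffer pool read path: serve a page from memory when it is cached,
# otherwise "read it from disk" and cache it. Names are illustrative only;
# eviction is ignored here (it is discussed separately below).

class BufferPool:
    def __init__(self):
        self.pages = {}            # page_no -> page content
        self.hits = 0
        self.misses = 0

    def read_page(self, page_no, disk):
        if page_no in self.pages:          # memory hit: no disk I/O needed
            self.hits += 1
        else:                              # miss: fetch from disk, then cache
            self.misses += 1
            self.pages[page_no] = disk[page_no]
        return self.pages[page_no]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 1.0

disk = {n: f"page-{n}" for n in range(100)}
bp = BufferPool()
for n in [1, 2, 3, 1, 2, 3, 1, 2, 3, 4]:   # hot pages 1-3 are reread from memory
    bp.read_page(n, disk)
print(f"hit rate: {bp.hit_rate():.0%}")    # 6 hits out of 10 accesses -> 60%
```

The workload above rereads a few hot pages, so most accesses hit memory; a full table scan is the opposite pattern, every page is touched once, which is exactly why it threatens the hit rate.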
If the Buffer Pool is full and a data page needs to be read in from disk, an old data page must be evicted first.

InnoDB memory management

InnoDB uses the Least Recently Used (LRU) algorithm, which evicts the data page that has gone unused for the longest time.
The basic LRU algorithm manages the Buffer Pool as a linked list:

1. The head of the list is the most recently accessed data page; the tail is the least recently accessed one.
2. When a read request accesses a page that is already in the list (say P3), that page is moved to the head of the list.
3. When the page being accessed is not in the list, a new data page Px must be allocated in the Buffer Pool and added to the head of the list. But since the Buffer Pool is full, the page at the tail of the list must be evicted first.
Finally, the data page Pm that has not been accessed for the longest time is evicted.

Now consider scanning the 200GB table with this algorithm, where the table is a historical table that normally receives no business traffic. The scan would evict all the data currently in the BP and replace it with the pages accessed during the scan; in other words, the BP would then mainly hold data from this historical table. For an instance that is actively serving business, that is not acceptable: you would see the BP hit rate drop sharply, disk pressure rise, and SQL responses slow down.

So InnoDB cannot use the raw LRU algorithm directly; it optimizes it.

Improved LRU algorithm

InnoDB divides the linked list into a young area and an old area in a 5:3 ratio. LRU_old points to the first position of the old area, at 5/8 of the whole list: the 5/8 nearest the head is the young area, and the 3/8 nearest the tail is the old area.

The improved LRU algorithm runs like this:

1. To access a page that is in the young area (say P3), do the same as the basic LRU: move it to the head of the list.
2. To access a new page that is not in the list, the page Pm at the tail still has to be evicted, but the newly read page Px is inserted at LRU_old, the head of the old area, rather than at the head of the whole list.
3. Each time a page in the old area is accessed, make this judgment: if the page has existed in the list for more than 1 second, move it to the head of the list; if less than 1 second, leave its position unchanged. The 1-second threshold is controlled by the parameter innodb_old_blocks_time, which defaults to 1000 milliseconds.
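The split-list scheme can be sketched in Python. This is a toy model, not InnoDB's implementation: the capacities, class name, and the promotion clock are all invented for illustration, and the old-area timestamp stands in for innodb_old_blocks_time.

```python
# Toy version of InnoDB's split LRU: a young area (5/8) and an old area (3/8).
# Newly read pages enter at the head of the old area and are promoted to young
# only if re-accessed more than OLD_BLOCKS_TIME after their first access.
from collections import OrderedDict

CAPACITY = 8
YOUNG_CAP = 5            # 5/8 of the list
OLD_BLOCKS_TIME = 1.0    # seconds a page must sit in old before promotion

class SplitLRU:
    def __init__(self):
        self.young = OrderedDict()   # page -> None; front = most recent
        self.old = OrderedDict()     # page -> time of first access

    def access(self, page, now):
        if page in self.young:                   # hot page: move to young head
            self.young.move_to_end(page, last=False)
        elif page in self.old:
            if now - self.old[page] > OLD_BLOCKS_TIME:
                del self.old[page]               # survived long enough: promote
                self.young[page] = None
                self.young.move_to_end(page, last=False)
                if len(self.young) > YOUNG_CAP:  # demote young tail to old head
                    demoted, _ = self.young.popitem(last=True)
                    self.old[demoted] = now
                    self.old.move_to_end(demoted, last=False)
            # else: re-accessed too soon (sequential scan pattern), stays put
        else:                                    # brand-new page: old-area head
            self.old[page] = now
            self.old.move_to_end(page, last=False)
            while len(self.young) + len(self.old) > CAPACITY:
                self.old.popitem(last=True)      # evict from the old-area tail

lru = SplitLRU()
for p in ["h1", "h2", "h3"]:                     # warm up some hot pages
    lru.access(p, now=0.0)
    lru.access(p, now=2.0)                       # re-access after >1s: promoted
for i in range(100):                             # cold full-table scan, <1s apart
    lru.access(f"cold-{i}", now=10.0 + i * 0.01)
print("young after scan:", list(lru.young))      # hot pages untouched
```

Running the cold scan leaves the young area exactly as it was: the scan's pages churn only through the old area, which is the behavior the next paragraph relies on.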
This strategy is tailored to operations like full table scans. Take the scan of the 200GB historical table again:

1. During the scan, each newly read data page is placed in the old area.
2. A data page contains multiple records, so the same page is accessed several times in a row; but because the scan is sequential, the interval between the first and last access to one page is under 1 second, so the page stays in the old area.
3. As the scan moves on, the page is never accessed again and is soon evicted.

You can see the biggest gain of this strategy: although scanning the big table still uses the BP, it has no impact on the young area at all, which preserves the Buffer Pool hit rate for normal business queries.

Summary

MySQL computes and sends at the same time, so for a query with a large result, the complete result set is never stored on the server side. If the client fails to read the results promptly, it blocks that MySQL query, but it does not blow up the server's memory.

As for the InnoDB engine, thanks to the eviction strategy, a big query does not cause a surge in memory usage either. And because InnoDB has improved the LRU algorithm, the impact of a full table scan of cold data on the Buffer Pool can be kept under control.

That said, a full table scan is still expensive in IO resources, so you still cannot run a full table scan directly on an online master database during business peak hours.

This concludes this article on whether querying too much data in MySQL can cause OOM.