Problem Description Recently, there was a MySQL5.6.21 server. After the application was released, the concurrent threads Threads_running increased rapidly to about 2000. A large number of threads were waiting for Opening tables and Closing tables, and the related logical access on the application side timed out. 【Analysis process】 1. After the application is released at 16:10, Opened_tables continues to increase, as shown in the following figure: Check the pt-stalk log file captured during the failure. At the time point 2019-01-18 16:29:37, the value of Open_tables was 3430, and the configuration value of table_open_cache was 2000. When the Open_tables value is greater than the table_open_cache value, each time a new session opens a table, some of them cannot hit the table cache and have to be reopened. This reflects the phenomenon that a large number of threads are in the opening tables state. 2. The total number of tables in this instance plus the system database is 851, which is much smaller than 2000 in table_open_cache. Why does Open_tables reach 3430? From the official documentation, we can get an explanation. https://dev.mysql.com/doc/refman/5.6/en/table-cache.html table_open_cache is related to max_connections. For example, for 200 concurrent running connections, specify a table cache size of at least 200 * N, where N is the maximum number of tables per join in any of the queries which you execute. At that time, the number of concurrent threads reached 1980. Assuming that 30% of these concurrent connections accessed two tables and the rest accessed a single table, the cache size would reach (1980*30%*2+1980*70%*1)=2574 3. QPS was relatively stable before and after the release. From the perspective of external requests, there was no sudden increase in connection requests. However, after the release, threads_running rose to a high of nearly 2000 and continued. It is speculated that the problem was triggered by a certain SQL statement issued. 4. Check the processlist information captured at that time. There is a SQL statement with high concurrent access, which queries 8 physical tables. The SQL sample is as follows: <code>select id,name,email from table1 left join table2<br>union all<br>select id,name,email from table3 left join table4<br>union all<br>select id,name,email from table5 left join table6<br>union all<br>select id,name,email from table7 left join table8<br>where id in ('aaa');</code> 5. Create the same 8 tables in the test environment, clear the table cache, and compare the results before and after a single session executes SQL. The value of Open_tables will increase by 8. If there is high concurrency, the value of Open_tables will increase significantly. Problem Reproduction In the test environment, a high-concurrency access scenario was simulated and 1,000 threads were used to execute the above SQL statements simultaneously. This reproduced a similar phenomenon to the production environment. The number of Open_tables quickly reached 3,800, and a large number of processes were in the Opening tables and Closing tables states. Optimization plan 1. After locating the cause of the problem, we communicated with our development colleagues and suggested optimizing the SQL, reducing the number of single-statement SQL query tables, or significantly reducing the concurrent access frequency of the SQL. However, before the development colleagues had time to optimize, the failure occurred again in the production environment. When the DBA was troubleshooting, he increased table_open_cache from 2000 to 4000. The CPU usage rate increased, but the effect was not obvious. The problem of waiting for opening tables still existed. 2. Analyze the pstack information captured during the failure and aggregate it with pt-pmp. You can see that a large number of threads are waiting for mutex resources when opening_table: #0 0x0000003f0900e334 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003f0900960e in _L_lock_995 () from /lib64/libpthread.so.0 #2 0x0000003f09009576 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x000000000069ce98 in open_table(THD*, TABLE_LIST*, Open_table_context*) () #4 0x000000000069f2ba in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) () #5 0x000000000069f3df in open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int) () #6 0x00000000006de821 in execute_sqlcom_select(THD*, TABLE_LIST*) () #7 0x00000000006e13cf in mysql_execute_command(THD*) () #8 0x00000000006e4d8f in mysql_parse(THD*, char*, unsigned int, Parser_state*) () #9 0x00000000006e62cb in dispatch_command(enum_server_command, THD*, char*, unsigned int) () #10 0x00000000006b304f in do_handle_one_connection(THD*) () #11 0x00000000006b3177 in handle_one_connection () #12 0x0000000000afe5ca in pfs_spawn_thread () #13 0x0000003f09007aa1 in start_thread () from /lib64/libpthread.so.0 #14 0x0000003f088e893d in clone () from /lib64/libc.so.6 At this time, the mutex conflict in table_cache_manager is very serious. Since the default value of the table_open_cache_instances parameter in MySQL 5.6.21 is 1, increasing the table_open_cache_instances parameter and increasing the table cache partitions should alleviate the contention. 3. In the test environment, we adjusted the two parameters table_open_cache_instances=32 and table_open_cache=6000, and executed the problematic SQL concurrently with 1000 threads. This time, the threads waiting for Opening tables and Closing tables disappeared, and MySQL's QPS also increased from 12000 to 55000. Source code analysis Check the code for the logic of table_open_cache: 1. The Table_cache::add_used_table function is as follows. When the table opened by the new connection does not exist in the table cache, the opened table is added to the used tables list: bool Table_cache::add_used_table(THD *thd, TABLE *table) { Table_cache_element *el; assert_owner(); DBUG_ASSERT(table->in_use == thd); /* Try to get Table_cache_element representing this table in the cache from array in the TABLE_SHARE. */ el = table->s->cache_element[table_cache_manager.cache_index(this)]; if (!el) { /* If TABLE_SHARE doesn't have a pointer to the element representing table In this cache, the element for the table must be absent from the table cache. Allocate new Table_cache_element object and add it to the cache and array in TABLE_SHARE. */ DBUG_ASSERT(! my_hash_search(&m_cache, (uchar*)table->s->table_cache_key.str, table->s->table_cache_key.length)); if (!(el= new Table_cache_element(table->s))) return true; if (my_hash_insert(&m_cache, (uchar*)el)) { delete el; return true; } table->s->cache_element[table_cache_manager.cache_index(this)] = el; } /* Add table to the used tables list */ el->used_tables.push_front(table); m_table_count++; free_unused_tables_if_necessary(thd); return false; } 2. Each time add_used_table is called, the Table_cache::free_unused_tables_if_necessary function is called. When m_table_count > table_cache_size_per_instance &&m_unused_tables is satisfied, remove_table is executed to clear the redundant cache in the m_unused_tables list. Among them, table_cache_size_per_instance = table_cache_size / table_cache_instances. The default configuration of MySQL5.6 is 2000/1=2000. When the value of m_table_count is greater than 2000 and m_unused_tables is not empty, remove_table is executed to clear the table cache in m_unused_tables. In this way, m_table_count, which is the value of Open_tables, will normally remain around 2000. void Table_cache::free_unused_tables_if_necessary(THD *thd) { /* We have too many TABLE instances around let us try to get rid of them. Note that we might need to free more than one TABLE object, and thus need the below loop, in case when table_cache_size is changed dynamically, at server run time. */ if (m_table_count > table_cache_size_per_instance && m_unused_tables) { mysql_mutex_lock(&LOCK_open); while (m_table_count > table_cache_size_per_instance && m_unused_tables) { TABLE *table_to_free= m_unused_tables; remove_table(table_to_free); intern_close_table(table_to_free); thd->status_var.table_open_cache_overflows++; } mysql_mutex_unlock(&LOCK_open); } } 3. Increase table_cache_instances to 32. When Open_tables exceeds (2000/32=62), the condition will be met, which will accelerate the cleanup of m_unused_tables in the above logic, further reduce the number of table caches, and cause Table_open_cache_overflows to increase. 4. When table_open_cache_instances increases from 1 to 32, one LOCK_open lock is distributed to 32 m_lock mutexes, greatly reducing lock contention. /** Acquire lock on table cache instance. */ void lock() { mysql_mutex_lock(&m_lock); } /** Release lock on table cache instance. */ void unlock() { mysql_mutex_unlock(&m_lock); } Solving the problem We also took the following optimization measures in our production environment to solve the problem: Summarize When the Opening tables waiting problem occurs, 2. Set an appropriate table cache and increase the values of table_open_cache_instances and table_open_cache parameters. You may also be interested in:
|
<<: A brief analysis of how to access Windows 10 host folders from CentOS 8 virtual machines
>>: Vue+Vant implements the top search bar
Linux change hostname command 1. If you only need...
What is an HTML file? HTML stands for Hyper Text M...
The default template method is similar to vue2, u...
Table of contents 1. Principle of animation funct...
View installation and uninstallation # View rpm -...
Whether MySQL needs to commit when performing ope...
1. Download https://dev.mysql.com/downloads/mysql...
Preface When we were writing the web page style a...
1. Download the virtual machine version 15.5.1 I ...
Table of contents When to use Structural branches...
During system maintenance, you may need to check ...
I. Overview When writing HTML templates, spaces a...
This article shares the specific code of Vue impo...
This article shares the specific code of Vue.js t...
Mouse effects require the use of setTimeout to ge...