Solution to the problem of MySQL thread in Opening tables

Solution to the problem of MySQL thread in Opening tables

Problem Description

Recently, there was a MySQL5.6.21 server. After the application was released, the concurrent threads Threads_running increased rapidly to about 2000. A large number of threads were waiting for Opening tables and Closing tables, and the related logical access on the application side timed out.

【Analysis process】

1. After the application is released at 16:10, Opened_tables continues to increase, as shown in the following figure:

Check the pt-stalk log file captured during the failure. At the time point 2019-01-18 16:29:37, the value of Open_tables was 3430, and the configuration value of table_open_cache was 2000.

When the Open_tables value is greater than the table_open_cache value, each time a new session opens a table, some of them cannot hit the table cache and have to be reopened. This reflects the phenomenon that a large number of threads are in the opening tables state.

2. The total number of tables in this instance plus the system database is 851, which is much smaller than 2000 in table_open_cache. Why does Open_tables reach 3430?

From the official documentation, we can get an explanation.

https://dev.mysql.com/doc/refman/5.6/en/table-cache.html

table_open_cache is related to max_connections. For example, for 200 concurrent running connections, specify a table cache size of at least 200 * N, where N is the maximum number of tables per join in any of the queries which you execute.

At that time, the number of concurrent threads reached 1980. Assuming that 30% of these concurrent connections accessed two tables and the rest accessed a single table, the cache size would reach (1980*30%*2+1980*70%*1)=2574

3. QPS was relatively stable before and after the release. From the perspective of external requests, there was no sudden increase in connection requests. However, after the release, threads_running rose to a high of nearly 2000 and continued. It is speculated that the problem was triggered by a certain SQL statement issued.

4. Check the processlist information captured at that time. There is a SQL statement with high concurrent access, which queries 8 physical tables. The SQL sample is as follows:

<code>select id,name,email from table1 left join table2<br>union all<br>select id,name,email from table3 left join table4<br>union all<br>select id,name,email from table5 left join table6<br>union all<br>select id,name,email from table7 left join table8<br>where id in ('aaa');</code>

5. Create the same 8 tables in the test environment, clear the table cache, and compare the results before and after a single session executes SQL. The value of Open_tables will increase by 8. If there is high concurrency, the value of Open_tables will increase significantly.

Problem Reproduction

In the test environment, a high-concurrency access scenario was simulated and 1,000 threads were used to execute the above SQL statements simultaneously. This reproduced a similar phenomenon to the production environment. The number of Open_tables quickly reached 3,800, and a large number of processes were in the Opening tables and Closing tables states.

Optimization plan

1. After locating the cause of the problem, we communicated with our development colleagues and suggested optimizing the SQL, reducing the number of single-statement SQL query tables, or significantly reducing the concurrent access frequency of the SQL.

However, before the development colleagues had time to optimize, the failure occurred again in the production environment. When the DBA was troubleshooting, he increased table_open_cache from 2000 to 4000. The CPU usage rate increased, but the effect was not obvious. The problem of waiting for opening tables still existed.

2. Analyze the pstack information captured during the failure and aggregate it with pt-pmp. You can see that a large number of threads are waiting for mutex resources when opening_table:

#0 0x0000003f0900e334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003f0900960e in _L_lock_995 () from /lib64/libpthread.so.0
#2 0x0000003f09009576 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000069ce98 in open_table(THD*, TABLE_LIST*, Open_table_context*) ()
#4 0x000000000069f2ba in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) ()
#5 0x000000000069f3df in open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int) ()
#6 0x00000000006de821 in execute_sqlcom_select(THD*, TABLE_LIST*) ()
#7 0x00000000006e13cf in mysql_execute_command(THD*) ()
#8 0x00000000006e4d8f in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
#9 0x00000000006e62cb in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
#10 0x00000000006b304f in do_handle_one_connection(THD*) ()
#11 0x00000000006b3177 in handle_one_connection ()
#12 0x0000000000afe5ca in pfs_spawn_thread ()
#13 0x0000003f09007aa1 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003f088e893d in clone () from /lib64/libc.so.6

At this time, the mutex conflict in table_cache_manager is very serious.

Since the default value of the table_open_cache_instances parameter in MySQL 5.6.21 is 1, increasing the table_open_cache_instances parameter and increasing the table cache partitions should alleviate the contention.

3. In the test environment, we adjusted the two parameters table_open_cache_instances=32 and table_open_cache=6000, and executed the problematic SQL concurrently with 1000 threads. This time, the threads waiting for Opening tables and Closing tables disappeared, and MySQL's QPS also increased from 12000 to 55000.
In the same situation, only table_open_cache=6000 was adjusted, and the number of processes waiting for Opening tables dropped from 861 to 203. The problem was alleviated. More than 600 processes have changed from waiting for Opening tables to running status, and QPS increased to about 40,000. However, the problem cannot be cured.

Source code analysis

Check the code for the logic of table_open_cache:

1. The Table_cache::add_used_table function is as follows. When the table opened by the new connection does not exist in the table cache, the opened table is added to the used tables list:

bool Table_cache::add_used_table(THD *thd, TABLE *table)
{
 Table_cache_element *el;
 
 assert_owner();
 
 DBUG_ASSERT(table->in_use == thd);
 
 /*
 Try to get Table_cache_element representing this table in the cache
 from array in the TABLE_SHARE.
 */
 el = table->s->cache_element[table_cache_manager.cache_index(this)];
 
 if (!el)
 {
 /*
  If TABLE_SHARE doesn't have a pointer to the element representing table
  In this cache, the element for the table must be absent from the table
  cache.
 
  Allocate new Table_cache_element object and add it to the cache
  and array in TABLE_SHARE.
 */
 DBUG_ASSERT(! my_hash_search(&m_cache,
         (uchar*)table->s->table_cache_key.str,
         table->s->table_cache_key.length));
 
 if (!(el= new Table_cache_element(table->s)))
  return true;
 
 if (my_hash_insert(&m_cache, (uchar*)el))
 {
  delete el;
  return true;
 }
 
 table->s->cache_element[table_cache_manager.cache_index(this)] = el;
 }
 
 /* Add table to the used tables list */ 
 el->used_tables.push_front(table);
 
 m_table_count++; free_unused_tables_if_necessary(thd);
 
 return false;
}

2. Each time add_used_table is called, the Table_cache::free_unused_tables_if_necessary function is called. When m_table_count > table_cache_size_per_instance &&m_unused_tables is satisfied, remove_table is executed to clear the redundant cache in the m_unused_tables list. Among them, table_cache_size_per_instance = table_cache_size / table_cache_instances. The default configuration of MySQL5.6 is 2000/1=2000. When the value of m_table_count is greater than 2000 and m_unused_tables is not empty, remove_table is executed to clear the table cache in m_unused_tables. In this way, m_table_count, which is the value of Open_tables, will normally remain around 2000.

void Table_cache::free_unused_tables_if_necessary(THD *thd)
{
 /*
 We have too many TABLE instances around let us try to get rid of them.
 
 Note that we might need to free more than one TABLE object, and thus
 need the below loop, in case when table_cache_size is changed dynamically,
 at server run time.
 */
 if (m_table_count > table_cache_size_per_instance && m_unused_tables)
 {
 mysql_mutex_lock(&LOCK_open);
 while (m_table_count > table_cache_size_per_instance &&
   m_unused_tables)
 {
  TABLE *table_to_free= m_unused_tables;  
  remove_table(table_to_free);
  intern_close_table(table_to_free);
  thd->status_var.table_open_cache_overflows++;
 }
 mysql_mutex_unlock(&LOCK_open);
 }
}

3. Increase table_cache_instances to 32. When Open_tables exceeds (2000/32=62), the condition will be met, which will accelerate the cleanup of m_unused_tables in the above logic, further reduce the number of table caches, and cause Table_open_cache_overflows to increase.

4. When table_open_cache_instances increases from 1 to 32, one LOCK_open lock is distributed to 32 m_lock mutexes, greatly reducing lock contention.

/** Acquire lock on table cache instance. */
 void lock() { mysql_mutex_lock(&m_lock); }
 /** Release lock on table cache instance. */
 void unlock() { mysql_mutex_unlock(&m_lock); }

Solving the problem

We also took the following optimization measures in our production environment to solve the problem:
1. Separate read and write, increase read nodes, and disperse the pressure on the master database;
2. Adjust table_open_cache_instances=16;
3. Adjust table_open_cache=6000;

Summarize

When the Opening tables waiting problem occurs,
1. It is recommended to find out the SQL statements that frequently open tables, optimize the SQL, reduce the number of tables queried by a single SQL statement, or significantly reduce the concurrent access frequency of the SQL.

2. Set an appropriate table cache and increase the values ​​of table_open_cache_instances and table_open_cache parameters.

You may also be interested in:
  • Mybatis learning road mysql batch add data method
  • Detailed explanation of how to implement secondary cache with MySQL and Redis
  • In-depth explanation of MySQL common index and unique index
  • How to use MySQL's geometry type to handle longitude and latitude distance problems
  • How to view mysql binlog (binary log)
  • Docker creates MySQL explanation
  • PHP date() format MySQL insert datetime method
  • How to solve SQL injection problem with pymysql
  • Steps to export the fields and related attributes of MySQL tables
  • MySQL limit performance analysis and optimization

<<:  A brief analysis of how to access Windows 10 host folders from CentOS 8 virtual machines

>>:  Vue+Vant implements the top search bar

Recommend

Detailed explanation of Linux host name modification command

Linux change hostname command 1. If you only need...

HTML basics HTML structure

What is an HTML file? HTML stands for Hyper Text M...

Example code comparing different syntax formats of vue3

The default template method is similar to vue2, u...

Detailed explanation of JavaScript animation function encapsulation

Table of contents 1. Principle of animation funct...

Detailed explanation of rpm installation in mysql

View installation and uninstallation # View rpm -...

Does MySql need to commit?

Whether MySQL needs to commit when performing ope...

Basic installation process of mysql5.7.19 under winx64 (details)

1. Download https://dev.mysql.com/downloads/mysql...

Tutorial on how to modify element.style inline styles

Preface When we were writing the web page style a...

Graphic tutorial on installing Mac system in virtual machine under win10

1. Download the virtual machine version 15.5.1 I ...

Implementation example of JS native double-column shuttle selection box

Table of contents When to use Structural branches...

Linux system to view CPU, machine model, memory and other information

During system maintenance, you may need to check ...

Vue imports Echarts to realize line scatter chart

This article shares the specific code of Vue impo...

Vue.js implements music player

This article shares the specific code of Vue.js t...

JavaScript to achieve mouse tailing effect

Mouse effects require the use of setTimeout to ge...