A brief introduction to MySQL database optimization techniques

A mature database architecture is not designed with high availability, high scalability, and similar features from day one; the infrastructure is improved gradually as the user base grows. This article discusses the problems a MySQL database faces over an application's life cycle and the corresponding optimization solutions. Leaving the front-end application aside, the work can be roughly divided into the following five stages:

Phase 1: Database table design

After a project is approved, the development department builds it according to the product department's requirements.
Development engineers design the table structure at the start of the project. For a database, schema design is critically important: a poor design directly slows user access to the site and degrades the user experience. Many specific factors contribute, such as slow queries (inefficient SQL statements), improper indexing, and database congestion (locks). Of course, the testing department will test the product and find bugs.
Because development engineers focus on shipping features, they rarely consider whether the schema design is reasonable in the early stages; the priority is to complete and deliver functionality as quickly as possible. Once the project is live and carrying real traffic, the hidden problems surface, and by then they are not so easy to fix!

Phase 2: Database deployment

It's time for the operations engineer to come on board and launch the project.
In the early stage of a project, traffic is usually very small. At this stage, a single server running both the web application and the database is sufficient to handle around 1,000 QPS (queries per second). To guard against a single point of failure, high availability should be provided: MySQL master-slave replication plus Keepalived gives dual-machine hot standby. Mainstream HA software includes Keepalived (recommended) and Heartbeat.

Phase 3: Database performance optimization

Deployed on an ordinary x86 server without any tuning, MySQL can theoretically handle about 1,500 QPS; after optimization this may rise to about 2,000 QPS. Without tuning, once traffic approaches roughly 1,500 concurrent connections the database may start responding slowly even though hardware resources are still relatively plentiful. That is when performance optimization becomes necessary. So how can we get the most out of the database? The work falls into hardware configuration, database configuration, and architecture, specifically divided as follows:

3.1 Hardware Configuration

If conditions permit, use SSD solid-state drives instead of SAS mechanical drives, and use RAID 10, which has better read and write performance than RAID 1 or RAID 5. After all, database pressure comes mainly from disk I/O.
The Linux kernel carves a cache area (page cache and buffers) out of physical memory to hold hot data. Through the file system's delayed-write mechanism, dirty data is synchronized to disk only when conditions are met (for example, the cache reaches a certain percentage or the sync command is run). In other words, the more physical memory, the larger the cache and the more data it can hold. Of course, a server failure loses whatever cached data has not yet been flushed. It is recommended to keep at least 50% of physical memory free.

3.2 Database Configuration Optimization

There are two widely used storage engines in MySQL. One is MyISAM, which does not support transactions, has fast read performance, and uses table-level locks. The other is InnoDB, which supports transactions (ACID properties), is designed for processing large volumes of data, and uses row-level locks.
Table lock: low overhead, large locking granularity, high probability of deadlock, and relatively low concurrency.
Row lock: high overhead, small locking granularity, low probability of deadlock, and high relative concurrency.
Why do table locks and row locks exist? Mainly to guarantee data integrity. For example, if one user is modifying a table, other users who want to modify it must wait until the first user finishes. That is the function of table locks and row locks. Without them, multiple users writing to the same table concurrently would certainly produce conflicting or corrupted data.
Based on these aspects, using the InnoDB storage engine is the best choice and is also the default storage engine for MySQL 5.5+. Each storage engine has many related operating parameters. The following lists the parameters that may affect database performance.
Default values of common parameters:

max_connections = 151
# Maximum number of simultaneous connections. It is recommended to keep actual connections at about 80% of this upper limit.
sort_buffer_size = 2M
# Sort buffer size, used only for ORDER BY and GROUP BY; recommended to increase to 16M.
open_files_limit = 1024
# Limit on the number of open files. If the value shown by show global status like 'open_files' reaches or exceeds open_files_limit, programs will be unable to connect to the database or will hang.
MyISAM parameter default values:

key_buffer_size = 16M
# Index buffer size; generally set to 30-40% of physical memory.
read_buffer_size = 128K
# Read buffer size; recommended 16M or 32M.
query_cache_type = ON
# Enable the query cache.
query_cache_limit = 1M
# Query cache limit: only result sets under 1M are cached, preventing a large result from overwriting the whole cache pool.
query_cache_size = 16M
# Query cache size, used to cache SELECT results. The next identical SELECT returns its result directly from the cache pool. Increase this value in multiples as needed.

InnoDB parameter default values:

innodb_buffer_pool_size = 128M
# Index and data buffer pool size; recommended around 70% of physical memory.
innodb_buffer_pool_instances = 1
# Number of buffer pool instances; recommended 4 or 8.
innodb_flush_log_at_trx_commit = 1
# Key parameter. 0 means the log is written and flushed to disk roughly once per second; a crash loses about one second of transactions. 1 means the log is written and flushed to disk at every transaction commit; this is safe but has large I/O overhead and is inefficient, since each commit waits for the flush. 2 means the log is written to the OS cache at each commit and flushed to disk once per second; this is very efficient, but a server failure can lose about one second of transactions. For workloads without very strict data-safety requirements, 2 is recommended: performance is high and the improvement is obvious.
innodb_file_per_table = OFF
# Whether each table gets its own tablespace. The default in 5.7+ is ON. In shared-tablespace mode the ibdata file keeps growing, which hurts I/O performance to some degree. Independent tablespaces are recommended: each table's data and indexes live in their own tablespace file, and a single table can be moved between databases.
innodb_log_buffer_size = 8M
# Log buffer size. Since the log is flushed at most once per second, it usually does not need to exceed 16M.
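Putting the parameters above together, a my.cnf fragment might look like the sketch below. The values are purely illustrative, assuming a dedicated MySQL server with 16G of RAM; they are not universal recommendations, so benchmark against your own workload before adopting them:

```ini
[mysqld]
# Illustrative values for a hypothetical dedicated 16G-RAM server
max_connections                = 1000
sort_buffer_size               = 16M
open_files_limit               = 65535
innodb_buffer_pool_size        = 12G    # ~70% of physical memory
innodb_buffer_pool_instances   = 8
innodb_flush_log_at_trx_commit = 2      # trade ~1s of transactions for much lower I/O
innodb_file_per_table          = ON
innodb_log_buffer_size         = 16M
```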

3.3 System kernel parameter optimization

Most MySQL servers are deployed on Linux systems, so some parameters of the operating system will also affect MySQL performance. The following is an appropriate optimization of Linux kernel parameters.

net.ipv4.tcp_fin_timeout = 30
# TIME_WAIT timeout; the default is 60s.
net.ipv4.tcp_tw_reuse = 1
# 1 enables reuse, allowing TIME_WAIT sockets to be reused for new TCP connections; 0 disables it.
net.ipv4.tcp_tw_recycle = 1
# 1 enables fast recycling of TIME_WAIT sockets; 0 disables it. (Note: this option causes problems behind NAT and was removed in Linux kernel 4.12+.)
net.ipv4.tcp_max_tw_buckets = 4096
# Maximum number of TIME_WAIT sockets the system keeps. Beyond this, the system clears some TIME_WAIT sockets and prints a warning.
net.ipv4.tcp_max_syn_backlog = 4096
# Maximum length of the SYN queue; a longer queue can hold more connections waiting to complete the handshake.

On Linux, if a process opens more file handles than the system default of 1024, a "too many open files" error appears, so the open-file limit needs to be raised.
To take effect permanently (applies to new login sessions):
# vi /etc/security/limits.conf 
* soft nofile 65535
* hard nofile 65535
Effective immediately for the current user:
# ulimit -SHn 65535

Phase 4: Database Architecture Expansion

As business volume grows, a single database server can no longer meet business needs, and it is time to scale out by adding servers. The main ideas are to spread the load of the single database across machines, relieve the disk I/O bottleneck, and keep hot data in a cache to reduce disk I/O access frequency.

4.1 Increase cache

Add a cache system in front of the database to hold hot data in memory. If the requested data is in the cache, MySQL is not queried at all, reducing database load. Caches come in two types: local and distributed. A local cache stores data in the local server's memory or files; a distributed cache can hold massive amounts of data and scales well. The mainstream distributed cache systems are memcached and redis. Memcached is stable, keeps data in memory, and is very fast, with a theoretical QPS of about 80,000. If you need data persistence, choose redis; its performance is no lower than memcached's.
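The cache-aside flow described above can be sketched in a few lines of Python. Here a plain dict stands in for memcached/redis, and query_db is a hypothetical placeholder for a real MySQL query; real clients (e.g. redis-py plus a MySQL driver) follow the same shape:

```python
# Cache-aside: check the cache first, query MySQL only on a miss.
cache = {}                  # stand-in for memcached/redis
db_hits = {"count": 0}      # counts how often MySQL is actually queried

def query_db(user_id):
    """Hypothetical stand-in for a real MySQL query."""
    db_hits["count"] += 1
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id):
    row = cache.get(user_id)       # 1. try the cache
    if row is None:                # 2. miss: fall back to MySQL
        row = query_db(user_id)
        cache[user_id] = row       # 3. populate the cache for next time
    return row

get_user(42)
get_user(42)   # second call is served from the cache; MySQL is not hit again
```

The second lookup never reaches MySQL, which is exactly how caching hot data stops it from hammering the database.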

4.2 Master-slave replication and read-write separation

In a production environment, business systems usually read more and write less, so a master-slave architecture can be deployed. The master database is responsible for write operations and performs dual-machine hot standby, and multiple slave databases perform load balancing and are responsible for read operations. Mainstream load balancers: LVS, HAProxy, Nginx.
How is read-write separation achieved? Most companies implement it at the code level, which is highly efficient. The alternative is a proxy program, which is less common in enterprises because the extra middleware hop adds overhead. Mainstream middleware proxies include MyCat, Atlas, etc.
In this MySQL master-slave replication topology architecture, the load of a single machine is dispersed, greatly improving the database concurrency capability. If one slave server can handle 1500 QPS, then three can handle 4500 QPS, and it is easy to scale horizontally.
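Code-level read-write separation usually comes down to a small router: writes go to the master, reads are load-balanced round-robin across the slaves. A minimal sketch, where the DSN strings are placeholders rather than real hosts:

```python
import itertools

MASTER = "mysql://master:3306"
SLAVES = ["mysql://slave1:3306", "mysql://slave2:3306", "mysql://slave3:3306"]
_rr = itertools.cycle(SLAVES)   # simple round-robin over the read replicas

def route(sql: str) -> str:
    """Send SELECT statements to a slave, everything else to the master."""
    if sql.lstrip().lower().startswith("select"):
        return next(_rr)
    return MASTER

route("SELECT * FROM users")                 # one of the slaves
route("UPDATE users SET name='x' WHERE id=1")  # always the master
```

Production routers also have to handle transactions (pin them to the master) and replication lag, which is why ORMs and proxies like MyCat exist; this only shows the core dispatch idea.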
Sometimes, for write-heavy applications, a single machine's write performance cannot meet business requirements. You can set up two-way (dual-master) replication, but note one problem: if both masters accept reads and writes, data inconsistency can occur, because the application may update both databases at the same time and the concurrent updates will conflict or diverge.
You can keep each table's ID field auto-incrementing and unique by setting auto_increment_increment and auto_increment_offset, or write an algorithm that generates globally unique IDs.
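The auto_increment trick works as follows: with two masters, set auto_increment_increment=2 on both, auto_increment_offset=1 on master A and 2 on master B, so A generates 1, 3, 5, ... and B generates 2, 4, 6, ... and the two sequences can never collide. The arithmetic in a quick sketch:

```python
def id_sequence(offset: int, increment: int, n: int) -> list:
    """First n IDs a master generates with the given auto_increment settings."""
    return [offset + i * increment for i in range(n)]

a = id_sequence(1, 2, 5)    # master A: [1, 3, 5, 7, 9]
b = id_sequence(2, 2, 5)    # master B: [2, 4, 6, 8, 10]
assert not set(a) & set(b)  # the two ID sequences never overlap
```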
You can also consider MGR (MySQL Group Replication), the multi-master cluster the MySQL team has released in recent years.

4.3 Database Sharding

Database sharding separates related tables into different databases according to business domain, such as web, bbs, and blog databases. If business volume is large, each separated database can itself use a master-slave replication architecture, further reducing the pressure on any single database.

4.4 Table Sharding

Data volume keeps growing sharply: a table with millions of rows makes queries and inserts take too long. How do we relieve the pressure on a single table? Split it into multiple smaller tables to reduce per-table pressure and improve processing efficiency. This method is called table sharding.
Table sharding is fairly complicated: it requires modifying the SQL in the application code and manually creating the other tables. You can also use the MERGE storage engine (MyISAM only), which is simpler: the program operates on a master table that stores no data itself, only the relationships to the sub-tables and the rules for routing updates. The master table distributes work to the underlying small tables according to the query, improving concurrency and disk I/O performance.
Table sharding comes in two forms, vertical and horizontal:
Vertical splitting: Split the original table with many fields into multiple tables to solve the problem of table width. You can put infrequently used fields in a separate table, put large fields in a separate table, or put closely related fields in a table.
Horizontal splitting: Split the original table into multiple tables, each with the same structure, to solve the problem of large amount of data in a single table.
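Horizontal splitting needs a routing rule in the application that maps a row's key to its shard table. A common choice is modulo hashing on the primary key; the table names below are hypothetical:

```python
N_SHARDS = 4   # number of horizontal shards (user_0 .. user_3)

def shard_table(user_id: int, base: str = "user") -> str:
    """Map a user_id to the shard table that stores its row."""
    return f"{base}_{user_id % N_SHARDS}"

shard_table(12345)   # -> "user_1", since 12345 % 4 == 1
```

Note that plain modulo sharding requires rehashing the data when N_SHARDS changes; schemes like consistent hashing or range-based routing exist to soften that, at the cost of extra complexity.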

4.5 Partitioning

Partitioning divides a table's data into multiple blocks according to rules on its columns (by range, list, hash, etc.). These blocks can live on one disk or on different disks. After partitioning, it still looks like one table on the surface, but the data is spread across multiple locations, so multiple disks can serve different requests at the same time, improving disk I/O read and write performance.
Note: Adding cache, sub-library, sub-table and partition is mainly done by programmers or DBAs.

Phase 5: Database Maintenance

Database maintenance is the job of database engineers or operation and maintenance engineers, including system monitoring, performance analysis, performance tuning, database backup and recovery and other major tasks.

5.1 Key indicators of performance status

Technical terms: QPS (Queries Per Second) and TPS (Transactions Per Second)
Check the running state with show status; it returns 300+ status records, several of which help us calculate QPS and TPS, as follows:

Uptime: seconds the server has been running
Questions: number of queries sent to the database
Com_select: number of SELECT operations actually executed
Com_insert: number of inserts
Com_delete: number of deletes
Com_update: number of updates
Com_commit: number of commits
Com_rollback: number of rollbacks

So the calculation goes as follows. Compute QPS from Questions:

mysql> show global status like 'Questions';
mysql> show global status like 'Uptime';
QPS = Questions / Uptime

Calculate TPS based on Com_commit and Com_rollback:

mysql> show global status like 'Com_commit';
mysql> show global status like 'Com_rollback';
mysql> show global status like 'Uptime';
TPS = (Com_commit + Com_rollback) / Uptime
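Both formulas are simple divisions over the counters returned by show global status. A sketch with made-up counter values (the numbers are illustrative, not measurements):

```python
def qps(questions: int, uptime: int) -> float:
    """Average queries per second since server start."""
    return questions / uptime

def tps(com_commit: int, com_rollback: int, uptime: int) -> float:
    """Average transactions per second since server start."""
    return (com_commit + com_rollback) / uptime

# Hypothetical counters after one day (86400 s) of uptime:
qps(864000, 86400)        # -> 10.0
tps(40000, 3200, 86400)   # -> 0.5
```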

Another way to calculate:

Calculate QPS based on Com_select, Com_insert, Com_delete, and Com_update:  
mysql> show global status where Variable_name in('com_select','com_insert','com_delete','com_update');
Wait 1 second and run it again to get the interval difference: the per-second QPS is each variable's value from the second sample minus its value from the first.

TPS calculation method:

mysql> show global status where Variable_name in('com_insert','com_delete','com_update');
When calculating TPS this way, query operations are excluded; only the three counters for insert, delete, and update are needed.
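The interval method samples the counters twice and divides the difference by the elapsed time; unlike the cumulative formula, it reflects the current load rather than the average since startup. A sketch, where the dicts stand in for two consecutive SHOW GLOBAL STATUS snapshots with made-up values:

```python
def per_second(sample1: dict, sample2: dict, seconds: float = 1.0) -> dict:
    """Per-second rate of each counter over the sampling interval."""
    return {k: (sample2[k] - sample1[k]) / seconds for k in sample1}

# Two hypothetical snapshots taken 1 second apart:
t0 = {"Com_insert": 1000, "Com_update": 500, "Com_delete": 20}
t1 = {"Com_insert": 1090, "Com_update": 540, "Com_delete": 22}

rates = per_second(t0, t1)     # {'Com_insert': 90.0, 'Com_update': 40.0, 'Com_delete': 2.0}
tps_now = sum(rates.values())  # 132.0 write operations in the last second
```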

After community testing of both calculation methods, the conclusion is: when the database contains mostly MyISAM tables, the Questions-based calculation is more accurate; when it contains mostly InnoDB tables, the Com_*-based calculation is more accurate.

5.2 Enable slow query log

MySQL enables slow query logs to analyze which SQL statements are slow. Dynamic enabling is supported:

mysql> set global slow_query_log=ON;
# Enable the slow query log
mysql> set global slow_query_log_file='/var/log/mysql/mysql-slow.log';
# Specify the slow query log file location
mysql> set global log_queries_not_using_indexes=ON;
# Log queries that do not use indexes
mysql> set global long_query_time=1;
# Only log queries that take more than 1 second to process

To analyze the slow query log, you can use the mysqldumpslow tool that ships with MySQL; its analysis output is fairly basic.
mysqldumpslow -t 3 /var/log/mysql/mysql-slow.log
# Show the three slowest queries
You can also use Percona's pt-query-digest tool, which offers comprehensive log analysis and can analyze the slow log, binlog, and general log.
Analyze the slow query log: pt-query-digest /var/log/mysql/mysql-slow.log
Analyze binlog log: mysqlbinlog mysql-bin.000001 >mysql-bin.000001.sql 
pt-query-digest --type=binlog mysql-bin.000001.sql 
Analyze general logs: pt-query-digest --type=genlog localhost.log

5.3 Database backup

Backing up the database is the most basic task and also the most important one; otherwise the consequences are serious, you know! For a high-frequency backup strategy, choosing a stable, fast tool is crucial. For databases under about 2G, the official logical backup tool mysqldump is recommended; above 2G, Percona's physical backup tool xtrabackup is recommended, otherwise backups crawl along at a snail's pace. Both tools support hot backup of InnoDB tables without affecting business reads and writes.

5.4 Database Repair

Sometimes a sudden power loss or abnormal shutdown of the MySQL server corrupts tables and makes their data unreadable. Two tools that ship with MySQL can repair them: myisamchk and mysqlcheck. myisamchk can only repair MyISAM tables and requires the database to be stopped; mysqlcheck can repair both MyISAM and InnoDB tables online.
Note: It is best to back up the database before repairing.

myisamchk common parameters:
 -f --force         Force repair, overwriting old temporary files; generally not used
 -r --recover       Recovery mode
 -q --quick         Quick recovery
 -a --analyze       Analyze the table
 -o --safe-recover  Old recovery mode; try this if -r cannot repair
 -F --fast          Only check tables that were not closed properly
Example: myisamchk -r -q *.MYI
Common parameters of mysqlcheck:
 -A --all-databases  Check all databases
 -r --repair         Repair tables
 -c --check          Check tables (default)
 -a --analyze        Analyze tables
 -o --optimize       Optimize tables
 -q --quick          Fastest check or repair
 -F --fast           Only check tables that were not closed properly
Example: mysqlcheck -r -q -uroot -p123456 weibo

5.5 MySQL Server Performance Analysis


Use vmstat to monitor overall system load. Focus on:
id: CPU idle time percentage. If idle stays below 60%, the CPU is already fairly busy.
wa: CPU time spent waiting for disk I/O. A value consistently greater than 5 indicates heavy disk read/write traffic.


KB_read/s, KB_wrtn/s (from iostat): data read and written per second, evaluated against the disk's maximum per-second read/write throughput.

r/s, w/s (from iostat -x): read and write requests per second, which can be understood as IOPS (I/O operations per second), one of the main indicators of disk performance.
await: average I/O response time in milliseconds. A value greater than 5 means the disk is responding slowly, beyond its capacity.
util: disk utilization percentage. An average below 60% is acceptable, but the disk is already fairly busy.

Summary

Because of inherent limits in the original design of relational databases, they struggle to cope with very large data sets. Hence the rise of NoSQL (non-relational) databases, which are distributed, high-performance, and highly reliable by design; they compensate for certain innate shortcomings of relational databases and are well suited to storing unstructured data. Mainstream NoSQL databases include MongoDB, HBase, and Cassandra.

Optimization at the database level alone only goes so far; ultimately you must choose the right database for the business scenario!
