How MySQL supports billions of requests

1 Master-slave read-write separation

Most Internet businesses read far more than they write, so the first priority is making the DB support a higher number of queries. That starts with separating read traffic from write traffic, so that read capacity can be scaled out independently: master-slave read-write separation.

If a sudden spike in front-end traffic overloads the slave databases, the DBA will first add more slave capacity, so that read traffic spreads across more slaves and the load on each one drops. Developers, meanwhile, will do their best to intercept traffic before it ever reaches the DB layer.

Cache vs. MySQL read-write separation: introducing a cache adds development and maintenance complexity. Cache-data consistency, cache penetration, and cache avalanche all have to be handled, and one more type of component must be operated. The recommendation is therefore to use read-write separation first, and bring in a cache only when that is no longer enough.

1.1 Core idea

Master-slave read-write separation generally means copying the data of one DB to one or more other DB servers:

  • The original DB is the master database, responsible for data writes
  • The target DBs are the slave databases, responsible for data queries

Therefore, the keys to master-slave read-write separation are:

  • Copying the data: master-slave replication
  • Shielding the application from the change in how the DB is accessed, so that developers feel they are still using a single DB

2 Master-Slave Replication

MySQL master-slave replication relies on the binlog, which records every change made to MySQL and persists it on disk, in binary form, in binary log files.

Master-slave replication transfers the binlog data from the master database to the slave databases, usually asynchronously: operations on the master do not wait for binlog synchronization to complete.

2.1 Master-slave replication process

  • When a slave connects to the master, it creates an I/O thread that requests the master's new binlog events and writes what it receives to a relay log file; the master, in turn, creates a log dump thread to send the binlog to the slave.
  • The slave also creates an SQL thread, which reads the relay log and replays it on the slave to keep master and slave consistent.

Using an independent log dump thread keeps replication asynchronous, so it does not slow down the master's update path. On the slave side, received events are first written to a relay log rather than applied directly to storage: applying them synchronously to the slave's actual storage would be slower and would ultimately lengthen the delay between slave and master.

(Diagram: the master-slave asynchronous replication process)

For performance, the master's write path does not wait for master-slave synchronization to finish before returning. In extreme cases, e.g. if the disk is damaged or the machine loses power before the master's binlog is flushed to disk, binlog events are lost and master and slave become inconsistent. But the probability is very low and usually tolerable.

If the master goes down and binlog events were lost, the resulting master-slave inconsistency can only be repaired manually.
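The replication pipeline described above (master binlog → slave I/O thread → relay log → SQL thread replay) can be sketched as a toy model. This is illustrative only, not the real MySQL protocol; all names are made up:

```python
# Toy model of asynchronous master-slave replication: the master appends
# change events to its binlog and returns immediately; the slave's I/O
# thread copies new events into a relay log, and the SQL thread replays
# them against the slave's storage.

master_db, slave_db = {}, {}
binlog, relay_log = [], []

def master_write(key, value):
    master_db[key] = value          # apply to master storage
    binlog.append((key, value))     # record the change; do NOT wait for slaves

def io_thread_fetch():
    # Slave I/O thread: pull binlog events it has not seen into the relay log.
    relay_log.extend(binlog[len(relay_log):])

def sql_thread_replay():
    # Slave SQL thread: replay relay-log events into slave storage.
    for key, value in relay_log:
        slave_db[key] = value

master_write("user:1", "Alice")
print(slave_db)        # {} — the master already returned; this gap is the lag
io_thread_fetch()
sql_thread_replay()
print(slave_db)        # {'user:1': 'Alice'} — the slave has caught up
```

The gap between the first and second `print` is exactly the master-slave delay window the rest of this article deals with.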

After master-slave replication, you can:

  • Writes go only to the master database
  • Reads go only to the slave databases

This way, even if a write request locks a table or record, read requests are unaffected. Under high concurrency, multiple slaves can be deployed to share the read traffic: one master and multiple slaves supporting high-concurrency reads.
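A minimal sketch of this routing, assuming a hypothetical application-side helper (real middlewares such as TDDL do far more, e.g. parsing SQL properly and handling transactions): writes go to the master, reads are spread round-robin across the slaves.

```python
import itertools

class ReadWriteRouter:
    """Hypothetical read-write splitter: not a real middleware API."""

    def __init__(self, master, slaves):
        self.master = master
        self._slave_cycle = itertools.cycle(slaves)  # round-robin over slaves

    def route(self, sql):
        # Naive check: statements that modify data go to the master.
        first_word = sql.lstrip().split()[0].upper()
        if first_word in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
            return self.master
        return next(self._slave_cycle)   # reads rotate across the slaves

router = ReadWriteRouter("master:3306", ["slave1:3306", "slave2:3306"])
print(router.route("UPDATE orders SET status = 1"))  # master:3306
print(router.route("SELECT * FROM orders"))          # slave1:3306
print(router.route("SELECT * FROM orders"))          # slave2:3306
```

Transactions and statements that must read their own writes would also need to go to the master, which is part of why real middleware exists.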

The slave database can also be used as a backup database to avoid data loss due to failure of the master database.

Can adding slaves without limit support ever higher concurrency?
No! The more slaves there are, the more I/O threads connect to the master, and the master must create a matching number of log dump threads to serve replication, which consumes significant master resources. The master is also limited by its network bandwidth. In practice, a master usually connects to only 3 to 5 slaves.

2.2 Side Effects of Master-Slave Replication

For example, the operation of posting to Moments involves two kinds of work:

  • A synchronous operation

Updating the DB, for instance

  • Asynchronous operations

For example, sending the Moments content to the review system

Therefore, after updating the master database, the Moments ID is written to MQ; the Consumer then fetches the Moments content from the database by ID and forwards it to the review system.
At this point, if there is master-slave delay, the Moments record cannot yet be found on the slave database, and the lookup fails!

(Diagram: the impact of master-slave delay on the business)

2.3 Avoiding Delays in Master-Slave Replication

What can be done? There are many solutions, but the core idea is the same: avoid querying the data back from the database at all. For the case above, the options include:

2.3.1 Data Redundancy

When sending to MQ, include not just the Moments ID but all the Moments information the Consumer needs, so that it never has to re-query the DB.

This solution is recommended because it is simple, but it makes individual messages larger, increasing the bandwidth and time needed to send them.
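The difference can be sketched with two message payloads (field names here are illustrative, not from any real schema):

```python
import json

# Before: the message carries only the ID, forcing the Consumer to query
# the DB — and a lagging slave may not have the row yet.
id_only_msg = {"moments_id": 42}

# After: the message redundantly carries everything the review system
# needs, so no DB read is required at all.
redundant_msg = {
    "moments_id": 42,
    "user_id": 1001,
    "content": "Had a great trip!",
    "image_urls": ["https://example.com/a.jpg"],
}

payload = json.dumps(redundant_msg)
print(len(payload), "bytes")   # the larger payload is the cost of redundancy
```

The trade-off named above is visible directly: the redundant message is several times the size of the ID-only one.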

2.3.2 Using Cache

Write the Moments data to the cache at the same time as writing it to the DB. The Consumer then reads the cache first when fetching Moments information, which also keeps the data consistent.

This solution suits scenarios that add new data. When updating existing data, updating the cache first can cause inconsistency. For example, with two threads updating the same record at the same time:

  • Thread A updates the cache data to 1
  • Another thread B updates the cache data to 2
  • Then thread B updates the DB data to 2
  • Thread A updates the DB data to 1

The final DB value (1) and Cache value (2) are inconsistent!
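The interleaving above can be replayed step by step (a plain single-threaded simulation of the stated ordering; the key name is made up):

```python
# Simulate the race: thread A's and thread B's steps interleave in exactly
# the order listed above, leaving cache and DB disagreeing at the end.

cache, db = {}, {}

steps = [
    (cache, 1),   # Thread A updates the cache data to 1
    (cache, 2),   # Thread B updates the cache data to 2
    (db,    2),   # Thread B updates the DB data to 2
    (db,    1),   # Thread A updates the DB data to 1
]
for store, value in steps:
    store["moments:42"] = value

print(db["moments:42"], cache["moments:42"])   # 1 2 — inconsistent!
```

Because each thread's cache write and DB write are not atomic together, any interleaving between them can leave the two stores out of sync.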

2.3.3 Query the main database

Instead of querying the slave database in the Consumer, you can query the master database.

Use this with caution: make sure the query volume is small and stays within the master's tolerance, or it will put heavy pressure on the master.

Do not use this solution unless absolutely necessary: once an interface that queries the master exists, it is hard to stop others from abusing it.

Master-slave delay is also easy to overlook when troubleshooting.
Sometimes you hit a strange problem where data cannot be read from the DB. You wonder whether some code path deleted what was just written, only to find the data readable again when you query a while later. That is almost always master-slave delay.
Therefore, how far the slave lags behind is usually monitored as a key DB metric with alerting: normal lag is at the millisecond level, and an alarm fires when it reaches seconds.

Which metric, on which database, signals master-slave delay? On the slave, monitor the value of Seconds_Behind_Master in the output of SHOW SLAVE STATUS\G.
This value is the difference between the timestamp (ts) of the event the sql_thread is currently executing and that of the event the io_thread has received.
However, if the io_thread replicating the master's binlog is overloaded, Seconds_Behind_Master can stay at 0, giving no warning at all, so judging delay by this value alone is not accurate enough. You can additionally compare the binlog positions of the master and the slave.
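A sketch of such a check, assuming the command output has already been parsed into dicts (a real check would run SHOW MASTER STATUS on the master and SHOW SLAVE STATUS\G on the slave via a MySQL client; the helper function is hypothetical, but the field names match MySQL's output):

```python
def replication_lag_alert(master_status, slave_status, threshold_seconds=1):
    """Return True when replication lag should raise an alarm."""
    sbm = slave_status.get("Seconds_Behind_Master")
    if sbm is None or sbm >= threshold_seconds:
        return True                      # NULL (replication broken) or lagging
    # Seconds_Behind_Master can read 0 while the I/O thread is behind, so
    # also compare the master's current binlog position (SHOW MASTER STATUS)
    # with the position the slave's I/O thread has read.
    return (master_status["File"], master_status["Position"]) != \
           (slave_status["Master_Log_File"], slave_status["Read_Master_Log_Pos"])

master = {"File": "binlog.000007", "Position": 1543}
caught_up = {"Seconds_Behind_Master": 0,
             "Master_Log_File": "binlog.000007", "Read_Master_Log_Pos": 1543}
io_lagging = {"Seconds_Behind_Master": 0,          # misleadingly zero!
              "Master_Log_File": "binlog.000006", "Read_Master_Log_Pos": 900}

print(replication_lag_alert(master, caught_up))   # False
print(replication_lag_alert(master, io_lagging))  # True
```

The second case is exactly the blind spot described above: Seconds_Behind_Master is 0, yet the position comparison shows the slave has not even read the master's latest binlog file.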

3 How to access the DB

Using master-slave replication to copy data to multiple nodes also achieves DB read-write separation, and the way the DB is used changes:

  • Previously, only one DB address was needed
  • Now one master address and multiple slave addresses are needed, and write operations must be distinguished from queries. Combined with sharding of databases and tables, complexity grows substantially.

To reduce this implementation complexity, the industry has produced many DB middlewares that solve the DB access problem. They fall roughly into two categories:

3.1 Inside the application

For example, TDDL (Taobao Distributed Data Layer) is embedded in the application as code. It can be seen as a data-source proxy: its configuration manages multiple data sources, each corresponding to a DB, which may be a master or a slave.
When a DB request arrives, the middleware sends the SQL statement to the designated data source and returns the result.

Advantages

Easy to use and cheap to deploy: because it is embedded in the application and runs with it, it suits small teams with limited operations capability.

Disadvantages

Lack of multi-language support: it is written entirely in Java and cannot serve other languages, and version upgrades depend on users updating the dependency.

3.2 Independently deployed proxy layer solution

Such as Mycat, Atlas, and DBProxy.

This type of middleware is deployed on its own servers. To the business code it looks like a single DB, while internally it manages many data sources. When a DB request arrives, it rewrites the SQL statement as needed and forwards it to the designated data source.

Advantages

  • Generally uses the standard MySQL communication protocol, so it can support multiple languages
  • Independent deployment, so it is easy to maintain and upgrade, suitable for large and medium-sized teams with operation and maintenance capabilities

Disadvantages

  • All SQL statements need to cross the network twice: from the application to the proxy layer and from the proxy layer to the data source, so there will be some performance loss.

4 Conclusion

Master-slave replication generalizes to a technique for replicating stored data between storage nodes, providing redundancy for backups and improving horizontal scalability.

When using master-slave replication, consider:

  • The trade-off between master-slave consistency and write performance

If writes must succeed on all slave nodes, write performance inevitably suffers; if success is returned after writing only the master, synchronization to a slave may fail, leaving master and slave inconsistent. Internet projects generally prioritize performance over strong data consistency.

  • Master-Slave Delay

This will cause many strange problems where data cannot be read.

Some real-world examples:

  • Redis achieves read-write separation through master-slave replication
  • Elasticsearch index shards can be replicated to multiple nodes
  • Files written to HDFS are replicated to multiple DataNodes

Different components have different requirements for replication consistency and latency, and adopt different solutions, but the design ideas are the same.

FAQ

If a large number of orders are hashed across different databases by userId, per-user order queries from the front end are efficient, but the back-office page needs to list and sort all orders, so that SQL runs very slowly. What can be done?

Since the back-office system cannot efficiently query across the sharded databases and tables, consider synchronizing the data into a separate back-office database or into ES.
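Why the back office suffers can be seen from a sketch of the sharding function (the shard count and IDs are illustrative):

```python
# Hash sharding by userId: all of one user's orders land on the same shard,
# so a front-end "my orders" query hits exactly one database. A back-office
# "all orders, sorted" query would have to scan and merge every shard —
# which is why the data gets synced to a separate store instead.

NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    return user_id % NUM_SHARDS

orders = [(1001, "order-a"), (1001, "order-b"), (1002, "order-c")]
for user_id, order_id in orders:
    print(order_id, "->", f"db_{shard_for(user_id)}")
```

A per-user query touches one shard; a global sort touches all NUM_SHARDS of them, and the cost only grows as shards are added.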
