Preface

With the rapid growth in the amount of information, hardware development has gradually been unable to keep up with the processing power that application systems demand. How, then, do we meet a system's performance requirements? There is really only one way: improve the system's scalability by transforming its architecture, combining multiple low-powered hardware devices into a system with high overall processing capacity. In other words, we must design for scalability.

1. What is scalability?

Before discussing scalability, many readers may ask: we often hear people praise how well a certain website or system scales and how excellent its architecture is, but what exactly do the three words we keep hearing, Scale, Scalable, and Scalability, actually mean? From the perspective of the database, Scale means that the database can provide stronger service and processing capabilities. Scalable means that the database system can deliver stronger processing capabilities after a corresponding upgrade (either increasing the processing power of a single machine or increasing the number of servers); in theory any database system is scalable, but the required implementation differs. Finally, Scalability refers to how easy or difficult it is to improve a database system's processing power through such upgrades. Although in theory any system's processing power can be improved by upgrading, the cost (money and manpower) required to achieve the same improvement differs from system to system. This is what we mean when we say the Scalability of various database application systems differs greatly.
First of all, we need to understand that the scalability of a database system is reflected mainly in two directions: horizontal expansion and vertical expansion, commonly called Scale Out and Scale Up.

Scale Out means expanding horizontally, outward: increasing overall processing capacity by adding processing nodes. Put practically, it means adding machines. Scale Up means expanding vertically: increasing overall processing capacity by increasing the capacity of the current nodes. Put simply, it means upgrading the configuration of existing servers, such as adding memory, adding CPUs, or upgrading the storage system's hardware, or replacing the servers outright with more powerful machines and more advanced storage systems. Comparing the two approaches, we can easily see their respective advantages and disadvantages.

Scale Out Advantages:
- Low cost: a powerful cluster can be built from relatively cheap commodity PC servers.
- Less vendor lock-in: commodity hardware comes from a broad, competitive market.
- Incremental growth: capacity can be added node by node as the business grows.
Scale Out Disadvantages:
- A more complex overall architecture that is harder to maintain and monitor.
- Cross-node issues such as transactions and data consistency that the application must help solve.
- More machines means more potential points of failure to manage.
Scale Up Advantages:
- A simple architecture that requires few or no changes to the application.
- Easier maintenance, since the number of machines does not grow.
- Fewer of the cross-node transaction and consistency problems described later.
Scale Up Disadvantages:
- High-end equipment is expensive, faces little market competition, and leaves you easily restricted by the manufacturer.

From a long-term perspective, however, Scale Out has the greater advantages, and it is the inevitable trend once a system reaches a certain scale. No matter what, the processing power of a single machine is always limited by hardware technology, hardware technology advances at a limited speed that often cannot keep up with the pace of business growth, and the more powerful the high-end equipment, the worse its cost-performance ratio tends to be. Building a high-capacity distributed cluster out of multiple cheap PC servers will therefore always be a goal for companies that want to save costs while improving overall processing capability. Although you may encounter all kinds of technical problems in pursuing this goal, it is always worth studying and practicing. The following content focuses on analysis and design for Scale Out.

To scale out well, a distributed system design is necessary. For databases, there are only two directions in which to scale out. One is to expand by continuously replicating the data, producing many completely identical data sources. The other is to expand by dividing one centralized data source into many smaller data sources. Next, let's look at the principles to follow when designing a database application architecture with good scalability.

2. Principle of minimizing transaction correlation

When building a distributed database cluster, many people worry about transactions; after all, transactions are a core function of a database. In a traditional centralized architecture, transactions are easy to handle and can be fully guaranteed by the database's own mature transaction mechanism. Once the database moves to a distributed architecture, however, many transactions that used to complete within a single database may now span multiple database hosts, which may force us to introduce distributed transactions. As most readers know, distributed transactions are a very complex mechanism: whether in large commercial database systems or the various open source databases, although most vendors have implemented the feature, it comes with assorted limitations, and there are bugs that can leave certain transactions poorly guaranteed or unable to complete smoothly. We may therefore need alternative ways to solve the problem, since transactions cannot simply be ignored; however we implement them, they still need to be implemented. There are currently three main approaches.

First, when designing for Scale Out, design the segmentation rules carefully so that, as far as possible, the data a transaction needs lives on the same MySQL Server, avoiding distributed transactions. If the data segmentation rules ensure that all transactions can complete on a single MySQL Server, our business needs are met more easily, the application can adapt to the architectural change with minimal adjustment, and the overall cost drops greatly.
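To make this first solution concrete, here is a minimal Python sketch of a hash-based segmentation rule, under the assumption that every table belonging to one user (orders, payments, and so on) is keyed by user_id; the shard count, DSN strings, and function names are hypothetical. Because all of a user's tables map to the same shard, any transaction touching only that user's data stays on a single MySQL Server.

```python
# Hypothetical shard router: all data for one user lives on one shard,
# so single-user transactions never span MySQL Servers.

SHARD_DSNS = [  # hypothetical connection strings, one per MySQL Server
    "mysql://app@db-shard-0/app",
    "mysql://app@db-shard-1/app",
    "mysql://app@db-shard-2/app",
    "mysql://app@db-shard-3/app",
]

def shard_for_user(user_id: int) -> str:
    """Route every table keyed by user_id to the same MySQL Server."""
    return SHARD_DSNS[user_id % len(SHARD_DSNS)]

# Usage: the orders row and the payments row for user 42 both resolve
# to the same server, so one local transaction can cover them.
dsn = shard_for_user(42)
```

A simple modulo rule like this is easy to maintain but makes re-sharding painful when the node count changes; a lookup directory or consistent hashing trades some simplicity for that flexibility.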
After all, a database architecture transformation is never just the DBA's job; it requires plenty of external cooperation and support. Even when designing a brand-new system, we still have to weigh the overall investment in every environment and every piece of work: not only the cost of the database itself but also the corresponding development cost. If the interests of the various parts conflict, we must make a trade-off based on future expansion and total cost and find the balance point best suited to the current stage. However, even well-designed sharding rules can rarely put all the data required by all transactions on the same MySQL Server. So although this solution has the lowest cost, most of the time it can only cover the majority of core transactions; it is not a perfect solution.

Second, split large transactions into multiple small transactions: the database guarantees the integrity of each small transaction, and the application controls the overall integrity across them. Compared with the previous solution, this one requires more application changes and places stricter demands on the application. The application not only has to break up many of its original large transactions but also has to ensure the integrity of each small one; in other words, the application itself needs a certain transaction-control capability, which undoubtedly raises its technical difficulty. But the approach also has many advantages of its own. First, the data segmentation rules become simpler, and hitting a restriction becomes unlikely; simpler also means lower maintenance cost. Second, without many constraints on the segmentation rules, the database is more scalable, and when a performance bottleneck appears, an existing database can quickly be split further. Finally, the database moves further away from the actual business logic, which helps later architectural expansion. (A rough sketch of this application-controlled approach follows this list of solutions.)

Third, combine the two solutions above, integrating their advantages and avoiding their respective drawbacks. The two previous solutions each have strengths and weaknesses that are largely complementary, so we can exploit the advantages of each, adjust the design principles of both, and strike a balance in the overall architecture. For example, we can ensure that the data required by the core transactions sits on the same MySQL Server, while other, less critical transactions are split into small transactions and handled together with the application. And for some transactions that are not particularly important, we can analyze further whether a transaction is unavoidable at all. By following this balanced design principle, we avoid the application having to handle too many small transactions to guarantee overall integrity, while also avoiding complex segmentation rules that raise maintenance difficulty and hinder scalability. Of course, not every application scenario must be solved by combining the two solutions, as the examples after the sketch below show.
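As a rough illustration of the second solution, the sketch below splits one logical "transfer" across two shards into two local transactions plus a journal row that a recovery job could use to finish or undo half-done work. It assumes DB-API-style connections (such as those pymysql returns); the transfer_journal and accounts tables are hypothetical, and the retry/compensation logic a real system needs is deliberately omitted.

```python
# Sketch: a cross-shard "transfer" as two small local transactions plus a
# journal. conn_a / conn_b are assumed DB-API connections to two different
# MySQL Servers; table names and schema are hypothetical.

def transfer(conn_a, conn_b, journal_conn, src, dst, amount):
    # 1. Record intent first, so a background job can detect half-done work.
    with journal_conn.cursor() as cur:
        cur.execute(
            "INSERT INTO transfer_journal (src, dst, amount, state) "
            "VALUES (%s, %s, %s, 'pending')", (src, dst, amount))
    journal_conn.commit()

    # 2. Small transaction one: debit on shard A, guarded by a balance check.
    with conn_a.cursor() as cur:
        cur.execute("UPDATE accounts SET balance = balance - %s "
                    "WHERE id = %s AND balance >= %s", (amount, src, amount))
        if cur.rowcount != 1:          # insufficient balance or no such row
            conn_a.rollback()
            with journal_conn.cursor() as jcur:
                jcur.execute("UPDATE transfer_journal SET state = 'aborted' "
                             "WHERE src = %s AND dst = %s AND state = 'pending'",
                             (src, dst))
            journal_conn.commit()
            return False
    conn_a.commit()

    # 3. Small transaction two: credit on shard B. If this step fails, the
    #    journal row stays 'pending' and a recovery job must retry the credit
    #    or compensate the debit (omitted in this sketch).
    with conn_b.cursor() as cur:
        cur.execute("UPDATE accounts SET balance = balance + %s "
                    "WHERE id = %s", (amount, dst))
    conn_b.commit()

    # 4. Mark the overall logical transaction complete.
    with journal_conn.cursor() as cur:
        cur.execute("UPDATE transfer_journal SET state = 'done' "
                    "WHERE src = %s AND dst = %s AND state = 'pending'",
                    (src, dst))
    journal_conn.commit()
    return True
```

The point of the journal is exactly what the text describes: each shard only guarantees its own small transaction, and the application (plus a recovery process) owns the overall integrity.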
For example, for applications whose transaction requirements are not particularly strict, or whose transactions are themselves very simple, carefully designed segmentation rules alone can satisfy the requirements: we can simply use the first solution and avoid the need for the application to maintain the overall integrity of small transactions, which greatly reduces the application's complexity. Conversely, for applications with very complex transaction relationships and highly correlated data, there is no point struggling to keep all transactional data centralized, because no matter how hard we try it is difficult to meet the requirement, and most of the time we end up fixing one problem only to create another. In such cases we might as well keep the database side as simple as possible and let the application make some sacrifices.

Many of today's large Internet applications use each of the above solutions somewhere. eBay, for instance, is to a large extent a combination in the spirit of the third solution, with the second solution as the main part and the first as the auxiliary. Besides the needs of its application scenarios, this architectural choice also reflects its strong technical capability, which guarantees a sufficiently robust application layer. Another example is a large domestic BBS system (whose real name is withheld): its transaction correlation is not particularly complex, and the data correlation between functional modules is not particularly high, so it adopts the first solution entirely, avoiding transaction data sources that span multiple MySQL Servers purely through well-designed data segmentation rules.

Finally, we need to understand one more point: it is not "the more transactions the better" but rather the fewer, and the smaller, the better. Whichever solution we use, we should design the application so that the data has less transactional correlation, or even none at all. Of course this is relative, and certainly only some of the data can achieve it; but once a part of the data has no transactional correlation, the overall complexity of the system can drop significantly, and both the application and the database may pay a much smaller cost.

3. Data consistency principle

Whether we Scale Up or Scale Out, and no matter how we design the architecture, guaranteeing the eventual consistency of data is a principle that must never be violated; I am sure all readers understand how important it is. Moreover, guaranteeing data consistency, like guaranteeing transaction integrity, can run into problems when we design for Scale Out (with Scale Up you will rarely meet this kind of trouble). In many people's eyes, data consistency belongs to some extent to the category of transaction integrity, but to highlight its importance and its own characteristics, I analyze it separately here. So how do we guarantee data consistency while scaling out? This question gives us the same headache as transaction integrity does, and it has attracted the attention of many architects. After much practice, the community eventually summarized the BASE model.
That is: Basically Available, Soft state, and Eventually consistent. The terms may sound complicated and profound, but you can simply understand them as a principle of non-real-time consistency: through appropriate techniques, the application allows data to be in an inconsistent state for a short period while still meeting users' needs, and follow-up processing ensures the data eventually reaches a consistent state. The theoretical model sounds simple, but implementing it raises plenty of difficulties.

The first question is: does all data need only non-real-time consistency? I think most readers will certainly vote no. And if not all data can be non-real-time consistent, how do we decide which data must be consistent in real time and which only needs eventual consistency? Essentially this amounts to ranking the business priority of each module: higher-priority data naturally belongs to the camp whose consistency is guaranteed in real time, while lower-priority data can be considered for the camp that tolerates short-lived inconsistency but guarantees eventual consistency. This is a tricky decision that cannot be made casually; it needs very detailed analysis and careful evaluation. Not all data can appear in an inconsistent state in the system even briefly, and not all data can be brought back to consistency through later processing, so at least these two types of data need real-time consistency. To pick them out, we must analyze the business scenarios and commercial requirements in detail and evaluate fully before drawing a conclusion.

The second question is: how do we bring the inconsistent data in the system to eventual consistency? Generally, we must clearly separate the business modules designed around this type of data from the modules that need real-time consistent data. Then, through asynchronous mechanisms and corresponding background processes, the currently inconsistent data is brought to a fully consistent state using the data, logs, and other information in the system. Using different background processes for different modules avoids mixing up data and allows concurrent execution, improving processing efficiency. For information such as user message notifications, there is no need for strict real-time consistency: it suffices to record the messages that need processing and let a background process handle them in order, avoiding congestion in the foreground business.

Finally, avoid online front-end interaction between real-time consistent data and eventually consistent data. Because the two kinds of data are in different states, interaction between them can very easily leave both disordered. All non-real-time consistent data should be effectively isolated from real-time consistent data in the application as far as possible; in some special scenarios it even needs to be physically isolated on different MySQL Servers.
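As a small illustration of the background-process idea above, here is a hedged Python sketch of the message-notification case: the foreground only appends a row to a hypothetical pending_notifications table, and a separate worker drains it in order. The table, its columns, and the deliver callback are all assumptions for illustration, and the connections are assumed to be DB-API style (e.g. from pymysql).

```python
import time

# Foreground: record the work, don't do it. The user-facing transaction
# stays fast; the notification becomes consistent later.
def enqueue_notification(conn, user_id, message):
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO pending_notifications (user_id, message, state) "
            "VALUES (%s, %s, 'pending')", (user_id, message))
    conn.commit()

# Background worker: drain the queue in insertion order.
def notification_worker(conn, deliver, batch=100, idle_sleep=1.0):
    while True:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, user_id, message FROM pending_notifications "
                "WHERE state = 'pending' ORDER BY id LIMIT %s", (batch,))
            rows = cur.fetchall()
        if not rows:
            time.sleep(idle_sleep)     # nothing pending; back off briefly
            continue
        for row_id, user_id, message in rows:
            deliver(user_id, message)  # assumed delivery callback
            with conn.cursor() as cur:
                cur.execute("UPDATE pending_notifications SET state = 'done' "
                            "WHERE id = %s", (row_id,))
            conn.commit()
```

A production worker would also need crash recovery for rows stuck mid-delivery and, if several workers run concurrently, row locking such as SELECT ... FOR UPDATE; the sketch shows only the separation between the foreground write and the background processing.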
4. High availability and data security principles

In addition to the two principles above, I want to highlight system high availability and data security. After a Scale Out design, the overall scalability of the system does improve greatly, and overall performance naturally improves easily as well, but maintaining overall availability becomes harder than before. Because the overall architecture is more complex, both the application and the database environment become larger and more intricate, and the most direct consequence is that maintenance and system monitoring become more difficult. If such a redesign caused our system to crash frequently and suffer downtime, I think nobody could accept it. We must therefore use various technical means to ensure that availability does not decrease, and perhaps even improves overall. This naturally leads to another principle of the Scale Out design process, the high availability principle: however the system architecture is adjusted, the overall availability of the system must not be reduced.

Discussing availability naturally brings up a closely related principle: data security. To achieve high availability, the data in the database must be secure enough. Security here does not mean protection against malicious attack or theft, but against abnormal loss: we must guarantee that data is not lost when software or hardware failures occur, because once data is lost, there is no availability to speak of at all. Moreover, data is the core resource of a database application system, and the principle that it must not be lost is beyond doubt.

The best way to ensure high availability and data security is redundancy: eliminate the single point of failure in every software and hardware device, and keep every piece of data in multiple copies. Only then can these principles be properly guaranteed. Technically, we can achieve this through features such as MySQL Replication and MySQL Cluster.

Summary

No matter how we design the architecture and no matter how our scalability requirements change, the principles discussed in this chapter remain very important: minimizing transaction correlation, preserving data consistency, ensuring availability, and guaranteeing data security all deserve constant attention during design. The MySQL database is popular in the Internet industry not only because it is open source and easy to use, but also because of its great advantage in scalability: the characteristics of its different storage engines suit a wide variety of application scenarios, and its Replication and Cluster features are very effective means of improving scalability.