Seven classic distributed transaction solutions for MySQL and Golang

Preface:

As businesses grow and become more complex, almost every company's system moves from a monolith to a distributed architecture, especially microservices. Along the way, we inevitably run into the problem of distributed transactions.
This article first introduces the relevant basic theory, then summarizes the most classic transaction solutions, and finally presents a way to handle the out-of-order execution of sub-transactions (idempotence, empty compensation, and suspension).

1. Basic theory

Before explaining the specific solution, let us first understand the basic theoretical knowledge involved in distributed transactions.

Let's take money transfer as an example. If A needs to transfer 100 yuan to B, then the balance of A needs to be -100 yuan, and the balance of B needs to be +100 yuan. The entire transfer must ensure that A-100 and B+100 succeed at the same time, or fail at the same time. Let’s see how to solve this problem in various scenarios.

1.1 Transactions

The ability to treat multiple statements as a single unit is called a database transaction. A database transaction guarantees that all operations within its scope either succeed together or fail together.

Transactions have four properties: atomicity, consistency, isolation, and durability. These four properties are usually referred to as the ACID properties.

  • Atomicity : All operations in a transaction are either completed or not completed at all, and will not end at some intermediate stage. If an error occurs during the execution of a transaction, it will be restored to the state before the transaction started, as if the transaction had never been executed.
  • Consistency : The integrity of the database is not compromised before a transaction begins and after the transaction ends. Integrity, including foreign key constraints and application-defined constraints, will not be violated.
  • Isolation : The ability of a database to allow multiple concurrent transactions to read, write, and modify its data at the same time. Isolation can prevent data inconsistency caused by cross-execution when multiple transactions are executed concurrently.
  • Durability : After a transaction is completed, changes to the data are permanent and will not be lost even if the system fails.

If our business system is not complicated and we can modify the data in one database and one service to complete the transfer, then we can use database transactions to ensure the correct completion of the transfer business.

1.2 Distributed Transactions

An interbank transfer is a typical distributed transaction scenario. Suppose A needs to transfer money to B across banks; the data of two banks is then involved. The ACID properties of the transfer cannot be guaranteed by a local transaction in a single database, and can only be achieved through a distributed transaction.

A distributed transaction means that the transaction initiator, resources and resource manager, and transaction coordinator are located on different nodes of the distributed system. In the above transfer business, user A-100 operation and user B+100 operation are not located on the same node. In essence, distributed transactions are designed to ensure the correct execution of data operations in distributed scenarios.

In a distributed environment, to meet requirements for availability, performance, and graceful degradation, distributed transactions relax the requirements on consistency and isolation. On the one hand, they follow the BASE theory (BASE is a broad topic; interested readers can consult dedicated material on it):

  • Basically available
  • Soft state
  • Eventual consistency

On the other hand, distributed transactions also partially follow the ACID properties:

  • Atomicity: Strictly followed
  • Consistency: Consistency after the transaction completes is strictly followed; consistency during the transaction can be appropriately relaxed
  • Isolation: Parallel transactions must not affect each other; the visibility of intermediate results may be safely relaxed
  • Durability: Strictly followed

2. Distributed transaction solutions

No distributed transaction solution can provide complete ACID guarantees, and no single solution solves every business problem. In practice, therefore, the most suitable distributed transaction solution is selected according to the characteristics of the business.

2.1 Two-Phase Commit/XA

XA is a distributed transaction specification proposed by the X/Open organization. The XA specification mainly defines the interface between the (global) transaction manager (TM) and the (local) resource manager (RM). Local databases such as MySQL play the role of RM in XA.

XA is divided into two stages:

  • Phase 1 (prepare): All participating RMs prepare to execute their transactions and lock the required resources. When a participant is ready, it reports to the TM that it is ready.
  • Phase 2 (commit/rollback): When the transaction manager (TM) confirms that all participants (RMs) are ready, it sends a commit command to all participants.

Currently, most mainstream databases support XA transactions, including MySQL, Oracle, SQL Server, and PostgreSQL.

An XA transaction consists of one or more resource managers (RMs), a transaction manager (TM), and an application program (AP).

The three roles of RM, TM, and AP are the classic division of roles, and they carry over to the later transaction modes such as SAGA and TCC.

Taking the above transfer as an example, the sequence diagram of a successfully completed XA transaction is as follows:

If any participant fails prepare , TM will notify all participants that have completed prepare to roll back.
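
The two phases and the rollback rule above can be sketched with a toy in-memory coordinator. This is only an illustration under stated assumptions: the `Participant` interface, the `Account` type, and `RunTwoPhase` are hypothetical names invented for this sketch, not part of the XA specification, MySQL, or DTM, and a real RM would also persist its prepared state.

```go
package main

import (
	"errors"
	"fmt"
)

// Participant is a hypothetical stand-in for an XA resource manager (RM).
type Participant interface {
	Prepare() error  // phase 1: check and lock resources, then vote
	Commit() error   // phase 2: make the prepared work permanent
	Rollback() error // phase 2: discard the prepared work
}

// Account is a toy in-memory RM that holds a single balance.
type Account struct {
	Name    string
	Balance int
	Delta   int  // change requested by the global transaction
	pending bool // true between Prepare and Commit/Rollback
}

func (a *Account) Prepare() error {
	if a.Balance+a.Delta < 0 {
		return fmt.Errorf("%s: insufficient balance", a.Name)
	}
	a.pending = true
	return nil
}

func (a *Account) Commit() error {
	if !a.pending {
		return errors.New("commit without prepare")
	}
	a.Balance += a.Delta
	a.pending = false
	return nil
}

func (a *Account) Rollback() error {
	a.pending = false
	return nil
}

// RunTwoPhase plays the TM role: prepare everyone, then commit everyone.
// If any Prepare fails, the participants that already prepared are rolled back.
func RunTwoPhase(parts []Participant) error {
	prepared := make([]Participant, 0, len(parts))
	for _, p := range parts {
		if err := p.Prepare(); err != nil {
			for _, q := range prepared {
				_ = q.Rollback()
			}
			return fmt.Errorf("prepare failed, rolled back: %w", err)
		}
		prepared = append(prepared, p)
	}
	for _, p := range parts {
		if err := p.Commit(); err != nil {
			return err
		}
	}
	return nil
}
```

For the transfer example, the two participants would be `&Account{Name: "A", Balance: 100, Delta: -100}` and `&Account{Name: "B", Delta: 100}`. With MySQL as the RM, the corresponding statements are `XA START`, `XA END`, `XA PREPARE`, and `XA COMMIT` or `XA ROLLBACK`.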

The characteristics of XA transactions are:

  • Simple to understand, easier to develop
  • The resource is locked for a long time and the concurrency is low

Readers who want to study XA further, whether they use Go, PHP, Python, Java, C#, Node, or another language, can refer to DTM.

2.2 SAGA

Saga is a solution described in the database paper "Sagas". The core idea is to split a long transaction into multiple short local transactions coordinated by a Saga transaction coordinator. If every step completes normally, the transaction completes normally. If a step fails, the compensation operations are called once each, in reverse order.

Taking the above transfer as an example, the sequence diagram of a successfully completed SAGA transaction is as follows:

Once Saga reaches the Cancel stage, Cancel is not allowed to fail in business logic. If the response is not successful due to network or other temporary failures, TM will keep retrying until Cancel returns successfully.
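
As a rough illustration of this flow, the sketch below runs forward actions in order and, on the first failure, retries each compensation until it succeeds, in reverse order. The names `Step`, `RunSaga`, and `NewTransferSaga` are hypothetical and are not DTM's API. The sketch also simplifies one point: it compensates only the steps whose forward action succeeded, whereas a coordinator that cannot tell whether a failed step took effect may call that step's compensation as well.

```go
package main

import "errors"

// Step pairs a forward action with its compensation (hypothetical type).
type Step struct {
	Name       string
	Action     func() error
	Compensate func() error
}

// RunSaga executes the steps in order. On the first failing action it
// compensates the already completed steps in reverse order and returns
// the error of the failed action.
func RunSaga(steps []Step) error {
	for i, s := range steps {
		if err := s.Action(); err != nil {
			for j := i - 1; j >= 0; j-- {
				// Compensation must not fail permanently: retry until it succeeds.
				for {
					if cerr := steps[j].Compensate(); cerr == nil {
						break
					}
					// A real coordinator would wait (back off) before retrying.
				}
			}
			return err
		}
	}
	return nil
}

// NewTransferSaga builds the A -> B transfer as two steps over an in-memory
// balance map. failTransIn forces the second step to fail for demonstration.
func NewTransferSaga(bal map[string]int, amount int, failTransIn bool) []Step {
	return []Step{
		{
			Name:       "TransOut",
			Action:     func() error { bal["A"] -= amount; return nil },
			Compensate: func() error { bal["A"] += amount; return nil },
		},
		{
			Name: "TransIn",
			Action: func() error {
				if failTransIn {
					return errors.New("TransIn rejected")
				}
				bal["B"] += amount
				return nil
			},
			Compensate: func() error { bal["B"] -= amount; return nil },
		},
	}
}
```

Between the failed TransIn and the compensation of TransOut, the balance of A is already reduced, which is the weak consistency discussed in this section.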

Characteristics of Saga transactions:

  • High concurrency, no need to lock resources for a long time like XA transactions
  • Need to define normal operation and compensation operation, the development workload is larger than XA
  • The consistency is weak. For transfers, it may happen that user A has already deducted the money, but the transfer fails.

The paper covers SAGA quite extensively, including two recovery strategies and the concurrent execution of branch transactions. Our discussion here covers only the simplest form of SAGA.

SAGA is applicable to many scenarios, including long transactions and business scenarios that are not sensitive to intermediate results.

If readers want to further study SAGA, they can refer to DTM, which includes examples of SAGA success and failure rollback, as well as the handling of various network exceptions.

2.3 TCC

The concept of TCC (Try-Confirm-Cancel) was first proposed by Pat Helland in a paper published in 2007 titled "Life beyond Distributed Transactions: an Apostate's Opinion" .

TCC is divided into 3 stages:

  • Try phase: Try to execute, complete all business checks (consistency), and reserve necessary business resources (quasi-isolation)
  • Confirm phase: Confirms the actual execution of the business without any business check. Only the business resources reserved in the Try phase are used. The Confirm operation requires idempotent design and needs to be retried after Confirm fails.
  • Cancel phase: Cancels the execution and releases the business resources reserved in the Try phase. Exception handling in the Cancel phase is basically the same as in the Confirm phase and also requires an idempotent design.

Taking the transfer above as an example, usually the amount will be frozen in Try but not deducted, deducted in Confirm , and unfrozen in Cancel .
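
The freeze, deduct, and unfreeze steps for the paying account can be sketched as follows. `TccAccount` and its methods are hypothetical names used only for illustration. The idempotence that Confirm and Cancel require is deliberately left out to keep the sketch short, so each method here must be called at most once per transaction.

```go
package main

import "errors"

// ErrInsufficient is returned when the available (unfrozen) balance is too low.
var ErrInsufficient = errors.New("insufficient available balance")

// TccAccount is a toy account with a reserved (frozen) portion of its balance.
type TccAccount struct {
	Balance int // total balance, including the frozen part
	Frozen  int // amount reserved by Try and not yet confirmed or cancelled
}

// TryOut performs the business check and reserves the amount without deducting it.
func (a *TccAccount) TryOut(amount int) error {
	if a.Balance-a.Frozen < amount {
		return ErrInsufficient
	}
	a.Frozen += amount
	return nil
}

// ConfirmOut deducts the amount reserved by TryOut. It does no business check.
func (a *TccAccount) ConfirmOut(amount int) {
	a.Frozen -= amount
	a.Balance -= amount
}

// CancelOut releases the amount reserved by TryOut.
func (a *TccAccount) CancelOut(amount int) {
	a.Frozen -= amount
}
```

Because the frozen amount is excluded from the available balance, a concurrent transfer cannot spend it while the first transaction is still undecided, which is the "quasi-isolation" mentioned for the Try phase.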

The timing diagram of a successfully completed TCC transaction is as follows:

The Confirm/Cancel phase of TCC is not allowed to return failure in business logic. If it cannot return success because of network or other temporary failures, the TM keeps retrying until Confirm/Cancel returns success.

The features of TCC are as follows:

  • High concurrency and no long-term resource lock-in.
  • The development workload is large, and Try/Confirm/Cancel interface needs to be provided.
  • The consistency is good. Unlike SAGA, there is no situation where the money has already been deducted but the transfer ultimately fails.
  • TCC is applicable to order-based businesses and businesses with constraints on intermediate states.

If readers want to further study TCC , they can refer to DTM

2.4 Local Message Table

The local message table solution was originally proposed by eBay architect Dan Pritchett in an article published by the ACM in 2008. The core of the design is to use messages to guarantee the asynchronous execution of tasks that require distributed processing.

The general process is as follows:

The local message write and the business operation are placed in the same transaction, which makes the business operation and the message atomic: either both succeed or both fail.

Fault tolerance mechanism:

  • When the balance deduction transaction fails, the transaction is rolled back directly without any subsequent steps.
  • Failures when the polling task produces messages, and failures in the balance-increase transaction, are retried.
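
A minimal in-memory sketch of this mechanism follows. It assumes a single goroutine and replaces the real database transaction with one method that either changes both the balance and the message list or changes neither. `ProducerDB`, `Message`, and `FlakyConsumer` are hypothetical names invented for the illustration.

```go
package main

import "errors"

// Message is one row of the local message table.
type Message struct {
	ID     int
	To     string
	Amount int
	Done   bool // set once the consumer has acknowledged the message
}

// ProducerDB stands in for the producer's database: the business data and
// the message table live together, so one local transaction covers both.
type ProducerDB struct {
	Balance  int
	Messages []*Message
}

// DeductWithMessage deducts the balance and records the message as one unit.
// On failure neither the balance nor the message table is changed.
func (db *ProducerDB) DeductWithMessage(to string, amount int) error {
	if db.Balance < amount {
		return errors.New("insufficient balance")
	}
	db.Balance -= amount
	db.Messages = append(db.Messages, &Message{ID: len(db.Messages) + 1, To: to, Amount: amount})
	return nil
}

// PollOnce tries to deliver every pending message and returns how many were
// acknowledged. Failed deliveries stay pending and are retried on the next poll.
func (db *ProducerDB) PollOnce(deliver func(Message) error) int {
	delivered := 0
	for _, m := range db.Messages {
		if m.Done {
			continue
		}
		if err := deliver(*m); err != nil {
			continue
		}
		m.Done = true
		delivered++
	}
	return delivered
}

// FlakyConsumer credits the receiver but fails its first FailFirst calls,
// which makes the retry behaviour visible.
type FlakyConsumer struct {
	FailFirst int
	Calls     int
	Credited  int
}

func (c *FlakyConsumer) Handle(m Message) error {
	c.Calls++
	if c.Calls <= c.FailFirst {
		return errors.New("consumer temporarily unavailable")
	}
	c.Credited += m.Amount
	return nil
}
```

In a real deployment a delivery can succeed on the consumer while the acknowledgement is lost, so the same message may be delivered more than once; the consumer therefore has to be idempotent, which this sketch does not implement.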

Features of the local message table:

  • Long transactions only need to be split into multiple tasks, which is easy to use
  • The producer needs to create an additional message table
  • Each local message table needs to be polled
  • If the consumer logic still fails after retries, additional mechanisms are needed to roll back the producer's operation.

This solution is applicable to businesses that can be executed asynchronously and whose subsequent operations do not need to be rolled back.

2.5 Transaction Messages

In the above-mentioned local message table solution, the producer needs to create an additional message table and also needs to poll the local message table, which places a heavy business burden. Alibaba's open source RocketMQ 4.3 and later versions officially support transactional messages. Transactional messages essentially put the local message table on RocketMQ to solve the atomicity problem between message sending on the production side and local transaction execution.

Sending and committing a transaction message:

  • The producer sends a message (a half message).
  • The server stores the message and responds with the write result.
  • The producer executes the local transaction according to the send result (if the write failed, the half message is not visible to the business and the local logic is not executed).
  • The producer issues Commit or Rollback according to the local transaction status (Commit publishes the message, making it visible to consumers).

The normal sending flow chart is as follows:

Compensation process:

  • For transaction messages that received neither Commit nor Rollback (messages in the pending state), the server initiates a check-back.
  • The Producer receives the check-back and returns the status of the local transaction corresponding to the message, which is Commit or Rollback.

The transaction message solution is very similar to the local message table mechanism. The main difference is that the operations on the local message table are replaced by a check-back (reverse query) interface.
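
The half-message and check-back flow can be modelled with a toy broker. This is not RocketMQ's client API: `Broker`, `TxState`, `SendHalf`, `End`, and `CheckBack` are hypothetical names chosen for the sketch, and the broker lives in memory in a single goroutine.

```go
package main

// TxState is the outcome of the producer's local transaction.
type TxState int

const (
	TxUnknown TxState = iota // local transaction outcome not known yet
	TxCommit
	TxRollback
)

// Broker is a toy message server that supports half messages.
type Broker struct {
	half    map[string]string // pending half messages by transaction id
	Visible []string          // messages that consumers can see
}

func NewBroker() *Broker {
	return &Broker{half: map[string]string{}}
}

// SendHalf stores a half message; it is not visible to consumers yet.
func (b *Broker) SendHalf(id, body string) {
	b.half[id] = body
}

// End resolves a half message: Commit publishes it, Rollback discards it,
// and Unknown leaves it pending for a later check-back.
func (b *Broker) End(id string, s TxState) {
	body, ok := b.half[id]
	if !ok || s == TxUnknown {
		return
	}
	if s == TxCommit {
		b.Visible = append(b.Visible, body)
	}
	delete(b.half, id)
}

// CheckBack asks the producer for the local transaction state of every
// pending half message and resolves the message accordingly.
func (b *Broker) CheckBack(query func(id string) TxState) {
	for id := range b.half {
		b.End(id, query(id))
	}
}

// Pending returns the number of unresolved half messages.
func (b *Broker) Pending() int {
	return len(b.half)
}
```

The `query` callback plays the role of the reverse query interface: the producer looks up its own business data by transaction id and reports whether the local transaction committed.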

The characteristics of transaction messages are as follows:

  • Long transactions only need to be split into multiple tasks, plus a check-back (reverse query) interface, which is easy to use.
  • If the consumer logic still fails after retries, additional mechanisms are needed to roll back the producer's operation.

This solution is applicable to businesses that can be executed asynchronously and whose subsequent operations do not need to be rolled back.

2.6 Best Efforts Notification

The initiator of the notification uses some mechanism to make its best effort to notify the recipient of the business processing result. Specifically, this includes:

  • A repeat notification mechanism. Because the recipient may not have received the notification, there must be a mechanism for resending the message.
  • A message reconciliation mechanism. If the recipient still has not been notified despite best efforts, or needs to consume a message again after consuming it, the recipient can actively query the initiator for the message.

The local message table and transaction messages introduced earlier are both reliable messages. How do they differ from the best effort notification introduced here?

Reliable message consistency: the initiator of the notification must ensure that the message is sent and that it reaches the recipient. The reliability of the message is guaranteed mainly by the initiator.

Best effort notification: the initiator makes its best effort to notify the recipient of the business processing result, but the message may still not be received. In that case, the recipient needs to actively call the initiator's interface to query the business processing result. The reliability of the notification depends on the recipient.

In terms of solutions, best efforts notification requires:

  • Provide an interface so that the notification recipient can query the business processing results through the interface
  • Use the message queue ACK mechanism to resend the notification at gradually increasing intervals of 1min, 5min, 10min, 30min, 1h, 2h, 5h, and 10h, until the upper limit of the notification time window is reached. After that, no more notifications are sent.
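
The retry schedule above can be expressed as a small lookup plus a send loop. This is a sketch under assumptions: `NotifyIntervals`, `NextDelay`, and `Notify` are hypothetical names, the `send` callback is assumed to return true when the recipient acknowledges, and the `sleep` function is injected so the caller can pass `time.Sleep` or schedule a delayed message instead.

```go
package main

import "time"

// NotifyIntervals is the wait before each retry, taken from the list above.
var NotifyIntervals = []time.Duration{
	1 * time.Minute, 5 * time.Minute, 10 * time.Minute, 30 * time.Minute,
	1 * time.Hour, 2 * time.Hour, 5 * time.Hour, 10 * time.Hour,
}

// NextDelay returns the wait before retry number retry (0-based).
// The second result is false once the schedule is exhausted.
func NextDelay(retry int) (time.Duration, bool) {
	if retry < 0 || retry >= len(NotifyIntervals) {
		return 0, false
	}
	return NotifyIntervals[retry], true
}

// Notify calls send until it is acknowledged or the schedule is exhausted.
// It returns the number of send attempts and whether one was acknowledged.
func Notify(send func() bool, sleep func(time.Duration)) (attempts int, acked bool) {
	for {
		attempts++
		if send() {
			return attempts, true
		}
		delay, ok := NextDelay(attempts - 1)
		if !ok {
			return attempts, false // time window used up: stop notifying
		}
		sleep(delay)
	}
}
```

When `Notify` gives up, nothing is lost on the initiator's side: the recipient is expected to fall back to the query interface described in the first bullet.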

Best effort notification is applicable to business notification types. For example, the result of a WeChat transaction is notified to each merchant through best effort notification. There are callback notifications and transaction query interfaces.

2.7 AT Transaction Mode

This is a transaction mode in Alibaba's open source project Seata, also known as FMT at Ant Financial. Its advantage is that it is used in much the same way as XA mode: the business does not need to write compensation operations, and rollback is completed automatically by the framework. Its disadvantage is also similar to XA: locks are held for a long time, so it is not suited to high-concurrency scenarios. In terms of performance, AT mode is better than XA, but it introduces new problems such as dirty rollback.

3. Exception handling

Network and business failures can occur at every step of a distributed transaction. These failures require the business side to handle three things: empty rollback, idempotence, and suspension.

3.1 Abnormal situations

The following uses TCC transactions to illustrate these exceptions:

Empty rollback:

When the second-stage Cancel method is called without calling the TCC resource Try method, the Cancel method needs to recognize that this is an empty rollback and return success directly.

This happens when the service hosting a branch transaction is down or the network is abnormal: the branch transaction call is recorded as failed, although the Try phase was never actually executed. After the fault recovers, the distributed transaction is rolled back and the second-phase Cancel method is called, resulting in an empty rollback.

Idempotence:

Since any request may result in network anomalies and duplicate requests, all distributed transaction branches need to ensure idempotence.

Suspension:

Suspension means that for a distributed transaction, the second-stage Cancel interface is executed before the Try interface.

This happens because, when Try is invoked over RPC, the branch transaction is registered first and the RPC call is made afterwards. If the network is congested at that moment and the RPC times out, the TM notifies the RM to roll back the distributed transaction. The Try request may then reach the participant and actually execute only after the rollback has completed.

Let's take a look at a timing diagram of network anomalies to better understand the above problems.

  • When the business processes request 4, Cancel is executed before Try, and an empty rollback needs to be processed
  • When the business processes request 6, Cancel is executed repeatedly, which requires idempotence
  • When the business processes request 8, Try is executed after Cancel and needs to handle the suspension

Faced with these complex network anomalies, the solution commonly recommended is for the business side to use a unique key to check whether the related operation has already been completed and, if so, return success directly. The judgment logic involved is complex, error-prone, and a heavy burden on the business.

3.2 Subtransaction Barrier

The project https://github.com/yedf/dtm introduces a technique called the subtransaction barrier, which handles the anomalies above on behalf of the business. See the schematic diagram:

All these requests, when they reach the subtransaction barrier, will be filtered out if they are abnormal, and will pass through the barrier if they are normal. After developers use subtransaction barriers, all the exceptions mentioned above are properly handled. Business developers only need to focus on the actual business logic, which greatly reduces their burden.

The subtransaction barrier provides a method called ThroughBarrierCall, whose prototype is:

func ThroughBarrierCall(db *sql.DB, transInfo *TransInfo, busiCall BusiFunc)

Business developers write their own related logic in busiCall and call this function. ThroughBarrierCall ensures that busiCall will not be called in scenarios such as empty rollback and suspension. When the business is called repeatedly, there is idempotent control to ensure that it is submitted only once.

Subtransaction barriers manage TCC, SAGA, transaction messages, etc., and can also be extended to other areas

3.3 Subtransaction Barrier Principle

The principle of the subtransaction barrier is to create a branch transaction status table, sub_trans_barrier, in the local database, with a unique key composed of the global transaction id, the subtransaction id, and the branch operation (try|confirm|cancel). Each call through the barrier then proceeds as follows:

  • Open a local transaction.
  • For a Try branch, insert ignore the row gid-branchid-try. If the row is inserted, call the logic inside the barrier.
  • For a Confirm branch, insert ignore the row gid-branchid-confirm. If the row is inserted, call the logic inside the barrier.
  • For a Cancel branch, insert ignore the row gid-branchid-try, then insert ignore the row gid-branchid-cancel. If the try row was not inserted (it already existed) and the cancel row was inserted, call the logic inside the barrier.
  • If the logic inside the barrier returns success, commit the transaction and return success.
  • If the logic inside the barrier returns an error, roll back the transaction and return the error.

This mechanism solves the problems caused by network anomalies:

  • Empty compensation control: if Try was not executed and Cancel is executed directly, Cancel's insertion of gid-branchid-try succeeds, so the logic inside the barrier is skipped.
  • Idempotence control: no branch can insert the same unique key twice, so nothing is executed repeatedly.
  • Anti-suspension control: if Try arrives after Cancel, the row gid-branchid-try already exists, so Try's insertion fails and the logic inside the barrier is not executed.
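
The barrier rules can be checked with a small in-memory model in which a Go map plays the role of the sub_trans_barrier table and its unique key. This is a sketch, not DTM's implementation: `Barrier`, `NewBarrier`, and `Call` are hypothetical names, there is no real database transaction, and the model assumes a single goroutine.

```go
package main

// Barrier emulates the sub_trans_barrier table: the map keys are the unique
// keys gid-branchid-op.
type Barrier struct {
	rows map[string]bool
}

func NewBarrier() *Barrier {
	return &Barrier{rows: map[string]bool{}}
}

// insertIgnore mimics INSERT IGNORE and reports whether a new row was inserted.
func (b *Barrier) insertIgnore(key string) bool {
	if b.rows[key] {
		return false
	}
	b.rows[key] = true
	return true
}

// Call runs busi only if the request passes the barrier. op is "try",
// "confirm", or "cancel". Filtered requests return nil without calling busi.
func (b *Barrier) Call(gid, branch, op string, busi func() error) error {
	prefix := gid + "-" + branch + "-"
	var added []string // rows inserted by this call, removed again on error
	insert := func(name string) bool {
		key := prefix + name
		if b.insertIgnore(key) {
			added = append(added, key)
			return true
		}
		return false
	}

	run := false
	if op == "cancel" {
		tryInserted := insert("try")
		cancelInserted := insert("cancel")
		// Run only if Try really happened and this Cancel is the first one.
		run = !tryInserted && cancelInserted
	} else {
		run = insert(op)
	}
	if !run {
		return nil // duplicate, empty rollback, or suspended Try: keep the rows
	}
	if err := busi(); err != nil {
		for _, key := range added { // emulate rolling back the local transaction
			delete(b.rows, key)
		}
		return err
	}
	return nil
}
```

Calling `Call(gid, branch, "cancel", f)` before any Try leaves both rows in the table without running `f`, so a late `Call(gid, branch, "try", f)` is filtered as well. In the real mechanism the same guarantees come from the unique index and from running the inserts and the business SQL in one local transaction.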

For SAGA, transaction messages, etc., the mechanism is similar.

3.4 Summary of Subtransaction Barriers

The subtransaction barrier technology was first created by https://github.com/yedf/dtm. Its significance lies in designing a simple and easy-to-implement algorithm and providing a simple and easy-to-use interface. With the help of these two items, developers are completely freed from the handling of network exceptions.

This technology currently needs to be used with the yedf/dtm transaction manager. The SDK is currently available to developers of Go and Python languages. SDKs for other languages are being planned. For other distributed transaction frameworks, as long as appropriate distributed transaction information is provided, the technology can be quickly implemented according to the above principles.

4. Distributed Transaction Practice

We take the SAGA transaction introduced earlier as an example and use DTM as the transaction framework to complete a specific distributed transaction. This example uses the Go language. If you are not interested in this, you can jump directly to the summary at the end of the article.

4.1 A SAGA transaction

Let's write the core business code first to adjust the user's account balance

func qsAdjustBalance(uid int, amount int) (interface{}, error) {
    _, err := dtmcli.SdbExec(sdbGet(), "update dtm_busi.user_account set balance = balance + ? where user_id = ?", amount, uid)
    return dtmcli.ResultSuccess, err
}


Next, let's write a specific forward operation/compensation operation processing function

    app.POST(qsBusiAPI+"/TransIn", common.WrapHandler(func(c *gin.Context) (interface{}, error) {
        return qsAdjustBalance(2, 30)
    }))
    app.POST(qsBusiAPI+"/TransInCompensate", common.WrapHandler(func(c *gin.Context) (interface{}, error) {
        return qsAdjustBalance(2, -30)
    }))
    app.POST(qsBusiAPI+"/TransOut", common.WrapHandler(func(c *gin.Context) (interface{}, error) {
        return qsAdjustBalance(1, -30)
    }))
    app.POST(qsBusiAPI+"/TransOutCompensate", common.WrapHandler(func(c *gin.Context) (interface{}, error) {
        return qsAdjustBalance(1, 30)
    }))

At this point, the processing functions of each sub-transaction are ready. Next, we open the SAGA transaction and make the branch calls

    req := &gin.H{"amount": 30} // Microservice payload
    // DtmServer is the address of the DTM service
    saga := dtmcli.NewSaga(DtmServer, dtmcli.MustGenGid(DtmServer)).
        // Add a TransOut subtransaction. The forward operation is url: qsBusi+"/TransOut", and the reverse operation is url: qsBusi+"/TransOutCompensate"
        Add(qsBusi+"/TransOut", qsBusi+"/TransOutCompensate", req).
        // Add a TransIn subtransaction. The forward operation is url: qsBusi+"/TransIn", and the reverse operation is url: qsBusi+"/TransInCompensate"
        Add(qsBusi+"/TransIn", qsBusi+"/TransInCompensate", req)
    // Submit the saga transaction; dtm will complete all subtransactions or roll back all subtransactions
    err := saga.Submit()

At this point, a complete SAGA distributed transaction has been written.

If you want to run a successful example in its entirety, set up the environment according to the instructions of the yedf/dtm project and run the saga example with the following command:

go run app/main.go quick_start

4.2 Handling Network Anomalies

What happens if a brief failure occurs while dtm is calling the transfer operation of a submitted transaction? According to the SAGA transaction protocol, dtm retries unfinished operations. The failure may be a network error after the transfer operation completed, or a machine crash in the middle of the transfer operation. How can we ensure that the account balance is still adjusted correctly?

We use the subtransaction barrier function to ensure that after multiple retries, only one successful submission will occur.

We adjust the processing function to:

// sagaBarrierAdjustBalance adjusts the balance using the transaction handle passed in by the barrier callback
func sagaBarrierAdjustBalance(sdb *sql.Tx, uid int, amount int) (interface{}, error) {
    _, err := dtmcli.StxExec(sdb, "update dtm_busi.user_account set balance = balance + ? where user_id = ?", amount, uid)
    return dtmcli.ResultSuccess, err
}

func sagaBarrierTransIn(c *gin.Context) (interface{}, error) {
    return dtmcli.ThroughBarrierCall(sdbGet(), MustGetTrans(c), func(sdb *sql.Tx) (interface{}, error) {
        return sagaBarrierAdjustBalance(sdb, 1, reqFrom(c).Amount)
    })
}

func sagaBarrierTransInCompensate(c *gin.Context) (interface{}, error) {
    return dtmcli.ThroughBarrierCall(sdbGet(), MustGetTrans(c), func(sdb *sql.Tx) (interface{}, error) {
        return sagaBarrierAdjustBalance(sdb, 1, -reqFrom(c).Amount)
    })
}

The dtmcli.ThroughBarrierCall call here uses the subtransaction barrier to ensure that the callback function passed as the third parameter is processed only once.

You can try calling this TransIn service multiple times and the balance will be adjusted only once. You can run the new process by running the following command:

go run app/main.go saga_barrier

4.3 Handling Rollback

What will happen if the bank discovers an abnormality in User 2's account when preparing to transfer the amount to User 2 and returns a failure? We adjust the processing function to make the transfer operation return failure

func sagaBarrierTransIn(c *gin.Context) (interface{}, error) {
    return dtmcli.ResultFailure, nil
}

We give a timing diagram of transaction failure interaction

One question arises here: the forward operation of TransIn did nothing and returned failure. Will calling the compensation operation of TransIn then apply a reverse adjustment by mistake?

Don't worry. The subtransaction barrier ensures that if the TransIn error occurs before its transaction is committed, the compensation is a no-op. If the TransIn error occurs after the commit, the compensation operation is applied once. If TransIn is still in progress, the compensation operation waits for TransIn to commit or roll back, and then applies the compensation or performs an empty rollback accordingly.

You can change the TransIn that returns an error to:

func sagaBarrierTransIn(c *gin.Context) (interface{}, error) {
    dtmcli.ThroughBarrierCall(sdbGet(), MustGetTrans(c), func(sdb *sql.Tx) (interface{}, error) {
        return sagaBarrierAdjustBalance(sdb, 1, 30)
    })
    return dtmcli.ResultFailure, nil
}

The final result is that the balance is still correct.

5. Summary

This article introduces some basic theories of distributed transactions and explains commonly used distributed transaction solutions. The second half of the article also gives the causes, classifications, and elegant solutions for transaction exceptions. Finally, a runnable distributed transaction example is used to demonstrate the previously introduced content in a short program.
