Currently, almost all large websites and applications are deployed in a distributed manner, and data consistency in distributed scenarios has always been an important topic. The CAP theorem tells us that no distributed system can guarantee consistency, availability, and partition tolerance all at once; at most two of the three can be satisfied at the same time. Therefore, many systems have to make trade-offs among these three properties at design time. In most Internet scenarios, strong consistency is sacrificed in exchange for high availability, and the system only needs to guarantee "eventual consistency", as long as the time it takes to reach consistency is acceptable to users.

To guarantee eventual consistency we need many supporting techniques, such as distributed transactions and distributed locks. Sometimes we need to ensure that a method can only be executed by one thread at a time. In a single-machine environment, Java provides many concurrency APIs for this, but they are powerless in distributed scenarios: the pure Java API cannot provide distributed locking. The following solutions are commonly used to implement distributed locks:

- Distributed locks based on a database
- Distributed locks based on a cache (Redis, Memcached, Tair)
- Distributed locks based on Zookeeper

Before analyzing these solutions, let's think about what kind of distributed lock we need. (Here we take a method lock as an example; the same applies to a resource lock.) It must ensure that, in a distributed application cluster, the same method can only be executed by one thread on one machine at any given time.

Implementing distributed locks based on a database

Based on a database table

The simplest way to implement a distributed lock is probably to create a lock table and operate on the records in that table. Create a table like this:

```sql
CREATE TABLE `methodLock` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `method_name` varchar(64) NOT NULL DEFAULT '' COMMENT 'The locked method name',
  `desc` varchar(1024) NOT NULL DEFAULT '' COMMENT 'Remarks',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Record time, automatically generated',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uidx_method_name` (`method_name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Locking method';
```

When we want to lock a method, execute the following SQL:

```sql
insert into methodLock(method_name, `desc`) values ('method_name', 'desc');
```

Because method_name has a unique constraint, if multiple requests are submitted to the database at the same time, the database guarantees that only one insert succeeds. We can then consider that the thread whose insert succeeded has obtained the lock on the method and may execute the method body. When the method finishes and the lock needs to be released, execute the following SQL:

```sql
delete from methodLock where method_name = 'method_name';
```

A minimal JDBC sketch of this approach is shown below.
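To make this concrete, here is a minimal sketch of the table-based lock using plain JDBC. It is not from the original article: the MethodLockDao class name and the DataSource wiring are illustrative assumptions; only the methodLock table and its columns come from the SQL above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class MethodLockDao {

    private final DataSource dataSource; // assumed to be configured elsewhere

    public MethodLockDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Try to acquire the lock: the unique index on method_name guarantees
    // that only one concurrent insert can succeed.
    public boolean tryLock(String methodName) {
        String sql = "insert into methodLock(method_name, `desc`) values (?, ?)";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, methodName);
            ps.setString(2, "lock for " + methodName);
            return ps.executeUpdate() == 1;
        } catch (SQLException e) {
            // A duplicate-key error means another node already holds the lock.
            return false;
        }
    }

    // Release the lock by deleting the record.
    public void unlock(String methodName) {
        String sql = "delete from methodLock where method_name = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, methodName);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException("Failed to release lock for " + methodName, e);
        }
    }
}
```

A caller would wrap the protected code in try/finally so that the delete always runs; even so, a crash between tryLock and unlock still leaves the record behind, which is one of the problems discussed next.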
The above simple implementation has the following problems:

1. The lock is strongly dependent on the availability of the database. The database is a single point; once it fails, the business system becomes unavailable.
2. The lock has no expiration time. If the unlock operation fails (for example the holder crashes before deleting the record), the record stays in the table forever and the lock can never be acquired again.
3. The lock is non-blocking. The insert either succeeds or fails immediately; a thread that fails to acquire the lock is not queued and has to retry by itself.
4. The lock is non-reentrant. Because the record already exists, the same thread cannot acquire the lock again before releasing it.

Of course, these problems can also be solved in other ways.

Is the database a single point? Deploy two databases that synchronize data in both directions, and switch quickly to the standby database once the primary fails.

Based on a database exclusive lock

In addition to inserting and deleting records, we can also use the locks built into the database. We still use the methodLock table created above and implement the distributed lock through the database's exclusive lock. With the MySQL InnoDB engine, locking can be implemented as follows:

```java
public boolean lock() throws SQLException, InterruptedException {
    connection.setAutoCommit(false);
    String sql = "select * from methodLock where method_name = ? for update";
    while (true) {
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, methodName);
            ResultSet rs = ps.executeQuery();
            if (rs.next()) {
                // This transaction now holds the exclusive row lock: lock acquired.
                return true;
            }
        } catch (SQLException e) {
            // query failed, fall through and retry
        }
        Thread.sleep(1000);
    }
}
```

Adding for update after the query makes the database place an exclusive lock on the matched rows during the query. (Note that InnoDB only uses row-level locks when the search goes through an index; otherwise it uses a table-level lock. Since we want a row-level lock, there must be an index on method_name, and it should be a unique index, otherwise overloaded methods sharing the same name could not be locked independently. For overloaded methods it is recommended to include the parameter types in the name as well.) Once a record is exclusively locked, no other connection can place an exclusive lock on that row, so we can consider that the thread holding the exclusive lock holds the distributed lock and may execute the method body. When the method finishes, the lock is released as follows:

```java
public void unlock() throws SQLException {
    connection.commit();
}
```

The lock is released by connection.commit(). This approach effectively solves the problems of locks that cannot be released and of non-blocking locks mentioned above: if the service crashes, the database connection is closed, the uncommitted transaction rolls back, and the exclusive lock is released automatically; and a for update query returns immediately once it obtains the lock, otherwise it stays blocked until it succeeds, so waiting threads block instead of failing. However, it still does not directly solve the single-point problem or the reentrancy problem.

There may be another problem. Although we use a unique index on method_name and explicitly write for update, MySQL optimizes queries: even if an indexed field appears in the condition, whether the index is actually used is decided by comparing the cost of different execution plans. If MySQL decides that a full table scan is cheaper, for example on a very small table, it will not use the index, and InnoDB will then take a table lock instead of a row lock, which would be unfortunate.

Another problem: with exclusive locks, a transaction that holds the lock without committing for a long time occupies a database connection. Once there are too many such connections, the database connection pool may be exhausted. A short usage sketch of this lock()/unlock() pair is shown below.
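For completeness, here is a hedged sketch of how the lock()/unlock() pair above might be used around a critical section. The dbLock object is hypothetical: it is assumed to wrap the two methods and hold its own dedicated connection, so that committing to release the lock does not interfere with other database work.

```java
// Hypothetical caller of the exclusive-lock variant above.
if (dbLock.lock()) {
    try {
        // business logic protected by the lock
    } finally {
        dbLock.unlock(); // commit the transaction, releasing the exclusive row lock
    }
}
```

The try/finally matters: without it, an exception in the business logic would leave the transaction open, keeping the row locked and the connection occupied.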
Summarize

To summarize the database-based approaches: both rely on a table in the database. One decides whether a lock is held based on whether a record exists in the table; the other uses the database's exclusive lock.

Advantages of database-based distributed locks: they rely directly on the database and are easy to understand.

Disadvantages of database-based distributed locks: there are all kinds of problems to handle, and in the process of solving them the whole solution becomes more and more complicated.

Implementing distributed locks based on cache

Compared with the database-based solution, a cache-based distributed lock performs better. Moreover, many caches can be deployed as clusters, which solves the single-point problem. There are many mature cache products, including Redis, Memcached, and Tair (used within our company). Here we take Tair as an example; there are many articles online about Redis- and Memcached-based implementations, as well as mature frameworks and algorithms that can be used directly. The Tair-based implementation is similar to the Redis one, and mainly relies on the TairManager.put method:

```java
public boolean tryLock(String key) {
    ResultCode code = ldbTairManager.put(NAMESPACE, key, "This is a Lock.", 2, 0);
    return ResultCode.SUCCESS.equals(code);
}

public void unlock(String key) {
    ldbTairManager.invalid(NAMESPACE, key);
}
```

This implementation also has problems:

1. The lock has no expiration time. Once an unlock operation fails, the lock record stays in Tair and other threads can never acquire the lock again.

Of course, this problem can also be solved.

No expiration time? Tair's put method accepts an expiration time, and the data is deleted automatically once that time is reached. But how long should the expiration be? If it is too short, the lock may be released automatically before the method finishes executing, causing concurrency problems; if it is too long, other threads waiting for the lock may wait longer than necessary. The same problem exists when a database is used to implement the lock.

Summarize

A cache can be used instead of a database to implement distributed locks with better performance, and many cache services are deployed as clusters, which avoids the single-point problem. Many cache services also provide primitives that suit distributed locks, such as Tair's put method and Redis's setnx method, and they support automatic deletion of expired data, so a timeout can be set directly to control the release of the lock.

Advantages of cache-based distributed locks: good performance and easy to implement.

Disadvantages of cache-based distributed locks: controlling the lock expiration through a timeout is not very reliable. A hedged Redis sketch of this pattern is shown below.
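Since the article mentions Redis's setnx and expiration-based release, here is a minimal sketch of the same pattern using the Jedis 3.x client; the key, value token, and timeout are illustrative assumptions, not part of the original text. Modern Redis allows setting the value and the expiration atomically with SET ... NX PX, which avoids a gap between setnx and expire.

```java
import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLockSketch {

    private final Jedis jedis;

    public RedisLockSketch(Jedis jedis) {
        this.jedis = jedis;
    }

    // Acquire: set the key only if it does not exist (NX), with a TTL (PX)
    // so the lock is released even if the holder crashes. Returns a token
    // identifying the holder, or null if the lock is already taken.
    public String tryLock(String key, long ttlMillis) {
        String token = UUID.randomUUID().toString();
        String result = jedis.set(key, token, SetParams.setParams().nx().px(ttlMillis));
        return "OK".equals(result) ? token : null;
    }

    // Release: delete the key only if it still holds our token, so we never
    // remove a lock that has expired and been acquired by someone else.
    public boolean unlock(String key, String token) {
        String script =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";
        Object result = jedis.eval(script,
                Collections.singletonList(key), Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}
```

The token-and-script check reduces the risk of deleting someone else's lock, but the fundamental problem described above remains: if the method runs longer than the TTL, the lock expires while the work is still in progress.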
Implementing distributed locks based on Zookeeper

Distributed locks can also be implemented with Zookeeper's ephemeral sequential nodes. The general idea: when a client wants to lock a method, it creates a unique ephemeral sequential node under the directory node that corresponds to the method on Zookeeper. Determining whether the lock has been acquired is simple: the client whose node has the smallest sequence number holds the lock. To release the lock, the client just deletes its ephemeral node. This also avoids deadlocks caused by locks that cannot be released when a service goes down. Let's see whether Zookeeper can solve the problems mentioned above.

The lock cannot be released? Zookeeper effectively solves this problem, because the client creates an ephemeral node when it locks. If the client crashes after acquiring the lock (its session is disconnected), the ephemeral node is deleted automatically and other clients can acquire the lock again.

Not reentrant? Zookeeper also effectively solves the reentrancy problem. When creating its node, the client writes the current host and thread information directly into the node. The next time it wants to acquire the lock, it compares this data with the data in the current smallest node; if the information matches its own, it obtains the lock directly, otherwise it creates another ephemeral sequential node and joins the queue.

Single point? Zookeeper effectively solves the single-point problem. ZK is deployed as a cluster, and as long as more than half of the machines in the cluster are alive it can keep serving requests.

Non-blocking? A blocking lock can also be built on Zookeeper: the client registers a watcher on the node with the next smaller sequence number and is notified when that node is deleted, at which point it checks again whether its own node is now the smallest.

You can use Curator, a third-party Zookeeper client, directly; it encapsulates a reentrant lock service:

```java
public boolean tryLock(long timeout, TimeUnit unit) {
    try {
        return interProcessMutex.acquire(timeout, unit);
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
}

public boolean unlock() {
    try {
        interProcessMutex.release();
        return true;
    } catch (Throwable e) {
        log.error(e.getMessage(), e);
        return false;
    } finally {
        executorService.schedule(new Cleaner(client, path), delayTimeForClean, TimeUnit.MILLISECONDS);
    }
}
```

The InterProcessMutex provided by Curator is an implementation of a distributed lock: the acquire method acquires the lock and the release method releases it.

A distributed lock implemented with ZK seems to meet all the expectations for a distributed lock listed at the beginning of this article. However, that is not entirely true. A Zookeeper-based lock has a disadvantage: its performance may not be as high as that of a cache-based lock, because every acquisition and release requires ephemeral nodes to be created and destroyed, and in ZK nodes can only be created and deleted by the Leader server, after which the change has to be synchronized to all the Follower machines.

Zookeeper can in fact also cause concurrency problems, although they are not common. Consider this situation: due to network jitter, the session between the client and the ZK cluster is broken. ZK then believes the client has died and deletes the ephemeral node, at which point another client can acquire the distributed lock and concurrent execution becomes possible. This problem is not common because ZK has a retry mechanism: once the ZK cluster stops receiving the client's heartbeat it retries, and the Curator client supports several retry policies; the ephemeral node is deleted only after multiple retries have failed. (It is therefore important to choose a suitable retry policy and to find a balance between the granularity of the lock and the level of concurrency.) A hedged sketch of wiring up the Curator client and InterProcessMutex is shown below.
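The article's snippet assumes an existing interProcessMutex field. As a hedged sketch of how that field could be set up, here is a small, self-contained example using Curator; the connection string, lock path, retry settings, and timeout are illustrative assumptions.

```java
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorLockExample {

    public static void main(String[] args) throws Exception {
        // Illustrative connection string and retry policy (1s base sleep, 3 retries).
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "127.0.0.1:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One lock path per protected method, e.g. /locks/method_name.
        InterProcessMutex mutex = new InterProcessMutex(client, "/locks/method_name");

        // acquire(timeout, unit) returns false if the lock is not obtained in time.
        if (mutex.acquire(3, TimeUnit.SECONDS)) {
            try {
                // business logic protected by the lock
            } finally {
                mutex.release();
            }
        }

        client.close();
    }
}
```

Under the hood, InterProcessMutex creates the ephemeral sequential nodes and watches described above, and it is reentrant for the thread that already holds the lock.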
Summarize

Advantages of Zookeeper-based distributed locks: they effectively solve the single-point problem, the non-reentrancy problem, the non-blocking problem, and the problem of locks that cannot be released, and the implementation is relatively simple.

Disadvantages of Zookeeper-based distributed locks: the performance is not as good as a cache-based lock, and you need some understanding of how ZK works.

Comparison of the three solutions

None of the approaches above is perfect. Just as with CAP, complexity, reliability, performance, and so on cannot all be satisfied at the same time, so the best approach is to choose the one that fits the specific application scenario.

- From the perspective of difficulty of understanding (low to high): database < cache < Zookeeper
- From the perspective of implementation complexity (low to high): Zookeeper ≈ cache < database
- From the perspective of performance (high to low): cache > Zookeeper > database
- From the perspective of reliability (high to low): Zookeeper > cache > database

Summarize

The above is the full content of this article's discussion of the principles of distributed locks and the three ways to implement them. I hope it is helpful; if you have any questions, feel free to leave a comment.