Canal is an open-source project from Alibaba, written in Java. Its main purpose is to provide incremental data subscription and consumption by parsing a database's incremental logs; currently it mainly supports MySQL. GitHub address: https://github.com/alibaba/canal

Before looking at Canal's internals, recall how MySQL master/slave replication works:

1. The MySQL master enables the binlog mechanism and writes data changes to its binary log (as binary log events, viewable with show binlog events).
2. The MySQL slave's I/O thread copies the master's binary log events into its own relay log.
3. The MySQL slave's SQL thread replays the events in the relay log, applying the changes to its own data.

How Canal works:

1. Canal implements the slave side of the MySQL replication protocol: it pretends to be a MySQL slave and sends a dump request to the MySQL master.
2. The MySQL master accepts the dump request and starts pushing binary log events to the "slave" (that is, Canal).
3. Canal parses the binary log (originally a byte stream) into structured objects.

In short, Canal obtains data by impersonating a MySQL slave and listening to the MySQL binlog. With the binlog set to row mode, every executed Insert/Update/Delete can be captured, together with the row data before and after the change. This property is what lets Canal track changes in MySQL data efficiently.
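As a minimal sketch of the row-mode prerequisite mentioned above (file names and the server-id are placeholders, not values from this article), the relevant my.cnf settings look roughly like this:

```ini
[mysqld]
# Enable the binary log; the file-name prefix is arbitrary
log-bin=mysql-bin
# ROW mode records the before/after image of every changed row,
# which is what Canal needs to reconstruct each change
binlog-format=ROW
# Every participant in a replication topology (including Canal
# acting as a slave) must see a unique server id
server-id=1
```

Canal then connects with a MySQL account that has replication privileges, as any ordinary slave would.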
Canal architecture:

Note: a server represents one running Canal instance, corresponding to one JVM; an instance corresponds to one data queue (1 server corresponds to 1..n instances). The main components of an instance are:

- EventParser: data-source access; simulates the slave protocol, interacts with the master, and parses the protocol.
- EventSink: the connector between Parser and Store; mainly filters, processes, and distributes data.
- EventStore: responsible for storage.
- MemoryMetaManager: manages incremental subscription and consumption metadata.

Event Parser design:

The parser process can be divided roughly into these steps:

1. Connection obtains the log position of the last successful parse (on first startup, it uses the initially specified position or the current binlog position of the database).
2. Connection establishes a connection to the MySQL master and sends a BINLOG_DUMP request.
3. MySQL starts pushing binary log events.
4. The received binary log is parsed by BinlogParser, which supplements it with specific information such as field names, field types, primary-key information, and unsigned-type handling.
5. The parsed data is passed to the EventSink component for storage (a blocking operation that returns only once storage succeeds).
6. The binlog position is recorded periodically so that incremental subscription can resume after a restart.

If the master being synchronized crashes, Canal can continue pulling the binlog from one of the master's other slaves, avoiding a single point of failure.

Event Sink design:

EventSink's main functions are:

- Data filtering: supports wildcard patterns on table names, field contents, etc.
- Data routing/distribution: solves the 1:n case (one parser feeding multiple stores).
- Data merging: solves the n:1 case (multiple parsers feeding one store).
- Data processing: extra processing before data enters the store, such as joining data.

1:n business: to make rational use of database resources, ordinary businesses are usually isolated by schema, with data-source routing performed above MySQL or at the DAO layer to hide the physical location of the database from developers (Alibaba mainly uses cobar/tddl to solve data-source routing). As a result, a single database instance typically hosts multiple schemas, and each schema may be watched by one or more business parties.

n:1 business: likewise, once a business's data reaches a certain scale, horizontal and vertical sharding become unavoidable. Processing such sharded data means connecting to multiple stores, so there are multiple consumption positions and the ordering of data consumption cannot be fully guaranteed. In some business scenarios the sharded incremental data therefore needs to be merged back together, for example by sorting and merging on a timestamp or a global ID.

Event Store design:

Multiple storage modes are supported, such as an in-memory mode. The memory mode saves messages in a ring, borrowing the implementation idea of Disruptor's RingBuffer.
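The timestamp-based n:1 merge described above can be sketched as a k-way merge over per-shard event streams. This is a simplified illustration, not Canal's actual EventSink code; real merging must also deal with clock skew and equal timestamps, which are glossed over here:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Merge several per-shard streams of (timestamp, eventId) pairs,
// each already ordered by timestamp, into one globally ordered
// stream -- the kind of merge an n:1 EventSink performs.
public class TimestampMerge {
    // Each event is a long[2]: {timestamp, eventId}.
    public static List<long[]> merge(List<List<long[]>> shards) {
        // Min-heap of {shardIndex, elementIndex}, ordered by the
        // timestamp of the element each head points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            Comparator.comparingLong(h -> shards.get(h[0]).get(h[1])[0]));
        for (int s = 0; s < shards.size(); s++)
            if (!shards.get(s).isEmpty()) heads.add(new int[]{s, 0});

        List<long[]> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] h = heads.poll();
            out.add(shards.get(h[0]).get(h[1]));
            // Advance this shard's stream if it has more events.
            if (h[1] + 1 < shards.get(h[0]).size())
                heads.add(new int[]{h[0], h[1] + 1});
        }
        return out;
    }
}
```

Sorting on a global ID instead of a timestamp only changes the comparator key.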
RingBuffer design:

Three cursors are defined:

- put: the last position written by the Sink module (the cursor for synchronously written data).
- get: the last position fetched by data subscription (the cursor for synchronously fetched data).
- ack: the last position of successfully consumed data.

Borrowing Disruptor's RingBuffer implementation, the ring can be "straightened out" into a linear sequence of ever-growing cursor values.

Implementation notes:

- The put/get/ack cursors only increase and are stored as longs; the invariant among the three is put >= get >= ack.
- The buffer slot for a get operation is located via modulo, or via a bitwise AND: cursor & (size - 1), which requires size to be a power of 2 and is more efficient.

Instance design:

An instance represents one actual running data queue, comprising EventParser, EventSink, EventStore, and other components. CanalInstanceGenerator is abstracted out mainly to support different configuration-management styles:

- Manager style: connects to an internal web console/manager system (currently used mainly inside the company).
- Spring style: defined via Spring XML + properties, building a Spring configuration.

Server design:

A server represents one running Canal instance. To make componentized use easier, two implementations are abstracted: Embedded and Netty (network access).

Incremental subscription/consumption design: for the concrete protocol format, please refer to CanalProtocol.proto.
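The cursor arithmetic above can be illustrated with a minimal ring-buffer sketch. The names are invented for this sketch and this is not Canal's actual MemoryEventStore code; it only demonstrates the three monotonically growing cursors and the power-of-two slot addressing:

```java
// Minimal ring-buffer sketch with put/get/ack cursors.
// Invariant: put >= get >= ack; size must be a power of 2 so that
// (cursor & (size - 1)) is equivalent to (cursor % size).
public class RingSketch {
    private final Object[] slots;
    private final int mask;
    private long put, get, ack; // monotonically increasing longs

    public RingSketch(int size) {
        if (Integer.bitCount(size) != 1)
            throw new IllegalArgumentException("size must be a power of 2");
        slots = new Object[size];
        mask = size - 1;
    }

    // Sink writes; refuses when un-acked data would be overwritten.
    public boolean put(Object event) {
        if (put - ack >= slots.length) return false; // buffer full
        slots[(int) (put & mask)] = event;
        put++;
        return true;
    }

    // Client fetches the next event (null if caught up with put).
    public Object get() {
        if (get >= put) return null;
        return slots[(int) (get++ & mask)];
    }

    // Client acknowledges everything fetched so far, freeing slots.
    public void ack() { ack = get; }

    public long putCursor() { return put; }
    public long getCursor() { return get; }
    public long ackCursor() { return ack; }
}
```

Note that a slot is only reusable once it falls behind ack, which is why a slow consumer eventually blocks the writer.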
Data object format (EntryProtocol.proto):

Entry
    Header
        logfileName [binlog file name]
        logfileOffset [binlog position]
        executeTime [timestamp of the change recorded in the binlog]
        schemaName [schema/database name]
        tableName [table name]
        eventType [insert/update/delete type]
    entryType [transaction header BEGIN / transaction tail END / data ROWDATA]
    storeValue [byte data, extensible; the corresponding type is RowChange]

RowChange
    isDdl [whether this is a DDL change, such as create table/drop table]
    sql [the concrete DDL sql]
    rowDatas [the concrete insert/update/delete row changes; may be multiple, since one binlog event can correspond to multiple changed rows, e.g. in batch operations]
        beforeColumns [array of Column]
        afterColumns [array of Column]

Column
    index [column number]
    sqlType [jdbc type]
    name [column name]
    isKey [whether it is a primary-key column]
    updated [whether the value changed]
    isNull [whether the value is null]
    value [the concrete content; note that it is text]

Notes on the above:

1. Canal provides the field contents both before and after a database change, and supplements information that is not in the binlog itself, such as field names and isKey.
2. It can also provide DDL change statements.

Canal HA mechanism:

Canal's HA implementation relies on Zookeeper and covers both the Canal server and the Canal client:

- Canal server: to reduce the number of MySQL dump requests, a given instance may be running on only one server at a time; the copies of that instance on other servers stay in standby.
- Canal client: to guarantee ordering, an instance can only be get/ack/rollback'ed by one Canal client at a time; otherwise the order in which the client receives data cannot be guaranteed.

Canal Server HA: the servers coordinate by competing to create an EPHEMERAL node in Zookeeper. Whichever server creates the node runs the instance; when that server's session ends the node disappears, and the remaining servers compete again.
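The preemption idea can be mimicked in-process with an atomic compare-and-set. This is a loose analogy only: real Canal HA uses Zookeeper EPHEMERAL nodes, whose automatic deletion on session loss is what triggers failover, and the class below is invented for illustration:

```java
import java.util.concurrent.atomic.AtomicReference;

// In-process analogy of EPHEMERAL-node preemption: the first server
// to "create the node" (CAS null -> name) becomes active; releasing
// it (like a Zookeeper session expiring) lets another server win.
public class LeaderSketch {
    private final AtomicReference<String> node = new AtomicReference<>();

    // Try to become the running server for an instance.
    public boolean tryLead(String server) {
        return node.compareAndSet(null, server);
    }

    // Simulates the EPHEMERAL node vanishing when the leader dies.
    public void release(String server) {
        node.compareAndSet(server, null);
    }

    public String leader() { return node.get(); }
}
```

With Zookeeper, the standby servers would additionally watch the node so they are notified the moment it disappears, rather than polling.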
The Canal client side works the same way as the Canal server side: it also uses Zookeeper's EPHEMERAL-node preemption for control.