1. Introductiongh-ost is based on the golang language and is an open source DDL tool from GitHub. It is the abbreviation of GitHub's Online Schema Transmogrifier/Transfigurator/Transformer/Thingy, which means GitHub's online table definition converter. 1.1 PrincipleThe main implementation principle is to first create two tables, one is the shadow table _gho, gh-ost will apply the original table data and incremental data to this table, and finally switch the table name between this table and the original table, and the other is the _ghc table, which stores the changelog data, including signal markers, heartbeats, etc. Secondly, gh-ost will start two goroutines, one for copying the original table data and the other for applying the incremental binlog to the _gho table. The two goroutines run in parallel, which means you don’t need to worry about whether the data is copied first or the binlog is applied first. Because the insert statement will be adjusted here, first the insert into we copied will be rewritten as insert ignore into, and the insert into in the binlog will be rewritten as replace into, which can well support the parallelization of the two goroutines. But can such adjustments be applied to all DDLs? The answer is no. Finally, once all the original table data has been copied, gh-ost will enter the table swap phase, using a more secure atomic swap. 1.2 Process 1. Check for foreign keys and triggers. 1.3 Features1. No trigger: Monitor data changes in the table by analyzing binlog logs. 2. Lightweight: Since no triggers are used, the impact on the main database during the operation is minimal, and there is no need to worry about concurrency and locking. 3. Pausable: All write operations are controlled by gh-ost. When rate-limited, gh-ost can pause writing data to the master, create an internal tracking table, and write heartbeat events to this table with minimal system overhead. 4. Dynamic control: gh-ost can listen for requests via a unix socket file or a TCP port (configurable), and the operator can change the corresponding parameters after the command is run. 5. Auditable: The program interface can be used to obtain the status of gh-ost, report the current progress, the configuration of key parameters, the current server identification, etc. 6. Testable: gh-ost has built-in support for testing, which can be specified using the --test-on-replica flag: it will perform mutations on the replica, at the end of which gh-ost will stop replication, swap the table, reverse the swap, keep both tables in sync, and stop replication. You can test and compare the data in the two tables at your leisure. 1.4 github addresshttps://github.com/github/gh-ost/ 2. Test environment:2.1 Test ServerMain database: 110.119.120.231 From the library: 110.119.120.230 2.2 Installationcd /usr/local/src/ wget https://github.com/github/gh-ost/releases/download/v1.0.48/gh-ost-binary-linux20190214020851.tar.gz tar xzvf gh-ost-binary-linux-20190214020851.tar.gz -C /usr/local/ 2.3 Create a Usercreate user ghost@'110.%' identified by 'ghost'; grant ALL PRIVILEGES on *.* to ghost@'110.%'; flush privileges; 2.4 Command ParametersUsage of gh-ost: --aliyun-rds: Whether to execute on Alibaba Cloud Database. true --allow-master-master: Whether to allow gh-ost to run in a dual-master replication architecture. Generally used together with the -assume-master-host parameter. --allow-nullable-unique-key: Allow gh-ost to allow the unique key that the data migration depends on to be NULL. The default is to not allow NULL unique keys. If the unique key that the data migration relies on allows NULL values, incorrect data may result. Please use it with caution. --allow-on-master: Allow gh-ost to run directly on the master. The default slave that gh-ost connects to. In addition, for DDL on a single instance, a single instance is equivalent to a master database, and the --allow-on-master parameter and ROW mode need to be enabled. --alter string:DDL statement --approve-renamed-columns ALTER: If you change the name of a column, gh-ost will recognize this and ask for a reason for the column rename. By default, gh-ost will not proceed unless you provide --approve-renamed-columns ALTER. --ask-pass:MySQL password --assume-master-host string:Specify a master database for gh-ost, in the format of "ip:port" or "hostname:port". This is useful in a master-master architecture, or when gh-ost cannot find the master. --assume-rbr: If binlog_format=ROW is set for the database instance to which gh-ost is connected, you can specify -assume-rbr. This will prevent stop slave and start slave from running on the slave, and the gh-ost user does not need the SUPER privilege to execute. --check-flag --chunk-size int: Number of rows to process in each iteration (allowed range: 100-100000), default value is 1000. --concurrent-rowcount: If this parameter is True (default), after row-copying, gh-ost estimates the number of rows (using explain select count(*)) and adjusts the ETA. Otherwise, gh-ost first estimates the number of rows and then starts row-copying. --conf string: gh-ost configuration file path. --critical-load string: A comma-separated list of status-name=values. When the MySQL status exceeds the corresponding values, gh-ost will exit. -critical-load Threads_connected=20,Connections=1500 means that when the status values in MySQL Threads_connected>20,Connections>1500, gh-ost will stop and exit due to the critical load of the database. Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits --critical-load-hibernate-seconds int : When the critical load is reached, gh-ost will go into hibernation for the specified time. It will not read/write anything from any server. --critical-load-interval-millis int: When the value is 0, gh-ost will exit immediately when the critical load is reached. When the value is not 0, gh-ost will check again after -critical-load-interval-millis seconds when the -critical-load is reached. If the -critical-load is still reached after the second check, gh-ost will exit. --cut-over string: Select the cut-over type: atomic/two-step. The atomic (default) type of cut-over uses the github algorithm, and the two-step type uses the facebook-OSC algorithm. --cut-over-exponential-backoff --cut-over-lock-timeout-seconds int: The maximum lock waiting time during the cut-over phase of gh-ost. When the lock times out, gh-ost will retry the cut-over. (Default: 3) --database string: database name. --debug: debug mode. --default-retries int: The number of times various operations are retried before panicking. (Default is 60) --discard-foreign-keys: This parameter is for a table with foreign keys. When gh-ost creates a ghost table, it will not create foreign keys for the ghost table. This parameter is useful for deleting foreign keys, but otherwise use it with caution. --dml-batch-size int: Batch size for applying DML events in a single transaction (range 1-100) (default 10) --exact-rowcount: Accurately count the number of table rows (using select count(*)) to get a more accurate estimate of the time. --execute: actually execute alter & migrate table. The default is noop, which means no execution. Just test and exit. If you want the ALTER TABLE statement to be actually implemented in the database, you need to explicitly specify -execute --exponential-backoff-max-interval int --force-named-cut-over: If true, the 'unpostpone | cut-over' interactive command must name the migrated table --force-table-names string: Table name prefix to use on temporary tables --heartbeat-interval-millis int: gh-ost heartbeat frequency value, default is 500 --help --hooks-hint string: arbitrary messages injected into hooks via GH_OST_HOOKS_HINT --hooks-path string: directory where hook files are stored (default is empty, i.e. hooks are disabled). The hook will search for a hook file with the same naming convention in this directory to execute. --host string :MySQL IP/hostname --initially-drop-ghost-table: Check and drop existing ghost tables before gh-ost operation. This parameter is not recommended. Please manually process the existing ghost table. By default, this parameter is not enabled and gh-ost will exit the operation. --initially-drop-old-table: Before gh-ost operation, check and drop the existing old table. This parameter is not recommended. Please manually process the existing ghost table. By default, this parameter is not enabled and gh-ost will exit the operation. --initially-drop-socket-file:gh-ost forcefully deletes an existing socket file. This parameter is not recommended and may delete a running gh-ost program, causing DDL failures. --master-password string :MySQL master password --master-user string:MySQL master account --max-lag-millis int:Maximum delay time of master-slave replication. When the delay time of master-slave replication exceeds this value, gh-ost will take throttle measures. Default value: 1500s. --max-load string: comma separated status name = threshold, such as: 'Threads_running=100,Threads_connected=500'. When status exceeds threshold, app throttles writes --migrate-on-replica: gh-ost runs migration on the replica, not the master. --nice-ratio float: The sleep time for each chunk period, in the range [0.0…100.0]. 0: No sleep in each chunk period, that is, one chunk is executed after another; 1: Sleep for an additional 1 millisecond for each row-copy; 0.7: Sleep for an additional 7 milliseconds for each row-copy of 10 milliseconds. --ok-to-drop-table: After the gh-ost operation is completed, the old table is deleted. By default, the old table is not deleted and the _tablename_del table will exist. --panic-flag-file string: When this file is created, gh-ost will exit immediately. --password string : MySQL password --port int : MySQL port, preferably from the slave --postpone-cut-over-flag-file string: When this file exists, the cut-over phase of gh-ost will be postponed and data will still be replicated until the file is deleted. --quiet: Quiet mode. --replica-server-id uint : gh-ost's server_id --replication-lag-query string: Deprecated --serve-socket-file string: Absolute path to gh-ost's socket file. --serve-tcp-port int: Port to use for gh-ost. Default is closed port. --skip-foreign-key-checks: Set to 'true' when you are sure there are no foreign keys on your table, and want to skip gh-ost validation. --skip-renamed-columns ALTER: If you change the name of a column (e.g., change column), gh-ost will recognize this and ask for a reason for the column rename. By default, gh-ost will not proceed with the command. This parameter tells gh-ost to skip the migration of the column and treat the renamed column as a don’t care column. This operation is dangerous, you will lose all values in the column. --stack: Add error stack trace. --switch-to-rbr: Tell gh-ost to automatically switch the binlog_format of the slave repository to ROW format. --table string: table name --test-on-replica: Test gh-ost on the slave, including data migration on the slave. After the data migration is complete, stop the slave, and immediately swap the original table with the ghost table and then immediately swap back. Continue to stop the slave so that you can compare the two tables. --test-on-replica-skip-replica-stop: When -test-on-replica is executed, this parameter means that the slave does not need to be stopped during the process. --throttle-additional-flag-file string: When this file is created, gh-ost operation will stop immediately. This parameter can be used when multiple gh-ost operations are running at the same time to create a file and stop all gh-ost operations, or delete the file and resume all gh-ost operations. --throttle-control-replicas string: Lists all slave libraries that need to be checked for master-slave replication delay. --throttle-flag-file string: When this file is created, gh-ost operation will stop immediately. This parameter is suitable for controlling a single gh-ost operation. -throttle-additional-flag-file string is suitable for controlling multiple gh-ost operations. --throttle-http string --throttle-query string: Throttle query. Executes once every second. When the return value = 0, no throttling is required. When the return value > 0, throttling needs to be performed. This query will be run on the migrated server, so please make sure it is lightweight. --timestamp-old-table: Use timestamp in old table names. This will make cross-migration of old table names unique and conflict-free --tungsten: Tells gh-ost that you are running a tungsten-replication topology. --user string :MYSQL user --verbose --version 3. Operation ModeMode 1: Connect to the slave database and modify it on the master database This is the default mode in which gh-ost will look for slaves, find the master of the cluster and connect to it. The specific steps for the modification operation are: 1. Read and write row data on the master database; 2. Read binary log events on the slave database and apply the changes to the master database; 3. Check the table format, fields, primary key, total number of rows, etc. on the slave database; 4. Read gh-ost internal event logs (such as heartbeats) on the slave. 5. Complete table switching on the main database. If your main database log format is SBR, the tool can also work normally. But the slave must be configured with binary logging enabled (log_bin, log_slave_updates) and binlog_format=ROW (gh-ost is a binary file that reads from the slave). Example of use: # gh-ost --initially-drop-old-table --initially-drop-ghost-table --user="ghost" --password="ghost" --host=110.119.120.230 --port=3306 --database="test" --table="t1" --verbose --alter="ADD COLUMN y1 varchar(10),add column y2 int not null default 0 comment 'test' " --assume-rbr --execute Parameter meaning: --initially-drop-old-table: Check and drop existing old tables before gh-ost operation. --initially-drop-ghost-table: Check and drop existing ghost tables before gh-ost operation. --verbose: Output log of execution process --assume-rbr: If binlog_format=ROW is set for the database instance to which gh-ost is connected, you can specify -assume-rbr. This will avoid running stop slave and start slave on the slave. The user who executes gh-ost does not need the SUPER privilege. Mode 2: Modify directly on the main database If there is no slave library, or you do not want to operate on the slave library, you can use the master library directly. gh-ost will do all operations directly on the master. You can still view the master-slave replication delay above. 1) The master database must generate binary logs in Row format 2) You must use the --allow-on-master option when starting gh-ost to enable this mode. # gh-ost --initially-drop-old-table --initially-drop-ghost-table --user="ghost" --password="ghost" --host="110.119.120.231" --port=3306 --database="test" --table="t2" --verbose --alter="add column test_field varchar(256) default '';" --exact-rowcount --serve-socket-file=/tmp/gh-ost.t2.sock --panic-flag-file=/tmp/ghost.panic.t2.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.t2.flag --allow-on-master --execute Parameter meaning: --exact-rowcount: Accurately count the number of table rows (using select count(*)) to get a more accurate estimate of the time. --serve-socket-file: The absolute path to the gh-ost socket file. For example: --serve-socket-file=/tmp/gh-ost.t1.sock creates a socket file for listening and adjusts parameters through the interface. When the load and latency increase during the operation, the operation has to be terminated and parameters such as chunk-size have to be reconfigured. Then the operation command can be re-executed. Dynamic adjustments can be made through the socket interface. #pause echo throttle | socat - /tmp/gh-ost.t1.sock #recover echo no-throttle | socat - /tmp/gh-ost.t1.sock Modify the speed limit parameters: echo chunk-size=1500 | socat - /tmp/gh-ost.t1.sock echo max-load=Thread_running=30 | socat - /tmp/gh-ost.t1.sock
--postpone-cut-over-flag-file: When this file exists, the cut-over phase of gh-ost will be postponed. Data will still be copied, but the table will not be switched until this file is deleted. --allow-on-master: Allow gh-ost to run directly on the master. Mode 3: Modify and test on the slave This mode will make changes on the slave database. All operations are performed on the slave database and will not have any impact on the master database. During the operation, gh-ost will also pause from time to time so that the replica data can be kept up to date. --test-on-replica indicates that the operation is for testing purposes only. Replication is stopped before the final switchover operation. The original table and temporary table will switch back and forth, and ultimately the original table will remain unchanged. When the master-slave replication is paused, you can check and compare the data in the two tables (if you do not want to stop the slave, you can add the parameter --test-on-replica-skip-replica-stop). # gh-ost --initially-drop-old-table --initially-drop-ghost-table --user="ghost" --password="ghost" --host=110.119.120.230 --port=3306 --database="test" --table="t3" --verbose --alter="ADD COLUMN abc1 varchar(10),add column abc2 int not null default 0 comment 'test' " --test-on-replica --assume-rbr --execute RDS limitations: 1. The user does not have the Super privilege, so the --assume-rbr command must be added during use. gh-ost will assume that the binlog itself is in row mode and will not modify it. The binlog on Alibaba Cloud RDS is also in row mode by default, so there is no problem. 2. Other permissions, mainly REPLICATION SLAVE, REPLICATION CLIENT can pull binlog and also obtain it. 3. Unable to connect to the standby database to pull binlog. The standby database is usually transparent to the user, so gh-ost needs to connect directly to the primary database, which may increase the load on the primary database. When using it, you need to add --allow-on-master and --assume-master-host. The officially recommended method is to connect to one of the backup databases, because there will be some high-pressure SELECT operations, and it is best to put them in the backup database. 4. To execute on Alibaba Cloud Database, you need to add a parameter --aliyun-rds. If you are using it now, remember to add the following parameters: --allow-on-master --assume-rbr --assume-master-host --aliyun-rds 4. Comparison between gh-ost and pt-osc4.1 A brief introduction to pt-osc pt-osc workflow pt-osc tool limitations So what are the specific advantages of gh-ost over pt-osc? The following is a brief introduction to its two most core features. 4.2 TriggerlessBefore the emergence of gh-ost, third-party MySQL DDL tools were all implemented using triggers, including Percona's pt-osc, Facebook's OSC, and so on. The mechanism used by gh-ost is completely different from them: it synchronizes data through MySQL binlog. gh-ost itself is registered as a fake slave, which can pull binlog from the master or slave in the cluster, parse it in real time, and reapply all DML operations of the change table to the shadow table. Therefore, for DML operations that occur on the change table during the release, the performance overhead caused by triggers and lock contention can be completely avoided. In addition, we usually choose the slave node in the cluster as the target publishing machine, and the slave generally does not carry any business. In this way, the binlog parsing overhead will not fall on the master providing the business, but is only an asynchronous DML statement replay. 4.3 Dynamically controllableAnother most important feature is dynamic control, which was not previously available in other third-party open source tools. When publishing through pt-osc before, the parameters cannot be modified after the command is executed unless it is stopped and restarted. Assume that the release is 90% complete and suddenly the server load increases due to various other reasons. In order not to affect the business, you can only choose to stop the release and wait for performance to recover before restarting. The tables published through pt-osc are all large tables and take a long time, so it is very embarrassing to encounter such scenarios. Therefore, it is very important if the parameters in the release can be dynamically adjusted. gh-ost also implements a socket server. We can interact with the publishing process in real time through the socket. It can support real-time pause, resume, and dynamic adjustment of many parameters to adapt to external changes. 5. References1. gh-ost principle https://www.cnblogs.com/mysql-dba/p/9901589.html 2. Technology sharing | gh-ost online ddl change tool https://zhuanlan.zhihu.com/p/83770402 3. Practical skills | The evolution of Ctrip’s database publishing system https://blog.csdn.net/ctrip_tech/article/details/108395676 4.MySQL Online DDL gh-ost Usage Instructions https://www.cnblogs.com/zhoujinyi/p/9187421.html 5.MySQL--pt-osc tool learning https://www.cnblogs.com/TeyGao/p/7160421.html This is the end of this article about the MySQL online DDL tool gh-ost. For more information about MySQL online DDL gh-ost, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Vue realizes simple effect of running light
>>: A brief discussion on CSS height collapse problem
MySQL 8.0: MVCC for Large Objects in InnoDB In th...
1. What is the use of slow query? It can record a...
Preface The reason for writing this article is mai...
This article is just to commemorate those CSS que...
Dynamically implement a simple secondary menu Whe...
It's embarrassing to say that I had to search ...
Table of contents definition structure Examples C...
This article will introduce how to use explain to...
Table of contents Preface Development Environment...
This article introduces 5 ways to solve the 1px b...
Table of contents Overview The history of CPU-bou...
Table of contents 1 The common rules for creating...
Background of the problem The server monitoring s...
When we write pages, we sometimes find that the C...
There are some tags in XHTML that have similar fu...