Early one morning I was woken by a phone call: the databases of one of our projects were down and could not be started (I had slept so soundly that I missed the alert text messages). I was alarmed. The caller said that none of the MySQL master databases would start, while the slave databases were fine. He suspected that the masters were connecting to other master databases on Alibaba Cloud, because these databases had previously been migrated from Alibaba Cloud to the IDC data center. I quickly turned on my computer, connected to ***, logged in to one of the database servers, and tried to start the MySQL service.
The startup failed. I tried another database server, and it failed there too. Since none of the master databases would start, I could tentatively conclude that the problem lay with the host machine rather than with MySQL itself.

The underlying infrastructure consists of two virtualized physical nodes plus one physical machine for backup. All virtual machines on one physical node serve as MySQL masters, and the virtual machines on the other node serve as MySQL slaves.

I gave up troubleshooting inside the virtual machines and logged in to the host system instead. I investigated from two angles:

- Virtualization management backend: the storage turned out to be full, which was a serious problem.
- SSH login to the host system (Debian).
On the host, the system log /var/log/messages contained a large number of disk I/O errors.

Based on these findings, it was fairly clear that the disk was the problem: the storage space allocated by Proxmox was full, and the disk was reporting I/O errors. That left two options: fix the errors, or promote the slave databases to masters. Promoting the slaves would leave us without a standby, so the priority was to repair the masters and to fall back on promotion only if the repair failed.

Free up disk space

Why did the disk fill up? Someone must have done something on the virtual machines, and probably the same thing on each of them, which would explain why the host's disk space filled up so quickly. I logged in to one of the virtual machines running MySQL and checked its disk usage.
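The command and its output are not preserved in this write-up. As a rough sketch of this kind of check, assuming a df that supports the POSIX -P output format, the helper below (over_threshold is a hypothetical name, not something from the original environment) lists mount points at or above a given usage percentage:

```shell
# Hypothetical helper: print mount points whose usage is at or above a limit.
# It reads `df -P` style output on stdin, so it can be tried on sample text.
over_threshold() {
    # Skip the header line. In `df -P` output, column 5 is the capacity
    # (for example 93%) and column 6 is the mount point.
    # Note: mount points containing spaces are not handled by this sketch.
    awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit) print $6, $5
    }'
}
```

It would be used as: df -P | over_threshold 90. Reading from stdin rather than calling df inside the function keeps the filter easy to test against saved output.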
On the other servers, the /dev/sdb1 partition was likewise more than 90% full. I changed into the /data directory and checked how much space each subdirectory was using.
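The original command is not preserved here either. A minimal sketch of one way to rank subdirectories by size, assuming a du that accepts -d (GNU and BSD both do); largest_dirs is a hypothetical helper name:

```shell
# Hypothetical helper: show the largest entries directly under a directory.
# $1 = directory to inspect (default: current directory)
# $2 = number of lines to show (default: 10)
largest_dirs() {
    # -k: report sizes in KiB; -x: stay on one filesystem;
    # -d 1: descend one level only. The first line of the sorted
    # output is the total for the directory itself.
    du -kx -d 1 "${1:-.}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, largest_dirs /data 5 would print the total for /data followed by its four largest subdirectories.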
There were several directories of more than 50 GB each (I had already deleted them by the time I wrote this article, so no record remains). Judging from the directory names, they had been generated automatically by a database backup job. I did not stop to analyze them and deleted them first. Someone must have set up a scheduled task on the system, so I checked with crontab -l and found an entry that ran a backup script.
At first glance there was nothing wrong with the script. Looking closely, though, there was a stray "~" character in its last line, and that was the problem. The author's intention was to back up the database once a day and then delete the previous day's backup so that the disk would not fill up. With the faulty last line, the old backups evidently were not being removed, so they accumulated until the disk was full. There were two fatal mistakes here.

Wrong backup strategy
We have a dedicated backup system, and data should be backed up to it rather than to the local disk.

Wrong approach
After writing a backup script, you should run it manually to verify that it works, instead of deploying it and walking away.

Repair disk errors

I urgently contacted the data center and asked the technicians to connect a KVM console to the host machine, so that if the system failed to boot I could watch it remotely or enter single-user mode and run repairs such as fsck. I then connected to the Debian host over SSH, confirmed that the disk space had been freed, and ran reboot to restart the system. A few minutes later the system came up normally.

Subsequent operations

The system log showed no more disk I/O errors, and creating directories and files worked normally. I started each virtual machine and then the database on it, and everything came up without problems. I notified all parties and asked them to check from the business side that everything was working. Shortly afterwards I received a batch of recovery notifications by text message and felt much more at ease.

Needless to say, it was the project's SA who had done this, without notifying anyone. I spoke to him privately and asked him to explain the incident to the others. If you are going to do anything risky, it is best to let each other know beforehand.

That is how a single shell script brought down a whole batch of MySQL databases.
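As a postscript to the backup-script discussion above: the original script is not reproduced in this article, so the following is not a reconstruction of it. It is only a sketch of a cleanup step that is harder to get wrong, under several assumptions: backups are files named *.sql.gz sitting directly in one directory, find supports -maxdepth and -delete (GNU and BSD find do), and the function name prune_backups is made up for illustration. It refuses to run on a path that is not absolute or does not exist, and it defaults to a dry run so the script can be verified by hand before it goes into cron.

```shell
# Hypothetical cleanup step for a local backup job (not the original script).
# $1 = absolute path of the backup directory
# $2 = age limit in days (default: 1)
# $3 = "yes" to only list candidates (default), "no" to actually delete
prune_backups() {
    local backup_dir="$1"
    local keep_days="${2:-1}"
    local dry_run="${3:-yes}"

    # Refuse empty or relative paths, so a typo cannot point the
    # deletion at an unintended location.
    case "$backup_dir" in
        /*) ;;
        *) echo "refusing: '$backup_dir' is not an absolute path" >&2
           return 1 ;;
    esac
    if [ ! -d "$backup_dir" ]; then
        echo "refusing: '$backup_dir' does not exist" >&2
        return 1
    fi

    # -mtime +N matches files whose age in whole days is greater than N.
    if [ "$dry_run" = "yes" ]; then
        find "$backup_dir" -maxdepth 1 -type f -name '*.sql.gz' \
            -mtime +"$keep_days" -print
    else
        find "$backup_dir" -maxdepth 1 -type f -name '*.sql.gz' \
            -mtime +"$keep_days" -print -delete
    fi
}
```

A first manual run such as prune_backups /data/backup 1 yes (the path is only an example) prints what would be removed without touching anything. Only after checking that list would the third argument be switched to no.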