I have been taking a lot of MySQL notes recently, mainly because the company's Oracle databases are relatively stable and need little maintenance. Last week I was assigned to migrate hundreds of millions of rows of MySQL data, so I took the opportunity to write up my study notes.

Data migration works on the same principles and technical foundations as data export, BI reporting and so on; the biggest difference is the volume of data involved. Report data generally stays below a few million rows, while data migration at Internet companies often involves tens or hundreds of millions of rows. Importing and exporting are two separate processes, and even in a migration they should be looked at separately. The methods fall into two groups:

1. MySQL's built-in import/export statements
2. Import/export through various client tools

First, a summary of the export side:

1. For data with few fields or short field contents, you can export through a client tool such as Navicat. Here I exported three fields, all numeric values of at most 11 digits, and Navicat exported roughly 2.5 million rows per minute.

2. MySQL's built-in export statement, SELECT ... INTO OUTFILE:

SELECT ... FROM TABLE_A -- a WHERE condition can be added
INTO OUTFILE "/path/to/file" -- location of the export file
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' -- field separator and enclosing character
LINES TERMINATED BY '\n'; -- line terminator

The first part of the statement is simple enough, so let me explain the clauses after it:

FIELDS TERMINATED BY ',' means fields are separated by commas; fields A and B are exported as A,B.
OPTIONALLY ENCLOSED BY '"' means field contents are wrapped in double quotes, so the export becomes "A","B".
LINES TERMINATED BY '\n' means each row ends with a newline, so the export looks like:
"A","B"
"A1","B1"

Of course, you can choose your own separator and enclosing characters, for example '#'.
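Putting the clauses together, a complete export might look like the following sketch; the table, columns and output path are made-up examples rather than the ones from this migration:

SELECT id, col_a, col_b
FROM table_a
WHERE id > 0 -- optional filter
INTO OUTFILE '/tmp/table_a_export.csv' -- must be a path on the MySQL server host
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

The server process writes the file itself, so the target directory must be writable by mysqld, and the secure_file_priv setting may restrict which directories are allowed.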
The advantage of MySQL's built-in export/import is that it is extremely fast; the disadvantage is that it can only write files to the server host's local filesystem. For colleagues such as BI analysts who have no permissions on the database host, this method is something of a luxury. Fortunately, for reports with few fields and short contents, the export speed of third-party client tools is not especially slow.

Now the import, which is the real point of these notes. Imports are mostly done by DBAs for data migration, and the methods again divide into client tools and MySQL's built-in statement. I strongly recommend the MySQL built-in import: I once had to migrate 130 million rows, and importing them through the Navicat client took 22 hours, which was far too slow and too unpredictable, and tools like Navicat also risk appearing to freeze. I therefore do not recommend importing more than about 10,000 rows through Navicat.

MySQL's built-in import statement is LOAD DATA. The official definition is as follows; the comments are added from my own understanding:

LOAD DATA
[LOW_PRIORITY | CONCURRENT] -- LOW_PRIORITY waits until no one else is reading the table; CONCURRENT allows other reads while loading
[LOCAL] -- with this option the file is read on the client side rather than on the server host; without it the file is read on the server host by default
INFILE 'file_name' -- path and name of the file to read
[REPLACE | IGNORE] -- what to do with rows that duplicate an existing unique key: replace them or skip them; IGNORE is recommended
INTO TABLE tbl_name -- which table to import into
[PARTITION (partition_name [, partition_name] ...)] -- this clause can usually be omitted
[CHARACTER SET charset_name] -- character set of the imported file; worth setting explicitly, e.g. utf8 or gbk
[{FIELDS | COLUMNS} -- field options
[TERMINATED BY 'string'] -- which character separates fields
[[OPTIONALLY] ENCLOSED BY 'char'] -- which character marks the start and end of each field
[ESCAPED BY 'char'] -- escape character; if a text field contains special characters such as double quotes, defining an escape character lets them be handled
]
[LINES -- line options
[STARTING BY 'string'] -- string marking the beginning of a line; usually not needed when lines have no leading marker
[TERMINATED BY 'string'] -- string marking the end of a line; rows are split on this character
]
[IGNORE number {LINES | ROWS}] -- skip the first number lines of the file; usually not needed
[(col_name_or_user_var [, col_name_or_user_var] ...)] -- which columns to insert into
[SET col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...]

According to the official documentation, LOAD DATA can import data very quickly, but if you use the FIELDS and LINES clauses each of them must be given a value, and FIELDS must come before LINES. The statement I used this time followed this pattern.
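As a minimal sketch only: the table name, column names and file name below are assumptions for illustration, based on the three numeric fields and the /data/files directory mentioned later, not the exact statement from the migration.

LOAD DATA INFILE '/data/files/table_a.txt' -- file already uploaded to the server host
IGNORE INTO TABLE table_a -- skip rows that duplicate an existing unique key
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(col_a, col_b, col_c);

Because the file was transferred to the server by FTP, LOCAL is not used; with LOCAL, MySQL would read the file from the client machine instead.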
INSERT itself is a very simple statement and I will not give specific examples here; what I want to share is how to make the load faster, because the first time I ran the import it started at midnight and still had not finished by 11 o'clock the next morning. So using LOAD DATA without any further tuning is not automatically fast.

The data imported this time: 140 million records in total, exported as a 4.3 GB text file and uploaded to the server's /data/files directory via FTP.

Pain point 1: indexes. The project required indexes on all three fields, so I added them when creating the table, which led to an apparently endless wait. Reason: indexes take space and have to be maintained; with indexes on three columns, every imported row must also be written into each index, which takes several times longer than loading without indexes. Optimization: before importing, drop the table's indexes and keep only the auto-increment id, then add them back after the import completes (a sketch follows the summary below).

Pain point 2: engine choice. The storage engines handle LOAD DATA writes differently, especially with respect to the master-slave replication mechanism; MyISAM and InnoDB behave differently here. If you are chasing extreme speed on hundreds of millions of rows, you can consider the MyISAM engine, whereas MySQL's default should be InnoDB. I did not change the engine this time, and I do not recommend switching away from the default InnoDB; after all, it is the engine Oracle officially concentrates on and the most complete one, so do not use MyISAM unless there is a special reason. If you do use MyISAM, pay attention to two points:

1. Several session variables can be enlarged to give the load more buffer memory:
SET SESSION BULK_INSERT_BUFFER_SIZE = 256217728;
SET SESSION MYISAM_SORT_BUFFER_SIZE = 256217728;

2. The uniqueness check can be switched off before the import and switched back on afterwards:
SET UNIQUE_CHECKS = 0; -- disable
SET UNIQUE_CHECKS = 1; -- enable

Pain point 3: where the file lives. MySQL does support reading the file from the local client, and for dozens or hundreds of rows the network overhead hardly matters, but at hundreds of millions of rows even one extra millisecond per row adds up to a huge cost. It is therefore recommended to transfer the file to the server via FTP and let the server read it directly.

Pain point 4: monitoring. From experience, check the server status after starting the import: use the top command to look at the CPU usage of the MySQL process. In theory a fast import should use plenty of CPU. On my first, extremely slow attempt the CPU sat at about 10%, which was clearly abnormal; on the second, normal attempt it ran at about 110%, the state of rapid writing, and the 140 million rows took only a little over 7 minutes. So always monitor the server after executing the statement, otherwise you may not notice that it is not running properly.

Note: the biggest difference between LOAD and INSERT is that LOAD parses the statement once and then inserts the data in batches, while INSERT is parsed and walks the indexes once for every row, so INSERT by itself is extremely slow for big data.

Summary: the biggest and most obvious gain in this round of optimization came from removing the indexes, after which the import became extremely fast. On indexes, let me say it once more: when importing, drop the indexes first and add them back after the import is complete.
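A minimal sketch of that drop-then-rebuild pattern, with made-up table, column and index names since the actual schema is not shown in these notes:

-- before the import: drop the secondary indexes, keeping only the auto-increment primary key
ALTER TABLE table_a
DROP INDEX idx_col_a,
DROP INDEX idx_col_b,
DROP INDEX idx_col_c;

-- run the LOAD DATA statement here

-- after the import: rebuild the indexes in a single statement
ALTER TABLE table_a
ADD INDEX idx_col_a (col_a),
ADD INDEX idx_col_b (col_b),
ADD INDEX idx_col_c (col_c);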
Update, July 3, 2020: when importing big data into MySQL, you must also pay attention to the maximum transaction size. A few months ago, during a data migration, the large-transaction limit was hit on a MySQL 8.0 MGR cluster, which caused problems on the instance and forced a restart of MySQL. The default configuration should be a transaction limit of 150 million. The imported data was fairly large, the parameter had not been enlarged, and the import was done without splitting the data or applying any flow control, so the database blocked and had to be restarted. Under the company's 7x24x365 availability requirement this counted as an incident. If your company has strict requirements, it is advisable to check the MySQL configuration itself, and in particular the transaction commit limit, before importing.
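The note above does not name the parameter, but on an MGR cluster the limit in question is most likely group_replication_transaction_size_limit, which caps the size of a single transaction in bytes and defaults to 150000000. A quick way to inspect it and, if really necessary, raise it before a large load (splitting the data into smaller batches is usually the safer choice):

SHOW VARIABLES LIKE 'group_replication_transaction_size_limit'; -- check the current limit
SET GLOBAL group_replication_transaction_size_limit = 500000000; -- enlarge it for an exceptionally large import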
This concludes these notes on importing, exporting and migrating hundreds of millions of rows of MySQL data.