1. What problems did we encounter?

In standard SQL, we usually write insert statements like the following:

INSERT INTO TBL_TEST (id) VALUES (1);

This approach works in MySQL as well, but when data must be inserted in batches, such statements cause performance problems. For example, inserting 100,000 rows requires 100,000 INSERT statements, each of which must be submitted to the relational engine for parsing and optimization before it reaches the storage engine to do the actual insertion work.

Because of this performance bottleneck, the official MySQL documentation recommends batch insertion, that is, inserting multiple values with one INSERT statement:

INSERT INTO TBL_TEST (id) VALUES (1), (2), (3);

This approach does speed up batch insertion, and the reason is not hard to understand: fewer INSERT statements are submitted to the server, so the network load is reduced. More importantly, although the parsing and optimization cost of each statement increases somewhat, the number of rows it inserts grows far more, so overall performance improves. According to some claims on the Internet, this method can improve efficiency by dozens of times.

However, I have also seen several other methods mentioned online, such as prepared (precompiled) SQL and batch commits. How do these methods perform? This article compares them.

2. Environment and Methods of Comparison

As you can imagine, the limited performance of my test machine made the absolute TPS figures very low, so the raw numbers below are not meaningful in themselves; what matters is the trend, which shows the relative performance of each insertion strategy.

Because of the characteristics of our business, the table we use is very wide, with 195 fields in total. When a row is fully populated (every field filled, including the varchar columns), it is slightly less than 4 KB; a typical record is about 3 KB.
From practical experience we know that committing a large number of INSERT statements in a single transaction significantly improves performance, so all of the following tests commit once for every 5,000 records inserted. Finally, note that all tests below were performed through the MySQL C API against the InnoDB storage engine.

3. Comparison of Methods

Ideal test (I): comparison of insertion methods

Purpose: find the most suitable insertion mechanism under ideal conditions.

Key methods:
1. Each process/thread inserts in primary-key order
2. Compare the different insertion methods
3. Compare the impact of different numbers of processes/threads on insertion

* "Normal method" means each INSERT statement inserts a single VALUE.
* "Prepared SQL" means using the prepared-statement interface of the MySQL C API.
* "Multi-value SQL (10 records)" means inserting 10 records with one INSERT statement. Why 10? Subsequent testing showed that this count gives the highest performance.

Conclusion: from the trend of the three methods it is obvious that multi-value SQL (10 records) is the most efficient.

Ideal test (II): comparison of the number of values per multi-value statement

As the amount of data increases, inserting 10 records per INSERT statement remains the most efficient approach.

Ideal test (III): comparison of connection counts

Conclusion: performance is highest when the number of connections and worker threads is twice the number of CPU cores.

General test: based on our business volume

Purpose: what is the best insertion mechanism under typical transaction conditions?

Key methods:
1. Simulate production data (about 3 KB per record)
2. Each thread inserts primary keys out of order

Clearly, when data is inserted out of primary-key order, performance drops sharply.
This is consistent with the behavior implied by InnoDB's internal implementation, which clusters rows by primary key. Even so, multi-value SQL (10 records) remains the best method.

Stress test

Purpose: what is the best insertion mechanism under extreme conditions?

Key methods:
1. Fill every field of the data row (each record is about 4 KB)
2. Each thread inserts primary keys out of order

The results follow the earlier pattern, with an extreme drop in performance. This also confirms that as the record size grows (it may exceed the free space in a page, since slot and page-header information also takes room), page splits and related effects appear and performance deteriorates.

4. Conclusion

Based on the above tests and our understanding of InnoDB, we can draw the following conclusions:

* Use a sequential primary-key strategy (such as an auto-increment primary key, or adjust the business logic so that inserted records are as sequential as possible)
* Multi-value inserts with 10 records per statement are the most suitable
* Keeping the number of processes/threads at about 2 times the number of CPU cores is the most appropriate

5. Appendix

Complete examples of prepared SQL statements for MySQL are hard to find online, so here is a simple one.
-- Create table statement
CREATE TABLE tbl_test (
  pri_key varchar(30),
  nor_char char(30),
  max_num DECIMAL(8,0),
  long_num DECIMAL(12,0),
  rec_upd_ts TIMESTAMP
);

C code (cleaned up from the original: unused timing helpers removed, the loop bound separated from the bound column values, autocommit disabled so the final mysql_commit takes effect, and the connection closed on exit):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mysql.h>

#define STRING_LEN 30

char pri_key[STRING_LEN] = "123456";
char nor_char[STRING_LEN] = "abcabc";

int main(int argc, char **argv)
{
    const char *precompiled_stmt =
        "INSERT INTO `tbl_test`"
        " (pri_key, nor_char, max_num, long_num, rec_upd_ts)"
        " VALUES (?, ?, ?, ?, ?)";

    MYSQL conn;
    mysql_init(&conn);
    if (mysql_real_connect(&conn, "127.0.0.1", "dba", "abcdefg",
                           "TESTDB", 3306, NULL, 0) == NULL) {
        fprintf(stderr, "mysql_real_connect failed: %s\n", mysql_error(&conn));
        exit(1);
    }
    /* Disable autocommit so all rows are committed once, at the end. */
    mysql_autocommit(&conn, 0);

    MYSQL_STMT *stmt = mysql_stmt_init(&conn);
    if (!stmt) {
        fprintf(stderr, "mysql_stmt_init failed: %s\n", mysql_error(&conn));
        exit(1);
    }
    if (mysql_stmt_prepare(stmt, precompiled_stmt, strlen(precompiled_stmt))) {
        fprintf(stderr, "mysql_stmt_prepare failed: %s\n", mysql_stmt_error(stmt));
        exit(1);
    }

    enum { FIELD_NUM = 5 };
    const int ROW_COUNT = 3;

    int max_num_val = 123;       /* value for the max_num column  */
    int long_num_val = 456789;   /* value for the long_num column */

    MYSQL_TIME ts;
    memset(&ts, 0, sizeof(ts));  /* zeroes second_part and neg too */
    ts.year = 2002; ts.month = 2;  ts.day = 3;
    ts.hour = 10;   ts.minute = 45; ts.second = 20;

    unsigned long pri_key_len = strlen(pri_key);
    unsigned long nor_char_len = strlen(nor_char);

    MYSQL_BIND bind[FIELD_NUM];
    memset(bind, 0, sizeof(bind));

    bind[0].buffer_type = MYSQL_TYPE_STRING;
    bind[0].buffer = pri_key;
    bind[0].buffer_length = STRING_LEN;
    bind[0].length = &pri_key_len;

    bind[1].buffer_type = MYSQL_TYPE_STRING;
    bind[1].buffer = nor_char;
    bind[1].buffer_length = STRING_LEN;
    bind[1].length = &nor_char_len;

    bind[2].buffer_type = MYSQL_TYPE_LONG;
    bind[2].buffer = (char *)&max_num_val;

    bind[3].buffer_type = MYSQL_TYPE_LONG;
    bind[3].buffer = (char *)&long_num_val;

    bind[4].buffer_type = MYSQL_TYPE_TIMESTAMP;
    bind[4].buffer = (char *)&ts;

    /* The buffers do not move, so binding once before the loop is enough. */
    if (mysql_stmt_bind_param(stmt, bind)) {
        fprintf(stderr, "mysql_stmt_bind_param failed: %s\n", mysql_stmt_error(stmt));
        exit(1);
    }

    for (int i = 0; i < ROW_COUNT; i++) {
        if (mysql_stmt_execute(stmt)) {
            fprintf(stderr, "mysql_stmt_execute failed: %s\n", mysql_stmt_error(stmt));
            exit(1);
        }
    }

    mysql_commit(&conn);
    mysql_stmt_close(stmt);
    mysql_close(&conn);
    return 0;
}

The above covers the optimization methods for batch-inserting data into MySQL.