MYSQL development performance research: optimization method for batch inserting data

MYSQL development performance research: optimization method for batch inserting data

1. What problems did we encounter?

In standard SQL, we usually write the following SQL insert statement.

INSERT INTO TBL_TEST (id) VALUES(1);

Obviously, this approach is also feasible in MYSQL. However, when we need to insert data in batches, such statements will cause performance problems. For example, if you need to insert 100,000 pieces of data, then you will need 100,000 insert statements, each of which needs to be submitted to the relational engine for parsing and optimization before it can reach the storage engine to do the actual insertion work.

It is precisely because of the performance bottleneck problem that the MYSQL official documentation mentions the use of batch insertion, that is, inserting multiple values ​​in one INSERT statement. Right now,

INSERT INTO TBL_TEST (id) VALUES (1), (2), (3)

This approach can indeed help speed up batch insertion. The reason is not difficult to understand. Since fewer INSERT statements are submitted to the server, the network load is reduced. The most important thing is that the parsing and optimization time seems to increase, but in fact, the number of data rows affected is much greater. So the overall performance is improved. According to some claims on the Internet, this method can improve efficiency by dozens of times.

However, I have also seen several other methods on the Internet, such as preprocessing SQL and batch submission. So how do these methods perform? This article will compare these methods.

2. Comparison of Environment and Methods
My environment is pretty poor, basically it's just a backward virtual machine. It has only 2 cores and 6G of memory. The operating system is SUSI Linux and the MYSQL version is 5.6.15.

As you can imagine, the performance of this machine caused my TPS to be very low, so all the data below is meaningless, but the trend is different, it can show the performance trend of the entire insertion.

Due to the characteristics of our business, the table we use is very large, with a total of 195 fields, and when fully written (each field is fully filled, including varchar), it will be slightly less than 4KB in size. Generally speaking, the size of a record is 3KB.

Because of our actual experience, we are sure that by submitting a large number of INSERT statements in a single transaction can significantly improve performance. Therefore, all the following tests are based on the practice of committing every time 5,000 records are inserted.

Finally, it should be noted that all the tests below were performed using the MYSQL C API and the INNODB storage engine was used.

3. Comparison Method

Ideal Type Testing (I) - Comparison of Methods

Purpose: Find the most appropriate insertion mechanism under ideal circumstances

Key methods:

1. Each process/thread inserts in the order of primary key

2. Comparison of different insertion methods

3. Compare the impact of different numbers of processes/threads on insertion

image

*“Normal method” refers to the case where an INSERT statement inserts only one VALUE.

* "Prepared SQL" refers to the case where the preprocessed MYSQL C API is used.

* "Multi-table value SQL (10 records)" means inserting 10 records using one INSERT statement. Why 10? Subsequent verification tells us that this method has the highest performance.

Conclusion,It is obvious that from the trend of the three methods, the multi-table value SQL (10 items) method is the most efficient.

Ideal Test (II) - Comparison of the number of SQL statements with multiple table values

image

Obviously, as the amount of data increases, inserting 10 records per INSERT statement is the most efficient approach.

Ideal Test (III) - Comparison of Connection Numbers

image

image

Conclusion: Performance is highest when the number of connections and operations is twice the number of CPU cores

General testing - testing based on our business volume

Purpose: What is the best insertion mechanism for common transaction situations?

Key methods:

1. Simulate production data (about 3KB per record)

2. Each thread inserts the primary key out of order

image

Obviously, if the data is inserted out of order according to the primary key, the performance will drop sharply. This is actually consistent with the phenomenon shown by the internal implementation principle of INNODB. But it is still certain that the case of multi-table value SQL (10 items) is the best.

Stress Testing

Purpose: Best insertion mechanism for extreme trading situations?

Key methods:

1. Fill each field of the data row (each record is about 4KB)

2. Each thread inserts the primary key out of order

image

The results are similar to our previous patterns, with an extreme drop in performance. And here it is verified that as the record size increases (it may exceed the size of a page, after all, there is still slot and page head information occupying space), there will be page splits and other phenomena, and performance will deteriorate.

IV. Conclusion

Based on the above tests and our understanding of INNODB, we can draw the following conclusions.

• Use a sequential primary key strategy (such as auto-incrementing primary keys, or modifying business logic to make inserted records as sequential as possible)

• It is most appropriate to use a multi-value table (10 entries) for insertion

• It is relatively appropriate to control the number of processes/threads to 2 times the number of CPUs

V. Appendix

I found that there are very few complete examples of preprocessing SQL statements for MYSQL on the Internet. Here is a simple example.

--Create table statement CREATE TABLE tbl_test 
(
  pri_key varchar(30), 
  nor_char char(30), 
  max_num DECIMAL(8,0), 
  long_num DECIMAL(12, 0), 
  rec_upd_ts TIMESTAMP
);

C code

#include <string.h>
#include <iostream>
#include <mysql.h>
#include <sys/time.h>
#include <sstream>
#include <vector>
 
using namespace std;
 
#define STRING_LEN 30
  
char pri_key[STRING_LEN]= "123456"; 
char nor_char [STRING_LEN] = "abcabc"; 
char rec_upd_ts [STRING_LEN] = "NOW()"; 
 
bool SubTimeval(timeval &result, timeval &begin, timeval &end)
{
  if ( begin.tv_sec>end.tv_sec ) return false;
 
  if ( (begin.tv_sec == end.tv_sec) && (begin.tv_usec > end.tv_usec) )  
    return false;
 
  result.tv_sec = ( end.tv_sec - begin.tv_sec );  
  result.tv_usec = ( end.tv_usec - begin.tv_usec );  
 
  if (result.tv_usec<0) {
    result.tv_sec--;
    result.tv_usec+=1000000;} 
  return true;
}
 
int main(int argc, char ** argv)
{
  INT32 ret = 0;
  char errmsg[200] = {0};
  int sqlCode = 0;
 
  timeval tBegin, tEnd, tDiff;
   
  const char* precompile_statment2 = "INSERT INTO `tbl_test`( pri_key, nor_char, max_num, long_num, rec_upd_ts) VALUES(?, ?, ?, ?, ?)";
   
  MYSQL conn;
  mysql_init(&conn);
   
  if (mysql_real_connect(&conn, "127.0.0.1", "dba", "abcdefg", "TESTDB", 3306, NULL, 0) == NULL)
  {
    fprintf(stderr, "mysql_real_connect, 2 failed\n");
    exit(0);
  }
   
  MYSQL_STMT *stmt = mysql_stmt_init(&conn);
  if (!stmt)
  {
   fprintf(stderr, "mysql_stmt_init, 2 failed\n");
   fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
   exit(0);
  }
   
  if (mysql_stmt_prepare(stmt, precompile_statment2, strlen(precompile_statment2)))
  {
   fprintf(stderr, "mysql_stmt_prepare, 2 failed\n");
   fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
   exit(0);
  }
   
  int i = 0; 
  int max_num = 3;
  const int FIELD_NUM = 5;
  while (i < max_num)
  {
    //MYSQL_BIND bind[196] = {0};
    MYSQL_BIND bind[FIELD_NUM];
    memset(bind, 0, FIELD_NUM * sizeof(MYSQL_BIND));
   
    unsigned long str_length = strlen(pri_key);
    bind[0].buffer_type = MYSQL_TYPE_STRING;
    bind[0].buffer = (char *)pri_key;
    bind[0].buffer_length = STRING_LEN;
    bind[0].is_null = 0;
    bind[0].length = &str_length;
     
    unsigned long str_length_nor = strlen(nor_char);
    bind[1].buffer_type = MYSQL_TYPE_STRING;
    bind[1].buffer = (char *)nor_char;
    bind[1].buffer_length = STRING_LEN;
    bind[1].is_null = 0;
    bind[1].length = &str_length_nor;
     
    bind[2].buffer_type = MYSQL_TYPE_LONG;
    bind[2].buffer = (char*)&max_num;
    bind[2].is_null = 0;
    bind[2].length = 0;
     
    bind[3].buffer_type = MYSQL_TYPE_LONG;
    bind[3].buffer = (char*)&max_num;
    bind[3].is_null = 0;
    bind[3].length = 0;
     
    MYSQL_TIME ts;
    ts.year = 2002;
    ts.month = 02;
    ts.day = 03;
    ts.hour = 10;
    ts.minute = 45;
    ts.second = 20;
     
    unsigned long str_length_time = strlen(rec_upd_ts);
    bind[4].buffer_type = MYSQL_TYPE_TIMESTAMP;
    bind[4].buffer = (char *)&ts;
    bind[4].is_null = 0;
    bind[4].length = 0;
     
    if (mysql_stmt_bind_param(stmt, bind))
    {
      fprintf(stderr, "mysql_stmt_bind_param, 2 failed\n");
      fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
      exit(0);
    }
     
    cout << "before execute\n";
    if (mysql_stmt_execute(stmt))
    {
     fprintf(stderr, "mysql_stmt_execute, 2 failed\n");
     fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
     exit(0);
    }
    cout << "after execute\n";
     
    i++;
  }
   
  mysql_commit(&conn);
   
  mysql_stmt_close(stmt);
 
  return 0;  
}

The above is the optimization method for MySQL batch inserting data. It is recommended that you also read more previous articles on 123WORDPRESS.COM.

You may also be interested in:
  • Examples of 4 methods for inserting large amounts of data in MySQL
  • MYSQL batch insert data implementation code
  • Tutorial on implementing batch inserts in MySQL to optimize performance
  • How to avoid MySQL batch inserts with unique indexes
  • Mysql uses insert to insert multiple records to add data in batches
  • MySQL batch insert data script
  • Detailed example code of mysql batch insert loop
  • Detailed explanation of MySQL batch SQL insert performance optimization
  • MySql batch insert optimization Sql execution efficiency example detailed explanation
  • How to quickly insert 10 million records into MySQL

<<:  Sample code for using js to implement Ajax concurrent requests to limit the number of requests

>>:  Implementation of drawing audio waveform with wavesurfer.js

Recommend

How to fill items in columns in CSS Grid Layout

Suppose we have n items and we have to sort these...

Implementation of MYSQL (telephone number, ID card) data desensitization

1. Data desensitization explanation In daily deve...

The difference between key and index in MySQL

Let's look at the code first: ALTER TABLE rep...

Detailed explanation of the failure of MySQL to use UNION to connect two queries

Overview UNION The connection data set keyword ca...

Coexistence of python2 and python3 under centos7 system

The first step is to check the version number and...

MySql 8.0.16-win64 Installation Tutorial

1. Unzip the downloaded file as shown below . 2. ...

JavaScript knowledge: Constructors are also functions

Table of contents 1. Definition and call of const...

Installation tutorial of MySQL 5.7 green version under windows2008 64-bit system

Preface This article introduces the installation ...

Implementation code of Nginx anti-hotlink and optimization in Linux

Hide version number The version number is not hid...

How to use map to allow multiple domain names to cross domains in Nginx

Common Nginx configuration allows cross-domain se...

Why should css be placed in the head tag

Think about it: Why should css be placed in the h...

How to backup and restore the mysql database if it is too large

Command: mysqlhotcopy This command will lock the ...