• Questions raised

The structure of the source table t_source is as follows:

item_id int,
created_time datetime,
modified_time datetime,
item_name varchar(20),
other varchar(20)

Requirement: deduplicate on (created_time, item_name). There are 1 million records in the source table, of which 500,000 have duplicate created_time and item_name values.

• Experimental environment

Linux virtual machine: CentOS release 6.4; 8 GB physical memory (4 GB allocated to MySQL); 100 GB mechanical hard disk; two dual-core physical CPUs, four processors in total; MySQL 8.0.16.

• Create test tables and data

-- Create the source table
create table t_source
(item_id int,
 created_time datetime,
 modified_time datetime,
 item_name varchar(20),
 other varchar(20)
);

-- Create the target table
create table t_target like t_source;

-- Generate 1 million rows of test data, of which 500,000 have duplicate created_time and item_name
delimiter //
create procedure sp_generate_data()
begin
    set @i := 1;
    while @i<=500000 do
        set @created_time := date_add('2017-01-01',interval @i second);
        set @modified_time := @created_time;
        set @item_name := concat('a',@i);
        insert into t_source
        values (@i,@created_time,@modified_time,@item_name,'other');
        set @i:=@i+1;
    end while;
    commit;

    set @last_insert_id := 500000;
    insert into t_source
    select item_id + @last_insert_id,
           created_time,
           date_add(modified_time,interval @last_insert_id second),
           item_name,
           'other'
      from t_source;
    commit;
end
//
delimiter ;

call sp_generate_data();

-- The source table has no primary key or unique constraint, so two completely
-- identical rows may exist; insert one more record to simulate this situation.
insert into t_source select * from t_source where item_id=1;

There are now 1,000,001 records in the source table, and the target table after deduplication should have 500,000 records.
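The duplication scheme is easy to see in miniature: generate n rows with unique (created_time, item_name) keys, then append a copy with item_id shifted by n, so every key appears exactly twice. A small Python model of sp_generate_data's logic (n = 5 instead of 500,000; illustration only):

```python
from datetime import datetime, timedelta

def generate(n):
    """Model of sp_generate_data: n unique rows, then a copy with
    item_id shifted by n -- every (created_time, item_name) key appears twice."""
    base = [(i, datetime(2017, 1, 1) + timedelta(seconds=i), f"a{i}")
            for i in range(1, n + 1)]
    copy = [(i + n, t, name) for i, t, name in base]
    return base + copy

rows = generate(5)
# 10 rows in total, but only 5 distinct (created_time, item_name) keys
print(len(rows), len({(t, name) for _, t, name in rows}))  # 10 5
```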
mysql> select count(*),count(distinct created_time,item_name) from t_source;
+----------+----------------------------------------+
| count(*) | count(distinct created_time,item_name) |
+----------+----------------------------------------+
|  1000001 |                                 500000 |
+----------+----------------------------------------+
1 row in set (1.92 sec)

1. Clever use of indexes and variables

1. No-index comparison test

(1) Using correlated subqueries

truncate t_target;
insert into t_target
select distinct t1.* from t_source t1 where item_id in
(select min(item_id) from t_source t2 where t1.created_time=t2.created_time and t1.item_name=t2.item_name);

This statement takes far too long to produce a result, so just look at the execution plan.

mysql> explain select distinct t1.* from t_source t1 where item_id in
    -> (select min(item_id) from t_source t2 where t1.created_time=t2.created_time and t1.item_name=t2.item_name);
+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
| id | select_type        | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                        |
+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
|  1 | PRIMARY            | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 997282 |   100.00 | Using where; Using temporary |
|  2 | DEPENDENT SUBQUERY | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 997282 |     1.00 | Using where                  |
+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------+
2 rows in set, 3 warnings (0.00 sec)

Both the outer query and the correlated subquery scan the entire table, which amounts to roughly 1 million × 1 million row scans in total. No wonder it never produces a result.
(2) Using table joins

truncate t_target;
insert into t_target
select distinct t1.* from t_source t1,
(select min(item_id) item_id,created_time,item_name from t_source group by created_time,item_name) t2
where t1.item_id = t2.item_id;

This method takes 14 seconds, and the query plan is as follows:

mysql> explain select distinct t1.* from t_source t1,
    -> (select min(item_id) item_id,created_time,item_name from t_source group by created_time,item_name) t2
    -> where t1.item_id = t2.item_id;
+----+-------------+------------+------------+------+---------------+-------------+---------+-----------------+--------+----------+------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref             | rows   | filtered | Extra                        |
+----+-------------+------------+------------+------+---------------+-------------+---------+-----------------+--------+----------+------------------------------+
|  1 | PRIMARY     | t1         | NULL       | ALL  | NULL          | NULL        | NULL    | NULL            | 997282 |   100.00 | Using where; Using temporary |
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 5       | test.t1.item_id |     10 |   100.00 | Distinct                     |
|  2 | DERIVED     | t_source   | NULL       | ALL  | NULL          | NULL        | NULL    | NULL            | 997282 |   100.00 | Using temporary              |
+----+-------------+------------+------------+------+---------------+-------------+---------+-----------------+--------+----------+------------------------------+
3 rows in set, 1 warning (0.00 sec)

• The inner query scans the 1 million rows of the t_source table, builds a temporary table, finds the minimum item_id of each (created_time, item_name) group, and generates the derived table derived2, which has 500,000 rows.
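The derived table t2 — the minimum item_id per (created_time, item_name) group — is the heart of this rewrite. Its logic can be sketched outside SQL; a minimal Python analogue of the GROUP BY ... min(item_id) aggregation, using made-up sample rows:

```python
def min_id_per_group(rows):
    """For each (created_time, item_name) pair keep the smallest item_id --
    the same result as the GROUP BY ... min(item_id) derived table t2."""
    groups = {}
    for item_id, created_time, item_name in rows:
        key = (created_time, item_name)
        if key not in groups or item_id < groups[key]:
            groups[key] = item_id
    return groups

rows = [
    (1, "2017-01-01 00:00:01", "a1"),
    (500001, "2017-01-01 00:00:01", "a1"),  # duplicate key, larger item_id
    (2, "2017-01-01 00:00:02", "a2"),
]
print(min_id_per_group(rows))
# {('2017-01-01 00:00:01', 'a1'): 1, ('2017-01-01 00:00:02', 'a2'): 2}
```

Joining these minimum ids back to the source table (the `where t1.item_id = t2.item_id` condition) then picks out one full row per group.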
(3) Using variables

set @a:='1000-01-01 00:00:00';
set @b:=' ';
set @f:=0;
truncate t_target;
insert into t_target
select item_id,created_time,modified_time,item_name,other
  from
(select t0.*,if(@a=created_time and @b=item_name,@f:=0,@f:=1) f, @a:=created_time,@b:=item_name
   from
(select * from t_source order by created_time,item_name) t0) t1 where f=1;

This method takes 13 seconds, and the query plan is as follows:

mysql> explain select item_id,created_time,modified_time,item_name,other
    -> from
    -> (select t0.*,if(@a=created_time and @b=item_name,@f:=0,@f:=1) f, @a:=created_time,@b:=item_name
    -> from
    -> (select * from t_source order by created_time,item_name) t0) t1 where f=1;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref   | rows   | filtered | Extra          |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 4       | const |     10 |   100.00 | NULL           |
|  2 | DERIVED     | <derived3> | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 997282 |   100.00 | NULL           |
|  3 | DERIVED     | t_source   | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 997282 |   100.00 | Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
3 rows in set, 5 warnings (0.00 sec)

• The innermost query scans the 1 million rows of the t_source table and uses a filesort to generate the derived table derived3. Compared with method (2) above, the total number of scanned rows is unchanged at 2 million, so the difference is only slight.
This time the automatically generated index is on the constant column f, whereas in the table-join version it is on the item_id column, so the query times are almost identical.

So far we have not created any index on the source table. Whichever way the statement is written, the created_time and item_name columns have to be sorted to find duplicates. It is therefore natural to think that a composite index on these two columns could use the index's own ordering to eliminate the extra sort and improve query performance.

2. Comparison test with a composite index on created_time and item_name

-- Create a composite index on the created_time, item_name and item_id columns
create index idx_sort on t_source(created_time,item_name,item_id);
analyze table t_source;

(1) Using correlated subqueries

truncate t_target;
insert into t_target
select distinct t1.* from t_source t1 where item_id in
(select min(item_id) from t_source t2 where t1.created_time=t2.created_time and t1.item_name=t2.item_name);

This time it took 19 seconds, and the query plan is as follows:

mysql> explain select distinct t1.* from t_source t1 where item_id in
    -> (select min(item_id) from t_source t2 where t1.created_time=t2.created_time and t1.item_name=t2.item_name);
+----+--------------------+-------+------------+------+---------------+----------+---------+-----------------------------------------+--------+----------+------------------------------+
| id | select_type        | table | partitions | type | possible_keys | key      | key_len | ref                                     | rows   | filtered | Extra                        |
+----+--------------------+-------+------------+------+---------------+----------+---------+-----------------------------------------+--------+----------+------------------------------+
|  1 | PRIMARY            | t1    | NULL       | ALL  | NULL          | NULL     | NULL    | NULL                                    | 997281 |   100.00 | Using where; Using temporary |
|  2 | DEPENDENT SUBQUERY | t2    | NULL       | ref  | idx_sort      | idx_sort | 89      | test.t1.created_time,test.t1.item_name  |      2 |   100.00 | Using index                  |
+----+--------------------+-------+------------+------+---------------+----------+---------+-----------------------------------------+--------+----------+------------------------------+
2 rows in set, 3 warnings (0.00 sec)

• The t_source table of the outer query is the driving table and still scans 1 million rows.
• For each row of the driving table, about two rows are looked up through the idx_sort index.

(2) Using table joins

truncate t_target;
insert into t_target
select distinct t1.* from t_source t1,
(select min(item_id) item_id,created_time,item_name from t_source group by created_time,item_name) t2
where t1.item_id = t2.item_id;

This time it took 13 seconds, and the query plan is as follows:

mysql> explain select distinct t1.* from t_source t1,
    -> (select min(item_id) item_id,created_time,item_name from t_source group by created_time,item_name) t2
    -> where t1.item_id = t2.item_id;
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+--------+----------+------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref             | rows   | filtered | Extra                        |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+--------+----------+------------------------------+
|  1 | PRIMARY     | t1         | NULL       | ALL   | NULL          | NULL        | NULL    | NULL            | 997281 |   100.00 | Using where; Using temporary |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 5       | test.t1.item_id |     10 |   100.00 | Distinct                     |
|  2 | DERIVED     | t_source   | NULL       | index | idx_sort      | idx_sort    | 94      | NULL            | 997281 |   100.00 | Using index                  |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+--------+----------+------------------------------+
3 rows in set, 1 warning (0.00 sec)

Compared with the no-index case, the subquery changes from a full table scan to a full index scan, but it still has to scan 1 million index entries, so the performance improvement is not obvious.

(3) Using variables

set @a:='1000-01-01 00:00:00';
set @b:=' ';
set @f:=0;
truncate t_target;
insert into t_target
select item_id,created_time,modified_time,item_name,other
  from
(select t0.*,if(@a=created_time and @b=item_name,@f:=0,@f:=1) f, @a:=created_time,@b:=item_name
   from
(select * from t_source order by created_time,item_name) t0) t1 where f=1;

This time it took 13 seconds, and the query plan was exactly the same as without the index. Evidently the index has no effect on this way of writing the statement. Is it possible to eliminate the nesting and get the result with a single level of query?

(4) Using variables and eliminating the nested query

set @a:='1000-01-01 00:00:00';
set @b:=' ';
truncate t_target;
insert into t_target
select * from t_source force index (idx_sort)
 where (@a!=created_time or @b!=item_name) and (@a:=created_time) is not null and (@b:=item_name) is not null
 order by created_time,item_name;

This time it took 12 seconds, and the query plan is as follows:

mysql> explain select * from t_source force index (idx_sort)
    -> where (@a!=created_time or @b!=item_name) and (@a:=created_time) is not null and (@b:=item_name) is not null
    -> order by created_time,item_name;
+----+-------------+----------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | t_source | NULL       | index | NULL          | idx_sort | 94      | NULL | 997281 |    99.00 | Using where |
+----+-------------+----------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)

This statement has the following characteristics:

• The nested subquery is eliminated; a single full index scan of the t_source table is all that is needed, so the query plan is optimal.

Prior to MySQL 8, this statement was the best single-threaded deduplication solution. Careful analysis shows that it cleverly exploits the logical query processing order of SQL together with the properties of the index. The logical steps of a SQL query are, in order: (1) perform a Cartesian product (cross join); (2) apply the ON filter; (3) add outer rows; (4) apply the WHERE filter; (5) group the rows; (6) apply CUBE or ROLLUP; (7) apply the HAVING filter; (8) process the SELECT list; (9) apply DISTINCT; (10) apply ORDER BY; (11) apply LIMIT. The logical execution of any query statement is a subset of these 11 steps. For this statement, the order is: fetch the data rows through the idx_sort index -> apply the where filter -> process the select list -> apply the order by clause.

For the variables to be assigned and compared in the sort order of created_time and item_name, the rows must be read in index order. The force index (idx_sort) hint plays this role and must be written this way for the whole deduplication statement to be valid. Otherwise the table would be scanned first and sorted afterwards, the order of variable assignment could not be guaranteed, and the correctness of the result could not be ensured. The order by clause cannot be omitted either; otherwise, even with the force index hint, MySQL would use a full table scan instead of a full index scan, producing an incorrect result. The index also guarantees the ordering of created_time and item_name, avoiding a filesort. Before the statement runs, the variables are initialized to values that cannot appear in the data. The where clause is then evaluated from left to right for each row: the variables are first compared with the column values, and only then are the row's created_time and item_name assigned to the variables, processing each row in (created_time, item_name) order.
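The compare-then-assign trick in the where clause amounts to a single pass over index-ordered rows, keeping a row whenever its key differs from the previous row's key. A minimal Python sketch of that pass (sample rows invented for illustration):

```python
def dedup_sorted(rows):
    """Keep the first row of each (created_time, item_name) group, assuming
    rows are already sorted by those two fields -- the same single-pass,
    previous-row comparison that the @a/@b user variables perform."""
    prev = (None, None)  # plays the role of @a and @b, initialized to impossible values
    kept = []
    for item_id, created_time, item_name in rows:
        if (created_time, item_name) != prev:    # the (@a!=... or @b!=...) test, i.e. f := 1
            kept.append((item_id, created_time, item_name))
        prev = (created_time, item_name)         # @a := created_time, @b := item_name
    return kept

rows = sorted([
    (1, "2017-01-01 00:00:01", "a1"),
    (3, "2017-01-01 00:00:02", "a2"),
    (2, "2017-01-01 00:00:01", "a1"),   # duplicate of item_id 1
], key=lambda r: (r[1], r[2]))          # the role played by the index order
print(dedup_sorted(rows))               # only one row per key survives
```

The sketch also makes clear why the ordering matters: compare the previous key *before* overwriting it, on rows that arrive in key order — exactly what force index plus order by guarantees in the SQL version.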
Note that item_name is a string type, and (@b:=item_name) by itself is not a valid Boolean expression, which is why it must be written as (@b:=item_name) is not null.

2. Using window functions

The window functions new in MySQL 8 make the previously troublesome deduplication operation very simple.

truncate t_target;
insert into t_target
select item_id, created_time, modified_time, item_name, other
  from (select *, row_number() over(partition by created_time,item_name) as rn
          from t_source) t
 where rn=1;

This statement executes in just 12 seconds, and it is clear and easy to understand. The query plan is as follows:

mysql> explain select item_id, created_time, modified_time, item_name, other
    -> from (select *, row_number() over(partition by created_time,item_name) as rn
    -> from t_source) t where rn=1;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref   | rows   | filtered | Extra          |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 8       | const |     10 |   100.00 | NULL           |
|  2 | DERIVED     | t_source   | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 997281 |   100.00 | Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
2 rows in set, 2 warnings (0.00 sec)

This query performs a full table scan on t_source and uses a filesort to sort the table by the partition columns created_time and item_name. The outer query then keeps one row from each partition.
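row_number() over (partition by created_time, item_name) combined with rn = 1 simply keeps one row per key. As a rough illustration, the same effect in Python using itertools.groupby over sorted rows (sample data invented here):

```python
from itertools import groupby

def dedup_row_number(rows):
    """Sort by the partition key, then keep the first row of each group --
    the rn = 1 row of row_number() over (partition by created_time, item_name)."""
    key = lambda r: (r[1], r[2])  # (created_time, item_name)
    return [next(group) for _, group in groupby(sorted(rows, key=key), key=key)]

rows = [
    (2, "2017-01-01 00:00:01", "a1", "other"),
    (1, "2017-01-01 00:00:01", "a1", "other"),  # same key as item_id 2
    (3, "2017-01-01 00:00:02", "a2", "other"),
]
print(dedup_row_number(rows))  # exactly one row per (created_time, item_name)
```

Like the SQL version without an order by inside the over() clause, which of the duplicate rows survives is arbitrary (here, whichever sorts first by the stable sort).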
Because any one of the rows sharing the same created_time and item_name may be kept, no order by is needed inside the over() clause, avoiding a sort within each partition. Judging from the execution plan alone, the window-function statement does not look as good as the variable-based nested query, but in practice this method is the fastest. For a description of MySQL window functions, see https://dev.mysql.com/doc/refman/8.0/en/window-functions.html.

3. Multithreaded parallel execution

The single deduplication statement has now been tuned to its optimum, but it still runs single-threaded. Can we use multiple processors to run the deduplication in parallel with multiple threads and gain further speed? For example, my experimental environment has 4 processors; if 4 threads execute the deduplication SQL simultaneously, performance should theoretically improve by nearly 4 times.

1. Data sharding

When the test data was generated, created_time was incremented by one second per record, i.e. the difference between the maximum and minimum time is 500,000 seconds and the data is evenly distributed, so the data is first divided into 4 parts.
(1) Query the created_time boundary values of the four data sets

mysql> select date_add('2017-01-01',interval 125000 second) dt1,
    ->        date_add('2017-01-01',interval 2*125000 second) dt2,
    ->        date_add('2017-01-01',interval 3*125000 second) dt3,
    ->        max(created_time) dt4
    ->   from t_source;
+---------------------+---------------------+---------------------+---------------------+
| dt1                 | dt2                 | dt3                 | dt4                 |
+---------------------+---------------------+---------------------+---------------------+
| 2017-01-02 10:43:20 | 2017-01-03 21:26:40 | 2017-01-05 08:10:00 | 2017-01-06 18:53:20 |
+---------------------+---------------------+---------------------+---------------------+
1 row in set (0.00 sec)

(2) Check the number of records in each data set to confirm that the data is evenly distributed

mysql> select case when created_time >= '2017-01-01'
    ->              and created_time < '2017-01-02 10:43:20'
    ->             then '2017-01-01'
    ->             when created_time >= '2017-01-02 10:43:20'
    ->              and created_time < '2017-01-03 21:26:40'
    ->             then '2017-01-02 10:43:20'
    ->             when created_time >= '2017-01-03 21:26:40'
    ->              and created_time < '2017-01-05 08:10:00'
    ->             then '2017-01-03 21:26:40'
    ->             else '2017-01-05 08:10:00'
    ->         end min_dt,
    ->        case when created_time >= '2017-01-01'
    ->              and created_time < '2017-01-02 10:43:20'
    ->             then '2017-01-02 10:43:20'
    ->             when created_time >= '2017-01-02 10:43:20'
    ->              and created_time < '2017-01-03 21:26:40'
    ->             then '2017-01-03 21:26:40'
    ->             when created_time >= '2017-01-03 21:26:40'
    ->              and created_time < '2017-01-05 08:10:00'
    ->             then '2017-01-05 08:10:00'
    ->             else '2017-01-06 18:53:20'
    ->         end max_dt,
    ->        count(*)
    ->   from t_source
    ->  group by case when created_time >= '2017-01-01'
    ->                 and created_time < '2017-01-02 10:43:20'
    ->                then '2017-01-01'
    ->                when created_time >= '2017-01-02 10:43:20'
    ->                 and created_time < '2017-01-03 21:26:40'
    ->                then '2017-01-02 10:43:20'
    ->                when created_time >= '2017-01-03 21:26:40'
    ->                 and created_time < '2017-01-05 08:10:00'
    ->                then '2017-01-03 21:26:40'
    ->                else '2017-01-05 08:10:00'
    ->            end,
    ->           case when created_time >= '2017-01-01'
    ->                 and created_time < '2017-01-02 10:43:20'
    ->                then '2017-01-02 10:43:20'
    ->                when created_time >= '2017-01-02 10:43:20'
    ->                 and created_time < '2017-01-03 21:26:40'
    ->                then '2017-01-03 21:26:40'
    ->                when created_time >= '2017-01-03 21:26:40'
    ->                 and created_time < '2017-01-05 08:10:00'
    ->                then '2017-01-05 08:10:00'
    ->                else '2017-01-06 18:53:20'
    ->            end;
+---------------------+---------------------+----------+
| min_dt              | max_dt              | count(*) |
+---------------------+---------------------+----------+
| 2017-01-01          | 2017-01-02 10:43:20 |   249999 |
| 2017-01-02 10:43:20 | 2017-01-03 21:26:40 |   250000 |
| 2017-01-03 21:26:40 | 2017-01-05 08:10:00 |   250000 |
| 2017-01-05 08:10:00 | 2017-01-06 18:53:20 |   250002 |
+---------------------+---------------------+----------+
4 rows in set (4.86 sec)

The union of the four data sets must cover the entire source data set, and the sets must not overlap. In other words, the created_time ranges of the four parts must be contiguous and mutually exclusive: contiguity ensures that all data is processed, and mutual exclusion ensures that no second round of deduplication is needed. This is essentially the idea of range partitioning by time; a partitioned table might actually work better, but the table-rebuild step is skipped here.

2. Create a deduplication stored procedure

With the boundary information above, we can write 4 statements that together process all the data. To keep the calling interface as simple as possible, create the following stored procedure.
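The boundary values queried above can be cross-checked with a few lines of arithmetic: 500,000 seconds split into 4 equal, contiguous, mutually exclusive ranges of 125,000 seconds each. A small Python check using only datetime math (no database needed):

```python
from datetime import datetime, timedelta

def shard_bounds(start, total_seconds, shards):
    """Split [start, start + total_seconds] into equal, contiguous,
    mutually exclusive time ranges -- the same boundaries that the
    date_add('2017-01-01', interval i*125000 second) expressions produce."""
    step = total_seconds // shards
    return [(start + timedelta(seconds=i * step),
             start + timedelta(seconds=(i + 1) * step))
            for i in range(shards)]

bounds = shard_bounds(datetime(2017, 1, 1), 500000, 4)
for lo, hi in bounds:
    print(lo, "->", hi)
# the first upper bound is 2017-01-02 10:43:20, matching dt1 above
```

Each range's upper bound equals the next range's lower bound, which is exactly the contiguity-plus-exclusivity property the half-open `>= ... < ...` predicates rely on.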
delimiter //
create procedure sp_unique(i smallint)
begin
    set @a:='1000-01-01 00:00:00';
    set @b:=' ';
    if (i<4) then
        insert into t_target
        select * from t_source force index (idx_sort)
         where created_time >= date_add('2017-01-01',interval (i-1)*125000 second)
           and created_time < date_add('2017-01-01',interval i*125000 second)
           and (@a!=created_time or @b!=item_name)
           and (@a:=created_time) is not null
           and (@b:=item_name) is not null
         order by created_time,item_name;
    else
        insert into t_target
        select * from t_source force index (idx_sort)
         where created_time >= date_add('2017-01-01',interval (i-1)*125000 second)
           and created_time <= date_add('2017-01-01',interval i*125000 second)
           and (@a!=created_time or @b!=item_name)
           and (@a:=created_time) is not null
           and (@b:=item_name) is not null
         order by created_time,item_name;
    end if;
end
//
delimiter ;

The execution plan of the query statement is as follows:

mysql> explain select * from t_source force index (idx_sort)
    -> where created_time >= date_add('2017-01-01',interval (1-1)*125000 second)
    -> and created_time < date_add('2017-01-01',interval 1*125000 second)
    -> and (@a!=created_time or @b!=item_name)
    -> and (@a:=created_time) is not null
    -> and (@b:=item_name) is not null
    -> order by created_time,item_name;
+----+-------------+----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
| id | select_type | table    | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | t_source | NULL       | range | idx_sort      | idx_sort | 6       | NULL | 498640 |   100.00 | Using index condition |
+----+-------------+----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
1 row in set, 3 warnings (0.00 sec)

The MySQL optimizer performs an index range scan and optimizes the query with index condition pushdown (ICP).

3. Parallel execution

The following uses shell background processes and MySQL scheduled events to achieve parallelism.

(1) Shell background processes

• Create the file duplicate_removal.sh with the following content:

#!/bin/bash
mysql -vvv -u root -p123456 test -e "truncate t_target" &>/dev/null
date '+%H:%M:%S'
for y in {1..4}
do
    sql="call sp_unique($y)"
    mysql -vvv -u root -p123456 test -e "$sql" &>par_sql1_$y.log &
done
wait
date '+%H:%M:%S'

• Execute the script file

The execution output is as follows:

[mysql@hdp2~]$./duplicate_removal.sh
14:27:30
14:27:35

This method takes 5 seconds, and the 4 procedure calls executed in parallel took 4.87, 4.88, 4.91 and 4.73 seconds respectively:

[mysql@hdp2~]$cat par_sql1_1.log | sed '/^$/d'
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
call sp_unique(1)
--------------
Query OK, 124999 rows affected (4.87 sec)
Bye
[mysql@hdp2~]$cat par_sql1_2.log | sed '/^$/d'
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
call sp_unique(2)
--------------
Query OK, 125000 rows affected (4.88 sec)
Bye
[mysql@hdp2~]$cat par_sql1_3.log | sed '/^$/d'
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
call sp_unique(3)
--------------
Query OK, 125000 rows affected (4.91 sec)
Bye
[mysql@hdp2~]$cat par_sql1_4.log | sed '/^$/d'
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
call sp_unique(4)
--------------
Query OK, 125001 rows affected (4.73 sec)
Bye
[mysql@hdp2~]$

As can be seen, each process takes about 4.85 seconds on average. Because they run in parallel, the total elapsed time equals the slowest process, 4.91 seconds, roughly 2.5 times faster than the single-threaded version.
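The shell script's pattern — independent workers over mutually exclusive shards, then wait for all of them — can be sketched in Python with a thread pool standing in for the four mysql client processes. Here dedup_shard is a hypothetical stand-in for sp_unique, and the real speedup in the article comes from MySQL doing the work concurrently, not from Python threads:

```python
from concurrent.futures import ThreadPoolExecutor

def dedup_shard(rows):
    """Deduplicate one shard: keep the first row per (created_time, item_name).
    Stand-in for one `call sp_unique(i)` running against its time range."""
    seen, out = set(), []
    for row in rows:
        key = (row[1], row[2])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def parallel_dedup(shards):
    """Run one dedup worker per shard concurrently and concatenate the results.
    Sound only because the shards are mutually exclusive on created_time,
    exactly as the data-sharding step requires."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = pool.map(dedup_shard, shards)   # like the `&` + `wait` in the script
    return [row for part in results for row in part]

shards = [
    [(1, "t1", "a"), (2, "t1", "a")],   # shard 1 contains a duplicate key
    [(3, "t2", "b")],                   # shard 2
]
print(parallel_dedup(shards))  # duplicates removed, no cross-shard merge needed
```

Because no key can appear in two shards, the per-shard results can simply be concatenated — the same reason the SQL version needs no second deduplication pass.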
(2) MySQL scheduled events

• Create an event history log table

-- Used to view event execution time and other information
create table t_event_history (
    dbname varchar(128) not null default '',
    eventname varchar(128) not null default '',
    starttime datetime(3) not null default '1000-01-01 00:00:00',
    endtime datetime(3) default null,
    issuccess int(11) default null,
    duration int(11) default null,
    errormessage varchar(512) default null,
    randno int(11) default null
);

• Create an event for each concurrent thread

delimiter //
create event ev1 on schedule at current_timestamp + interval 1 hour on completion preserve disable
do begin
    declare r_code char(5) default '00000';
    declare r_msg text;
    declare v_error integer;
    declare v_starttime datetime default now(3);
    declare v_randno integer default floor(rand()*100001);

    insert into t_event_history (dbname, eventname, starttime, randno)
    # job name
    values(database(),'ev1', v_starttime,v_randno);

    begin
        # exception handling section
        declare continue handler for sqlexception
        begin
            set v_error = 1;
            get diagnostics condition 1 r_code = returned_sqlstate , r_msg = message_text;
        end;
        # the actual user procedure being called
        call sp_unique(1);
    end;

    update t_event_history
       set endtime=now(3),issuccess=isnull(v_error),duration=timestampdiff(microsecond,starttime,now(3)),
           errormessage=concat('error=',r_code,', message=',r_msg),randno=null
     where starttime=v_starttime and randno=v_randno;
end
//

create event ev2 on schedule at current_timestamp + interval 1 hour on completion preserve disable
do begin
    declare r_code char(5) default '00000';
    declare r_msg text;
    declare v_error integer;
    declare v_starttime datetime default now(3);
    declare v_randno integer default floor(rand()*100001);

    insert into t_event_history (dbname, eventname, starttime, randno)
    # job name
    values(database(),'ev2', v_starttime,v_randno);

    begin
        # exception handling section
        declare continue handler for sqlexception
        begin
            set v_error = 1;
            get diagnostics condition 1 r_code = returned_sqlstate , r_msg = message_text;
        end;
        # the actual user procedure being called
        call sp_unique(2);
    end;

    update t_event_history
       set endtime=now(3),issuccess=isnull(v_error),duration=timestampdiff(microsecond,starttime,now(3)),
           errormessage=concat('error=',r_code,', message=',r_msg),randno=null
     where starttime=v_starttime and randno=v_randno;
end
//

create event ev3 on schedule at current_timestamp + interval 1 hour on completion preserve disable
do begin
    declare r_code char(5) default '00000';
    declare r_msg text;
    declare v_error integer;
    declare v_starttime datetime default now(3);
    declare v_randno integer default floor(rand()*100001);

    insert into t_event_history (dbname, eventname, starttime, randno)
    # job name
    values(database(),'ev3', v_starttime,v_randno);

    begin
        # exception handling section
        declare continue handler for sqlexception
        begin
            set v_error = 1;
            get diagnostics condition 1 r_code = returned_sqlstate , r_msg = message_text;
        end;
        # the actual user procedure being called
        call sp_unique(3);
    end;

    update t_event_history
       set endtime=now(3),issuccess=isnull(v_error),duration=timestampdiff(microsecond,starttime,now(3)),
           errormessage=concat('error=',r_code,', message=',r_msg),randno=null
     where starttime=v_starttime and randno=v_randno;
end
//

create event ev4 on schedule at current_timestamp + interval 1 hour on completion preserve disable
do begin
    declare r_code char(5) default '00000';
    declare r_msg text;
    declare v_error integer;
    declare v_starttime datetime default now(3);
    declare v_randno integer default floor(rand()*100001);

    insert into t_event_history (dbname, eventname, starttime, randno)
    # job name
    values(database(),'ev4', v_starttime,v_randno);

    begin
        # exception handling section
        declare continue handler for sqlexception
        begin
            set v_error = 1;
            get diagnostics condition 1 r_code = returned_sqlstate , r_msg = message_text;
        end;
        # the actual user procedure being called
        call sp_unique(4);
    end;

    update t_event_history
       set endtime=now(3),issuccess=isnull(v_error),duration=timestampdiff(microsecond,starttime,now(3)),
           errormessage=concat('error=',r_code,', message=',r_msg),randno=null
     where starttime=v_starttime and randno=v_randno;
end
//

To record the execution time of each event, each event definition includes logging to the operation-log table. Because each event only executes one extra insert and one update — eight very simple statements in total across the four events — the impact on the test is negligible. Execution time is recorded with millisecond precision.

• Trigger event execution

mysql -vvv -u root -p123456 test -e "truncate t_target;alter event ev1 on schedule at current_timestamp enable;alter event ev2 on schedule at current_timestamp enable;alter event ev3 on schedule at current_timestamp enable;alter event ev4 on schedule at current_timestamp enable;"

This command line triggers the four events in sequence, but it does not wait for one to finish before starting the next; it fires the next one immediately. This can be seen clearly from the command output:

[mysql@hdp2~]$mysql -vvv -u root -p123456 test -e "truncate t_target;alter event ev1 on schedule at current_timestamp enable;alter event ev2 on schedule at current_timestamp enable;alter event ev3 on schedule at current_timestamp enable;alter event ev4 on schedule at current_timestamp enable;"
mysql: [Warning] Using a password on the command line interface can be insecure.
--------------
truncate t_target
--------------
Query OK, 0 rows affected (0.06 sec)
--------------
alter event ev1 on schedule at current_timestamp enable
--------------
Query OK, 0 rows affected (0.02 sec)
--------------
alter event ev2 on schedule at current_timestamp enable
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
alter event ev3 on schedule at current_timestamp enable
--------------
Query OK, 0 rows affected (0.02 sec)
--------------
alter event ev4 on schedule at current_timestamp enable
--------------
Query OK, 0 rows affected (0.00 sec)
Bye
[mysql@hdp2~]$

• View the event execution log

mysql> select * from test.t_event_history;
+--------+-----------+-------------------------+-------------------------+-----------+----------+--------------+--------+
| dbname | eventname | starttime               | endtime                 | issuccess | duration | errormessage | randno |
+--------+-----------+-------------------------+-------------------------+-----------+----------+--------------+--------+
| test   | ev1       | 2019-07-31 14:38:04.000 | 2019-07-31 14:38:09.389 |         1 |  5389000 | NULL         |   NULL |
| test   | ev2       | 2019-07-31 14:38:04.000 | 2019-07-31 14:38:09.344 |         1 |  5344000 | NULL         |   NULL |
| test   | ev3       | 2019-07-31 14:38:05.000 | 2019-07-31 14:38:09.230 |         1 |  4230000 | NULL         |   NULL |
| test   | ev4       | 2019-07-31 14:38:05.000 | 2019-07-31 14:38:09.344 |         1 |  4344000 | NULL         |   NULL |
+--------+-----------+-------------------------+-------------------------+-----------+----------+--------------+--------+
4 rows in set (0.00 sec)

As can be seen, each event takes about 4.83 seconds on average. Because they run in parallel, the total execution time equals the slowest event, 5.39 seconds. The optimization effect is almost the same as with the shell background processes.

Summary

The above describes how to push MySQL deduplication to its limits. I hope it is helpful to you.
If you have any questions, please leave me a message and I will reply in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website!