A simple method to merge and remove duplicate MySQL tables

Scenario:

Crawled data has been written to a table with the same structure as an existing main table. The two tables need to be merged, and the duplicate rows removed.

Solution (by example):

First, create two tables with the same structure, pep and pep2. pep is the main table:

CREATE TABLE IF NOT EXISTS `pep`(
`id` INT UNSIGNED AUTO_INCREMENT,
`no` VARCHAR(100) NOT NULL,
PRIMARY KEY ( `id` )
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- pep2 gets exactly the same structure as pep
CREATE TABLE IF NOT EXISTS `pep2` LIKE `pep`;

Then insert two rows into pep, and insert one row into pep2 that is identical to a row already in pep:

insert into pep(no) values('abc');
insert into pep(no) values('caa');

insert into pep2(no) values('abc');

Insert the rows of pep2 into pep. pep now contains 'abc' twice:

insert into pep (no) select no from pep2;
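If pep does not already contain duplicates, the merge can skip them at insert time instead of cleaning up afterwards. The sketch below is an alternative to the plain INSERT ... SELECT above, so run one or the other, not both. It assumes the pep and pep2 tables created earlier and copies only those values of no that pep does not already have:

```sql
-- Copy from pep2 only the values of no that are not already in pep.
-- DISTINCT also collapses duplicates that exist inside pep2 itself.
INSERT INTO pep (no)
SELECT DISTINCT p2.no
FROM pep2 AS p2
LEFT JOIN pep AS p ON p.no = p2.no
WHERE p.id IS NULL;
```

With the sample data, the only row in pep2 is 'abc', which pep already contains, so nothing is inserted. On large tables, an index on no should keep the join fast.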

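Group by no to build a new table, tmp, with one row per distinct value of no. tmp is an ordinary table used as a temporary holding place, not a TEMPORARY table: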

create table tmp select id,no from pep group by no;

Note: CREATE TABLE ... SELECT copies the column types but not the indexes or the AUTO_INCREMENT attribute. In tmp, therefore, id is neither a primary key nor auto-incrementing.

You may also get the following error. It occurs because ONLY_FULL_GROUP_BY is enabled by default in MySQL 5.7.5 and later:
```
Syntax error or access violation: 1055 Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'XXX.Y.ZZZZ' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
```
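Solution: remove ONLY_FULL_GROUP_BY from sql_mode by executing the following two commands. SET GLOBAL needs administrative privileges, applies to new connections, and is lost when the server restarts. SET SESSION applies only to the current connection:
```
mysql> set global sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';

mysql> set session sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
```
This mode list is for MySQL 5.7. NO_AUTO_CREATE_USER was removed in MySQL 8.0, so on 8.0 leave it out of the list or the statements fail. Also note that with ONLY_FULL_GROUP_BY disabled, MySQL may return the id of any row in each group, so which duplicate survives is not guaranteed.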
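Changing sql_mode is not the only way around the error. The sketch below keeps ONLY_FULL_GROUP_BY enabled and is a swapped-in alternative, not the original author's method. It aggregates id with MIN() so that every selected column is either grouped or aggregated. This also makes the result deterministic, because the smallest id in each group is kept. It builds tmp with CREATE TABLE ... LIKE, which copies pep's definition, including the primary key and AUTO_INCREMENT:

```sql
-- Same column definitions, primary key, and AUTO_INCREMENT as pep.
CREATE TABLE tmp LIKE pep;

-- One row per distinct value of no, keeping the smallest id.
-- Valid under ONLY_FULL_GROUP_BY: no is grouped and id is aggregated.
INSERT INTO tmp (id, no)
SELECT MIN(id), no
FROM pep
GROUP BY no;
```

If tmp is built this way instead of with CREATE TABLE ... SELECT, it already has its primary key and AUTO_INCREMENT. After the rename, the ALTER TABLE statements that restore them are not needed; adding the primary key a second time would fail.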

Drop the pep table and rename tmp to pep:

drop table pep;
alter table tmp rename to pep;
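DROP TABLE followed by a rename leaves a short window in which pep does not exist. It also discards the original data before the new table has been checked. RENAME TABLE can rename several tables in one atomic statement, so a safer swap can be sketched as follows. The backup name pep_old is a hypothetical choice:

```sql
-- Atomically move the old table aside and put tmp in its place.
RENAME TABLE pep TO pep_old, tmp TO pep;

-- After the new pep has been checked, drop the backup.
DROP TABLE pep_old;
```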

Run desc pep and select * from pep. The id column is no longer a primary key and no longer auto-increments, so change it back to its original definition:

alter table pep add primary key (id);
alter table pep modify id int unsigned not null auto_increment;

You can also remove duplicates with a self-join. For future merges there is a faster option. Add a column (for example, the MD5 hash of several columns concatenated) and create a UNIQUE index on it. The database will then reject duplicate rows on insert, and INSERT IGNORE skips them instead of failing.
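Both ideas can be sketched against the pep table used above. The self-join DELETE keeps the row with the smallest id for each value of no, and the unique index then makes later merges skip duplicates. The index name uk_no is a hypothetical choice. Because pep has only one data column, the index goes directly on no. The commented-out statement shows the hash-column variant for several columns; col_a, col_b, row_md5 and uk_row_md5 are hypothetical names, and generated columns require MySQL 5.7 or later:

```sql
-- 1. Remove existing duplicates: delete every row for which another row
--    with the same no has a smaller id.
DELETE p1
FROM pep AS p1
JOIN pep AS p2
  ON p1.no = p2.no
 AND p1.id > p2.id;

-- 2. Prevent new duplicates (this fails if duplicates still exist).
ALTER TABLE pep ADD UNIQUE KEY uk_no (no);

-- Multi-column variant, assuming hypothetical NOT NULL columns col_a, col_b:
-- ALTER TABLE pep
--   ADD COLUMN row_md5 CHAR(32) AS (MD5(CONCAT_WS('|', col_a, col_b))) STORED,
--   ADD UNIQUE KEY uk_row_md5 (row_md5);

-- 3. Later merges: rows whose no already exists are skipped with a warning
--    instead of failing the whole statement.
INSERT IGNORE INTO pep (no)
SELECT no FROM pep2;
```

IGNORE also turns some other errors, such as data truncation, into warnings. After a large load, check SHOW WARNINGS.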

Summary

That is all for this article. I hope it is useful for your study or work. Thank you for supporting 123WORDPRESS.COM.