A practical record of checking and processing duplicate MySQL records on site

A practical record of checking and processing duplicate MySQL records on site

Preface

I was on a business trip at a customer site to deploy and debug software. One day, the customer suddenly came to me and said that there was an abnormality in the data produced on site. The most direct manifestation was that the same label appeared on multiple materials. They needed my cooperation to see how to troubleshoot the problem.

analyze

The customer put a pile of boxes with duplicate labels in front of me. I was so scared that I was afraid I was in big trouble.

After a little thought, what we need to do now is to query the database for duplicate tags, that is, count a tag and determine the ones with count > 1.

Emmm, grammatical error, I remember there was a "Having", let's try it instead

Wow, there are more than 1,500 duplicate tags. Let's count the total number of questions and group them to see the percentage of duplicate tags.

By the way, first throw these duplicate label data to the customer to trace the product (fortunately Navicat supports copying data)

Data Total

The result of the above query record is a temporary table. Based on this, sum() is used to sum

There are a lot of duplicate records, which is a bit of a big problem.

Repetition rate

Change the query method of the previous query table, that is, group the count data again

From the results, most of the problem data were repeated twice.

The difference between where and having

Where is a constraint statement that constrains the query conditions in the database before the query results are returned. That is, it takes effect before the results are returned, and aggregate functions cannot be used after where.

Having is a filtering statement. The so-called filtering is performed after the results of the database query are returned, that is, it takes effect after the results are returned, and the aggregation function can be used after having.

The so-called aggregate function is a function that calculates a set of values ​​and returns a single value: sum---sum, count---count, max---maximum value, avg---average value, etc.

Summarize

In the query operation of MySQL, we usually use multi-table joint query, use the query results as the intermediate table for multiple queries, group the query results, perform statistical aggregation and other operations

If function calculation or grouping is used in the query operation, the typical feature is to merge multiple records in the original table into one, and having is required to filter the results of these operations. In addition, use where to conditionally filter the table records.

This is the end of this article about the troubleshooting and processing of duplicate MySQL records on site. For more information about the troubleshooting and processing of duplicate MySQL records, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MySQL's method of dealing with duplicate data (preventing and deleting)
  • MySQL study notes on handling duplicate data
  • How to handle concurrent updates of MySQL data
  • Detailed explanation of MySQL execution principle, logical layering, and changing database processing engine
  • Some methods to optimize query speed when MySQL processes massive data
  • MySQL data processing sorting and explaining the operations of adding, deleting and modifying

<<:  CSS HACK for IE6/IE7/IE8/IE9/FF (summary)

>>:  Detailed explanation of CSS3 elastic expansion box

Recommend

How to use mysqladmin to get the current TPS and QPS of a MySQL instance

mysqladmin is an official mysql client program th...

The whole process of installing mysql5.7.22 under ARM64 architecture

MySQL download address: https://obs.cn-north-4.my...

How to Change Colors and Themes in Vim on Linux

Vim is a text editor that we use very often in Li...

Docker Gitlab+Jenkins+Harbor builds a persistent platform operation

CI/CD Overview CI workflow design Git code versio...

Reasons and solutions for the failure of React event throttling effect

Table of contents The problem here is: Solution 1...

Detailed explanation of samba + OPENldap to build a file sharing server

Here I use samba (file sharing service) v4.9.1 + ...

Learn one minute a day to use Git server to view debug branches and fix them

Debug branch During the normal development of a p...

Detailed explanation of Linux text editor Vim

Vim is a powerful full-screen text editor and the...

JavaScript to achieve digital clock effect

This article example shares the specific code of ...

Ubuntu 16.04 image complete installation tutorial under VMware

This article shares with you the installation tut...

MySQL trigger simple usage example

This article uses examples to illustrate the simp...