This article uses an example to illustrate some tips for removing duplicate entries in MySQL and for optimizing the process. It is shared for your reference; the details are as follows:

When it comes to deduplication, my head starts to hurt, especially when the data volume is large. After all, I am not a professional DBA, so I can only come up with a deduplication approach that suits me.

First, the usual approach: use HAVING to find the duplicates, then delete them one by one. Don't ask me how to write the SQL that detects duplicates with HAVING; you know it... This is fine when there are only a few duplicates. But what if thousands or tens of thousands of distinct values are duplicated?

Let's start from the HAVING query itself. The original SQL is as follows:

select `name`, count(*) as count from sentence group by `name` having count > 1

Run it and feel it for yourself: with about 5 million rows, the speed is painfully slow...

Setting the speed aside for now, let's consider how to use this statement to remove the duplicate rows. First of all, what data do we need in order to delete duplicates? The id, for sure, and then the filter condition. So what is missing from the data returned by the SQL above? The id... Let's add that field and try again:

select id, `name`, count(*) as count from sentence group by `name` having count > 1

Now the three fields id, name, and count are returned; run it yourself to see the effect. (Note that id is neither aggregated nor in the GROUP BY clause, so this query only runs with ONLY_FULL_GROUP_BY disabled, and MySQL returns an arbitrary id from each group. That is acceptable here, since we only need one id to keep per name.)

Based on this data we can remove the duplicates. The idea behind the delete SQL is: delete the rows whose id is not one of the ids we queried, but whose name is one of the names we queried. Since there is more than one row, the result has to be processed by a program: concatenate all the ids with commas, and wrap the name values in quotes before joining them with commas. The result looks like this:

delete from sentence where id not in (1,2,3) and name in ('a','b','c')

If there is too much data, we can put this into a script and run the deletes in batches; a sketch follows at the end of this section.

Now we can remove duplicates, but the speed is still a problem. Next we need to consider how to optimize the query. Since we want to improve the running speed of the SQL, common sense says the first thing to think of is an index. Okay, without further ado, let's create one. But on which field? That is another question. The principle is to choose a field that narrows down the range of your name values. For example, my name field stores brand names, and I have an industry field that stores the industry of each brand, so I created an index on the industry field. There may be more suitable choices; it depends on your data.

Without further ado, here is the optimized SQL:

select id, `name`, count(*) as count from sentence where industry = 'Drinks' group by `name` having count > 1

Run EXPLAIN on it and the result shows that our index is being used... As for the speed, you can feel the difference without me telling you. After that, we again use the program to concatenate the ids with commas and wrap the names in quotes, and proceed with the deduplication. The efficiency improves significantly...
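The article mentions batching the deletes with a script but does not show one, so here is a minimal PHP sketch of how it might look. Only the sentence table and its id and name columns come from the article; the PDO connection details, batch size, and variable names are illustrative, and MIN(id) is used instead of a bare id so the query also works with ONLY_FULL_GROUP_BY enabled.

<?php
// Minimal sketch of the batched deduplication described above (illustrative only).
// Assumes table `sentence` with columns `id` and `name`; DSN and credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 1. For every duplicated name, fetch one id to keep. MIN(id) is deterministic
//    and runs even when ONLY_FULL_GROUP_BY is enabled.
$rows = $pdo->query(
    "SELECT MIN(id) AS id, `name` FROM sentence GROUP BY `name` HAVING COUNT(*) > 1"
)->fetchAll(PDO::FETCH_ASSOC);

// 2. Delete the remaining duplicates in batches so no single DELETE grows too large.
$batchSize = 500; // illustrative; tune to your data
foreach (array_chunk($rows, $batchSize) as $batch) {
    $idList = implode(',', array_map('intval', array_column($batch, 'id')));
    $names = array_column($batch, 'name');
    $placeholders = implode(',', array_fill(0, count($names), '?'));

    $stmt = $pdo->prepare(
        "DELETE FROM sentence WHERE id NOT IN ($idList) AND `name` IN ($placeholders)"
    );
    $stmt->execute($names);
}

Each batch only touches the names in that batch, so the NOT IN list stays short and every DELETE statement remains a manageable size.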
However, some readers may use conditions containing LIKE or other patterns that invalidate the index. In that case, we can simply classify the data, query each category separately, and once all the data has been fetched, use the program to find the duplicates and collect the rows to delete.

Appendix: MySQL RAND() query optimization / random query optimization

When it comes to random queries, everyone knows to use the rand() function, but once the data volume reaches a certain level, you can imagine the query efficiency. So what can we do? Let's optimize this query. My method is to do the randomization in the program and then use LIMIT to fetch the value. The general idea: first query the number of rows that match the conditions, then use PHP's rand function to pick a random offset within that range, and finally query with LIMIT at that offset (see the sketch below). Try it yourself: with about five million rows, the query is at least ten times faster than the MySQL RAND() approach.
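The article's own example SQL is not shown above, so here is a minimal sketch of the count-then-offset idea, again in PHP with PDO. The sentence table and the industry = 'Drinks' condition are carried over from the earlier examples; the connection details and variable names are illustrative.

<?php
// Sketch of the program-side random pick: count the matching rows,
// choose a random offset in PHP, then fetch one row with LIMIT.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');

// 1. Count the rows that match the condition.
$total = (int) $pdo->query(
    "SELECT COUNT(*) FROM sentence WHERE industry = 'Drinks'"
)->fetchColumn();

// 2. Pick a random offset in PHP instead of ORDER BY RAND() in MySQL.
$offset = $total > 0 ? rand(0, $total - 1) : 0;

// 3. Fetch exactly one row at that offset. Interpolating $offset is safe here
//    because it is always an integer produced by rand().
$row = $pdo->query(
    "SELECT id, `name` FROM sentence WHERE industry = 'Drinks' LIMIT $offset, 1"
)->fetch(PDO::FETCH_ASSOC);

This avoids the full-table sort that ORDER BY RAND() forces. Note that a very large offset still has to walk past that many rows, so keeping the WHERE condition indexed, as in the deduplication section, matters here too.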
Okay, that's all for this sharing. I hope this article is helpful to everyone's MySQL database design.