MySQL method steps to determine whether it is a subset

MySQL method steps to determine whether it is a subset

1. Problem

The story originated from a report that queries the error and omission rate: there are two query results, one for the items that have been added to the report and the other for the items that should be added to the report.

What does it mean to be without omission? That is, all items that should be added have been added

The reporting completeness rate is the ratio of the number of complete reports to the total number of reports.

Here are two examples of reports (one with all added and one with missing parts)

First, find the first result - the items that should be added to the report

SELECT 
     r.id AS report ID, m.project_id should be added to the project FROM 
  report 
  INNER JOIN application a ON r.app_id=a.id
  INNER JOIN application_sample s ON a.id=s.app_id
  RIGHT JOIN application_sample_item si ON s.id=si.sample_id       
  RIGHT JOIN set_project_mapping m ON si.set_id=m.set_id
WHERE r.id IN ('44930','44927')
ORDER BY r.id,m.project_id;

Then, find the second result - report the items that have been added

SELECT r.id AS report_id,i.project_id AS added_project FROM report r 
RIGHT JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927');

The above is the result set we want to compare. It is not difficult to see that report 44927 is complete, while 44930 has the same number of items, but actually adds 758 items and lacks 112 items, so it is a missing report.

2. Solution

Judging from the question, it is obviously a question of judging whether it is a subset. You can traverse the added items and the items that should be added separately. If the items that should be added can be matched in the added items, it means that the items that should be added are a subset of the added items, that is, there are no omissions.

This problem can indeed be solved by looping and comparing, but the cross join of Cartesian products in SQL often means huge overhead and slow query speed. So is there any way to avoid this problem?

Option 1:

With the help of the functions FIND_IN_SET and GROUP_CONCAT, first understand the following two functions

FIND_IN_SET(str,strlist)

  • str: the string to be queried
  • strlist: parameters are separated by English "," such as (1,2,6,8,10,22)

The FIND_IN_SET function returns the position of the string to be queried in the target string.

GROUP_CONCAT( [distinct] Field to be connected [order by sort field asc/desc ] [separator 'separator'] )

The GROUP_CONCAT() function can concatenate the values ​​of the same field of multiple records into one record and return it. The default separator is English ',' .

However, the default length of GROUP_CONCAT() is 1024

Therefore, if the length of the splicing needs to exceed 1024, it will cause incomplete truncation and the length needs to be modified.

SET GLOBAL group_concat_max_len=102400;
SET SESSION group_concat_max_len=102400;

From the above two function introductions, we find that FIND_IN_SET and GROUP_CONCAT are separated by English ',' (in bold)

Therefore, we can use GROUP_CONCAT to concatenate the items that have been added into a string, and then use FIND_IN_SET to query one by one whether the items to be added exist in the string.

1. Modify the SQL in the description of the problem and use GROUP_CONCAT to concatenate the items of the added items into a string

SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
LEFT JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927')
GROUP BY r.id; 

2. Use FIND_IN_SET to check one by one whether the items to be added exist in the string

SELECT Q.id,FIND_IN_SET(W.List of items to be added, Q.List of items already added) AS Is FROM missing 
   (
   -- Report the added projects SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
      LEFT JOIN report_item i ON r.id=i.report_id
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
   )Q,
   (
   -- Report items that should be addedSELECT 
         r.id,s.app_id,m.project_id should add project list FROM 
         report 
         INNER JOIN application a ON r.app_id=a.id
         INNER JOIN application_sample s ON a.id=s.app_id
         INNER JOIN application_sample_item si ON s.id=si.sample_id       
         INNER JOIN set_project_mapping m ON si.set_id=m.set_id
      WHERE r.id IN ('44930','44927')
      ORDER BY r.id,m.project_id
   )W
   WHERE Q.id=W.id;

3. Filter out missed reports

 SELECT Q.id,CASE WHEN FIND_IN_SET(W.List of items to be added, Q.List of items already added)>0 THEN 1 ELSE 0 END AS Is FROM missing 
   (
   -- Report the added projects SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
      LEFT JOIN report_item i ON r.id=i.report_id
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
   )Q,
   (
   -- Report items that should be addedSELECT 
         r.id,s.app_id,m.project_id should add project list FROM 
         report 
         INNER JOIN application a ON r.app_id=a.id
         INNER JOIN application_sample s ON a.id=s.app_id
         INNER JOIN application_sample_item si ON s.id=si.sample_id       
         INNER JOIN set_project_mapping m ON si.set_id=m.set_id
      WHERE r.id IN ('44930','44927')
      ORDER BY r.id,m.project_id
   )W
   WHERE Q.id=W.id
   GROUP BY Q.id
   HAVING COUNT(`Is it missing`)=SUM(`Is it missing`);

4. Our ultimate goal is to find the zero omission rate

 SELECT COUNT(X.id) number of reports without missing items, Y.total total number of reports, CONCAT(FORMAT(COUNT(X.id)/Y.total*100,2),'%') AS project without missing items rate FROM 
(
  SELECT Q.id,CASE WHEN FIND_IN_SET(W.List of items to be added, Q.List of items already added)>0 THEN 1 ELSE 0 END AS Is FROM missing 
   (
   -- Report the added projects SELECT r.id,GROUP_CONCAT(i.project_id ORDER BY i.project_id,'') AS added project list FROM report r 
      LEFT JOIN report_item i ON r.id=i.report_id
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
   )Q,
   (
   -- Report items that should be addedSELECT 
         r.id,s.app_id,m.project_id should add project list FROM 
         report 
         INNER JOIN application a ON r.app_id=a.id
         INNER JOIN application_sample s ON a.id=s.app_id
         INNER JOIN application_sample_item si ON s.id=si.sample_id       
         INNER JOIN set_project_mapping m ON si.set_id=m.set_id
       WHERE r.id IN ('44930','44927')
    ORDER BY r.id,m.project_id
   )W
   WHERE Q.id=W.id
   GROUP BY Q.id
   HAVING COUNT(`Is it missing`)=SUM(`Is it missing`)
 )X,
 (
    -- Total number of reports SELECT COUNT(E.nums) AS total FROM
    (
      SELECT COUNT(r.id) AS nums FROM report r 
      WHERE r.id IN ('44930','44927')
      GROUP BY r.id
    )E    
 )Y 
 ;

Option 2:

Although the above solution 1 avoids line-by-line traversal and comparison, it is essentially a one-by-one comparison of items. So is there any way to avoid comparison?

Of course the answer is yes. We can determine whether it is fully included based on the statistical quantity.

1. Use union all to join the added items with the items to be added without removing duplicates.

 (
 -- Items that should be addedSELECT 
  r.id,m.project_id
FROM 
   report 
INNER JOIN application a ON r.app_id=a.id
INNER JOIN application_sample s ON a.id=s.app_id
INNER JOIN application_sample_item si ON s.id=si.sample_id       
INNER JOIN set_project_mapping m ON si.set_id=m.set_id
WHERE r.id IN ('44930','44927')
ORDER BY r.id,m.project_id
)
UNION ALL
(
 -- Added project select r.id,i.project_id from report r,report_item i 
where r.id = i.report_id and r.id IN ('44930','44927')
group by r.app_id,i.project_id
 )

From the results, we can see that there are duplicate items under the same report, which represent the items that should be added and the items that have been added.

2. According to the joint table results, the number of overlapping items in the statistical report

# Should add the number of items that overlap with those already added select tt.id,count(*) count from 
(
   select t.id,t.project_id,count(*) from 
   (
      (
        -- Items that should be addedSELECT 
          r.id,m.project_id
        FROM 
          report 
          INNER JOIN application a ON r.app_id=a.id
          INNER JOIN application_sample s ON a.id=s.app_id
          INNER JOIN application_sample_item si ON s.id=si.sample_id       
          INNER JOIN set_project_mapping m ON si.set_id=m.set_id
        WHERE r.id IN ('44930','44927')
        ORDER BY r.id,m.project_id
      )
      UNION ALL
      (
        -- Added project select r.id,i.project_id from report r,report_item i 
        where r.id = i.report_id and r.id IN ('44930','44927')
        group by r.app_id,i.project_id
      )
      
   )
   GROUP BY t.id,t.project_id
   HAVING count(*) >1 
) tt group by tt.id 

3. Compare the amount in the second step with the amount that should be added. If they are equal, it means there is no omission.

select bb.id, aa.count has been added, bb.count needs to be added,
    CASE WHEN aa.count/bb.count=1 THEN 1
    ELSE 0
    END AS 'Is it missing' 
from 
(
# Should add the number of items that overlap with those already added select tt.id,count(*) count from 
(
   select t.id,t.project_id,count(*) from 
   (
      (
        -- Items that should be addedSELECT 
          r.id,m.project_id
        FROM 
          report 
          INNER JOIN application a ON r.app_id=a.id
          INNER JOIN application_sample s ON a.id=s.app_id
          INNER JOIN application_sample_item si ON s.id=si.sample_id       
          INNER JOIN set_project_mapping m ON si.set_id=m.set_id
        WHERE r.id IN ('44930','44927')
        ORDER BY r.id,m.project_id
      )
      UNION ALL
      (
        -- Added project select r.id,i.project_id from report r,report_item i 
        where r.id = i.report_id and r.id IN ('44930','44927')
        group by r.app_id,i.project_id
      )
      
   )
   GROUP BY t.id,t.project_id
   HAVING count(*) >1 
) tt group by tt.id 
) aa RIGHT JOIN
(
  -- The number of items that should be addedSELECT 
    r.id,s.app_id,COUNT(m.project_id) count
  FROM 
    report 
    INNER JOIN application a ON r.app_id=a.id
    INNER JOIN application_sample s ON a.id=s.app_id
    INNER JOIN application_sample_item si ON s.id=si.sample_id       
    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
  WHERE r.id IN ('44930','44927')
  GROUP BY r.id
  ORDER BY r.id,m.project_id
) bb ON aa.id = bb.id 
ORDER BY aa.id 

4. Find the no-missing rate

select 
    SUM(asr.`Is it missing?) AS No missing number, COUNT(asr.id) AS Total number, CONCAT(FORMAT(SUM(asr.`Is it missing?)/COUNT(asr.id)*100,5),'%') AS Report no missing rate from 
(
  select bb.id, aa.count has been added, bb.count needs to be added,
      CASE WHEN aa.count/bb.count=1 THEN 1
      ELSE 0
      END AS 'Is it missing' 
  from 
  (
  # Should add the number of items that overlap with those already added select tt.id,count(*) count from 
  (
     select t.id,t.project_id,count(*) from 
     (
        (
          -- Items that should be addedSELECT 
            r.id,m.project_id
          FROM 
            report 
            INNER JOIN application a ON r.app_id=a.id
            INNER JOIN application_sample s ON a.id=s.app_id
            INNER JOIN application_sample_item si ON s.id=si.sample_id       
            INNER JOIN set_project_mapping m ON si.set_id=m.set_id
          WHERE r.id IN ('44930','44927')
          ORDER BY r.id,m.project_id
        )
        UNION ALL
        (
          -- Added project select r.id,i.project_id from report r,report_item i 
          where r.id = i.report_id and r.id IN ('44930','44927')
          group by r.app_id,i.project_id
        )
        
     )
     GROUP BY t.id,t.project_id
     HAVING count(*) >1 
  ) tt group by tt.id 
  ) aa RIGHT JOIN
  (
    -- The number of items that should be addedSELECT 
      r.id,s.app_id,COUNT(m.project_id) count
    FROM 
      report 
      INNER JOIN application a ON r.app_id=a.id
      INNER JOIN application_sample s ON a.id=s.app_id
      INNER JOIN application_sample_item si ON s.id=si.sample_id       
      INNER JOIN set_project_mapping m ON si.set_id=m.set_id
    WHERE r.id IN ('44930','44927')
    GROUP BY r.id
    ORDER BY r.id,m.project_id
  ) bb ON aa.id = bb.id 
  ORDER BY aa.id
) asr;

This is the end of this article about the steps to determine whether MySQL is a subset. For more information about how to determine whether MySQL is a subset, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MySql8 WITH RECURSIVE recursive query parent-child collection method

<<:  W3C Tutorial (9): W3C XPath Activities

>>:  What is Nginx load balancing and how to configure it

Recommend

What are the image file formats and how to choose

1. Which three formats? They are: gif, jpg, and pn...

MySQL quickly inserts 100 million test data

Table of contents 1. Create a table 1.1 Create te...

HTML markup language - table tag

Click here to return to the 123WORDPRESS.COM HTML ...

Detailed explanation of the benefits of PNG in various network image formats

BMP is an image file format that is independent o...

Web Design Help: Web Font Size Data Reference

<br />The content is reproduced from the Int...

JavaScript design pattern chain of responsibility pattern

Table of contents Overview Code Implementation Pa...

How to use the dig/nslookup command to view DNS resolution steps

dig - DNS lookup utility When a domain name acces...

Newbies quickly learn the steps to create website icons

<br />Original URL: http://www.lxdong.com/po...

Implementation of MySQL's MVCC multi-version concurrency control

1 What is MVCC The full name of MVCC is: Multiver...

Solution to "Specialized key was too long" in MySQL

Table of contents Solution 1 Solution 2 When crea...

Detailed explanation of fetch network request encapsulation example

export default ({ url, method = 'GET', da...

Summary of commonly used operators and functions in MySQL

Let’s build the data table first. use test; creat...

Detailed explanation of Vue custom instructions and their use

Table of contents 1. What is a directive? Some co...

Idea configures tomcat to start a web project graphic tutorial

Configure tomcat 1. Click run configuration 2. Se...