Why do code standards require SQL statements not to have too many joins?

An easy question

Interviewer : Have you ever used Linux?

Me : Yes

Interviewer : I want to check the memory usage. What command should I use?

Me : free or top

Interviewer : Then tell me what information the free command shows.

Me : Well, it shows how much memory is in use and how much is taken by the cache:

  • total: total memory
  • used: memory in use
  • free: free memory
  • buff/cache: memory used by buffers and cache
  • available: available memory

Interviewer : Do you know how to clean up the used cache (buff/cache)?

Me : Um… I don't know.

Interviewer : sync; echo 3 > /proc/sys/vm/drop_caches clears buff/cache. Do you think it is a good idea to run this command in production?

Me : (An easy question, great!) It helps a lot. Clearing the cache gives us more available memory, just like the little rocket in xx Guard on a PC: one click and it frees up a lot of memory.

Interviewer : Hmm... go home and wait to hear from us.

SQL Join

Interviewer : Let's change the topic and talk about your understanding of join.

Me : OK (If I answer wrong again, it will be all over, so I’ll seize the opportunity)

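Review

A join in SQL combines rows from the specified tables according to a join condition and returns the result to the client.

The join types are:

  • inner join: returns only the rows that have a match in both tables
  • left join: returns every row of the left table, plus the matching rows of the right table
  • right join: returns every row of the right table, plus the matching rows of the left table
  • full join: returns every row of both tables, matched where possible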
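The four join types can be sketched in a few lines of Python. This is only an illustration: the sample rows in `left` and `right` and the helper `join` are hypothetical, and `None` stands in for SQL NULL.

```python
# Hypothetical sample data: (id, name) rows on the left, (id, city) rows on the right.
left = [(1, "Ann"), (2, "Bob"), (3, "Cid")]
right = [(2, "Oslo"), (3, "Rome"), (4, "Lima")]


def join(left_rows, right_rows, how="inner"):
    """Join two lists of (key, value) rows on key; None stands in for SQL NULL."""
    result = []
    matched_right = set()  # positions of right rows that found a partner
    for lkey, lval in left_rows:
        found = False
        for i, (rkey, rval) in enumerate(right_rows):
            if lkey == rkey:
                result.append((lkey, lval, rval))
                matched_right.add(i)
                found = True
        # left and full joins keep left rows that have no partner
        if not found and how in ("left", "full"):
            result.append((lkey, lval, None))
    # right and full joins keep right rows that have no partner
    if how in ("right", "full"):
        for i, (rkey, rval) in enumerate(right_rows):
            if i not in matched_right:
                result.append((rkey, None, rval))
    return result


for how in ("inner", "left", "right", "full"):
    print(how, join(left, right, how))
```

Only ids 2 and 3 exist on both sides, so the inner join returns two rows, while the left, right and full joins additionally keep the unmatched rows of one or both tables.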

Interviewer : If you need to use join statements in project development, how can you optimize and improve performance?

Me : There are two cases: small data sets and large data sets.

Interviewer : And then?

Me : It depends on the case:

  • Small data size: just load it all into memory.
  • Large data size: add indexes to speed up the join, store redundant columns so that fewer joins are needed, and keep the number of joined tables low. A single SQL statement should not join more than 5 tables.

Interviewer : So we can conclude that join statements are relatively expensive, right?

Me : Yes

Interviewer : Why?

Buffer

Me : Executing a join statement always involves comparing rows.

Interviewer : Yes

Me : Comparing the rows of two tables one at a time is slow, so the database reads the data from the two tables into a block of memory, piece by piece. Taking MySQL's InnoDB engine as an example, we can find the relevant memory areas with the following statement: show variables like '%buffer%'

Among the variables it lists, the value of join_buffer_size affects the performance of our join statements.

Interviewer : Anything else?

A major premise

Me : Any project eventually goes into production and inevitably generates data, and the amount of data will not be small.

Interviewer : That’s right.

Me : Most of the data in a database is eventually saved to the hard disk and stored as files.

Take MySQL's InnoDB engine as an example:

  • InnoDB uses pages as the basic I/O unit, and each page is 16 KB by default
  • InnoDB creates an .ibd file for each table to store its data

Me : This means that we have to read as many files as there are tables in the join. Even with indexes, the disk head still has to move frequently.

Interviewer : So frequent head movement will affect performance, right?

Me : Yes. Don't today's open source projects, such as HBase and Kafka, like to say that they greatly improve performance through sequential reads and writes?

Interviewer : That's right. Do you think Linux has optimized for this? Hint: run the free command again and take a look.

Me : Why is the cache taking up more than 1.2G?

Interviewer : Have you ever thought about these questions?

  • What is stored in buff/cache?
  • Why does buff/cache occupy so much memory while the available memory is still 1.1G?
  • Why can the memory held by buff/cache be released with two commands, while used memory can only be released by ending processes?

Think it over, carefully.

After thinking for a few minutes:

Me : If the memory used by buff/cache can be released so casually, it must not be important, and clearing it will not affect the running system.

Interviewer : Not entirely true.

Me : Could it be? I remember a sentence in "Computer Systems: A Programmer's Perspective" (CSAPP):

The essence of the memory hierarchy is that each level of storage serves as a cache for the level below it.

In plain language, Linux uses otherwise idle memory as a cache for the hard disk, so dropping that cache forces later reads to go back to the disk.
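A toy model makes the cost of dropping the cache visible. This is only a sketch: the class `CachedDisk` and its methods are hypothetical names, and the real Linux page cache is far more sophisticated than a dictionary.

```python
class CachedDisk:
    """Toy disk with a cache in front of it (not how Linux implements its page cache)."""

    def __init__(self, pages):
        self.pages = pages    # page number -> data, standing in for the disk
        self.cache = {}       # pages currently held in memory
        self.disk_reads = 0   # how often we had to go to the "disk"

    def read(self, page_no):
        if page_no not in self.cache:
            # cache miss: fetch from the slow device and remember the page
            self.disk_reads += 1
            self.cache[page_no] = self.pages[page_no]
        return self.cache[page_no]

    def drop_caches(self):
        # stand-in for: sync; echo 3 > /proc/sys/vm/drop_caches
        self.cache.clear()


disk = CachedDisk({n: f"page-{n}" for n in range(4)})

for n in range(4):
    disk.read(n)          # cold reads: every page comes from disk
for n in range(4):
    disk.read(n)          # warm reads: served from the cache
reads_before_drop = disk.disk_reads

disk.drop_caches()
for n in range(4):
    disk.read(n)          # the cache is empty again, so every page hits the disk
reads_after_drop = disk.disk_reads

print(reads_before_drop, reads_after_drop)
```

The second pass over the pages costs no disk reads at all, while the pass after `drop_caches()` pays for every page again. That is the price of clearing buff/cache on a busy server.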

Interviewer : Now you know how to answer that easy question, right?

Me : I…

Join Algorithm

Interviewer : Let me give you another chance. If I asked you to implement a join algorithm, how would you do it?

Me : If there is no index, nested loops will do the trick. If there is an index, it can be used to improve performance.

Interviewer : Let's talk about join_buffer. What do you think is stored in join_buffer?

Me : During the scan, the database picks one table and puts the columns it needs to return, and the columns it needs to compare against the other table, into join_buffer.

Interviewer : How do you handle it when there is an index?

Me : This is relatively simple. Just read the index trees of the two tables and compare them. Next I will explain how the case without an index is handled.
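One way to picture "read the two index trees and compare them" is a merge over two key-ordered sequences. The sketch below is an assumption about what the answer means, not MySQL's implementation: sorted Python lists stand in for index trees, and `merge_join` is a hypothetical helper.

```python
def merge_join(left_index, right_index):
    """Join two lists of (key, row) pairs that are already sorted by key."""
    result = []
    i = j = 0
    while i < len(left_index) and j < len(right_index):
        lkey = left_index[i][0]
        rkey = right_index[j][0]
        if lkey < rkey:
            i += 1            # left key has no partner yet, advance left
        elif lkey > rkey:
            j += 1            # right key has no partner yet, advance right
        else:
            # find the run of right entries that share this key
            j_end = j
            while j_end < len(right_index) and right_index[j_end][0] == lkey:
                j_end += 1
            # pair every left entry with this key with every entry of that run
            while i < len(left_index) and left_index[i][0] == lkey:
                for k in range(j, j_end):
                    result.append((left_index[i][1], right_index[k][1]))
                i += 1
            j = j_end
    return result


# Hypothetical index contents: (indexed key, row) in key order.
index_a = [(1, "a1"), (2, "a2"), (2, "a2b"), (4, "a4")]
index_b = [(2, "b2"), (2, "b2b"), (3, "b3"), (4, "b4")]
pairs = merge_join(index_a, index_b)
print(pairs)
```

Because both inputs are already in key order, each one is walked only once from start to end. No row has to be compared against every row of the other table.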

Nested Loop Join

A nested loop join reads only one row of a table at a time. That is, if outerTable has 100,000 rows and innerTable has 100 rows, 100,000 × 100 = 10,000,000 reads are needed (assuming the files of these two tables have not been cached in memory by the operating system; we call such tables cold data tables).

Of course, no database engine uses this naive algorithm today (it is too slow).
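A minimal sketch of the naive nested loop with a read counter. The table sizes are scaled down to 1,000 × 100 so it runs instantly; `nested_loop_join` and the `match` predicate are hypothetical names, and plain lists of integers stand in for table rows.

```python
def nested_loop_join(outer, inner, match):
    """Compare every outer row with every inner row, one row at a time."""
    reads = 0
    result = []
    for o in outer:
        for r in inner:
            reads += 1  # each inner row is fetched again for every outer row
            if match(o, r):
                result.append((o, r))
    return result, reads


outer_table = list(range(1000))  # stand-in for outerTable
inner_table = list(range(100))   # stand-in for innerTable
rows, reads = nested_loop_join(outer_table, inner_table, lambda o, r: o == r)
print(len(rows), reads)
```

The number of reads is always the product of the two table sizes, regardless of how many rows actually match.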

Block nested loop

"Block" means that a whole block of rows is read into memory at a time, to reduce I/O overhead.

MySQL InnoDB uses this algorithm when no index is available. (From MySQL 8.0.20 on, the server uses a hash join in this situation instead.)
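The block variant can be sketched as follows. It is an illustration under assumptions: `buffer_rows` is a hypothetical stand-in for join_buffer_size (the real buffer is sized in bytes, not rows), and lists of integers stand in for table rows.

```python
def block_nested_loop_join(outer, inner, match, buffer_rows):
    """Load outer rows into a buffer block by block; scan inner once per block."""
    inner_scans = 0
    result = []
    for start in range(0, len(outer), buffer_rows):
        block = outer[start:start + buffer_rows]  # fill the join buffer
        inner_scans += 1                          # one pass over inner per block
        for r in inner:
            for o in block:                       # compare against buffered rows
                if match(o, r):
                    result.append((o, r))
    return result, inner_scans


outer_table = list(range(1000))
inner_table = list(range(100))
rows, inner_scans = block_nested_loop_join(
    outer_table, inner_table, lambda o, r: o == r, buffer_rows=250
)
print(len(rows), inner_scans)
```

With 1,000 outer rows and room for 250 rows in the buffer, the inner table is scanned 4 times instead of 1,000 times. A larger buffer means fewer scans, which is why join_buffer_size matters.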

Consider two tables, t_a and t_b. When the join between them cannot use an index, InnoDB automatically uses the Block Nested Loop algorithm.

Summary

When I was in school, my database teacher liked to test us on normal forms. It was not until I started working that I learned that performance comes first. If redundancy is possible, use redundancy. If it really is not, then join. If the join really hurts performance, try increasing join_buffer_size or switching to an SSD.
