Quickly learn MySQL basics

Quickly learn MySQL basics

This article mainly sorts out the basic usage of SQL, which will cover the following aspects:

  • SQL case specification
  • Database types and applicable scenarios
  • SELECT execution process
  • WHERE Usage Guidelines
  • Common functions in MySQL
  • Subquery Classification
  • How to choose appropriate EXISTS and IN subqueries

Understanding SQL

SQL is one of the longest ways we use to interact with data. If divided by function, it can be divided into the following four parts:

  • DDL, data definition language. Define database objects, data tables, and data columns. That is, add, delete, and modify the database and table structure.
  • DML, data manipulation language. Add, delete and modify data tables.
  • DCL, Data Control Language. Define access rights and security levels.
  • DQL, Data Query Language. Used to query data.

When writing SQL, you may find that many SQLs are not written in the same capital letters or lowercase letters. Although this does not affect the execution results of SQL, maintaining a unified writing standard is the key to improving efficiency. Usually, the following principles are followed:

  • Table names, table aliases, field names, field aliases, etc. should be in lowercase.
  • SQL reserved words, function names, bind variables, etc. are written in upper case.
  • In data tables, field names are named with underscores.

The top DBMSs currently are:

  • Relational database: A database built on a relational model. When creating a table, the relationship between the tables is usually represented by an ER diagram.
  • Key-value database: stores data in the form of key-value. The advantage is fast search speed, but the disadvantage is that it cannot use filtering conditions such as WHERE like relational databases. A common scenario is as a content cache.
  • Document-based databases use documents as the basic unit for processing information when saving.
  • Search engine: designed for full-text retrieval. The core principle is "inverted index".
  • Column-based database: Compared with row-based databases such as MySQL, column-based databases store data in columns. Since columns have the same data type, they can be better compressed, thereby reducing the system's I/O. They are suitable for distributed file systems, but their functions are relatively limited.
  • Graph databases use graph data structures to store relationships between entities. For example, the relationships between people in a social network are implemented using a data model consisting of nodes and edges.

Understanding SELECT

SELECT is usually the first keyword you come into contact with when learning SQL. I won’t go into the basics here, but here are some common specifications:

Alias

SELECT name AS n FROM student

Query constants and add a fixed constant column:

SELECT 'student information' as student_info, name FROM student

Remove duplicate rows

SELECT DISTINCT age FROM student

It should be noted that DISTINCT removes duplicates from all subsequent columns. In the following case, the combination of age and name will be removed.

SELECT DISTINCT age,name FROM student

Sort data, ASC stands for ascending order, DESC stands for descending order

For example, sort by name first, and then sort by age if the names are equal.

SELECT DISTINCT age FROM student ORDERY BY name,age DESC

Limit the number of returns

SELECT DISTINCT age FROM student ORDERY BY name DESC LIMIT 5

SELECT execution order

Only by understanding the execution order of SELECT can we write more efficient SQL.

There are two principles for SELECT order:

  • The order of keywords cannot be reversed:
    SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ...
  • SELECT is executed in the following order:
    FROM > WHERE > GROUP BY > HAVING > SELECTed fields > DISTINCT > ORDER BY > LIMIT
SELECT DISTINCT student_id, name, count(*) as num #Sequence 5
FROM student JOIN class ON student.class_id = class.class_id #Sequence 1
WHERE age > 18 #Sequence 2
GROUP BY student.class_id #Sequence 3
HAVING num > 2 #Sequence 4
ORDER BY num DESC #Sequence 6
LIMIT 2 #Sequence 7

Before analyzing this process one by one, we need to know that a virtual table will be generated in each of the above steps, and then this virtual table will be used as input in the next step, but this process is invisible to us:

  1. Starting from the FROM statement, perform a CROSS JOIN Cartesian product operation on the student and class tables to obtain the virtual table vt 1-1;
  2. Through ON screening, filtering is performed on the basis of vt1-1 and then table vt 1-2 is obtained;
  3. Add an external row. If left join, right join and full join are used, external rows will be involved, and external rows will be added on the basis of vt1-2 to obtain vt1-3.
  4. If there are more than two tables, the above steps will be repeated.
  5. After obtaining the final table data of vt1, the filtering stage after WHERE will be executed to obtain table vt2.
  6. Then we go to the GROUP stage and group them to get vt3.
  7. Then in the HAVING stage, the grouped data is filtered to obtain vt4.
  8. Then we enter the SELECT stage to extract the required fields and get vt5-1. Then we go through the DISTINCT stage to filter out the duplicate rows and get vt5-2.
  9. Then sort the specified fields and enter the ORDER BY phase to get vt6.
  10. Finally, in the LIMIT stage, the specified row is taken out, corresponding to vt7, which is the final result.

If function calculations are involved, such as sum(), the aggregate function calculations will be performed after GROUP BY and before HAVING.

Expression calculations, such as age * 10, are calculated after the HAVING phase and before the SELECT phase.

  • From here, we can summarize the first method to improve SQL efficiency:

When using SELECT, specify explicit columns instead of SELECT *. This reduces the amount of network transmission.

Filtering with WHERE

When using WHERE filtering, there are three common methods: comparison operators, logical operators, and wildcards.

For comparison operators, commonly used operators are shown in the following table.

For logical operators, you can connect multiple comparison operators to perform multi-condition screening. Commonly used operators are as follows:

It should be noted that when AND and OR appear at the same time, AND has a higher priority and will be executed first. When (), parentheses are present, they have the highest precedence.

Using wildcard filtering:

like: (%) represents zero or more characters, (_) represents only one character

function

Just like the functions defined in programming languages, SQL also defines some functions for easy use, such as sum, average, length, etc.

Common functions are mainly divided into the following four categories, and the classification principle is based on the data type when defining the column:

  • Arithmetic functions:

  • String functions

It should be noted that when comparing dates using strings, the DATE function should be used for comparison.

  • Date functions

  • Conversion function:

The CAST function does not round off when converting data types. If the original value is a decimal, an error will be reported when converting to an integer.

When converting, you can use the DECIMAL(a,b) function to specify the precision of the decimal. For example, DECIMAL(8,2) means the precision is 8 digits - the decimal plus the integer is at most 8 digits. The maximum number of decimal places is 2.

Then convert it by SELECT CAST(123.123 AS DECIMAL(8,2)).

Aggregate functions

Typically, we use aggregate functions to summarize table data, with the input being a set of data and the output being a single value.

There are five commonly used aggregate functions:

The COUNT function requires extra attention. For details, please refer to this article.

How to group

When counting results, it is often necessary to group the data according to certain conditions, which corresponds to the GROUP BY statement.

For example, count the number of students in each class:

SELECT class_id, COUNT(*) as student_count FROM student \
GROUP BY class_id;

GROUP BY can also be followed by multiple column names to group, for example, by class and gender:

SELECT class_id, sex, COUNT(*) as student_count FROM \
student GROUP BY class_id, sex;

The difference between HAVING filtering and WHERE

Like WHERE, you can filter the grouped data. The difference is that WHERE applies to rows, while HAVING applies to groups.

Moreover, the operations supported by WHERE are also supported by HAVING.

For example, you can filter classes with more than 2 people:

SELECT class_id, COUNT(*) as student_count FROM student \ 
GROUP BY class_id \ 
HAVING student_count > 20;

Subqueries

In more complex situations, nested queries are often performed, such as after obtaining a result, the result is used as input to obtain another set of results.

In SQL, queries can be divided into correlated subqueries and non-correlated subqueries.

Assume the following table structure:

-- ----------------------------
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) NOT NULL DEFAULT '',
 `age` int(3) NOT NULL,
 `sex` varchar(10) NOT NULL DEFAULT '',
 `class_id` int(11) NOT NULL COMMENT 'Class ID',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of Student
-- ----------------------------
INSERT INTO `student` VALUES ('1', '胡一', 13, '男', '1');
INSERT INTO `student` VALUES ('3', '王阿', 11, '女', '1');
INSERT INTO `student` VALUES ('5', '王淇', 12, '男', '1');
INSERT INTO `student` VALUES ('7', '刘伟', 11, '女', '1');
INSERT INTO `student` VALUES ('7', '王识', 11, '女', '2');

-- ----------------------------
DROP TABLE IF EXISTS `student_activities`;
CREATE TABLE `student_activities` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) NOT NULL DEFAULT '',
 `stu_id` int(11) NOT NULL COMMENT 'Class ID',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8;
INSERT INTO `student_activities` VALUES ('1', 'Museum', 1);
INSERT INTO `student_activities` VALUES ('3, 'Spring Outing', 3);

Non-correlated subqueries

The subquery queries the data result from the data table. If this data result is only executed once, then this data result is used as the condition for the main query to be executed next.

Here we want to query the names of students in the same class as Hu Yi:

SELECT name FROM student WHERE class_id = \
(SELECT class_id FROM student WHERE name='胡一')

Here, we first find Hu Yi's class, which is only one query. Then we find students based on the class, which is a non-correlated subquery.

Correlated subqueries

If the subquery needs to be executed multiple times, a loop is used, starting with the external query, passing in the subquery each time for query, and then feeding the results back to the external query.

For example, let's query the names of students who are older than the average age in each class:

SELECT name FROM student as s1 WHERE age > 
	(SELECT AVG(age) FROM student as s2 where s1.class_id = s2.class_id)

Here, based on each student's class information, find out the average age of the corresponding class and then make a judgment. Each time a subquery is executed, it needs to be calculated based on the outer query. Such a subquery is a correlated subquery.

EXISTS Subquery

In correlated subqueries, it is often used together with EXISTS. Used to determine whether the condition is met, if it is met, it is True, if it is not met, it is False.

For example, query the names of students who have participated in school activities:

SELECT NAME FROM student as s where \
	EXISTS(SELECT stu_id FROM student_activities as sa where sa.stu_id=s.id)

Similarly, NOT EXISTS means that it does not exist. If it satisfies the condition, it is FALSE, and if it does not, it is True.

For example, query the names of students who have not participated in school activities:

SELECT NAME FROM student as s where \
	NOT EXISTS(SELECT stu_id FROM student_activities as sa where sa.stu_id=s.id)

Set comparison subqueries

You can use set operators in subqueries to compare results.

Let's use IN to query the names of students who participated in school activities:

SELECT name FROM student WHERE id IN (SELECT stu_id FROM student_activities)

Difference between EXISTS and IN

Since both EXISTS and IN can achieve the same function, what is the difference between them?

Now suppose we have table A and table B, where both A and B have field cc, and b+ index is created for cc, where table A has n records and table B has m indexes.

The abstract mode is:

SELECT * FROM A WHERE cc IN (SELECT cc FROM B)

SELECT * FROM A WHERE EXIST (SELECT cc FROM B WHERE B.cc=A.cc)

For EXISTS, the outer table will be looped one by one first. After each result of the outer table is obtained, it will be brought into the inner table of the subquery to determine whether the value exists.

The pseudo code is similar to the following:

for i in A
for j in B
if j.cc == i.cc:
return result

First, look at the appearance A. Every line needs to be traversed, so it takes n times. When querying the inner table B, the query efficiency becomes log(m) B + tree height instead of m due to the use of index.

So the total efficiency is: n * log(m)

Therefore, when the number of items in table A is significantly smaller than that in table B, it is recommended to use the EXISTS query.

Looking at IN again, the internal table B will be queried first, and then the external table A will be used for judgment. The pseudo code is as follows:

for i in B
for j in A
if j.cc == i.cc:
return result

Since all the data in the inner table must be searched first, the number of times required is m. Looking at the outer table A, since the cc index is used, n can be simplified to log(n), that is, m * log(n).

Therefore, when the data in table A is significantly larger than that in table B, it is recommended to use IN query.

To summarize, for IN and EXISTS, the principle of using a small table to drive a large table is adopted.

Here we expand on the difference between NOT EXISTS and NOT IN:

SELECT * FROM A WHERE cc NOT IN (SELECT cc FROM B) 

SELECT * FROM A WHERE NOT EXIST (SELECT cc FROM B WHERE B.cc=A.cc)

For NOT EXITS, like EXISTS, the index of cc can be used for the inner table. Applicable when table A is smaller than table B.

But for NOT IN, it is different from IN. Since cc is set with an index, cc IN (1, 2, 3) can be converted to WHERE cc=1 OR cc=2 OR cc=3, and the cc index can be used normally. But for NOT IN, it is converted into cc!=1 OR cc!=2 OR cc!=3. At this time, since it is an inequality query, the index cannot be used, and the entire table is scanned.

That is to say, when an index is set, NOT EXISTS is more efficient than NOT IN.

But for the case without an index, IN and OR are different:

1. Different operations
1. in: in is a hash connection between the parent query table and the child query table.
2. or: or loops the parent query table, and queries the child query table again each time the loop is repeated.

2. Different application scenarios
1. in: in is suitable for situations where the subquery table has more data than the parent query table.
2. or: or is suitable for situations where the data in the subquery table is less than that in the parent query table.

3. Different efficiency
1. in: In the absence of an index, the execution efficiency of in will not decrease significantly as the amount of data after in increases.
2. or: In the absence of an index, the execution efficiency of or will decrease significantly as the amount of data after or increases.

Summarize

This article mainly summarizes some basic knowledge of SQL:

When using SELECT queries, by explicitly specifying the column names, you can reduce IO transmission and improve efficiency.

And it should be noted that the SELECT query process starts from FROM and ends with LIMIT. Understanding the overall process can help us better organize SQL.

Then the WHERE filtering operators and commonly used functions are introduced in detail. Here, it should be noted that the DATE function should be used when comparing time, and how to group and filter data.

Finally, the applicable scenarios of subqueries, IN and EXISTS are emphasized.

The above is the details of quickly learning the basics of MySQL. For more information about the basics of MySQL, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Summary of basic knowledge points of MySql database
  • A summary after learning MySQL (Basics)
  • MySQL database basic commands (collection)
  • Introduction to MySQL (I) Basic operations of data tables and databases
  • MySQL series tutorials for beginners

<<:  JavaScript implements select all and unselect all operations

>>:  How to check whether a port is occupied in LINUX

Recommend

MySQL group query optimization method

MySQL handles GROUP BY and DISTINCT queries simil...

Nginx configuration and compatibility with HTTP implementation code analysis

Generate SSL Key and CSR file using OpenSSL To co...

How to implement form validation in Vue

1. Installation and use First, install it in your...

What are the benefits of semantic HTML structure?

one: 1. Semantic tags are just HTML, there is no ...

The simplest form implementation of Flexbox layout

Flexible layout (Flexbox) is becoming increasingl...

How to set up Spring Boot using Docker layered packaging

The Spring Boot project uses docker containers, j...

Steps to set up Windows Server 2016 AD server (picture and text)

Introduction: AD is the abbreviation of Active Di...

Summary of bootstrap learning experience-css style design sharing

Due to the needs of the project, I plan to study ...

Solution to PHP not being able to be parsed after nginx installation is complete

Table of contents Method 1 Method 2 After install...

Unbind SSH key pairs from one or more Linux instances

DetachKeyPair Unbind SSH key pairs from one or mo...

Correct use of MySQL partition tables

Overview of MySQL Partitioned Tables We often enc...

How to set Tomcat as an automatically started service? The quickest way

Set Tomcat to automatically start the service: I ...