The MySQL development team officially released the MySQL 8.0.18 GA version on October 14, 2019, bringing some new features and enhancements. The most eye-catching feature is that multi-table join queries support hash join mode. Let's first take a look at the official description: MySQL implements a hash join method for inner join queries. For example, starting with MySQL 8.0.18, the following query can use hash join for join query: SELECT * FROM t1 JOIN t2 ON t1.c1=t2.c1; Hash join does not require index support. In most cases, hash join is more efficient than the previous Block Nested-Loop algorithm for equivalent joins without indexes. Use the following statements to create three test tables: CREATE TABLE t1 (c1 INT, c2 INT); CREATE TABLE t2 (c1 INT, c2 INT); CREATE TABLE t3 (c1 INT, c2 INT); Use the EXPLAIN FORMAT=TREE command to see the hash join in the execution plan, for example: mysql> EXPLAIN FORMAT=TREE -> SELECT * -> FROM t1 -> JOIN t2 -> ON t1.c1=t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1) You must use the FORMAT=TREE option of the EXPLAIN command to see the hash joins in the node. In addition, the EXPLAIN ANALYZE command can also display the usage information of hash join. This is also a new feature added in this version. Queries that use equi-joins between multiple tables will also be optimized in this way. For example, the following query: SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2) JOIN t3 ON (t2.c1 = t3.c1); In the above example, any other non-equijoin conditions will be used as filters after the join operation. You can view this in the output of the EXPLAIN FORMAT=TREE command: mysql> EXPLAIN FORMAT=TREE -> SELECT * -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2) -> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1) -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1) It can also be seen from the above output that queries containing multiple equal join conditions can also (will) use multiple hash join connections. However, if any connection statement (ON) does not use the equivalent connection condition, the hash join connection method will not be used. For example: mysql> EXPLAIN FORMAT=TREE -> SELECT * -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1) -> JOIN t3 -> ON (t2.c1 < t3.c1)\G *************************** 1. row *************************** EXPLAIN: <not executable by iterator executor> At this time, the slower mysql> EXPLAIN -> SELECT * -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1) -> JOIN t3 -> ON (t2.c1 < t3.c1)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: SIMPLE table: t2 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 filtered: 100.00 Extra: Using where; Using join buffer (Block Nested Loop) *************************** 3. row *************************** id: 1 select_type: SIMPLE table: t3 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 filtered: 100.00 Extra: Using where; Using join buffer (Block Nested Loop) Hash join also applies to Cartesian product when no query conditions are specified, for example: mysql> EXPLAIN FORMAT=TREE -> SELECT * -> FROM t1 -> JOIN t2 -> WHERE t1.c2 > 50\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Filter: (t1.c2 > 50) (cost=0.35 rows=1) -> Table scan on t1 (cost=0.35 rows=1) By default, MySQL uses hash joins whenever possible. At the same time, two methods are provided to control whether to use hash join: Set the server system variable Specify the optimizer hint HASH_JOIN or NO_HASH_JOIN for a specific join at the statement level. The amount of memory allowed for hash join can be controlled by the system variable Increase the value of Increase the value of Next, they compare the performance of set join_buffer_size=2097152000; SET @@cte_max_recursion_depth = 99999999; INSERT INTO t1 -- INSERT INTO t2 -- INSERT INTO t3 WITH RECURSIVE t AS ( SELECT 1 AS c1, 1 AS c2 UNION ALL SELECT t.c1 + 1, t.c1 * 2 FROM t WHERE t.c1 < 1000000 ) SELECT * FROM t; Hash join without index: mysql> EXPLAIN ANALYZE -> SELECT COUNT(*) -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1) -> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Aggregate: count(0) (actual time=22993.098..22993.099 rows=1 loops=1) -> Inner hash join (t3.c1 = t1.c1) (cost=9952535443663536.00 rows=9952435908880402) (actual time=14489.176..21737.032 rows=1000000 loops=1) -> Table scan on t3 (cost=0.00 rows=998412) (actual time=0.103..3973.892 rows=1000000 loops=1) -> Hash -> Inner hash join (t2.c1 = t1.c1) (cost=99682753413.67 rows=99682653660) (actual time=5663.592..12236.984 rows=1000000 loops=1) -> Table scan on t2 (cost=0.01 rows=998412) (actual time=0.067..3364.105 rows=1000000 loops=1) -> Hash -> Table scan on t1 (cost=100539.40 rows=998412) (actual time=0.133..3395.799 rows=1000000 loops=1) 1 row in set (23.22 sec) mysql> SELECT COUNT(*) -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1) -> JOIN t3 -> ON (t2.c1 = t3.c1); +----------+ | COUNT(*) | +----------+ | 1000000 | +----------+ 1 row in set (12.98 sec) The actual run took 12.98 seconds. At this time, if you use block nested loop: mysql> EXPLAIN FORMAT=TREE -> SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*) -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1) -> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: <not executable by iterator executor> 1 row in set (0.00 sec) SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*) FROM t1 JOIN t2 ON (t1.c1 = t2.c1) JOIN t3 ON (t2.c1 = t3.c1); EXPLAIN shows that hash join cannot be used. The query ran for dozens of minutes without producing any results, and one of the CPUs was used up to 100% because it was constantly executing nested loops (1,000,000 to the power of 3). Let’s look at the block nested loop method with an index and add an index: mysql> CREATE index idx1 ON t1(c1); Query OK, 0 rows affected (7.39 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> CREATE index idx2 ON t2(c1); Query OK, 0 rows affected (6.77 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> CREATE index idx3 ON t3(c1); Query OK, 0 rows affected (7.23 sec) Records: 0 Duplicates: 0 Warnings: 0 View the execution plan and run the same query: mysql> EXPLAIN ANALYZE -> SELECT COUNT(*) -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1) -> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Aggregate: count(0) (actual time=47684.034..47684.035 rows=1 loops=1) -> Nested loop inner join (cost=2295573.22 rows=998412) (actual time=0.116..46363.599 rows=1000000 loops=1) -> Nested loop inner join (cost=1198056.31 rows=998412) (actual time=0.087..25788.696 rows=1000000 loops=1) -> Filter: (t1.c1 is not null) (cost=100539.40 rows=998412) (actual time=0.050..5557.847 rows=1000000 loops=1) -> Index scan on t1 using idx1 (cost=100539.40 rows=998412) (actual time=0.043..3253.769 rows=1000000 loops=1) -> Index lookup on t2 using idx2 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000) -> Index lookup on t3 using idx3 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000) 1 row in set (47.68 sec) mysql> SELECT COUNT(*) -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1) -> JOIN t3 -> ON (t2.c1 = t3.c1); +----------+ | COUNT(*) | +----------+ | 1000000 | +----------+ 1 row in set (19.56 sec) The actual run took 19.56 seconds. So the test results in our scenario are as follows:
Add another hash join result without index in Oracle 12c: 1.282 s. Here is another hash join result without index in PostgreSQL 11.5: 6.234 s. Add another hash join result without index in SQL 2017: 5.207 s. Summarize The above is the new feature of MySQL 8.0, Hash Join, which I would like to introduce to you. I hope it will be helpful to you. If you have any questions, please leave me a message and I will reply to you in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website! If you find this article helpful, please feel free to reprint it and please indicate the source. Thank you! You may also be interested in:
|
<<: Bootstrap+Jquery to achieve calendar effect
>>: Directory permissions when creating a container with Docker
Mininet Mininet is a lightweight software defined...
Because frameset and body are on the same level, y...
This article shares the specific code of Navicat ...
Table of contents 1. Redux 1.1. Store (librarian)...
Use the rpm installation package to install mysql...
First, start MySQL in skip-grant-tables mode: mys...
Install nginx Note that you must install nginx-fu...
Table of contents 1. Steps 1. Define buttom permi...
Table of contents nonsense text The first router/...
Table of contents Preface 1. First completely uni...
CocosCreator version: 2.3.4 Cocos does not have a...
Table of contents Why is addEventListener needed?...
Docker is an open source project that provides an...
Table of contents 1. Introduction 2. Rendering 3....
This is a large drop-down menu implemented purely...