MySQL 8.0 New Features: Hash Join

MySQL 8.0 New Features: Hash Join

The MySQL development team officially released the MySQL 8.0.18 GA version on October 14, 2019, bringing some new features and enhancements. The most eye-catching feature is that multi-table join queries support hash join mode. Let's first take a look at the official description:

MySQL implements a hash join method for inner join queries. For example, starting with MySQL 8.0.18, the following query can use hash join for join query:

SELECT * 
  FROM t1 
  JOIN t2 
    ON t1.c1=t2.c1;

Hash join does not require index support. In most cases, hash join is more efficient than the previous Block Nested-Loop algorithm for equivalent joins without indexes. Use the following statements to create three test tables:

CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);

Use the EXPLAIN FORMAT=TREE command to see the hash join in the execution plan, for example:

mysql> EXPLAIN FORMAT=TREE
  -> SELECT * 
  -> FROM t1 
  -> JOIN t2 
  -> ON t1.c1=t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
  -> Table scan on t2 (cost=0.35 rows=1)
  -> Hash
    -> Table scan on t1 (cost=0.35 rows=1)

You must use the FORMAT=TREE option of the EXPLAIN command to see the hash joins in the node. In addition, the EXPLAIN ANALYZE command can also display the usage information of hash join. This is also a new feature added in this version.

Queries that use equi-joins between multiple tables will also be optimized in this way. For example, the following query:

SELECT * 
  FROM t1
  JOIN t2 
    ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
  JOIN t3 
    ON (t2.c1 = t3.c1);

In the above example, any other non-equijoin conditions will be used as filters after the join operation. You can view this in the output of the EXPLAIN FORMAT=TREE command:

mysql> EXPLAIN FORMAT=TREE
  -> SELECT * 
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
  -> JOIN t3 
  -> ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1)
  -> Table scan on t3 (cost=0.35 rows=1)
  -> Hash
    -> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1)
      -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
        -> Table scan on t2 (cost=0.35 rows=1)
        -> Hash
          -> Table scan on t1 (cost=0.35 rows=1)

It can also be seen from the above output that queries containing multiple equal join conditions can also (will) use multiple hash join connections.

However, if any connection statement (ON) does not use the equivalent connection condition, the hash join connection method will not be used. For example:

mysql> EXPLAIN FORMAT=TREE
  -> SELECT * 
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1)
  -> JOIN t3 
  -> ON (t2.c1 < t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

At this time, the slower block nested loop connection algorithm will be used. This is the same as it was before MySQL 8.0.18 without indexes:

mysql> EXPLAIN
  -> SELECT * 
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1)
  -> JOIN t3 
  -> ON (t2.c1 < t3.c1)\G       
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: t1
  partitions: NULL
     type: ALL
possible_keys: NULL
     key: NULL
   key_len: NULL
     ref: NULL
     rows: 1
   filtered: 100.00
    Extra: NULL
*************************** 2. row ***************************
      id: 1
 select_type: SIMPLE
    table: t2
  partitions: NULL
     type: ALL
possible_keys: NULL
     key: NULL
   key_len: NULL
     ref: NULL
     rows: 1
   filtered: 100.00
    Extra: Using where; Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
      id: 1
 select_type: SIMPLE
    table: t3
  partitions: NULL
     type: ALL
possible_keys: NULL
     key: NULL
   key_len: NULL
     ref: NULL
     rows: 1
   filtered: 100.00
    Extra: Using where; Using join buffer (Block Nested Loop)

Hash join also applies to Cartesian product when no query conditions are specified, for example:

mysql> EXPLAIN FORMAT=TREE
  -> SELECT *
  -> FROM t1
  -> JOIN t2
  -> WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (cost=0.70 rows=1)
  -> Table scan on t2 (cost=0.35 rows=1)
  -> Hash
    -> Filter: (t1.c2 > 50) (cost=0.35 rows=1)
      -> Table scan on t1 (cost=0.35 rows=1)

By default, MySQL uses hash joins whenever possible. At the same time, two methods are provided to control whether to use hash join:

Set the server system variable optimizer_switch to hash_join=on or hash_join=off at the global or session level. The default is hash_join=on .

Specify the optimizer hint HASH_JOIN or NO_HASH_JOIN for a specific join at the statement level.

The amount of memory allowed for hash join can be controlled by the system variable join_buffer_size ; hash join will not use more memory than the amount set by this variable. If the memory required for a hash join exceeds this threshold, MySQL will perform the operation on disk. Note that if the hash join cannot be completed in memory and the number of open files exceeds the value of the system variable open_files_limit , the join operation may fail. To resolve this issue, use one of the following methods:

Increase the value of join_buffer_size to ensure that hash join can be completed in memory.

Increase the value of n_files_limit .

Next, they compare the performance of hash join and block nested loop . First, 1,000,000 records are generated for t1, t2, and t3 respectively:

set join_buffer_size=2097152000;
SET @@cte_max_recursion_depth = 99999999;
INSERT INTO t1
-- INSERT INTO t2
-- INSERT INTO t3
WITH RECURSIVE t AS (
 SELECT 1 AS c1, 1 AS c2
 UNION ALL
 SELECT t.c1 + 1, t.c1 * 2
  FROM t
  WHERE t.c1 < 1000000
)
SELECT *
 FROM t;

Hash join without index:

mysql> EXPLAIN ANALYZE
  -> SELECT COUNT(*)
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1)
  -> JOIN t3 
  -> ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=22993.098..22993.099 rows=1 loops=1)
  -> Inner hash join (t3.c1 = t1.c1) (cost=9952535443663536.00 rows=9952435908880402) (actual time=14489.176..21737.032 rows=1000000 loops=1)
    -> Table scan on t3 (cost=0.00 rows=998412) (actual time=0.103..3973.892 rows=1000000 loops=1)
    -> Hash
      -> Inner hash join (t2.c1 = t1.c1) (cost=99682753413.67 rows=99682653660) (actual time=5663.592..12236.984 rows=1000000 loops=1)
        -> Table scan on t2 (cost=0.01 rows=998412) (actual time=0.067..3364.105 rows=1000000 loops=1)
        -> Hash
          -> Table scan on t1 (cost=100539.40 rows=998412) (actual time=0.133..3395.799 rows=1000000 loops=1)

1 row in set (23.22 sec)

mysql> SELECT COUNT(*)
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1)
  -> JOIN t3 
  -> ON (t2.c1 = t3.c1);
+----------+
| COUNT(*) |
+----------+
| 1000000 |
+----------+
1 row in set (12.98 sec)

The actual run took 12.98 seconds. At this time, if you use block nested loop:

mysql> EXPLAIN FORMAT=TREE
  -> SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*)
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1)
  -> JOIN t3 
  -> ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

1 row in set (0.00 sec)

SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*)
 FROM t1
 JOIN t2 
  ON (t1.c1 = t2.c1)
 JOIN t3 
  ON (t2.c1 = t3.c1);

EXPLAIN shows that hash join cannot be used. The query ran for dozens of minutes without producing any results, and one of the CPUs was used up to 100% because it was constantly executing nested loops (1,000,000 to the power of 3).

Let’s look at the block nested loop method with an index and add an index:

mysql> CREATE index idx1 ON t1(c1);
Query OK, 0 rows affected (7.39 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> CREATE index idx2 ON t2(c1);
Query OK, 0 rows affected (6.77 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> CREATE index idx3 ON t3(c1);
Query OK, 0 rows affected (7.23 sec)
Records: 0 Duplicates: 0 Warnings: 0

View the execution plan and run the same query:

mysql> EXPLAIN ANALYZE
  -> SELECT COUNT(*)
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1)
  -> JOIN t3 
  -> ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=47684.034..47684.035 rows=1 loops=1)
  -> Nested loop inner join (cost=2295573.22 rows=998412) (actual time=0.116..46363.599 rows=1000000 loops=1)
    -> Nested loop inner join (cost=1198056.31 rows=998412) (actual time=0.087..25788.696 rows=1000000 loops=1)
      -> Filter: (t1.c1 is not null) (cost=100539.40 rows=998412) (actual time=0.050..5557.847 rows=1000000 loops=1)
        -> Index scan on t1 using idx1 (cost=100539.40 rows=998412) (actual time=0.043..3253.769 rows=1000000 loops=1)
      -> Index lookup on t2 using idx2 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)
    -> Index lookup on t3 using idx3 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)

1 row in set (47.68 sec)

mysql> SELECT COUNT(*)
  -> FROM t1
  -> JOIN t2 
  -> ON (t1.c1 = t2.c1)
  -> JOIN t3 
  -> ON (t2.c1 = t3.c1);
+----------+
| COUNT(*) |
+----------+
| 1000000 |
+----------+
1 row in set (19.56 sec)

The actual run took 19.56 seconds. So the test results in our scenario are as follows:

Hash Join (no index) Block Nested Loop (No Index) Block Nested Loop (with index)
12.98 s Not returned 19.56 s

Add another hash join result without index in Oracle 12c: 1.282 s.

Here is another hash join result without index in PostgreSQL 11.5: 6.234 s.

Add another hash join result without index in SQL 2017: 5.207 s.

Summarize

The above is the new feature of MySQL 8.0, Hash Join, which I would like to introduce to you. I hope it will be helpful to you. If you have any questions, please leave me a message and I will reply to you in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website! If you find this article helpful, please feel free to reprint it and please indicate the source. Thank you!

You may also be interested in:
  • MySQL 8.0 New Features - Introduction to Check Constraints
  • In-depth explanation of hidden fields, a new feature of MySQL 8.0
  • Implementation of check constraints in MySQL 8.0
  • Analysis of the new features of MySQL 8.0 - transactional data dictionary and atomic DDL
  • A brief discussion on the pitfalls and solutions of the new features of MySQL 8.0 (summary)
  • MySQL 8.0 new features: support for atomic DDL statements
  • Detailed explanation of new relational database features in MySQL 8.0
  • Solution to IDEA not being able to connect to MySQL port number occupation
  • Use MySQL to open/modify port 3306 and open access permissions in Ubuntu/Linux environment
  • Perfect solution to mysql cannot start after phpstudy is installed (no need to delete the original database, no need to change any configuration, no need to change the port) direct coexistence
  • Enable remote access rights for MySQL under Linux and open port 3306 in the firewall
  • MySQL 8.0 New Features - Introduction to the Use of Management Port

<<:  Bootstrap+Jquery to achieve calendar effect

>>:  Directory permissions when creating a container with Docker

Recommend

Install Mininet from source code on Ubuntu 16.04

Mininet Mininet is a lightweight software defined...

Put frameset in body through iframe

Because frameset and body are on the same level, y...

Navicat for MySql Visual Import CSV File

This article shares the specific code of Navicat ...

Front-end state management (Part 2)

Table of contents 1. Redux 1.1. Store (librarian)...

Install mysql offline using rpm under centos 6.4

Use the rpm installation package to install mysql...

How to change the root password of Mysql5.7.10 on MAC

First, start MySQL in skip-grant-tables mode: mys...

Detailed process of using nginx to build a webdav file server in Ubuntu

Install nginx Note that you must install nginx-fu...

Vue button permission control introduction

Table of contents 1. Steps 1. Define buttom permi...

Detailed explanation of how to dynamically set the browser title in Vue

Table of contents nonsense text The first router/...

MySQL 8.0 upgrade experience

Table of contents Preface 1. First completely uni...

How to make a List in CocosCreator

CocosCreator version: 2.3.4 Cocos does not have a...

Detailed explanation of the difference between JavaScript onclick and click

Table of contents Why is addEventListener needed?...

How to Dockerize a Python Django Application

Docker is an open source project that provides an...

JavaScript+HTML to implement student information management system

Table of contents 1. Introduction 2. Rendering 3....

Sample code for a large drop-down menu implemented in pure CSS

This is a large drop-down menu implemented purely...