Mysql tree-structured database table design

Preface
1. Basic Data
2. Inheritance-driven design
3. Design based on left and right value encoding
4. Tree Structure CRUD Algorithm

(1) Get the descendant nodes of a node
(2) Get the genealogy path of a node
(3) Add descendant nodes to a node
(4) Delete a node

V. Conclusion

References

Preface

I recently studied tree menus and found many examples online. The following is some information I found on the Internet, and then practiced it again myself and recorded it so that I won’t forget it next time.

In the process of programming, we often use tree structures to represent the relationship between certain data, such as corporate departments, column structures, product categories, etc. Generally speaking, these tree structures need to be persisted with the help of a database. However, various relational databases currently record and store data information in the form of two-dimensional tables, so the Tree cannot be directly stored in the DBMS. Designing a suitable Schema and its corresponding CRUD algorithm is the key to storing tree structures in relational databases.

An ideal tree structure should have the following characteristics: low data storage redundancy and strong intuitiveness; simple and efficient retrieval and traversal process; and efficient node addition, deletion, modification and query (CRUD) operations. I accidentally found a very clever design on the Internet. The original text was in English. After reading it, I felt it was interesting, so I sorted it out. This article will introduce two tree-structured Schema design schemes: one is an intuitive and simple design idea, and the other is an improved scheme based on left and right value encoding.

1. Basic Data

This article uses an example of a food family tree to explain how to organize food by category, color, and variety. The tree structure is as follows:

Write the picture description here

2. Inheritance-driven design

The most intuitive analysis of the tree structure is the inheritance relationship between nodes. By explicitly describing the parent node of a node, a two-dimensional relationship table can be established. The Tree table structure of this solution is usually designed as: {Node_id, Parent_id}. The above data can be described as shown in the following figure:

Write the picture description here

The advantages of this solution are obvious: design and implementation are natural, very intuitive and convenient. The disadvantages are of course also very prominent: since the inheritance relationship between nodes is directly recorded, any CRUD operation on the Tree will be inefficient. This is mainly due to frequent "recursive" operations. The recursive process constantly accesses the database, and each database IO will have time overhead. Of course, this solution is not without use. When the size of the Tree is relatively small, we can use the cache mechanism to optimize it, load the Tree information into memory for processing, and avoid the performance overhead of direct database IO operations.

3. Design based on left and right value encoding

In general database-based applications, the demand for queries is always greater than that for deletion and modification. In order to avoid the "recursive" process when querying the tree structure, a new non-recursive query and infinitely grouped left and right value encoding scheme is designed based on the pre-order traversal of the Tree to save the data of the tree.

Write the picture description here

When seeing this table structure for the first time, I believe most people are not clear about how the left value (Lft) and the right value (Rgt) are calculated, and this table design does not seem to preserve the inheritance relationship between parent and child nodes. But when you count from 1 to 18 with your fingers on the numbers in the table, you should notice something. Yes, the order in which you move your fingers is the order in which you perform a pre-order traversal of the tree, as shown in the figure below. When we start from the left side of the root node Food, marked as 1, and follow the direction of pre-order traversal, marking numbers on the traversed path one by one, we finally return to the root node Food and write 18 on the right.

Write the picture description here

Based on this design, we can infer that all nodes with left values greater than 2 and right values less than 11 are subsequent nodes of Fruit, and the structure of the entire tree is stored through left and right values. However, this is not enough. Our goal is to be able to perform CRUD operations on the tree, which means we need to construct corresponding algorithms.

4. Tree Structure CRUD Algorithm

(1) Get the descendant nodes of a node

Only one SQL statement is needed to return the pre-order traversal list of the node's descendant nodes. Take Fruit as an example:

SELECT * FROM tree WHERE lft BETWEEN 2 AND 11 ORDER BY lft ASC

The query results are as follows:

Write the picture description here

So how many descendant nodes does a certain node have? Through the left and right values of the node, we can circle its descendant nodes, then the total number of descendants = (right value – left value – 1) / 2. Taking Fruit as an example, the total number of its descendants is: (11 –2 – 1) / 2 = 4. At the same time, in order to display the tree structure more intuitively, we need to know the level of the node in the tree, which can be achieved through SQL queries of left and right values. Taking Fruit as an example: SELECTCOUNT(*) FROM tree WHERE lft <= 2 AND rgt >=11. For the convenience of description, we can create a view for Tree and add a hierarchical column. The column value can be calculated by writing a custom function. The function definition is as follows:

Create Table

CREATE TABLE `tree` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `lft` int(255) DEFAULT NULL,
  `rgt` int(11) DEFAULT NULL
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('1', 'Food', '1', '18');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('2', 'Fruit', '2', '11');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('3', 'Red', '3', '6');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('4', 'Cherry', '4', '5');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('5', 'Yellow', '7', '10');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('6', 'Banana', '8', '9');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('7', 'Meat', '12', '17');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('8', 'Beef', '13', '14');
INSERT INTO `jpa`.`tree` (`id`, `name`, `lft`, `rgt`) VALUES ('9', 'Pork', '15', '16');

CREATE VIEW `treeview` AS 
SELECT 
  `a`.`id` AS `id`,
  `a`.`name` AS `name`,
  `a`.`lft` AS `lft`,
  `a`.`rgt` AS `rgt`,
  `CountLayer` (`a`.`id`) AS `layer` 
FROM
  `tree` `a`

Based on the level calculation function, we create a view and add a new series of records of node levels:

> CREATE FUNCTION `CountLayer` (`node_id` INT) RETURNS INT (11) 
BEGIN
    DECLARE result INT (10) DEFAULT 0;
    DECLARE lftid INT;
    DECLARE rgtid INT;
    SELECT lft,rgt INTO lftid, rgtid FROM tree WHERE id = node_id;
    SELECT COUNT(*) INTO result FROM tree WHERE lft <= lftid AND rgt >= rgtid;
    RETURN (result);
END

Create a stored procedure to calculate all descendant nodes and corresponding levels of a given node:

CREATE PROCEDURE `GetChildrenNodeList`(IN `node_id` INT)
BEGIN
DECLARE lftid INT;
DECLARE rgtid INT;
SELECT lft,rgt INTO lftid,rgtid FROM tree WHERE id= node_id;
SELECT * FROM treeview WHERE lft BETWEEN lftid AND rgtid ORDER BY lft ASC;
END

Now, we use the above stored procedure to calculate all descendant nodes and corresponding levels of the node Fruit. The query results are as follows:

Write the picture description here

From the above implementation, we can see that the design scheme using left and right value encoding only needs to perform two database queries when traversing the tree, eliminating recursion. In addition, the query conditions are all numerical comparisons, so the query efficiency is extremely high. As the tree size continues to expand, the design scheme based on left and right value encoding will improve the query efficiency much more than the traditional recursive scheme. Of course, we have only given a simple algorithm for obtaining the descendants of a node. To really use this tree, we need to implement functions such as inserting and deleting nodes on the same level.

(2) Get the genealogy path of a node

Assuming we want to obtain the genealogy path of a node, we only need one SQL statement to complete it based on the left and right value analysis. Take Fruit as an example: SELECT* FROM tree WHERE lft < 2 AND rgt > 11 ORDER BY lft ASC . The relatively complete stored procedure is:

CREATE PROCEDURE `GetParentNodePath`(IN `node_id` INT)
BEGIN
DECLARE lftid INT;
DECLARE rgtid INT;
SELECT lft,rgt INTO lftid,rgtid FROM tree WHERE id= node_id;
SELECT * FROM treeview WHERE lft < lftid AND rgt > rgtid ORDER BY lft ASC;
END

(3) Add descendant nodes to a node

Assume that we want to add a new child node "Apple" under the node "Red", the tree will become as shown in the following figure, where the red node is the newly added node.

Write the picture description here

CREATE PROCEDURE `AddSubNode`(IN `node_id` INT, IN `node_name` VARCHAR(64))
BEGIN
   DECLARE rgtid INT;
   DECLARE t_error INT DEFAULT 0;  
   DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET t_error=1; -- Error handling SELECT rgt INTO rgtid FROM tree WHERE id= node_id; 
   START TRANSACTION;
        UPDATE tree SET rgt = rgt + 2 WHERE rgt >= rgtid;
        UPDATE tree SET lft = lft + 2 WHERE lft >= rgtid;
        INSERT INTO tree (NAME,lft,rgt) VALUES (node_name,rgtid,rgtid+1);    
    IF t_error =1 THEN  
     ROLLBACK;
    ELSE
      COMMIT;
    END IF;
END

(4) Delete a node

If we want to delete a node, all the descendant nodes of the node will be deleted at the same time, and the number of these deleted nodes is: (right value of the deleted node – left value of the deleted node + 1) / 2, and the left and right values of the remaining nodes will be adjusted if they are greater than the left and right values of the deleted node. Let's see what happens to the tree. Taking Beef as an example, the deletion effect is shown in the figure below.

Write the picture description here

Then we can construct the corresponding stored procedure:

CREATE PROCEDURE `DelNode`(IN `node_id` INT)
BEGIN
   DECLARE lftid INT;
     DECLARE rgtid INT;
   DECLARE t_error INT DEFAULT 0;  
   DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET t_error=1; -- Error handling SELECT lft,rgt INTO lftid,rgtid FROM tree WHERE id= node_id;
   START TRANSACTION;
       DELETE FROM tree WHERE lft >= lftid AND rgt <= rgtid;
       UPDATE tree SET lft = lft -(rgtid - lftid + 1) WHERE lft > lftid;
       UPDATE tree SET rgt = rgt -(rgtid - lftid + 1) WHERE rgt >rgtid;
    IF t_error =1 THEN  
     ROLLBACK;
    ELSE
      COMMIT;
    END IF;

END

V. Conclusion

We can summarize the tree structure Schema design scheme that achieves unlimited grouping through left and right value encoding:

(1) Advantages: It achieves unlimited grouping without recursive operations, and the query condition is based on the comparison of integers, which is very efficient.

(2) Disadvantages: Adding, deleting, and modifying nodes is costly and will involve changes to many aspects of data in the table.