A brief discussion on the design and optimization of MySQL tree structure tables

Preface

In many management and office systems, tree structures are everywhere. If you have ever worked with "departments" or "organizations", you know that what ends up on the page is a hierarchy. The figure below shows a typical tree display of departments.

[Figure: an example department tree as displayed on the page]

Design Considerations

1. Table structure design

For anyone with a little experience in development and table design, such a table is easy to design: you only need to add a pid field to the depart table. Refer to the following definition:

CREATE TABLE `depart` (
  `depart_id` varchar(32) NOT NULL COMMENT 'Department ID',
  `pid` varchar(32) NOT NULL DEFAULT '0' COMMENT 'Organization parent ID',
  `name` varchar(64) NOT NULL COMMENT 'Department name',
  `description` varchar(512) DEFAULT NULL COMMENT 'Department description',
  `code` varchar(64) DEFAULT NULL COMMENT 'Department code',
  PRIMARY KEY (`depart_id`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
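
To make the pid relationship concrete, here are a few illustrative rows (the IDs and names are made up) and the query that fetches one level of children:

-- Illustrative sample data: pid = '0' marks a top-level department
INSERT INTO depart (depart_id, pid, name) VALUES
  ('d001', '0',    'Head Office'),
  ('d002', 'd001', 'R&D'),
  ('d003', 'd002', 'Backend Team');

-- Direct children of a given department (one level only)
SELECT depart_id, name FROM depart WHERE pid = 'd001';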

2. Business design

The figure above is a generic tree structure that fits most business scenarios. Taking "department" as the example, and given that a department rarely exists purely on its own, the business around it mainly involves the following points:

  • Whether the department exists on its own, or whether other business data is directly tied to it, for example users that can be associated with a department
  • When rendering and loading the department tree, is it loaded dynamically level by level, or returned all at once?
  • The add, delete, modify and query (CRUD) operations on departments themselves
  • Business that other modules attach to a department, i.e. the department acting as the associated business object

3. Performance considerations

  • Full loading is not advisable and should be a last resort. Once the hierarchy gets too deep and the data volume too large, query performance becomes a nightmare.
  • Dynamic loading works better: server overhead and pressure are much lower than with a full load. Its drawback is just as obvious, though; once the hierarchy is deep, the user experience of clicking through it suffers.
  • A hierarchical structure also increases the coding complexity of data import, e.g. the Excel import feature.

A few additional remarks on the first two points. Both full loading and dynamic loading are workable; I have seen both in the projects and products I have worked on. Which one to choose really depends on the product design and customer requirements, because the two designs lead to different trade-offs.

For example, the advantage of full loading is that all the data is returned to the page at once and cached after the first render, so subsequent loads are very fast. Searches like the one below are also very efficient, because they need no further calls to the backend.

[Figure: searching within the fully loaded department tree on the page]

But problems also arise. Department data is not static; additions, deletions and modifications are common. With a full load, the first query has to return everything at once, and if the data volume is large, the hierarchy deep, and the page also has to render the users associated with each department, the server comes under a lot of pressure. Anyone with a little experience can roughly picture the return data structure the server has to assemble.

[Figure: the nested data structure the server returns for a full load]

A preliminary implementation idea looks like this:

function (currentDepart_id) {
  
  1. Query the current department from the DB...

  2. Query the current department's direct sub-departments from the DB...
  
  3. For each sub-department in that list, recurse, then assemble the results into the return structure (more DB queries)...

}

From the implementation above you can see that once the data volume grows, these queries become a performance bottleneck. In my own project we ran similar tests: 3 levels with 1,000 departments per level (not counting the loading of the users associated with each department), on a 4-core 16 GB server with an ordinary CPU, a full data load took about 3 seconds on average. For a B-side product this design and this latency are acceptable to users (and 1,000 departments per level is already a fairly large volume of data).

As analyzed above, the performance bottleneck of a full load is database IO: whether you start from the top node or from some node in the middle, the more data there is and the deeper the hierarchy, the more queries are issued and the greater the IO overhead.

What is the solution? In practice, there are two approaches worth borrowing:

  • Design a reasonable cache storage structure
  • Improve table structure

The first point is easy to think of, but how do you design the cache structure sensibly? Taking the figure below as an example, we can use each non-leaf node as a key and the set of leaf nodes under it as the value, storing each value as a redis set. This choice comes from the actual business and from validating user needs: the department or organization data that is truly meaningful lives on the leaf nodes.

[Figure: department tree, with non-leaf nodes used as redis keys and the leaf nodes under them stored as the cached sets]

With the cache in place, the implementation might change to something like the following:

1. Add a department: function add(params){
    
    1. Write the new depart to the DB......
    2. Determine the level of the current depart: is it a leaf node (or about to become one)?

      if (leaf node) {
        3. Find the parent node's ID and look up that key in redis

        4. Take out the cached set for the parent key and add the new depart_id to it

      } else {
        5. Create a new key, i.e. a new empty cached set, waiting for later data to be added (or skip creating it for now)
      }

}


2. Delete a department: function delete(params){
    
    1. Delete the depart itself from the DB...

    2. If the current department has sub-departments, decide whether to delete them as well (this depends on your own product's business rules)
        DB......

    3. Assuming step 2 applies, also clean up redis: take the key created for the current department node and delete the set stored under it; then take all the non-leaf nodes collected in step 2, assemble their keys, and loop through and delete those keys (these are in-memory operations, so performance is not a concern, and they can also be done asynchronously)

}

Combining full loading with redis is the key step for breaking through the performance bottleneck. As the implementation above shows, it does increase coding complexity and demands more of developers, but once it is in place, query performance improves dramatically.

The second way to optimize query performance is to change the table structure.

Many readers may wonder how much impact a change to the table structure can really have on performance. You may not believe it, but in a simulated stress test without the modified design, with 5 levels of departments, 1,000 departments at each level (you can work out the total volume yourself) and 500 users associated with each department, a full load took about 5 minutes.

Clearly, once the data volume grows, the query pressure is severe. With the modified design, the same data is displayed in an average of 15 to 20 seconds, an improvement of more than 10x. Many readers may already have used the technique before I give the answer, without having really experienced how effective it is.

Starting from the table structure at the beginning of this article, we add a path field; the modified table looks like this:

CREATE TABLE `depart` (
  `depart_id` varchar(32) NOT NULL COMMENT 'Department ID',
  `pid` varchar(32) NOT NULL DEFAULT '0' COMMENT 'Organization parent ID',
  `name` varchar(64) NOT NULL COMMENT 'Department name',
  `description` varchar(512) DEFAULT NULL COMMENT 'Department description',
  `code` varchar(64) DEFAULT NULL COMMENT 'Department code',
  `path` varchar(128) NOT NULL COMMENT 'Department path',
  PRIMARY KEY (`depart_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
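
If the depart table already exists in production, the same change can be applied with ALTER TABLE; the DEFAULT '' and the index here are my own additions, and existing rows still need their path values backfilled:

-- Add the path column to an existing table and index it so that
-- prefix queries (LIKE '00002/%') can use the index
ALTER TABLE depart
  ADD COLUMN path varchar(128) NOT NULL DEFAULT '' COMMENT 'Department path',
  ADD INDEX idx_depart_path (path);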

The path field is the heart of the design. Suppose each level needs to hold at most tens of thousands of departments, so each path segment is a 5-digit, zero-padded number. The first level's data then looks like this: 00001, 00002, 00003, and so on. For the second level, if we add a department under department 00002, its data looks like this: 00002/00001, 00002/00002, 00002/00003, and so on.

As for deeper levels, I believe everyone can write out the corresponding paths without further examples.

What are the benefits of doing this?

We know that MySQL supports regular expression matching as well as LIKE. Imagine we want to fetch, in one query, all of the hierarchy below a certain node. Without a path field, we are back to the recursion described above.
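
As a side note (not part of the approach described in this article): if you are on MySQL 8.0 or later, a recursive CTE can also walk the pid chain in a single statement. A minimal sketch, with an illustrative starting depart_id:

-- Requires MySQL 8.0+: the department 'd001' (illustrative) and all of its descendants
WITH RECURSIVE sub AS (
  SELECT depart_id, pid, name FROM depart WHERE depart_id = 'd001'
  UNION ALL
  SELECT d.depart_id, d.pid, d.name
  FROM depart AS d
  JOIN sub AS s ON d.pid = s.depart_id
)
SELECT * FROM sub;

That said, the path-based design below works on older MySQL versions as well and keeps the query a simple prefix match.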

With a path field, however, we can query directly with LIKE or MySQL's regular expression operator. Taking the data above as an example, the two SQL statements below (querying a first-level department named "test" in the original screenshot) fetch the department together with every descendant in one pass, which greatly reduces the number of round trips to the database.

[Screenshot: the two SQL statements, one using LIKE on the path column and one using REGEXP]
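
Since the screenshot is not reproduced here, the two statements would look roughly like the following; the prefix '00002' stands in for the first-level department's path, and the exact statements in the original may differ:

-- All descendants of the first-level department whose path is '00002',
-- including the department itself (the prefix value is illustrative)
SELECT * FROM depart WHERE path = '00002' OR path LIKE '00002/%';

-- The same idea using MySQL's regular expression operator
SELECT * FROM depart WHERE path REGEXP '^00002(/|$)';

One practical note: a left-anchored LIKE 'prefix%' can use an ordinary index on path (such as the idx_depart_path index suggested earlier), whereas REGEXP generally cannot, so the LIKE form is usually the better choice when the prefix is fixed.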

Where this implementation most easily goes wrong in practice is generating the path values. Usually a dedicated function has to be written in advance that generates the path for each new node. As long as the generated path data is accurate, this approach is a real breakthrough for query performance; it is the method used in the project I am currently working on.

function generatePath(pid){
    
  1. Is pid the top level?

  2. Fetch the parent department's depart record

  3. List the path values of all departments at the same level as the one being added (i.e. under the same parent)

  4. Take the maximum path value from step 3

  5. Generate the new path from that maximum value

}
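
A minimal SQL sketch of the core of such a function, assuming 5-digit zero-padded segments and a parent whose path is '00002' (both values are illustrative); protection against concurrent inserts, e.g. a unique index on path or generating the value inside a locked transaction, is left out:

-- Next sibling segment under the parent path '00002': take the largest existing
-- last segment among its direct children, add 1, and zero-pad to 5 digits
SELECT CONCAT('00002/',
              LPAD(COALESCE(MAX(CAST(SUBSTRING_INDEX(path, '/', -1) AS UNSIGNED)), 0) + 1,
                   5, '0')) AS new_path
FROM depart
WHERE path LIKE '00002/%'
  AND path NOT LIKE '00002/%/%';   -- direct children only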

Another tricky piece of business: once the path field is in place, handling paths correctly when importing department data from Excel is still a fairly complex implementation point, which I leave for you to think about.

The discussion above covers optimizing the business implementation and code design under full loading, as well as optimizing the table structure. With these two solutions as a reference, dynamic loading is comparatively easy to implement.

To sum up, here is a recommended set of practices for business with a hierarchical structure.

At the table level, add a path field and load data through it; prefer dynamic loading where possible; and if the departments (or whatever the hierarchical entity is) change infrequently, consider introducing a cache. For the details, refer to the discussion above.

This concludes this article on the design and optimization of MySQL tree structure tables.
