How to design MySQL statistical data tables

How to design MySQL statistical data tables

Cache data tables are often used when collecting statistical data, so they are also called statistical data. For example, for employee and department data tables, we may need to query how many employees there are in a department. There are three ways to achieve this:

  • Add a field for the number of employees under the department. The number of employees needs to be updated synchronously every time an employee is added, modified, or deleted (if an employee changes departments, the number of employees in multiple departments needs to be updated). This method can ensure real-time performance, but it is very inefficient. This is fine if the operation is not frequent. However, if it is quite frequent, it means that two tables need to be operated each time, and the business code needs to be processed at the same time, which deeply couples the statistical business with the ordinary business.
  • Each time a query is made, the SUM function is executed from the employee table to obtain the number of employees in the department. This method avoids buried points, but it needs to sum the employee data table every time, which will be inefficient if the amount of employee data is large.
  • Create a new statistical table to summarize the number of people in each department from the employee table at regular intervals. This method of extracting data at a fixed time will sacrifice a certain degree of real-time performance, but it reduces the coupling of the code. Since there are not too many departments, the size of this table is predictable, which also improves the efficiency of data access. This method is called cached data table.

Taking the mobile personal center of Nuggets as an example, in order to display the number of followers, followers and Nuggets power value of each user, it is impossible to do a SUM for each query. This means that SUM operations of multiple tables need to be done, which will be very inefficient. In addition, the calculation of the Nuggets power value involves a more complex calculation method (related to the number of views and likes of the article). Therefore, we can guess the general table design, so that when querying the user's personal homepage information, all data can be read from this table.

CREATE t_user_summay (
  id INT PRIMARY KEY,
  user_id BIGINT(20),
  focused_user_cnt INT,
  followed_user_cnt INT,
  user_value INT,
  user_level ENUM('Lv1', 'Lv2', ..., 'Lv8'),
  created_time DATETIME,
  updated_time DATETIME,
);

Is real-time update required?

In actual application, there are two ways to update the statistical table: one is real-time update, and the other is periodic reconstruction of data. Both methods have their pros and cons. Real-time updates ensure the immediacy of query data, but they sacrifice performance and require code embedding. In addition, since data updates are irregular, fragmentation may occur. Periodic reconstruction of data sacrifices real-time performance. If most of the data does not change, it will lead to unnecessary statistical calculations. However, if the data changes frequently, periodic reconstruction of data will obviously be more efficient and avoid the situation of buried points. Of course, avoiding application embedding can also be done through triggers, you can refer to //www.jb51.net/article/213062.htm

Materialized View Tool (Flexviews)

In MySQL, there is an open source tool called Flexviews that is used to extract data from the database binlog to complete data statistics. It is somewhat similar to a view, but different from a view, the data table generated by Flexviews is a physical table, which is why it is called a materialized view. Moreover, Flexviews also supports incremental updates and full updates. It is recommended to use incremental updates to avoid the situation where statistics for all rows need to be rebuilt. Incremental updates check which rows have changed before performing the update, which has higher performance than full updates. However, in order to detect data changes, a view needs to be introduced to record the change log of the data rows.

Count table

In actual development, we often need to count some operations, such as the number of article readings and likes. If the count values ​​are put into the same table, concurrency problems are likely to occur when updating. Using a separate count table can avoid query cache invalidation issues and enable some more advanced techniques. For example, a data table that counts the number of article reads and likes:

CREATE TABLE t_article_counter (
  article_id INT PRIMARY KEY,
  read_cnt INT UNSIGNED NOT NULL,
  praise_cnt INT UNSIGNED NOT NULL
);

When updating the reading count, you can use MySQL's built-in increment operation:

UPDATE t_article_counter 
SET read_cnt = read_cnt + 1
WHERE article_id = 1;

This method can make the operation single-line and mutually exclusive for things, thus serializing the transactions to avoid concurrency problems. But it will affect the number of concurrent requests. You can add multiple slots to an article to increase concurrency.

CREATE TABLE t_article_counter (
  id INT NOT NULL PRIMARY KEY,
  slot TINYINT UNSIGNED,
  article_id INT,
  read_cnt INT UNSIGNED NOT NULL,
  praise_cnt INT UNSIGNED NOT NULL,
  INDEX(article_id)
);

At this time, you can create 100 slots to initialize the data, and do the following when updating:

UPDATE t_article_counter
SET read_cnt = read_cnt + 1 
WHERE slot = RAND() * 100 AND article_id = 1;

To get the total number of reads for an article, you need to use a SUM operation:

SELECT SUM(read_cnt) FROM t_article_counter
WHERE article_id = 1;

This method actually trades space for time and increases concurrency.

Summarize

This article introduces how to design statistical data tables. The key lies in the business type. For tables with low update frequency and small data volume, there is no problem using real-time synchronization or direct SUM. For large data tables with high-frequency updates, independent statistical tables can be used. At the same time, if there is a high concurrency situation, the statistical table can consider adding multiple slots to each entity to increase the concurrency. If you want to synchronize data periodically, you can also use the Flexviews materialized view plug-in.

The above is the details of how to design statistical data tables in MySQL. For more information about designing statistical data tables in MySQL, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Summary of common MySQL table design errors
  • MySQL data table partitioning strategy and advantages and disadvantages analysis
  • MySQL advanced features - detailed explanation of the concept and mechanism of data table partitioning
  • How to construct a table index in MySQL
  • How to maintain MySQL indexes and data tables
  • Mysql delete data and data table method example
  • Setting the engine MyISAM/InnoDB when creating a data table in MySQL
  • How to delete a MySQL table
  • About MYSQL, you need to know the data types and operation tables
  • Detailed explanation of creating a data table in MySQL and establishing primary and foreign key relationships
  • A simple method to merge and remove duplicate MySQL tables

<<:  Detailed tutorial on installing harbor private warehouse using docker compose

>>:  CSS3 realizes text relief effect, engraving effect, flame text

Recommend

How to configure user role permissions in Jenkins

Jenkins configuration of user role permissions re...

How to use node scaffolding to build a server to implement token verification

content Use scaffolding to quickly build a node p...

Web skills: Multiple IE versions coexistence solution IETester

My recommendation Solution for coexistence of mul...

Solution to the problem of English letters not wrapping in Firefox

The layout of text has some formatting requiremen...

Detailed explanation of Linux commands and file search

1. Perform file name search which (search for ...

React example showing file upload progress

Table of contents React upload file display progr...

Sample code for JS album image shaking and enlarging display effect

The previous article introduced how to achieve a ...

Introduction to JavaScript conditional access attributes and arrow functions

Table of contents 1. Conditional access attribute...

v-for directive in vue completes list rendering

Table of contents 1. List traversal 2. The role o...

How to use webpack and rollup to package component libraries

Preface I made a loading style component before. ...

How to write the parent and child directories of HTML relative paths

How to indicate the parent directory ../ represent...

A brief analysis of the knowledge points of exporting and importing MySQL data

Often, we may need to export local database data ...

Rounding operation of datetime field in MySQL

Table of contents Preface 1. Background 2. Simula...

Detailed tutorial on how to install mysql8.0 using Linux yum command

1. Do a good job of cleaning before installation ...