How to split data in MySQL table and database

How to split data in MySQL table and database

The relational database itself can easily become a system bottleneck, as the storage capacity, number of connections, and processing power of a single machine are all limited. When the data volume of a single table reaches 10 million or 100 GB, due to the large number of query dimensions, even if slave libraries are added and indexes are optimized, the performance still degrades severely when performing many operations. At this time, we should consider splitting it. The purpose of splitting is to reduce the burden on the database and shorten the query time.

The core content of database distribution is nothing more than data segmentation (Sharding) and the positioning and integration of data after segmentation. Data segmentation is to store data in multiple databases in a dispersed manner, so that the amount of data in a single database becomes smaller, and the performance problem of a single database is alleviated by expanding the number of hosts, thereby achieving the purpose of improving database operation performance.

Data segmentation can be divided into two types according to its segmentation type: vertical (longitudinal) segmentation and horizontal (lateral) segmentation.

1. Vertical (longitudinal) slicing

There are two common types of vertical segmentation: vertical database segmentation and vertical table segmentation.

1.1 Vertical Database

That is, based on business coupling, different tables with low correlation are stored in different databases. The approach is similar to splitting a large system into multiple small systems, based on business

The categories are divided independently. Similar to the "microservice governance" approach, each microservice uses a separate database. As shown in the figure:

The data tables of different modules are stored in separate libraries. Modules are not related to each other.

If so, it needs to be solved through data redundancy or secondary processing. This business method and data structure are the clearest. But if we cannot prevent cross-database related queries, we will declare this path different.

1.2 Vertical table division

It is based on the "columns" in the database. If a table has many fields, you can create a new extended table and split the fields that are not frequently used or have a large field length into the extended table. When there are many fields (for example, a large table has more than 100 fields), "splitting the large table into small tables" makes development and maintenance easier and avoids cross-page problems. MySQL stores data through data pages at the bottom layer. If a record occupies too much space, it will cross pages, causing additional performance overhead. In addition, the database loads data into memory in units of rows, so that the fields in the table are shorter and accessed more frequently. The memory can load more data, with a higher hit rate, reducing disk IO and thus improving database performance.

Advantages of vertical segmentation:

  • Solve the coupling at the business system level and make the business clear
  • Similar to the governance of microservices, it can also manage, maintain, monitor, and expand data of different businesses in a hierarchical manner.
  • In high-concurrency scenarios, vertical segmentation can improve the bottlenecks of IO, database connection number, and single-machine hardware resources to a certain extent.

shortcoming:

  • Some tables cannot be joined and can only be solved through interface aggregation, which increases the complexity of development
  • Distributed transaction processing is complex
  • There is still a problem of too much data in a single table (horizontal segmentation is required)

2. Horizontal (cross-sectional) segmentation

When an application is difficult to split vertically at a finer granularity, or the amount of data rows after splitting is huge, and there are bottlenecks in single database reading, writing, and storage performance, horizontal splitting is required.

Horizontal sharding is divided into intra-database sharding and sub-database sharding. It distributes the same table into multiple databases or multiple tables according to different conditions based on the inherent logical relationship of the data in the table. Each table only contains a part of the data, thereby reducing the amount of data in a single table and achieving a distributed effect. As shown in the figure:

Compared with vertical segmentation, which classifies tables, this method is to store data in different databases (or different tables) according to a certain rule for each field in the table, that is, to segment data according to the number of rows.

Splitting tables within a database only solves the problem of too much data in a single table, but does not distribute the table to databases on different machines. Therefore, it is not very helpful in reducing the pressure on the MySQL database. Everyone still competes for the CPU, memory, and network IO of the same physical machine. It is best to solve this problem by splitting the database and tables.

Advantages of horizontal sharding:

  • There is no performance bottleneck of excessive single database data volume and high concurrency, which improves system stability and load capacity
  • The application-side transformation is relatively minor and there is no need to split the business modules

shortcoming:

  • Transaction consistency across shards is difficult to ensure
  • Cross-database join query performance is poor
  • The difficulty and maintenance of multiple data expansions are extremely large

After horizontal sharding, the same table will appear in multiple databases/tables, and the content of each database/table is different. Several typical data sharding rules are:

2.1 According to the numerical range

Divide by time interval or ID interval. For example: distribute data of different months or even days into different databases by date; distribute records with userId of 1 to 9999 to the first database, records with userId of 10000 to 20000 to the second database, and so on. In a sense, the "cold and hot data separation" used in some systems, which migrates some less-used historical data to other databases and only provides queries for hot data in business functions, is also a similar practice.

The advantages of this are:

  • Single table size is controllable
  • It is naturally easy to expand horizontally. If you want to expand the entire shard cluster later, you only need to add nodes without migrating data from other shards.
  • When using shard fields for range searches, continuous shards can quickly locate shards for fast queries, effectively avoiding cross-shard query issues.

shortcoming:

  • Hot data becomes a performance bottleneck. There may be data hotspots in continuous sharding. For example, when sharding by time field, some shards store data in the most recent time period, which may be frequently read and written, while some shards store historical data, which is rarely queried.

2.2 Modulo by value

Generally, the hash modulus segmentation method is used. For example, the Customer table is segmented into 4 databases according to the cusno field. The data with a remainder of 0 is placed in the first database, the data with a remainder of 1 is placed in the second database, and so on. In this way, the data of the same user will be scattered in the same database. If the query condition contains the cusno field, the corresponding database can be clearly located for query.

advantage:

  • Data sharding is relatively even, and hot spots and concurrent access bottlenecks are not likely to occur

shortcoming:

  • When the sharded cluster is expanded later, the old data needs to be migrated (using the consistent hashing algorithm can better avoid this problem)
  • It is easy to face the complex problem of cross-shard queries. For example, in the above example, if the frequently used query conditions do not contain cusno, the database cannot be located, so it is necessary to initiate queries to four databases at the same time, merge the data in memory, and return the minimum set to the application. The database sharding becomes a burden.

The above is the details of how to perform data segmentation by sharding MySQL tables and databases. For more information about data segmentation by sharding MySQL tables and databases, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • MYSQL performance optimization sharing (sharding of databases and tables)
  • Mysql database partitioning and table partitioning methods (commonly used)
  • Summary of MYSQL database data splitting: sub-library and sub-table
  • Application scenarios and design methods of MySQL table and database sharding
  • Summary of MySQL database and table sharding
  • In-depth explanation of MySql table, database, sharding and partitioning knowledge
  • Getting Started Guide to MySQL Sharding
  • Summary of MySQL's commonly used database and table sharding solutions
  • MySQL sharding details
  • MySQL sharding project practice

<<:  HTML small tag usage tips

>>:  Website redesign is a difficult task for every family

Recommend

How to achieve 3D dynamic text effect with three.js

Preface Hello everyone, this is the CSS wizard - ...

Details of various font formats in HTML web pages

This section starts with the details of text modi...

Web Design: Web Music Implementation Techniques

<br />When inserting music into a web page, ...

Continuous delivery using Jenkins and Docker under Docker

1. What is Continuous Delivery The software produ...

Analysis of idea compiler vue indentation error problem scenario

Project scenario: When running the Vue project, t...

Tomcat maxPostSize setting implementation process analysis

1. Why set maxPostSize? The tomcat container has ...

sql script function to write postgresql database to implement parsing

This article mainly introduces the sql script fun...

Things You Don’t Know About the CSS ::before and ::after Pseudo-Elements

CSS has two pseudo-classes that are not commonly ...

W3C Tutorial (7): W3C XSL Activities

A style sheet describes how a document should be ...

Differences between ES6 inheritance and ES5 inheritance in js

Table of contents Inheritance ES5 prototype inher...

Native JavaScript to achieve the effect of carousel

This article shares the specific code for JavaScr...

MySQL InnoDB transaction lock source code analysis

Table of contents 1. Lock and Latch 2. Repeatable...