A brief discussion on MySQL large table optimization solution

Background

The business table in our Alibaba Cloud RDS for MySQL database (MySQL 5.7) grows by more than 10 million rows per month. As the data volume kept increasing, queries on large tables became slow. During peak business hours, slow queries on the main business table took tens of seconds, which seriously affected the business.

Solution overview

[Figure: solution overview diagram]

1. Database design and index optimization

MySQL is highly flexible, so its performance depends heavily on the developer's table design and index optimization skills. Here are some optimization suggestions:

  • Store time values as Unix timestamps in an INT column and index that column to speed up queries.
  • Define fields as NOT NULL where possible. NULL values make queries harder to write and optimize, and they take up extra index space.
  • Use the TINYINT type instead of ENUM.
  • To store exact decimal values, use DECIMAL instead of FLOAT or DOUBLE.
  • Size each field according to business needs. Do not make it larger than necessary.
  • Avoid the TEXT type where possible. If you must use it, split rarely used large fields into a separate table.
  • MySQL limits the length of indexed columns. In InnoDB, each index column is limited to 767 bytes for the REDUNDANT and COMPACT row formats (3072 bytes for DYNAMIC and COMPRESSED), and the combined length of all columns in one index cannot exceed 3072 bytes. In MySQL 8.0 a single-column index can therefore cover up to 1024 characters in a 3-byte character set such as utf8mb3.
  • Contact the DBA before running DDL on large tables.
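
Three of the suggestions above can be checked with a few lines of Python. This is only an illustrative sketch: the sample date, the helper name max_indexed_chars, and the bytes-per-character values (4 for utf8mb4, 3 for utf8mb3) are assumptions chosen for the example, not values from the original setup.

```python
from datetime import datetime, timezone
from decimal import Decimal

# 1. Time as an integer: a Unix timestamp is a plain number of seconds.
#    An unsigned 32-bit INT holds values up to 2**32 - 1 (a signed INT
#    overflows in 2038, so the column would need to be unsigned or BIGINT).
created_at = datetime(2020, 10, 30, 14, 15, 18, tzinfo=timezone.utc)  # sample value
ts = int(created_at.timestamp())
restored = datetime.fromtimestamp(ts, tz=timezone.utc)

# 2. Exact decimals: binary floats carry rounding error, Decimal does not.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.1") + Decimal("0.2")


# 3. Index length: how many characters fit under a byte limit.
def max_indexed_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Hypothetical helper: characters that fit in an index byte limit."""
    return limit_bytes // bytes_per_char


print(restored == created_at)                      # round trip is lossless
print(float_total == 0.3, decimal_total == Decimal("0.3"))
print(max_indexed_chars(767, 4), max_indexed_chars(3072, 3))
```

The last line shows why a 767-byte limit allows only 191 utf8mb4 characters, which is a common reason VARCHAR(255) columns fail to index under the older row formats.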

Leftmost index matching rule

As the name implies, matching starts from the leftmost column. When creating a composite index, place the column used most frequently in WHERE clauses on the far left, according to business needs. An important question for composite indexes is how to order the columns. For example, if the WHERE clause filters on two fields, c1 and c2, should the index be (c1, c2) or (c2, c1)? The usual approach is to put the column with fewer duplicate values (higher selectivity) first. For example, if 95% of the values in a column are distinct, that column can generally go first.

  • Given the composite index index(a,b,c):
  • where a=3 uses only a
  • where a=3 and b=5 uses a,b
  • where a=3 and b=5 and c=4 uses a,b,c
  • where b=3, or where c=4, on its own does not use the index
  • where a=3 and c=4 uses only a
  • where a=3 and b>10 and c=7 uses a,b (matching stops at the range condition on b)
  • where a=3 and b like 'xx%' and c=7 uses a,b
  • In effect, index(a,b,c) serves the same lookups as three indexes: key(a), key(a,b), key(a,b,c)
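
The cases in the list above follow one rule, which can be written as a small Python model. This is a simplified sketch, not how the MySQL optimizer is implemented: the function name used_index_columns and the "eq"/"range" labels are made up for illustration, and the real optimizer may decide differently (for example by scanning the whole index or applying index condition pushdown).

```python
from typing import Dict, List


def used_index_columns(index: List[str], predicates: Dict[str, str]) -> List[str]:
    """Return the index columns usable for lookup under the leftmost rule.

    predicates maps a column name to "eq" (col = value) or
    "range" (col > value, col LIKE 'xx%', and similar).
    """
    used: List[str] = []
    for col in index:
        kind = predicates.get(col)
        if kind is None:
            break              # gap in the prefix: later columns cannot be used
        used.append(col)
        if kind == "range":
            break              # a range condition ends the usable prefix
    return used


idx = ["a", "b", "c"]
print(used_index_columns(idx, {"a": "eq", "c": "eq"}))                # ['a']
print(used_index_columns(idx, {"a": "eq", "b": "range", "c": "eq"}))  # ['a', 'b']
print(used_index_columns(idx, {"b": "eq"}))                           # []
```

On a real table, EXPLAIN is the way to confirm which index and how many key bytes a query actually uses.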

2. Switch the database to PolarDB with read-write separation

PolarDB is Alibaba Cloud's self-developed next-generation relational cloud database. It is 100% compatible with MySQL and offers up to 100 TB of storage, and a single database can scale out to 16 nodes, which makes it suitable for a wide range of enterprise database scenarios. PolarDB separates storage from computing: all compute nodes share one copy of the data. It provides configuration upgrades and downgrades within minutes, fault recovery within seconds, global data consistency, and free data backup and disaster recovery services.

Cluster architecture with separated computing and storage: PolarDB uses a multi-node cluster architecture. A cluster has one Writer node (the master node) and multiple Reader nodes (read-only nodes). All nodes share the underlying storage (PolarStore) through a distributed file system (PolarFileSystem).

Read-write separation: When an application connects through a cluster address, PolarDB serves requests through an internal proxy layer (Proxy). Application requests pass through the proxy before reaching a database node. The proxy layer not only performs security authentication and protection but also parses SQL. It sends write operations (such as transactions, UPDATE, INSERT, DELETE, and DDL) to the master node and distributes read operations (such as SELECT) evenly across the read-only nodes, which achieves automatic read-write separation. For the application, it is as simple as using a single-node database.
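
The routing idea can be sketched in Python as follows. This is only a toy model of what such a proxy does, not PolarDB's implementation: the class name Router and the node names are hypothetical, and the sketch looks only at the first keyword of each statement, so it does not track open transactions or handle cases such as SELECT ... FOR UPDATE the way a real proxy must.

```python
import itertools
from typing import List


class Router:
    """Toy read-write splitter: SELECT goes to readers, the rest to the writer."""

    def __init__(self, writer: str, readers: List[str]) -> None:
        self.writer = writer
        # Round-robin over readers spreads reads evenly; None means no readers.
        self._readers = itertools.cycle(readers) if readers else None

    def route(self, sql: str) -> str:
        words = sql.split()
        first = words[0].upper() if words else ""
        if first == "SELECT" and self._readers is not None:
            return next(self._readers)
        # Writes, DDL, transaction control and anything unknown go to the writer.
        return self.writer


router = Router("writer", ["reader-1", "reader-2"])
for statement in ("SELECT * FROM t", "select 1", "UPDATE t SET x = 1", "SELECT 2"):
    print(statement, "->", router.route(statement))
```

Sending everything that is not clearly a read to the writer is the conservative default: a misrouted read costs some load on the master, while a misrouted write would fail.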

Mixed online and offline workloads: different services use different connection addresses and different data nodes, so they do not affect each other.

Sysbench stress test report:

PolarDB, 4 cores, 16 GB, 2 nodes

[Figures: Sysbench results for the 4-core, 16 GB configuration]

PolarDB, 8 cores, 32 GB, 2 nodes

[Figures: Sysbench results for the 8-core, 32 GB configuration]

3. Migrate historical data into monthly tables on the MySQL 8.0 X-Engine storage engine

The business table keeps three months of data (based on company requirements). Older data is split into monthly tables and moved to X-Engine tables in a separate history database. Why choose X-Engine tables, and what are the advantages?
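
The retention rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table base name "orders", the `_YYYYMM` suffix, and the choice to cut off at the first day of the month three months back are all hypothetical, since the article does not give the actual naming or cutoff.

```python
from datetime import date
from typing import Optional


def months_back(d: date, n: int) -> date:
    """First day of the month that lies n months before d's month."""
    idx = d.year * 12 + (d.month - 1) - n
    return date(idx // 12, idx % 12 + 1, 1)


def archive_table(base: str, row_date: date, today: date,
                  keep_months: int = 3) -> Optional[str]:
    """Return the monthly history table for a row, or None if it stays online."""
    cutoff = months_back(today, keep_months)
    if row_date >= cutoff:
        return None                       # still within the retention window
    return f"{base}_{row_date:%Y%m}"      # e.g. orders_202006


today = date(2020, 10, 30)
print(archive_table("orders", date(2020, 8, 15), today))
print(archive_table("orders", date(2020, 6, 30), today))
```

One table per month keeps each history table bounded in size and makes it cheap to drop or move a whole month at once.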

Cost savings: X-Engine's storage cost is about half of InnoDB's

X-Engine tiered storage improves QPS. It uses a hierarchical storage structure to store hot data and cold data in different layers, and compresses the layer where cold data is located by default.

X-Engine is an online transaction processing (OLTP) database storage engine developed by Alibaba Cloud's database product division.
The X-Engine storage engine is seamlessly compatible with MySQL (thanks to MySQL's Pluggable Storage Engine feature) and uses a tiered storage architecture. Its goals are to store massive amounts of data, provide high-concurrency transaction processing, and reduce storage costs. In most large-data scenarios, data is accessed unevenly, and frequently accessed hot data accounts for only a small proportion. X-Engine therefore divides data into multiple levels by access frequency, designs a storage structure for each level based on its access characteristics, and writes each level to a suitable storage device.

X-Engine uses LSM-Tree as the architectural basis for tiered storage and redesigns it as follows:

  • The hot data layer and data updates are kept in memory, and in-memory database techniques (lock-free index structures, append-only writes) improve transaction processing performance.
  • A pipelined transaction processing mechanism runs several stages of transaction processing in parallel, greatly improving throughput.
  • Data with low access frequency is gradually evicted or merged into the persistent storage layers, which are spread across multiple tiers of storage devices (NVM/SSD/HDD).

The compaction process, which has a significant impact on performance, has also been heavily optimized:

  • The data storage granularity is split more finely and, because update hotspots are concentrated, as much data as possible is reused during merging.
  • The shape of the LSM tree is controlled at a fine granularity to reduce I/O and computing costs and to limit the extra space consumed during merging.

In addition, finer-grained access control and caching mechanisms are used to optimize read performance.

4. Parallel query in Alibaba Cloud PolarDB MySQL 8.0

After splitting the tables, the business table is smaller, but the data volume is still large and the slow queries are not completely solved. For the remaining slow queries, we use PolarDB's parallel query optimization.

PolarDB MySQL 8.0 provides a parallel query framework. When the amount of data scanned by a query reaches a certain threshold, the framework starts automatically and can greatly reduce the query time. At the storage layer, the data is split into shards that are assigned to different threads, which compute in parallel. Their results are pipelined to the main thread, which performs a final simple merge and returns the result to the user.
Parallel Query uses the parallel processing capabilities of multi-core CPUs. Taking an 8-core 32 GB configuration as an example, the schematic diagram is shown below.

[Figure: parallel query schematic for an 8-core, 32 GB configuration]

Parallel query is applicable to most SELECT statements, such as large table queries, multi-table join queries, and queries with large computational workloads. For very short queries, the effect is less dramatic.
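
The shard, compute, and merge flow described above can be sketched in Python. This is only a structural illustration, not PolarDB's engine: the function names are hypothetical, the "query" is a simple SUM and COUNT over an in-memory list, and because of CPython's global interpreter lock the threads here show the division of work rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_ranges(n_rows: int, degree: int) -> List[Tuple[int, int]]:
    """Split [0, n_rows) into `degree` contiguous, near-equal ranges."""
    step, extra = divmod(n_rows, degree)
    ranges, start = [], 0
    for i in range(degree):
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_sum_count(values: Sequence[int], degree: int) -> Tuple[int, int]:
    """Each worker aggregates one shard; the main thread merges the partials."""

    def worker(bounds: Tuple[int, int]) -> Tuple[int, int]:
        lo, hi = bounds
        chunk = values[lo:hi]
        return sum(chunk), len(chunk)

    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = list(pool.map(worker, split_ranges(len(values), degree)))

    # Final merge on the main thread: combine the partial sums and counts.
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


rows = list(range(1, 1001))
print(parallel_sum_count(rows, degree=8))
```

Aggregates such as SUM and COUNT merge trivially, which is why the final step on the main thread stays cheap. An average would be merged as total divided by count rather than by averaging the partial averages.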

Using parallel query: hint syntax controls parallel query for a single statement. For example, if parallel query is turned off by default but a frequently run slow SQL statement needs to be accelerated, a hint can enable it for that statement only.

SELECT /*+PARALLEL(x)*/ … FROM …; -- x > 0

SELECT /*+ SET_VAR(max_parallel_degree=n) */ * FROM …; -- n > 0

Query test: database with 16 cores and 32 GB of memory, single table with more than 30 million rows.

The query took 4326 ms before parallel query was enabled and 525 ms afterward, an 8.24-fold performance improvement.
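
The improvement figure can be reproduced from the two timings. A short Python check, using only the numbers reported above (the helper name speedup is just for illustration):

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the query became."""
    return before_ms / after_ms


before_ms, after_ms = 4326, 525          # timings reported in the test above
factor = speedup(before_ms, after_ms)
saved_pct = (1 - after_ms / before_ms) * 100

print(f"speedup: {factor:.2f}x, time saved: {saved_pct:.1f}%")
```

The same result can be read either as a speedup factor or as the share of query time removed.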

5. Interactive analytics with Hologres

Parallel query improved the efficiency of slow queries on large tables, but it still cannot meet certain requirements such as real-time reports and real-time dashboards, which have to be handled by a big data system.
For this I recommend Alibaba Cloud's interactive analytics service Hologres (https://help.aliyun.com/product/113622.html).

6. Postscript

Optimizing tables with tens of millions of rows should start from the business scenario and be weighed against cost. It does not mean immediately sharding the database horizontally and scaling out, which creates major challenges for operations and for the business and often does not deliver good results. First check whether the database design, index optimization, and table-splitting strategy are in place, then choose the appropriate technology according to business needs.
