Practical record of optimizing a MySQL table with tens of millions of rows

Preface

First, a clarification. Many posts online claim that Alibaba's guidelines require splitting a table into multiple databases and tables once it reaches 5 million rows. In fact, 5 million is not a fixed threshold; it depends on the MySQL configuration and the machine's hardware. To improve performance, MySQL loads table indexes into memory. Once a table grows to the point where its indexes no longer fit in memory, index lookups have to go through disk IO, and performance degrades.
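As a rough way to see where that threshold lies on your own machine (my addition, not from the original article), you can compare how much memory InnoDB has for caching index and data pages against the on-disk size of the table's indexes:

-- Size of the InnoDB buffer pool, which caches data and index pages;
-- once the hot indexes no longer fit here, lookups start hitting disk.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of a table's indexes and data, in bytes.
SELECT table_name, index_length, data_length
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'user';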

Practical Tuning

I have a table with 10 million rows and only a primary key index.

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `uname` varchar(20) DEFAULT NULL COMMENT 'Account',
  `pwd` varchar(20) DEFAULT NULL COMMENT 'Password',
  `addr` varchar(80) DEFAULT NULL COMMENT 'Address',
  `tel` varchar(20) DEFAULT NULL COMMENT 'Phone',
  `regtime` char(30) DEFAULT NULL COMMENT 'Registration time',
  `age` int(11) DEFAULT NULL COMMENT 'Age',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10000003 DEFAULT CHARSET=utf8;
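If you want to reproduce the test, here is a minimal sketch for generating the 10 million rows. This is my own assumption; the article does not show its data generator, though the uname values queried later look like RAND() output. The procedure name is made up for illustration.

-- Hypothetical data generator: fills `user` with random rows in batches.
DELIMITER $$
CREATE PROCEDURE gen_user_data(IN total INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET autocommit = 0;
  WHILE i < total DO
    INSERT INTO `user` (uname, pwd, addr, tel, regtime, age)
    VALUES (CONCAT(RAND() * 10), LEFT(MD5(RAND()), 20), CONCAT('addr-', i),
            CONCAT('13', FLOOR(RAND() * 1000000000)), NOW(), FLOOR(RAND() * 100));
    SET i = i + 1;
    IF i % 10000 = 0 THEN  -- commit in batches to keep the insert time manageable
      COMMIT;
    END IF;
  END WHILE;
  COMMIT;
END$$
DELIMITER ;

CALL gen_user_data(10000000);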

A full-table query takes about 16s, which is quite slow. A table like this usually sits behind a backend management system, for example the user table of an e-commerce platform. The admin system generally needs to list these users and perform operations on them, such as adding new users or deleting existing ones.

So there are two requirements here: querying the total count, and querying by page.

Let's measure the time taken by the count query and by paging queries at various offsets.

select * from user limit 1, 10        //almost instant
select * from user limit 1000000, 10  //0.35s
select * from user limit 5000000, 10  //1.7s
select * from user limit 9000000, 10  //2.8s
select count(1) from user             //1.7s

From the timings above we can see that the deeper the page, the longer the paging query takes, and the count query alone takes 1.7s. This clearly does not meet our requirements, so we need to optimize. First, let's try index optimization. With only the primary key index, the execution plan shows a full table scan (type = ALL), so let's add a composite index on the remaining columns:

alter table `user` add INDEX `sindex` (`uname`,`pwd`,`addr`,`tel`,`regtime`,`age`)

Checking the execution plan again after adding the index: although type changes from ALL to index and the sindex index is now used, the query speed does not actually improve.
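The plan screenshots from the original did not survive, but you can reproduce the comparison yourself with EXPLAIN (the type values below are what the author reports; exact row estimates will vary by machine):

-- Without the composite index: type = ALL (full table scan).
-- With it: type = index (full scan of sindex) -- it still walks every
-- index entry, so the deep-offset query is no faster.
EXPLAIN select * from user limit 9000000, 10;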

In fact, the point of a composite (joint) index is to speed up conditional queries, not full-table scans.

select * from user where uname='6.445329111484186' //3.5s (no joint index)
select * from user where uname='6.445329111484186' //0.003s (with joint index)

That is the difference a joint index makes for a conditional query.

This basically proves the point: a full-table query is slow whether an index exists or not.

Since indexing gets us nowhere here, we have to look for another solution. As I mentioned in my previous MySQL interview post, the count can be stored in a separate table:

CREATE TABLE `attribute` (
  `id` int(11) NOT NULL,
  `formname` varchar(50) COLLATE utf8_bin NOT NULL COMMENT 'table name',
  `formcount` int(11) NOT NULL COMMENT 'Total data of the table',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

A note on this table: it is generally queried for a single row rather than scanned in full, so when creating it you can add a hash index on formname.
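A sketch of that index (my addition): note that InnoDB does not support hash indexes on regular tables and silently falls back to BTREE even if USING HASH is specified, so the hash only truly applies to engines such as MEMORY. A BTREE point lookup on a handful of rows is effectively instant anyway.

-- Unique lookup key on the table name; with InnoDB this becomes a BTREE index.
ALTER TABLE `attribute` ADD UNIQUE INDEX `idx_formname` (`formname`);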

select formcount from attribute where formname='user' //almost instant

That completes the count optimization. If the count query carries filter conditions, you can instead create an index on the filtered columns and let the query filter through the index, in which case the precomputed count is not needed.
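The article does not show how formcount is kept in sync; one possible approach (my assumption, not from the original) is a pair of triggers on the user table, at the cost of a little extra work on every write. The application layer updating the counter is another option.

-- Hypothetical maintenance triggers: keep attribute.formcount in step
-- with inserts into and deletes from `user`.
CREATE TRIGGER user_count_ins AFTER INSERT ON `user`
FOR EACH ROW
  UPDATE `attribute` SET formcount = formcount + 1 WHERE formname = 'user';

CREATE TRIGGER user_count_del AFTER DELETE ON `user`
FOR EACH ROW
  UPDATE `attribute` SET formcount = formcount - 1 WHERE formname = 'user';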

With count sorted out, how do we optimize the paging query? Here we can use a subquery:

select * from user
where id >= (select id from user limit 9000000, 1)
limit 10 //1.7s

Writing the subquery this way locates the starting id through a covering index: the subquery only touches the primary key, so it never has to read the full rows while skipping the first 9 million entries. Efficiency improves greatly. My test here still takes 1.7s; when my company optimized this before, the query time was lower than that. You can generate data and test it yourself.
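An equivalent formulation you may also see (my addition, not from the original) is the deferred join, which pages through the covering primary-key index first and only then fetches the full rows:

-- Deferred join: locate the 10 ids via the covering index, then join back.
select u.*
from user u
inner join (select id from user order by id limit 9000000, 10) t
  on u.id = t.id;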

But if the data volume is very large, I still recommend using ES (Elasticsearch) or constraining the query with some default filters; the count can still be maintained separately.

At this point, the optimization of a paging query for tens of millions of data has been completed.

Summary

This concludes this article on optimizing a MySQL table with tens of millions of rows: index the conditional queries, keep the total count in a separate table, and use a covering-index subquery for deep pagination.
