Practical record of optimizing a MySQL table with tens of millions of rows

Preface

First, a clarification. Many posts online claim that Alibaba's guidelines require splitting a table into multiple databases and tables once it reaches 5 million rows. In fact, 5 million is not a fixed threshold; it depends on the MySQL configuration and the machine's hardware. To improve performance, MySQL loads table indexes into memory. Once a table grows to the point where its indexes no longer fit in memory, index lookups have to go through disk IO, and performance degrades.
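As a rough way to see where that threshold lies on your own machine (my addition, not from the original article), you can compare how much memory InnoDB has for caching index and data pages against the on-disk size of the table's indexes:

-- Size of the InnoDB buffer pool, which caches data and index pages;
-- once the hot indexes no longer fit here, lookups start hitting disk.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of a table's indexes and data, in bytes.
SELECT table_name, index_length, data_length
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'user';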

Practical Tuning

I have a table with 10 million rows and only a primary key index.

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `uname` varchar(20) DEFAULT NULL COMMENT 'Account',
  `pwd` varchar(20) DEFAULT NULL COMMENT 'Password',
  `addr` varchar(80) DEFAULT NULL COMMENT 'Address',
  `tel` varchar(20) DEFAULT NULL COMMENT 'Phone',
  `regtime` char(30) DEFAULT NULL COMMENT 'Registration time',
  `age` int(11) DEFAULT NULL COMMENT 'Age',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10000003 DEFAULT CHARSET=utf8;
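If you want to reproduce the test, here is a minimal sketch for generating the 10 million rows. This is my own assumption; the article does not show its data generator, though the uname values queried later look like RAND() output. The procedure name is made up for illustration.

-- Hypothetical data generator: fills `user` with random rows in batches.
DELIMITER $$
CREATE PROCEDURE gen_user_data(IN total INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET autocommit = 0;
  WHILE i < total DO
    INSERT INTO `user` (uname, pwd, addr, tel, regtime, age)
    VALUES (CONCAT(RAND() * 10), LEFT(MD5(RAND()), 20), CONCAT('addr-', i),
            CONCAT('13', FLOOR(RAND() * 1000000000)), NOW(), FLOOR(RAND() * 100));
    SET i = i + 1;
    IF i % 10000 = 0 THEN  -- commit in batches to keep the insert time manageable
      COMMIT;
    END IF;
  END WHILE;
  COMMIT;
END$$
DELIMITER ;

CALL gen_user_data(10000000);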

A full-table query takes about 16s, which is quite slow. A table like this usually sits behind a backend management system, for example the user table of an e-commerce platform. The admin system generally needs to list these users and perform operations on them, such as adding new users or deleting existing ones.

So there are two requirements here: querying the total count, and querying by page.

Let's measure the time taken by the count query and by paging queries at various offsets.

select * from user limit 1, 10        //almost instant
select * from user limit 1000000, 10  //0.35s
select * from user limit 5000000, 10  //1.7s
select * from user limit 9000000, 10  //2.8s
select count(1) from user             //1.7s

From the timings above we can see that the deeper the page, the longer the paging query takes, and the count query alone takes 1.7s. This clearly does not meet our requirements, so we need to optimize. First, let's try index optimization. With only the primary key index, the execution plan shows a full table scan (type = ALL), so let's add a composite index on the remaining columns:

alter table `user` add INDEX `sindex` (`uname`,`pwd`,`addr`,`tel`,`regtime`,`age`)

Checking the execution plan again after adding the index: although type changes from ALL to index and the sindex index is now used, the query speed does not actually improve.
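The plan screenshots from the original did not survive, but you can reproduce the comparison yourself with EXPLAIN (the type values below are what the author reports; exact row estimates will vary by machine):

-- Without the composite index: type = ALL (full table scan).
-- With it: type = index (full scan of sindex) -- it still walks every
-- index entry, so the deep-offset query is no faster.
EXPLAIN select * from user limit 9000000, 10;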

In fact, the point of a composite (joint) index is to speed up conditional queries, not full-table scans.

select * from user where uname='6.445329111484186' //3.5s (no joint index)
select * from user where uname='6.445329111484186' //0.003s (with joint index)

That is the difference a joint index makes for a conditional query.

This basically proves the point: a full-table query is slow whether an index exists or not.

Since indexing gets us nowhere here, we have to look for another solution. As I mentioned in my previous MySQL interview post, the count can be stored in a separate table:

CREATE TABLE `attribute` (
  `id` int(11) NOT NULL,
  `formname` varchar(50) COLLATE utf8_bin NOT NULL COMMENT 'table name',
  `formcount` int(11) NOT NULL COMMENT 'Total data of the table',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

A note on this table: it is generally queried for a single row rather than scanned in full, so when creating it you can add a hash index on formname.
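A sketch of that index (my addition): note that InnoDB does not support hash indexes on regular tables and silently falls back to BTREE even if USING HASH is specified, so the hash only truly applies to engines such as MEMORY. A BTREE point lookup on a handful of rows is effectively instant anyway.

-- Unique lookup key on the table name; with InnoDB this becomes a BTREE index.
ALTER TABLE `attribute` ADD UNIQUE INDEX `idx_formname` (`formname`);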

select formcount from attribute where formname='user' //almost instant

That completes the count optimization. If the count query carries filter conditions, you can instead create an index on the filtered columns and let the query filter through the index, in which case the precomputed count is not needed.
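The article does not show how formcount is kept in sync; one possible approach (my assumption, not from the original) is a pair of triggers on the user table, at the cost of a little extra work on every write. The application layer updating the counter is another option.

-- Hypothetical maintenance triggers: keep attribute.formcount in step
-- with inserts into and deletes from `user`.
CREATE TRIGGER user_count_ins AFTER INSERT ON `user`
FOR EACH ROW
  UPDATE `attribute` SET formcount = formcount + 1 WHERE formname = 'user';

CREATE TRIGGER user_count_del AFTER DELETE ON `user`
FOR EACH ROW
  UPDATE `attribute` SET formcount = formcount - 1 WHERE formname = 'user';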

With count sorted out, how do we optimize the paging query? Here we can use a subquery:

select * from user
where id >= (select id from user limit 9000000, 1)
limit 10 //1.7s

Writing the subquery this way locates the starting id through a covering index: the subquery only touches the primary key, so it never has to read the full rows while skipping the first 9 million entries. Efficiency improves greatly. My test here still takes 1.7s; when my company optimized this before, the query time was lower than that. You can generate data and test it yourself.
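An equivalent formulation you may also see (my addition, not from the original) is the deferred join, which pages through the covering primary-key index first and only then fetches the full rows:

-- Deferred join: locate the 10 ids via the covering index, then join back.
select u.*
from user u
inner join (select id from user order by id limit 9000000, 10) t
  on u.id = t.id;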

But if the data volume is very large, I still recommend using ES (Elasticsearch) or constraining the query with some default filters; the count can still be maintained separately.

At this point, the optimization of a paging query for tens of millions of data has been completed.

Summary

This concludes this article on optimizing a MySQL table with tens of millions of rows: index the conditional queries, keep the total count in a separate table, and use a covering-index subquery for deep pagination.
