Example to explain the size of MySQL statistics table

Example to explain the size of MySQL statistics table

Counting the size of each table in each database is one of the simplest requirements of data governance. This article will count the data volume of each database and table in MySQL from two aspects: sampling statistical results and precise statistical results.

1. Statistical estimation of data volume

The MySQL data dictionary library information_schema records the estimated amount of data (inaccurate for InnoDB engine tables, accurate for MyISAM engine tables) and information such as data size, index size, and table fragment size.

If you want to know the approximate data volume of each database and table, you can directly query information_schema.tables for statistics. For example:

SELECT table_schema,table_name,table_rows,data_length+index_length+ data_free data_size
 FROM information_schema.`TABLES` WHERE table_schema IN ('db1','db2');

The unit of data_size is B

As mentioned above, the number of data items and sizes in the statistical information are based on the sampling statistics of part of the data, which may be different from the actual size. The larger the table, the more obvious the difference. If you want to know the actual situation of each table, you need to use the subsequent methods.

2. Count the actual amount of data

If you want to count the actual size of each table, you have to traverse each table and calculate the number of records. You can check the size of each table by checking the table space size (each table has an independent table space). The purpose of accurate statistics can be achieved by following the steps below.

Creating Paths

Create a working directory to save scripts and temporary files, etc.

mkdir -p /usr/local/data_size

Create statistics database and table

Create a statistics database on the database instance that needs statistics.

SQL> create database bak_db;

Create a stored procedure for statistics

SQL> use bak_db;
SQL>CREATE PROCEDURE `p_db_size`()
BEGIN
DECLARE v_id INT;
DECLARE v_maxid INT;
DECLARE v_tbname VARCHAR(50);
DECLARE v_dbname VARCHAR(50);
DECLARE v_sql_upd VARCHAR(200);
SET v_id =(SELECT MIN(id) FROM bak_db.tb_size);
SET v_maxid =(SELECT MAX(id) FROM bak_db.tb_size);
WHILE v_id <=v_maxid
DO
SET v_tbname = (SELECT tbname FROM bak_db.tb_size WHERE id=v_id);
SET v_dbname = (SELECT dbname FROM bak_db.tb_size WHERE id=v_id);
SET v_sql_upd = CONCAT('update bak_db.tb_size set tb_rows=(select count(*) from ',v_dbname,".",v_tbname,") where id=",v_id);
  SET @v_sql_upd := v_sql_upd;
  PREPARE stmt FROM @v_sql_upd;
  EXECUTE stmt ;
  DEALLOCATE PREPARE stmt;
  SET v_id = v_id +1;
END WHILE;
  END;

Creating a Script

vim data.sh
/* Insert the following content */

#!/bin/bash
cd /usr/local/data_size
du -s /data/mysql/mysql3306/data/db1/* |grep -v ".frm" |grep -v ".opt" >/usr/local/data_size/data_size     
du -s /data/mysql/mysql3306/data/db2/* |grep -v ".frm" |grep -v ".opt">>/usr/local/data_size/data_size

# The next 4 steps are to splice into sql 
awk '{print "insert into bak_db.tb_size(size,tb_route)values("""$0}' /usr/local/data_size/data_size >/usr/local/data_size/data_size1  
awk '{print $0";"}' /usr/local/data_size/data_size1 >/usr/local/data_size/data_size.sql
sed -i "s#\t#,'#g" /usr/local/data_size/data_size.sql
sed -i "s#;#');#g" /usr/local/data_size/data_size.sql

# Create a statistics table /usr/local/mysql5.7/bin/mysql -uroot -p'Test#123456' -h 192.168.28.132 -e "drop table if exists bak_db.tb_size;CREATE TABLE IF NOT EXISTS bak_db.tb_size ( id INT(11) NOT NULL PRIMARY KEY AUTO_INCREMENT ,size INT,tb_route VARCHAR(200),tbname VARCHAR(50),dbname VARCHAR(50),tb_rows INT(11));" 

# Import data /usr/local/mysql5.7/bin/mysql -uroot -p'Test#123456' -h 192.168.28.132 -e "use bak_db;truncate table bak_db.tb_size;source /usr/local/data_size/data_size.sql;"

# Generate library name and table name. Of course, this step can also be obtained from the data field. /usr/local/mysql5.7/bin/mysql -uroot -p'Test#123456' -h 192.168.28.132 -e "use bak_db;UPDATE bak_db.tb_size SET tbname=REPLACE(SUBSTRING_INDEX(tb_route,'/',-1),'.ibd','');"
 /usr/local/mysql5.7/bin/mysql -uroot -p'Test#123456' -h 192.168.28.132 -e "UPDATE bak_db.tb_size SET dbname=LEFT (SUBSTRING_INDEX(tb_route,'/',-2),INSTR(SUBSTRING_INDEX(tb_route,'/',-2),'/')-1);"
sleep 10 # If the previous steps are in the master database, it is recommended to pause for a while to avoid failure to obtain the table and content when performing statistics later. If the previous steps are in the slave database, this step can be omitted echo 'start call procedure' 
# Call the stored procedure to count the number of records in each table /usr/local/mysql5.7/bin/mysql -uroot -p'Test#123456' -h 192.168.28.132 -e "use bak_db;call bak_db.p_db_size();"

# Export the table and data /usr/local/mysql5.7/bin/mysqldump -uroot -p'Test#123456' -h 192.168.28.132 --single-transaction bak_db tb_size >/usr/local/data_size/tb_size.sql

# Import the table and results into the master database (the slave database is equivalent to deleting and rebuilding it once)
/usr/local/mysql5.7/bin/mysql -uroot -p'Test#123456' -h 192.168.28.128 -e "use bak_db;source /usr/local/data_size/tb_size.sql;"

The results are as follows:

It can be seen that the exact value is very different from the value in the statistical information, and the larger the table, the more obvious the difference.

TIPS: There is still a lot of room for optimization in the precise statistical script in this article. It was written in a hurry. You can adjust it as needed. My level is limited, so please feel free to correct me. If you have any questions, please feel free to communicate with me.

The above is an example to explain the details of MySQL statistical library table size. For more information about MySQL statistical library table size, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Overview of MySQL Statistics
  • A small Shell script to accurately count the number of rows in each Mysql table
  • MySQL implements multi-table association statistics (subquery statistics) example
  • Python implements Mysql data statistics and numpy statistical functions
  • sqlserver/mysql statistics of continuous time period data by day, hour, minute [recommended]
  • Detailed explanation of MySQL to obtain statistical data for each day and each hour of a certain period of time
  • PHP+MySQL to achieve daily data statistics optimization operation example within a period of time
  • PHP advertising click statistics code (php+mysql)
  • Reasons why MySQL 8.0 statistics are inaccurate

<<:  A brief analysis of React's understanding of state

>>:  Detailed explanation of how to use Docker to deploy Django+MySQL8 development environment

Recommend

Detailed explanation of the installation and use of Vue-Router

Table of contents Install Basic configuration of ...

Use of CSS3's focus-within selector

Pseudo-elements and pseudo-classes Speaking of th...

JavaScript to filter arrays

This article example shares the specific code for...

How to use time as a judgment condition in MySQL

Background: During the development process, we of...

How to implement animation transition effect on the front end

Table of contents Introduction Traditional transi...

Docker FAQ

Docker only maps ports to IPv6 but not to IPv4 St...

Ubuntu 16.04 kernel upgrade steps

1. Environment Ubuntu 16.04 running on a virtual ...

A brief introduction to web2.0 products and functions

<br />What is web2.0? Web2.0 includes those ...

How to understand semantic HTML structure

I believe everyone knows HTML and CSS, knows the ...

W3C Tutorial (12): W3C Soap Activity

Web Services are concerned with application-to-ap...

Baota Linux panel command list

Table of contents Install Pagoda Management Pagod...

JavaScript Dom Object Operations

Table of contents 1. Core 1. Get the Dom node 2. ...

React High-Order Component HOC Usage Summary

One sentence to introduce HOC What is a higher-or...

The use of v-model in vue3 components and in-depth explanation

Table of contents Use two-way binding data in v-m...