MySQL data compression performance comparison details

MySQL data compression performance comparison details

The data required by the Data Cube is rarely or never updated once it is written. This kind of data is well suited for compression to reduce disk usage. MySQL itself provides two compression methods - archive engine and myisampack method for MyISAM engine. Today, we tested these two methods separately and compared their advantages and disadvantages in terms of disk usage and query performance. As for why I do this, you should understand, and I will introduce it later. And look at the text:

1. Test environment

1.1 Hardware and Software

A 64-bit 2.6.18-92 kernel Linux development machine, 4G memory, and four 2800Mhz Dual-Core AMD Opteron (tm) Processor 2220 CPUs.

MySQL is placed on a 7200 rpm SAT hard disk, not in raid ;

No optimization was made to MySQL and query cache was turned off to prevent query cache from interfering with the test results.

1.2 Table Structure

2,424,753 records, actual data of a shard in the production environment;

The joint indexes ( partition_by1,idx_rank ) and ( partition_by1,chg_idx ) are established respectively, where partition_by1 is a varchar type with a length of 32 and is used for retrieval; the other two fields are floating point numbers and are mostly used for sorting;

As a sub-column, autokid acts as PRIMARY KEY and is only used to ensure atomicity during data loading. It has no practical significance.

2. Test Purpose

2.1 Comparison of compression space

The greater the compression rate, the smaller the disk space occupied, which directly reduces the data storage cost;

2.2 Query Performance Comparison

There should not be a noticeable degradation in query performance after compression. Archive does not support indexing, so performance degradation is inevitable. We should also have an idea of ​​how much the performance will be reduced and whether it is acceptable.

3. Testing Tools

3.1 mysqlslap

The official tool is of course the best choice. For an introduction to mysqlslap , please refer to the official documentation.

3.2 Test query

A total of 9973 actual SQL statements accessing the topranks_v3 table in the production environment were intercepted, from which 7 with the largest number of visits were extracted, with a concurrency of 50 and repeated 10 times. The command is as follows:

./mysqlslap --defaults-file=../etc/my.cnf -u**** -p**** -c50 -i10 -q ../t.sql --debug-info

4. Test conclusion

Comparison Items Disk space Time consumed (seconds) CPU Idle LOAD concurrent
Base table (MyISAM) 403956004 2.308 30 15 50
ARCHIVE 75630745 >300 75 4 1
PACK 99302109 2.596 30 twenty two 50

Based on the test data given in the table above, we can simply draw the following conclusions:

  • For the test table, the space occupied by Archive table is about 18.7% of the previous space, and the space occupied after myisampack is about 24.6% of the previous space. The difference between the two is not much. From the perspective of space utilization alone, it seems that we need to choose archive table.
  • Let's look at query performance and compare it with the benchmark table. In terms of total time and system load, the query performance of pack table under 50 concurrency is comparable to that of the benchmark table; while archive table takes more than 5 minutes under single concurrency (I can't wait any longer, so I kill it)!

So, we seem to be able to conclude that for tables that require online queries, ARCHIVE engine can basically be ignored.

Why is ARCHIVE engine so slow during this test?

We know that mysql provides the archive storage engine to reduce disk overhead, but there is also a prerequisite, that is, the archived data does not need to be or is rarely queried online, and it does not matter if the query is slow occasionally. Due to the above reasons, archive tables are not allowed to create indexes other than auto-increment columns.

With this consensus, let's take a test SQL to analyze why there is such a big difference in query performance before and after not using the index.

In our test SQL, there is such a line:

SELECT c1,c2,...,cn FROM mysqlslap.rpt_topranks_v3
WHERE ... AND partition_by1 = '50008090'
ORDER BY added_quantity3 DESC
LIMIT 500


As we said before, the test table has an index on the partition_by1 field. So, we preliminarily judge that this query should use the partition_by1 index on the benchmark table and myisampack table. EXPLAIN:

mysql> EXPLAIN
    -> SELECT ... FROM mysqlslap.rpt_topranks_v3
    -> WHERE ... AND partition_by1 = '50008090'
    -> ORDER BY added_quantity3 DESC
    -> LIMIT 500\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        TABLE: rpt_topranks_v3
         type: ref
possible_keys: idx_toprank_pid,idx_toprank_chg
          KEY: idx_toprank_pid
      key_len: 99
          ref: const
         rows: 2477
        Extra: USING WHERE; USING filesort
1 row IN SET (0.00 sec)

As we expected, this query uses the index on the partition_by1 field, matches the target number of rows 2477, and then there is a sort on the added_quantity3 field. Since added_quantity3 has no index, filesort is used.

Let's take a look at the EXPLAIN results of this SQL on the archive table:

mysql> EXPLAIN
    -> SELECT ... FROM mysqlslap.rpt_topranks_v3_<strong>archive</strong>
    -> WHERE ... AND partition_by1 = '50008090'
    -> ORDER BY added_quantity3 DESC
    -> LIMIT 500\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        TABLE: rpt_topranks_v3_archive
         type: ALL
possible_keys: NULL
          KEY: NULL
      key_len: NULL
          ref: NULL
         rows: 2424753
        Extra: USING WHERE; USING filesort
1 row IN SET (0.00 sec)


EXPLAIN says: " I don't have any indexes available, so I can only scan the entire table for 2424753 rows and then do a filesort ." If you want performance, then you are obviously wronging MySQL .

This is the end of this article about the details of MySQL data compression performance comparison. For more information about MySQL data compression performance comparison, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Detailed tutorial on installation and configuration of compressed version of MySQL database
  • Python regularly backs up MySQL data by date and compresses it
  • MySQL database backup command sharing (MySQL compressed database backup)

<<:  How to use Docker to build a development environment (Windows and Mac)

>>:  Double loading issue when the page contains img src

Recommend

Remote development with VSCode and SSH

0. Why do we need remote development? When develo...

Detailed explanation of how to mount remote file systems via SSH on Linux

Features of SSHFS: Based on FUSE (the best usersp...

How to handle spaces in CSS

1. Space rules Whitespace within HTML code is usu...

In-depth analysis of Linux NFS mechanism through cases

Continuing from the previous article, we will cre...

Eight hook functions in the Vue life cycle camera

Table of contents 1. beforeCreate and created fun...

Vue realizes simple effect of running light

This article shares the specific code of Vue to a...

Pros and Cons of Vite and Vue CLI

There is a new build tool in the Vue ecosystem ca...

MySQL 8.0 New Features - Introduction to Check Constraints

Table of contents Preface Check Constraints Creat...

Detailed explanation of the getBoundingClientRect() method in js

1. getBoundingClientRect() Analysis The getBoundi...

Detailed example of using typescript to encapsulate axios in Vue3

This axios package is used in the vue3 demo. For ...

Detailed explanation of the process of modifying Nginx files in centos7 docker

1. Install nginx in docker: It is very simple to ...

2 reasons why html-css tag style setting does not work

1 CSS style without semicolon ";" 2 Tags...