MySQL data compression performance comparison details

1. Test environment

1.1 Hardware and Software
1.2 Table Structure

2. Test Purpose

2.1 Comparison of compression space
2.2 Query Performance Comparison

3. Testing Tools

3.1 mysqlslap
3.2 Test query

4. Test conclusion

The data required by the Data Cube is rarely or never updated once it is written. This kind of data is well suited for compression to reduce disk usage. MySQL itself provides two compression methods - archive engine and myisampack method for MyISAM engine. Today, we tested these two methods separately and compared their advantages and disadvantages in terms of disk usage and query performance. As for why I do this, you should understand, and I will introduce it later. And look at the text:

1. Test environment

1.1 Hardware and Software

A 64-bit 2.6.18-92 kernel Linux development machine, 4G memory, and four 2800Mhz Dual-Core AMD Opteron (tm) Processor 2220 CPUs.

MySQL is placed on a 7200 rpm SAT hard disk, not in raid ;

No optimization was made to MySQL and query cache was turned off to prevent query cache from interfering with the test results.

1.2 Table Structure

2,424,753 records, actual data of a shard in the production environment;

The joint indexes ( partition_by1,idx_rank ) and ( partition_by1,chg_idx ) are established respectively, where partition_by1 is a varchar type with a length of 32 and is used for retrieval; the other two fields are floating point numbers and are mostly used for sorting;

As a sub-column, autokid acts as PRIMARY KEY and is only used to ensure atomicity during data loading. It has no practical significance.

2. Test Purpose

2.1 Comparison of compression space

The greater the compression rate, the smaller the disk space occupied, which directly reduces the data storage cost;

2.2 Query Performance Comparison

There should not be a noticeable degradation in query performance after compression. Archive does not support indexing, so performance degradation is inevitable. We should also have an idea of how much the performance will be reduced and whether it is acceptable.

3. Testing Tools

3.1 mysqlslap

The official tool is of course the best choice. For an introduction to mysqlslap , please refer to the official documentation.

3.2 Test query

A total of 9973 actual SQL statements accessing the topranks_v3 table in the production environment were intercepted, from which 7 with the largest number of visits were extracted, with a concurrency of 50 and repeated 10 times. The command is as follows:

./mysqlslap --defaults-file=../etc/my.cnf -u**** -p**** -c50 -i10 -q ../t.sql --debug-info

4. Test conclusion

Comparison Items	Disk space	Time consumed (seconds)	CPU Idle	LOAD	concurrent
Base table (MyISAM)	403956004	2.308	30	15	50
ARCHIVE	75630745	>300	75	4	1
PACK	99302109	2.596	30	twenty two	50

Based on the test data given in the table above, we can simply draw the following conclusions:

For the test table, the space occupied by Archive table is about 18.7% of the previous space, and the space occupied after myisampack is about 24.6% of the previous space. The difference between the two is not much. From the perspective of space utilization alone, it seems that we need to choose archive table.
Let's look at query performance and compare it with the benchmark table. In terms of total time and system load, the query performance of pack table under 50 concurrency is comparable to that of the benchmark table; while archive table takes more than 5 minutes under single concurrency (I can't wait any longer, so I kill it)!

So, we seem to be able to conclude that for tables that require online queries, ARCHIVE engine can basically be ignored.

Why is ARCHIVE engine so slow during this test?

We know that mysql provides the archive storage engine to reduce disk overhead, but there is also a prerequisite, that is, the archived data does not need to be or is rarely queried online, and it does not matter if the query is slow occasionally. Due to the above reasons, archive tables are not allowed to create indexes other than auto-increment columns.

With this consensus, let's take a test SQL to analyze why there is such a big difference in query performance before and after not using the index.

In our test SQL, there is such a line:

SELECT c1,c2,...,cn FROM mysqlslap.rpt_topranks_v3
WHERE ... AND partition_by1 = '50008090'
ORDER BY added_quantity3 DESC
LIMIT 500

As we said before, the test table has an index on the partition_by1 field. So, we preliminarily judge that this query should use the partition_by1 index on the benchmark table and myisampack table. EXPLAIN:

mysql> EXPLAIN
    -> SELECT ... FROM mysqlslap.rpt_topranks_v3
    -> WHERE ... AND partition_by1 = '50008090'
    -> ORDER BY added_quantity3 DESC
    -> LIMIT 500\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        TABLE: rpt_topranks_v3
         type: ref
possible_keys: idx_toprank_pid,idx_toprank_chg
          KEY: idx_toprank_pid
      key_len: 99
          ref: const
         rows: 2477
        Extra: USING WHERE; USING filesort
1 row IN SET (0.00 sec)

As we expected, this query uses the index on the partition_by1 field, matches the target number of rows 2477, and then there is a sort on the added_quantity3 field. Since added_quantity3 has no index, filesort is used.

Let's take a look at the EXPLAIN results of this SQL on the archive table:

mysql> EXPLAIN
    -> SELECT ... FROM mysqlslap.rpt_topranks_v3_<strong>archive</strong>
    -> WHERE ... AND partition_by1 = '50008090'
    -> ORDER BY added_quantity3 DESC
    -> LIMIT 500\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        TABLE: rpt_topranks_v3_archive
         type: ALL
possible_keys: NULL
          KEY: NULL
      key_len: NULL
          ref: NULL
         rows: 2424753
        Extra: USING WHERE; USING filesort
1 row IN SET (0.00 sec)

EXPLAIN says: " I don't have any indexes available, so I can only scan the entire table for 2424753 rows and then do a filesort ." If you want performance, then you are obviously wronging MySQL .

This is the end of this article about the details of MySQL data compression performance comparison. For more information about MySQL data compression performance comparison, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:

Detailed tutorial on installation and configuration of compressed version of MySQL database
Python regularly backs up MySQL data by date and compresses it
MySQL database backup command sharing (MySQL compressed database backup)

<<: How to use Docker to build a development environment (Windows and Mac)

>>: Double loading issue when the page contains img src