I have been very happy with the benefit from InnoDB compression, and with the hard work done locally by my peers to make it work in production for us. When a table is compressed with InnoDB, you must declare a compression factor for that table (key_block_size=...), and the table may then be compressed by 2X, 4X, 8X, or 16X. We have been using key_block_size=8 for many tables to get 2X compression.
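To make the factor concrete, here is a minimal sketch of the arithmetic, assuming InnoDB's default 16kb in-memory page size: the declared key_block_size implies the nominal compression factor.

```python
# Nominal compression factor implied by key_block_size, assuming the default
# 16kb in-memory InnoDB page size.
PAGE_SIZE_KB = 16

def nominal_compression_factor(key_block_size_kb):
    """Uncompressed page size divided by the declared on-disk block size."""
    return PAGE_SIZE_KB / key_block_size_kb

for kbs in (8, 4, 2, 1):
    print(f"key_block_size={kbs} -> {nominal_compression_factor(kbs):.0f}X")
# key_block_size=8 -> 2X, key_block_size=4 -> 4X,
# key_block_size=2 -> 8X, key_block_size=1 -> 16X
```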
I think it is possible to do much better (2X or more) than InnoDB. InnoDB uses an update-in-place approach to modify database pages on disk. There are solutions described as write-optimized (shadow pages, log-structured merge tree, fractal tree) that have a few significant advantages and one potential disadvantage with respect to compression. The advantages of the write-optimized family include:
- better fill-factor - leaf nodes of an update-in-place B-Tree tend to be about 2/3 full after a sequence of random operations. This does not occur for the write-optimized family.
- no round-up - when a 16kb in-memory page is compressed to 6kb for an InnoDB table that uses key_block_size=8, then 8kb is still written to disk. Many write-optimized solutions only write out 6kb to disk in this case (see the sketch after this list).
- compress-once - for a write-optimized solution, a page is compressed once and decompressed many times. For InnoDB, pages are compressed and decompressed frequently, so InnoDB must use a compression algorithm that is reasonably fast at both operations. But a write-optimized solution can use an algorithm that makes decompression much faster than compression, or one that spends a lot of time compressing data to improve the compression rate.
- larger leaf nodes - database pages for InnoDB can be no larger than 16kb. We are working to modify InnoDB to support 32kb and 64kb pages, and hopefully that is possible. But it is very easy to use larger leaf nodes for solutions in the write-optimized family, and larger pages support larger compression windows, which provide better compression rates.
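The no round-up point is easy to see with a toy computation. The sketch below is illustrative only: it compresses a fabricated 16kb page with zlib (real pages and real engines will differ), then compares the bytes written by an engine that stores compressed pages in fixed key_block_size slots against the bytes a write-optimized engine could write.

```python
import random
import zlib

PAGE_SIZE = 16 * 1024        # uncompressed in-memory page (16kb)
KEY_BLOCK_SIZE = 8 * 1024    # declared on-disk page size (8kb)

# Fabricate something vaguely row-like; real pages compress differently.
random.seed(1)
rows = [f"id={i:08d},val={random.random():.12f},tag=abcdef".encode()
        for i in range(400)]
page = b"".join(rows)[:PAGE_SIZE]

compressed = zlib.compress(page, 6)
assert len(compressed) <= KEY_BLOCK_SIZE  # otherwise the page would have to be split

print(f"compressed size              : {len(compressed)} bytes")
print(f"fixed-slot engine writes     : {KEY_BLOCK_SIZE} bytes (rounded up to the slot)")
print(f"write-optimized engine writes: {len(compressed)} bytes")
```

The gap between the last two numbers is wasted space for every page that does not compress to exactly the declared block size.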
Of the reasons above, the first two are a big deal and explain why it is possible to get much better compression rates from the write-optimized alternatives. An additional benefit of a write-optimized solution is that it doesn't suffer from fragmentation, so a database won't grow over time when it is updated without adding data to it, and there is no need to reorg tables to get space back. While these solutions don't suffer from fragmentation, they do introduce a new source of inefficiency: they usually have some amount of database space occupied by dead rows (rows that cannot be read by any current or future transaction). Compaction and garbage collection are the methods used to reclaim this space, and you might be able to estimate the amount of space required for this by considering the row change rate and the frequency at which compaction can be done.
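As a rough sketch of that estimate, with made-up workload numbers, the calculation is just the churn rate times how long dead versions linger before compaction reclaims them.

```python
def dead_row_space_gb(rows_changed_per_sec, avg_row_bytes, compaction_interval_secs):
    """Rough upper bound: each changed row leaves one dead version behind
    until the next compaction pass reclaims it."""
    dead_bytes = rows_changed_per_sec * avg_row_bytes * compaction_interval_secs
    return dead_bytes / (1024 ** 3)

# Hypothetical workload: 5000 row changes/sec, 200-byte rows, compaction once per hour.
print(f"~{dead_row_space_gb(5000, 200, 3600):.1f} GB of dead rows between compactions")
# -> roughly 3.4 GB with these assumed numbers
```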