How do InnoDB and Postgres store large columns? Both use overflow pages for rows with large columns and support compression. Postgres is more likely to store large columns in overflow pages and InnoDB is likely to use more space in overflow pages. Tables can be queried in Postgres to determine the per-table space used by overflow pages. MySQL does not have tables for that but the data can be extracted with a lot more work and time.
InnoDB
Peter wrote an excellent post for InnoDB. There are two file formats, Antelope and Barracuda, and the general approach is the same. A row can use about 1/2 of a page which is ~8k for 16k pages. When the row is too big and has large columns, the largest of them are moved to overflow pages until the row fits. If the row is not small enough after all have been moved then an error is raised.
When a column is moved for the Antelope format the first 768 bytes are stored inline. If 10 large columns are moved they can still consume 7680 bytes inline, excluding the metadata bytes. Thus you can't have a row with 11 BLOB columns each with length 10,000: the 11 inline prefixes alone need 8448 bytes, which is more than the ~8k limit.
The difference with Barracuda (when the table uses the DYNAMIC or COMPRESSED row format) is that when a large column is moved to an overflow page, only a 20-byte pointer is stored inline rather than a 768-byte prefix.
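The inline arithmetic for the two formats can be sketched in a few lines of Python. This is only a rough model: the 8192-byte row limit is my rounding of "about 1/2 of a 16k page" (the real limit is a bit lower because of page and record headers), and the function names are made up for illustration.

```python
# Approximate figures from the text; real limits are slightly lower
# because of page and record headers.
ROW_LIMIT_BYTES = 8 * 1024      # assumption: ~1/2 of a 16k page, rounded to 8192
ANTELOPE_INLINE_BYTES = 768     # prefix kept inline per moved column
BARRACUDA_INLINE_BYTES = 20     # pointer kept inline per moved column


def inline_bytes(moved_columns: int, per_column: int) -> int:
    """Bytes that stay in the row after `moved_columns` go to overflow pages."""
    return moved_columns * per_column


def fits_after_moving_all(moved_columns: int, per_column: int) -> bool:
    """True if the inline remainder alone fits under the approximate row limit."""
    return inline_bytes(moved_columns, per_column) <= ROW_LIMIT_BYTES


if __name__ == "__main__":
    for n in (10, 11):
        for name, per_column in (("Antelope", ANTELOPE_INLINE_BYTES),
                                 ("Barracuda", BARRACUDA_INLINE_BYTES)):
            print(n, "columns,", name, "->",
                  inline_bytes(n, per_column), "bytes inline, fits:",
                  fits_after_moving_all(n, per_column))
```

With these numbers, 10 moved columns fit in either format, while 11 moved columns fit only when the 20-byte pointer is used.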
Large columns do not share overflow pages. When five columns each from two rows are moved to overflow pages, then at least ten overflow pages will be allocated. This can use much more space than expected and it is difficult to monitor. You can get the data on space used but it is not fun and it can take a long time (see xtrabackup --stats, upcoming patch to innochecksum in Facebook MySQL patch, InnoDB tablespace monitor).
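To get a feel for how much space this can use, here is a small Python sketch that computes a lower bound on the number of overflow pages. It assumes 16k pages, ignores per-page headers and any inline prefix, and the helper name is hypothetical, so treat the result as a floor and not as what InnoDB will actually allocate.

```python
import math

PAGE_SIZE = 16 * 1024  # assumption: default 16k InnoDB page size


def min_overflow_pages(column_lengths, page_size=PAGE_SIZE):
    """Lower bound on overflow pages when no two columns share a page.

    Each moved column needs at least ceil(length / page_size) pages of its own.
    Page headers are ignored, so the real count can only be higher.
    """
    return sum(math.ceil(length / page_size)
               for length in column_lengths if length > 0)


if __name__ == "__main__":
    # Two rows, five moved columns each, 9000 bytes per column.
    lengths = [9000] * 10
    pages = min_overflow_pages(lengths)
    print("pages:", pages)
    print("bytes allocated:", pages * PAGE_SIZE)
    print("bytes of data:", sum(lengths))
```

In this example ten 9000-byte columns need at least ten pages, so 163,840 bytes are allocated to hold 90,000 bytes of data.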
The Barracuda format also supports compression for the overflow pages and with key_block_size=16 compression is only done for overflow pages (not for the inline columns).
Documentation can be tricky:
- This accurately describes Antelope behavior but does not state that this is Antelope behavior as the official docs describe the built-in behavior and the Antelope/Barracuda distinction occurs with the plug-in.
- This page confuses me and incorrectly describes the 768-byte prefix approach.
- This is a good but brief description of the changes with Barracuda.
- This is the best description and covers both the strategy for moving columns and the amount of space used by external pages.
Postgres
Disclaimer - I do not use Postgres, but I read the manuals, mailing lists and blog posts. What I have written below is a summary. There are many more details in the docs that I cite. They are worth reading.
Postgres uses TOAST (The Oversized-Attribute Storage Technique) for large columns. It has an 8k page size by default and rows cannot span pages. Large columns are compressed and then split into pieces (usually 2k each), and each piece is stored as a separate row in the per-table TOAST table. The system catalogs can be queried to determine the space used by each TOAST table.
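As a sketch of what such a query might look like, the Python below wraps a catalog query in a small helper. The helper name `toast_sizes` is hypothetical, and it assumes you pass in a cursor from any DB-API driver (psycopg2, for example) that is already connected to the database. The query uses `pg_class.reltoastrelid` and `pg_total_relation_size`, so the reported size includes the index on the TOAST table. I have not tuned this, it is just a starting point.

```python
# Lists each ordinary table that has a TOAST table, with the TOAST size in bytes.
TOAST_SIZE_SQL = """
SELECT c.relname AS table_name,
       pg_total_relation_size(c.reltoastrelid) AS toast_bytes
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.reltoastrelid <> 0
ORDER BY toast_bytes DESC
"""


def toast_sizes(cursor):
    """Return [(table_name, toast_bytes), ...] using a DB-API cursor.

    `cursor` is assumed to come from an open connection to the
    database being inspected; no driver is imported here.
    """
    cursor.execute(TOAST_SIZE_SQL)
    return [(name, int(size)) for name, size in cursor.fetchall()]
```

Usage would be something like `toast_sizes(conn.cursor())`, where `conn` is a connection you have opened yourself.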
Differences
Postgres is more likely than InnoDB to use overflow pages (the TOAST table) because large columns are compressed or moved to the TOAST table when a row is wider than TOAST_TUPLE_THRESHOLD (2k by default), and this continues until the row is narrower than TOAST_TUPLE_TARGET (also 2k by default). InnoDB does not use overflow pages until rows are wider than ~8k.
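The difference in thresholds is easy to see with a toy model in Python. The two constants are the approximate defaults quoted above (2k for Postgres, ~8k for InnoDB with 16k pages), the function name is made up, and the model assumes the row has large variable-length columns that are eligible to be moved.

```python
POSTGRES_TOAST_THRESHOLD = 2 * 1024  # ~2k by default, per the text
INNODB_ROW_LIMIT = 8 * 1024          # ~8k for 16k pages, per the text


def engines_using_overflow(row_bytes: int) -> list:
    """Which engines would start moving large columns out of a row this wide.

    Toy model: only compares the row width to each approximate threshold.
    """
    engines = []
    if row_bytes > POSTGRES_TOAST_THRESHOLD:
        engines.append("postgres")
    if row_bytes > INNODB_ROW_LIMIT:
        engines.append("innodb")
    return engines


if __name__ == "__main__":
    for width in (1000, 4000, 10000):
        print(width, engines_using_overflow(width))
```

A 4000-byte row is wide enough to involve TOAST in Postgres but stays entirely inline in InnoDB.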
But InnoDB is likely to use more space in the overflow pages because each column that uses overflow pages requires its own pages. If 5 columns each from rows A and B use overflow pages then at least 10 pages are allocated by InnoDB. This can waste a lot of space. Postgres splits each large column into compressed pieces and treats each piece as a row in a TOAST table.
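The TOAST approach of "compress, then split into pieces stored as rows" can be sketched in Python as below. This is an illustration and not how Postgres implements it: zlib stands in for the compressor Postgres uses, the 2000-byte chunk size is my approximation of "usually 2k", the sketch always compresses, and the function names are hypothetical. The tuple layout mirrors the chunk_id, chunk_seq and chunk_data columns of a TOAST table.

```python
import zlib

CHUNK_SIZE = 2000  # assumption: approximates the "usually 2k" piece size


def toast_rows(value_id, data, chunk_size=CHUNK_SIZE):
    """Compress `data` and split it into (chunk_id, chunk_seq, chunk_data) rows."""
    compressed = zlib.compress(data)  # stand-in compressor, not what Postgres uses
    return [(value_id, seq, compressed[offset:offset + chunk_size])
            for seq, offset in enumerate(range(0, len(compressed), chunk_size))]


def detoast(rows):
    """Reassemble the original value from its chunk rows."""
    ordered = sorted(rows, key=lambda row: row[1])  # order by chunk_seq
    return zlib.decompress(b"".join(chunk for _, _, chunk in ordered))


if __name__ == "__main__":
    value = b"large column value " * 1000
    rows = toast_rows(1, value)
    print("original bytes:", len(value), "chunk rows:", len(rows))
    assert detoast(rows) == value
```

Because the pieces are ordinary rows, several of them can share an 8k page in the TOAST table, which is why less space is wasted than with one page per column.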