At the MUG, some questions were raised about TokuDB; I hope this post answers them in more detail.
InnoDB, like any other general-purpose RDBMS, can perform really badly for online back-office workloads on big tables, even when following the usual practices for an OLTP-normalized schema:

- Keeping indexes in memory
- Touching a minimum set of rows on secondary index scans
- Scanning ranges in primary key order

This happens frequently when:

- The working dataset grows to 2 to 100 times the size of the hardware memory
- Joins on big tables jump from secondary indexes to the clustered table
- Concurrency increases on big tables with range scans starting from a random point in the PRIMARY KEY (see the query sketched below)
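For illustration, a typical back-office query of this shape (the table names come from this post, the column names are hypothetical guesses from the index names) joins two big tables through a secondary index over a wide date range, so every index hit turns into a random read on the clustered table:

MariaDB> SELECT t1.*, t2.apnum
      ->   FROM xxx1 t1
      ->   JOIN xxx2 t2
      ->     ON t2.site = t1.site AND t2.mach = t1.mach
      ->  WHERE t1.site = ?            /* a different site/mach on every call, so the scan   */
      ->    AND t1.mach = ?            /* starts at a random point of the secondary index    */
      ->    AND t1.dateheure >= NOW() - INTERVAL 9 MONTH;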
This shows up as random disk I/O driving the disks to 100% utilization:
iostat -xm 2
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 0.00 305.00 101.00 4.68 0.28 25.04 1.01 2.48 2.41 97.65
sdb 0.00 0.00 285.50 110.50 4.37 0.34 24.36 1.02 2.59 2.47 98.00
sdb 0.00 0.00 361.50 158.50 5.59 2.73 32.76 1.18 2.27 1.87 97.00
iostat
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sdb 1215.50 37040.00 335.00 74080 670
Today we protect the InnoDB buffer pool by forcing a constant minimum date of today minus a 2-month interval on most range queries, but this is not always possible and it limits features.
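A minimal sketch of that kind of guard, assuming a hypothetical dateheure range column (the name is inferred from the index names in this post):

MariaDB> SELECT *
      ->   FROM xxx1
      ->  WHERE dateheure >= CURDATE() - INTERVAL 2 MONTH   /* constant lower bound keeps the range inside the hot part of the buffer pool */
      ->    AND dateheure < NOW();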
Benchmark
IBM 5030 M3, 16 Nehalem cores, 32G RAM, XFS, RAID10 over 4 SAS 15K rpm disks
Schema extraction for the benchmark:
|                       | InnoDB | TokuDB |
|-----------------------|--------|--------|
| Total tablespace size | 249G   | 155G   |
| Index size            | 31G    | 192G   |
| Buffer pool           | 24G    | 14G    |
Every primary key is composed of a timestamp and a bigint UUID.
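A minimal sketch of that key layout (the table and column names are hypothetical, only the primary key shape matters):

MariaDB> CREATE TABLE xxx_example (
      ->   dateheure DATETIME        NOT NULL,   -- timestamp half of the key
      ->   uuid      BIGINT UNSIGNED NOT NULL,   -- bigint UUID half of the key
      ->   payload   VARCHAR(255),
      ->   PRIMARY KEY (dateheure, uuid)         -- rows are clustered in time order
      -> ) ENGINE=InnoDB;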
Multiple big tables, with many rows and a large average record size:
MariaDB> select table_name, table_rows/1000000 from information_schema.tables order by table_rows desc limit 20;
+--------------------+--------------------+
| table_name         | Million rows       |
+--------------------+--------------------+
| xxx2               |            52.8748 |
| xxx1               |            44.5123 |
| xxx3l              |            31.6692 |
| xxx4               |            30.0430 |
MariaDB> select table_name, avg_row_length from information_schema.tables where table_name in ('xxx1','xxx2','xxx3l','xxx4');
+-------------+----------------+
| table_name  | avg_row_length |
+-------------+----------------+
| xxx2        |            237 |
| xxx3l       |            610 |
| xxx4        |             87 |
| xxx1        |            248 |
+-------------+----------------+
Index and data usage for the slow queries
Let's run a test of 100 queries and see what is happening inside the storage engine.
MariaDB> show table_statistics;
+-------------+-----------+
| Table_name  | Rows_read |
+-------------+-----------+
| xxx2        |    750284 |
| xxx1        |   3118583 |
MariaDB> show index_statistics;
+-------------+-------------------------+-----------+
| Table_name  | Index_name              | Rows_read |
+-------------+-------------------------+-----------+
| xxx2        | apnum_idx               |     64475 |
| xxx1        | trsitemachdateheure_idx |   3023319 |
| xxx2        | apdatesitemach_idx      |    456567 |
| xxx1        | PRIMARY                 |     93381 |
| xxx2        | PRIMARY                 |    228113 |
+-------------+-------------------------+-----------+
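These counters come from the userstat feature of MariaDB/Percona Server; a minimal way to enable it and reset the counters between runs (standard statements, shown here as a reminder rather than as part of the original test script):

MariaDB> SET GLOBAL userstat = ON;      -- start collecting table and index statistics
MariaDB> FLUSH TABLE_STATISTICS;        -- reset Rows_read before the 100-query run
MariaDB> FLUSH INDEX_STATISTICS;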
From here we estimate that every 100 heavy queries touch:
- roughly 500M of row data: ((750284*237) + (3118583*87)) / 1024 / 1024
- roughly 47M of index data: ((64475 + 3023319 + 456567)*12 + (93381 + 228113)*22) / 1024 / 1024
- 90% of secondary index key reads hit xxx1, a 21GB table here and 100G in production
- 10% of secondary index key reads hit xxx2, a 22GB table here and 100G in production
We estimate that 10% of the data fits in memory in production (24G of RAM for 200G of data).
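For reference, the arithmetic above can be reproduced in the client; the 12- and 22-byte multipliers are presumably per-entry sizes for secondary and primary key entries (the post does not state them explicitly):

MariaDB> SELECT
      ->   ((750284 * 237) + (3118583 * 87)) / 1024 / 1024                            AS row_data_MB,
      ->   ((64475 + 3023319 + 456567) * 12 + (93381 + 228113) * 22) / 1024 / 1024    AS index_data_MB;

This works out to roughly 430 MB of row data and 47 MB of index data touched per 100 heavy queries.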
Benchmark results
We collected 1000 queries running between 10s and 30s from the InnoDB slow query log and replaced the 2-month interval with 9 months.

| Workload | InnoDB key reads/s | TokuDB key reads/s |
|----------|--------------------|--------------------|
| Full-memory workload, 100 slow queries, concurrency 1 | Flat status, 1M < handler_read_next < 1.2M, 12s: 10 tps | Flat status, 1M < handler_read_next < 1.2M, 9s: 10 tps |
| Full-memory workload, 100 slow queries, concurrency 4 | Flat status, 1M < handler_read_next < 3M, 4s: 40 tps | Negative scalability, 400K < handler_read_next < 1M, 14s: 7.4 tps |
| 50% memory workload, 1000 slow queries, concurrency 1 | Erratic status, 1K < handler_read_next < 2M, 3730s: 0.26 tps | Flat status, 200K < handler_read_next < 400K, 854s: 1.2 tps |
| 10% memory workload, 1000 slow queries, concurrency 1, buffer pool 5G | Still waiting | Flat status, 150K < handler_read_next < 280K, 1262s: 0.8 tps |
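The handler_read_next ranges above were presumably observed by sampling the standard status counter during each run, e.g.:

MariaDB> SHOW GLOBAL STATUS LIKE 'Handler_read_next';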
| Query time (s) | InnoDB | TokuDB |
|----------------|--------|--------|
| Q1             | 9.05   | 0.22   |
| Q2             | 17.56  | 0.30   |
| Q3             | 5.95   | 0.29   |
| Q4             | 19.00  | 0.29   |
| Q5             | 0.24   | 0.01   |
| Q6             | 9.70   | 0.09   |
| Q7             | 2.01   | 0.23   |
InnoDB remains the reference for in-memory concurrent workloads, which was expected since TokuDB is the challenger, but we can already state that for any IO-bound workload TokuDB will make your day.
There are many ways to fix those bad queries (a denormalization project, for instance), but nothing is simpler than an ALTER TABLE ... ENGINE=TokuDB on a slave. Don't forget to rebuild the secondary indexes using TokuDB's clustering index syntax, as sketched below.
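A minimal sketch of that conversion on a slave (xxx1 and its index name come from this post; the column list is a hypothetical guess from the index name, and CLUSTERING is TokuDB's keyword for clustering secondary indexes):

MariaDB> ALTER TABLE xxx1 ENGINE=TokuDB;
MariaDB> ALTER TABLE xxx1
      ->   DROP INDEX trsitemachdateheure_idx,
      ->   ADD CLUSTERING INDEX trsitemachdateheure_idx (site, mach, dateheure);   -- hypothetical column list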