In our recent release of Percona Server 5.5.19 we introduced a new value for innodb_flush_neighbor_pages: cont.
This is our attempt to deal with the problem of InnoDB flushing.
There is also a second fix, for what we think is a bug in InnoDB where it blocks queries when it does not need to (I will refer to it as the “sync fix”). In this post, however, I will focus on innodb_flush_neighbor_pages.
By default InnoDB flushes so-called neighbor pages, which are not really neighbors.
Say we want to flush page P. InnoDB looks at an area of 128 pages around page P and flushes every dirty page in that area. To illustrate, say we have an area of memory like this:
...D...D...D....P....D....D...D....D
where each dot is a page that does not need flushing, each “D” is a dirty page that InnoDB will flush, and P is our page.
So, as a result of how this works, instead of performing 1 random write, InnoDB performs 8 random writes.
This is quite far from the original intention of flushing as many pages as possible in a single sequential write.
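To make the count concrete, here is a minimal Python sketch of the default behaviour as described above. It is only a toy model, not InnoDB code: the function names are hypothetical, and the diagram string stands in for the whole 128-page area.

```python
def area_flush(pages):
    """Return the indexes of all pages flushed by the default "area" mode.

    `pages` is a diagram string: '.' is a clean page, 'D' is a dirty page
    and 'P' is the page we originally wanted to flush. The whole string
    stands in for the 128-page area (an assumption to keep this short).
    """
    return [i for i, state in enumerate(pages) if state in "DP"]


def count_write_runs(flushed):
    """Count runs of adjacent page indexes; each run is one separate write."""
    runs = 0
    previous = None
    for index in flushed:
        if previous is None or index != previous + 1:
            runs += 1
        previous = index
    return runs


area = "...D...D...D....P....D....D...D....D"
flushed = area_flush(area)
# Pages flushed, and the number of non-contiguous writes needed for them.
print(len(flushed), count_write_runs(flushed))  # prints: 8 8
```

None of the flushed pages are adjacent to each other, so every one of them costs its own random write.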
So we added the new innodb_flush_neighbor_pages=cont method. With it, only truly sequential writes are performed.
That is, in the case
...D...D...D..DDDPD....D....D...D....D
only the following pages will be flushed (marked as “F”):
...D...D...D..FFFFF....D....D...D....D
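The same rule can be sketched in a few lines of Python. Again, this is only an illustration of the behaviour described here, not the actual server code; the function name cont_flush is hypothetical, and the string uses the same notation as the diagrams.

```python
def cont_flush(pages):
    """Mark with 'F' the contiguous run of dirty pages that contains 'P'.

    `pages` is a diagram string: '.' is a clean page, 'D' is a dirty page
    and 'P' is the page we want to flush. Dirty pages that are not directly
    connected to 'P' are left untouched.
    """
    target = pages.index("P")

    # Extend to the left while the previous page is dirty.
    start = target
    while start > 0 and pages[start - 1] == "D":
        start -= 1

    # Extend to the right while the next page is dirty.
    end = target
    while end + 1 < len(pages) and pages[end + 1] == "D":
        end += 1

    return pages[:start] + "F" * (end - start + 1) + pages[end + 1:]


before = "...D...D...D..DDDPD....D....D...D....D"
after = cont_flush(before)
print(after)  # prints: ...D...D...D..FFFFF....D....D...D....D
```

The five flushed pages sit next to each other, so they can go out as one sequential write, while the remaining dirty pages stay in the buffer pool for a later flush.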
Besides “cont”, in Percona Server 5.5.19 innodb_flush_neighbor_pages also accepts the values “area” (the default) and “none” (recommended for SSDs).
What kind of effect does it have? Let’s run some benchmarks.
We repeated the same benchmark I ran in Disaster MySQL 5.5 flushing, but this time we used two servers: a Cisco UCS C250 and an HP ProLiant DL380 G6.
First, the results from the HP ProLiant.
Response time graph (the y-axis has a logarithmic scale):
As you can see, with “cont” we are able to get a stable line. And even with the default innodb_flush_neighbor_pages, Percona Server has smaller dips than MySQL.
To show the effect of the “sync fix”, let’s compare Percona Server 5.5.18 (without the fix) and 5.5.19 (with the fix).
You can see that the fix keeps queries running in cases where before there was a “hard” stop and no transactions were processed.
The previous result may give you the impression that “cont” guarantees a stable line, but unfortunately this is not always the case.
Here are the results (throughput and response time) from the Cisco UCS C250 server:
As you can see, on this server there are longer and deeper periods when MySQL is stuck in flushing, and in such cases innodb_flush_neighbor_pages=cont only helps to relieve the problem, without completely solving it.
Which, I believe, is still better than a complete stop for a significant amount of time.
The raw results, scripts and various CPU/IO metrics are available from our Benchmarks Launchpad.