
The benefit of keeping the InnoDB transaction log in cache

I was getting inconsistent performance results while running sysbench to generate a workload of point-updates and point-lookups. The rate of rows updated per second varied between 200 and 600 and the variance appeared to be random. PMP showed a lot of contention on the transaction log system mutex. It took me about one day to guess the root cause: the InnoDB transaction log was not in the OS buffer cache, so 512-byte aligned log writes frequently required a disk read to bring the surrounding 4kb page into the cache before the write could be applied (a read-modify-write). The problem is avoided when either the transaction log remains in the OS buffer cache or you use the Percona patch that adds all_o_direct as an option for innodb_flush_method.
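One way to see the mechanism: the OS caches files in 4kb pages, so a buffered 512-byte log write to a page that isn't cached forces a read of the full page from disk before the write can be applied. A toy simulation of that read-modify-write behavior (the names here are illustrative, not InnoDB or kernel internals):

```python
PAGE_SIZE = 4096  # the OS buffer cache works in 4kb pages

class BufferCacheSim:
    """Toy model of the OS buffer cache, for illustration only."""

    def __init__(self):
        self.cached_pages = set()  # page numbers resident in the cache
        self.disk_reads = 0        # reads forced by sub-page writes

    def write(self, offset, nbytes):
        """Apply a buffered write; uncached pages must be read first."""
        first = offset // PAGE_SIZE
        last = (offset + nbytes - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page not in self.cached_pages:
                self.disk_reads += 1       # read-modify-write penalty
                self.cached_pages.add(page)
        # ...modify the cached page(s) and mark them dirty...

    def drop_caches(self):
        """Model the effect of 'echo 1 > /proc/sys/vm/drop_caches'."""
        self.cached_pages.clear()

cache = BufferCacheSim()
cache.write(0, 512)    # cold cache: forces a 4kb page read
cache.write(512, 512)  # page now cached: no read needed
cache.drop_caches()
cache.write(1024, 512) # cold again: forces another read
print(cache.disk_reads)  # -> 2
```

With the log resident in cache, the 512-byte writes never pay the read penalty, which matches the fast case above.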

 

The problem was harder to debug than it should have been because InnoDB didn't report log write latency via a separate metric. All synchronous writes, log and doublewrite buffer, were reported via the "Sync writes" line, which combines large/slow writes to the doublewrite buffer with small/fast writes to the transaction log:

Sync writes: 4836209719 requests, 0 old, 5009.82 bytes/r, svc: 498087.30 secs, 0.10 msecs/r

 

I have a diff out to fix that:

Log writes: 2328355 requests, 0 old, 955.39 bytes/r, svc: 14.44 secs, 0.01 msecs/r

Doublewrite buffer writes: 19718 requests, 12 old, 946736.45 bytes/r, svc: 41.29 secs, 2.09 msecs/r

 

I suspect this is even harder to debug in official MySQL, which doesn't have any of the metrics above in SHOW INNODB STATUS output or SHOW STATUS counters. Perhaps the performance schema makes this easier to debug but I don't know much about that feature.

 

I then reproduced the problem by starting the benchmark with the transaction log in the OS buffer cache and then running echo 1 > /proc/sys/vm/drop_caches to remove it from cache. The results below show the impact on both the rate of rows read and updated per second. The test used 128 client threads doing point-lookups and 32 client threads doing point-updates. The database was 240G on disk and the InnoDB buffer cache was 30G. The rates drop significantly when the cache is dropped.

 

(Graph: rows read/second and rows updated/second, before and after the cache was dropped)
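Dropping the whole OS cache is a blunt instrument. On Linux the same eviction can be done for a single file with posix_fadvise and POSIX_FADV_DONTNEED, which would have let me target just the transaction log. A sketch, with a hypothetical log file path:

```python
import os

def evict_from_cache(path):
    """Ask the kernel to drop cached pages for one file.

    Linux honors POSIX_FADV_DONTNEED for clean pages only, so dirty
    pages are flushed first with fsync. This evicts a single file,
    unlike 'echo 1 > /proc/sys/vm/drop_caches' which drops everything.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # write back dirty pages so the advice can take effect
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # len=0 means whole file
    finally:
        os.close(fd)

# evict_from_cache("/var/lib/mysql/ib_logfile0")  # hypothetical path
```

Unlike drop_caches this doesn't need a shell as root on the sysctl, only read access to the file, and it leaves the InnoDB buffer pool and other cached files untouched.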

 

The results are great prior to removing the transaction log from cache. On a server with 8 10k RPM SAS disks I was able to get 2800 point-lookups and 500 point-updates per second. Using 16kb InnoDB pages the server sustained 2500 page reads/second from disk and 500 page writes/second to disk.

 

(Graph: point-lookup and point-update rates with the transaction log in cache)

 

The final graph uses a log scale for the y-axis to plot the rate of rows updated per second and the average latency of a log write in microseconds. The update rate drops when the log write latency spikes to more than 10ms per write; it was less than 10us prior to that.

 

(Graph: rows updated/second and average log write latency in microseconds, log-scale y-axis)
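The numbers are consistent with a simple bound: when commits serialize behind the log mutex, no commit completes faster than one log write, so 10ms writes cap throughput near 100 commits per second, and modest group commit brings that toward the observed update rate. A hedged back-of-envelope (the group-commit factor is an assumption for illustration, not a measured value):

```python
# Serialized synchronous log writes bound the commit rate: at most
# one write completes per write-latency interval, and each write can
# carry the commits of txns_per_write transactions (group commit).

def max_commits_per_sec(log_write_latency_sec, txns_per_write=1):
    """Upper bound on commits/sec when log writes are serialized."""
    return txns_per_write / log_write_latency_sec

print(max_commits_per_sec(10e-6))     # ~100000/s at 10us: the log is not the bottleneck
print(max_commits_per_sec(10e-3))     # ~100/s at 10ms with no group commit
print(max_commits_per_sec(10e-3, 2))  # ~200/s if two transactions share each write
```

The 10us case explains why the update rate was fine while the log stayed in cache: the bound is orders of magnitude above the 200-600 updates/second the workload sustained.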


