Last week I was faced with a strange problem while benchmarking MySQL 5.6:
some of my user sessions became "frozen" periodically, and finally my
whole test was not valid..
A more in-depth analysis showed that:
- sessions are "frozen" while executing a SELECT statement..
- sessions are not really frozen, but waiting on a read!..
- the read seems to be a long one, involving many I/O operations..
- everything suggests the query is executing a full table scan (200M rows, 40GB table) and ignoring indexes, so instead of 1ms I get a >5min response time..
But the problem is completely "random" - it's not happening all the
time, and not in all user sessions. And every time I execute EXPLAIN
for any given "wrong" query, it always gives me the right execution
plan.. So it seems that something goes wrong while the queries
are executed concurrently, and not when I analyze them
alone. Which means that I have to catch the execution state exactly at
the moment when the execution plan is wrong!..
How to do it?.. -
since MySQL 5.6 there is an Optimizer Tracing feature, which
helps a lot to understand the reasons behind the choices made by the optimizer.
There is a good short HOWTO
available on MySQL Forge. As for my stress code, I had to
add the following queries after CONNECT in my user sessions:
SET OPTIMIZER_TRACE="enabled=on,end_marker=on";
SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000;
And then, in case the last executed query in a user session took more than 60 sec, the user executes the following query to save the optimizer trace of that last query into a file:
SELECT TRACE INTO DUMPFILE '/tmp/trace.$(ID)-$(REF)' FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
As it's much simpler to read this trace from a file rather than from SELECT
output ;-))
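For reference, the same trace can also be checked directly from the session before dumping it to a file - the INFORMATION_SCHEMA table keeps one row per traced statement and reports whether the trace was truncated by the memory limit:

```sql
-- Inspect the last traced statement in the current session;
-- a non-zero MISSING_BYTES_BEYOND_MAX_MEM_SIZE means the trace was
-- truncated and OPTIMIZER_TRACE_MAX_MEM_SIZE should be increased.
SELECT QUERY, MISSING_BYTES_BEYOND_MAX_MEM_SIZE
  FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
```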
So, all my 32 concurrent users were now running
with Optimizer Trace enabled (currently the trace may be enabled only
from within the session itself; there is no global setting available, for
security reasons). And very quickly I was able to get trace files for my
problematic cases! What is great is that the trace contains the whole
SQL query in question, so it was then very easy to replay the same
SELECT query and obtain the optimizer trace when the query is executed
with the right plan. Then simply compare the two traces - easy, no? ;-)
Skipping all the details, the source of the problem is shown by the following lines:
right execution plan:
... "table_scan": { "rows": 197451683, "cost": 4.24e7 ...
wrong execution plan:
... "table_scan": { "rows": 1, "cost": 2.3 ...
So, for some reason the MySQL Optimizer is getting information from InnoDB that my table contains only one single row instead of 200M rows.. It seems I was very lucky until now to never meet such a problem, but the issue is real and may happen from time to time in InnoDB databases during table statistics updates. To avoid this kind of problem, MySQL 5.6 introduced InnoDB persistent statistics - once enabled, the statistics are updated only on the next execution of ANALYZE TABLE! So, if your table is changed frequently and widely, you have to plan a regular ANALYZE of all such tables (via cron or the MySQL Event Scheduler, depending on which is simpler for you).
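The Event Scheduler variant of such a periodic ANALYZE can be sketched like this (the table and event names here are hypothetical):

```sql
-- Re-collect statistics on a frequently changed table every hour;
-- requires the Event Scheduler to be running.
SET GLOBAL event_scheduler = ON;

CREATE EVENT refresh_mytab_stats
  ON SCHEDULE EVERY 1 HOUR
  DO ANALYZE TABLE mydb.mytab;
```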
And YES! By adding to my config file:
# innodb pers.stats
innodb_analyze_is_persistent=1
all my problems were gone! No more broken execution plans!..
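Note that depending on your 5.6 build, the naming of the persistent statistics variable may differ - a quick way to check which variables your server actually exposes:

```sql
-- List all server variables related to persistent statistics
SHOW VARIABLES LIKE 'innodb%persistent%';
```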
The very positive points from my observations were:
- less than 10% overall performance degradation once Optimizer Tracing was enabled in all(!) 32 concurrent sessions
- nearly no degradation at all when InnoDB persistent statistics were enabled
- and finally, I was able to quickly find the source of my problem! ;-))
But my "problematic" workload was more I/O-centric rather than CPU-bound (even if CPU was used near 100%)... So, what will the overhead be in the "worst" case, when the workload is CPU-bound and query response times are less than 1ms?.. ;-)
Let's have a look at the following graphs, representing 3 test cases:
- default configuration as before
- persistent statistics are enabled on InnoDB
- optimizer tracing is enabled within all user sessions
Read-Only Test
(graph: Read-Only TPS for the 3 test cases)

Observations:
- default: 35700 TPS
- with persistent stats: 35600 TPS
- with enabled optimizer tracing: 28500 TPS => 20% degradation..
Read+Write Test
(graph: Read+Write TPS for the 3 test cases)

Observations:
- default: 48870 TPS
- with persistent stats: 48860 TPS
- with enabled optimizer tracing: 41100 TPS => 16% degradation..
So, keeping in mind these degradation levels, I'm pretty ready today to
add to all my perf tools an option to enable Optimizer
Tracing on demand! - such a feature added to any application using
MySQL may save you days of debugging! Even 20% performance
degradation is nothing compared to the cost of a bad response time..
And if I'm tracing only one user session, the global performance
degradation will be less than 1% ;-) while a 20% difference on a web
application (for ex.) you may not even notice, as network latency
sometimes brings you even bigger surprises ;-))
And it seems that
Persistent Statistics in InnoDB are a must! It just needs a
solution for the most straightforward adoption by users..
BTW,
did you already try these features?.. You should ;-)
Any comments and feedbacks are very welcome!..
Rgds,
-Dimitri
Other resources to read:
- MySQL Optimizer tracing: Query Execution Plan descriptions beyond EXPLAIN
- MySQL Optimizer tracing: how to configure it