It is no secret that bugs related to multithreading (deadlocks, data races, starvation, and so on) have a big impact on an application's stability and are at the same time hard to find due to their nondeterministic nature. Any tool that makes finding such bugs easier, preferably before anybody is even aware of their existence, is very welcome.
Enter the Helgrind tool from the Valgrind dynamic analysis suite. Now, Valgrind does not need much introduction, especially to MySQL server developers. The Valgrind Memcheck tool, which many treat as synonymous with Valgrind itself, is fairly widely used in MySQL server development: there is support for it in the MySQL Test Framework and DBQP, and the server source is properly annotated to get the most out of Memcheck. Not everything is perfect, though: the manual is only partially correct about how to build the server for it (bug 61587). We at Percona regularly use Memcheck for our development and for upstream testing, with useful results (bug 61986, bug 61985, …).
Helgrind does not enjoy the same level of support, and I think that is a shame. What I have found useful in daily work is the ability to run a selected MTR testcase under Helgrind. (For the time being, running the whole MySQL testsuite is not very practical due to the roughly 100x slowdown imposed by Helgrind.) For this purpose, I have patched mysql-test-run.pl to accept a new option, --helgrind. The patch is a copy-paste-modify of the already existing Callgrind support and is not going to win any Cleanest Feature of the Year awards, but hey, this is MTR we are talking about. For a proper implementation for our purposes, Patrick has added preliminary Helgrind support to DBQP.
For 5.1:
For 5.5:
Now let's see what kind of goodies Helgrind finds in the MySQL server. Let's take a recent 5.1 bzr version (pre-5.1.61) and a single test, innodb_plugin.innodb_bug53674, chosen for no particular reason other than that it runs quickly.
==9090== Possible data race during read of size 8 at 0xff27758 by thread #4
==9090==    at 0x6B67C51: log_io_complete (sync0sync.ic:150)
==9090==    by 0x6B34515: fil_aio_wait (fil0fil.c:4512)
==9090==    by 0x6BC58E0: io_handler_thread (srv0start.c:474)
==9090==    by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==9090==    by 0x4E37EFB: start_thread (pthread_create.c:304)
==9090==    by 0x5A0789C: clone (clone.S:112)
==9090== This conflicts with a previous write of size 8 by thread #1
==9090==    at 0x6BC9FFF: mutex_spin_wait (sync0sync.c:441)
==9090==    by 0x6B751FC: mtr_commit (sync0sync.ic:221)
==9090==    by 0x6B39223: fsp_fill_free_list (fsp0fsp.c:1455)
==9090==    by 0x6B3DED8: fsp_header_init (fsp0fsp.c:1010)
==9090==    by 0x6BC72CE: innobase_start_or_create_for_mysql (srv0start.c:1514)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090== Address 0xff27758 is 216 bytes inside a block of size 664 alloc'd
==9090==    at 0x4C28FDF: malloc (vg_replace_malloc.c:236)
==9090==    by 0x6B70A62: mem_heap_create_block (mem0mem.c:333)
==9090==    by 0x6B66F3B: log_init (mem0mem.ic:443)
==9090==    by 0x6BC63B0: innobase_start_or_create_for_mysql (srv0start.c:1339)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090==    by 0x7A5166: plugin_init(int*, char**, int) (sql_plugin.cc:1275)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
The race here is between the reads and writes of mutex_t::waiters in mutex_get_waiters and mutex_set_waiters. Interestingly, there are comments next to them that say /* Here we assume that the [read|write] of a single word from memory is atomic */. While it is true that machine-native word stores and loads are technically atomic, i.e. you cannot observe any intermediate state, it is likely that if other CPU cores read this same variable, they will keep getting the old value for quite some time after the store. Now, of course, it would be extremely unfair to suggest that the InnoDB developers do not understand parallelism. They do. The stacktraces above are incomplete due to inlining, but it's not hard to find out that mutex_get_waiters is called from mutex_exit, and there we find a comment:
/* A problem: we assume that mutex_reset_lock word is a memory barrier, that is when we read the waiters field next, the read must be serialized in memory after the reset. A speculative processor might perform the read first, which could leave a waiting thread hanging indefinitely.
Our current solution call every second sync_arr_wake_threads_if_sema_free() to wake up possible hanging threads if they are missed in mutex_signal_object. */
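To make the pattern concrete, here is a minimal sketch of what such a plain-word waiters flag looks like. This is hypothetical illustration code, not the actual InnoDB source, and the toy_mutex_* names are made up:

/* Hypothetical sketch of a waiters flag accessed with plain word-sized
   loads and stores.  Each access is atomic in the sense that no torn
   value can be observed, but there is no memory barrier: the store may
   become visible to other cores late, and the CPU may reorder it with
   respect to neighbouring accesses such as the lock word reset. */

typedef struct {
    volatile unsigned long waiters;   /* 1 if some thread is waiting */
} toy_mutex_t;

static void
toy_mutex_set_waiters(toy_mutex_t *mutex, unsigned long n)
{
    mutex->waiters = n;     /* plain store: no ordering guarantees */
}

static unsigned long
toy_mutex_get_waiters(const toy_mutex_t *mutex)
{
    return mutex->waiters;  /* plain load: may return a stale value */
}

InnoDB copes with the resulting possibility of a lost wake-up by periodically calling sync_arr_wake_threads_if_sema_free(), exactly as the comment describes.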
So, this race is accounted for. Let’s go on.
==9090== Possible data race during write of size 8 at 0x6e29da8 by thread #15
==9090==    at 0x6BC3EFD: srv_monitor_thread (srv0srv.c:1994)
==9090==    by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==9090==    by 0x4E37EFB: start_thread (pthread_create.c:304)
==9090==    by 0x5A0789C: clone (clone.S:112)
==9090== This conflicts with a previous write of size 8 by thread #14
==9090==    at 0x6BC4594: srv_error_monitor_thread (srv0srv.c:1671)
==9090==    by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==9090==    by 0x4E37EFB: start_thread (pthread_create.c:304)
==9090==    by 0x5A0789C: clone (clone.S:112)
==9090== Location 0x6e29da8 is 0 bytes inside global var "srv_last_monitor_time"
==9090== declared at srv0srv.c:424
This is a race between the srv_monitor_thread and srv_error_monitor_thread InnoDB threads, both storing to srv_last_monitor_time. Given the way this variable is used, it's hard for me to think of any harm this could cause. Moving on.
==9090== Possible data race during write of size 8 at 0x7386718 by thread #1
==9090==    at 0x6B08EA5: buf_page_get_gen (buf0buf.c:1597)
==9090==    by 0x6BD7CCF: trx_sys_create_doublewrite_buf (trx0sys.c:265)
==9090==    by 0x6BC76AD: innobase_start_or_create_for_mysql (srv0start.c:1692)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090==    by 0x7A5166: plugin_init(int*, char**, int) (sql_plugin.cc:1275)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
==9090== This conflicts with a previous read of size 8 by thread #14
==9090==    at 0x6B0CAF2: buf_refresh_io_stats (buf0buf.c:3529)
==9090==    by 0x6BC45D1: srv_error_monitor_thread (srv0srv.c:1680)
==9090==    by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==9090==    by 0x4E37EFB: start_thread (pthread_create.c:304)
==9090==    by 0x5A0789C: clone (clone.S:112)
==9090== Address 0x7386718 is 184 bytes inside a block of size 632 alloc'd
==9090==    at 0x4C28FDF: malloc (vg_replace_malloc.c:236)
==9090==    by 0x6B70A62: mem_heap_create_block (mem0mem.c:333)
==9090==    by 0x6B07D3B: buf_pool_init (mem0mem.ic:443)
==9090==    by 0x6BC6379: innobase_start_or_create_for_mysql (srv0start.c:1310)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090==    by 0x7A5166: plugin_init(int*, char**, int) (sql_plugin.cc:1275)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
This one shows that accesses to the fields of buf_pool->stat are unprotected. These are buffer pool statistics counters: the number of pages read, written, evicted, and so on. The more interesting race is not the one shown, but rather the one between different threads bumping the counters at the same time. As a result, some of the stores might be lost, leaving the counter values slightly smaller than they should be.
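To see why this matters, here is a small self-contained sketch of two threads bumping an unprotected counter; the counter name is hypothetical, not the actual buf_pool->stat field, and a GCC __sync built-in is mentioned in a comment as one possible fix:

#include <pthread.h>
#include <stdio.h>

static unsigned long n_pages_read = 0;   /* hypothetical stat counter */

static void *reader_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        /* Unsynchronized read-modify-write: two threads can both load the
           same old value, add 1, and store it back, so one increment is
           lost.  An atomic increment would not lose updates, e.g.
           __sync_fetch_and_add(&n_pages_read, 1); */
        n_pages_read++;
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, reader_thread, NULL);
    pthread_create(&t2, NULL, reader_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Typically prints less than 2000000 because of lost increments. */
    printf("n_pages_read = %lu\n", n_pages_read);
    return 0;
}

Built with gcc -pthread, the final count usually falls short of the expected total, which is exactly the kind of slow drift described above.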
==9090== Possible data race during write of size 8 at 0xff27738 by thread #1
==9090==    at 0x6B66B85: log_write_low (log0log.c:330)
==9090==    by 0x6B75144: mtr_commit (mtr0mtr.c:153)
==9090==    by 0x6BD7F87: trx_sys_create_doublewrite_buf (trx0sys.c:396)
==9090==    by 0x6BC76AD: innobase_start_or_create_for_mysql (srv0start.c:1692)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090==    by 0x7A5166: plugin_init(int*, char**, int) (sql_plugin.cc:1275)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
==9090== This conflicts with a previous read of size 8 by thread #14
==9090==    at 0x6BC44E0: srv_error_monitor_thread (log0log.ic:407)
==9090==    by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==9090==    by 0x4E37EFB: start_thread (pthread_create.c:304)
==9090==    by 0x5A0789C: clone (clone.S:112)
==9090== Address 0xff27738 is 184 bytes inside a block of size 664 alloc'd
==9090==    at 0x4C28FDF: malloc (vg_replace_malloc.c:236)
==9090==    by 0x6B70A62: mem_heap_create_block (mem0mem.c:333)
==9090==    by 0x6B66F3B: log_init (mem0mem.ic:443)
==9090==    by 0x6BC63B0: innobase_start_or_create_for_mysql (srv0start.c:1339)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090==    by 0x7A5166: plugin_init(int*, char**, int) (sql_plugin.cc:1275)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
This one seems to suggest an unprotected access to the current LSN (log_sys->lsn), with a small wrinkle: the accesses are actually protected. log_write_low asserts that it holds the log system mutex, and log_get_lsn acquires it.
==9090== Possible data race during write of size 8 at 0x6e26200 by thread #1
==9090==    at 0x6B77778: os_file_write (os0file.c:2172)
==9090==    by 0x6B35B5E: fil_io (fil0fil.c:4432)
==9090==    by 0x6B67EDA: log_group_write_buf (log0log.c:1290)
==9090==    by 0x6B68496: log_write_up_to.part.12 (log0log.c:1472)
==9090==    by 0x6B0E0E9: buf_flush_write_block_low (buf0flu.c:999)
==9090==    by 0x6B0EEC1: buf_flush_batch (buf0flu.c:1231)
==9090==    by 0x6B69121: log_make_checkpoint_at (log0log.c:1640)
==9090==    by 0x6BD7F98: trx_sys_create_doublewrite_buf (trx0sys.c:399)
==9090==    by 0x6BC76AD: innobase_start_or_create_for_mysql (srv0start.c:1692)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090== This conflicts with a previous read of size 8 by thread #14
==9090==    at 0x6B79F6F: os_aio_refresh_stats (os0file.c:4408)
==9090==    by 0x6BC459F: srv_error_monitor_thread (srv0srv.c:1673)
==9090==    by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==9090==    by 0x4E37EFB: start_thread (pthread_create.c:304)
==9090==    by 0x5A0789C: clone (clone.S:112)
==9090== Location 0x6e26200 is 0 bytes inside global var "os_n_file_writes"
==9090== declared at os0file.c:190
Just like with buffer pool stats, the I/O stats might be slightly smaller than they should be.
==9090== Possible data race during write of size 8 at 0x5e06270 by thread #1
==9090==    at 0x6BC83EA: sync_array_reserve_cell (sync0arr.c:366)
==9090==    by 0x6BC917F: rw_lock_s_lock_spin (sync0rw.c:415)
==9090==    by 0x6B6905A: log_checkpoint (sync0rw.ic:419)
==9090==    by 0x6B69141: log_make_checkpoint_at (log0log.c:2059)
==9090==    by 0x6BD7F98: trx_sys_create_doublewrite_buf (trx0sys.c:399)
==9090==    by 0x6BC76AD: innobase_start_or_create_for_mysql (srv0start.c:1692)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090==    by 0x7A5166: plugin_init(int*, char**, int) (sql_plugin.cc:1275)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
==9090== This conflicts with a previous read of size 8 by thread #14
==9090==    at 0x6BC8B23: sync_array_print_long_waits (sync0arr.c:937)
==9090==    by 0x6BC4551: srv_error_monitor_thread (srv0srv.c:2297)
==9090==    by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==9090==    by 0x4E37EFB: start_thread (pthread_create.c:304)
==9090==    by 0x5A0789C: clone (clone.S:112)
==9090== Address 0x5e06270 is 0 bytes inside a block of size 800000 alloc'd
==9090==    at 0x4C28FDF: malloc (vg_replace_malloc.c:236)
==9090==    by 0x6BE309B: ut_malloc (ut0mem.c:106)
==9090==    by 0x6BC8074: sync_array_create (sync0arr.c:240)
==9090==    by 0x6BCA0F0: sync_init (sync0sync.c:1401)
==9090==    by 0x6BC2E04: srv_boot (srv0srv.c:1043)
==9090==    by 0x6BC60FE: innobase_start_or_create_for_mysql (srv0start.c:1220)
==9090==    by 0x6B48855: innobase_init(void*) (ha_innodb.cc:2284)
==9090==    by 0x712F17: ha_initialize_handlerton(st_plugin_int*) (handler.cc:435)
==9090==    by 0x7A212A: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1048)
==9090==    by 0x7A5166: plugin_init(int*, char**, int) (sql_plugin.cc:1275)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
The RW lock spin stats join the company of the buffer pool and I/O stats.
==9090== Thread #18: lock order "0xF14460 before 0xF141E0" violated
==9090==    at 0x4C2A1CB: pthread_mutex_lock (hg_intercepts.c:496)
==9090==    by 0x74A32E: show_status_array(THD*, char const*, st_mysql_show_var*, enum_var_type, system_status_var*, char const*, st_table*, bool, Item*) (sql_show.cc:2266)
==9090==    by 0x74A894: fill_variables(THD*, TABLE_LIST*, Item*) (sql_show.cc:5522)
==9090==    by 0x74E4DF: get_schema_tables_result(JOIN*, enum_schema_table_state) (sql_show.cc:6238)
==9090==    by 0x6A5E0C: JOIN::exec() (sql_select.cc:1863)
==9090==    by 0x6A7D72: mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:2553)
==9090==    by 0x6A87FC: handle_select(THD*, st_lex*, select_result*, unsigned long) (sql_select.cc:269)
==9090==    by 0x63A3E3: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:5179)
==9090==    by 0x6435EA: mysql_execute_command(THD*) (sql_parse.cc:2309)
==9090==    by 0x782C6D: sp_instr_stmt::exec_core(THD*, unsigned int*) (sp_head.cc:2970)
==9090==    by 0x788560: sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*) (sp_head.cc:2791)
==9090==    by 0x788973: sp_instr_stmt::execute(THD*, unsigned int*) (sp_head.cc:2913)
==9090== Required order was established by acquisition of lock at 0xF14460
==9090==    at 0x4C2A1CB: pthread_mutex_lock (hg_intercepts.c:496)
==9090==    by 0x61F605: THD::init() (sql_class.cc:841)
==9090==    by 0x6204D0: THD::THD() (sql_class.cc:707)
==9090==    by 0x7A503E: plugin_init(int*, char**, int) (sql_plugin.cc:1395)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
==9090== followed by a later acquisition of lock at 0xF141E0
==9090==    at 0x4C2AFD5: pthread_rwlock_rdlock (hg_intercepts.c:1447)
==9090==    by 0x7A2EDA: cleanup_variables(THD*, system_variables*) (sql_plugin.cc:2566)
==9090==    by 0x7A6667: plugin_thdvar_init(THD*) (sql_plugin.cc:2523)
==9090==    by 0x61F60D: THD::init() (sql_class.cc:842)
==9090==    by 0x6204D0: THD::THD() (sql_class.cc:707)
==9090==    by 0x7A503E: plugin_init(int*, char**, int) (sql_plugin.cc:1395)
==9090==    by 0x632436: init_server_components() (mysqld.cc:4035)
==9090==    by 0x561388: main (mysqld.cc:4504)
All the previous errors were data races; here we see a new kind of error: a lock order violation, and thus a potential deadlock. The locks in question are LOCK_global_system_variables and LOCK_system_variables_hash. A deadlock is possible when one connection issues SHOW VARIABLES or equivalent while another connection is just starting up (i.e. its THD is being initialized). I have reported it as bug 63203. Interestingly, Valeriy Kravchuk confirmed it by finding another (related) lock order violation, between LOCK_system_variables_hash and LOCK_plugin.
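For completeness, here is a minimal sketch of the error class itself, i.e. two threads taking two mutexes in opposite orders; the lock and function names are made up and do not correspond to the server's code:

#include <pthread.h>

/* Stand-ins for two server mutexes; the names are hypothetical. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Thread 1 (e.g. a starting connection): takes A, then B. */
static void *thread_one(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);   /* blocks if thread 2 already holds B */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* Thread 2 (e.g. SHOW VARIABLES): takes B, then A, the opposite order.
   If each thread grabs its first mutex at the same time, neither can
   proceed: a classic deadlock. */
static void *thread_two(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

Helgrind reports the inconsistent acquisition order even if the unlucky interleaving never actually happens during the run, which is what makes it so valuable for this class of bug.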
I have only shown and analyzed a small part of all the Helgrind-reported data races. The remaining ones either show the same issues, show similar issues (other InnoDB stat counters), show non-issues or my analysis errors (like the log_sys->lsn “race” above), or I simply haven't analyzed them yet (yes, there remains a lot of the last kind). What would I do about all this? If I had my way, I'd use atomic access primitives in InnoDB for the benign and minor cases too, if only to declare intent and bullet-proof the code against future changes that might make the issues not-so-minor. Additionally, I'd consider backporting and using the MySQL 5.5 atomic operation primitives with proper memory barriers, so that there would be no need for workarounds like the one around mutex_get_waiters/mutex_set_waiters above and in similar cases. And then Helgrind would be even more useful for daily development and automated testing, with a clean log by default.
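As a rough sketch of the direction, and assuming GCC's __sync built-ins (the actual my_atomic or InnoDB primitives would differ in spelling), the toy waiters flag from the earlier sketch could be rewritten so that the atomicity and ordering are explicit:

/* Sketch only, not the actual MySQL/InnoDB code: explicit full memory
   barriers around the word-sized accesses, so the value stored before a
   mutex release is guaranteed to be ordered with respect to the read
   that follows it in the other thread. */

typedef struct {
    volatile unsigned long waiters;
} toy_mutex_t;

static void
toy_mutex_set_waiters(toy_mutex_t *mutex, unsigned long n)
{
    mutex->waiters = n;
    __sync_synchronize();   /* full barrier: publish the store */
}

static unsigned long
toy_mutex_get_waiters(const toy_mutex_t *mutex)
{
    __sync_synchronize();   /* full barrier: the load cannot be speculated
                               ahead of preceding operations */
    return mutex->waiters;
}

The stat counters would similarly become __sync_fetch_and_add()-style increments. The point is not the exact primitive but stating the intent explicitly, so the code stays correct if the surrounding assumptions ever change.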