While working through my previous blog post, rows_examined UNION Handler% ... ?!?, I learned about some unexpected behavior of the MEMORY (HEAP) storage engine and how it interacts with Handler_read_rnd_next
. In particular, even a "deleted" row (one that never really made it into the table to begin with) still must be read over when scanning the table, thereby incrementing Handler_read_rnd_next
when one might not expect that to happen. I thought this would be an interesting opportunity to use gdb to really watch what the server was doing when this situation is encountered.
I'm far from being a gdb expert. In fact, I've probably only used it a dozen times or so. Nevertheless, I found it pretty straightforward to get things set up and to understand how to make my way around. (I've probably missed out on some shortcuts and some useful functionality here; hopefully I'm not leading people too far astray!) Fortunately, mysqld
binaries are "not stripped" these days. That means that debugging symbols are included in the binary, which gives gdb the ability to associate the activities of a running program with the lines in the source code used to build it. Verify that this is the case using file
, which should say "not stripped" at the end of its output:
$ file ./bin/mysqld ./bin/mysqld: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
Because we'll need (and want!) to look at the server source code while we do this, we need to download a copy of the source. It is crucial that the source code you have exactly matches the source code used to compile the version of the server you're using. I'm using MariaDB for this testing, so I need to get the version of MariaDB source used to build the version of MariaDB I'm using. I happen to be using MariaDB 5.2.4, so I grabbed mariadb-5.2.4.tar.gz from http://downloads.askmonty.org/MariaDB/5.2.4/.
It's also possible to use bzr to grab a copy of the source, if you're inclined to set that all up, or even to browse the source online on Launchpad. I know that I need revision #2894, because that's the highest revision number in the ChangeLog for MariaDB 5.2.4. You can also see the "Tag" of mariadb-5.2.4 on this revision in the Launchpad revision history.
Once you have obtained the source code that corresponds to the version of MySQL, MariaDB, or Percona server (SkySQL supports all of them!) that you're running, fire up the server, find its PID, and attach gdb. Make sure nobody else is using the server while you're doing this; I'm not taking any steps to narrow the scope of my gdb session to a particular thread. (On Ubuntu starting with 10.10, you either need to run gdb as root or you need to alter the setting of /proc/sys/kernel/yama/ptrace_scope
.)
$ gdb -p $( cat ./data/ub-dev.pid ) GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08 Copyright (C) 2011 Free Software Foundation, Inc. [...snip...] (gdb)
When you attach to a running process with gdb, it stops the process. You can then set breakpoints, print data structures and variables, et cetera. We have a mission, though, and that is to try to look at what happens when the MEMORY engine causes Handler_read_rnd_next
to be incremented. In order to find any of this, it's important to remember that the MEMORY was historically called HEAP, and it retains that name in the server source code. Let's see what we can find out about Handler_read_rnd_next
:
$ pwd /home/kolbe/dev/src/mariadb-5.2.4 $ grep -r Handler_read_rnd_next * | grep -v mysql-test sql/mysqld.cc: {"Handler_read_rnd_next", (char*) offsetof(STATUS_VAR, ha_read_rnd_next_count), SHOW_LONG_STATUS},
I eliminated results that match "mysql-test" because there are lots of tests that look at the value of Handler_read_rnd_next
and those are of no interest to us. We see that Handler_read_rnd_next
only shows up once in the entire source tree, and that it appears to act as a more human-friendly alias for a lower-level status counter called ha_read_rnd_next_count
.
The various storage engines are implemented by way of code that lies in their respective subdirectory of the storage
subdirectory of the source tree. We are interested only in the MEMORY engine here, which is called "heap" internally (and historically). Let's look for ha_read_rnd_next_count
, but confine our search to the storage/heap
directory. I'll use some options to grep
to get some context around any matches.
$ grep -n -A 10 -B 10 ha_read_rnd_next_count storage/heap/* storage/heap/ha_heap.cc-338- return error; storage/heap/ha_heap.cc-339-} storage/heap/ha_heap.cc-340- storage/heap/ha_heap.cc-341-int ha_heap::rnd_init(bool scan) storage/heap/ha_heap.cc-342-{ storage/heap/ha_heap.cc-343- return scan ? heap_scan_init(file) : 0; storage/heap/ha_heap.cc-344-} storage/heap/ha_heap.cc-345- storage/heap/ha_heap.cc-346-int ha_heap::rnd_next(uchar *buf) storage/heap/ha_heap.cc-347-{ storage/heap/ha_heap.cc:348: ha_statistic_increment(&SSV::ha_read_rnd_next_count); storage/heap/ha_heap.cc-349- int error=heap_scan(file, buf); storage/heap/ha_heap.cc-350- table->status=error ? STATUS_NOT_FOUND: 0; storage/heap/ha_heap.cc-351- return error; storage/heap/ha_heap.cc-352-} storage/heap/ha_heap.cc-353- storage/heap/ha_heap.cc-354-int ha_heap::rnd_pos(uchar * buf, uchar *pos) storage/heap/ha_heap.cc-355-{ storage/heap/ha_heap.cc-356- int error; storage/heap/ha_heap.cc-357- HEAP_PTR heap_position; storage/heap/ha_heap.cc-358- ha_statistic_increment(&SSV::ha_read_rnd_count);
We can see that the ha_heap::rnd_next
function is the only place the HEAP engine modifies the ha_read_rnd_next_count
. This function is pretty straightforward. Let's set a breakpoint right after heap_scan
is called, so that we can see what's inside the file
variable after the new row has been read.
(gdb) break ha_heap.cc:350 Breakpoint 1 at 0x7d0458: file ha_heap.cc, line 350.
We'll also cd
inside of gdb to the location of the matching file so that we can look at the source code while we're in the debugger.
(gdb) cd /home/kolbe/dev/src/mariadb-5.2.4/storage/heap Working directory /home/kolbe/dev/src/mariadb-5.2.4/storage/heap.
Alright, back on the MySQL side, we need some sample data to work with. You'll want to be using GNU screen or a separate terminal window or something in order to be able to easily look at both the MySQL client and your gdb session. Before we'll be able to execute any statements in MySQL, we have to tell gdb to let the server continue running.
(gdb) continue Continuing.
Now, some sample data and a SELECT to kick things off.
mariadb 5.2.4> create table h1 (id int) engine=memory; Query OK, 0 rows affected (0.03 sec) mariadb 5.2.4> insert into h1 values (1),(2),(3); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mariadb 5.2.4> select * from h1;
You'll notice that the SELECT statement doesn't return. That's because the server hit the breakpoint we set in gdb!
Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0;
Now, we can take a look around at some data structures to see what is going on. The ha_heap::rnd_next
function had called heap_scan(file, buf)
, so let's see what file
and buf
are.
(gdb) print buf $1 = <optimized out> (gdb) print file $2 = (HP_INFO *) 0x435c4a0
Well, buf
isn't something we'll be able to use, because the compiler decided it wasn't actually going to bring that into scope here. file
, on the other hand, is a pointer to a HP_INFO struct (see include/heap.h
). We can dereference and print it right at the gdb command-line:
(gdb) set print pretty on (gdb) print *file $4 = { s = 0x432a490, current_ptr = 0x438bd30 "\375\001", current_hash_ptr = 0x0, current_record = 0, next_block = 3, lastinx = -1, errkey = -1, mode = 2, opt_flag = 0, update = 50, lastkey = 0x435c768 "\321\b", recbuf = 0x435c768 "\321\b", last_find_flag = HA_READ_KEY_EXACT, parents = {0x0 <repeats 65 times>}, last_pos = 0x0, lastkey_len = 0, implicit_emptied = 0 '\000', lock = { owner = 0x43026d0, next = 0x0, prev = 0x432a5e0, lock = 0x432a590, cond = 0x0, status_param = 0x0, debug_print_param = 0x436ec00, type = TL_READ, priority = 0 }, open_list = { prev = 0x0, next = 0x4373550, data = 0x435c4a0 } }
A couple things look useful right away, current_record
and current_ptr
especially.
s
also looks interesting (and it is!), and we can print it to see what it is, just like we did with file
:
(gdb) print file->s $5 = (HP_SHARE *) 0x432a490 (gdb) print *file->s [...snip...]
It turns out file->s
is a pretty large data structure, so I'm not going to include it here. You can browse the definition of HP_SHARE in include/heap.h
. We won't be looking into it during this exercise.
We saw earlier that the return value of heap_scan
was assigned to error
:
(gdb) print error $9 = 0
You don't have to take my word for it, though; gdb will gladly print the source code that surrounds the current execution context:
(gdb) list 345 346 int ha_heap::rnd_next(uchar *buf) 347 { 348 ha_statistic_increment(&SSV::ha_read_rnd_next_count); 349 int error=heap_scan(file, buf); 350 table->status=error ? STATUS_NOT_FOUND: 0; 351 return error; 352 } 353 354 int ha_heap::rnd_pos(uchar * buf, uchar *pos)
We'll keep an eye on error
going forward. For now, though, let's continue execution of the program.
(gdb) c Continuing. Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0;
It stops again right away. There are several rows in the table, and this function is called for every additional row the server reads. Let's make sure error
is still 0
and let's see if anything has changed in file
since we last looked at it:
(gdb) print error $12 = 0 (gdb) print *file $11 = { s = 0x432a490, current_ptr = 0x438bd40 "\375\002", current_hash_ptr = 0x0, current_record = 1, next_block = 3, lastinx = -1, errkey = -1, mode = 2, opt_flag = 0, update = 50, lastkey = 0x435c768 "\321\b", recbuf = 0x435c768 "\321\b", last_find_flag = HA_READ_KEY_EXACT, parents = {0x0 <repeats 65 times>}, last_pos = 0x0, lastkey_len = 0, implicit_emptied = 0 '\000', lock = { owner = 0x43026d0, next = 0x0, prev = 0x432a5e0, lock = 0x432a590, cond = 0x0, status_param = 0x0, debug_print_param = 0x436ec00, type = TL_READ, priority = 0 }, open_list = { prev = 0x0, next = 0x4373550, data = 0x435c4a0 } }
Two things jump out, current_ptr
is now "\375\002"
(it was "\375\001"
before), and current_record
is 1
(it was 0
before). Let's keep going.
(gdb) c Continuing. Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0; (gdb) print error $13 = 0 (gdb) print *file $14 = { s = 0x432a490, current_ptr = 0x438bd50 "\375\003", current_hash_ptr = 0x0, current_record = 2, next_block = 3, lastinx = -1, errkey = -1, mode = 2, opt_flag = 0, update = 50, lastkey = 0x435c768 "\321\b", recbuf = 0x435c768 "\321\b", last_find_flag = HA_READ_KEY_EXACT, parents = {0x0 <repeats 65 times>}, last_pos = 0x0, lastkey_len = 0, implicit_emptied = 0 '\000', lock = { owner = 0x43026d0, next = 0x0, prev = 0x432a5e0, lock = 0x432a590, cond = 0x0, status_param = 0x0, debug_print_param = 0x436ec00, type = TL_READ, priority = 0 }, open_list = { prev = 0x0, next = 0x4373550, data = 0x435c4a0 } }
OK, same as before. There was no error, current_ptr is now "\375\003"
and current_record
is 2
.
(gdb) c Continuing. Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0; (gdb) print error $15 = 137
Hey, we have an error! What went wrong? Well, nothing. We got to the end of the table. Take a look at include/my_base.h
:
389-#define HA_ERR_OLD_FILE 132 /* old databasfile */ 390-#define HA_ERR_NO_ACTIVE_RECORD 133 /* No record read in update() */ 391-#define HA_ERR_RECORD_DELETED 134 /* A record is not there */ 392-#define HA_ERR_RECORD_FILE_FULL 135 /* No more room in file */ 393-#define HA_ERR_INDEX_FILE_FULL 136 /* No more room in file */ 394:#define HA_ERR_END_OF_FILE 137 /* end in next/prev/first/last */ 395-#define HA_ERR_UNSUPPORTED 138 /* unsupported extension used */ 396-#define HA_ERR_TO_BIG_ROW 139 /* Too big row */ 397-#define HA_WRONG_CREATE_OPTION 140 /* Wrong create option */ 398-#define HA_ERR_FOUND_DUPP_UNIQUE 141 /* Dupplicate unique on write */ 399-#define HA_ERR_UNKNOWN_CHARSET 142 /* Can't open charset */
The next time we let the program continue, we get our result set back on the client.
(gdb) c Continuing.
mariadb 5.2.4> select * from h1; +------+ | id | +------+ | 1 | | 2 | | 3 | +------+ 3 rows in set (10 min 29.13 sec)
Our breakpoint was hit 4 times, one for each row in the table and one final time for EOF. But I set out trying to figure out what would happen in the case of a "deleted" row in the MEMORY table. Let's test that out, now.
mariadb 5.2.4> delete from h1 where id=2;
Oops, that triggers our breakpoint! We can disable it like this:
Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0; (gdb) info breakpoints Num Type Disp Enb Address What 1 breakpoint keep y 0x00000000007d0458 in ha_heap::rnd_next(unsigned char*) at ha_heap.cc:350 breakpoint already hit 28 times (gdb) disable breakpoint 1 (gdb) c Continuing.
Back in the client, you should see that the row was deleted successfully:
mariadb 5.2.4> delete from h1 where id=2; Query OK, 1 row affected (48.27 sec)
We want to SELECT from the table again, but our breakpoint is disabled. Instead of disabling it, we could have just kept hitting continue
to keep gdb working through the file, or even done continue 4
to get all the way through it, since we know that we hit that breakpoint 4 times scanning through our 3-row table.
We've lost our gdb command line now, though, and we won't get it back until a breakpoint is hit or we manually interrupt the program we're debugging. (The normal way to do that is to hit Ctrl-C, but for whatever reason that doesn't seem to work most of the time when debugging mysqld.) Another technique is to send mysqld the TRAP signal (you may need to do it more than once!). Once we have the (gdb)
command prompt back, we'll re-enable our breakpoint.
$ kill -TRAP $( cat ./data/ub-dev.pid )
Program received signal SIGTRAP, Trace/breakpoint trap. [Switching to Thread 0x7f8e7a7c6740 (LWP 26113)] 0x00007f8e78ec4913 in select () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) enable breakpoint 1 (gdb) info breakpoints Num Type Disp Enb Address What 1 breakpoint keep y 0x00000000007d0458 in ha_heap::rnd_next(unsigned char*) at ha_heap.cc:350 breakpoint already hit 28 times (gdb) c Continuing.
Now, let's SELECT from the table again and look at some of the data that was interesting the first time through.
mariadb 5.2.4> flush status; Query OK, 0 rows affected (0.00 sec) mariadb 5.2.4> select * from h1;
Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0; (gdb) print error $16 = 0 (gdb) print file->current_ptr $17 = (uchar *) 0x438bd30 "\375\001" (gdb) print file->current_record $18 = 0 (gdb) c Continuing. Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0; (gdb) print error $19 = 134 (gdb) print file->current_ptr $20 = (uchar *) 0x438bd40 "" (gdb) print file->current_record $21 = 1
The first row came through as we expected, but the second resulted in an error. The snippet of include/ha_base.h
I included above shows that 134 corresponds to HA_ERR_RECORD_DELETED. There it is, our deleted record (id=2
), just where we would've expected to find it.
(gdb) c Continuing. Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0; (gdb) print error $22 = 0 (gdb) print file->current_ptr $23 = (uchar *) 0x438bd50 "\375\003" (gdb) print file->current_record $24 = 2 (gdb) c Continuing. Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350 350 table->status=error ? STATUS_NOT_FOUND: 0; (gdb) print error $25 = 137 (gdb) print file->current_record $27 = 3 (gdb) c Continuing.
We can see that there are still 4 calls to ha_heap::rnd_next
(each of which increments Handler_read_rnd_next
), even though only 2 rows are returned to the client.
mariadb 5.2.4> select * from h1; +------+ | id | +------+ | 1 | | 3 | +------+ 2 rows in set (4 min 4.66 sec) mariadb 5.2.4> show status like 'Handler_read_rnd_next%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_rnd_next | 4 | +-----------------------+-------+ 1 row in set (0.00 sec)
SHOW STATUS hits our breakpoint, too, which means SHOW STATUS is creating a temporary MEMORY table! Again, you'll either need to continue through the breakpoint or disable it.
So, there it is. We just used gdb to look at the activities of a running MariaDB server (though as I said, this will apply equally to MySQL and Percona servers) as it retrieved rows from a (non-indexed) MEMORY table. I was actually surprised how straightforward it was to do this. I look forward to spending more time tracing things like this to gain a better understanding of the server source, and I hope that this might encourage others to do the same!
PlanetMySQL Voting: Vote UP / Vote DOWN