Channel: Planet MySQL

Loading Tables with TokuDB 4.0


Often, the first step in evaluating and deploying a database is to load an existing dataset into it. TokuDB 4.0 uses multi-core parallelism to speed up loading (and the creation of new indexes). With the new loader, MySQL tables using TokuDB load 5x-8x faster in our tests than with the previous version of TokuDB.

Measuring Load Performance

We generated several different datasets to measure the performance of TokuDB when doing a LOAD DATA INFILE … command. To characterize performance, we vary

  • rows to load
  • keys per row
  • row length (including keys)

All generated keys, including the primary key, are random 8-byte values. The remaining data, which pads each row out to the specified length, is text.

Two files are produced as part of data generation.

  1. data file, containing '|'-separated fields
  2. sql file, containing the CREATE TABLE command corresponding to the generated data

For instance, if the number of keys is 3 and the row length is 256 bytes, the following SQL statement is produced:

     CREATE TABLE load_table (\
         val0 BIGINT UNSIGNED NOT NULL,\
         val1 BIGINT UNSIGNED NOT NULL,\
         val2 BIGINT UNSIGNED NOT NULL,\
         pad VARCHAR(232) NOT NULL,\
         PRIMARY KEY (val0),\
         KEY valkey1 (val1),\
         KEY valkey2 (val2)\
         ) ENGINE=tokudb

We can make the data generation program available if anyone is interested.
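
As an illustration only, and not the actual generation program, here is a minimal Python sketch of a generator that follows the description above. The function names `create_table_sql` and `generate_rows` are hypothetical, and the sketch assumes that each key counts as 8 bytes toward the row length and that the pad is plain lowercase text, which cannot contain the '|' separator.

```python
import random
import string


def create_table_sql(num_keys, row_len, table="load_table"):
    """Build a CREATE TABLE statement for num_keys 8-byte keys plus a text pad."""
    pad_len = row_len - 8 * num_keys  # e.g. 256 - 3 * 8 = 232
    if pad_len <= 0:
        raise ValueError("row_len must be larger than 8 bytes per key")
    cols = [f"val{i} BIGINT UNSIGNED NOT NULL" for i in range(num_keys)]
    cols.append(f"pad VARCHAR({pad_len}) NOT NULL")
    cols.append("PRIMARY KEY (val0)")
    # Every key after the first gets its own secondary index.
    cols += [f"KEY valkey{i} (val{i})" for i in range(1, num_keys)]
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(cols) + "\n) ENGINE=tokudb"


def generate_rows(num_rows, num_keys, row_len, seed=0):
    """Yield '|'-separated rows: random 64-bit keys followed by a text pad."""
    rng = random.Random(seed)
    pad_len = row_len - 8 * num_keys
    for _ in range(num_rows):
        # Keys are random and unsorted; duplicates are not checked for here.
        keys = [str(rng.getrandbits(64)) for _ in range(num_keys)]
        pad = "".join(rng.choices(string.ascii_lowercase, k=pad_len))
        yield "|".join(keys + [pad])


if __name__ == "__main__":
    print(create_table_sql(3, 256))
    for row in generate_rows(2, 3, 256):
        print(row[:60] + "...")
```

Writing the rows to a file one per line gives a data file in the format that the load step expects.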

Load Test

A simple shell script

  • creates the test table
  • performs a LOAD DATA INFILE <datafile> INTO TABLE load_table FIELDS TERMINATED BY '|'
  • returns execution time

For the experiments to be meaningful, we created datasets that do not fit in memory.
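
The three steps of the load script can be sketched as follows. This is a Python approximation, not the shell script we used. It assumes the `mysql` command-line client is on the PATH, that credentials come from an option file, and that a database named `test` exists. The helper names `load_statement`, `timed`, `run_sql` and `load_test` are hypothetical.

```python
import subprocess
import time


def load_statement(datafile, table="load_table"):
    """Return the LOAD DATA statement for a '|'-separated data file."""
    return (
        f"LOAD DATA INFILE '{datafile}' INTO TABLE {table} "
        "FIELDS TERMINATED BY '|'"
    )


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def run_sql(sql, database="test"):
    """Run one statement through the mysql client (assumed to be installed)."""
    subprocess.run(
        ["mysql", f"--database={database}", f"--execute={sql}"],
        check=True,
    )


def load_test(create_sql, datafile, database="test"):
    """Create the test table, then time only the load step."""
    run_sql(create_sql, database)
    _, seconds = timed(run_sql, load_statement(datafile), database)
    return seconds
```

Dividing the number of rows in the data file by the returned time gives the rows-per-second figure.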

Results

We ran our benchmark on an Amazon Web Services c1.xlarge node with 8 cores and 7 GB of memory. The test loads 100M rows (NOT pre-sorted). The data file was on a 2-disk RAID-0 and the MySQL database files were on a different 2-disk RAID-0.

TokuDB Version 3 (~single-threaded) vs. TokuDB Version 4 (multi-threaded)

     Keys   Row length (bytes)   v3 rows/s   v4 rows/s   Speedup
     1      64                   27K         142K        5.1
     4      64                   13K         82K         6.2
     1      256                  7K          54K         7.2
     4      256                  5K          43K         8.2

Other metrics

Several metrics can be used to measure performance:

  • rows per second : data insert rate
  • key-value pairs per second : indicates how fast the primary and secondary indexes are being created
  • MB/s : how much raw data is being added to the database

Metrics for TokuDB v4:

     Keys   Row length (bytes)   Rows/sec   KV-pairs/sec   MB/sec
     1      64                   142K       142K           9.1
     4      64                   82K        330K           5.3
     1      256                  54K        54K            13.9
     4      256                  43K        173K           11.1
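
The derived columns follow from the row rate. A short sketch of the arithmetic, assuming one key-value pair per index per row and 1 MB = 10^6 bytes (the function name `derived_metrics` is hypothetical):

```python
def derived_metrics(rows_per_sec, num_keys, row_len):
    """Return (KV-pairs/sec, MB/sec) for a given row insert rate."""
    kv_pairs_per_sec = rows_per_sec * num_keys  # one pair per index per row
    mb_per_sec = rows_per_sec * row_len / 1e6   # raw bytes loaded, in MB
    return kv_pairs_per_sec, mb_per_sec


# (keys, row length in bytes, rounded v4 rows/sec) taken from the table
V4_RESULTS = [(1, 64, 142_000), (4, 64, 82_000), (1, 256, 54_000), (4, 256, 43_000)]

for keys, row_len, rows in V4_RESULTS:
    kv, mb = derived_metrics(rows, keys, row_len)
    print(f"{keys} keys, {row_len} B rows: {kv / 1000:.0f}K KV-pairs/s, {mb:.1f} MB/s")
```

Because the inputs here are the rounded rates, the computed values can differ slightly in the last digit from the table, which presumably was built from unrounded measurements.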

These results show

  1. significant parallelization (we believe larger CPU core count machines will see even larger benefits)
  2. a significant jump in absolute load performance
  3. speed-ups that are not limited to tables with many keys: even the 1-key tables load 5-7x faster

We will report further results, especially speedups on larger CPU count machines, as they become available.

