Loading Air Traffic Control Data with TokuDB 4.1.1

TokuDB has a big advantage over B-trees when trickle loading data into existing tables. However, it is possible to preprocess the data when bulk loading into empty tables or when new indexes are created. TokuDB release 4 now uses a parallel algorithm to speed up these types of bulk insertions. How does the parallel loader performance compare with the serial loader? We use the Air Traffic Control (ATC) data and queries described in a Percona blog and also used in an experiment with TokuDB 2.1.0 to gain some insight.

Our ATC data is about 122M rows in size, is stored in a 40GiB CSV file, and can be found in our Amazon S3 public bucket. See the end of this blog for details. We use a table schema with 8 indices to speed up the ATC queries. The loader inserts the data into the primary fractal tree and one fractal tree for each of the 8 keys.

The load was run on two (old) machines:

2 core: Intel Core 2 Duo E8500 3.16GHz, 5GiB RAM, 1 disk SATA 1TB.
8 core: 2 socket quad core Xeon X5460 3.16GHz, 16GiB RAM, 6 disk SAS 1TB RAID0.

Load times:

Load Times for the ATC Database
	TokuDB 2.1.0 and MySQL 5.1.36	TokuDB 4.1.1 and MySQL 5.1.46	TokuDB Speedup
2 core	10,974s	5,201s	2.1x
8 core	11,286s	2,655s	4.2x

We use “LOAD DATA INFILE” to load the CSV file into a MySQL table. The CSV file can be found in our Amazon S3 public bucket.
We see over 4x load time speed up for this database. This compares favorably with the other loaders. The next steps are to run experiments on machines with a larger number of cores, identify bottlenecks, and remove them.
BTW, the ATC load times into MyISAM are roughly the same as seen with the TokuDB 2.1.0 serial loader. Perhaps loads into MyISAM can be made faster with some twiddling of the MySQL system variables.

TokuDB data sizes (including indices):

6.7 GiB, or almost 6 times smaller than the CSV source file. The parallel loader uses the same compression parameters as the serial loader, so no change in size occurred.

Query times:

Query Times for the ATC Database
	TokuDB 2.1.0	TokuDB 4.1.1
Q0	22s	28s
Q1	67s	33s
Q2	20s	24s
Q3	54s	29s
Q4	2s	1s
Q5	11s	7s
Q6	35s	29s
Q7	32s	39s
Q8.1y	7s	4s
Q8.2y	42s	32s
Q8.3y	37s	33s
Q8.4y	38s	35s
Q8.10y	139s	56s
Q9	36s	35s

TokuDB 4 uses the standard MySQL handler row iterator, so the query times are roughly unchanged. In addition, the query handling is not yet parallelized, so there is not much difference in query times on the 2 core and 8 core machines.
BTW, the query times for MyISAM are slightly longer than TokuDB.
The queries can be found in our Amazon S3 public bucket.

Future blogs:

Loader scalability on systems with larger number of cores. Machines with 48 cores (4 socket, 12 cores per socket) are now reasonably priced.
How we implemented the parallel loader: algorithms and parallel runtime.

The ATC CSV data files, the schema, and the queries can be retrieved from our public Amazon S3 bucket called
tokutek-pub. Here are the Amazon S3 keys:

atc_On_Time_Performance.mysql.csv.gz.aa
atc_On_Time_Performance.mysql.csv.gz.ab
atc_On_Time_Performance.mysql.csv.gz.ac
atc_On_Time_Performance.mysql.csv.gz.ad
atc_On_Time_Performance.mysql.csv.gz.xml
atc_ontime_create_covered.sql
atc_ontime_create_covered.sql.xml
atc_ontime_queries.tar.gz
atc_ontime_queries.tar.gz.xml
s3get
s3get.xml

PlanetMySQL Voting: Vote UP / Vote DOWN

Loading Air Traffic Control Data with TokuDB 4.1.1

Trending Articles

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

Who Is Sisanda Jonas? | Biography| Profile| History Of South African Media...

God of war 3 PPSSPP Download For Android 1.3 GB

The 10 Tennessee Cities With The Largest Black Population For 2021

Black Angus Grilled Artichokes

Breaking Down Bumpy’s Boys: NYC Black Mob Boss Of Old Surrounded Himself With...

Rajasthan Board 10th Result 2016 Roll No wise & Name Wise

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Sarangapur Mandal Sarpanch | Upa-Sarpanch | Ward member Mobile Numbers List...

Thomas Grundy – Bradwell

Practice Sheet of Right form of verbs for HSC Students

Suspected burglar to know fate in January

[MP3] Texzy Ft Dr. Ritzy –“Leg Over” (Prod. @DrRitzy & @KezzyKlef)

18A St. Fintan's Villas, Deansgrange, Co. Dublin - €365,000

Walkthrough Pokemon Victory Fire Complete | English Language

99 God Status for Whatsapp, Facebook

Attharintiki Daaredhi: Bappu Gari Bommo Lyrics Translation

Not much punishment for substantial benefit fraud

Cattivissimo.Me.3.2017.iTALiAN.MD.WEBDL.XviD-iSTANCE Seed (318)/Leech (148)