Splitting a MySQL Dump File Into Smaller Files Via Perl

I was trolling the MySQL forums web site and noticed a post regarding someone who was trying to load a 50-gigabyte MySQL dump file. The author of the post had stated that the loading of the file had been running for days, and was wondering if there was a better way to import the file. They did not have access to anything else (i.e. – original database) but this file.

I have had to restore several databases in the past from a single large MySQL dump file – which led me to start backing up each database individually. These databases are for my own personal use and are not updated that often, so I don’t need to have point-in-time recovery – and so a MySQL dump works just fine. If I had a production system, I would invest in the MySQL Enterprise Backup and the MySQL Enterprise Monitor applications.

But, I have had problems when I tried to restore a relatively small MySQL instance of 21 databases with a total file size of less than 900 megabytes. So, I wrote a Perl script to simply split the large MySQL dump file into smaller files, which I could then load one at a time to isolate the problem. I am sure that there are many other ways of troubleshooting this person’s problem, but I thought I would dig out this script and post it here. This script is also handy for splitting any text file that you have into smaller files. All you would have to do is to change the keywords that causes a new file to be created.

The script opens the MySQL dump file, and starts outputting each line to a new file, looking for the keywords “– Current Database: “. When the script sees those keywords, it closes the last file and opens a new one, and then prints that line with the keywords out to the new file until the keyword appears again. The script names each new file with a sequential number – as in database_01.sql, database_02.sql, etc. The first file that this script creates (named database_00.sql) contains the header information from the MySQL dump file, and this information is included in each of the subsequent database files (thanks to Nick Miles for pointing this out – see comments).

I tested this script on an 850-megabyte MySQL dump file, and it worked perfectly (and executed in less than one minute). To make sure that your new files contain the same amount of data as the original file, you can use the Unix command “wc” to show you the size of each file, and then you can compare the sum of these sizes to the size of your MySQL dump file. I first ran the script, and it produced the following 22 files:
sh-3.2# /usr/bin/perl split_file.pl 2011-09-13 21:24:23 - Creating file database_00.sql 2011-09-13 21:24:23 - Creating file database_01.sql 2011-09-13 21:24:23 - Creating file database_02.sql 2011-09-13 21:24:23 - Creating file database_03.sql 2011-09-13 21:24:23 - Creating file database_04.sql 2011-09-13 21:24:23 - Creating file database_05.sql 2011-09-13 21:24:23 - Creating file database_06.sql 2011-09-13 21:24:23 - Creating file database_07.sql 2011-09-13 21:24:23 - Creating file database_08.sql 2011-09-13 21:24:23 - Creating file database_09.sql 2011-09-13 21:24:23 - Creating file database_10.sql 2011-09-13 21:24:23 - Creating file database_11.sql 2011-09-13 21:24:23 - Creating file database_12.sql 2011-09-13 21:24:23 - Creating file database_13.sql 2011-09-13 21:24:23 - Creating file database_14.sql 2011-09-13 21:24:24 - Creating file database_15.sql 2011-09-13 21:24:24 - Creating file database_16.sql 2011-09-13 21:24:24 - Creating file database_17.sql 2011-09-13 21:24:24 - Creating file database_18.sql 2011-09-13 21:24:24 - Creating file database_19.sql 2011-09-13 21:24:24 - Creating file database_20.sql 2011-09-13 21:25:14 - Creating file database_21.sql
Here are the sizes of the individual database*.sql files, as well as the original source file named all_data.sql:
sh-3.2# wc database*sql sh-3.2# wc database*.sql 18 66 807 database_00.sql 51 175 1676 database_01.sql 413 18379 1135180 database_02.sql 596 101335 5281418 database_03.sql 571 14161 367113 database_04.sql 571 10116 264555 database_05.sql 26 87 943 database_06.sql 88 2849 38786 database_07.sql 565 11121 411554 database_08.sql 338 10954 427717 database_09.sql 92 403 7529 database_10.sql 300 12690 326553 database_11.sql 273 1929 43345 database_12.sql 62 218 2143 database_13.sql 75 82987 13049334 database_14.sql 755 74021 909296 database_15.sql 62 324 3775 database_16.sql 209 535852 8778365 database_17.sql 147 2391 37442 database_18.sql 26 87 946 database_19.sql 1243 28094397 826796485 database_20.sql 333 134098 5402798 database_21.sql 6814 29108640 863287760 total

sh-3.2# ls -l all_data.sql
-rw-rw-rw- 1 tonydarnell admin 863270813 Sep 13 20:23 all_data.sql

As you can see, the total size of the database*.sql files is 863287760, and the size of our original MySQL dump file (all_data.sql) is 863270813. But, we have to take into account that we added the header information to each file. The header file size is 807, and we have 21 files, for a total of 16,947. If we subtract 16,947 from the total 863287760, we get 863270813, which is the same size as our original file.

After splitting the main MySQL dump file, you can now restore/import each file individually, and if you have any problems, they will be easier to troubleshoot.

Here is the script:
#!/usr/bin/perl -w

# Need to explicitly load the functions in POSIX
use POSIX 'strftime';
use File::Copy;

# file name of the file you want to split
$filename = "all_data.sql";

# Note that if you pass no argument to localtime, it assumes the current time
$time = strftime '%Y-%m-%d %H:%M:%S', localtime;

$count = 0;
$filecount = 0;

# filename for original file

# syntax for the first file name - the header file - and it will be named with 00
# if you have more than 100 files, you could modify this statement to place a leading
# zero if the file is greater than 10 but less than 100
if ($filecount < 10)

{
$header_filename = "database_0" . $filecount . ".sql";
}

print "$time - Creating file $header_filename\n";

open(OUTPUT, ">$header_filename") || die "Can't redirect stdout";

open(LINE ,"$filename") || die ("\nERROR - could not open file: $filename\n");
while (<LINE>)

{

if ($_ =~ "^-- Current Database: " && $count > 1)

{

               if ($count eq 1)

               {
                    print "Closing header file.\n";
                    close (OUTPUT);
               }

# close the old file
close(OUTPUT);

$filecount++;

                    # syntax for new file names
                    if ($filecount < 10)

                    {
                         $new_filename = "database_0" . $filecount . ".sql";
                    }

                    else

                    {
                     $new_filename = "database_" . $filecount . ".sql";
                    }

                    $time = strftime '%Y-%m-%d %H:%M:%S', localtime;
                    print "$time - Creating file $new_filename\n";

# copy the header file to the next new file, so the header info is in the file
copy("$header_filename","$new_filename") or die "Copy failed: $!";

open(OUTPUT, ">>$new_filename") || die "Can't redirect stdout";
print OUTPUT $_;

}

else

{
# print information into the header file
print OUTPUT $_;
}

$count++;

# end while
}

close (LINE);
close (OUTPUT);

—————————————–

Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program.

Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.