Bulletproofing MySQL replication with checksums

Read the original article at Bulletproofing MySQL replication with checksums

Your MySQL replications running well? You might not even know if they aren’t. One of the scariest things about MySQL replication is that it can drift out of sync with the master “silently”. No errors, no warnings.

What and Why?

MySQL’s replication solution evolved as a statement based technology. Instead of sending actual block changes, MySQL just has to log committed transactions, and reapply those on the slave side. This affords a wonderful array of topologies and different uses, but has it’s drawbacks. The biggest occur when data does not get updated or changed in the same way on the slave. If you’re new to MySQL or coming from the Oracle world you might expect that this would flag an error. But there are many scenarios in which MySQL will not flag an error:

mixed transactional and non-transactional tables
use of non-deterministic functions such as uuid()
stored procedures and functions
update with LIMIT clause

There are others but suffice it to say if you want to rely on your slave being consistent, you need to check it!

The solution – mathematical checksums

If you’re a seasoned Linux user, you’re probably familiar with the md5sum command. It creates a checksum on a file. You can do so on different servers to compare a file in a mathematically exact way. In fact rsync uses this technique to efficiently determine what files or pieces of files need to be copied across a network. That’s what makes it so fast!

It turns out that MySQL can checksum tables too. However were we to build our own solution, we might have trouble doing so manually as table data is constantly in a state of flux.

Enter Percona’s pt-table-checksum tool formerly part of Maatkit. Run it periodically against your master schemas or the entire instance if you like. It will store checksums of all of your tables in a special checksum table. The data from this table then will propagate through replication to all of your connected slaves.

The tool then has a check mode, which allows you to verify all the connected slaves are ok, or report the differences if it finds any.

Step-by-step Setup

First you’ll need to grab a copy of the percona toolkit. Note that if you previously installed maatkit then you may want to delete those old scripts to avoid confusion. mk-table-checksum if you used maatkit, or pt-table-checksum if you have 1.0 versions. You likely installed using wget or perl Makefile, so you may need to go and remove those manually.

Assuming you’ve already got the percona repository installed issue:

$ yum install -y percona-toolkit

I’ve found some of the maatkit tools to be rather fussy about getting all the options right. The first thing to do which will help simplify this is to add a section in your local user’s “.my.cnf” file like this:

[client]

user=root

password=myrootpassword

That way the percona tools will look for this whenever it needs authentication credentials. Otherwise we assume localhost for this example, so you should verify you can connect with the mysql client as root from localhost.

Now let's checksum the "mysql" system schema.

$ pt-table-checksum --replicate=test.checksum --create-replicate-table --databases=mysql localhost

Note the --create-replicate-table option. You only need this option the first time. From there the test.checksum table will exist.

You should see some output that looks like this:

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE04-24T16:06:45 0 0 0 1 0 0.099 mysql.columns_priv
04-24T16:06:45 0 0 32 1 0 0.100 mysql.db
04-24T16:06:45 0 0 0 1 0 0.096 mysql.event
04-24T16:06:45 0 0 0 1 0 0.096 mysql.func
04-24T16:06:45 0 0 38 1 0 0.102 mysql.help_category
04-24T16:06:45 0 0 452 1 0 0.106 mysql.help_keyword
04-24T16:06:46 0 0 993 1 0 0.096 mysql.help_relation
04-24T16:06:46 0 0 506 1 0 0.100 mysql.help_topic
04-24T16:06:46 0 0 0 1 0 0.099 mysql.host
04-24T16:06:46 0 0 0 1 0 0.104 mysql.ndb_binlog_index
04-24T16:06:46 0 0 0 1 0 0.107 mysql.plugin
04-24T16:06:46 0 1 1 1 0 0.115 mysql.proc
04-24T16:06:46 0 0 0 1 0 0.186 mysql.procs_priv
04-24T16:06:46 0 1 1 1 0 0.097 mysql.proxies_priv
04-24T16:06:47 0 0 0 1 0 0.097 mysql.servers
04-24T16:06:47 0 0 0 1 0 0.096 mysql.tables_priv
04-24T16:06:47 0 0 0 1 0 0.098 mysql.time_zone
04-24T16:06:47 0 0 0 1 0 0.097 mysql.time_zone_leap_second
04-24T16:06:47 0 0 0 1 0 0.100 mysql.time_zone_name
04-24T16:06:47 0 0 0 1 0 0.100 mysql.time_zone_transition
04-24T16:06:47 0 0 0 1 0 0.095 mysql.time_zone_transition_type
04-24T16:06:47 0 1 38 1 0 0.100 mysql.user

How to check slaves

Once you've collected all those fancy checksums for your tables, nicely timestamped, you'll want to verify that your slaves are happily in sync. You can do that with the following command, also on the master:

$ pt-table-checksum --replicate=test.checksum --replicate-check-only --databases=mysql localhost

If there's no differences you'll see no output. If you have a difference it'll look something like this:

Differences on ip-10-15-27-19TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
mysql.user 1 1 1

In our case you can see we created some users on the slaves accidentally, hence the differences. It illustrates how easy it is for differences to creep into your environment and also how easy it now is to find them!

Special Cases

Since one of my clients uses Drupal, they've had trouble replicating the semaphore table. This tables is a MyISAM table, and unfortunately no one dares convert it to InnoDB. So from time to time some gunk builds up in there, and it fails on the slave. We could clean out the table, but we decided to just filter out this one table. Since Drupal doesn't use fully qualified schema.table names in it's code, only "use" we have found this to be safe.

However the percona toolkit explicitely checks for replication filters and will not run. It'll stop with an error as follows:

$ pt-table-checksum --replicate=test.checksum --databases=sean --ignore-tables=semaphore localhost04-24T15:59:29 Replication filters are set on these hosts:
ip-10.15.27.19
replicate_ignore_table = sean.semaphore
ip-10-15-27-72
replicate_ignore_table = sean.semaphore
ip-10-15-27-18
replicate_ignore_table = sean.semaphore

Please read the --check-replication-filters documentation to learn how to solve this problem. at /usr/bin/pt-table-checksum line 6166.

The solution is the --nocheck-replication-filters option. Keep in mind that this sanity check is there for a reason, so be sure to skip the relevant tables in your checksum building, and checksum checks.

To build checksums skipping the semaphore table use this command:

$ pt-table-checksum --replicate=test.checksum --ignore-tables=prod.semaphore --nocheck-replication-filters localhost

Now you can check your slaves but ignore the semaphore table:


 $ pt-table-checksum --replicate=test.checksum --replicate-check-only --ignore-tables=prod.semaphore --nocheck-replication-filters localhost

We also found a bug which preventing us from specifying multiple tabes on the ignore-tables line. So we used multiple invocations to do different schemas like this:


 $ pt-table-checksum --replicate=test.checksum --replicate-check-only --databases=prod1 --ignore-tables=prod1.semaphore --nocheck-replication-filters localhost$ pt-table-checksum --replicate=test.checksum --replicate-check-only --databases=prod2 --ignore-tables=prod2.semaphore --nocheck-replication-filters localhost

Crash Protection

If you've used MySQL replication for any length of time, you've probably seen a server crash. MySQL replication can have trouble restarting if you're using temporary tables, as they'll be missing upon restart. Also MySQL before 5.5 leaves syncing the info files to the operating system. So they may be incorrect after a crash.

1. Use MySQL 5.5 if possible

There are some new parameters in 5.5 that protect the info files from a crash. These are a great addition, and will make your slave databases more bulletproof.

sync_master_info = 1sync_relay_log = 1
sync_relay_log_info = 1

2. Don't use temporary tables

After a restart they'll simply be gone, so queries requiring or running against them will fail.

The Percona guys' new book High Performance MySQL third edition, suggests an alternate solution to using temporary tables. Use a special schema to hold your temp data, but create them as normal permanent tables. Be sure your application creates them with unique names, using the connection_id() for example. Lastly have a cleanup process drop tables periodically, based on closed connection_ids.

For more articles like these go to iHeavy, Inc +1-212-533-6828

PlanetMySQL Voting: Vote UP / Vote DOWN

Bulletproofing MySQL replication with checksums

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112