No, DRBD doesn’t magically make your application crash-safe

It is a common misconception that DRBD (or any other block-level data replication solution) can magically make an application crash-safe when it intrinsically isn’t. Baron highlights that misconception in a recent blog post.

I want to reiterate and stress that point here: if your application can’t reliably survive a node crash, it won’t successfully fail over on a replicated (or shared, for that matter) data device. But if it can, and DRBD is replicating synchronously, then DRBD won’t break it. In other words: try pulling the power plug on your machine while your app is running, then power it back on. If your application recovers to a consistent state, you’re in the clear. If it doesn’t, don’t bother adding DRBD until you fix that.
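To make "survives a node crash" a bit more concrete at the application level, here is a minimal sketch of one common pattern: write to a temporary file, flush it to disk, and only then rename it over the old one. This is an illustration, not something DRBD or Baron's post prescribes. The function name atomic_write is hypothetical, and the sketch assumes a POSIX system (such as Linux) where renaming within one directory is atomic and a directory can be fsynced.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes).

    After a crash, a reader should find either the complete old
    content or the complete new content, not a half-written file.
    Assumes POSIX rename semantics; `atomic_write` is a hypothetical
    helper name used only for this illustration.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory, so the final
    # rename does not cross file system boundaries.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the new content to stable storage before the rename.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: remove the leftover temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Flush the directory entry as well, so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that updates its state only through something like atomic_write(path, new_bytes) has a fair chance of passing the power plug test. One that overwrites files in place, without ever calling fsync, probably doesn't, with or without replication underneath.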

If you even want to start thinking about high availability, you must first fix every layer in your stack that isn’t crash-safe. ext2, which Baron mentions in his post, isn’t crash-safe. MySQL with a database using the MyISAM storage engine isn’t crash-safe. KVM with virtual block devices in cache=writeback mode isn’t crash-safe. Running on a RAID controller with the write cache enabled when its battery is dead isn’t crash-safe.

Thus, if you want high availability, use ext3. Or ext4. Or any journaling file system. Use InnoDB for MySQL. Use cache=none for KVM. And check those batteries. It’s that simple.
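As a rough starting point for the file system part of that checklist, the following sketch scans text in the format of Linux's /proc/mounts and flags mounts whose file system type has no journal. The function name non_journaling_mounts and the NON_JOURNALING set are hypothetical and deliberately incomplete: ext2 comes from the discussion above, while vfat and msdos are added here as further examples. A clean result doesn't prove the rest of your stack is crash-safe.

```python
# File system types without a journal. Hypothetical, incomplete list:
# ext2 is the example from the text; vfat and msdos are additions.
NON_JOURNALING = {"ext2", "vfat", "msdos"}


def non_journaling_mounts(mounts_text):
    """Return (device, mount_point, fs_type) for each flagged mount.

    `mounts_text` is expected to look like the content of /proc/mounts:
    one mount per line, with whitespace-separated fields starting with
    device, mount point, and file system type.
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if fs_type in NON_JOURNALING:
            flagged.append((device, mount_point, fs_type))
    return flagged
```

On a Linux machine you could feed it open("/proc/mounts").read() and print whatever comes back. The MySQL storage engine, the KVM cache mode, and the RAID battery still need their own checks.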