Partitioning, Free Lunches, & Indexing, Part 2

Review

In part one, I presented a very brief and particular view of partitioning. I covered what partitioning is, with hardly a mention of why one would use partitioning. In this post, I’ll talk about a few use cases often cited as justification for using partitions.

Lots of disks → Lots of partitioning of tables

One use case for justifying partitions is that each partition can be placed on a separate disk to avoid spindle contention. I have to say that on this one, I agree with Kevin Burton, who makes the point that if you want to distribute I/O load across several disks, you can use a RAID configuration on the disks. In this case, he says that partitioning is not worth the trouble. [NB. He makes the point that this is a problem with MySQL's implementation of partitions and its interaction with MySQL Cluster. He proposes some fixes. If those fixes ever come to pass, we'll have to see what the new performance characteristics look like.]

I spent some time trying to find comparisons between partitioning and RAIDing for load balancing in MySQL. If I missed something, I’d love to know about it, but the results aren’t jumping out.

In summary, if you’re going to claim that a partitioning scheme helps load balance across disks, it makes sense to compare this with a well-designed RAID system. Partitioning is high maintenance. RAID is low maintenance by comparison. I may be lazy, but I’d rather the system did the load balancing and not me.

Avoiding Table Scans

The most commonly claimed performance improvement for partitioning, from what I can tell, is their use in avoiding full table scans during queries. Even though the details vary from case to case, I’ve been able to distill one scenario that illustrates what’s going on most of the time:

A query on data causes a table scan.
Adding a secondary index doesn’t help, because the secondary index does not cover the query. The non-covering index would induce a large number of point queries into the primary table. This is slow enough that the query optimizer chooses a primary table scan instead. (Why not build the secondary index as covering? Probably because secondary indexes can be expensive to build in InnoDB. Perhaps the queries are date+user_id, then date+ip, then date+…, so you’d need a lot of secondary indexes; or even more directly, you know what secondary indexes you need, but InnoDB can’t keep up with the insertion load when you turn them all on. The first problem is solved by using TokuDB clustering indexes. The second is also solved by TokuDB, because it’s so fast at indexing.)
The table is partitioned by date. Now, when a query induces a table scan, one or a few partitions are scanned, instead of the whole table.

The table didn’t get any smaller. We were just able to restrict a table scan to a smaller portion of the table. For me, this helps place partitions into a context of well-known DB techniques. Partitions are often a replacement for covering indexes!

After all, that’s what covering indexes are, right? They are arrangements of data that allow you to scan part of a table to answer a query, without resorting to point queries into a primary table.

If partitions are indexes, are they good ones?

Once again, I looked for comparisons of proposed partitioning schemes with indexing-based choices. In particular, I looked for comparisons of partitioning with covering schemes. No luck. And as before, if I missed something I’d love to hear about it.

If partitions are designed to avoid table scans, then they are a high-effort way to do so. I’m not suggesting that they should never be used for this, but unless the low maintenance option of covering and clustering indexing is considered, partitioning just feels like a lot of work, potentially for no benefit.

Conclusion

The common reasons for using partitioning have low-maintenance, high-performance alternatives. In the case of load balancing across disks, I’d rather use RAID than partition by hand. In the case of avoiding table scans, I’d rather design indexes that avoid table scans than partition by hand. If I can’t get the insertion performance I need once I’ve designed a sensible set of indexes for my query load, I’d rather use a storage engine that does fast indexing (like TokuDB) so I can maintain the indexes I need, rather than the ones I’m forced to by the storage engine.

And in any case, it’s important to compare partitioning to the most effective alternative solutions for the performance problem encountered.

PlanetMySQL Voting: Vote UP / Vote DOWN

Partitioning, Free Lunches, & Indexing, Part 2

Review

Lots of disks → Lots of partitioning of tables

Avoiding Table Scans

If partitions are indexes, are they good ones?

Conclusion

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112