edit: added “Low cardinality isn’t always bad” section after Morgan’s comment
As we’ve seen already column size is important for indexes. Cardinality is really important too, it’s the uniqueness of the values included in the index.
Indexes are used by MySQL (and in any RDBMS) to help find rows quickly. We want to make it as easy as possible for MySQL to find the relevant rows, the more precise or specific we are the less the number of rows MySQL has to fetch.
Example Table
For this post I’ve created a table with some dummy data in it. Here’s the SQL I used to create it (in case you want to) which was generated with this hack script.
The table is fairly simple. Note the three indexes: PRIMARY plus two composite indexes on (sex, age)
and (age, sex)
CREATE TABLE `user` ( `id` mediumint(9) NOT NULL auto_increment, `name` varchar(20) default NULL, `age` tinyint(3) unsigned default NULL, `sex` enum('MALE','FEMALE') default NULL, PRIMARY KEY (`id`), KEY `age_sex` (`age`,`sex`), KEY `sex_age` (`sex`,`age`) ) ENGINE=MyISAM AUTO_INCREMENT=4917 DEFAULT CHARSET=utf8
If we run ANALYZE TABLE
on the table and then SHOW INDEXES
we can see the following:
mysql> SHOW INDEXES FROM user; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | user | 0 | PRIMARY | 1 | id | A | 4916 | NULL | NULL | | BTREE | | | user | 1 | age_sex | 1 | age | A | 100 | NULL | NULL | YES | BTREE | | | user | 1 | age_sex | 2 | sex | A | 196 | NULL | NULL | YES | BTREE | | | user | 1 | sex_age | 1 | sex | A | 2 | NULL | NULL | YES | BTREE | | | user | 1 | sex_age | 2 | age | A | 196 | NULL | NULL | YES | BTREE | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 5 rows in set (0.00 sec)
Assuming the male:female ratio in our dataset is 50:50 then a query with WHERE sex="MALE"
is only ever going to help us find 50% of the rows. MySQL will then have to sift through that 50% of rows. Let’s demonstrate that:
mysql> EXPLAIN SELECT id FROM user WHERE sex="MALE"; +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | 1 | SIMPLE | user | ref | sex_age | sex_age | 2 | const | 3335 | Using where | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ 1 row in set (0.00 sec)
If age
has a fairly even distribution across the age range you could use a query with WHERE age=31
and MySQL should be able to limit the relevant rows down to just ~1% of the dataset. This will mean that any further searching, either using the next part of a composite index or analysis or scanning the records will be much faster. Again EXPLAIN
proves this:
mysql> EXPLAIN SELECT id FROM user WHERE age=32; +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | 1 | SIMPLE | user | ref | age_sex | age_sex | 2 | const | 5 | Using where | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ 1 row in set (0.00 sec)
Cardinality and composite indexes
Keeping with our example dataset if we commonly query the table knowing the age and sex of the people. A typical query might look like:
SELECT id FROM people WHERE age=31 AND sex="MALE";
We’ve got the two composite indexes on the table (age, sex)
and (sex, age)
but which one will work best for this results. Based around everything learnt so far in this post (age, sex)
should do. First let’s prove that the MySQL Optimiser agrees:
mysql> EXPLAIN SELECT id FROM user WHERE age=32 AND sex="MALE"; +----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+ | 1 | SIMPLE | user | ref | age_sex,sex_age | age_sex | 4 | const,const | 2 | Using where | +----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+ 1 row in set (0.00 sec)
How do I know what cardinality my data has?
If you’re looking at indexes that are there already ANALYZE TABLE
followed by SHOW INDEXES FROM user
as above will do the trick. If the data isn’t there yet you can do a simple SELECT DISTINCT
like this:
mysql> SELECT COUNT(DISTINCT age) AS cardinality FROM user; +-------------+ | cardinality | +-------------+ | 100 | +-------------+ 1 row in set (0.00 sec)
Why isn’t MySQL using my index?
Just because you have the index there it doesn’t mean that MySQL will decide to use it. Have a look at the following similar queries:
mysql> EXPLAIN SELECT id FROM user WHERE age>80; +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | user | range | age_sex | age_sex | 2 | NULL | 764 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec)
mysql> EXPLAIN SELECT id FROM user WHERE age<80; +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | user | ALL | age_sex | NULL | NULL | NULL | 4916 | Using where | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ 1 row in set (0.00 sec)
The first query uses the index but the second one chooses to do a table scan. That’s no co-incidence, MySQL is smart enough to know about the distribution of the data and decides that it’d be faster to just munch through all of the data rather than the 80% that it could return from the index lookup. When our query is restrictive however with the >80
query it knows it’ll be better to use the index. Most of the time the optimizer gets it right!
There’s something very important to learn from this issue of distribution of data. Unless you’re checking query optimisation with real world queries and a real world dataset then you’re not optimising properly! In some cases it could pay to create some sample data (if you can predict what it’ll be like) with a simple script like the one I spun up for the user table we used as an example.
Low cardinality isn’t always bad
It could be that although the cardinality is low the selectivity is still high because of the distribution of data. Let’s change the characteristics of the sex
column in the table:
mysql> UPDATE user SET SEX="MALE"; Query OK, 2503 rows affected (0.58 sec) Rows matched: 4916 Changed: 2503 Warnings: 0 mysql> UPDATE user SET SEX="FEMALE" WHERE id<10; Query OK, 9 rows affected (0.00 sec) Rows matched: 9 Changed: 9 Warnings: 0 mysql> SELECT sex, COUNT(*) FROM user GROUP BY sex; +--------+----------+ | sex | COUNT(*) | +--------+----------+ | MALE | 4907 | | FEMALE | 9 | +--------+----------+ 2 rows in set (0.00 sec)
We’ve now got low cardinality (just two possible values) but if we’re looking for just "FEMALE"
rows then they should be easy to find with this index. Again EXPLAIN
agrees:
mysql> EXPLAIN SELECT id FROM user WHERE sex="MALE"; +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | user | ALL | sex_age | NULL | NULL | NULL | 3687 | Using where | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ 1 row in set (0.11 sec) mysql> EXPLAIN SELECT id FROM user WHERE sex="FEMALE"; +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | 1 | SIMPLE | user | ref | sex_age | sex_age | 2 | const | 9 | Using where | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ 1 row in set (0.00 sec)
Image may be NSFW.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.

PlanetMySQL Voting: Vote UP / Vote DOWN