Quantcast
Viewing all articles
Browse latest Browse all 18769

What makes a good MySQL index? Part 2: Cardinality

edit: added “Low cardinality isn’t always bad” section after Morgan’s comment

As we’ve seen already column size is important for indexes. Cardinality is really important too, it’s the uniqueness of the values included in the index.

Indexes are used by MySQL (and in any RDBMS) to help find rows quickly. We want to make it as easy as possible for MySQL to find the relevant rows, the more precise or specific we are the less the number of rows MySQL has to fetch.

Example Table

For this post I’ve created a table with some dummy data in it. Here’s the SQL I used to create it (in case you want to) which was generated with this hack script.

The table is fairly simple. Note the three indexes: PRIMARY plus two composite indexes on (sex, age) and (age, sex)

CREATE TABLE `user` (
  `id` mediumint(9) NOT NULL auto_increment,
  `name` varchar(20) default NULL,
  `age` tinyint(3) unsigned default NULL,
  `sex` enum('MALE','FEMALE') default NULL,
  PRIMARY KEY  (`id`),
  KEY `age_sex` (`age`,`sex`),
  KEY `sex_age` (`sex`,`age`)
) ENGINE=MyISAM AUTO_INCREMENT=4917 DEFAULT CHARSET=utf8

If we run ANALYZE TABLE on the table and then SHOW INDEXES we can see the following:

mysql> SHOW INDEXES FROM user;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| user  |          0 | PRIMARY  |            1 | id          | A         |        4916 |     NULL | NULL   |      | BTREE      |         |
| user  |          1 | age_sex  |            1 | age         | A         |         100 |     NULL | NULL   | YES  | BTREE      |         |
| user  |          1 | age_sex  |            2 | sex         | A         |         196 |     NULL | NULL   | YES  | BTREE      |         |
| user  |          1 | sex_age  |            1 | sex         | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |
| user  |          1 | sex_age  |            2 | age         | A         |         196 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.00 sec)

Assuming the male:female ratio in our dataset is 50:50 then a query with WHERE sex="MALE" is only ever going to help us find 50% of the rows. MySQL will then have to sift through that 50% of rows. Let’s demonstrate that:

mysql> EXPLAIN SELECT id FROM user WHERE sex="MALE";
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | user  | ref  | sex_age       | sex_age | 2       | const | 3335 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

If age has a fairly even distribution across the age range you could use a query with WHERE age=31 and MySQL should be able to limit the relevant rows down to just ~1% of the dataset. This will mean that any further searching, either using the next part of a composite index or analysis or scanning the records will be much faster. Again EXPLAIN proves this:

mysql> EXPLAIN SELECT id FROM user WHERE age=32;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | user  | ref  | age_sex       | age_sex | 2       | const |    5 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

Cardinality and composite indexes

Keeping with our example dataset if we commonly query the table knowing the age and sex of the people. A typical query might look like:

SELECT id FROM people WHERE age=31 AND sex="MALE";

We’ve got the two composite indexes on the table (age, sex) and (sex, age) but which one will work best for this results. Based around everything learnt so far in this post (age, sex) should do. First let’s prove that the MySQL Optimiser agrees:

mysql> EXPLAIN SELECT id FROM user WHERE age=32 AND sex="MALE";
+----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys   | key     | key_len | ref         | rows | Extra       |
+----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+
|  1 | SIMPLE      | user  | ref  | age_sex,sex_age | age_sex | 4       | const,const |    2 | Using where |
+----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+
1 row in set (0.00 sec)

How do I know what cardinality my data has?

If you’re looking at indexes that are there already ANALYZE TABLE followed by SHOW INDEXES FROM user as above will do the trick. If the data isn’t there yet you can do a simple SELECT DISTINCT like this:

mysql> SELECT COUNT(DISTINCT age) AS cardinality FROM user;
+-------------+
| cardinality |
+-------------+
|         100 |
+-------------+
1 row in set (0.00 sec)

Why isn’t MySQL using my index?

Just because you have the index there it doesn’t mean that MySQL will decide to use it. Have a look at the following similar queries:

mysql> EXPLAIN SELECT id FROM user WHERE age>80;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | range | age_sex       | age_sex | 2       | NULL |  764 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT id FROM user WHERE age<80;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | ALL  | age_sex       | NULL | NULL    | NULL | 4916 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

The first query uses the index but the second one chooses to do a table scan. That’s no co-incidence, MySQL is smart enough to know about the distribution of the data and decides that it’d be faster to just munch through all of the data rather than the 80% that it could return from the index lookup. When our query is restrictive however with the >80 query it knows it’ll be better to use the index. Most of the time the optimizer gets it right!

There’s something very important to learn from this issue of distribution of data. Unless you’re checking query optimisation with real world queries and a real world dataset then you’re not optimising properly! In some cases it could pay to create some sample data (if you can predict what it’ll be like) with a simple script like the one I spun up for the user table we used as an example.

Low cardinality isn’t always bad

It could be that although the cardinality is low the selectivity is still high because of the distribution of data. Let’s change the characteristics of the sex column in the table:

mysql> UPDATE user SET SEX="MALE";
Query OK, 2503 rows affected (0.58 sec)
Rows matched: 4916  Changed: 2503  Warnings: 0

mysql> UPDATE user SET SEX="FEMALE" WHERE id<10;
Query OK, 9 rows affected (0.00 sec)
Rows matched: 9  Changed: 9  Warnings: 0

mysql> SELECT sex, COUNT(*) FROM user GROUP BY sex;
+--------+----------+
| sex    | COUNT(*) |
+--------+----------+
| MALE   |     4907 |
| FEMALE |        9 |
+--------+----------+
2 rows in set (0.00 sec)

We’ve now got low cardinality (just two possible values) but if we’re looking for just "FEMALE" rows then they should be easy to find with this index. Again EXPLAIN agrees:

mysql> EXPLAIN SELECT id FROM user WHERE sex="MALE";
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | ALL  | sex_age       | NULL | NULL    | NULL | 3687 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.11 sec)

mysql> EXPLAIN SELECT id FROM user WHERE sex="FEMALE";
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | user  | ref  | sex_age       | sex_age | 2       | const |    9 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.

PlanetMySQL Voting: Vote UP / Vote DOWN

Viewing all articles
Browse latest Browse all 18769

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>