Quantcast
Viewing all articles
Browse latest Browse all 18770

Comparison of subquery optimizations in MySQL 5.6 and MariaDB 5.5

MySQL 5.6 is now RC, I suppose this means that all features that were intended to be in release are pushed, so it’s time to take a look and see what’s going to be in MySQL 5.6 GA.

I decided to look at subquery optimizations and compare them to what we’ve got in MariaDB 5.3/5.5. In case you don’t know, subquery optimizations in MySQL 5.6 and MariaDB come from a common ancestor - MySQL 6.0 alpha, which was released in spring 2009 and then abandoned because there were too many unstable features pushed into it.

Then, both MariaDB team and Oracle salvaged subquery code out of the 6.0 mess, and worked to get it in shape for release. MariaDB released its results in GA quality in April 2012 as MariaDB 5.3, which was quickly followed by MariaDB 5.5.

Inside MariaDB, we’ve considered 6.0-alpha’s feature set to be incomplete, and released only after we’ve added these three features:

We’ve even had a subquery optimizations map to make sure we cover all kinds of subqueries.

Now, I’m looking through MySQL 5.6.7 and 5.6.x changelogs, and the only improvement over original set of MySQL 6.0’s optimizations seems to be this part in 5.6.4 changelog:

The optimizer detects and optimizes away these useless query parts within IN/ALL/SOME/EXISTS subqueries:
  • DISTINCT
  • GROUP BY, if there is no HAVING clause and no aggregate functions
  • ORDER BY, which has no effect because LIMIT is not supported in these subqueries

This is a nice-to-have feature. The problem is that this feature does not cover the gaps in the set of MySQL 5.6’s subquery optimizations. To see what I’m talking about, let’s take the “Cost-based choice between Materialization and IN->EXISTS strategies” feature, and see why we absolutely had to have it in MariaDB 5.3.

Let’s consider a query with an uncorrelated subquery:
select col1, col1 in (select col2 from one_k_rows_tbl) from ten_rows_tbl

In MySQL 5.1, the only way to execute it was to use IN->EXISTS conversion. EXPLAIN in MySQL 5.1 looks like this:

+------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+
| id   | select_type        | table          | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+
|    1 | PRIMARY            | ten_rows_tbl   | ALL  | NULL          | NULL | NULL    | NULL |   10 |             |
|    2 | DEPENDENT SUBQUERY | one_k_rows_tbl | ALL  | NULL          | NULL | NULL    | NULL | 1344 | Using where |
+------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+

“DEPENDENT SUBQUERY” means that the subselect is re-executed for each record of the parent select. The original subquery is not correlated, but IN->EXISTS transformation converts the subquery from

col1 in (select col2 from one_k_rows_table)

into

exists (select 1 from one_k_rows_table where col2=ten_rows_table.col1 LIMIT 1)

The conversion makes the subquery correlated, but in return the subquery gets the “col2=col1″ IN-equality in the WHERE, which can make a big difference.

MySQL 6.0 has added Materialization strategy, and both MySQL 5.6 and MariaDB have got it from there. In MySQL 5.6, the EXPLAIN looks like this:

MySQL [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from ten_rows_tbl;
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+
|  1 | PRIMARY     | ten_rows_tbl   | ALL  | NULL          | NULL | NULL    | NULL |   10 | NULL  |
|  2 | SUBQUERY    | one_k_rows_tbl | ALL  | NULL          | NULL | NULL    | NULL | 1000 | NULL  |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+

It looks like uncorrelated subquery, but actually it is Materialization. MariaDB’s EXPLAIN output is very similar:

MariaDB [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from ten_rows_tbl;
+------+--------------+----------------+------+---------------+------+---------+------+------+-------+
| id   | select_type  | table          | type | possible_keys | key  | key_len | ref  | rows | Extra |
+------+--------------+----------------+------+---------------+------+---------+------+------+-------+
|    1 | PRIMARY      | ten_rows_tbl   | ALL  | NULL          | NULL | NULL    | NULL |   10 |       |
|    2 | MATERIALIZED | one_k_rows_tbl | ALL  | NULL          | NULL | NULL    | NULL | 1344 |       |
+------+--------------+----------------+------+---------------+------+---------+------+------+-------+

except that we have decided to print MATERIALIZED in select_type column to avoid possible confusion with regular uncorrelated subqueries.

Now, let’s see the gap. MySQL 6.0 had support for subquery Materialization, but it lacked ability to make a cost-based choice whether to use Materialization. It used it whenever possible. Let’s change our query so that the outer select expects to read only one row:

MariaDB [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from one_row_tbl;
+------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+
| id   | select_type        | table          | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+
|    1 | PRIMARY            | one_row_tbl    | ALL  | NULL          | NULL | NULL    | NULL |    1 |             |
|    2 | DEPENDENT SUBQUERY | one_k_rows_tbl | ALL  | NULL          | NULL | NULL    | NULL | 1344 | Using where |
+------+--------------------+----------------+------+---------------+------+---------+------+------+-------------+

As you can see, MariaDB has decided to use the good oldDEPENDENT SUBQUERY, that is, IN->EXISTS strategy. Why? It figured that the upper select expects to read only one row, which means the subquery is expected to be evaluated only once. With Materialization strategy, one needs to pay a big upfront cost (materialization) in order to have cheap subquery re-execution. This pays off when the subquery is executed many times. This doesn’t pay off when you need to execute the subquery once, or just a few times. MariaDB was able to recognize this and made a cost-based decision.

Now, let’s run this query on MySQL 5.6:

MySQL [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from one_row_tbl;
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+
|  1 | PRIMARY     | one_row_tbl    | ALL  | NULL          | NULL | NULL    | NULL |    1 | NULL  |
|  2 | SUBQUERY    | one_k_rows_tbl | ALL  | NULL          | NULL | NULL    | NULL | 1000 | NULL  |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+

It still uses Materialization. Neither changelogs, nor the source code have any mention of cost-based choice, so I assume they haven’t made any improvements over MySQL 6.0 code. As I’ve mentioned, Materialization is a startup-heavy strategy, so one can expect MySQL 5.6 to show regressions for queries where the subquery is executed only a few times.


PlanetMySQL Voting: Vote UP / Vote DOWN

Viewing all articles
Browse latest Browse all 18770

Trending Articles