…with a bit of strategic thinking
They come back, every now and then. Subqueries are far from being perfect at MySQL and they can give you some serious headaches.
Skilled MySQL developers know it better. They avoid subqueries as much as they can. It is not that subqueries do not work, it is just that the optimizer sometimes is, well, “not that optimised”.
So you may stay away from subqueries with some good SQL review. But what happens when the subquery is automatically generated by a script or a tool? If you can change the statement, I’m afraid you need to find some serious workarounds that vary case by case.
Here is an example that I found few weeks ago when I visited one of our customers.
Our customer used Magento for its site. Magento used a couple of queries that I will report here as sales and sales_items, although they are not their real name:
CREATE TABLE `sales` (
`sales_id` int(11) NOT NULL,
`sales_entity` int(11) NOT NULL,
`sales_status` int(11) NOT NULL,
PRIMARY KEY (`sales_id`),
KEY `ix2` (`sales_entity`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sales_items` (
`sales_item_id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
`quantity` int(11) NOT NULL,
PRIMARY KEY (`sales_item_id`),
KEY `ix2` (`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Both queries had tens of thousands of rows, but in our example I will use only few rows:
mysql> select * from sales;
+----------+--------------+--------------+
| sales_id | sales_entity | sales_status |
+----------+--------------+--------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 2 | 2 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
| 7 | 4 | 1 |
| 8 | 4 | 2 |
| 9 | 5 | 1 |
| 10 | 5 | 2 |
+----------+--------------+--------------+
10 rows in set (0.00 sec)
mysql> select * from sales_items;
+---------------+------------+----------+
| sales_item_id | product_id | quantity |
+---------------+------------+----------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 2 | 1 |
| 4 | 2 | 1 |
| 5 | 3 | 1 |
| 6 | 3 | 1 |
+---------------+------------+----------+
6 rows in set (0.00 sec)
The application was running very slow. When we analysed the query log using SkySQL Enterprise Monitor, we found an UPDATE like this:
UPDATE sales SET sales_status = ?
WHERE sales_entity IN (
SELECT DISTINCT sales_item_id
FROM sales_items
WHERE product_id IN (?) )
The update was executed several times in a minute and it took several seconds to be executed. We already knew that the issue was in the subquery, but just to explain the situation to the customer we wanted to show an execution plan.
As you may know, you cannot use EXPLAIN with and UPDATE statement, therefore we use a SELECT that acts like the UPDATE. This SELECT did the trick:
SELECT * FROM sales
WHERE sales_entity IN ( SELECT DISTINCT sales_item_id
FROM sales_items
WHERE product_id IN (3) );
+----------+--------------+--------------+
| sales_id | sales_entity | sales_status |
+----------+--------------+--------------+
| 9 | 5 | 1 |
| 10 | 5 | 2 |
+----------+--------------+--------------+
2 rows in set (0.00 sec)
The execution plan of the query is:
EXPLAIN SELECT * FROM sales
WHERE sales_entity IN (
SELECT DISTINCT sales_item_id FROM sales_items
WHERE product_id IN (3) );
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+-------------+
| 1 | PRIMARY | sales | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 2 | DEPENDENT SUBQUERY | sales_items | unique_subquery | PRIMARY,ix2 | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)
As you can see, the query uses the index in the subquery to retrieve the rows from sales_items, but then it does not use the index on sales_entity in sales. The result is that the UPDATE takes several seconds to be executed, because mysqld performs a full scan of the sales table.
If you want to check the details of this issue, you can have a look at the infamous bug 18826 - http://bugs.mysql.com/bug.php?id=18826
So what should you do to fix the problem? If you cannot change the statement, I am afraid there is not much you can do. We were luck and the customer could change the statement. By changing the select with a LEFT JOIN that replaces the subquery, something like:
SELECT sales.* FROM sales LEFT JOIN sales_items
ON (sales.sales_entity = sales_items.sales_item_id )
WHERE sales_items.product_id in (3);
+----------+--------------+--------------+
| sales_id | sales_entity | sales_status |
+----------+--------------+--------------+
| 9 | 5 | 1 |
| 10 | 5 | 2 |
+----------+--------------+--------------+
2 rows in set (0.00 sec)
The SELECT gives the same result of the previous SELECT and the EXPLAIN command shows:
EXPLAIN SELECT sales.* FROM sales LEFT JOIN sales_items
ON (sales.sales_entity = sales_items.sales_item_id )
WHERE sales_items.product_id in (3);
+----+-------------+-------------+------+---------------+------+---------+-------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+-------+------+--------------------------------+
| 1 | SIMPLE | sales_items | ref | PRIMARY,ix2 | ix2 | 4 | const | 2 | Using where; Using index |
| 1 | SIMPLE | sales | ALL | ix2 | NULL | NULL | NULL | 10 | Using where; Using join buffer |
+----+-------------+-------------+------+---------------+------+---------+-------+------+--------------------------------+
2 rows in set (0.00 sec)
So ix2 can be used to filter the rows in the SELECT, and ultimately in the UPDATE.
In fact, the new UPDATE is:
UPDATE sales LEFT JOIN sales_items ON ( sales.sales_entity = sales_items.sales_item_id )
SET sales_status = ?
WHERE sales_items.product_id IN ( ? )
The server tales few milliseconds to execute the query, fixing also the headache of our developers better than a pain killer.
Some people may argue that the use of subqueries is not efficient and developers should learn some good SQL. Unfortunately, this is not the point. Products like Magento and others – I have primarily in mind BI and reporting tools – are supposed to be database agnostic and generate queries for many RDBMSs with minor customisation. Bugs like this give the end users the impression that MySQL is terribly slow and is simply not good enough for the job, so they should refer to expensive commercial RDBMSs – the ones that have subqueries working properly.
I really, really hope that the optimiser team at Monty Program one day will fix this bug. We will see even more adoption of MySQL in the Enterprise world, something that has been always underestimated. More projects in the “Enterprise world” mean more resources injected into the MySQL ecosystem, resources (funds, skilled developers and DBAs, expertise) that may be used to improve the product and the solutions around it.

PlanetMySQL Voting: Vote UP / Vote DOWN