
This is the second part of a two-article series. In the first part, we introduced the Common Table Expression (CTE), a new feature available in MySQL 8.0 as well as in Percona Server for MySQL 8.0. In this article, we’ll present the recursive Common Table Expression. SQL is generally poor at handling recursive structures, but MySQL now lets you write recursive queries. Before MySQL 8.0, recursion was possible only by creating stored routines.
What is a Recursive Common Table Expression?
A recursive CTE is one having a subquery that refers to its own name. It is particularly useful in the following cases:
- Generating series
- Traversing hierarchical or tree-structured data
Let’s see the main components of a recursive CTE. The following is the syntax to create it:
WITH RECURSIVE cte AS (
   initial_query    -- "seed" member
   UNION ALL
   recursive_query  -- recursive member that references the same CTE name
)
SELECT * FROM cte;  -- main query
First of all, the RECURSIVE keyword is mandatory, and the CTE has two mandatory components. The seed member is the initial query, the one that is executed at the first iteration. The recursive member is the query containing the reference to the same CTE name. This second component generates all the remaining rows that the main query reads.
The process stops when an iteration does not generate any rows. Keep this in mind, because a recursion that never stops generating rows runs through a huge number of iterations and can exhaust the memory.
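To make the iteration order concrete, here is a minimal sketch that tags each row with the iteration that produced it. The CTE name `countdown` and the `iteration` column are illustrative assumptions, not part of the original examples.

```sql
WITH RECURSIVE countdown (n, iteration) AS (
  SELECT 3, 0                    -- seed member: runs once and produces (3, 0)
  UNION ALL
  SELECT n - 1, iteration + 1    -- recursive member: reads the rows of the previous iteration
  FROM countdown
  WHERE n > 1                    -- when n reaches 1 no row qualifies, so the recursion stops
)
SELECT * FROM countdown;
-- Rows produced: (3, 0), (2, 1), (1, 2)
```

The fourth pass finds no row with `n > 1`, returns an empty set, and that empty iteration is what ends the process.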
It is important that the recursive member of a recursive CTE includes a condition to terminate the recursion. As a development technique, you can also force termination by limiting the recursion depth or the execution time:
- The cte_max_recursion_depth system variable enforces a limit on the number of recursion levels for CTEs. The server terminates the execution of any CTE that recurses more levels than the value of this variable. The default value is 1000.
- The max_execution_time system variable enforces an execution timeout for SELECT statements executed within the current session.
- The MAX_EXECUTION_TIME optimizer hint enforces a per-query execution timeout for the SELECT statement in which it appears.
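As a rough illustration of these three controls, the sketch below sets them per session and per query. The numeric values and the CTE name `seq` are arbitrary assumptions chosen for the example; the timeouts are expressed in milliseconds.

```sql
-- Session-wide limits (example values)
SET SESSION cte_max_recursion_depth = 10000;  -- allow deeper recursion than the default 1000
SET SESSION max_execution_time = 1000;        -- time out SELECT statements after 1000 ms

-- Per-query limits through optimizer hints
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 5000        -- needs more than 1000 recursion levels
)
SELECT /*+ SET_VAR(cte_max_recursion_depth = 10000) MAX_EXECUTION_TIME(1000) */
       COUNT(*) AS total_rows
FROM seq;
```

The hints apply only to the statement that contains them, which makes them a convenient choice when a single query needs a different limit from the rest of the session.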
Generate Series
Let’s now look at some simple uses of recursive CTEs to generate series.
One-Level Sequence
First, create a simple series of integer numbers from 1 to 10. This is a one-level sequence because each value N+1 is a function of only the previous value, N.
WITH RECURSIVE natural_sequence AS (
  SELECT 1 AS n                        -- seed member: our sequence starts from 1
  UNION ALL
  SELECT n + 1 FROM natural_sequence   -- recursive member: reference to itself
  WHERE n < 10                         -- stop condition
)
SELECT * FROM natural_sequence;        -- main query
+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
+------+

# let's see what happens if we omit the stop condition
mysql> WITH RECURSIVE natural_sequence AS (
         SELECT 1 AS n
         UNION ALL
         SELECT n + 1 FROM natural_sequence
       )
       SELECT * FROM natural_sequence;
ERROR 3636 (HY000): Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.
Another typical example is calculating the factorial.
mysql> WITH RECURSIVE factorial(n, fact) AS (
         SELECT 0, 1
         UNION ALL
         SELECT n + 1, fact * (n+1)
         FROM factorial
         WHERE n < 20
       )
       SELECT * from factorial;
+------+---------------------+
| n    | fact                |
+------+---------------------+
|    0 |                   1 |
|    1 |                   1 |
|    2 |                   2 |
|    3 |                   6 |
|    4 |                  24 |
|    5 |                 120 |
|    6 |                 720 |
|    7 |                5040 |
|    8 |               40320 |
|    9 |              362880 |
|   10 |             3628800 |
|   11 |            39916800 |
|   12 |           479001600 |
|   13 |          6227020800 |
|   14 |         87178291200 |
|   15 |       1307674368000 |
|   16 |      20922789888000 |
|   17 |     355687428096000 |
|   18 |    6402373705728000 |
|   19 |  121645100408832000 |
|   20 | 2432902008176640000 |
+------+---------------------+
Two-Level Sequence
In this case, we would like to create a two-level sequence where the N+2 value is a function of the two previous values N+1 and N.
The typical example here is the Fibonacci Series; each number is the sum of the two preceding ones, starting from 0 and 1. Let’s calculate the first 20 items of the Fibonacci series.
mysql> WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS (
         SELECT 1, 0, 1
         UNION ALL
         SELECT n + 1, next_fib_n, fib_n + next_fib_n
         FROM fibonacci
         WHERE n < 20
       )
       SELECT * FROM fibonacci;
+------+-------+------------+
| n    | fib_n | next_fib_n |
+------+-------+------------+
|    1 |     0 |          1 |
|    2 |     1 |          1 |
|    3 |     1 |          2 |
|    4 |     2 |          3 |
|    5 |     3 |          5 |
|    6 |     5 |          8 |
|    7 |     8 |         13 |
|    8 |    13 |         21 |
|    9 |    21 |         34 |
|   10 |    34 |         55 |
|   11 |    55 |         89 |
|   12 |    89 |        144 |
|   13 |   144 |        233 |
|   14 |   233 |        377 |
|   15 |   377 |        610 |
|   16 |   610 |        987 |
|   17 |   987 |       1597 |
|   18 |  1597 |       2584 |
|   19 |  2584 |       4181 |
|   20 |  4181 |       6765 |
+------+-------+------------+
Date Sequence
Let’s consider a simple table containing our shop’s sales, such as the following:
CREATE TABLE sales (
  id INT AUTO_INCREMENT PRIMARY KEY,
  order_date DATE,
  product VARCHAR(20),
  price DECIMAL(10,2)
);

# populate the table
INSERT INTO sales(order_date, product, price) VALUES
  ('2020-02-01','DVD PLAYER',100.50),
  ('2020-02-01','TV',399.99),
  ('2020-02-02','LAPTOP',1249.00),
  ('2020-02-04','DISHWASHER',500.00),
  ('2020-02-04','TV',699.00),
  ('2020-02-06','LAPTOP',990.50),
  ('2020-02-06','HAIRDRYER',29.90),
  ('2020-02-06','GAME CONSOLE',299.00),
  ('2020-02-07','BOOK',9.00),
  ('2020-02-07','REFRIGERATOR',600.00);

# let's run a query to generate the sales report by day
SELECT order_date, SUM(price) AS sales
FROM sales
GROUP BY order_date;
+------------+---------+
| order_date | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+
Notice, however, that our sales report has missing dates: Feb 3rd and Feb 5th. We would like to generate a report including even the dates with no sales.
A recursive CTE can help.
WITH RECURSIVE dates(date) AS (
  SELECT '2020-02-01'
  UNION ALL
  SELECT date + INTERVAL 1 DAY
  FROM dates
  WHERE date < '2020-02-07'
)
SELECT dates.date, COALESCE(SUM(price), 0) sales
FROM dates
LEFT JOIN sales ON dates.date = sales.order_date
GROUP BY dates.date;
+------------+---------+
| date       | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-03 |    0.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-05 |    0.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+
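The report above hard-codes its first and last date. As a possible variation, the sketch below derives the range from the sales table itself. It assumes the same `sales` table, and it relies on the fact that aggregate functions are only forbidden in the recursive member, not in the seed member or in a subquery that does not reference the CTE.

```sql
WITH RECURSIVE dates (date) AS (
  SELECT MIN(order_date) FROM sales                   -- seed: first day with a sale
  UNION ALL
  SELECT date + INTERVAL 1 DAY
  FROM dates
  WHERE date < (SELECT MAX(order_date) FROM sales)    -- stop at the last day with a sale
)
SELECT dates.date, COALESCE(SUM(price), 0) AS sales
FROM dates
LEFT JOIN sales ON dates.date = sales.order_date
GROUP BY dates.date
ORDER BY dates.date;
```

Because the seed now returns a DATE value rather than a string literal, the generated column is a true DATE, and the report keeps working as new sales are added.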
Hierarchical Data Traversal
Let’s now look at some other use cases for recursive CTEs: a simple tree for an org chart, a more complex tree for a family genealogy, and a graph for train routes, as shown in the following picture.
A Simple Tree: Org Chart
# create the table
CREATE TABLE org_chart(
  id INT PRIMARY KEY,
  name VARCHAR(20),
  role VARCHAR(20),
  manager_id INT,
  FOREIGN KEY (manager_id) REFERENCES org_chart(id)
);

# insert the rows
INSERT INTO org_chart VALUES
  (1,'Matthew','CEO',NULL),
  (2,'Caroline','CFO',1),
  (3,'Tom','CTO',1),
  (4,'Sam','Treasurer',2),
  (5,'Ann','Controller',2),
  (6,'Anthony','Dev Director',3),
  (7,'Lousie','Sys Admin',3),
  (8,'Travis','Senior DBA',3),
  (9,'John','Developer',6),
  (10,'Jennifer','Developer',6),
  (11,'Maria','Junior DBA',8);

# let's see the table; the CEO has no manager, so the manager_id is set to NULL
SELECT * FROM org_chart;
+----+----------+--------------+------------+
| id | name     | role         | manager_id |
+----+----------+--------------+------------+
|  1 | Matthew  | CEO          |       NULL |
|  2 | Caroline | CFO          |          1 |
|  3 | Tom      | CTO          |          1 |
|  4 | Sam      | Treasurer    |          2 |
|  5 | Ann      | Controller   |          2 |
|  6 | Anthony  | Dev Director |          3 |
|  7 | Lousie   | Sys Admin    |          3 |
|  8 | Travis   | Senior DBA   |          3 |
|  9 | John     | Developer    |          6 |
| 10 | Jennifer | Developer    |          6 |
| 11 | Maria    | Junior DBA   |          8 |
+----+----------+--------------+------------+
Let’s run some queries using recursive CTE to traverse this kind of hierarchy.
# find the reporting chain for all the employees
mysql> WITH RECURSIVE reporting_chain(id, name, path) AS (
         SELECT id, name, CAST(name AS CHAR(100))
         FROM org_chart
         WHERE manager_id IS NULL
         UNION ALL
         SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name)
         FROM reporting_chain rc
         JOIN org_chart oc ON rc.id=oc.manager_id
       )
       SELECT * FROM reporting_chain;
+------+----------+---------------------------------------+
| id   | name     | path                                  |
+------+----------+---------------------------------------+
|    1 | Matthew  | Matthew                               |
|    2 | Caroline | Matthew -> Caroline                   |
|    3 | Tom      | Matthew -> Tom                        |
|    4 | Sam      | Matthew -> Caroline -> Sam            |
|    5 | Ann      | Matthew -> Caroline -> Ann            |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |
|    8 | Travis   | Matthew -> Tom -> Travis              |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |
+------+----------+---------------------------------------+
Please note the usage of the CAST function on the “seed” member of the CTE. This was done on purpose. Let’s look at what happens if you don’t use the CAST function:
mysql> WITH RECURSIVE reporting_chain(id, name, path) AS (
         SELECT id, name, name
         FROM org_chart
         WHERE manager_id IS NULL
         UNION ALL
         SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name)
         FROM reporting_chain rc
         JOIN org_chart oc ON rc.id=oc.manager_id
       )
       SELECT * FROM reporting_chain;
ERROR 1406 (22001): Data too long for column 'path' at row 1
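Why an error? The query is, in theory, correct. The problem is that the type of the path column is determined from the non-recursive SELECT only, so its length is fixed by the seed’s name expression and is too short for the concatenated paths. The longer strings built by the recursive member do not fit, and in strict SQL mode MySQL raises an error instead of truncating them. Casting the seed value to CHAR(100) gives the column enough room.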
Let’s look at a query to traverse the tree and calculate the level of the employees in the Org Chart.
mysql> WITH RECURSIVE reporting_chain(id, name, path, level) AS (
         SELECT id, name, CAST(name AS CHAR(100)), 1
         FROM org_chart
         WHERE manager_id IS NULL
         UNION ALL
         SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name), rc.level+1
         FROM reporting_chain rc
         JOIN org_chart oc ON rc.id=oc.manager_id
       )
       SELECT * FROM reporting_chain ORDER BY level;
+------+----------+---------------------------------------+-------+
| id   | name     | path                                  | level |
+------+----------+---------------------------------------+-------+
|    1 | Matthew  | Matthew                               |     1 |
|    2 | Caroline | Matthew -> Caroline                   |     2 |
|    3 | Tom      | Matthew -> Tom                        |     2 |
|    4 | Sam      | Matthew -> Caroline -> Sam            |     3 |
|    5 | Ann      | Matthew -> Caroline -> Ann            |     3 |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |     3 |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |     3 |
|    8 | Travis   | Matthew -> Tom -> Travis              |     3 |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |     4 |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |     4 |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |     4 |
+------+----------+---------------------------------------+-------+
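The same traversal can start from any node rather than from the root. The following sketch lists everyone who reports, directly or indirectly, to one manager. It assumes the `org_chart` table queried above; the CTE name `team` and the `depth` column are illustrative.

```sql
WITH RECURSIVE team (id, name, role, depth) AS (
  SELECT id, name, role, 0
  FROM org_chart
  WHERE name = 'Tom'                          -- seed: the manager whose team we want
  UNION ALL
  SELECT oc.id, oc.name, oc.role, t.depth + 1
  FROM team t
  JOIN org_chart oc ON oc.manager_id = t.id   -- direct reports of the previous level
)
SELECT id, name, role, depth
FROM team
WHERE depth > 0                               -- leave out the manager himself
ORDER BY depth, id;
-- With the sample data: Anthony, Lousie, Travis (depth 1), then John, Jennifer, Maria (depth 2)
```

No explicit stop condition is needed here because the data is a tree: once the leaves are reached, the join finds no more rows and the recursion ends.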
A More Complex Tree: Genealogy
Let’s create a table to represent the following genealogy with grandparents, parents, and children.
CREATE TABLE genealogy(
  id INT PRIMARY KEY,
  name VARCHAR(20),
  father_id INT,
  mother_id INT,
  FOREIGN KEY(father_id) REFERENCES genealogy(id),
  FOREIGN KEY(mother_id) REFERENCES genealogy(id)
);

# populate the table
INSERT INTO genealogy VALUES
  (1,'Maria',NULL,NULL),
  (2,'Tom',NULL,NULL),
  (3,'Robert',NULL,NULL),
  (4,'Claire',NULL,NULL),
  (5,'John',2,1),
  (6,'Jennifer',2,1),
  (7,'Sam',3,4),
  (8,'James',7,6);

SELECT * FROM genealogy;
+----+----------+-----------+-----------+
| id | name     | father_id | mother_id |
+----+----------+-----------+-----------+
|  1 | Maria    |      NULL |      NULL |
|  2 | Tom      |      NULL |      NULL |
|  3 | Robert   |      NULL |      NULL |
|  4 | Claire   |      NULL |      NULL |
|  5 | John     |         2 |         1 |
|  6 | Jennifer |         2 |         1 |
|  7 | Sam      |         3 |         4 |
|  8 | James    |         7 |         6 |
+----+----------+-----------+-----------+
Let’s find all of James’s ancestors and the relationship:
mysql> WITH RECURSIVE ancestors AS (
         SELECT *, CAST('son' AS CHAR(20)) AS relationship, 0 level
         FROM genealogy
         WHERE name='James'
         UNION ALL
         SELECT g.*,
                CASE WHEN g.id=a.father_id AND level=0 THEN 'father'
                     WHEN g.id=a.mother_id AND level=0 THEN 'mother'
                     WHEN g.id=a.father_id AND level=1 THEN 'grandfather'
                     WHEN g.id=a.mother_id AND level=1 THEN 'grandmother'
                END,
                level+1
         FROM genealogy g, ancestors a
         WHERE g.id=a.father_id OR g.id=a.mother_id
       )
       SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    8 | James    |         7 |         6 | son          |     0 |
|    6 | Jennifer |         2 |         1 | mother       |     1 |
|    7 | Sam      |         3 |         4 | father       |     1 |
|    1 | Maria    |      NULL |      NULL | grandmother  |     2 |
|    2 | Tom      |      NULL |      NULL | grandfather  |     2 |
|    3 | Robert   |      NULL |      NULL | grandfather  |     2 |
|    4 | Claire   |      NULL |      NULL | grandmother  |     2 |
+------+----------+-----------+-----------+--------------+-------+
Using the same query but changing the initial condition we can find out the ancestors of anyone in the hierarchy, for example, Jennifer:
mysql> WITH RECURSIVE ancestors AS (
         SELECT *, CAST('daughter' AS CHAR(20)) AS relationship, 0 level
         FROM genealogy
         WHERE name='Jennifer'
         UNION ALL
         SELECT g.*,
                CASE WHEN g.id=a.father_id AND level=0 THEN 'father'
                     WHEN g.id=a.mother_id AND level=0 THEN 'mother'
                     WHEN g.id=a.father_id AND level=1 THEN 'grandfather'
                     WHEN g.id=a.mother_id AND level=1 THEN 'grandmother'
                END,
                level+1
         FROM genealogy g, ancestors a
         WHERE g.id=a.father_id OR g.id=a.mother_id
       )
       SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    6 | Jennifer |         2 |         1 | daughter     |     0 |
|    1 | Maria    |      NULL |      NULL | mother       |     1 |
|    2 | Tom      |      NULL |      NULL | father       |     1 |
+------+----------+-----------+-----------+--------------+-------+
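The traversal also works in the opposite direction. As a sketch, the query below walks down the tree to find the descendants of one person. It assumes the `genealogy` table above; the CTE name `descendants` and the `generation` column are illustrative.

```sql
WITH RECURSIVE descendants (id, name, generation) AS (
  SELECT id, name, 0
  FROM genealogy
  WHERE name = 'Tom'                          -- seed: the person we start from
  UNION ALL
  SELECT g.id, g.name, d.generation + 1
  FROM descendants d
  JOIN genealogy g
    ON g.father_id = d.id OR g.mother_id = d.id   -- children of the previous generation
)
SELECT id, name, generation
FROM descendants
WHERE generation > 0
ORDER BY generation, id;
-- With the sample data: John and Jennifer (generation 1), James (generation 2)
```

Note that with UNION ALL a person would be listed twice if both parents descended from the starting person; that does not happen in this small data set.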
A Graph: Train Routes
Let’s create a graph representing the train routes between the most important cities in Italy, as shown in the image below:
Note that some connections are uni-directional and others are bi-directional. Each connection also has a distance in km.
CREATE TABLE train_route(
  id INT PRIMARY KEY,
  origin VARCHAR(20),
  destination VARCHAR(20),
  distance INT
);

# populate the table
INSERT INTO train_route VALUES
  (1,'MILAN','TURIN',150),
  (2,'TURIN','MILAN',150),
  (3,'MILAN','VENICE',250),
  (4,'VENICE','MILAN',250),
  (5,'MILAN','GENOA',200),
  (6,'MILAN','ROME',600),
  (7,'ROME','MILAN',600),
  (8,'MILAN','FLORENCE',380),
  (9,'TURIN','GENOA',160),
  (10,'GENOA','TURIN',160),
  (11,'FLORENCE','VENICE',550),
  (12,'FLORENCE','ROME',220),
  (13,'ROME','FLORENCE',220),
  (14,'GENOA','ROME',500),
  (15,'ROME','NAPLES',210),
  (16,'NAPLES','VENICE',800);

SELECT * FROM train_route;
+----+----------+-------------+----------+
| id | origin   | destination | distance |
+----+----------+-------------+----------+
|  1 | MILAN    | TURIN       |      150 |
|  2 | TURIN    | MILAN       |      150 |
|  3 | MILAN    | VENICE      |      250 |
|  4 | VENICE   | MILAN       |      250 |
|  5 | MILAN    | GENOA       |      200 |
|  6 | MILAN    | ROME        |      600 |
|  7 | ROME     | MILAN       |      600 |
|  8 | MILAN    | FLORENCE    |      380 |
|  9 | TURIN    | GENOA       |      160 |
| 10 | GENOA    | TURIN       |      160 |
| 11 | FLORENCE | VENICE      |      550 |
| 12 | FLORENCE | ROME        |      220 |
| 13 | ROME     | FLORENCE    |      220 |
| 14 | GENOA    | ROME        |      500 |
| 15 | ROME     | NAPLES      |      210 |
| 16 | NAPLES   | VENICE      |      800 |
+----+----------+-------------+----------+
Let’s return all the train destinations reachable with Milan as the origin:
mysql> WITH RECURSIVE train_destination AS (
         SELECT origin AS dest
         FROM train_route
         WHERE origin='MILAN'
         UNION
         SELECT tr.destination
         FROM train_route tr
         JOIN train_destination td ON td.dest=tr.origin
       )
       SELECT * from train_destination;
+----------+
| dest     |
+----------+
| MILAN    |
| TURIN    |
| VENICE   |
| GENOA    |
| ROME     |
| FLORENCE |
| NAPLES   |
+----------+
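The query above terminates even though the graph contains cycles, because UNION (rather than UNION ALL) discards rows that were already produced. As soon as an extra column such as a hop counter is carried along, revisited cities no longer count as duplicates, so an explicit stop condition becomes necessary. The sketch below shows one way to do this; the names `reachable` and `hops` and the cap of 3 hops are assumptions made for the example.

```sql
WITH RECURSIVE reachable (dest, hops) AS (
  SELECT DISTINCT origin, 0
  FROM train_route
  WHERE origin = 'MILAN'
  UNION
  SELECT tr.destination, r.hops + 1
  FROM reachable r
  JOIN train_route tr ON tr.origin = r.dest
  WHERE r.hops < 3            -- explicit stop: (city, hops) pairs differ on every lap of a cycle
)
SELECT dest, MIN(hops) AS min_hops   -- aggregate in the main query, where it is allowed
FROM reachable
GROUP BY dest
ORDER BY min_hops, dest;
-- With the sample data: MILAN 0; FLORENCE, GENOA, ROME, TURIN, VENICE 1; NAPLES 2
```

Without the `r.hops < 3` condition, this variant would keep circling between connected cities until it hit cte_max_recursion_depth.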
Basically, starting from any city, you can go wherever you want in Italy, but there are different paths. So let’s run a query to find all the possible paths, and the total length of each, starting first from Milan and then from Naples.
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (
         SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0
         FROM train_route
         WHERE origin='MILAN'
         UNION
         SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination),
                train_route.destination,
                paths.tot_distance+train_route.distance
         FROM paths, train_route
         WHERE paths.cur_dest = train_route.origin
           AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') )
       )
       SELECT * FROM paths;
+-------------------------------------------------------+----------+--------------+
| cur_path                                              | cur_dest | tot_distance |
+-------------------------------------------------------+----------+--------------+
| MILAN                                                 | MILAN    |            0 |
| MILAN -> TURIN                                        | TURIN    |          150 |
| MILAN -> VENICE                                       | VENICE   |          250 |
| MILAN -> GENOA                                        | GENOA    |          200 |
| MILAN -> ROME                                         | ROME     |          600 |
| MILAN -> FLORENCE                                     | FLORENCE |          380 |
| MILAN -> TURIN -> GENOA                               | GENOA    |          310 |
| MILAN -> GENOA -> TURIN                               | TURIN    |          360 |
| MILAN -> GENOA -> ROME                                | ROME     |          700 |
| MILAN -> ROME -> FLORENCE                             | FLORENCE |          820 |
| MILAN -> ROME -> NAPLES                               | NAPLES   |          810 |
| MILAN -> FLORENCE -> VENICE                           | VENICE   |          930 |
| MILAN -> FLORENCE -> ROME                             | ROME     |          600 |
| MILAN -> TURIN -> GENOA -> ROME                       | ROME     |          810 |
| MILAN -> GENOA -> ROME -> FLORENCE                    | FLORENCE |          920 |
| MILAN -> GENOA -> ROME -> NAPLES                      | NAPLES   |          910 |
| MILAN -> ROME -> FLORENCE -> VENICE                   | VENICE   |         1370 |
| MILAN -> ROME -> NAPLES -> VENICE                     | VENICE   |         1610 |
| MILAN -> FLORENCE -> ROME -> NAPLES                   | NAPLES   |          810 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE           | FLORENCE |         1030 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES             | NAPLES   |         1020 |
| MILAN -> GENOA -> ROME -> FLORENCE -> VENICE          | VENICE   |         1470 |
| MILAN -> GENOA -> ROME -> NAPLES -> VENICE            | VENICE   |         1710 |
| MILAN -> FLORENCE -> ROME -> NAPLES -> VENICE         | VENICE   |         1610 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE -> VENICE | VENICE   |         1580 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES -> VENICE   | VENICE   |         1820 |
+-------------------------------------------------------+----------+--------------+

mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (
         SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0
         FROM train_route
         WHERE origin='NAPLES'
         UNION
         SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination),
                train_route.destination,
                paths.tot_distance+train_route.distance
         FROM paths, train_route
         WHERE paths.cur_dest = train_route.origin
           AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') )
       )
       SELECT * FROM paths;
+-----------------------------------------------------------------+----------+--------------+
| cur_path                                                        | cur_dest | tot_distance |
+-----------------------------------------------------------------+----------+--------------+
| NAPLES                                                          | NAPLES   |            0 |
| NAPLES -> VENICE                                                | VENICE   |          800 |
| NAPLES -> VENICE -> MILAN                                       | MILAN    |         1050 |
| NAPLES -> VENICE -> MILAN -> TURIN                              | TURIN    |         1200 |
| NAPLES -> VENICE -> MILAN -> GENOA                              | GENOA    |         1250 |
| NAPLES -> VENICE -> MILAN -> ROME                               | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> FLORENCE                           | FLORENCE |         1430 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA                     | GENOA    |         1360 |
| NAPLES -> VENICE -> MILAN -> GENOA -> TURIN                     | TURIN    |         1410 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME                      | ROME     |         1750 |
| NAPLES -> VENICE -> MILAN -> ROME -> FLORENCE                   | FLORENCE |         1870 |
| NAPLES -> VENICE -> MILAN -> FLORENCE -> ROME                   | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME             | ROME     |         1860 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME -> FLORENCE          | FLORENCE |         1970 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME -> FLORENCE | FLORENCE |         2080 |
+-----------------------------------------------------------------+----------+--------------+
It’s quite easy now to find out which is the shortest path from one origin to any final destination. You just need to filter and order the main query. Here are some examples:
# shortest path from MILAN to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (
         SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0
         FROM train_route
         WHERE origin='MILAN'
         UNION
         SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination),
                train_route.destination,
                paths.tot_distance+train_route.distance
         FROM paths, train_route
         WHERE paths.cur_dest = train_route.origin
           AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') )
       )
       SELECT * FROM paths
       WHERE cur_dest='NAPLES'
       ORDER BY tot_distance ASC LIMIT 1;
+-------------------------+----------+--------------+
| cur_path                | cur_dest | tot_distance |
+-------------------------+----------+--------------+
| MILAN -> ROME -> NAPLES | NAPLES   |          810 |
+-------------------------+----------+--------------+

# shortest path from VENICE to GENOA
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (
         SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0
         FROM train_route
         WHERE origin='VENICE'
         UNION
         SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination),
                train_route.destination,
                paths.tot_distance+train_route.distance
         FROM paths, train_route
         WHERE paths.cur_dest = train_route.origin
           AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') )
       )
       SELECT * FROM paths
       WHERE cur_dest='GENOA'
       ORDER BY tot_distance ASC LIMIT 1;
+--------------------------+----------+--------------+
| cur_path                 | cur_dest | tot_distance |
+--------------------------+----------+--------------+
| VENICE -> MILAN -> GENOA | GENOA    |          450 |
+--------------------------+----------+--------------+

# shortest path from VENICE to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (
         SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0
         FROM train_route
         WHERE origin='VENICE'
         UNION
         SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination),
                train_route.destination,
                paths.tot_distance+train_route.distance
         FROM paths, train_route
         WHERE paths.cur_dest = train_route.origin
           AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') )
       )
       SELECT * FROM paths
       WHERE cur_dest='NAPLES'
       ORDER BY tot_distance ASC LIMIT 1;
+-----------------------------------+----------+--------------+
| cur_path                          | cur_dest | tot_distance |
+-----------------------------------+----------+--------------+
| VENICE -> MILAN -> ROME -> NAPLES | NAPLES   |         1060 |
+-----------------------------------+----------+--------------+
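Instead of filtering on one destination at a time, the main query can also aggregate over the whole set of paths. The sketch below reuses the same `paths` CTE and reports the shortest distance from Milan to every reachable city; the alias `shortest_distance` is an illustrative name.

```sql
WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (
  SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0
  FROM train_route
  WHERE origin = 'MILAN'
  UNION
  SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination),
         train_route.destination,
         paths.tot_distance + train_route.distance
  FROM paths, train_route
  WHERE paths.cur_dest = train_route.origin
    AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path, ' -> ', ','))
)
SELECT cur_dest, MIN(tot_distance) AS shortest_distance   -- one row per destination
FROM paths
GROUP BY cur_dest
ORDER BY shortest_distance;
-- From the path list shown earlier: MILAN 0, TURIN 150, GENOA 200, VENICE 250,
-- FLORENCE 380, ROME 600, NAPLES 810
```

This approach enumerates every cycle-free path before aggregating, so it is practical for small graphs like this one but can grow quickly on larger networks.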
Limitations
Apart from the limits on execution time and on the number of iterations that we have already seen, there are other built-in limitations you should be aware of.
The recursive SELECT must not contain the following constructs:
- An aggregate function such as SUM()
- GROUP BY
- ORDER BY
- DISTINCT
- Window functions
These limitations do not apply to non-recursive CTEs. Also, the recursive SELECT part must reference the CTE only once and only in its FROM clause, not in any subquery.
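In practice, these restrictions usually mean moving the forbidden construct out of the recursive member and into the main query, where it is allowed. A minimal sketch, using an illustrative CTE named `seq`:

```sql
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1          -- the recursive member stays free of aggregates,
  FROM seq              -- GROUP BY, ORDER BY, DISTINCT, and window functions
  WHERE n < 10
)
SELECT COUNT(*) AS how_many,   -- aggregation happens in the main query instead
       SUM(n)   AS total
FROM seq;
-- how_many = 10, total = 55
```

The same pattern applies to ORDER BY, GROUP BY, and window functions: generate the rows recursively, then sort, group, or rank them in the main query.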
Conclusion
The recursive common table expression is an interesting new feature for implementing queries in your applications using MySQL 8.0. Recursion was already possible in the past by creating stored routines, but now it’s simpler. Furthermore, you don’t need any special or additional grants to create a recursive query.
Recursive CTEs are fairly simple to write, although they are a little more complicated than non-recursive CTEs because of the recursion itself. It’s not a matter of syntax; it’s a matter of “thinking recursively”.