Someone will probably tell me that this is an elementary-level topic, but I got some good questions regarding joins from my most recent ZCE class students, so I thought I'd put down the examples that I used to explain this to them. Being able to join with confidence is a key skill, because it means that you can refactor and normalise your data, without worrying about how hard something will be to retrieve.
The database structure I'm using here looks like this (the same example that I used when I wrote the Databases chapter for PHP Master):
If you want to grab the actual data I'm using for my examples, then here's the zipped mysqldump file: recipes-db.sql.
The database includes a "many-to-many" relationship; each recipe is linked to the ingredients needed for this particular dish, while each ingredient can of course appear in many dishes. To represent this, there is a recipes
table, an ingredients
table, and a recipe_ingredients
table to show the combinations which apply. To join twice and understand whether you're getting all the rows or just the ones with matches in all the tables can be confusing, so this post is all about showing you some examples.
Fetching Matching Records From All the Tables
Let's say we want to see which ingredients are in which recipe. We'll do this in a number of steps, which I will show below, along with the query and result that goes with that step.
Get all the recipes
mysql> SELECT id, name FROM recipes; +----+-------------------+ | id | name | +----+-------------------+ | 1 | Apple Crumble | | 4 | Beans Chili | | 5 | Chicken Casserole | | 2 | Fruit Salad | | 3 | Weekday Risotto | +----+-------------------+ 5 ROWS IN SET (0.00 sec)
OK, so we have some recipes, looking at the relationship diagram, we can see we need to link across to recipe_ingredients using the ID of each recipe.
All recipes, linked with recipe_ingredients
mysql> SELECT r.id AS recipe_id, r.name, ri.ingredient_id FROM recipes r JOIN recipe_ingredients ri ON (r.id = ri.recipe_id); +-----------+---------------+---------------+ | recipe_id | name | ingredient_id | +-----------+---------------+---------------+ | 1 | Apple Crumble | 1 | | 1 | Apple Crumble | 5 | | 1 | Apple Crumble | 7 | | 1 | Apple Crumble | 8 | | 2 | Fruit Salad | 6 | | 2 | Fruit Salad | 2 | | 2 | Fruit Salad | 1 | | 2 | Fruit Salad | 3 | | 2 | Fruit Salad | 4 | +-----------+---------------+---------------+ 9 ROWS IN SET (0.00 sec)
There are a couple of different things going on here. Firstly, I have aliased the table names to r
and ri
respectively. This is so that they are less in the way of us reading the important bits! Since there are multiple tables in the query now, it is important to "qualify" the table names - to say which table this column belongs to so that if there is more than one id
column, it's obvious which one you meant.
Secondly, there is now a join between two tables. The query doesn't specify what kind of join it is, but the default join type is INNER JOIN
. You might notice that not all of the recipes appear in this result set - only the ones where there are matching records in the recipes_ingredients
table will match this query. To get all of the records, we would use an outer join - more on that later.
All the recipes, their ingredients, and the actual display information for those ingredients
To get more than the ingredient_id
, we need to join on the ingredients
table to get the details. This is a very common type of join, we'll use it when we're linking to pretty much any kind of detail record by its ID. Once again, it's an inner join, and it looks like this:
mysql> SELECT r.id AS recipe_id, r.name, ri.ingredient_id, i.item FROM recipes r JOIN recipe_ingredients ri ON (r.id = ri.recipe_id) JOIN ingredients i ON (ri.ingredient_id = i.id); +-----------+---------------+---------------+--------------+ | recipe_id | name | ingredient_id | item | +-----------+---------------+---------------+--------------+ | 1 | Apple Crumble | 1 | apple | | 1 | Apple Crumble | 5 | flour | | 1 | Apple Crumble | 7 | butter | | 1 | Apple Crumble | 8 | sugar | | 2 | Fruit Salad | 6 | fruit juice | | 2 | Fruit Salad | 2 | banana | | 2 | Fruit Salad | 1 | apple | | 2 | Fruit Salad | 3 | kiwi fruit | | 2 | Fruit Salad | 4 | strawberries | +-----------+---------------+---------------+--------------+ 9 ROWS IN SET (0.00 sec)
And there you have it - a many-to-many relationship join across three tables. Now you have this result, you can start adding whatever "where" clauses you need to get exactly the information you're after. This dataset includes chef and category, so you could search by all pudding recipes for example (my favourite!)
All the Records, Even Those Without Matches (Outer Join)
How about those "lost" recipes, the ones without details of their ingredients? You might want to include them in your results even if the chef didn't include ingredient details yet. To do this, we'll use an outer join.
The outer join brings in all the records, plus any matching ones. So in our example, we'll see all recipes, plus any ingredients records that match. If there are no ingredients that match, then we'll still get those records, but with a NULL in the ingredients fields.
Outer joins come in two flavours: LEFT
and RIGHT
. The way that I think of this is that the first table is the one on the left; if you want all the records from that first table, plus any matches from the second, you'll choose LEFT
. If you want all the records from the second table, plus information from the first one if there is any, you'll choose RIGHT
. Take a look at the diagram at the top of this post, and you'll see that we have the recipes table on the left of the diagram, and first in our query, so we'll be using a LEFT
join here. (as a complete aside, I almost always have left joins, because I build up my queries that way around: getting the records I want, then the additional information to go with them, and so on. Maybe that's just the way my brain works?). Here's our recipes and ingredients again, but including the recipes without ingredients:
mysql> SELECT r.id AS recipe_id, r.name, ri.ingredient_id, i.item FROM recipes r LEFT JOIN recipe_ingredients ri ON (r.id = ri.recipe_id) LEFT JOIN ingredients i ON (ri.ingredient_id = i.id); +-----------+-------------------+---------------+--------------+ | recipe_id | name | ingredient_id | item | +-----------+-------------------+---------------+--------------+ | 1 | Apple Crumble | 1 | apple | | 1 | Apple Crumble | 5 | flour | | 1 | Apple Crumble | 7 | butter | | 1 | Apple Crumble | 8 | sugar | | 4 | Beans Chili | NULL | NULL | | 5 | Chicken Casserole | NULL | NULL | | 2 | Fruit Salad | 6 | fruit juice | | 2 | Fruit Salad | 2 | banana | | 2 | Fruit Salad | 1 | apple | | 2 | Fruit Salad | 3 | kiwi fruit | | 2 | Fruit Salad | 4 | strawberries | | 3 | Weekday Risotto | NULL | NULL | +-----------+-------------------+---------------+--------------+ 12 ROWS IN SET (0.00 sec)
Since we're working on a many-to-many relationship here, we're joining three tables in total. This means two joins: one between the first table and the linking table, and another from the result of that query to the other end of the relationship. In this example, this means we had two left joins. Each join operates between two data sets, which are the resultset you have so far, and the new table that is being joined. Since we still want to have records from the recipe/recipe_ingredients join even if there isn't a matching record in ingredients, then the second join here also needs to be an outer join.
Joining All Tables
My initial diagram showed five tables in all, so here's the query and result for pulling all that data together:
mysql> SELECT r.name AS recipe, c.name AS category, ch.name AS chef, i.item AS ingredient FROM recipes r LEFT JOIN recipe_ingredients ri ON (r.id = ri.recipe_id) LEFT JOIN ingredients i ON (ri.ingredient_id = i.id) INNER JOIN chefs ch ON (r.chef_id = ch.id) INNER JOIN categories c ON (r.category_id = c.id); +-------------------+----------+-------+--------------+ | recipe | category | chef | ingredient | +-------------------+----------+-------+--------------+ | Weekday Risotto | Main | Lorna | NULL | | Beans Chili | Main | Lorna | NULL | | Chicken Casserole | Main | Lorna | NULL | | Apple Crumble | Pudding | Lorna | apple | | Apple Crumble | Pudding | Lorna | flour | | Apple Crumble | Pudding | Lorna | butter | | Apple Crumble | Pudding | Lorna | sugar | | Fruit Salad | Pudding | Lorna | fruit juice | | Fruit Salad | Pudding | Lorna | banana | | Fruit Salad | Pudding | Lorna | apple | | Fruit Salad | Pudding | Lorna | kiwi fruit | | Fruit Salad | Pudding | Lorna | strawberries | +-------------------+----------+-------+--------------+ 12 ROWS IN SET (0.00 sec)
Once you have this dataset, you can filter it down any way you want to. Since the chef and category elements are required fields, I used inner joins for those; left joins would have given the same results though on this dataset.
If you were looking for examples of inner and outer joins across a real (if simple!) database with multiple joins, I hope this helped. All this talking about food is making me hungry, I'm off to the kitchen :)
Lorna is an independent web development consultant, writer and trainer, open source project lead and community evangelist. This post was originally published at LornaJane
PlanetMySQL Voting: Vote UP / Vote DOWN