Many of our users, developers and database administrators, keep asking our team about EverSQL's indexing recommendations algorithm. So, we decided to write about it.
The first option is to use EverSQL to automatically find indexes that are best for your database.
The second option is to read our detailed tutorial below and learn more about indexing best practices.
This tutorial won't detail all the internals of the algorithm, but rather try to lay down the basic and important aspects of indexing, in simple terms.
Also, and most importantly, we'll present practical examples for properly indexing your tables and queries by relying on a set of rules, rather than on guessing.
Our focus in this tutorial is on MySQL, MariaDB and Percona Server databases. This information may be relevant for other database vendors as well, but in some cases may not.
Which indexes should I create for an SQL query?
As a general rule of thumb, MySQL can only use one index for each table in the query. Therefore, there is no point in creating more than one index for each query. Preferably, same indexes should match as many of the queries as possible, as it will reduce the load on the database when inserting or updating data (which requires updating the indexes as well).
When creating an index, the most important parts are the equality conditions in the WHERE and JOIN conditions. In most cases, conditions such as name = 'John' will allow the database to filter most of the rows from the table and go through a small amount of rows to return the required results. Therefore, we should start indexing by adding these columns to the index.
Then, you should look into the range conditions, but you should only add one of them - the most selective one, as MySQL can't handle more of them. In some cases when there are no range conditions, it makes sense to add the GROUP BY / ORDER BY columns, assuming the ordering is done in only one direction (ASC / DESC).
In some cases, it also makes sense to create a separate index that contains the ORDER BY clause's columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause and they should all be specified with the same order (ASC / DESC). This doesn't guarantee that the database's optimizer will pick this index rather than the other compound indexes, but it's worth a try.
Also, in some cases, it makes sense to also add the columns from the SELECT clause to the index, to have a complete covering index. This only relevant if the index isn't already 'too large'. What's too large? Well, no official rule of thumb here, but let's say... 5-7 columns? Creating a covering index allows the database to not only filter using the index, but to also fetch the information required by the SELECT clause directly from the index, which saves precious I/O operations.
Let's look at an example to clarify:
SELECT id, first_name, last_name, age FROM employees WHERE first_name = 'John' AND last_name = 'Brack' AND age > 25 ORDER BY age ASC;
For this query, we'll start with adding the columns first_name and last_name, which are compared with an equality operator. Then, we'll add the age column which is compared with a range condition. No need to have the ORDER BY clause indexed here, as the age column is already in the index. Last but not least, we'll add id from the SELECT clause to the index to have a covering index.
So to index this query properly, you should add the index:
employees (first_name, last_name, age, id).
The above is a very simplified pseudo-algorithm that will allow you to build simple indexes for rather simple SQL queries.
If you're looking for a way to automate your index creation, while also adding the benefit of a proprietary indexing algorithm and query optimization recommendations, you can try out EverSQL Index Advisor which does all the heavy lifting for you.
What not to do when indexing (or writing SQL queries)?
We gathered some of the most common mistakes we see programmers and database administrators do when writing queries and indexing their tables.
Indexing each and every column in the table separately
In most cases, MySQL won't be able to use more than one index for each table in the query.
Therefore, when creating a separate index for each column in the table, the database is bound to perform only one of the search operations using an index, and the rest of them will be significantly slower, as the database can't use an index to execute them.
We recommend using compound indexes (explained later in this article) rather than single-column indexes.
The OR operator in filtering conditions
Consider this query:
SELECT a, b FROM tbl WHERE a = 3 OR b = 8;
In many cases, MySQL won't be able to use an index to apply an OR condition, and as a result, this query is not index-able.
Therefore, we recommend to avoid such OR conditions and consider splitting the query to two parts, combined with a UNION DISTINCT (or even better, UNION ALL, in case you know there won't be any duplicate results)
The order of columns in an index is important
Let's say I hand you my contacts phone book which is ordered by the contact's first name and ask you to count how many people are there named "John" in the book. You'll grab the book in both hands and say "no problem". You will navigate to the page that holds all names starting with John, and start counting from there.
Now, let's say I change the assignment and hand you a phone book that is ordered by the contact's last name, but ask you to still count all contacts with the first name "John". How would you approach that? Well, the database scratches his head in this situation as well.
Now lets look at an SQL query to demonstrate the same behavior with the MySQL optimizer:
SELECT first_name, last_name FROM contacts WHERE first_name = 'John';
Having the index contacts (first_name, last_name) is ideal here, because the index starts with our filtering condition and ends with another column in the SELECT clause.
But, having the reverse index contacts (last_name, first_name) is rather useless, as the database can't use the index for filtering, as the column we need is second in the index and not first.
The conclusion from this example is that the order of columns in an index is rather important.
Adding redundant indexes
Indexes are magnificent when trying to optimize your SQL queries and they can improve performance significantly.
But, they come with a downside as well. Each index you're creating should be kept updated and in sync when changes occur in your databases. So for each INSERT / UPDATE / DELETE in your databases, all relevant indexes should be updated. This update can take sometime, especially with large tables / indexes.
Therefore, do not create indexes unless you know you'll need them.
Also, we highly recommend to analyze your database once in a while, searching for any redundant indexes that can be removed.
How to automate index creation and SQL query optimization?
If you're looking for a way to automate your index creation, while also adding the benefit of a proprietary indexing algorithm and query optimization recommendations, you can try out EverSQL FREE Index Advisor which does all the heavy lifting for you.