Symmetric indices to make JOINs faster
I am frequently asked how to increase the performance of Rails, and here's a great starting point. This advice generalizes to just about any database or platform that relies on B-Tree indices. If your using MySQL out of the box, then this definitely applies.
Consider the following three models which are a very basic "has and belongs to many" setup:
So as you can see, a user can be in many groups and a group can have many users, all by way of the memberships join relationship.
The two example use cases I'm going to work with are:
Now let's repeat the previous queries.
This is where I see most people stop when it comes to performance optimization. Note though that these tables aren't the same! If you join from users to groups (the first query) you see a massive speedup. Only one row is consulted (instead of 100) and it's in the index. A further benefit we see in both queries is the "Using index" in the Extra column. This means that MySQL can determine the query result without ever checking the actual table because all required info is in the index (ie no extra disk hits). Unfortunately, joining from groups to users (second query) still (sorta) sucks. It says index instead of ALL, but that just means it will have to scan the entire index rather than scan the entire table on disk. This is a marginal improvement at best, so 50% of our use cases still suck.
Here's the explanation: B-Tree indices are unidimensional structures. That means that the interior nodes of the index tree are strongly ordered, and thus cannot be arbitrarily accessed out of order. If that doesn't make any sense, it means that joining from users to groups is not an equivalent operation to joining groups to users because of the ordering of the index elements.
So let's cleanup the second use case by adding an additional index, exactly like the first, except the order of the elements is reversed. Here's the SQL:
Now we have symmetric indices. Let's run our queries again with our second index in place:
Voila! Now it doesn't matter which way we join the tables because we have an index that is correctly ordered based on the directionality of the join. You can even see that the optimizer selects a different index (key) depending on which direction you join the tables, exactly as expected. Also, both queries now only require consulting the exact number of rows necessary and won't involve any further disk hits as both queries can be satisfied with data available entirely within the index.
- Given a user, what groups is he/she in?, and
- Given a group, who are the members?
ALTER TABLE memberships ADD INDEX test_index(user_id,group_id)
ALTER TABLE memberships ADD INDEX test_index1(group_id,user_id)