AxiomQ

Ruby on Rails gives us, out of the box, a few very useful query methods that can reduce number of database queries. In most cases that will significantly improve application performance.

This post will try to explain and compare some of the most commonly used methods like joins and includes, and to explain few different techniques how we can resolve N+1 query “problem”.

For demonstration purpose I created an application with two database tables teams and players, with one-to-many association between them.

class Player < ActiveRecord::Base
  belongs_to :team
end

class Team < ActiveRecord::Base
  has_many :players
end

Also, we will need some database records for teams and players.

red_star = Team.create(name: "Red Star")
partizan = Team.create(name: "Partizan")

Player.create(full_name: "Ognjen Dobrić", team: red_star)
Player.create(full_name: "Nemanja Dangubić", team: red_star)
Player.create(full_name: "Novica Veličković", team: partizan)
Player.create(full_name: "Vanja Marinković", team: partizan)

Our goal will be to fetch all the players together with the team name of each player, so let’s play a little bit in the Rails console. We will start with lazy loading technique which the ActiveRecord query interface uses by default.

Lazy loading

Philosophy of lazy loading design pattern is to delay initalization of an object until the time when it is really needed.

> Player.all.map { |player| [player.full_name, player.team.name] }

This query will result with this array,

[
  [ "Ognjen Dobrić",     "Red Star" ],
  [ "Nemanja Dangubić",  "Red Star" ],
  [ "Novica Veličković", "Partizan" ],
  [ "Vanja Marinković",  "Partizan" ]
]

and this logs.

Player Load (0.4ms)  SELECT "players".* FROM "players"
Team Load (0.2ms)  SELECT  "teams".* FROM "teams" WHERE "teams"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]
Team Load (0.1ms)  SELECT  "teams".* FROM "teams" WHERE "teams"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]
Team Load (0.1ms)  SELECT  "teams".* FROM "teams" WHERE "teams"."id" = ? LIMIT ?  [["id", 2], ["LIMIT", 1]]
Team Load (0.1ms)  SELECT  "teams".* FROM "teams" WHERE "teams"."id" = ? LIMIT ?  [["id", 2], ["LIMIT", 1]]

Our result is just what we wanted, but the number of queries in our logs indicates that we have N+1 situation, which may be a problem is case when we have a lot of records.

N+1 means that we have executed one query for the players and then one query for each player’s team (in our case N is 4)

Let’s eager load teams and reduce the number of queries.

Eager loading

Eager loading technique allow as to preload all “players” associated data (“teams” in our case).

> Player.
    includes(:team).
    map { |player| [player.full_name, player.team.name] }

Our result is still the same, but the number of queries is now only two, which is great.

Player Load (0.4ms)  SELECT "players".* FROM "players"
Team Load (0.8ms)  SELECT "teams".* FROM "teams" WHERE "teams"."id" IN (1, 2)

The number of queries can be reduced even more, to just one, and includes() method is smart enough to figure out when will be the good situation to run two queries and when a single query will be the most suitable option.

In short, includes() combines behaviours of two Rails methods, preload() when we need two separate queries and eager_load() in cases when it is best to run a single query.

Let’s examine the next example. Suppose that we want to fetch only the Red Star players.

> Player.
    includes(:team).
    where(teams: { name: "Red Star" }).
    map { |player| [player.full_name, player.team.name] }

Which results with the new array.

[
  [ "Ognjen Dobrić",     "Red Star" ],
  [ "Nemanja Dangubić",  "Red Star" ]
]

As we can see the includes() method allows as to filter players by the team name and our logs show that we triggered only one query to accomplish that.

SQL (2.4ms)  SELECT "players"."id" AS t0_r0, "players"."full_name" AS t0_r1, "players"."created_at" AS t0_r2, "players"."updated_at" AS t0_r3, "players"."team_id" AS t0_r4, "teams"."id" AS t1_r0, "teams"."name" AS t1_r1, "teams"."created_at" AS t1_r2, "teams"."updated_at" AS t1_r3
FROM "players"
LEFT OUTER JOIN "teams" ON "teams"."id" = "players"."team_id"
WHERE "teams"."name" = ?  [["name", "Red Star"]]

In most cases eager loading will do the trick, but it is very obvious that we just loaded all columns from teams table but we only needed the team name.

Let’s try with joins

joins() is a Rails prefered way when we need to specify conditions on associations, so let’s test it.

> Player.
    joins(:team).
    where(teams: { name: "Red Star" }).
    map { |player| [player.full_name, player.team.name] }

This query will produce the following logs:

Player Load (0.2ms)  SELECT "players".*
FROM "players"
INNER JOIN "teams" ON "teams"."id" = "players"."team_id"
WHERE "teams"."name" = ?  [["name", "Red Star"]]

Team Load (0.1ms)  SELECT  "teams".* FROM "teams" WHERE "teams"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]
Team Load (0.1ms)  SELECT  "teams".* FROM "teams" WHERE "teams"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]

The result is still the same, and a good thing is that we don’t load unnecessary data into memory any more. But as we can see from the logs, N+1 situation is back again.

In order to eliminate unnecessary queries, we will need to use select() method together with joins().

joins combined with select method

> Player.
    joins(:team).
    where(teams: { name: "Red Star" }).
    select("players.*, teams.name as team_name").
    map { |player| [player.full_name, player.team_name] }

Our result is still ok and our logs show that we needed only one query to achieve that.

Player Load (0.5ms)  SELECT players.*, teams.name as team_name
FROM "players"
INNER JOIN "teams" ON "teams"."id" = "players"."team_id"
WHERE "teams"."name" = ?  [["name", "Red Star"]]

You can notice, that I changed player.team.name to player.team_name in map() block after I added select() method, that is because team_name is now a Player model instance attribute.

For this situation it is very useful if we add team_name instance method to Player class.

class Player < ActiveRecord::Base
  belongs_to :team

  def team_name
    read_attribute("team_name") || team.name
  end
end

What we will use depends on case by case. My personal guidelines are: “Start with eager loading when you need to use an associated data and go with joins when you only need to query an associated data.”

Comments

comments powered by Disqus