Optimizing an App, Part 1
Jumping right into it
NB: This is a two part series on small little optimizations that had huge impacts on a client's app.
A while ago, a colleague/mentor of mine was telling me about the power of .to_sql
, which, when used in console, outputs the SQL query to the ActiveRecord query you wrote. It's a super handy tool.
At the time, I didn't think too much about it except that it was something nifty to keep in my toolbox.
And now, last month. A client of mine had asked me to help troubleshoot some issues with his staging app. The app only had 2 active users, but memory usage was inexplicably high. As in, the app kept running at around 70-120% of the 512MB allotted RAM on Heroku's hobby tier.
Why not get a bigger dyno?
Remember, the app only had 2 active users. For something like that, there really isn't a reason for the app to be maxing out on memory usage. Luckily Heroku has a metrics tab that allows you to see the state of the dyno for the past 24 hours. At first I thought it was a question of leaky memory, but the data didn't hold true. It would start at around 256MB, which is normal and great, but almost immediately would shoot up towards max capacity, then every hour it'd cause the dyno to restart due to OOM (out of memory) errors. The client was baffled. So was I.
NB: a leaky app would have had a slow creep of memory usage over a longer period of time, something that'd look like a y = ax + b kind of graph. The current app was looking more like a y = ax^3 + b kind of graph. Exponential growth
As the app was for staging, neither he nor I wanted to bump up the dyno to account for the RAM, even though that would immediately solve the problem. I spoke with his developer, and he walked me through the app's functionality and showed me their logs.
Almost immediately I noticed a few things. One, he had his production config set to :debug
, which, admittedly is the default, but it makes for extremely noisy output. Two, even with all that excess noise, the app was still outputting so much stuff.
We dove deeper.
Hello, Ruby
I asked the dev to run the app locally so I could scrutinize the output of the logs while pouring over the code line by line. Literally, line by line. I haven't had to visually parse through code like this since learning Java in Intro to CS 102.
Something stood out. There was code that said something to the effect of:
self.post.items[-1].id
It almost slipped by me, except the [-1]
part stood out, I was a bit baffled as I'd never seen code like this in Rails before, even though it's a common thing for Arrays in Ruby.
Let's break this code down. The self
refers to an ActiveRecord instance, e.g. User.first
. In this particular instance, a User has a 1:1 relationship with Post. This Post instance object has a 1:many relationship with items. In case you missed this lesson when learning about arrays in Ruby, the [-1]
is to return the very last item in the array. This kind of call is essentially saying in simple English: "I need all the Items of this Post, and then I want the last one of that list, and I want its id." Because of the 1:many relationship Post has with Items, the SQL query is a where
lookup, which returns an array of ActiveRecord objects.
One thing that I've learned is that Arrays are great if there's only a small number of them, but if there are many, it's super sluggish. It's really not ideal when you're talking about a ton of records. Not only that but when you do post.items
, the returned collection of AR objects gets shoved into Ruby memory. Imagine shoving 100,000 records into memory.... No bueno. Essentially, the code would return all the items as an array, put it into Ruby memory, then from there get the last object's id.
Repeat that kind of query a bunch of times, and I hope you see where this is going.
To top it off, the developer only wanted the last item's ID. We didn't need anything else. Technically we didn't want all the items belonging to the post, nor did we even want the last item's full AR object. In simple English, it was "Return the id of the last item that belongs to Post." So why make Rails do this? To me, this screams "I can be a SQL query!" Asking your database to do SQL queries is far more efficient than asking ActiveRecord/Rails to translate to SQL and return a bunch of objects and then get other data.
Now we had something we could optimize.
SQL power to the rescue
We wanted the last item's ID based on the current post. We wanted a SQL query that looked something like: (in pseudo code)
select max id from items where user.post is a number (e.g. 1)
And SQL-fied:
SELECT MAX('items'.'id') FROM 'items' WHERE 'user'.'post_id' = 1
Let's see if Rails can do this. We know the post_id
because it's been given to us as part of user
so...
Item.where(post_id: 1).maximum(:id)
We ran the above and appended .to_sql
and confirmed that the generated SQL was exactly what we wanted above. Thank you to my colleague/mentor, Adam!
Here's what it looked like afterwards, because "pics or it didn't happen" (which, if you think about it, is the new millennium equivalent of "a picture is worth a 1000 words"). It would have been great if I had gotten a screenshot of what it looked like before, but as you can see, none of this "off the chart" business.
Punt it!
Lesson learned. If you can punt the work to the database, let it do the heavy lifting of searching/sorting/finding. Don't let Rails do it.
This one change resulted in the memory to drop to a steady 256MB. We haven't seen it bust out the OOM errors despite throwing more users and data at it. EXCELLENT.
Stay tuned for Part 2, where I talk about some other small optimizations including our favourite N+1 query.
Level up +5
Questions? Comments? Hit me up at risaonrails !