ActiveRecord has lots of tools to help us simplify database interactions, but it's important to make sure our queries are still efficient and easy to understand as our apps get bigger.
This article outlines all the ways you can make ActiveRecord queries simpler, faster, and easier to maintain. When necessary, we’ll dive into the limitations of each and what alternative performance technique can be used instead.
1. Eager Loading
Ever come across a website where all of the images, gifs, and text display a loading sign upon the initial load? If yes, you may have witnessed eager loading.
Achieved through the includes method, eager loading is the process in which a query for one type of model also loads related model(s). In other words, eager loading ensures that your page’s model and all its associated models are loaded all at once.
Example
You have a Blog model with related Blog Post models. Your Blog Post models each have many Comment models. You want to show a list of blog posts with the number of comments for each post. Without eager loading, you might end up with slow performance due to additional queries fetching the comments for each blog post.
In the code snippet above, we select all the blog posts using BlogPost.all and then go through them one at a time. But when we use post.comments.size, it has to ask the database for the comments on each post, which produces the N+1 query problem.
Here’s what the database queries look like for the above code:
With eager loading, you use the includes method to fetch all the blog posts and their associated comments in a single query. This way, you simplify your code and improve performance by reducing the number of database hits.
In the code snippet above, we call BlogPost.includes(:comments).all to get all the blog posts and their comments in one database query.
Now, when we use post.comments.size, the comments are already loaded, so it doesn't need to ask the database again. This makes things faster by using fewer database queries.
When to use
Use eager loading when you anticipate the need for associated data alongside your main query and want to keep the number of database queries to a minimum to avoid the N+1 problem.
Limitations
Performance: While eager loading can sometimes make your application faster, it can also use more memory and slow down your application's initial load time.
Stale Data: Eager loading can potentially return stale data. When you have complex nested associations and you modify associated records during a request, you might get inconsistent results.
For example, in scenarios where relationship data is eager loaded, subsequent updates made to related records may not be reflected in the originally loaded data:
Loading Unnecessary Data: Eager loading can load data that you don't need. For example, I have a list of users, and each user has a list of posts. Only when I select a user from the list am I directed to a new page that lists their related post models. In this scenario, I should not use eager loading because I only need to load the user models for that initial page. Related post models in this scenario should be loaded on-demand when a particular user is selected.
2. Lazy Loading
Ever come across a website where certain images, gifs, or text display a loading sign after the initial load has already been completed? Perhaps they started loading as you scrolled down the page. If so, you may have witnessed lazy loading in action.
Lazy loading is the process of loading models on demand. It postpones the retrieval of associated models until needed, which prevents unnecessary data from getting loaded into memory.
Example
You have a User model associated with the Post model. You can use lazy loading to fetch user records first and then load their associated posts only when necessary, reducing the initial database load. Here's an example:
In the code above, we first fetch all the user records by using User.all.
Then, when necessary, we check user.posts.size inside the fetch_posts_per_user function. ActiveRecord will retrieve the count of the associated posts for a user when that data is actually requested, rather than retrieving it all at once for each user, which helps reduce unnecessary overhead.
When to Use
Use lazy loading when you want to optimize the initial load time by loading resources as needed rather than eager loading data into memory that you may not need.
For example, images on a webpage can be loaded as they are scrolled into view.
Limitations
N+1 Performance Problem: Lazy loading should not be used if it is causing an N+1 query problem: If you access related records one by one, you'll need extra database queries.
This can slow things down, especially if you have a lot of data. In this case, it's better to use eager loading instead, where you can fetch parent records and their related data in one go.
3. Retrieving Specific Columns: Select and Pluck
Leveraging ActiveRecord's query interface methods directly without instantiating a model can greatly enhance query performance because it avoids unnecessary object creation and retrieval of unneeded data.
By using methods like select, and pluck, you retrieve only the data you need without the overhead of object instantiation. This approach minimizes memory usage and processing time, especially when dealing with large datasets or when you require specific attributes or aggregated values.
pluck returns raw data, whereas select returns an ActiveRecord::Relation object. In terms of performance, pluck is often a better choice, but if it's too limiting and you need to work with ActiveRecord instances, select is the next best option.
Example
In the code snippet above, the where method filters the records based on the condition (age: 25), and then pluck extracts only the name attribute values from the database, without creating actual model instances, while select alternatively returns an ActiveRecord::Relation object with the name attribute populated.
When to Use
Use pluck or select when you’re tired of selecting numerous records with unnecessary fields that load excess data into memory, leading to longer database times and wasted memory.
Use the select method instead of pluck when you want to retrieve specific columns while still working with ActiveRecord objects. This is particularly useful when you need to access model methods or wish to chain other ActiveRecord::QueryMethods.
Use pluck if you only need raw data and do not plan to work with ActiveRecord objects.
Limitations
Pluck
Limited support for complex scenarios: pluck is designed specifically for extracting values from one or more columns. If you need to perform more complex queries, aggregations, or work with associations, pluck might not be the most suitable choice.
Select
Not necessarily a limitation, but because you’re initializing model objects with limited fields, make sure you don’t attempt to access a field that is not in the initialized record. Otherwise, watch out for this error:
Keep in mind that while you can access model methods, you will see ActiveModel::MissingAttributeError raised if those model methods attempt to access a field that is not in the initialized record.
For example, if you run something like
and the model method you’re attempting to call looks like
you’ll end up with
4. Bulk Operations: insert_all, update_all, delete_all
Bulk operations play a crucial role in enhancing ActiveRecord query performance, particularly for large datasets. These operations allow for processing multiple records in a single operation, minimizing the amount of individual queries and skipping callbacks and validations. They deal directly with the database, without instantiating an active record object for each record.
Note: The insert_all bulk operation method used in the example below was introduced in Rails 6.
Example
In the code snippet above, insert_all inserts multiple records into the database, update_all performs bulk updates, and delete_all removes multiple records matching a condition.
When to Use
Use bulk operations when working with large datasets or when performing bulk actions such as data imports, updates, or deletions.
Only use insert_all, delete_all, or update_all if you’re okay with your callbacks and/or validations getting bypassed when processing the data.
Limitations
Skips validations and callbacks: Use insert_all, delete_all, update_all only when callbacks and/or validations are okay to be ignored. These operations bypass model validations and callbacks, which can be problematic if you rely on these mechanisms to ensure data integrity or trigger certain actions. This can lead to inconsistent data and unintended side effects.
For example, I have a model, Product, with a price attribute. I have a validation that ensures price cannot be a negative number. I attempt to update some models individually with the update method. Some of these models have negative price values, which would fail validations. Only the models with positive price values are updated when the operation is complete. If I attempt to make the same updates using update_all, all the models will be updated, including the ones whose price value goes against the model validations set.
No Active Record Object Instantiation: insert_all, delete_all, update_all operations don't instantiate individual Active Record objects, which means you lose the benefits of object-oriented features like instance methods, associations, or custom behavior defined within the models.
If you cannot use insert_all , delete_all, update_all because you need validations and/or callbacks to be respected when committing the model, and/or object-oriented features defined within your models, you can try iterating the list of models in batches to preserve performance instead.
Please be aware that callbacks can still impact performance when processing large batches.
5. Avoid Unnecessary Data Retrieval
ActiveRecord methods like limit, offset, and where can be useful when minimizing the amount of data we load.
Example
limit and offset help to "paginate" the loaded data, preventing all of it from being loaded into memory at once.
where allows conditions to be applied to the data, filtering out records that do not match the conditions. Only the matching records are then loaded into memory.
When to Use
Use where when you need to filter records based on specific conditions.
Use limit when you want to cap the number of records returned by a query to a specific number, such as for implementing pagination.
Use offset when you want to skip a certain number of records from the beginning of the result set. This is commonly used in combination with limit for implementing pagination.
Limitations
Data Overload: Sometimes, even when querying with conditions or pagination, the number of records we are loading into memory at once is still too large. In cases with extremely large datasets, it is advisable to use batch processing to load records.
6. Batch Processing when Loading Records
Using ActiveRecord’s where, all, or find methods may not be the best way to query records when we’re dealing with large datasets.
With large datasets, the performance is bound to decrease when we use these methods because all of the matching records are loaded into memory.
Performance can be improved by processing records in smaller groups, AKA batch processing. This can be accomplished through Rails’ find_each, find_in_batches and in_batches methods.
Note: The in_batches method was introduced in Rails 5.
Example
When to Use
Looping through a collection of records (for example Product.all.each { |product| puts product.price }) will instantiate all the objects at once.
Use batch processing when you need to query a large number of records so that memory consumption is reduced.
Determining the ideal batch size typically involves a process of trial and error, where you experiment with different batch sizes while considering the memory consumption of your container or VM.
Limitations
If you are using a Rails version before 5.1, you will be unable to limit the number of records fetched in batch processing, as the example below demonstrates:
If you are using a Rails version before 6.1, your records are automatically processed in ascending order. This can be a problem if you wish to process records in descending order so that newer records are processed first, as the example below demonstrates:
7. Minimize Database Hits
When writing queries with ActiveRecord, ensure that you are only fetching data when it is required and you are reusing the fetched data as much as possible, rather than re-fetching it from the database.
When to Use
Before making new queries, we can check whether the data already exists in memory, thinking in terms of: "Check the memory for x/y/z record. If the record does not already exist in memory, then fetch it from the database".
Additionally, we must ensure we are retrieving the necessary amount of data. Always check if a task can be accomplished with a single database query. We can use query methods like pluck if we only care about selecting raw data rather than the entire model instance(s).
Example
The example below is precisely what not to do when we wish to filter out the items on a list that are not related to a specified vendor.
We are making product_code.length amount of queries above as we select the product whose code matches the code of the current iteration of product_codes.
Instead, we can use pluck so that we only load the raw data (code and store_type) in a single query.
8. Upgrades
Upgrading Ruby and Rails versions can improve query performance.
Newer Ruby versions have enhanced garbage collection, which helps reduce memory usage. Efficient memory usage can contribute to improved application performance, including the execution of ActiveRecord queries.
Newer versions of Rails often come with performance enhancements, bug fixes, and optimizations in the ActiveRecord query interface. These improvements can lead to faster and more efficient query execution, reducing the time required to grab data from the database.
9. Indexing
Indexing in ActiveRecord improves query performance by creating data structures that enable the database engine to quickly locate and retrieve records.
For instance, adding an index on the name column in a 'Product' model allows faster retrieval of a product by name, optimizing the performance of queries like Product.find_by(name:'Woody from Toy Story').
When to Use
Implement an index in a database when dealing with large tables containing a substantial amount of records, especially if certain columns are frequently queried in the WHERE clause, involved in JOIN operations, or used for sorting and grouping.
Focus on indexing critical parts of the application where query performance is important.
Limitations
Storage Requirements Increase: While indexes enhance read performance, it's essential to carefully choose which columns to index based on query patterns. Indexing increases storage requirements, as each index consumes additional disk space when used. When space and memory are limited, too many indexes can be a problem.
Slower write speed: Storage requirements increasing also means slower write speed for INSERT / UPDATE / DELETE operations with indexes.
Every insert, update, or delete operation on a table with indexes requires additional maintenance of those indexes. When a record is added or modified, the corresponding indexes must be updated to reflect the changes.
For example, when a new record is inserted, the database must update the index to include the new entry. The more indexes there are, the more maintenance work is required for each insert operation.
10. Prefer these query methods over these:
Prefer exists? over present? or any?: When checking the existence of records, using the exists? method is more efficient than present? because it avoids unnecessary record retrieval. It retrieves a boolean value unlike present? and any? which are fetching the actual records.
Prefer size over count: In scenarios where you only need the number of records and not the actual records themselves, using the size method is more performant. size fetches the count directly from the database, while count retrieves all matching records and then counts them.
Prefer pluck over select: When you only need specific attribute values from a query result, pluck is generally more efficient than select. pluck retrieves only the specified attribute values from the database, whereas select fetches the entire record object.
Prefer find_by over where: If you expect a single record to match a condition, find_by is more performant than where. find_by stops searching and returns the first matching record, whereas where retrieves all matching records and then returns the first one.
11. Tools for Preventing N+1 Queries
As we went over above, lazy loading can be a performance enhancer or a performance killer. When lazy loading happens in a loop, it’s called the “N+1 query problem”.
The tools below can help us prevent N+1 queries.
Bullet Gem
The Bullet gem is a well-known tool designed to boost your app’s performance by decreasing its query count. The bullet gem can detect N+1 query problems and suggest solutions for when eager loading of associations is optimal. This gem also helps identify unnecessary database queries and increased load times due to unused eager loaded associations. Finally, it can flag areas where additional indexes are beneficial and advise when counter cache usage is appropriate.
Example
With the Bullet gem installed and configured to log into the Bullet log file log/bullet.log, you can capture log entries that notify you when a query can be optimized. The entries below indicate whether eager loading is being used and should be avoided, or when it's not in use but should be.
Besides logging warnings in the Bullet log file, you have the option to view them via Javascript alerts or the browser's console.
For full details on how to use this gem, please refer to the Bullet documentation.
When to Use
Use the Bullet gem during development and testing stages to anticipate and address potential performance hurdles. Expect Bullet to promptly notify you of any N+1 query problems it encounters as you write code or run tests!
Limitations
Suppose you have scenarios where you’d prefer automatic enforcement of eager loading associated records, rather than relying on suggestions or alerts provided by an external gem. In that case, you may prefer to use strict_loading.
strict_loading
Since Rails 6.1, we can use strict_loading mode on individual queries, associations, entire models, and even the entire application!
When enabled, strict_loading will raise an error anytime we try to lazy load that particular query/record/association/model/application.
Example
When enabled on a particular record, strict_loading will raise an error anytime we try to lazy load from that record.
When to Use
Use strict_loading mode if you only intend to eager load the query/record/association/model/application you enabled the mode for, and want to prevent any lazy loading on that particular query/record/association/model/application.
Limitations
Be careful not to enable strict_loading if you’re not 100% sure you intend on using eager loading on every part of the query/record/association/model/application it’s been enabled for.
Otherwise, you’ll receive an unwanted ActiveRecord::StrictLoadingViolationError. Note that you can choose to switch off the StrictLoadingViolationError exception and log it instead.
Premature Optimization Warning
We all want our queries to be fast, but if we try to optimize our ActiveRecord queries too early, it can lead to complex SQL queries or custom caching mechanisms that add unnecessary complexity to our code, making it hard to read and maintain.
Some queries just don’t have a significant impact on performance. Make sure to evaluate the real performance bottlenecks and optimize where it matters most.
Regularly monitoring and profiling your Rails application, along with familiarizing yourself with traffic metrics, can help you identify areas for improvement.
Consider a scenario where one request takes 4 seconds to complete and another takes 2 seconds. Even though both can be optimized, the first one is accessed only once a day, while the second is accessed 100 times a day. Although the second request isn't the slowest, it's the one users interact with more frequently and might need higher priority.
Remember, speed is important, but it's not the only thing that matters. Make sure you optimize only where it counts, and keep your code clean and easy to read for future developers. Don't let optimization make your queries harder to understand!
Author: Anva Kondakciu
Subscribe for Updates
Top 5 Reasons Why You Should Subscribe
-
#1
Stay up-to-date on industry news and trends
-
#2
Learn about claims management best practices
-
#3
Gain insights into customer behavior and preferences
-
#4
Discover new technologies and innovations
-
#5
Engage with a community of industry professionals
Appraisals
Reduce cycle time and improve appraisal accuracy
Claims
Claims management with our modular, end-to-end solutions
Payments
Deliver payments effortlessly with automated tools and no-code workflows