API Pagination at Scale: Moving Beyond OFFSET
Designing a paginated API seems simple: just use LIMIT 20 OFFSET 100. This works perfectly for the first few pages. However, once your users reach page 5,000, your database performance will fall off a cliff.
1. The Problem with OFFSET
When you execute OFFSET 100,000, the database doesn't magically jump to the 100,000th row. It must:
- Scan the first 100,000 rows.
- Store them in memory to sort them.
- Discard the first 100,000 and return the next 20. The result: As the offset increases, the query gets linearly slower and more memory-intensive.
2. The Solution: Cursor-based Pagination (Keyset)
Instead of telling the database how many rows to skip, we tell it where we left off. This is Cursor-based Pagination (or Keyset Pagination).
- How it works: We include a unique, ordered value (like
idorcreated_at) in the API response as a "cursor." - The Query:
-- Using the ID of the last item on the previous page
SELECT * FROM orders
WHERE id > 54321
ORDER BY id ASC
LIMIT 20;
The Benefit: The database uses an index to jump directly to the specific row in $O(\log N)$ time, regardless of how deep the user is in the dataset.
3. Dealing with Complex Sorts
If you need to sort by something non-unique (like price), you must use a Tie-breaker.
- The Cursor:
(price, id). - The Query:
SELECT * FROM products
WHERE (price, id) > (19.99, 101)
ORDER BY price ASC, id ASC
LIMIT 20;
4. When to use OFFSET?
Offset-based pagination is only acceptable if:
- Your dataset is small (a few thousand rows).
- You must support jumping to a specific page number (e.g., "Go to page 45").
- Data consistency across pages is not a concern (Offset pagination skips/duplicates items if new rows are inserted during a scroll).
Summary
For high-scale APIs, OFFSET is a debt. It leads to slow queries and inconsistent results. By switching to Cursor-based pagination, you provide sub-millisecond page loads for your users, even if they are browsing a list of 100 million items.
Next: Database Sharding vs. Partitioning: High Scale Strategies Previous: Distributed Caching at Scale: Thundering Herd Strategies
