API Rate Limiting at Scale with Redis
Rate limiting protects your APIs from abuse, enforces fair usage, and helps prevent cascading failures. Redis is a common store for rate-limit state because it is fast, can be shared by every application server, and executes individual commands atomically.
1. Fixed Window Algorithm
The simplest approach. You divide time into fixed windows (e.g., 1 minute) and increment a counter for each user.
- The Logic: INCR user:123:min:45. If the counter > limit, reject.
- The Problem: The Edge Case. A user can send their full limit at the very end of window A and another full limit at the start of window B, doubling the allowed rate in a short burst.
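The counter logic above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes `client` is a redis-py style object exposing `incr` and `expire`, and the function name, the key layout, and the choice to expire keys after two window lengths are illustrative assumptions rather than anything the algorithm prescribes.

```python
import time


def allow_fixed_window(client, user_id, limit, window_seconds=60, now=None):
    """Return True if this request fits in the current fixed window.

    `client` is assumed to behave like a redis-py client (incr, expire).
    `now` can be injected for testing; it defaults to the local clock.
    """
    now = time.time() if now is None else now
    # All requests in the same window share one counter key (hypothetical layout).
    window_index = int(now // window_seconds)
    key = f"user:{user_id}:win:{window_index}"

    count = client.incr(key)  # atomic; creates the key with value 1 if missing
    if count == 1:
        # First hit in this window: let Redis clean the key up afterwards.
        client.expire(key, window_seconds * 2)
    return count <= limit
```

With a limit of 3 and 60-second windows, three calls at `now=119` and three more at `now=120` are all allowed, because they fall into different windows. That is the edge case described above: six requests pass within about one second.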
2. Sliding Window Log
To fix the edge case, we store a timestamp for every request in a Redis Sorted Set (ZSET).
- The Logic:
  - Remove timestamps older than the current window: ZREMRANGEBYSCORE user:123 0 (now - window).
  - Count remaining timestamps: ZCARD user:123.
  - If count < limit, add current timestamp: ZADD user:123 now now.
- Pros: Extremely accurate.
- Cons: High memory usage for very high-traffic APIs, because every accepted request in the window is stored as its own sorted-set entry.
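The three sorted-set steps can be sketched as follows. This is a sketch under stated assumptions: `client` is a redis-py style object exposing `zremrangebyscore`, `zcard`, `zadd`, and `expire`, and the function name is hypothetical. It also departs from the literal ZADD command above in one respect: it appends a random suffix to the member so that two requests with the same timestamp are stored as two entries instead of one.

```python
import time
import uuid


def allow_sliding_log(client, user_id, limit, window_seconds=60, now=None):
    """Return True if fewer than `limit` requests fall in the trailing window.

    `client` is assumed to behave like a redis-py client.
    """
    now = time.time() if now is None else now
    key = f"user:{user_id}"

    # 1. Drop timestamps that have fallen out of the window.
    client.zremrangebyscore(key, 0, now - window_seconds)
    # 2. Count what is left.
    count = client.zcard(key)
    if count >= limit:
        return False
    # 3. Record this request. The score is the timestamp; the member gets a
    #    random suffix (an assumption of this sketch) to keep it unique.
    member = f"{now}:{uuid.uuid4().hex}"
    client.zadd(key, {member: now})
    # Let idle users' keys disappear on their own.
    client.expire(key, int(window_seconds) + 1)
    return True
```

As written, the steps are separate commands, so two concurrent requests could both pass the count check. A production version would likely run them inside a MULTI/EXEC transaction or a Lua script.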
3. Token Bucket Algorithm
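This is the most flexible of the three: it permits short bursts while still enforcing a long-run average rate. A "bucket" with a fixed capacity is refilled with tokens at a constant rate, and each request consumes one token. When the bucket is empty, the request is rejected.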
- The Logic: We store the last update time and the current token count. When a request arrives, we calculate how many tokens should have been added since the last update.
- Implementation: Use a Redis Lua script to make the "calculate and consume" logic atomic and prevent race conditions.
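To make the "calculate and consume" step concrete, here is a minimal sketch. `refill_and_consume` is a pure-Python model of the arithmetic, and `TOKEN_BUCKET_LUA` is one possible Lua script that applies the same arithmetic inside Redis. All names, the hash field names (`tokens`, `ts`), the `bucket:` key prefix, and the expiry policy are assumptions made for illustration. The sketch also assumes `refill_rate` is greater than zero and that `client` is a redis-py style object exposing `eval(script, numkeys, *keys_and_args)`.

```python
import time


def refill_and_consume(tokens, last_ts, now, capacity, refill_rate):
    """Model of the bucket update. Returns (allowed, tokens_left)."""
    elapsed = max(0.0, now - last_ts)          # guard against clocks going backwards
    tokens = min(capacity, tokens + elapsed * refill_rate)
    if tokens >= 1:
        return True, tokens - 1
    return False, tokens


# The same logic as a Lua script, so Redis runs it as one atomic unit.
TOKEN_BUCKET_LUA = """
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local state = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1])
local ts = tonumber(state[2])
if tokens == nil or ts == nil then
  tokens = capacity  -- a new bucket starts full
  ts = now
end

local elapsed = math.max(0, now - ts)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate) * 2)
return allowed
"""


def allow_token_bucket(client, user_id, capacity, refill_rate, now=None):
    """Run the script for one request; True means a token was consumed."""
    now = time.time() if now is None else now
    key = f"bucket:{user_id}"  # hypothetical key layout
    return client.eval(TOKEN_BUCKET_LUA, 1, key, capacity, refill_rate, now) == 1
```

For example, an empty bucket with `refill_rate=1.5` that was last updated two seconds ago has gained three tokens, so the request is allowed and two tokens remain. Passing `now` from the application server is a design choice of this sketch, and it is one reason the clock-synchronisation point in the next section matters.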
4. Distributed Rate Limiting Gotchas
- Clock Drift: In a distributed system, ensure all your application servers and your Redis nodes are synced via NTP.
- Redis Availability: If Redis is down, should you allow all requests (fail-open) or block all requests (fail-closed)? Many public APIs choose fail-open to maintain availability, while endpoints where abuse is costly may be better served by fail-closed.
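- Local Caching: For extremely high-volume APIs, use a two-tier approach: a small in-process (local memory) limit checked first, followed by the global Redis limit, so that clearly excessive traffic is rejected without a Redis round trip.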
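The fail-open choice and the two-tier idea can be combined in one small wrapper. The following is a rough sketch, not a recommended design: `TwoTierLimiter`, its parameters, and the `global_check` callable (standing in for any Redis-backed check) are hypothetical names. The local tier here is a simple per-process fixed window, and it is not thread-safe as written.

```python
import time


class TwoTierLimiter:
    """Local in-memory limit first, then a global (e.g. Redis-backed) check."""

    def __init__(self, local_limit, window_seconds, global_check, fail_open=True):
        self.local_limit = local_limit
        self.window_seconds = window_seconds
        self.global_check = global_check  # callable: user_id -> bool
        self.fail_open = fail_open
        self._window = None
        self._counts = {}

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        if window != self._window:
            # New local window: forget the old per-user counters.
            self._window = window
            self._counts = {}

        count = self._counts.get(user_id, 0) + 1
        self._counts[user_id] = count
        if count > self.local_limit:
            return False  # rejected locally, no network round trip

        try:
            return bool(self.global_check(user_id))
        except Exception:
            # Global store unreachable: apply the configured policy.
            # Real code would catch the client's specific connection errors.
            return self.fail_open
```

A caller might pass something like `lambda uid: allow_request_in_redis(uid)` as `global_check`, where `allow_request_in_redis` is whatever Redis-backed limiter the service uses. Setting `fail_open=False` turns an outage into rejections instead of unrestricted traffic.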
Summary
Redis-based rate limiting offers a good balance of accuracy and performance. By choosing the right algorithm, whether that is the simplicity of Fixed Window, the accuracy of Sliding Window Log, or the burst tolerance of Token Bucket, you can protect your infrastructure while providing a consistent experience for your users.
