API Rate Limiting at Scale: Redis-Based Strategies

Learn how to build a high-performance rate limiter using Redis. Explore Fixed Window, Sliding Window, and Token Bucket algorithms implemented with Lua scripts.

Sachin Sarawgi·April 20, 2026·2 min read
#redis #api-gateway #rate-limiting #performance #system-design

API Rate Limiting at Scale with Redis

Rate limiting is essential for protecting your APIs from abuse, ensuring fair usage, and preventing cascading failures. Redis is a natural fit for the job thanks to its in-memory speed and atomic operations.

1. Fixed Window Algorithm

The simplest approach. You divide time into fixed windows (e.g., 1 minute) and increment a counter for each user.

  • The Logic: INCR user:123:min:45, setting an EXPIRE on first increment so stale counters are cleaned up. If the counter exceeds the limit, reject.
  • The Problem: The Edge Case. A user can send their full limit at the very end of window A and another full limit at the start of window B, doubling the allowed rate in a short burst.
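The counter logic above can be sketched in plain Python. This is a minimal in-memory stand-in with hypothetical names; in production the counter would live in Redis, bumped with INCR and expired per window:

```python
class FixedWindowLimiter:
    """In-memory sketch of a fixed-window counter.
    In Redis, each (key, window) pair maps to INCR on a key like
    user:123:min:45 with an EXPIRE of one window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window index) -> request count

    def allow(self, key: str, now: float) -> bool:
        window_index = int(now // self.window)       # which fixed window we are in
        bucket = (key, window_index)
        count = self.counters.get(bucket, 0) + 1     # Redis equivalent: INCR
        self.counters[bucket] = count
        return count <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("user:123", now=100) for _ in range(4)]
```

The fourth call within the same window is rejected; at the window boundary the counter resets, which is exactly the burst-doubling edge case described above.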

2. Sliding Window Log

To fix the edge case, we store a timestamp for every request in a Redis Sorted Set (ZSET).

  • The Logic:
    1. Remove timestamps older than the current window: ZREMRANGEBYSCORE user:123 0 (now - window).
    2. Count remaining timestamps: ZCARD user:123.
    3. If count < limit, add current timestamp: ZADD user:123 now now.
  • Pros: Extremely accurate.
  • Cons: High memory usage for very high-traffic APIs.
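The three ZSET steps can be mirrored with a sorted list in Python. A minimal sketch with hypothetical names, where each Redis command from the steps above is noted in a comment:

```python
from bisect import bisect_right, insort

class SlidingWindowLogLimiter:
    """In-memory sketch of the sliding-window log.
    In Redis the per-user log would be a Sorted Set (ZSET) with the
    request timestamp as both score and member."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.logs = {}  # key -> sorted list of request timestamps

    def allow(self, key: str, now: float) -> bool:
        log = self.logs.setdefault(key, [])
        # Step 1: drop timestamps outside the window (ZREMRANGEBYSCORE).
        cutoff = now - self.window
        del log[:bisect_right(log, cutoff)]
        # Step 2: count what remains (ZCARD).
        if len(log) >= self.limit:
            return False
        # Step 3: record this request (ZADD).
        insort(log, now)
        return True

limiter = SlidingWindowLogLimiter(limit=2, window_seconds=10)
results = [limiter.allow("user:123", t) for t in (0, 1, 2, 11)]
```

The request at t=2 is rejected because two requests already sit inside the 10-second window; by t=11 both have aged out, so the request is allowed again. Note the memory cost: one entry per request, which is the "Cons" above.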

3. Token Bucket Algorithm

This is the most flexible algorithm. A "bucket" is filled with tokens at a constant rate. Each request consumes one token.

  • The Logic: We store the last update time and the current token count. When a request arrives, we calculate how many tokens should have been added since the last update.
  • Implementation: Use a Redis Lua script to make the "calculate and consume" logic atomic and prevent race conditions.
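The refill-then-consume calculation can be sketched in Python as follows (hypothetical names; in Redis, the body of allow() is exactly what you would move into a single Lua script so that no other client can interleave between the refill and the consume):

```python
class TokenBucket:
    """In-memory sketch of a token bucket.
    Redis would store (tokens, last_update) per key, and a Lua script
    would perform this whole read-refill-consume-write atomically."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket holds
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last_update = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last_update
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_update = now
        # Consume one token if available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, refill_rate=1.0)
results = [bucket.allow(t) for t in (0, 0, 0, 1.5)]
```

Two back-to-back requests drain the bucket, the third is rejected, and 1.5 seconds later enough tokens have refilled to admit another. This refill-on-read design is what lets the token bucket absorb short bursts up to capacity while enforcing the average rate.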

4. Distributed Rate Limiting Gotchas

  • Clock Drift: In a distributed system, ensure all your application servers and your Redis nodes are synced via NTP.
  • Redis Availability: If Redis is down, should you allow all requests (fail-open) or block all requests (fail-closed)? Most public APIs prefer fail-open to maintain availability.
  • Local Caching: For extremely high-volume APIs, use a two-tier approach: a small local memory limit followed by a global Redis limit.
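The two-tier idea can be sketched as a cheap local pre-filter in front of the global check. Everything here is a simplified, hypothetical illustration (the local cap has no window reset, and the global limiter is stubbed as a callable that would wrap a Redis call):

```python
from typing import Callable

class TwoTierLimiter:
    """Sketch of a two-tier check: a per-process cap is consulted first,
    and only requests that pass it pay the cost of the global (Redis)
    limiter, stubbed here as `global_check`."""

    def __init__(self, local_limit: int, global_check: Callable[[], bool]):
        self.local_limit = local_limit    # per-process cap
        self.local_count = 0
        self.global_check = global_check  # e.g. a Redis-backed limiter

    def allow(self) -> bool:
        if self.local_count >= self.local_limit:
            return False                  # rejected locally, no Redis round trip
        self.local_count += 1
        return self.global_check()

limiter = TwoTierLimiter(local_limit=2, global_check=lambda: True)
results = [limiter.allow() for _ in range(3)]
```

The payoff is that a client hammering one app server is cut off before it generates any Redis traffic; the trade-off is that local caps make the global limit slightly approximate.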

Summary

Redis-based rate limiting provides a strong balance of accuracy and performance. By choosing the right algorithm—whether it's the simplicity of Fixed Window, the precision of Sliding Window Log, or the flexibility of Token Bucket—you can protect your infrastructure while providing a consistent experience for your users.



Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
