System Design: Designing a Proximity Service (Yelp)
A Proximity Service allows users to find nearby places (restaurants, gas stations, etc.). The technical challenge is searching millions of businesses based on a user's current GPS coordinates with millisecond latency.
1. Core Requirements
- Add/Update Business: Business owners can add or modify their listings.
- Proximity Search: Users find businesses within a given radius (e.g., 5km).
- Filtering: Filtering by category, rating, or price.
2. The Scaling Problem
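Indexing latitude and longitude with a standard database index (B-Tree) is not enough. A B-Tree orders rows along a single dimension, so the database can narrow a query by latitude or by longitude efficiently, but not by both at once; it still has to intersect two large candidate sets. To find businesses near a point efficiently, we need a 2D spatial index.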
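To make the baseline concrete, here is a minimal sketch of the naive approach that a spatial index is meant to replace: computing the great-circle (haversine) distance to every business on every query. The function names, the sample businesses, and the 6371 km mean Earth radius are illustrative assumptions, not part of the original design.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def brute_force_nearby(businesses, lat, lon, radius_km):
    """Scan every (business_id, lat, lon) row: O(N) work per query."""
    return [
        b_id
        for b_id, b_lat, b_lon in businesses
        if haversine_km(lat, lon, b_lat, b_lon) <= radius_km
    ]


# Hypothetical sample data, for illustration only.
BUSINESSES = [
    ("sf_cafe", 37.7760, -122.4180),
    ("oakland_diner", 37.8044, -122.2712),
    ("nyc_pizza", 40.7128, -74.0060),
]

if __name__ == "__main__":
    print(brute_force_nearby(BUSINESSES, 37.7749, -122.4194, 5.0))
```

This is fine for a few thousand rows, but the cost grows linearly with the number of businesses, which is the scaling problem this section describes.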
3. The Solution: Geohashing
Geohash is a popular technique that divides the world into a grid and assigns each cell a short string of letters and numbers.
- The Logic:
- The world is divided into 32 large cells (an 8 x 4 grid), one for each base-32 character.
- Each square is subdivided recursively.
- The more characters in a Geohash string, the smaller and more precise the area.
- Example: 9q8yy is a broad area in San Francisco (roughly 5 km across), while 9q8yyzh narrows it down to roughly a city block (about 150 m across).
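The recursive subdivision above can be sketched as a short encoder. This follows the standard public Geohash scheme (bits alternate between longitude and latitude, and every 5 bits become one base-32 character); the function name and the default precision are assumptions chosen for illustration.

```python
# Geohash base-32 alphabet (digits plus lowercase letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates the 5 bits of the current character
    bit_count = 0
    use_lon = True    # the first bit refines longitude, then they alternate

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    # Downtown San Francisco at two precisions; the longer hash
    # extends the shorter one, which is what makes prefixes useful.
    print(geohash_encode(37.7749, -122.4194, 5))  # 9q8yy
    print(geohash_encode(37.7749, -122.4194, 7))
```

Each extra character adds 5 bits, so the cell shrinks by a factor of 32 in area per character.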
Why Geohash works for search
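Geohashes have a useful property: prefix matching. Two locations that share a long Geohash prefix are guaranteed to be close to each other, and nearby locations usually (though not always) share a prefix. To find restaurants near a user, we calculate the user's Geohash at a precision that matches the search radius and run a prefix search in our database. Because a prefix search is just a range scan over sorted strings, an ordinary B-Tree index on the Geohash column can serve it.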
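A prefix search can be sketched with a sorted list standing in for the database index. The Geohash values and business IDs below are made up for illustration, and `prefix_search` is a hypothetical helper; in SQL the equivalent would be something along the lines of `WHERE geohash LIKE '9q8yy%'` on an indexed column.

```python
from bisect import bisect_left

# Hypothetical index rows: (geohash, business_id), kept sorted by geohash,
# which is the order a B-Tree index would maintain.
INDEX = sorted([
    ("9q8yyk8", "biz_taqueria"),
    ("9q8yym2", "biz_sushi"),
    ("9q8zn01", "biz_bakery"),
    ("9q9p1xx", "biz_diner"),
    ("dr5ru7c", "biz_pizza"),
])


def prefix_search(index, prefix):
    """Return business IDs whose geohash starts with `prefix`.

    Binary-search to the first candidate, then walk forward until the
    prefix no longer matches: O(log N) to seek plus O(matches) to scan.
    """
    start = bisect_left(index, (prefix,))
    matches = []
    for geohash, business_id in index[start:]:
        if not geohash.startswith(prefix):
            break
        matches.append(business_id)
    return matches


if __name__ == "__main__":
    print(prefix_search(INDEX, "9q8yy"))  # ['biz_taqueria', 'biz_sushi']
    print(prefix_search(INDEX, "9q8"))    # a shorter prefix covers a wider area
```

Shortening the prefix widens the search area, so the prefix length acts as a coarse radius control.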
4. Database Selection
- Metadata Store: PostgreSQL or MySQL to store business details (name, description, menu).
- Index Store: Redis or Elasticsearch is ideal for Geohash indexing.
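- Redis Geospatial (GEOADD/GEORADIUS): Stores each location in a sorted set with a Geohash-derived score, so a proximity search costs $O(\log N)$ in the size of the index plus the number of candidates examined. Since Redis 6.2, GEOSEARCH supersedes GEORADIUS.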
5. Handling High Read Volume
Proximity services are read-heavy.
- Caching: Cache the results of popular searches (e.g., "Sushi in Manhattan") in Redis.
- Read Replicas: Since data updates (new restaurants) are rare compared to searches, a small amount of replication lag is acceptable, and we can scale out using read replicas across different geographic regions.
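The caching idea can be sketched as a cache-aside wrapper with a time-to-live. Here an in-memory dict stands in for Redis, and keying the cache by (Geohash cell, category) rather than by raw coordinates is an assumption made so that nearby users share entries; `SearchCache` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple


class SearchCache:
    """Cache-aside store for search results, with per-entry expiry."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}
        self.misses = 0

    def get_or_compute(self, cell: str, category: str,
                       compute: Callable[[str, str], List[str]]) -> List[str]:
        key = (cell, category)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # fresh hit: skip the index lookup
        self.misses += 1
        results = compute(cell, category)  # fall through to the index store
        self._entries[key] = (now + self._ttl, results)
        return results


if __name__ == "__main__":
    cache = SearchCache(ttl_seconds=60.0)

    def slow_lookup(cell, category):
        # Placeholder for the real index query.
        return [f"{category}-in-{cell}"]

    cache.get_or_compute("9q8yy", "sushi", slow_lookup)
    cache.get_or_compute("9q8yy", "sushi", slow_lookup)
    print(cache.misses)  # 1
```

The TTL bounds how stale a result can be, which is acceptable here because listings change rarely.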
6. Edge Case: The "Boundary" Problem
If a user is at the very edge of a Geohash cell, the nearest restaurant might be in the neighboring cell.
- The Fix: Always search the user's current Geohash cell plus the 8 surrounding cells. As long as the cell size is at least as large as the search radius, this ensures no nearby locations are missed.
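One way to find the surrounding cells is to decode the user's cell into its bounding box, step one cell width or height in each direction, and re-encode. The sketch below assumes the standard Geohash bit layout; `encode`, `bounds`, and `search_cells` are illustrative names, and it returns fewer than 9 cells next to the poles, where some neighbors do not exist.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision):
    """Standard Geohash encoding (longitude bit first, 5 bits per char)."""
    rng = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    value = {"lat": lat, "lon": lon}
    out, bits = [], 0
    for i in range(precision * 5):
        axis = "lon" if i % 2 == 0 else "lat"
        mid = (rng[axis][0] + rng[axis][1]) / 2
        bit = 1 if value[axis] >= mid else 0
        rng[axis][1 - bit] = mid  # bit=1 raises the low bound, bit=0 lowers the high
        bits = (bits << 1) | bit
        if i % 5 == 4:
            out.append(BASE32[bits])
            bits = 0
    return "".join(out)


def bounds(geohash):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) for a Geohash cell."""
    rng = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    i = 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            axis = "lon" if i % 2 == 0 else "lat"
            bit = (value >> shift) & 1
            rng[axis][1 - bit] = (rng[axis][0] + rng[axis][1]) / 2
            i += 1
    return rng["lat"][0], rng["lat"][1], rng["lon"][0], rng["lon"][1]


def search_cells(geohash):
    """The cell itself plus its neighbors, ordered north-west to south-east."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounds(geohash)
    height, width = lat_hi - lat_lo, lon_hi - lon_lo
    center_lat, center_lon = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    cells = []
    for dlat in (1, 0, -1):
        for dlon in (-1, 0, 1):
            lat = center_lat + dlat * height
            if not -90.0 <= lat <= 90.0:
                continue  # no cell beyond the poles
            # Wrap longitude across the antimeridian.
            lon = (center_lon + dlon * width + 180.0) % 360.0 - 180.0
            cell = encode(lat, lon, len(geohash))
            if cell not in cells:
                cells.append(cell)
    return cells


if __name__ == "__main__":
    print(search_cells("9q8yy"))
```

Note that the northern neighbor of 9q8yy is 9q8zn, which does not share the 9q8y prefix; this is exactly the case a single-cell prefix search would miss. The candidates gathered from these cells would typically still be filtered by exact distance before being returned.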
Summary
The heart of this Yelp-style design is Geohashing. By converting 2D coordinates into 1D strings, we can leverage traditional database indexing to build a fast proximity search engine that scales to millions of businesses and users.
