
MongoDB Anti-Patterns: From Unbounded Arrays to Shard Imbalance

Master MongoDB by avoiding common architectural mistakes like the unbounded array anti-pattern, poor index selection, and sharding bottlenecks.

Sachin Sarawgi · April 20, 2026 · 3 min read

MongoDB Anti-Patterns: Building Scalable Document Stores

MongoDB's flexibility is its greatest strength, but it's also a trap for those coming from relational backgrounds. Here are the most critical "gotchas" and anti-patterns to avoid.

1. The Unbounded Array Anti-Pattern

In a document database, it's tempting to store everything related to an entity inside one document.

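  • The Pitfall: Storing all comments for a post, or all logs for a user, in an array inside the main document. A single BSON document cannot exceed 16MB, so an array that grows without bound will eventually make updates to that document fail. Long before that limit, large documents become expensive. Every read and write moves more data through the cache, the disk, and the network, and an index on the array gains one entry per element.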
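  • The Solution: Move "many" relationships into a separate collection that references the parent document. You can optionally combine this with the Subset pattern: embed only the most recent items (for example, the last 10 comments) in the main document and keep the full history in the separate collection.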
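Below is a minimal sketch of the Subset pattern in Python. It runs without a server. It assumes a posts document with a hypothetical recent_comments field, and a plain list stands in for the separate comments collection.

The update document uses MongoDB's real $push modifiers. $each appends the new comment, and a negative $slice keeps only the last N elements, so the embedded array stays bounded. apply_push_slice is a hypothetical in-memory imitation of how the server applies that update. It is included only so the example runs on its own.

```python
from typing import Any

RECENT_LIMIT = 10  # keep the 10 newest comments embedded, as suggested in the text


def build_push_update(comment: dict[str, Any], keep: int = RECENT_LIMIT) -> dict[str, Any]:
    """Build a MongoDB update that appends one comment and trims the array.

    $each appends the new element; a negative $slice keeps only the last
    `keep` elements, so the embedded array never exceeds `keep` entries.
    """
    return {"$push": {"recent_comments": {"$each": [comment], "$slice": -keep}}}


def apply_push_slice(doc: dict[str, Any], update: dict[str, Any]) -> None:
    """Hypothetical in-memory imitation of how the server applies $push/$slice."""
    for field, spec in update["$push"].items():
        items = doc.get(field, []) + spec["$each"]
        n = spec["$slice"]
        # A negative $slice keeps the tail of the array, a positive one keeps the head.
        doc[field] = items[n:] if n < 0 else items[:n]


def add_comment(post: dict[str, Any], comments: list[dict[str, Any]],
                comment: dict[str, Any], keep: int = RECENT_LIMIT) -> None:
    # The full history lives in a separate "collection", one document per comment.
    comments.append({**comment, "post_id": post["_id"]})
    # The parent document only holds a bounded subset of recent comments.
    apply_push_slice(post, build_push_update(comment, keep))


post = {"_id": 1, "title": "Hello"}
comments: list[dict[str, Any]] = []
for i in range(25):
    add_comment(post, comments, {"n": i, "text": f"comment {i}"})

print(len(post["recent_comments"]), len(comments))  # 10 25
```

Against a real cluster, you would send the result of build_push_update through your driver's update_one call on the posts collection. You would also insert the full comment into the comments collection. Both collection names are illustrative.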

2. Index Bloat and Write Performance

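An index speeds up the queries that use it. However, every index must also be maintained on every insert, on every delete, and on every update that touches its fields.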

  • The Pitfall: Adding an index for every field that might ever be queried. Indexes compete with your data for memory (the "Working Set"). Every insert or update must also modify each affected index in addition to the document itself.
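  • The Solution: Prefer a small number of compound indexes, which also serve queries on their leading prefixes. Order their keys with the ESR (Equality, Sort, Range) rule: equality-matched fields first, then sort fields, then range filters. Monitor index usage with db.collection.aggregate([ { $indexStats: {} } ]) and drop indexes that are never used.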
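The Python sketch below shows both habits from this section as plain functions.

- esr_index_spec is a hypothetical helper. It orders compound-index keys by Equality, Sort, Range and returns the list of (field, direction) pairs that PyMongo's create_index accepts.
- unused_indexes filters documents shaped like $indexStats output, where each document has a name and an accesses.ops counter. It never suggests the mandatory _id_ index.

The field and index names in the example are made up. Keep in mind that $indexStats counters reset when mongod restarts, and each replica set member reports only its own usage. Check every member before dropping anything.

```python
from typing import Any

ASC, DESC = 1, -1  # MongoDB's numeric index directions


def esr_index_spec(equality: list[str],
                   sort: list[tuple[str, int]],
                   range_: list[str]) -> list[tuple[str, int]]:
    """Order compound-index keys by the ESR rule: Equality, Sort, Range.

    Equality and range fields get ascending keys (direction is irrelevant
    for matching them); sort fields keep the direction the query sorts by.
    """
    return [(f, ASC) for f in equality] + list(sort) + [(f, ASC) for f in range_]


def unused_indexes(index_stats: list[dict[str, Any]]) -> list[str]:
    """Return names of indexes with zero recorded accesses.

    Expects documents shaped like $indexStats output, each with a "name"
    and an "accesses": {"ops": ...} counter. The _id_ index is skipped
    because it cannot be dropped.
    """
    return [
        s["name"]
        for s in index_stats
        if s["accesses"]["ops"] == 0 and s["name"] != "_id_"
    ]


# Query: {status: "paid", amount: {$gt: 100}} sorted by created_at descending
spec = esr_index_spec(["status"], [("created_at", DESC)], ["amount"])
print(spec)  # [('status', 1), ('created_at', -1), ('amount', 1)]

stats = [
    {"name": "_id_", "accesses": {"ops": 0}},
    {"name": "status_1_created_at_-1_amount_1", "accesses": {"ops": 5120}},
    {"name": "legacy_email_1", "accesses": {"ops": 0}},
]
print(unused_indexes(stats))  # ['legacy_email_1']
```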

3. Shard Key Selection

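Choose the shard key carefully up front, because changing it later is costly. MongoDB 4.4 added refineCollectionShardKey, which can only append suffix fields to an existing key. MongoDB 5.0 added reshardCollection, which copies the entire collection under the new key.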

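  • The Pitfall: Choosing a low-cardinality shard key (like "country") or a monotonically increasing one (like "timestamp" or a default ObjectId). A low-cardinality key caps the number of chunks at the number of distinct values. Popular values then become jumbo chunks that the balancer cannot split or move. A monotonically increasing key sends every new insert to the chunk holding the highest values. This creates a Hot Shard that absorbs all writes while the others sit idle.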
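  • The Solution: Choose a key with high cardinality, low frequency (no single value dominates), and values that do not increase monotonically. If your natural key is monotonic, a Hashed Shard Key spreads inserts evenly. The trade-off is that range queries on that field must be broadcast to every shard.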
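To make the Hot Shard effect concrete, here is a small Python simulation. It is not MongoDB's actual balancer or hash function.

It assumes four shards whose range chunks were split evenly over existing keys 0 to 999. It then routes 1,000 new, ever-increasing keys (think timestamps) in two ways:

- by range, using the existing chunk boundaries;
- by a 64-bit hash split into equal ranges.

md5 is used only as a convenient stand-in hash.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from typing import Callable, Iterable

NUM_SHARDS = 4


def range_shard(key: int, boundaries: list[int]) -> int:
    """Range sharding: shard i owns keys in [boundaries[i-1], boundaries[i]).

    The last shard owns everything at or above the final boundary,
    like the chunk that ends at MaxKey.
    """
    return bisect_right(boundaries, key)


def hashed_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Hashed sharding: split a 64-bit hash space into equal ranges.

    md5 is only a stand-in; this is not MongoDB's exact hashing scheme.
    """
    h = int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")
    return (h * num_shards) >> 64


def distribution(keys: Iterable[int], shard_of: Callable[[int], int]) -> Counter:
    """Count how many keys each shard receives."""
    return Counter(shard_of(k) for k in keys)


# Chunks were split evenly over existing keys 0..999; new keys keep increasing.
boundaries = [250, 500, 750]
new_keys = range(1000, 2000)

ranged = distribution(new_keys, lambda k: range_shard(k, boundaries))
hashed = distribution(new_keys, hashed_shard)

print("range :", dict(ranged))  # {3: 1000}
print("hashed:", dict(sorted(hashed.items())))
```

Every new range-sharded key lands on the last chunk, while the hashed keys spread across all four shards.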

4. Neglecting the Working Set

MongoDB is most efficient when your frequently accessed data and indexes fit into RAM.

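  • The Pitfall: Letting the database grow without adding RAM. Once the "Working Set" exceeds the WiredTiger cache, MongoDB must constantly evict pages and read them back from disk. Queries that used to be served from memory now wait on disk I/O, and latency can rise by orders of magnitude.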
  • The Solution: Monitor page faults and cache activity, such as the WiredTiger cache statistics in db.serverStatus() (for example, pages read into cache). Add memory or shard your data before the Working Set outgrows the cache.
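Here is a back-of-the-envelope capacity check, sketched in Python.

- The cache formula is WiredTiger's documented default: the larger of 50% of (RAM minus 1 GB) or 256 MB.
- The 80% headroom threshold and the working-set inputs are assumptions. Replace them with your own estimates.
- per_second is a hypothetical helper that turns two samples of a cumulative counter into a rate you can alert on. The pages read into cache statistic is one such counter.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(ram_bytes: int) -> int:
    """WiredTiger's default cache size: max(50% of (RAM - 1 GB), 256 MB)."""
    return max((ram_bytes - GIB) // 2, 256 * MIB)


def working_set_fits(hot_data_bytes: int, index_bytes: int,
                     ram_bytes: int, headroom: float = 0.8) -> bool:
    """True if frequently accessed data plus indexes fit within the cache.

    `headroom` is an assumed safety margin (not a MongoDB setting) that
    leaves room for growth and for the cache's own eviction behavior.
    """
    cache = default_wiredtiger_cache_bytes(ram_bytes)
    return hot_data_bytes + index_bytes <= cache * headroom


def per_second(previous: int, current: int, interval_s: float) -> float:
    """Rate of a cumulative counter between two samples taken interval_s apart."""
    return (current - previous) / interval_s


ram = 16 * GIB
print(default_wiredtiger_cache_bytes(ram) / GIB)  # 7.5
print(working_set_fits(4 * GIB, 1 * GIB, ram))    # True  (5 GiB fits in ~6 GiB)
print(working_set_fits(6 * GIB, 2 * GIB, ram))    # False (8 GiB exceeds ~6 GiB)
print(per_second(120_000, 480_000, 60))           # 6000.0
```

A sustained rise in the cache-read rate while traffic stays flat is a useful early sign that the Working Set no longer fits.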

5. Write Concern Trade-offs

MongoDB allows you to specify how many nodes must acknowledge a write before it's considered successful.

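  • The Pitfall: Using w: 1 for critical financial transactions, or w: "majority" for high-volume logs. With w: 1, an acknowledged write can be rolled back and lost if the primary fails before it replicates. With w: "majority", every log write pays replication latency for little benefit.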
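  • The Solution: Tailor your Write Concern to the importance of the data. Use w: "majority" (optionally with j: true) for mission-critical data and w: 1 for non-essential telemetry. Since MongoDB 5.0, the default write concern is w: "majority" in most deployments, so using w: 1 for telemetry is an explicit opt-in.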
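One way to make this trade-off explicit is to derive the write concern from a data classification rather than choosing it at each call site. This Python sketch returns plain writeConcern documents; the w, j, and wtimeout fields are MongoDB's own. The DataClass enum and the 5-second timeout are hypothetical choices for illustration.

```python
from enum import Enum
from typing import Any


class DataClass(Enum):
    """Hypothetical classification of how much a write matters."""
    CRITICAL = "critical"    # e.g. payments, ledger entries
    TELEMETRY = "telemetry"  # e.g. high-volume logs, metrics


def write_concern_for(data_class: DataClass, timeout_ms: int = 5000) -> dict[str, Any]:
    """Return a writeConcern document for the given class of data."""
    if data_class is DataClass.CRITICAL:
        # Wait until a majority of voting members have journaled the write.
        return {"w": "majority", "j": True, "wtimeout": timeout_ms}
    # Primary-only acknowledgement: lower latency, but the write can be
    # rolled back if the primary fails before it replicates.
    return {"w": 1}


for dc in DataClass:
    print(dc.value, write_concern_for(dc))
```

With PyMongo, one option is collection.with_options(write_concern=WriteConcern(**write_concern_for(DataClass.CRITICAL))). This works because WriteConcern accepts w, wtimeout, and j as keyword arguments.

A wtimeout error does not undo the write. It only means the requested acknowledgement did not arrive in time.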

Summary

Building a successful MongoDB application requires thinking about how documents grow and how data is accessed. By avoiding unbounded arrays, keeping indexes lean, and choosing the right sharding strategy, you can build a system that scales horizontally as your user base grows.

Written by Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
