NoSQL Schema Evolution: Managing Data over Time
The biggest lie in software engineering is that NoSQL is "schema-less." In reality, it's just schema-on-read. While the database doesn't enforce a structure, your application code absolutely does. As your product grows, you need a strategy to evolve your data without breaking existing features.
1. The Schema-on-Read Reality
In a relational database, you change the schema by running an ALTER TABLE statement. In most NoSQL stores there is no equivalent step: you just start writing new fields. However, your code must now handle both the "old" and "new" versions of a document.
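As a minimal sketch of what "handling both versions" can look like, the reader below accepts either shape. The field names (`name`, `first_name`, `last_name`) and the `display_name` helper are hypothetical, chosen only for illustration.

```python
from typing import Any


def display_name(doc: dict[str, Any]) -> str:
    """Return a display name from either document shape.

    Assumed shapes (hypothetical): old documents store a single "name"
    string; newer documents store "first_name" and "last_name".
    """
    if "first_name" in doc or "last_name" in doc:
        # New shape: join whichever parts are present.
        parts = [doc.get("first_name", ""), doc.get("last_name", "")]
        return " ".join(p for p in parts if p)
    # Old shape, or a document with no name at all.
    return doc.get("name", "")


old_doc = {"_id": 1, "name": "Ada Lovelace"}
new_doc = {"_id": 2, "first_name": "Grace", "last_name": "Hopper"}

print(display_name(old_doc))
print(display_name(new_doc))
```

Every reader of the document needs a branch like this, which is why the strategies that follow try to limit how long both shapes coexist.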
2. Strategy: Lazy Migration (The "On-the-Fly" approach)
This is a common strategy for document-oriented stores like MongoDB or DynamoDB.
- The Process: When a document is read, the application checks for the new fields. If they are missing, it adds them with default values. When the document is saved back to the database, it's saved in the new format.
- Pros: Zero downtime, spreads the migration load over time.
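- Cons: You have "dirty" data for a long time, because a document is only upgraded when something reads it; logic for both schemas must live in your code until every document has been rewritten, which for rarely read documents may be indefinitely.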
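The process above can be sketched as follows. This is an illustration under stated assumptions, not a driver-specific implementation: a plain dict stands in for the database, and the names `NEW_FIELD_DEFAULTS`, `upgrade`, and `load_user` are hypothetical.

```python
from typing import Any

# Hypothetical defaults for fields introduced after the first release.
NEW_FIELD_DEFAULTS: dict[str, Any] = {"email_verified": False, "locale": "en"}

# A plain dict stands in for the collection or table, keyed by document id.
store: dict[str, dict[str, Any]] = {
    "u1": {"_id": "u1", "email": "a@example.com"},  # old shape
    "u2": {"_id": "u2", "email": "b@example.com",
           "email_verified": True, "locale": "fr"},  # already new shape
}


def upgrade(doc: dict[str, Any]) -> tuple[dict[str, Any], bool]:
    """Return a copy with missing new fields filled in, plus a changed flag."""
    upgraded = dict(doc)
    changed = False
    for field, default in NEW_FIELD_DEFAULTS.items():
        if field not in upgraded:
            upgraded[field] = default
            changed = True
    return upgraded, changed


def load_user(user_id: str) -> dict[str, Any]:
    """Read a document, upgrading it and persisting the new format if needed."""
    doc, changed = upgrade(store[user_id])
    if changed:
        store[user_id] = doc  # write back so the next read sees the new shape
    return doc


user = load_user("u1")
print(user)
```

This sketch writes the upgraded document back as soon as it is read. Deferring the write until the next regular save is an equally valid variant and avoids turning reads into writes.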
3. Strategy: The Expand-Contract Pattern
Use this for breaking changes, such as changing a field's type or structure, where you can't just add a field.
- Expand: Add the new field and start writing to both the old and new fields.
- Migrate: Run a background script to copy data from the old field to the new field for all existing documents.
- Switch: Change the application code to read only from the new field.
- Contract: Stop writing to the old field, then delete it from the database.
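The four phases can be sketched in a few functions. The example assumes a hypothetical breaking change, replacing a float `price` field with an integer `price_cents` field, and again uses an in-memory dict in place of a real database. In practice each phase would ship as a separate deployment.

```python
from typing import Any

# In-memory stand-in for a product collection (hypothetical data).
store: dict[str, dict[str, Any]] = {
    "p1": {"_id": "p1", "price": 19.99},
    "p2": {"_id": "p2", "price": 5.0},
}


def to_cents(price: float) -> int:
    """Convert a price in currency units to integer cents."""
    return round(price * 100)


# Expand: every write populates both the old and the new field.
def save_price(product_id: str, price: float) -> None:
    doc = store.setdefault(product_id, {"_id": product_id})
    doc["price"] = price
    doc["price_cents"] = to_cents(price)


# Migrate: a background job backfills documents that lack the new field.
def backfill() -> int:
    migrated = 0
    for doc in store.values():
        if "price_cents" not in doc and "price" in doc:
            doc["price_cents"] = to_cents(doc["price"])
            migrated += 1
    return migrated


# Switch: readers use only the new field.
def read_price_cents(product_id: str) -> int:
    return store[product_id]["price_cents"]


# Contract: once nothing reads or writes the old field, remove it.
def drop_old_field() -> None:
    for doc in store.values():
        doc.pop("price", None)


save_price("p3", 2.5)   # written during the Expand phase
migrated = backfill()   # p1 and p2 still need the new field
drop_old_field()
print(migrated, read_price_cents("p1"))
```

The backfill is idempotent: it skips documents that already have `price_cents`, so it can be interrupted and rerun safely.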
4. Strategy: Schema Versioning
Add a version field to every document (e.g., "schema_version": 2).
- Your application uses a factory pattern or migration middleware to transform the raw document into the current expected object model based on its version number.
- This is the cleanest approach for long-lived systems with multiple breaking changes.
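One way to sketch the migration step is a chain of small functions, each upgrading a document by exactly one version. The version numbers, field names, and the `to_current` helper are hypothetical, and treating a missing `schema_version` as version 1 is an assumption of this sketch.

```python
from typing import Any, Callable

CURRENT_VERSION = 3

Doc = dict[str, Any]


def v1_to_v2(doc: Doc) -> Doc:
    # Hypothetical change: v2 splits "name" into first and last name.
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    return doc


def v2_to_v3(doc: Doc) -> Doc:
    # Hypothetical change: v3 nests contact details under "contact".
    doc["contact"] = {"email": doc.pop("email", None)}
    return doc


# Maps a version number to the function that upgrades it by one step.
MIGRATIONS: dict[int, Callable[[Doc], Doc]] = {1: v1_to_v2, 2: v2_to_v3}


def to_current(raw: Doc) -> Doc:
    """Upgrade a raw document step by step to CURRENT_VERSION."""
    doc = dict(raw)  # shallow copy so the caller's top-level dict is untouched
    version = doc.get("schema_version", 1)  # assumption: unversioned means v1
    if version > CURRENT_VERSION:
        raise ValueError(f"unknown schema_version {version}")
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
        doc["schema_version"] = version
    return doc


legacy = {"name": "Ada Lovelace", "email": "ada@example.com"}
result = to_current(legacy)
print(result)
```

Because each function handles a single step, adding version 4 means writing one new function and registering it, without touching the older migrations.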
5. Handling Deletions and Renames
- Renames: Treat a rename as an "Add New + Delete Old" operation using the Expand-Contract pattern.
- Deletions: Never truly delete a field immediately. Mark it as deprecated in your code first, then remove it only after you're sure no legacy systems (like old mobile app versions) are still using it.
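A deprecation period can be made visible in code. The sketch below assumes a hypothetical rename from `username` to `handle`: the reader prefers the new field, falls back to the old one, and emits a standard `DeprecationWarning` so remaining uses of the old field show up in logs or test runs.

```python
import warnings
from typing import Any, Optional


def read_handle(doc: dict[str, Any]) -> Optional[str]:
    """Read the renamed field, falling back to the deprecated one."""
    if "handle" in doc:
        return doc["handle"]
    if "username" in doc:
        # Old field still in use: flag it rather than fail.
        warnings.warn(
            "'username' is deprecated; use 'handle'",
            DeprecationWarning,
            stacklevel=2,
        )
        return doc["username"]
    return None


print(read_handle({"handle": "ada"}))
print(read_handle({"username": "grace"}))
```

Once the warning has stopped firing for long enough to cover the oldest supported client, the fallback branch and the old field can be removed.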
Summary
Schema evolution in NoSQL requires more discipline than in SQL because the database won't stop you from making mistakes. By using versioning and the expand-contract pattern, you can ensure your data remains a clean, reliable asset rather than a tangled web of legacy documents.
