System Design · Expert · Part 4 of 7 in Reliability Engineering Mastery

Distributed Garbage Collection: Managing References Across Networks

How do you manage reference counting in a microservices environment? Deep dive into distributed cycles and lease-based memory management.

Sachin Sarawgi · April 20, 2026 · 3 min read

Distributed Garbage Collection

In a microservices world, if Service A creates a resource in Service B, who is responsible for deleting it? If Service A crashes, that resource leaks forever. This is the problem of distributed memory management.

1. Reference Counting vs. Leases

  • Ref Counting: A service counts how many references exist to a resource and deletes it when the count drops to zero. This is fragile; a single missed decrement leads to a permanent leak.
  • Leases: The resource is granted to a service for a fixed time (e.g., 60 seconds). If the service doesn't renew the lease, the backend automatically deletes the resource.
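To make the fragility concrete, here is a minimal in-process sketch of reference counting. The class and method names (`RefCountedStore`, `acquire`, `release`) are hypothetical, not a real service API; the point is only to show why a crashed caller leaks the resource forever:

```python
class RefCountedStore:
    """Toy resource store using distributed-style reference counting."""

    def __init__(self):
        self._refs = {}  # resource id -> reference count

    def create(self, rid):
        self._refs[rid] = 0

    def acquire(self, rid):
        self._refs[rid] += 1

    def release(self, rid):
        # If a caller crashes before calling release(), the count
        # never reaches zero and the resource leaks forever.
        self._refs[rid] -= 1
        if self._refs[rid] == 0:
            del self._refs[rid]

store = RefCountedStore()
store.create("blob-1")
store.acquire("blob-1")  # Service A takes a reference
# Service A crashes here: release() is never called,
# so "blob-1" stays in the store indefinitely.
```

A lease flips the default: instead of deletion requiring an explicit action that might never arrive, *survival* requires an explicit renewal, so a crash converges to cleanup.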

2. The Cycle Problem

If Service A depends on B, and B depends on A, you have a distributed cycle. Standard reference counting fails here: the counts never reach zero even though nothing live uses either resource. You need a global distributed garbage collector (like a mark-and-sweep collector that traverses service boundaries) or, more simply, enforced time-based TTLs on all shared resources.

3. Why this appears in real architectures

Examples:

  • a workflow engine creates temporary objects in a storage service
  • an authorization service issues delegated grants consumed by other services
  • a media pipeline creates intermediate blobs across processing stages

When ownership spans services, cleanup guarantees become unclear.

4. Lease-based strategy in practice

Leases are often the safest default:

  • creator obtains resource lease for fixed duration
  • active owner renews lease via heartbeat
  • missed renewals trigger automatic expiration cleanup

This bounds leak lifetime and removes dependence on perfect explicit delete calls.
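The renewal loop above can be sketched as follows. This is an illustrative in-process model, not a production lease service; `LeaseManager` and its methods are hypothetical names, and real systems would back this with a durable store:

```python
class LeaseManager:
    """Minimal sketch of lease-based cleanup with heartbeat renewal."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._expiry = {}  # resource id -> absolute expiry time

    def grant(self, rid, now):
        self._expiry[rid] = now + self.ttl

    def renew(self, rid, now):
        # Idempotent: renewing twice at the same instant is harmless.
        if rid in self._expiry:
            self._expiry[rid] = now + self.ttl

    def sweep(self, now):
        """Delete every resource whose lease was not renewed in time."""
        expired = [r for r, t in self._expiry.items() if t <= now]
        for r in expired:
            del self._expiry[r]
        return expired

mgr = LeaseManager(ttl_seconds=60)
mgr.grant("blob-1", now=0)
mgr.renew("blob-1", now=50)      # heartbeat arrives in time
assert mgr.sweep(now=100) == []  # lease still valid until t=110
print(mgr.sweep(now=120))        # owner went silent, resource expires
```

Note that `sweep` is safe to re-run: an already-expired resource is removed once and never reappears, which is exactly the retry-safety property the design guidelines below call for.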

5. Tombstones and deferred cleanup

Hard delete can be unsafe if references may still exist.
Many systems use:

  • soft-delete tombstone
  • grace period
  • asynchronous sweeper that verifies no active references

This pattern reduces accidental data loss during transient reference delays.
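The three steps above can be combined into one small sketch. The API (`TombstoneStore`, `soft_delete`, the `has_active_refs` callback) is hypothetical, and the grace period value is illustrative:

```python
GRACE_PERIOD = 300  # seconds; illustrative value

class TombstoneStore:
    """Soft-delete with a grace period before a sweeper hard-deletes."""

    def __init__(self):
        self._data = {}        # resource id -> payload
        self._tombstones = {}  # resource id -> time of soft delete

    def soft_delete(self, rid, now):
        if rid in self._data:
            self._tombstones[rid] = now  # mark, but keep the data

    def sweep(self, now, has_active_refs):
        """Hard-delete tombstoned entries past the grace period,
        skipping any that still have live references."""
        for rid, deleted_at in list(self._tombstones.items()):
            if now - deleted_at < GRACE_PERIOD:
                continue             # still inside the grace period
            if has_active_refs(rid):
                continue             # a reference resurfaced; keep it
            del self._data[rid]
            del self._tombstones[rid]

store = TombstoneStore()
store._data["img-1"] = b"..."
store.soft_delete("img-1", now=0)
store.sweep(now=100, has_active_refs=lambda r: False)  # within grace period
assert "img-1" in store._data                          # data survives
store.sweep(now=400, has_active_refs=lambda r: False)  # grace period over
assert "img-1" not in store._data                      # hard-deleted
```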

6. Detecting distributed reference leaks

Track:

  • orphan resource count by type
  • lease renewal failure rate
  • average resource age beyond expected TTL
  • cleanup backlog depth

Without leak telemetry, distributed GC failures surface only as storage/cost explosions.
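A leak-telemetry job can be as simple as a periodic scan over the resource inventory. The field names below (`type`, `created_at`, `owner`) are assumed for illustration; adapt them to whatever metadata your resource store actually records:

```python
def leak_metrics(resources, now, expected_ttl):
    """Compute simple leak-detection metrics from a resource inventory.

    Each resource is a dict with illustrative fields:
    {"type": ..., "created_at": ..., "owner": ...}; owner None = orphan.
    """
    orphans_by_type = {}
    past_ttl = 0
    for r in resources:
        if r["owner"] is None:  # no declared owner -> orphan
            orphans_by_type[r["type"]] = orphans_by_type.get(r["type"], 0) + 1
        if now - r["created_at"] > expected_ttl:  # older than expected
            past_ttl += 1
    return {"orphans_by_type": orphans_by_type, "past_ttl_count": past_ttl}

inventory = [
    {"type": "blob", "created_at": 0,   "owner": None},     # orphaned and old
    {"type": "blob", "created_at": 900, "owner": "svc-a"},  # healthy
]
print(leak_metrics(inventory, now=1000, expected_ttl=600))
```

Exporting these counts as gauges turns a silent storage leak into an alertable signal long before the cost report does.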

7. Handling cycles safely

For complex dependency graphs:

  • model resources as graph edges with ownership metadata
  • run periodic graph traversal to find unreachable components
  • sweep in topological order when possible

For many teams, strict TTL + explicit ownership conventions give better ROI than full global tracing GC.
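The traversal step can be sketched as a mark-and-sweep over the cross-service reference graph. The resource names and graph shape below are invented for illustration; the key property is that a cycle with no live root pointing into it is correctly identified as garbage:

```python
def find_garbage(roots, edges):
    """Mark-and-sweep over a cross-service reference graph.

    `edges` maps each resource to the set of resources it references;
    `roots` are externally reachable entry points. Anything unreachable
    from a root (including mutual A<->B cycles) is garbage.
    """
    marked = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)                   # mark phase
        stack.extend(edges.get(node, ()))
    return set(edges) - marked             # sweep candidates

# svc-a-tmp and svc-b-tmp reference each other, but nothing live
# points at them, so both are unreachable garbage:
edges = {
    "root-cfg": {"live-blob"},
    "live-blob": set(),
    "svc-a-tmp": {"svc-b-tmp"},  # distributed cycle
    "svc-b-tmp": {"svc-a-tmp"},
}
print(find_garbage(roots={"root-cfg"}, edges=edges))
```

In a real system the hard part is not this traversal but assembling a consistent snapshot of `edges` across services, which is why many teams prefer TTLs.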

8. Design guidelines

  • every resource has a declared owner
  • every shared object has expiry policy
  • renewal protocol is idempotent
  • cleanup jobs are retry-safe and observable
  • emergency manual cleanup runbook exists

Distributed GC is mostly about ownership contracts and lifecycle discipline.


Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
