System Design

System Design: Building an Audit Log System for Compliance and Debugging

Design a production audit log system with immutable events, schema design, write paths, search, retention, tamper resistance, PII controls, partitioning, and compliance tradeoffs.

Sachin Sarawgi·April 7, 2026·5 min read
#system design#audit logs#compliance#security#event architecture#backend engineering

Audit logs answer a simple question: who did what, to which resource, from where, and when?

That question matters during security investigations, customer support, compliance audits, data recovery, and debugging. A normal application log is not enough. Application logs are optimized for engineers. Audit logs are product and compliance records.

An audit log system must be durable, queryable, immutable, and careful with sensitive data.

Requirements

Functional requirements:

  • record user actions
  • record system actions
  • show audit history for a resource
  • search by actor, action, resource, tenant, and time range
  • export logs for compliance
  • retain logs based on policy

Non-functional requirements:

  • high write availability
  • append-only behavior
  • tamper resistance
  • low-latency search for recent data
  • cheap storage for old data
  • PII minimization
  • tenant isolation

Event Schema

A practical audit event:

{
  "eventId": "evt_01HXYZ",
  "tenantId": "t_123",
  "actor": {
    "type": "USER",
    "id": "u_456",
    "emailHash": "f2a1..."
  },
  "action": "ROLE_ASSIGNED",
  "resource": {
    "type": "USER_ROLE",
    "id": "role_admin"
  },
  "result": "SUCCESS",
  "ipAddress": "203.0.113.10",
  "userAgent": "Mozilla/5.0",
  "requestId": "req_789",
  "occurredAt": "2025-07-24T10:15:30Z",
  "metadata": {
    "targetUserId": "u_999"
  }
}

Avoid storing raw sensitive data when a stable hash is enough. For example, emailHash may be enough for investigation without storing the full email in the audit stream.

Write Path

There are two common approaches.

Synchronous Write

The API writes audit logs inside the request path:

userRoleService.assignRole(userId, role);
auditLogService.record(RoleAssignedEvent.from(userId, role));

This is simple but risky. If the audit log store is slow, the product action becomes slow. If audit logging fails, do you fail the user request? For compliance-critical actions, maybe yes. For lower-risk actions, maybe no.

Asynchronous Write

The API publishes an event and an audit consumer persists it:

@Transactional
public void assignRole(String userId, String role) {
    roleRepository.assign(userId, role);
    outboxRepository.save(AuditEvent.roleAssigned(userId, role));
}

Then a publisher sends the audit event to Kafka:

application -> outbox table -> Kafka -> audit-log-service -> storage

This avoids losing audit events when the app crashes after the business transaction commits.

Storage Model

Audit logs are append-heavy. A relational table works well for moderate volume:

CREATE TABLE audit_events (
  event_id UUID PRIMARY KEY,
  tenant_id VARCHAR(128) NOT NULL,
  actor_type VARCHAR(50) NOT NULL,
  actor_id VARCHAR(128) NOT NULL,
  action VARCHAR(100) NOT NULL,
  resource_type VARCHAR(100) NOT NULL,
  resource_id VARCHAR(128) NOT NULL,
  result VARCHAR(20) NOT NULL,
  occurred_at TIMESTAMP NOT NULL,
  request_id VARCHAR(128),
  ip_address INET,
  metadata JSONB NOT NULL DEFAULT '{}'
);

CREATE INDEX idx_audit_resource_time
  ON audit_events (tenant_id, resource_type, resource_id, occurred_at DESC);

CREATE INDEX idx_audit_actor_time
  ON audit_events (tenant_id, actor_id, occurred_at DESC);

For high-volume systems, use a two-tier model:

  • PostgreSQL or OpenSearch for recent searchable events
  • S3/Glacier for long-term retention

Search Design

Common access patterns:

  • "Show all changes to user u_123"
  • "Show all actions by admin a_456 last week"
  • "Show failed login attempts for tenant t_1"
  • "Export all permission changes for Q2"

OpenSearch mapping should keep fields structured:

{
  "mappings": {
    "properties": {
      "tenantId": { "type": "keyword" },
      "actor.id": { "type": "keyword" },
      "action": { "type": "keyword" },
      "resource.type": { "type": "keyword" },
      "resource.id": { "type": "keyword" },
      "occurredAt": { "type": "date" },
      "metadata": { "type": "flattened" }
    }
  }
}

Do not index every nested metadata field dynamically forever. Mapping explosion is a real production problem.

Immutability and Tamper Resistance

Audit logs should be append-only. Application code should not update or delete individual events.

At the database layer:

REVOKE UPDATE, DELETE ON audit_events FROM app_user;
GRANT INSERT, SELECT ON audit_events TO app_user;

For stronger tamper evidence, add hash chaining:

{
  "eventId": "evt_2",
  "payloadHash": "hash(current_payload)",
  "previousHash": "hash(evt_1)"
}

If someone modifies an old event, the chain breaks. This is not a replacement for access control, but it helps detect tampering.

For regulated environments, write old logs to S3 with Object Lock/WORM retention.

Retention and PII

Retention is a policy decision. Do not keep audit logs forever by default.

Example:

Event Type Retention
Authentication events 1 year
Permission changes 7 years
Billing changes 7 years
Debug-only admin views 90 days

PII rules:

  • store IDs instead of names/emails where possible
  • hash sensitive values used only for matching
  • encrypt long-term archives
  • restrict who can search audit logs
  • log access to the audit log itself

Production Checklist

  • Define audit-worthy actions explicitly
  • Use an append-only event schema
  • Write through outbox for critical actions
  • Store structured fields, not free-form strings only
  • Index by tenant, actor, resource, action, and time
  • Separate recent search storage from long-term archive
  • Minimize PII
  • Make audit log access itself auditable
  • Add retention policies
  • Consider hash chaining or WORM storage for tamper evidence

An audit log system is not just a compliance checkbox. It is the memory of your product. Design it as a reliable event system, and it will pay for itself during the first serious investigation.

Read Next

📚

Recommended Resources

System Design Interview — Alex XuBest Seller

Step-by-step guide to ace system design interviews with real-world examples.

View on Amazon
Grokking System Design on Educative

Interactive course teaching system design with visual diagrams and practice problems.

View Course
Designing Data-Intensive Applications

Martin Kleppmann's book is essential reading for any system design role.

View on Amazon

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Found this useful? Share it: