System Design: Building a Feature Flag Platform

A feature flag platform decouples deployment from release. Code can be deployed safely while the feature remains disabled, then enabled gradually for internal users, beta customers, 1% of traffic, 10% of traffic, and eventually everyone.

At small scale, feature flags are a config file. At production scale, they become release infrastructure. They need local evaluation, streaming updates, targeting rules, percentage rollouts, audit logs, approval workflows, kill switches, experiment assignment, tenant isolation, and lifecycle cleanup.

This guide designs a production feature flag platform. It focuses on the architecture behind tools like LaunchDarkly, Flagsmith, Unleash, and internal progressive delivery systems.

Requirements

Functional requirements:

create boolean and multivariate flags
target users, tenants, regions, environments, and plans
support percentage rollouts
support kill switches
evaluate flags from application SDKs
update SDKs quickly when flags change
record audit history
support experiments
support flag cleanup and ownership

Non-functional requirements:

low-latency evaluation
high availability during flag service outages
predictable consistency
safe defaults
tenant isolation
strong auditability
protection against targeting mistakes
scalable SDK update distribution

The most important design constraint: application code should not call a remote feature flag API for every request. Flag evaluation must happen locally in the SDK from an in-memory snapshot.

High-Level Architecture

Admin UI
  |
  v
Flag API
  |
  +-- validates changes
  +-- writes flag config
  +-- records audit event
  +-- publishes config update
        |
        v
Flag store -> update stream -> SDKs
                          |
                          v
                    local evaluation

The runtime path should be:

application -> SDK local cache -> evaluate flag -> return variation

Not:

application -> flag service -> database -> evaluate flag -> return variation

Remote evaluation on every request adds latency and creates a dependency that can break your entire application during a flag service outage.

Flag Data Model

A flag is a versioned configuration object:

{
  "key": "new-checkout-flow",
  "project": "commerce",
  "environment": "production",
  "type": "boolean",
  "enabled": true,
  "defaultVariation": false,
  "variations": [
    { "key": "off", "value": false },
    { "key": "on", "value": true }
  ],
  "rules": [
    {
      "id": "rule_enterprise_beta",
      "conditions": [
        { "attribute": "plan", "operator": "equals", "value": "enterprise" },
        { "attribute": "betaUser", "operator": "equals", "value": true }
      ],
      "serve": { "variation": "on" }
    }
  ],
  "fallthrough": {
    "rollout": [
      { "variation": "on", "weight": 1000 },
      { "variation": "off", "weight": 9000 }
    ],
    "bucketBy": "userId"
  },
  "version": 42,
  "updatedAt": "2026-04-08T10:15:30Z"
}

Weights use basis points: 1000 means 10%, 9000 means 90%. This avoids floating point mistakes.

Storage Schema

CREATE TABLE feature_flags (
  flag_id UUID PRIMARY KEY,
  project TEXT NOT NULL,
  environment TEXT NOT NULL,
  flag_key TEXT NOT NULL,
  flag_type TEXT NOT NULL,
  enabled BOOLEAN NOT NULL DEFAULT false,
  config JSONB NOT NULL,
  version BIGINT NOT NULL DEFAULT 1,
  owner_team TEXT NOT NULL,
  expires_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (project, environment, flag_key)
);

CREATE TABLE feature_flag_audit_events (
  event_id UUID PRIMARY KEY,
  flag_id UUID NOT NULL REFERENCES feature_flags(flag_id),
  actor_id TEXT NOT NULL,
  action TEXT NOT NULL,
  old_version BIGINT,
  new_version BIGINT,
  diff JSONB NOT NULL,
  reason TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

The config JSON contains targeting rules and rollout configuration. The relational columns make common lookups and governance queries cheap.

Examples:

"show all production flags owned by checkout"
"show expired flags"
"show all flags changed today"
"show all disabled kill switches"

Evaluation Context

SDKs evaluate a flag against a context:

{
  "userId": "u_123",
  "tenantId": "tenant_abc",
  "email": "sachin@example.com",
  "plan": "enterprise",
  "region": "us-east-1",
  "country": "US",
  "betaUser": true,
  "appVersion": "4.12.0"
}

Do not require every service to send every attribute. Targeting rules must handle missing attributes.

Example rule evaluation:

type EvaluationContext = Record<string, string | number | boolean | undefined>;

type Condition = {
  attribute: string;
  operator: "equals" | "not_equals" | "in" | "starts_with";
  value: string | number | boolean | Array<string | number | boolean>;
};

function conditionMatches(condition: Condition, context: EvaluationContext): boolean {
  const actual = context[condition.attribute];

  switch (condition.operator) {
    case "equals":
      return actual === condition.value;
    case "not_equals":
      return actual !== condition.value;
    case "in":
      return Array.isArray(condition.value) && condition.value.includes(actual as never);
    case "starts_with":
      return typeof actual === "string"
        && typeof condition.value === "string"
        && actual.startsWith(condition.value);
  }
}

Keep the rule language intentionally small. A feature flag platform is not a general-purpose programming language.

Percentage Rollouts

Percentage rollouts must be deterministic. A user should not bounce between on/off across requests.

function bucketUser(input: {
  flagKey: string;
  bucketByValue: string;
  salt: string;
}): number {
  const hashInput = `${input.flagKey}:${input.bucketByValue}:${input.salt}`;
  const hash = murmur3(hashInput);
  return Math.abs(hash) % 10000; // 0..9999
}

function evaluateRollout(
  bucket: number,
  rollout: Array<{ variation: string; weight: number }>
): string {
  let cumulative = 0;

  for (const allocation of rollout) {
    cumulative += allocation.weight;
    if (bucket < cumulative) {
      return allocation.variation;
    }
  }

  return rollout[rollout.length - 1].variation;
}

If 10% of users are enabled and you increase to 20%, the original 10% should remain enabled. Deterministic hashing gives you that property.

SDK Local Evaluation

The SDK keeps an in-memory snapshot:

class FeatureFlagClient {
  private snapshot: Map<string, FlagConfig> = new Map();

  constructor(private defaults: Record<string, unknown>) {}

  updateSnapshot(flags: FlagConfig[]) {
    this.snapshot = new Map(flags.map(flag => [flag.key, flag]));
  }

  variation<T>(flagKey: string, defaultValue: T, context: EvaluationContext): T {
    const flag = this.snapshot.get(flagKey);
    if (!flag || !flag.enabled) {
      return defaultValue;
    }

    try {
      return evaluateFlag(flag, context) as T;
    } catch (error) {
      recordFlagEvaluationError(flagKey, error);
      return defaultValue;
    }
  }
}

Evaluation failure must return the caller-provided default. The default should be safe for the application.

For a release flag, safe default is usually old behavior. For a kill switch, safe default may be disabled. Make defaults explicit in code.

Propagating Updates

Common update strategies:

| Strategy | Latency | Complexity | Notes | |---|---|---| | Polling | seconds to minutes | low | simple and robust | | Server-Sent Events | sub-second to seconds | medium | good for server SDKs | | WebSocket | sub-second | medium | useful for bidirectional needs | | CDN snapshot | seconds | low/medium | good for browser/mobile SDKs |

A practical pattern:

server-side SDKs use streaming with polling fallback
client-side SDKs use CDN snapshots
all SDKs persist last-known-good snapshot

If the flag service is down, applications should keep using the last-known-good snapshot. They should not suddenly forget every flag and change behavior across the fleet.

Consistency Tradeoffs

Feature flag systems are usually eventually consistent.

That means:

a kill switch may take a few seconds to propagate
different app instances may briefly evaluate different versions
SDKs need version tracking
audit logs should show when the change was made, not when every SDK received it

For critical kill switches, add a faster emergency path:

admin kill switch -> high-priority update stream -> SDK immediate refresh

You can also separate emergency ops flags from experiment flags. Ops flags need faster propagation and stricter approval than UI experiments.

Audit And Approval

Every production flag change should be auditable:

{
  "eventId": "evt_123",
  "flagKey": "new-checkout-flow",
  "environment": "production",
  "actorId": "u_456",
  "action": "ROLLOUT_UPDATED",
  "oldVersion": 41,
  "newVersion": 42,
  "diff": {
    "fallthrough.rollout[on].weight": { "old": 1000, "new": 2500 }
  },
  "reason": "Increase rollout after stable 1% canary",
  "createdAt": "2026-04-08T10:15:30Z"
}

Require approval for:

production kill switch changes
rollout above a threshold
targeting all tenants
permission or billing flags
flags owned by another team

Do not require approval for every development flag update, or teams will route around the platform.

Experiments

Experiments need stable assignment and analytics.

Experiment assignment:

{
  "experimentKey": "checkout-button-copy",
  "userId": "u_123",
  "variation": "treatment",
  "flagVersion": 42,
  "assignedAt": "2026-04-08T10:15:30Z"
}

Emit exposure events only when the user actually sees the variant:

const variant = flags.variation("checkout-button-copy", "control", context);

renderButton(variant);

analytics.track("experiment_exposure", {
  experimentKey: "checkout-button-copy",
  variant,
  userId: context.userId,
  flagVersion: flags.version("checkout-button-copy"),
});

Do not count exposure just because a backend evaluated a flag if the user never saw the UI.

Kill Switches

Kill switches should be designed differently from ordinary release flags.

Properties:

safe default is usually disabled behavior
fast propagation
clear owner
clear runbook
prominent UI placement
audit event with incident reference

Example:

if (!featureFlags.getBooleanValue("payments.card-processing.enabled", true, context)) {
    return paymentFallbackService.queueForManualReview(request);
}

return cardProcessor.charge(request);

The fallback path must be tested. A kill switch that flips to broken code is theater.

Multi-Tenancy

Feature flag platforms need tenant isolation:

project-level access controls
environment-level access controls
tenant targeting limits
audit events per tenant-affecting change
rate limits on SDK polling/streaming
snapshot size limits

If you support customer-specific flags, avoid creating one-off flags for every customer forever. Prefer targeting rules with tenant attributes and expiration dates.

Flag Lifecycle

Flag debt is real. Every flag adds branches and test combinations.

Lifecycle:

created
internal testing
beta rollout
percentage rollout
fully enabled
code cleanup required
archived

Add lifecycle fields:

ALTER TABLE feature_flags
ADD COLUMN lifecycle_stage TEXT NOT NULL DEFAULT 'CREATED',
ADD COLUMN cleanup_owner TEXT,
ADD COLUMN cleanup_due_at TIMESTAMPTZ;

Weekly cleanup query:

SELECT flag_key, owner_team, cleanup_due_at
FROM feature_flags
WHERE environment = 'production'
  AND lifecycle_stage = 'CLEANUP_REQUIRED'
  AND cleanup_due_at < now()
ORDER BY cleanup_due_at ASC;

If a release flag is fully enabled for weeks, remove the old code path and archive the flag.

Failure Modes

Remote evaluation in request path. Every request depends on the flag service and latency grows.

Bad default value. Flag service outage changes application behavior because defaults were unsafe.

Stale SDK snapshot. An instance misses updates and keeps evaluating an old version.

Targeting typo. A rule targets all users instead of beta users.

Cardinality explosion. Experiment exposure events include high-cardinality or sensitive attributes.

Flag debt. Old flags stay in code forever and create untested combinations.

Kill switch not tested. The emergency path fails when it is finally needed.

Client-side secret leakage. Browser/mobile SDK gets rules or attributes that should only be server-side.

Production Checklist

Evaluate flags locally in SDKs.
Use last-known-good snapshots.
Provide polling fallback for streaming.
Make defaults explicit and safe.
Use deterministic hashing for rollouts.
Keep the targeting language small.
Audit every production flag change.
Require approval for high-risk changes.
Separate kill switches from experiments.
Emit experiment exposure only when actually shown.
Track SDK snapshot version and staleness.
Add expiration and owner metadata.
Review stale flags weekly.
Protect server-only targeting rules from client SDKs.
Test fallback paths for kill switches.

System Design: Building a Feature Flag Platform

Requirements

High-Level Architecture

Flag Data Model

Storage Schema

Evaluation Context

Percentage Rollouts

SDK Local Evaluation

Propagating Updates

Consistency Tradeoffs

Audit And Approval

Experiments

Kill Switches

Multi-Tenancy

Flag Lifecycle

Failure Modes

Production Checklist

Read Next

Recommended Resources

Sachin Sarawgi

Related Articles

System Design: Building a Distributed Cache

System Design: Building a Distributed Configuration Platform

System Design: Building a Fraud Detection Platform

System Design: Building a Feature Flag Platform

Requirements

High-Level Architecture

Flag Data Model

Storage Schema

Evaluation Context

Percentage Rollouts

SDK Local Evaluation

Propagating Updates

Consistency Tradeoffs

Audit And Approval

Experiments

Kill Switches

Multi-Tenancy

Flag Lifecycle

Failure Modes

Production Checklist

Read Next

Recommended Resources

Get the next backend guide in your inbox

Sachin Sarawgi

Related Articles

System Design: Building a Distributed Cache

System Design: Building a Distributed Configuration Platform

System Design: Building a Fraud Detection Platform