System Design · Intermediate guide · Part 21 of 29 in Backend Systems Mastery

Kubernetes Networking: What Happens Between the Load Balancer and Your Pod?

A backend engineer's guide to K8s networking. Learn about Services, ClusterIP, NodePort, Ingress Controllers, and the Container Network Interface (CNI).

Sachin Sarawgi • April 20, 2026 • 3 min read

#kubernetes #networking #ingress #distributed-systems #infrastructure

On This Page

1. The Service Abstraction
2. The Ingress Controller (The Front Door)
3. CNI: The Plumbing
4. kube-proxy and traffic steering
5. North-south vs east-west traffic
6. NetworkPolicy and zero-trust segmentation
7. Common latency and timeout causes
8. Practical debugging workflow
9. Design recommendations for backend teams
Summary

Recommended Prerequisites

Kubernetes Production Best Practices

Kubernetes Networking for Backend Developers

As a backend engineer, you usually stop thinking about a request once it hits the load balancer. In Kubernetes, that is just the beginning. Understanding the network hops between the Ingress and your code is critical for debugging latency and connection timeouts.

1. The Service Abstraction

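Pods in K8s are ephemeral: when a Pod is replaced, its replacement gets a new IP address. Pointing a client at a Pod IP therefore breaks as soon as that Pod is rescheduled. A Service gives clients a stable address in front of a changing set of Pods.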

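ClusterIP: A stable virtual IP, reachable only from inside the cluster, that spreads connections across the Pods matching the Service's selector.
NodePort: Exposes the Service on the same port (by default taken from the 30000-32767 range) on every Node's IP, so traffic from outside the cluster can reach it.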
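
To make the "stable address, changing backends" idea concrete, here is a minimal toy model in Python. It is a sketch, not how Kubernetes stores this state: the `Service` class, the `replace_pod` helper, and all IP addresses are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Toy model of a ClusterIP Service: one stable virtual IP, changing backends."""
    name: str
    cluster_ip: str  # stays the same for the lifetime of the Service
    endpoints: set[str] = field(default_factory=set)  # current Pod IPs (hypothetical)

    def replace_pod(self, old_ip: str, new_ip: str) -> None:
        # A rescheduled Pod comes back with a different IP. Only the endpoint
        # set changes; the address that clients use does not.
        self.endpoints.discard(old_ip)
        self.endpoints.add(new_ip)


orders = Service("orders", "10.96.12.7", {"10.244.1.5", "10.244.2.9"})
address_before = orders.cluster_ip
orders.replace_pod("10.244.1.5", "10.244.3.14")
```

Clients that talk to `orders.cluster_ip` (or, in a real cluster, to the Service's DNS name) never notice that one backend was swapped out.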

2. The Ingress Controller (The Front Door)

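An Ingress is an API object that describes how external traffic, typically HTTP and HTTPS, is routed to Services. On its own it does nothing; an Ingress controller has to act on it.

The Controller: A proxy (such as NGINX or an Envoy-based controller) running in Pods that watches Ingress objects and implements their rules.
The Flow: External Client -> Cloud Load Balancer -> Ingress Controller -> Service -> Pod. Some controllers resolve the Service to its Pod endpoints themselves and proxy straight to a Pod IP, skipping the Service virtual IP.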
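
The routing decision an Ingress controller makes can be sketched in a few lines of Python. This is an illustrative model only, assuming host matching plus element-wise path prefixes with the longest prefix winning; the hostnames, paths, and Service names are hypothetical, and real controllers support more match types.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    host: str
    path_prefix: str
    service: str


# Hypothetical rules, as they might be derived from an Ingress object.
RULES = [
    Rule("api.example.com", "/orders", "orders-svc"),
    Rule("api.example.com", "/", "web-svc"),
]


def _prefix_match(path: str, prefix: str) -> bool:
    # Element-wise prefix match: "/orders" matches "/orders" and "/orders/1",
    # but not "/ordersx".
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(host: str, path: str, rules=RULES) -> Optional[str]:
    """Return the backend Service for the longest matching prefix on this host."""
    matches = [r for r in rules if r.host == host and _prefix_match(path, r.path_prefix)]
    if not matches:
        return None  # a real controller would serve its default backend here
    return max(matches, key=lambda r: len(r.path_prefix)).service
```

For example, `route("api.example.com", "/orders/42")` picks `orders-svc`, while `/ordersx` falls through to the catch-all `web-svc`.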

3. CNI: The Plumbing

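The Container Network Interface (CNI) is the plugin specification Kubernetes uses to give each Pod a network interface and an IP address so that Pods can talk to each other. Popular CNI plugins such as Calico and Cilium program the datapath with iptables rules or eBPF to route packets at close to native speed.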
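
One common addressing scheme, though not the only one, is to carve a cluster-wide Pod range into one block per node, so that a Pod IP alone tells you which node hosts it. The sketch below assumes a `10.244.0.0/16` cluster range, a `/24` per node, and three hypothetical node names; individual CNI plugins allocate addresses in their own ways.

```python
import ipaddress

CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")  # assumed Pod range
NODES = ["node-a", "node-b", "node-c"]                # hypothetical node names

# Hand each node the next /24 out of the cluster range.
node_cidrs = dict(zip(NODES, CLUSTER_CIDR.subnets(new_prefix=24)))


def node_for_pod(pod_ip: str):
    """Return the node whose block contains this Pod IP, or None."""
    ip = ipaddress.ip_address(pod_ip)
    for node, cidr in node_cidrs.items():
        if ip in cidr:
            return node
    return None
```

With this layout, routing between nodes reduces to "send `10.244.1.0/24` to node-b", which is the kind of rule a CNI plugin installs on each node.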

4. kube-proxy and traffic steering

Service routing is implemented through kube-proxy (or eBPF replacements), which programs node-level rules for service VIP translation.

Two important behaviors:

a backend is chosen per connection, when the connection is first established and NATed, not per request
long-lived connections (keep-alive HTTP, gRPC, database pools) therefore stay pinned to the Pod they first reached

This is why scaling replicas does not always rebalance existing traffic immediately.
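
The pinning effect is easy to reproduce in a toy model. The sketch below is not kube-proxy: `ServiceVIP` is a hypothetical class, and it uses round-robin as a deterministic stand-in for the random backend selection that kube-proxy's iptables mode performs. What it keeps faithful is the key property that a backend is chosen once, at connect time.

```python
from collections import Counter
from itertools import count


class ServiceVIP:
    """Toy per-connection balancer: a backend is picked once, at connect time."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = count()
        self.connections = {}  # connection id -> backend chosen at connect time

    def connect(self, conn_id):
        backend = self.backends[next(self._next) % len(self.backends)]
        self.connections[conn_id] = backend
        return backend

    def load(self):
        """Number of open connections per backend."""
        return Counter(self.connections.values())


vip = ServiceVIP(["pod-1", "pod-2"])
for i in range(6):
    vip.connect(f"conn-{i}")  # six long-lived connections

vip.backends.append("pod-3")  # scale from 2 to 3 replicas
load_after_scale = vip.load()
```

After scaling, `pod-1` and `pod-2` still hold three connections each and `pod-3` holds none. The new replica only receives traffic once clients open new connections, which is why connection lifetimes or client-side reconnects matter for rebalancing.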

5. North-south vs east-west traffic

Kubernetes traffic has two broad classes:

North-south: external users entering the cluster (LB/Ingress path)
East-west: service-to-service calls inside the cluster

Latency and policy controls differ between them. In systems where one external request fans out into many internal calls, backend bottlenecks often hide in east-west paths.

6. NetworkPolicy and zero-trust segmentation

By default, Kubernetes allows all pod-to-pod communication; a Pod only becomes isolated once a NetworkPolicy selects it.
Use NetworkPolicy to enforce least-privilege communication:

limit namespace/service access
block lateral movement risk
reduce blast radius during compromise

NetworkPolicy objects only take effect if the cluster's CNI plugin enforces them, so security posture depends heavily on CNI support and policy enforcement mode.
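
The evaluation rule behind NetworkPolicy ingress can be sketched as follows. This is a simplified model, assuming only label-based Pod selectors within one namespace; `IngressPolicy`, `matches`, and `ingress_allowed` are hypothetical names, and real policies also cover namespaces, IP blocks, ports, and egress.

```python
from dataclasses import dataclass


@dataclass
class IngressPolicy:
    """Simplified stand-in for a NetworkPolicy with ingress Pod selectors only."""
    selects: dict      # labels of the Pods this policy applies to
    allow_from: list   # label selectors of the sources that are allowed in


def matches(selector: dict, labels: dict) -> bool:
    # Every key/value in the selector must be present on the Pod.
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(src_labels: dict, dst_labels: dict, policies) -> bool:
    applicable = [p for p in policies if matches(p.selects, dst_labels)]
    if not applicable:
        return True  # no policy selects the destination: not isolated
    # Policies are additive: traffic passes if any applicable policy allows it.
    return any(matches(sel, src_labels) for p in applicable for sel in p.allow_from)


policies = [IngressPolicy(selects={"app": "db"}, allow_from=[{"app": "orders"}])]
```

Here `orders` Pods can reach `db` Pods, a `web` Pod cannot, and Pods that no policy selects (for example `cache`) remain open to everyone.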

7. Common latency and timeout causes

cross-zone traffic due to uneven pod scheduling
DNS resolution delays under CoreDNS load
connection tracking table pressure on busy nodes
misconfigured timeouts on the ingress controller
latency amplification from stacked sidecar, ingress, and service hops

Put observability on each hop before tuning; otherwise you are tuning blindly.

8. Practical debugging workflow

trace request path hop by hop
compare ingress, service, and app latency metrics
inspect retries and timeout mismatches between layers
validate endpoint health and pod readiness states
check CNI datapath drops and node-level saturation

This structured approach avoids "Kubernetes is slow" guesswork.
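
As a small illustration of the "hop by hop" step, the sketch below turns per-hop timestamps into per-hop latencies and names the slowest segment. The hop names and timestamps are hypothetical; in practice they would come from distributed tracing spans or access logs at each layer.

```python
def hop_latencies(timestamps_ms):
    """Given ordered (hop_name, timestamp_ms) pairs, return the time between hops."""
    return {
        f"{prev_name} -> {name}": t - prev_t
        for (prev_name, prev_t), (name, t) in zip(timestamps_ms, timestamps_ms[1:])
    }


def slowest_hop(timestamps_ms):
    """Return the label of the segment that took the longest."""
    deltas = hop_latencies(timestamps_ms)
    return max(deltas, key=deltas.get)


# Hypothetical trace of one request, in milliseconds since it hit the LB.
trace = [
    ("load-balancer", 0.0),
    ("ingress", 1.5),
    ("sidecar", 2.0),
    ("app", 42.0),
    ("response", 47.0),
]
```

For this trace, `slowest_hop(trace)` points at `sidecar -> app`, which directs attention to the last network hop into the Pod rather than to the ingress layer.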

9. Design recommendations for backend teams

keep service dependencies explicit and shallow
align timeout budgets across ingress/service/client layers
prefer readiness probes that reflect actual app readiness
use topology-aware routing for zone-local traffic when possible

Networking reliability is part of application design, not only platform team ownership.
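
Timeout alignment can be checked mechanically. The sketch below assumes each layer has a per-attempt timeout and a retry count, and flags any outer layer that can give up before the layer beneath it has exhausted its attempts. The layer names and values are hypothetical, and the function name `check_timeout_budget` is made up for this example.

```python
def check_timeout_budget(layers):
    """layers: outermost-first list of (name, per_attempt_timeout_s, retries).

    Returns (outer, inner) pairs where the outer timeout is shorter than the
    inner layer's worst case of timeout * (1 + retries).
    """
    problems = []
    for (outer, outer_timeout, _), (inner, inner_timeout, inner_retries) in zip(
        layers, layers[1:]
    ):
        worst_case_inner = inner_timeout * (1 + inner_retries)
        if outer_timeout < worst_case_inner:
            problems.append((outer, inner))
    return problems


# Hypothetical configuration: the client gives up long before the ingress does.
misaligned = [("client", 5.0, 0), ("ingress", 10.0, 1), ("app", 3.0, 2)]
aligned = [("client", 12.0, 0), ("ingress", 10.0, 0), ("app", 3.0, 2)]
```

`check_timeout_budget(misaligned)` reports the `client`/`ingress` pair: the client waits 5 seconds while the ingress may spend up to 20 seconds retrying work nobody is waiting for.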

Summary

Understanding the hop count in your K8s cluster is essential for P99 optimization. Every layer (Ingress, Service, sidecar) adds some latency, and those costs compound at the tail.
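
A back-of-the-envelope calculation shows why hop count matters at the tail. The sketch below assumes hops are slow independently of each other, which real systems only approximate; the 1% figure and the function name `p_slow_request` are illustrative.

```python
def p_slow_request(p_slow_hop: float, hops: int) -> float:
    """Probability that at least one of `hops` independent hops is slow."""
    return 1 - (1 - p_slow_hop) ** hops


# If each hop is slow 1% of the time, five hops in series make
# roughly 4.9% of requests slow.
five_hops = p_slow_request(0.01, 5)
```

Under that assumption, a per-hop P99 event turns into something closer to a per-request P95 event once five hops are chained.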


Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
