How do load-balanced applications work?

Published: March 19, 2021

Mastering Load Balancers: The Beginner’s Guide to Key Concepts and Interview Readiness


Welcome to Darren’s Tech Tutorials! If you’ve ever wondered how major websites handle millions of users without crashing, the answer almost always involves one fundamental concept: Load Balancing.

Load balancing is more than just a buzzword; it’s the traffic cop of modern computing, ensuring stability, high availability, and incredible performance for all your applications. Whether you’re scaling a small web app or preparing for a high-level system design interview, understanding these concepts is absolutely critical.

In this comprehensive guide, we’re breaking down the core concepts of load balancing into simple, actionable chunks. Let’s dive in and make you an expert on distributing traffic!

What Exactly Is a Load Balancer?

At its simplest, a Load Balancer (LB) is a hardware device or piece of software that acts as a reverse proxy, sitting in front of a group of servers (often called a server pool, server farm, or target group).

Its core job is twofold:

  1. Distribute Incoming Traffic: It intelligently spreads incoming client requests across all available healthy servers. This prevents any single server from becoming overwhelmed and crashing.
  2. Ensure High Availability: If one server fails, the Load Balancer detects the failure and redirects traffic to the remaining healthy servers, keeping the service available to users.

Think of it this way: instead of sending 100% of your requests to one server, the Load Balancer might send 10% to Server A, 10% to Server B, and so on. This distribution is the key to scalability.

The Mechanics: How Load Balancing Is Configured

While modern cloud providers make setting up a Load Balancer relatively straightforward, knowing the components involved is essential for understanding the configuration process:

  1. The Listener: This is the front door. The listener constantly checks for incoming connection requests on specific protocols (like HTTP, HTTPS, or TCP) and ports (like 80 or 443).
  2. The Target Group: This is the destination—the collection of backend servers (or “targets”) that will receive the requests. The Load Balancer monitors the health and availability of every server in this group.
  3. Routing Rules: These are the instructions that tell the Load Balancer how to distribute traffic. For example, a rule might specify: “If the request is for /api/users, send it to the User Service Target Group.”
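
To make these pieces concrete, here is a minimal Python sketch of the routing step. Everything in it is a hypothetical stand-in (the TargetGroup and Rule types, the server addresses, the /api/users prefix), not any cloud provider's actual API:

```python
from dataclasses import dataclass

@dataclass
class TargetGroup:
    name: str
    servers: list[str]  # the backend "targets" the LB monitors

@dataclass
class Rule:
    path_prefix: str          # e.g. "/api/users"
    target_group: TargetGroup

# Hypothetical setup: one target group behind a prefix-based routing rule.
user_service = TargetGroup("user-service", ["10.0.1.10:8080", "10.0.1.11:8080"])
rules = [Rule("/api/users", user_service)]

def route(path: str) -> TargetGroup | None:
    """Return the target group of the first rule whose prefix matches the path."""
    for rule in rules:
        if path.startswith(rule.path_prefix):
            return rule.target_group
    return None  # a real LB would fall back to a default group or reject the request

print(route("/api/users/42").name)  # -> user-service
```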

Load Balancing Algorithms: Who Gets the Request?

The algorithm is the brains of the Load Balancer—it determines which backend server receives the next incoming request. Choosing the right algorithm can drastically affect performance.

Here are the two most common algorithms you should know:

1. Round Robin

This is the simplest method: the Load Balancer rotates requests sequentially among the servers.

  • Request 1 goes to Server A.
  • Request 2 goes to Server B.
  • Request 3 goes to Server C.
  • Request 4 goes back to Server A.

Pros: Extremely simple to implement, and it spreads request volume evenly across the pool.

Cons: It doesn’t account for server capacity or current load. If Server A is already handling 10 long-running database queries and Server B is idle, Round Robin still sends Server A its full share of new requests.
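
As a quick illustration, here is a minimal Round Robin sketch in Python; the server names are placeholders, and itertools.cycle provides the strict rotation described above:

```python
from itertools import cycle

# Placeholder backend addresses.
servers = ["server-a:8080", "server-b:8080", "server-c:8080"]
rotation = cycle(servers)

def next_server() -> str:
    """Return the next server in strict rotation, regardless of its current load."""
    return next(rotation)

for request_id in range(1, 5):
    print(f"Request {request_id} -> {next_server()}")
# Request 4 wraps back around to server-a, exactly as in the list above.
```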

2. Least Connection

This is a dynamic algorithm that is far more intelligent. It directs the new request to the server that currently has the fewest active connections.

Pros: Excellent for environments where server processing power or request complexity varies greatly, because new work naturally flows toward the least-busy servers.

Cons: The Load Balancer must actively track the connection state of every server, which requires slightly more processing power than simpler methods.
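
Here is a sketch of the selection logic, assuming the Load Balancer already maintains a live count of active connections per server (the counts below are made up):

```python
# Hypothetical live connection counts the Load Balancer would be tracking.
active_connections = {"server-a": 12, "server-b": 3, "server-c": 7}

def pick_least_connected() -> str:
    """Choose the server with the fewest active connections right now."""
    return min(active_connections, key=active_connections.get)

chosen = pick_least_connected()   # -> "server-b"
active_connections[chosen] += 1   # count the new connection...
# ...and decrement it again when the client disconnects.
```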

Health Checks and High Availability

The greatest feature of a Load Balancer is resilience, and that resilience relies entirely on health checks.

A health check is a regular, automated process where the Load Balancer pings or attempts to connect to a backend server. If the server doesn’t respond correctly within a set time, the Load Balancer assumes it is unhealthy and takes it out of the rotation immediately.
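
A single health-check pass might look like the following Python sketch. The /health endpoint, server addresses, and two-second timeout are assumptions; a real Load Balancer runs checks on a schedule and usually requires several consecutive failures before evicting a server:

```python
import urllib.request

SERVERS = ["10.0.1.10:8080", "10.0.1.11:8080", "10.0.1.12:8080"]

def is_healthy(server: str, timeout: float = 2.0) -> bool:
    """Probe an assumed /health endpoint; a timeout or non-200 reply counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"http://{server}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, etc.
        return False

# Only servers that passed the check stay in the rotation.
healthy_pool = [s for s in SERVERS if is_healthy(s)]
```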

Why Health Checks Matter

Imagine an e-commerce site during a holiday sale. If one server suddenly runs out of memory, the Load Balancer will detect the failure within a few failed check intervals (typically a matter of seconds) and stop sending customer requests to the downed server. When the server recovers and starts passing health checks again, the Load Balancer automatically places it back into the target group. This mechanism is the bedrock of high availability.

Understanding Session Stickiness (Affinity)

Most modern web applications are “stateless,” meaning every request is independent. However, certain applications (like complex shopping carts or personalized login sessions) require the user to interact with the same server for the duration of their session. This is where session stickiness, or session affinity, comes in.

If a user adds items to a cart, and their next request is routed to a different server that has no memory of the previous actions, the cart will appear empty—a terrible user experience!

To solve this, the Load Balancer uses cookies or source IP addresses to ensure that once a user establishes a connection with Server A, all subsequent requests from that user are routed back to Server A for a defined period.
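
One simple way to implement source-IP affinity is to hash the client’s IP onto the server list, as in this Python sketch. Note the trade-off: if the pool grows or shrinks, clients get remapped, which is one reason cookie-based stickiness is common in practice:

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # placeholder pool

def sticky_server(client_ip: str) -> str:
    """Map a client IP to the same backend on every request (source-IP affinity)."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# The same client always lands on the same server...
assert sticky_server("203.0.113.7") == sticky_server("203.0.113.7")
# ...but adding or removing a server reshuffles the mapping.
```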

Load Balancing in System Design

Load balancing is rarely a standalone component; it’s central to modern, scalable system architecture.

The Two-Tier Application Model

In a simplified two-tier model (Web Server and Database), the Load Balancer typically sits in front of the web servers. The web servers are scaled horizontally (meaning you add more identical servers), and the Load Balancer ensures they all share the user load equally.

The Power of DNS Health Checking

While ordinary health checks monitor individual servers, large-scale distributed systems often use DNS Load Balancing to distribute traffic across wide geographic regions.

If an entire data center goes offline (a major regional failure), a Load Balancer inside that data center can’t help. Instead, DNS health checking detects the outage and routes users to the closest healthy data center, providing an outer layer of disaster recovery and geographic optimization.
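
Conceptually, the DNS layer’s decision boils down to “nearest data center that is still passing health checks.” A toy Python sketch, with entirely made-up latency and health data:

```python
# Made-up latencies (ms) from one client's vantage point to each data center.
LATENCY_MS = {"us-east-1": 12, "eu-west-1": 95, "ap-south-1": 210}
HEALTHY = {"us-east-1": False, "eu-west-1": True, "ap-south-1": True}  # us-east-1 is down

def resolve_datacenter() -> str:
    """Return the lowest-latency data center that is currently passing health checks."""
    candidates = [dc for dc in LATENCY_MS if HEALTHY[dc]]
    return min(candidates, key=LATENCY_MS.get)

print(resolve_datacenter())  # -> eu-west-1: traffic fails over past the dead region
```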

Ready to Scale Your Applications?

Load balancing is an indispensable tool in the world of cloud computing and modern application architecture. Whether you are building a resilient service or gearing up to ace that system design interview, a solid grasp of these core concepts—from algorithms and health checks to session stickiness—will set you apart.

The best way to truly understand load balancing is to get hands-on! Spin up a small server farm on your favorite cloud platform and start experimenting with different algorithms and configurations.

If you found this breakdown helpful, please hit that Like button, subscribe to Darren’s Tech Tutorials for more clear tech guides, and let me know in the comments which load balancing algorithm you prefer to use!