High Availability & Scalability in AWS¶
Scalability means that an application / system can handle greater workloads by adapting. There are 2 types of scalability:
- Horizontal Scalability (Elasticity) - Provision additional servers quickly based on demand
- Vertical Scalability - Increase the size / resources of an existing server
High Availability means running an application / system in multiple AZs so that it can survive a data center wide outage. HA can be passive (Eg: RDS Multi AZ) or active (Eg: Horizontal scaling).
Load balancers are servers that forward traffic to multiple servers / EC2 instances downstream. Load balancers are useful for the following use cases:
- Spread load across multiple downstream instances
- Expose a single point of access (DNS) to your application
- Seamlessly handle failure of downstream instances
- Regular health check & SSL termination for websites.
- Enforce stickiness with cookies
- Enable high availability across zones
- Separate public traffic from private traffic.
Elastic Load Balancer - ELB¶
A managed load balancer. AWS takes care of uptime, upgrades, maintenance & high availability. Setting up your own load balancer costs less, but it takes a lot more effort on your end and is very difficult to manage from a scalability perspective.
Integrated with many AWS offerings:
- EC2, EC2 Auto Scaling Groups, Amazon ECS
- AWS Certificate Manager (ACM), CloudWatch
- Route 53, AWS WAF, AWS Global Accelerator
Health checks enable the load balancer to know if the instances it forwards traffic to are available to reply to requests. Health checks are done on a port & route.
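A minimal sketch of how a health check on a port and route could track target health. The `/health` route, port `8080`, and both thresholds are hypothetical values chosen for illustration, not settings taken from these notes or from AWS defaults.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    port: int
    path: str
    healthy_threshold: int = 2    # assumed: consecutive successes needed to become healthy
    unhealthy_threshold: int = 2  # assumed: consecutive failures needed to become unhealthy
    healthy: bool = False
    _streak: int = 0              # consecutive results that disagree with the current state

    def record(self, status_code: int) -> bool:
        """Record one probe result and return the target's current health."""
        ok = 200 <= status_code < 300
        if ok == self.healthy:
            self._streak = 0      # result agrees with current state, nothing to change
        else:
            self._streak += 1
            needed = self.healthy_threshold if ok else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

Requiring several consecutive results before flipping state keeps a single slow response from pulling an instance out of rotation.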
Application Load Balancer (ALB)¶
Protocols: HTTP, HTTPS, WebSocket
- ALB is layer 7 (HTTP)
- Load balancing to multiple http applications across machines (target Groups)
- Load balancing to multiple applications on the same machine (containers)
- Supports redirects (Eg: from HTTP to HTTPS)
- Fixed hostname (xxx.region.elb.amazonaws.com)
- The application servers don't see the IP of the client directly. The client IP is inserted in the header X-Forwarded-For, the port in X-Forwarded-Port & the protocol in X-Forwarded-Proto
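As an illustration of the forwarded headers mentioned above, here is a small sketch of how an application behind a load balancer might recover the original client details. The function name `client_info` is hypothetical. `X-Forwarded-Proto` is the commonly used header for the protocol.

```python
from typing import Dict, Optional


def client_info(headers: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Recover client IP, port and protocol from headers added by a load balancer."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lower = {name.lower(): value for name, value in headers.items()}
    forwarded_for = lower.get("x-forwarded-for", "")
    # With chained proxies the header is a comma-separated list;
    # the left-most entry is the original client.
    client_ip = forwarded_for.split(",")[0].strip() or None
    return {
        "ip": client_ip,
        "port": lower.get("x-forwarded-port"),
        "proto": lower.get("x-forwarded-proto"),
    }
```

These headers can be set by any client, so they should only be trusted when the request is known to have come through the load balancer.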
Supports routing traffic to different target groups:
- Routing based on path in URL
- Routing based on hostname in URL
- Routing based on Query String & Headers
- Target Groups:
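The routing options above can be pictured as an ordered list of rules, each pointing at a target group. This is a rough sketch only: the `Rule` class, the example hostnames, paths and target group names are all hypothetical, and header-based matching is left out for brevity.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class Rule:
    target_group: str
    host: Optional[str] = None              # exact hostname, e.g. "api.example.com"
    path: Optional[str] = None              # glob pattern, e.g. "/users/*"
    query: Optional[Dict[str, str]] = None  # required query string key/value pairs

    def matches(self, url: str) -> bool:
        parts = urlsplit(url)
        if self.host is not None and parts.hostname != self.host:
            return False
        if self.path is not None and not fnmatchcase(parts.path, self.path):
            return False
        if self.query:
            params = parse_qs(parts.query)
            for key, value in self.query.items():
                if value not in params.get(key, []):
                    return False
        return True


def route(url: str, rules: List[Rule], default: str) -> str:
    """Return the target group of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(url):
            return rule.target_group
    return default
```

Rules are checked in order, so more specific rules should come first and a default target group catches everything else.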
ALBs are a great fit for microservices & container-based applications. They also support port mapping to redirect to dynamic ports in ECS.
Load Balancer Security Groups
- Load Balancer security groups will have the source set to 0.0.0.0/0 so that users can access the load balancer from anywhere.
- Rules also have a port range & a TCP or UDP protocol. For EC2 instances, the source of incoming traffic is the security group of the load balancer, allowed on a specific port & protocol.
The security group of the EC2 instance is linked to the security group of the load balancer, which means only traffic originating from the load balancer can reach the EC2 instance.
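To make the chaining of the two security groups concrete, here is a simplified model of inbound rule evaluation. It is a sketch, not how AWS implements security groups. The group id `sg-lb`, the ports and the IP addresses are made up for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence


@dataclass(frozen=True)
class InboundRule:
    port: int
    protocol: str = "tcp"
    cidr: Optional[str] = None          # allow by source IP range
    source_group: Optional[str] = None  # allow by source security group id


def allowed(rules: Sequence[InboundRule], port: int, protocol: str,
            source_ip: str, source_groups: Sequence[str] = ()) -> bool:
    """Return True if any inbound rule admits the traffic."""
    for rule in rules:
        if rule.port != port or rule.protocol != protocol:
            continue
        if rule.cidr and ip_address(source_ip) in ip_network(rule.cidr):
            return True
        if rule.source_group and rule.source_group in source_groups:
            return True
    return False


# Load balancer: open to the world on HTTP / HTTPS.
LB_SG = [InboundRule(80, cidr="0.0.0.0/0"), InboundRule(443, cidr="0.0.0.0/0")]
# EC2 instance: only accepts HTTP from members of the (hypothetical) "sg-lb" group.
EC2_SG = [InboundRule(80, source_group="sg-lb")]
```

With these rules a user on the internet can reach the load balancer but not the instance directly, while traffic sent by the load balancer (a member of `sg-lb`) is accepted by the instance.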
Network Load Balancer (NLB)¶
Protocols: TCP, TLS, UDP
- NLB is layer 4 (Transport)
- Forwards TCP & UDP traffic to your instances
- Handles millions of requests per second
- Supports health checks
- Lower latency: ~100 ms vs ~400 ms for ALB
- NLB originally had no security groups; since 2023 security groups can be associated with an NLB
- Target Groups:
NLB has 1 static IP per AZ & supports assigning an Elastic IP. This is helpful for whitelisting specific IPs. NLBs are used for extreme performance with TCP or UDP traffic.
Gateway Load Balancer (GWLB)¶
Deploy, scale & manage a fleet of 3rd party network virtual appliances in AWS. Eg usage: firewalls, intrusion detection & prevention, deep packet inspection, payload manipulation.
- Operates at Layer 3 (Network Layer)
- Uses the GENEVE protocol on port 6081
- Combines the following functions:
  - Transparent Network Gateway: Single entry / exit for network traffic
  - Load Balancer: Distributes traffic to virtual appliances
Sticky Sessions (Session Affinity)
It is possible to implement stickiness so that the same client is always redirected to the same instance / container behind a load balancer.
- Works for Application Load Balancer (ALB)
- Uses a cookie which has an expiration date that you can control.
- Use case: Make sure the user doesn't lose their session data.
Types of Cookies
Application based cookie
- Custom cookie
  - Generated by the target (application)
  - Can include any custom attributes required by the application
  - Cookie name must be specified individually for each target group
  - Reserved cookie names (do not use): AWSALB, AWSALBAPP, AWSALBTG
- Application cookie
  - Generated by the load balancer
  - Cookie name is AWSALBAPP
Duration based Cookie
- Cookie generated by the load balancer
- Cookie name is AWSALB for ALB
- Duration is generated by the load balancer
Enabling stickiness may cause an imbalance in the load across the backend EC2 instances.
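A toy model of duration-based stickiness, to show why a returning client keeps hitting the same target until its cookie expires. The class name, the `session-N` cookie values and the instance ids are hypothetical. A real load balancer issues opaque cookies rather than readable ones.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class StickyBalancer:
    def __init__(self, targets: List[str], duration: float):
        self._round_robin = itertools.cycle(targets)
        self._duration = duration                           # cookie lifetime in seconds
        self._sessions: Dict[str, Tuple[str, float]] = {}   # cookie -> (target, expiry)
        self._ids = itertools.count(1)

    def route(self, cookie: Optional[str], now: float) -> Tuple[str, str]:
        """Return (target, cookie) for a request arriving at time `now`."""
        session = self._sessions.get(cookie) if cookie else None
        if session and session[1] > now:
            return session[0], cookie          # cookie still valid: same target
        target = next(self._round_robin)       # new or expired client: next target
        cookie = f"session-{next(self._ids)}"
        self._sessions[cookie] = (target, now + self._duration)
        return target, cookie
```

Because returning clients bypass the round robin, a few long-lived heavy users can pile onto one instance, which is the imbalance noted above.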
Cross Zone Load Balancing¶
With Cross Zone Load Balancing
- Each load balancer node distributes traffic evenly across all registered instances in all AZs.
- Enabled by default for ALB but disabled by default for NLB & GWLB
- No charges for inter AZ data for ALB but charges are incurred for NLB & GWLB
Without Cross Zone Load Balancing
- Requests are distributed only among the instances in the same AZ as the ELB node
- Traffic is contained in each AZ
If the number of EC2 instances per AZ is imbalanced, then instances in the smaller AZ will receive more traffic each when cross zone load balancing is not enabled.
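The effect of an imbalanced fleet can be worked out with a little arithmetic. The sketch below assumes one load balancer node per AZ and clients spread evenly across the nodes. The AZ names and instance counts are made up.

```python
from typing import Dict


def traffic_share(instances_per_az: Dict[str, int], cross_zone: bool) -> Dict[str, float]:
    """Fraction of total traffic received by EACH instance, keyed by AZ."""
    azs = [az for az, count in instances_per_az.items() if count > 0]
    if cross_zone:
        # Every node spreads its traffic over all instances in all AZs.
        total = sum(instances_per_az[az] for az in azs)
        return {az: 1 / total for az in azs}
    # Each node keeps its equal share of traffic inside its own AZ.
    node_share = 1 / len(azs)
    return {az: node_share / instances_per_az[az] for az in azs}


fleet = {"az-a": 2, "az-b": 8}
print(traffic_share(fleet, cross_zone=True))   # {'az-a': 0.1, 'az-b': 0.1}
print(traffic_share(fleet, cross_zone=False))  # {'az-a': 0.25, 'az-b': 0.0625}
```

With 2 instances in one AZ and 8 in the other, each instance in the small AZ handles 25% of all traffic without cross zone load balancing, versus 10% with it.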
Connection Draining (Deregistration Delay)
- Time to complete 'in-flight requests' while the instance is de-registering or unhealthy
- Stops sending new requests to the EC2 instance which is de-registering
- Between 1 - 3600 seconds (Default: 300 seconds)
- Can be disabled (Set to 0)
- Set to low value if requests are short
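The draining behaviour described above can be sketched as a tiny state model. This is a simplification under stated assumptions: the `Target` class and its method names are hypothetical, and time is passed in explicitly as seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    delay: int = 300                        # deregistration delay in seconds; 0 disables draining
    in_flight: int = 0                      # requests currently being served
    draining_since: Optional[float] = None  # time at which deregistration started

    def deregister(self, now: float) -> None:
        self.draining_since = now

    def accepts_new_requests(self) -> bool:
        # A draining target finishes existing work but gets no new requests.
        return self.draining_since is None

    def can_be_removed(self, now: float) -> bool:
        """True once in-flight requests are done or the delay has run out."""
        if self.draining_since is None:
            return False
        return self.in_flight == 0 or now - self.draining_since >= self.delay
```

A short delay suits short requests because the target is released sooner, while long uploads or slow requests need a larger value to avoid being cut off.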
SSL / TLS Certificates
- The load balancer uses an X.509 certificate (SSL/TLS server certificate)
- You can manage certificates using ACM (AWS Certificate Manager)
- Alternatively, you can upload your own certificates
- Server Name Indication (SNI) solves the problem of loading multiple SSL certificates onto 1 web server (to serve multiple websites)
- It is a 'newer' protocol & requires the client to indicate the hostname of the target server in the initial SSL handshake
- Only works for ALB, NLB & CloudFront
Multiple SSL certificates in Load Balancer -> ALB or NLB
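The idea behind SNI, picking a certificate from the hostname the client announces in the handshake, can be sketched as a lookup. This models only the selection step, not a TLS handshake. The hostnames and certificate labels are hypothetical.

```python
from typing import Dict, Optional


def select_certificate(server_name: Optional[str], certs: Dict[str, str], default: str) -> str:
    """Pick the certificate for the hostname the client sent in the handshake."""
    if server_name:
        name = server_name.lower()
        if name in certs:                   # exact hostname match wins
            return certs[name]
        # A wildcard certificate covers exactly one extra left-most label.
        parent = name.partition(".")[2]
        wildcard = "*." + parent
        if wildcard in certs:
            return certs[wildcard]
    return default                          # no SNI sent, or nothing matched


certs = {"www.mycorp.com": "cert-mycorp", "*.example.com": "cert-example"}
print(select_certificate("shop.example.com", certs, "cert-default"))  # cert-example
```

Clients that do not send a hostname fall back to the default certificate, which is why older load balancer types without SNI support can serve only one certificate per listener.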
Auto Scaling Group¶
The role of auto scaling group (ASG) is to:
- Scale out (Add EC2 instances) to match an increased load
- Scale in (Remove EC2 instances) to match decreased load
- Ensure we have minimum & maximum number of EC2 instances running
- Automatically register new instances to a load balancer
- Re-create an EC2 instance in case a previous one is terminated or unhealthy
- It is possible to scale an ASG based on CloudWatch alarms
ASGs are free (You only pay for the underlying EC2 instances)
ASGs are created with a Launch Template, which contains information about how to launch EC2 instances:
- AMI + instance type
- EC2 user data
- SSH Keys
- EBS Volumes
- IAM Roles
- Network + Subnet Information
- Load balancer information
In addition to the launch template, an ASG has a min size, max size & scaling policies.
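A simplified sketch of the two guarantees an ASG provides: the desired capacity stays between the min and max size, and unhealthy instances are replaced. The class, the `i-N` instance ids and the choice of which instance to remove on scale in are assumptions. Real termination policies are more involved.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AutoScalingGroup:
    min_size: int
    max_size: int
    desired: int
    instances: Dict[str, bool] = field(default_factory=dict)  # instance id -> healthy?
    _counter: int = 0

    def set_desired(self, value: int) -> int:
        """Clamp the requested capacity to the [min_size, max_size] range."""
        self.desired = max(self.min_size, min(self.max_size, value))
        return self.desired

    def reconcile(self) -> None:
        """Drop unhealthy instances, then launch or terminate to reach `desired`."""
        self.instances = {iid: ok for iid, ok in self.instances.items() if ok}
        while len(self.instances) < self.desired:      # scale out / replace
            self._counter += 1
            self.instances[f"i-{self._counter}"] = True
        while len(self.instances) > self.desired:      # scale in
            self.instances.popitem()                   # removes the newest entry
```

Calling `reconcile()` after marking an instance unhealthy launches a replacement, which mirrors the re-create behaviour listed above.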
Dynamic Scaling policies¶
Target Tracking Scaling
- Keep a metric at a target value, Eg: average ASG CPU utilization at around 40%
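One way to reason about target tracking is a proportional estimate: if load spreads evenly, capacity should scale with the ratio of the observed metric to the target. This is an approximation for intuition, not the algorithm AWS uses. The function name and parameters are hypothetical.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float,
                             min_size: int, max_size: int) -> int:
    """Estimate the capacity that brings the average metric back to `target`."""
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))   # never leave the ASG bounds


# 4 instances at 80% average CPU with a 40% target -> roughly 8 instances.
print(target_tracking_capacity(4, metric=80, target=40, min_size=1, max_size=10))  # 8
```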
Simple / Step Scaling
- Set up your own CloudWatch alarms
- When a CloudWatch alarm is triggered (Eg: CPU > 80%), add 2 units
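Step scaling can be pictured as a table mapping metric ranges to capacity changes. In the sketch below only the first step (CPU above 80% adds 2 units) comes from the notes. The second step and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

# (lower bound, upper bound or None for open-ended, units to add); bounds are CPU %.
Step = Tuple[float, Optional[float], int]


def step_adjustment(cpu: float, steps: List[Step]) -> int:
    """Return how many units to add for the observed CPU value."""
    for lower, upper, change in steps:
        if cpu > lower and (upper is None or cpu <= upper):
            return change
    return 0   # alarm threshold not breached: no scaling


steps = [(80, 90, 2), (90, None, 4)]   # the (90, None, 4) step is an assumed extra
print(step_adjustment(85, steps))      # 2
```

Simple scaling corresponds to a table with a single step. Step scaling adds further rows so that a larger breach triggers a larger change.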
Scheduled Scaling
- Anticipate scaling based on known usage patterns
- Eg: increase min capacity to 10 at 6 PM on Friday
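The Friday example above can be sketched as a function of the current time. The end of the window (midnight) and the normal minimum of 2 are assumptions added for illustration, since the notes only give the start time and the raised capacity.

```python
from datetime import datetime


def scheduled_min_capacity(now: datetime, normal_min: int = 2) -> int:
    """Minimum capacity in effect at `now`: 10 on Friday from 6 PM, else `normal_min`."""
    is_friday_evening = now.weekday() == 4 and now.hour >= 18   # Monday is 0
    return 10 if is_friday_evening else normal_min


print(scheduled_min_capacity(datetime(2024, 1, 5, 18, 0)))   # Friday 6 PM -> 10
print(scheduled_min_capacity(datetime(2024, 1, 5, 17, 59)))  # just before  -> 2
```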
Predictive Scaling
- Continuously forecast load & schedule scaling ahead
Good metrics to scale on:
- Request count per target
- Average network in / out
- Any custom metric
After a scaling activity happens, the ASG enters a cooldown period (Default 300 seconds). During the cooldown period the ASG will not launch or terminate additional instances, which allows metrics to stabilize. Use a ready-to-use AMI to reduce configuration time so that instances serve requests faster & the cooldown period can be reduced.
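The cooldown can be modelled as a gate that rejects scaling requests arriving too soon after the previous activity. This is a minimal sketch with a hypothetical class name and time passed in as seconds.

```python
from typing import Optional


class CooldownGate:
    def __init__(self, cooldown: float = 300):
        self.cooldown = cooldown                      # seconds to wait after an activity
        self.last_activity: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        """Allow a scaling activity only if the cooldown has elapsed."""
        if self.last_activity is not None and now - self.last_activity < self.cooldown:
            return False                              # still cooling down: ignore
        self.last_activity = now
        return True
```

If instances boot quickly from a ready-to-use AMI, metrics settle sooner and a smaller `cooldown` value becomes safe.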