Amazon EC2: Auto Scaling

6 min read · Apr 14, 2024

This article explains Amazon EC2 Auto Scaling: its use cases, features, key components, and setup, as well as the difference between horizontal scaling and vertical scaling.

Amazon EC2 Auto Scaling is a feature provided by Amazon Web Services (AWS) that helps you automatically adjust the number of Amazon Elastic Compute Cloud (EC2) instances in your application fleet according to demand.

Think of it like this: When your application gets busy, Auto Scaling can automatically add more EC2 instances to handle the increased workload. And when the demand decreases, it can reduce the number of instances to save you money.

It’s like having a team of workers who come in when things get busy and go home when things quiet down, all managed automatically based on predefined rules you set up. This helps ensure your application stays responsive and cost-effective without requiring manual intervention.

Use cases:

Schedule application scaling
Reduce manual provisioning

Features of Amazon EC2 Auto Scaling:

Monitoring the health of running instances
Custom health checks
Balancing capacity across Availability Zones
Multiple instance types and purchase options
Load balancing
Scalability
Instance refresh
Lifecycle hooks
Support for stateful workloads
Automated replacement of Spot Instances

Auto Scaling Key Components:

Auto Scaling Groups (ASGs): These are collections of EC2 instances that work together to ensure your application can handle varying levels of traffic. You define the minimum, maximum, and desired number of instances in an ASG, and Auto Scaling automatically adjusts the number of instances within these limits based on demand.
Launch Configurations or Launch Templates: These are templates that define the configuration settings used when launching new instances within an ASG. They specify details such as the EC2 instance type, AMI (Amazon Machine Image), key pair, security groups, and user data scripts.
Scaling Plans: These are sets of instructions for scaling your ASGs in response to changes in demand over time. They can include scheduled scaling actions, which allow you to anticipate changes in traffic patterns (e.g., increasing capacity before peak hours), and recurring scaling actions, which automatically adjust capacity on a recurring schedule.
CloudWatch Alarms: These are used to monitor the metrics that are important to your application’s performance. You can create alarms that trigger scaling actions based on predefined thresholds (e.g., CPU utilization exceeding a certain percentage).
Scaling Policies: These are rules that dictate how Auto Scaling should respond to changes in demand. Two commonly used types of scaling policies are:

Target Tracking Scaling: This policy allows you to specify a target value for a certain metric (such as CPU utilization or request count per instance), and Auto Scaling adjusts the number of instances to maintain that target.
Step Scaling: This policy allows you to define a series of scaling adjustments based on the value of a specified CloudWatch metric. As the metric crosses predefined thresholds, Auto Scaling adds or removes instances in discrete steps.
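
To make the two policy types more concrete, here is a minimal Python sketch of the idea behind each one. It is a simplification for illustration, not the exact algorithm AWS uses: the function names, the step table, and the numbers are all hypothetical, and the step bounds are written as absolute CPU percentages to keep the example short.

```python
import math


def clamp(value, minimum, maximum):
    """Keep a desired capacity inside the group's min/max limits."""
    return max(minimum, min(maximum, value))


def target_tracking_capacity(current, metric_value, target_value, minimum, maximum):
    """Proportional estimate: resize the fleet so the average metric
    moves toward the target. A simplification, not AWS's exact algorithm."""
    if current == 0:
        return minimum
    estimate = math.ceil(current * metric_value / target_value)
    return clamp(estimate, minimum, maximum)


# Hypothetical step table: (lower bound, upper bound, instances to add).
# Bounds are CPU percentages; None means "no upper bound".
STEPS = [
    (60, 75, 1),
    (75, 90, 2),
    (90, None, 4),
]


def step_scaling_capacity(current, metric_value, steps, minimum, maximum):
    """Add instances in discrete steps, depending on which band the metric is in."""
    for lower, upper, change in steps:
        if metric_value >= lower and (upper is None or metric_value < upper):
            return clamp(current + change, minimum, maximum)
    return current  # metric is below every band: no change


# 4 instances at 80% CPU with a 50% target: 4 * 80 / 50 = 6.4, rounded up to 7.
print(target_tracking_capacity(4, 80.0, 50.0, 2, 10))  # prints 7
# 4 instances at 92% CPU fall in the top band, so 4 instances are added.
print(step_scaling_capacity(4, 92.0, STEPS, 2, 10))  # prints 8
```

In both cases the result is clamped to the minimum and maximum size of the Auto Scaling group, which is why those limits matter even when a policy asks for more capacity.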

Setup (Create Auto Scaling group):

Step 1: Choose launch template:

Auto Scaling group name
Launch template: Choose a launch template that contains the instance-level settings, such as the Amazon Machine Image (AMI), instance type, key pair, and security groups.
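
If you prefer to script this step, the same instance-level settings can be expressed as a request for the EC2 CreateLaunchTemplate API. The sketch below only builds the request as a Python dictionary; the helper name and every identifier (template name, AMI ID, key pair, security group) are placeholders, not real resources.

```python
def build_launch_template_request(name, ami_id, instance_type, key_name, security_group_ids):
    """Return keyword arguments shaped for the EC2 CreateLaunchTemplate API."""
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "KeyName": key_name,
            "SecurityGroupIds": list(security_group_ids),
        },
    }


# All identifiers below are placeholders for illustration.
request = build_launch_template_request(
    name="web-template",
    ami_id="ami-0123456789abcdef0",
    instance_type="t3.micro",
    key_name="my-key-pair",
    security_group_ids=["sg-0123456789abcdef0"],
)
print(request["LaunchTemplateData"]["InstanceType"])  # prints t3.micro

# With boto3 installed and AWS credentials configured, it could be sent as:
#   import boto3
#   boto3.client("ec2").create_launch_template(**request)
```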

Step 2: Choose instance launch options:

Instance type requirements: Specify instance attributes / Manually add instance types
Instance purchase options
Allocation strategies
Network
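
As a rough sketch, these launch options map to the MixedInstancesPolicy and VPCZoneIdentifier parameters of the CreateAutoScalingGroup API. The helper below is hypothetical, and the template name, instance types, subnet IDs, and the On-Demand/Spot split are assumed values you would replace with your own.

```python
def build_launch_options(template_name, instance_types, subnet_ids,
                         on_demand_base=1, on_demand_percent_above_base=50):
    """Return the launch-option part of a CreateAutoScalingGroup request."""
    return {
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": template_name,
                    "Version": "$Latest",
                },
                # Manually added instance types the group may launch.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            # Purchase options and allocation strategy.
            "InstancesDistribution": {
                "OnDemandBaseCapacity": on_demand_base,
                "OnDemandPercentageAboveBaseCapacity": on_demand_percent_above_base,
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
        # Network: subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


launch_options = build_launch_options(
    template_name="web-template",
    instance_types=["t3.micro", "t3a.micro"],
    subnet_ids=["subnet-aaa", "subnet-bbb"],
)
print(launch_options["VPCZoneIdentifier"])  # prints subnet-aaa,subnet-bbb
```

Listing subnets from different Availability Zones is what lets the group balance capacity across zones.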

Step 3: Configure advanced options:

Load balancing: Use the options below to attach your Auto Scaling group to an existing load balancer, or to a new load balancer that you define
VPC Lattice integration options: VPC Lattice facilitates communications between AWS services and helps you connect and manage your applications across compute services in AWS.
Health checks
Additional settings: Monitoring, Default instance warmup (gives newly launched instances time to initialize fully before their usage data counts toward the Auto Scaling group’s metrics)
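
The advanced options above correspond to a few parameters of the CreateAutoScalingGroup API. The following is a minimal sketch, assuming a hypothetical helper name, a placeholder target group ARN, and illustrative timing values.

```python
def build_advanced_options(target_group_arns, grace_period_seconds=300, warmup_seconds=120):
    """Return the load balancing and health check part of the request."""
    options = {
        # Use load balancer health checks only when a target group is attached.
        "HealthCheckType": "ELB" if target_group_arns else "EC2",
        "HealthCheckGracePeriod": grace_period_seconds,
        "DefaultInstanceWarmup": warmup_seconds,
    }
    if target_group_arns:
        options["TargetGroupARNs"] = list(target_group_arns)
    return options


# Placeholder ARN for illustration only.
with_lb = build_advanced_options(
    ["arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/0123456789abcdef"]
)
without_lb = build_advanced_options([])
print(with_lb["HealthCheckType"])     # prints ELB
print(without_lb["HealthCheckType"])  # prints EC2
```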

Step 4: Configure group size and scaling:

Group size: Set the initial size of the Auto Scaling group
Scaling: You can resize your Auto Scaling group manually or automatically to meet changes in demand. (Min desired capacity/Max desired capacity)
Automatic scaling: Choose whether to use a target tracking policy (Choose a CloudWatch metric and target value)
Instance maintenance policy: Control your Auto Scaling group’s availability during instance replacement events. (Launch before terminating / Terminate and launch / Custom behavior)
Instance scale-in protection: Scale-in protection prevents newly launched instances from being terminated by scaling activities
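
Here is a small sketch of how the group size and a target tracking policy might look as API requests. The size fields belong to CreateAutoScalingGroup and the policy to PutScalingPolicy; the helper names, the group name, and the 50% CPU target are assumptions chosen for illustration.

```python
def build_group_size(min_size, max_size, desired, protect_from_scale_in=False):
    """Return the size part of a CreateAutoScalingGroup request."""
    if not (min_size <= desired <= max_size):
        raise ValueError("desired capacity must lie between min and max")
    return {
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        "NewInstancesProtectedFromScaleIn": protect_from_scale_in,
    }


def build_target_tracking_policy(group_name, target_cpu_percent):
    """Return keyword arguments shaped for the PutScalingPolicy API."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


size = build_group_size(min_size=2, max_size=10, desired=4)
policy = build_target_tracking_policy("web-asg", 50)
print(size["DesiredCapacity"])  # prints 4
print(policy["PolicyName"])     # prints web-asg-cpu-target

# With boto3 and credentials configured, the policy could be applied as:
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**policy)
```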

Step 5: Add notifications:

Send notifications to SNS topics whenever Amazon EC2 Auto Scaling launches or terminates the EC2 instances in your Auto Scaling group.
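
As a sketch, the notification setup can be expressed as a request for the PutNotificationConfiguration API. The helper name, group name, and topic ARN below are placeholders.

```python
def build_notification_request(group_name, topic_arn, on_launch=True, on_terminate=True):
    """Return keyword arguments shaped for the PutNotificationConfiguration API."""
    notification_types = []
    if on_launch:
        notification_types.append("autoscaling:EC2_INSTANCE_LAUNCH")
    if on_terminate:
        notification_types.append("autoscaling:EC2_INSTANCE_TERMINATE")
    return {
        "AutoScalingGroupName": group_name,
        "TopicARN": topic_arn,
        "NotificationTypes": notification_types,
    }


# Placeholder topic ARN for illustration only.
notification = build_notification_request(
    "web-asg", "arn:aws:sns:us-east-1:123456789012:asg-events"
)
print(len(notification["NotificationTypes"]))  # prints 2

# With boto3 and credentials configured, it could be sent as:
#   import boto3
#   boto3.client("autoscaling").put_notification_configuration(**notification)
```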

Step 6: Add tags

That covers Auto Scaling: its use cases, features, key components, and setup.

Now, let's discuss the difference between horizontal scaling and vertical scaling.

Horizontal scaling and vertical scaling are two strategies used to increase the capacity and performance of a system, particularly in the context of servers and databases.

Horizontal scaling means adding more EC2 instances to your pool of resources, whereas vertical scaling means adding more power (CPU, RAM) to an existing EC2 instance.

Horizontal Scaling (Scale-Out):

Horizontal scaling means adding more EC2 machines.

Imagine you run a popular video streaming service like Netflix.
During peak hours, like in the evenings when many people are watching movies, there’s a huge surge in demand for your service.
With horizontal autoscaling, your system automatically adds more servers (machines) to handle the increased number of viewers.
These additional servers help distribute the video streams to users, ensuring everyone gets smooth playback without buffering.
When the demand decreases, like late at night when fewer people are watching, the extra servers are automatically scaled down to save costs.
It’s like Netflix adding more streaming servers during prime time to accommodate more viewers, then reducing them when fewer people are watching.

Vertical Scaling (Scale-Up):

Vertical scaling usually means upgrading the server hardware. On EC2, this means changing to a larger instance type, which requires stopping the instance first.

Now, let’s say there’s a sudden viral video that everyone wants to watch, causing a spike in demand.
With vertical autoscaling, instead of adding more servers, your existing servers are automatically upgraded to handle the increased load.
For example, each server might get more CPU power or memory to process and deliver more video streams.
This ensures that even though there’s a sudden surge in viewership, your servers can handle the extra workload without issues.
Once the hype around the viral video dies down, the servers are scaled back to their original capacity to avoid unnecessary costs.
It’s like your streaming service boosting the power of its existing servers when a popular video goes viral, then reverting them to normal when the hype subsides.

“In both cases, autoscaling helps the video streaming service adapt to changes in demand dynamically, ensuring a smooth and reliable viewing experience for users while optimizing resource usage and costs.”
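
The two strategies can be compared with a small Python sketch. Everything in it is hypothetical: the size ladder, the capacity per size (in concurrent video streams), and the demand figures are made up to show the arithmetic, not taken from any real service.

```python
import math

# Hypothetical size ladder: (size name, streams one instance can serve).
SIZE_LADDER = [("small", 100), ("medium", 200), ("large", 400), ("xlarge", 800)]


def scale_out(demand, capacity_per_instance):
    """Horizontal scaling: keep the instance size, change the instance count."""
    return max(1, math.ceil(demand / capacity_per_instance))


def scale_up(demand, instance_count, ladder):
    """Vertical scaling: keep the instance count, pick the smallest size that fits."""
    per_instance = demand / instance_count
    for name, capacity in ladder:
        if capacity >= per_instance:
            return name
    return None  # even the largest size is too small


# 1500 streams on "small" instances (100 streams each) needs 15 instances.
print(scale_out(1500, 100))            # prints 15
# 1500 streams on 4 instances is 375 each, so "large" (400) is enough.
print(scale_up(1500, 4, SIZE_LADDER))  # prints large
# 5000 streams on 4 instances is 1250 each, beyond the largest size.
print(scale_up(5000, 4, SIZE_LADDER))  # prints None
```

The last call shows the main limit of vertical scaling: there is always a largest machine, while horizontal scaling can keep adding instances.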

Good examples of systems built for horizontal scaling are Redis and MongoDB, which can spread data across multiple nodes.

A good example of vertical scaling is MySQL on Amazon RDS (AWS’s managed relational database service), where you typically add capacity by moving to a larger instance class.

If there’s a specific topic you’re curious about, feel free to drop a personal note or comment. I’m here to help you explore whatever interests you!

Thanks for spending your valuable time learning to enhance your knowledge!