Scaling
Scaling is the process of adjusting the number of instances (or replicas) of a service to meet current demand. Services that handle incoming work, such as APIs, workers, or background jobs, can be scaled up or down to optimize performance, availability, and cost.
Scaling is a core concept in distributed systems and cloud-native applications. It ensures your system can handle varying workloads without degrading user experience or over-provisioning resources.
Why Scale?
Scaling enables services to respond effectively under different conditions:
- High Traffic: When demand spikes, scaling up ensures your service can process more requests in parallel.
- Cost Optimization: Scaling down during periods of low demand helps reduce unnecessary resource usage and cloud costs.
- Fault Tolerance: Multiple instances of a service provide redundancy in case of instance failure.
- Throughput & Latency: Additional instances can reduce response times and increase the number of operations your service can perform per second.
Types of Scaling
There are two main ways to scale a service:
- Horizontal Scaling: Adds or removes instances of a service. This is the most common approach for stateless services.
- Vertical Scaling: Increases or decreases the resources (CPU, memory) available to a single instance.
In most modern deployments, horizontal scaling is preferred because it aligns well with cloud-native principles and is easier to automate and distribute.
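In a Compose file, the two approaches show up as different fields under the `deploy` mapping. The sketch below is illustrative only (the service name, image, and values are placeholders) and uses the standard Compose `deploy` specification:

```yaml
services:
  api:
    image: example/api:latest   # placeholder image
    deploy:
      # Horizontal scaling: run more identical copies of the service
      replicas: 3
      # Vertical scaling: give each copy more CPU and memory
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
```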
Auto-Scaling
Auto-scaling refers to automatically adjusting the number of service instances based on defined policies or metrics.
Instead of manually adding more instances when traffic increases, an auto-scaling system watches key indicators (like CPU usage) and takes action in real time.
Example
With Defang, users on the Pro plan or higher can enable service-level autoscaling in three steps:
- Add the `x-defang-autoscaling: true` extension to the service you want to autoscale.
- Remove any replicas field in the deploy mapping (if present).
- Deploy using staging or production mode, e.g. `defang compose up --provider=aws --mode=production`.
```yaml
services:
  service-example:
    x-defang-autoscaling: true # enable autoscaling for this service
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - mode: ingress
        target: 8080
        published: 8080
```
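Step 3 also allows staging mode; assuming the flag mirrors the production example above, the command would be:

```sh
# Same deployment command as in step 3, but using staging mode instead of production
defang compose up --provider=aws --mode=staging
```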
How It Works
Auto-scaling systems typically rely on:
- Metrics Collection: Real-time monitoring of system metrics.
- Scaling Policies: Rules that define when to scale up or down. For example:
- If average CPU > 85% for 5 minutes → scale up by 2 instances.
- Cooldown Periods: Delays between scaling events to prevent rapid, repeated changes (flapping).
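Putting these pieces together, a scaling policy is essentially a small declarative rule set. The sketch below is purely illustrative pseudo-configuration (it is not Defang or any cloud provider's actual syntax) and simply restates the example policy above as data:

```yaml
# Hypothetical autoscaler policy, for illustration only
metrics:
  - name: cpu_utilization
    aggregation: average          # averaged across all running instances
policies:
  scale_up:
    when: cpu_utilization > 85    # threshold from the example above
    sustained_for: 5m             # must hold for five minutes before acting
    add_instances: 2
cooldown: 3m                      # pause between scaling events to prevent flapping
```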
Supported Platforms
| Platform | Auto-Scaling Support |
|---|---|
| Playground | ❌ |
| AWS | ✅ |
| DigitalOcean | ❌ |
| GCP | ✅ |
Benefits of Auto-Scaling
- Elasticity: Automatically adapts to changing workloads.
- Resilience: Helps maintain performance during traffic surges or partial outages.
- Efficiency: Reduces the need for manual intervention or over-provisioning.
Considerations
- Ensure services are stateless or keep state in external systems (e.g., databases, caches) so instances can be added or removed smoothly, in line with the 12 Factor App methodology.
- Test services under load to identify scaling bottlenecks.
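As a concrete sketch of externalized state, the Compose snippet below keeps the application container stateless and stores shared data in a separate Redis service (the service names, image, and environment variable are illustrative, not Defang requirements):

```yaml
services:
  web:
    build: .
    ports:
      - target: 8080
        published: 8080
    environment:
      # Any replica can serve any request because shared state lives in Redis,
      # not on the container's local filesystem.
      REDIS_URL: redis://cache:6379
  cache:
    image: redis:7
```

Because no replica holds unique data, instances of `web` can be added or removed freely without losing state.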
For a hands-on walkthrough, see the Scaling Tutorial page.