Scaling Your Services

This tutorial will show you how to scale your services with Defang.

There are two primary ways to scale a service. The first way is to increase the resources allocated to a service. For example, giving a service more CPUs or memory. The second way is to deploy multiple instances of a service. This is called scaling with replicas. Defang makes it easy to do both.

The Compose Specification, which is used by Defang, includes a deploy section which allows you to configure the deployment configuration for a service. This includes your service's resource requirements and the number of replicas of a service should be deployed.

Scaling Resource Reservations

In order to scale a service's resource reservations, you will need to update the deploy section associated with your service in your application's compose.yaml file.

Use the resources section to specify the resource reservation requirements. These are the minimum resources which must be available for the platform to provision your service. You may end up with more resources than you requested, but you will never be allocated less.

For example, if my app needs 2 CPUs and 512MB of memory, I would update the compose.yaml file like this:

services:
  my_service:
    image: my_app:latest
    deploy:
      resources:
        reservations:
          cpus: "2"
          memory: "512M"

The minimum resources which can be reserved:

Resource	Minimum
CPUs	0.5
Memory	512M

info

Note that the memory field must be specified as a "byte value string" using the {amount}{byte unit} format. The supported units are b (bytes), k or kb (kilobytes), m or mb (megabytes) and g or gb (gigabytes).

Scaling with Replicas

In order to scale a service's replica count, you will need to update the deploy section associated with your service in your application's compose.yaml file.

Use the replicas section to specify the number of containers which should be running at any given time.

For example, if I want to run 3 instances of my app, I would update the compose.yaml file like this:

services:
  my_service:
    image: my_app:latest
    deploy:
      replicas: 3

Autoscaling Your Services

Autoscaling allows your services to automatically adjust the number of replicas based on CPU usage — helping you scale up during traffic spikes and scale down during quieter periods.

Note: Autoscaling is only available to Pro tier or higher users.

Enabling Autoscaling

To enable autoscaling for a service, add the x-defang-autoscaling: true extension under the service definition in your compose.yaml file and remove the replicas field in your deploy mapping, if present. Autoscaling is available in staging and production deployments modes only.

Example:

services:
  web:
    image: myorg/web:latest
    ports:
      - 80:80
    x-defang-autoscaling: true

Once deployed, your services' CPU usage is monitored for how much load it is handling, sustained high loads will result in more replicas being started.

Requirements

BYOC, your own cloud platform account.
You must be on the Pro or higher plan to use autoscaling. (Defang plans)
replicas must NOT be defined
Only staging and production deployment modes supported. (Deployment modes)
The service must be stateless or able to run in multiple instances. (Scaling)

Best Practices

Design your services to be horizontally scalable. (12 Factor App)
Use shared or external storage if your service writes data. (e.g. Postgres or Redis managed services )

Scaling Your Services

Scaling Resource Reservations​

Scaling with Replicas​

Autoscaling Your Services​

Enabling Autoscaling​

Scaling Resource Reservations

Scaling with Replicas

Autoscaling Your Services

Enabling Autoscaling