Scaling and Autoscaling

Zerops autoscales vertically (CPU/RAM/disk) and horizontally (container count). Runtimes support both. Managed services (DB, cache, shared-storage) support vertical only with fixed container count (NON_HA=1, HA=3). Object-storage and Docker have no autoscaling. Extends grammar.md section 9 with mechanics, thresholds, YAML syntax, and common mistakes.

When to Scale Which Way

Symptom	Scale type	Why
CPU/memory pressure on existing containers	Vertical (CPU/RAM)	More resources per container
High request volume, stateless service	Horizontal (containers)	Distribute load across more instances
Disk filling up	Vertical (disk)	More storage per container
Latency-sensitive workload on SHARED CPU	CPU mode → DEDICATED	Guaranteed cores, no burstable throttling

Applicability Matrix

Service type	Vertical autoscaling	Horizontal scaling	Notes
Runtime (Node.js, Go, PHP, Python, Java, etc.)	Yes	Yes (1-10 containers)	Full autoscaling
Linux containers (Alpine, Ubuntu)	Yes	Yes (1-10 containers)	Same as runtimes
Managed DB (PostgreSQL, MariaDB)	Yes	No (fixed: NON_HA=1, HA=3)	Mode immutable after creation
Managed cache (KeyDB/Valkey)	Yes	No (fixed: NON_HA=1, HA=3)	Mode immutable after creation
Shared storage	No (automatic, not configurable)	No (fixed: NON_HA=1, HA=3)	DO NOT set verticalAutoscaling in import.yml
Object storage	No	No	Fixed size at creation, no verticalAutoscaling
Docker	No (manual, triggers VM restart)	Yes (VM count changeable, triggers restart)	No autoscaling at all

Vertical Autoscaling

CPU Modes

Mode	Behavior	Best for
SHARED	Physical core shared with up to 10 tenants. Performance ranges 1/10 to 10/10 depending on neighbors	Dev, staging, low-traffic production
DEDICATED	Exclusive full physical core(s). Consistent performance	Production, CPU-intensive workloads

CPU mode can be changed once per hour
startCpuCoreCount: cores allocated at container start (default: 2). Increase for apps with heavy initialization

CPU Scaling Thresholds (DEDICATED mode only)

Min free CPU cores (minFreeCpuCores): scale-up triggers when free capacity on a single core drops below this fraction, range 0.0-1.0 (default: 0.1 = 10%)
Min free CPU percent (minFreeCpuPercent): scale-up triggers when total free capacity across all cores drops below this percentage, range 0-100 (default: 0, disabled)

RAM Dual-Threshold System

RAM is monitored every 10 seconds. Two independent thresholds control scale-up -- whichever provides more free memory wins:

Threshold	Field	Default	Behavior
Absolute	`minFreeRamGB`	0.0625 GB (64 MB)	Scale up when free RAM drops below this fixed amount
Percentage	`minFreeRamPercent`	0% (disabled)	Scale up when free RAM drops below this % of granted RAM

Both thresholds serve dual purpose: prevent OOM crashes and preserve space for kernel disk caching. Swap is enabled as a safety net but does not replace proper threshold configuration.

Higher wins example: with 12 GB granted RAM, minFreeRamGB=0.5 (500 MB buffer) and minFreeRamPercent=5 (600 MB buffer) — the 600 MB threshold applies. As granted RAM grows, the percentage threshold automatically provides a larger buffer.

Disk

Grows only -- never shrinks (no scale-down). Set minDisk = maxDisk to disable.

Resource Limits (Defaults)

Resource	Min	Max
CPU cores	1	8
RAM	0.125 GB	48 GB
Disk	1 GB	250 GB

PostgreSQL and MariaDB override RAM minimum to 0.25 GB.

Scaling Behavior Parameters

Parameter	CPU	RAM	Disk
Collection interval	10s	10s	10s
Scale-up window	20s	10s	10s
Scale-down window	60s	120s	300s
Scale-up percentile	60th	50th	50th
Scale-down percentile	40th	50th	50th
Minimum step	1 (0.1 cores)	0.125 GB	0.5 GB
Maximum step	40	32 GB	128 GB

Scaling uses exponential growth: small increments initially, larger jumps under sustained high load.

Scale-up behavior summary:

RAM/Disk: immediate scale-up when free resources drop below threshold (single measurement)
CPU: requires 2 consecutive measurements below threshold (~20s window) to avoid spikes
Scale-down is conservative: 3-5 consecutive measurements above threshold (60-300s depending on resource) — prevents flapping

Spike Protection via minRam

Autoscaling reacts within 10-20 seconds. Compilation and package installation create RAM spikes faster than scaling can respond. Set minRam high enough to absorb the first spike WITHOUT relying on autoscaling:

Dev services (compilation on container via SSH): minRam must cover the build tool peak — npm install, go build, cargo build spike within seconds
Stage/prod services (pre-built artifacts): minRam only needs to cover the startup peak (JVM heap allocation, SSR warming)

Thresholds (minFreeRamGB, minFreeRamPercent) handle gradual load growth. They cannot protect against sub-10s spikes that exceed the total allocated RAM. See runtime guides for per-runtime minRam recommendations.

Disabling autoscaling: set minimum = maximum for any resource to pin it at a fixed value (e.g., minRam: 2, maxRam: 2).

Horizontal Scaling

Applies to runtimes and Linux containers only. New containers are added when vertical scaling reaches configured maximums.

minContainers: baseline always running (system range: 1-10)
maxContainers: upper limit during peak (system range: 1-10)
Set minContainers = maxContainers to disable horizontal autoscaling

HA requirement: applications must be stateless and handle distributed operation (no local file sessions, no local uploads).

Managed Services (DB, Cache, Shared Storage)

Container count is fixed by deployment mode, set at creation, immutable:

Mode	Containers	Use case
`NON_HA`	1	Development, non-critical
`HA`	3 (on separate physical machines)	Production, automatic failover

HA recovery: failed container is disconnected, new one created on different hardware, data synchronized from healthy copies, failed container removed.

PostgreSQL HA exposes read replica port 5433 for distributing SELECT queries.

Configuring Thresholds via zerops_scale

Threshold parameters can be set via the zerops_scale MCP tool, not just import.yml:

zerops_scale serviceHostname="api" minFreeRamGB=0.5 minFreeRamPercent=5 minFreeCpuCores=0.2

All four threshold parameters (minFreeRamGB, minFreeRamPercent, minFreeCpuCores, minFreeCpuPercent) are optional and can be combined with any other scaling parameters in a single call.

Docker Services

Run in VMs, not containers. No autoscaling -- resources fixed at creation
Changing resources or VM count triggers VM restart (downtime). Disk can only increase
Consider runtime services or Linux containers if dynamic scaling is needed

import.yml Syntax

services:
  # Runtime with full scaling
  - hostname: api
    type: nodejs@22
    minContainers: 2
    maxContainers: 6
    verticalAutoscaling:
      cpuMode: SHARED
      minCpu: 1
      maxCpu: 4
      startCpuCoreCount: 2
      minRam: 0.5
      maxRam: 8
      minFreeRamGB: 0.125
      minFreeRamPercent: 10
      minDisk: 1
      maxDisk: 20

  # Managed DB (vertical only, no container settings)
  - hostname: db
    type: postgresql@16
    mode: HA
    verticalAutoscaling:
      cpuMode: DEDICATED
      minCpu: 1
      maxCpu: 4
      minRam: 1
      maxRam: 16
      minDisk: 5
      maxDisk: 100

Strategy Presets

Development — SHARED CPU, min resources, 1 container. Cost-effective for dev/staging:

zerops_scale serviceHostname="api" cpuMode="SHARED" minCpu=1 maxCpu=2 minRam=0.25 maxRam=1 minContainers=1 maxContainers=1

Production — DEDICATED CPU, higher minimums, multiple containers for HA:

zerops_scale serviceHostname="api" cpuMode="DEDICATED" minCpu=2 maxCpu=8 minRam=2 maxRam=8 minContainers=2 maxContainers=6

Burst workloads — Wide autoscaling range, SHARED CPU:

zerops_scale serviceHostname="worker" cpuMode="SHARED" minCpu=1 maxCpu=8 minRam=1 maxRam=16 minContainers=1 maxContainers=10

Common Mistakes

DO NOT add verticalAutoscaling to object-storage or shared-storage services in import.yml -- causes import failure. Object storage has a fixed objectStorageSize only. Shared storage is managed automatically.

DO NOT set minContainers or maxContainers for managed services (DB, cache, shared-storage) -- container count is fixed by mode (NON_HA=1, HA=3). Setting these causes import failure.

DO NOT use DEDICATED CPU for low-traffic or dev services -- wastes resources. Use SHARED and switch to DEDICATED only when consistent performance matters.

DO NOT set minFreeRamGB: 0 and minFreeRamPercent: 0 simultaneously -- the API rejects this with "Invalid custom autoscaling value". Always keep at least the default absolute threshold (0.0625 GB).

DO NOT forget that disk never shrinks -- setting a high minDisk is permanent for that container's lifetime.

DO NOT assume horizontal scaling works automatically -- your application must be stateless. File-based sessions, local uploads, and in-memory state break with multiple containers.

When to Scale Which Way​

Applicability Matrix​

Vertical Autoscaling​

CPU Modes​

CPU Scaling Thresholds (DEDICATED mode only)​

RAM Dual-Threshold System​

Disk​

Resource Limits (Defaults)​

Scaling Behavior Parameters​

Spike Protection via minRam​

Horizontal Scaling​

Managed Services (DB, Cache, Shared Storage)​

Configuring Thresholds via zerops_scale​

Docker Services​

import.yml Syntax​

Strategy Presets​

Common Mistakes​