Skip to main content
Skip to main content

Scaling and Autoscaling

Keywords

scaling, autoscaling, vertical scaling, horizontal scaling, CPU, RAM, disk, containers, SHARED, DEDICATED, cpuMode, minCpu, maxCpu, minRam, maxRam, minDisk, maxDisk, minContainers, maxContainers, minFreeRamGB, minFreeRamPercent, startCpuCoreCount, verticalAutoscaling, HA, NON_HA, OOM, out of memory, scale up, scale down, threshold, Docker VM

TL;DR

Zerops autoscales vertically (CPU/RAM/disk) and horizontally (container count). Runtimes support both. Managed services (DB, cache, shared-storage) support vertical only with fixed container count (NON_HA=1, HA=3). Object-storage and Docker have no autoscaling. Extends grammar.md section 9 with mechanics, thresholds, YAML syntax, and common mistakes.

When to Scale Which Way

SymptomScale typeWhy
CPU/memory pressure on existing containersVertical (CPU/RAM)More resources per container
High request volume, stateless serviceHorizontal (containers)Distribute load across more instances
Disk filling upVertical (disk)More storage per container
Latency-sensitive workload on SHARED CPUCPU mode → DEDICATEDGuaranteed cores, no burstable throttling

Applicability Matrix

Service typeVertical autoscalingHorizontal scalingNotes
Runtime (Node.js, Go, PHP, Python, Java, etc.)YesYes (1-10 containers)Full autoscaling
Linux containers (Alpine, Ubuntu)YesYes (1-10 containers)Same as runtimes
Managed DB (PostgreSQL, MariaDB)YesNo (fixed: NON_HA=1, HA=3)Mode immutable after creation
Managed cache (KeyDB/Valkey)YesNo (fixed: NON_HA=1, HA=3)Mode immutable after creation
Shared storageNo (automatic, not configurable)No (fixed: NON_HA=1, HA=3)DO NOT set verticalAutoscaling in import.yml
Object storageNoNoFixed size at creation, no verticalAutoscaling
DockerNo (manual, triggers VM restart)Yes (VM count changeable, triggers restart)No autoscaling at all

Vertical Autoscaling

CPU Modes

ModeBehaviorBest for
SHAREDPhysical core shared with up to 10 tenants. Performance ranges 1/10 to 10/10 depending on neighborsDev, staging, low-traffic production
DEDICATEDExclusive full physical core(s). Consistent performanceProduction, CPU-intensive workloads
  • CPU mode can be changed once per hour
  • startCpuCoreCount: cores allocated at container start (default: 2). Increase for apps with heavy initialization

CPU Scaling Thresholds (DEDICATED mode only)

  • Min free CPU cores (minFreeCpuCores): scale-up triggers when free capacity on a single core drops below this fraction, range 0.0-1.0 (default: 0.1 = 10%)
  • Min free CPU percent (minFreeCpuPercent): scale-up triggers when total free capacity across all cores drops below this percentage, range 0-100 (default: 0, disabled)

RAM Dual-Threshold System

RAM is monitored every 10 seconds. Two independent thresholds control scale-up -- whichever provides more free memory wins:

ThresholdFieldDefaultBehavior
AbsoluteminFreeRamGB0.0625 GB (64 MB)Scale up when free RAM drops below this fixed amount
PercentageminFreeRamPercent0% (disabled)Scale up when free RAM drops below this % of granted RAM

Both thresholds serve dual purpose: prevent OOM crashes and preserve space for kernel disk caching. Swap is enabled as a safety net but does not replace proper threshold configuration.

Higher wins example: with 12 GB granted RAM, minFreeRamGB=0.5 (500 MB buffer) and minFreeRamPercent=5 (600 MB buffer) — the 600 MB threshold applies. As granted RAM grows, the percentage threshold automatically provides a larger buffer.

Disk

  • Grows only -- never shrinks (no scale-down). Set minDisk = maxDisk to disable.

Resource Limits (Defaults)

ResourceMinMax
CPU cores18
RAM0.125 GB48 GB
Disk1 GB250 GB

PostgreSQL and MariaDB override RAM minimum to 0.25 GB.

Scaling Behavior Parameters

ParameterCPURAMDisk
Collection interval10s10s10s
Scale-up window20s10s10s
Scale-down window60s120s300s
Scale-up percentile60th50th50th
Scale-down percentile40th50th50th
Minimum step1 (0.1 cores)0.125 GB0.5 GB
Maximum step4032 GB128 GB

Scaling uses exponential growth: small increments initially, larger jumps under sustained high load.

Scale-up behavior summary:

  • RAM/Disk: immediate scale-up when free resources drop below threshold (single measurement)
  • CPU: requires 2 consecutive measurements below threshold (~20s window) to avoid spikes
  • Scale-down is conservative: 3-5 consecutive measurements above threshold (60-300s depending on resource) — prevents flapping

Spike Protection via minRam

Autoscaling reacts within 10-20 seconds. Compilation and package installation create RAM spikes faster than scaling can respond. Set minRam high enough to absorb the first spike WITHOUT relying on autoscaling:

  • Dev services (compilation on container via SSH): minRam must cover the build tool peak — npm install, go build, cargo build spike within seconds
  • Stage/prod services (pre-built artifacts): minRam only needs to cover the startup peak (JVM heap allocation, SSR warming)

Thresholds (minFreeRamGB, minFreeRamPercent) handle gradual load growth. They cannot protect against sub-10s spikes that exceed the total allocated RAM. See runtime guides for per-runtime minRam recommendations.

Disabling autoscaling: set minimum = maximum for any resource to pin it at a fixed value (e.g., minRam: 2, maxRam: 2).

Horizontal Scaling

Applies to runtimes and Linux containers only. New containers are added when vertical scaling reaches configured maximums.

  • minContainers: baseline always running (system range: 1-10)
  • maxContainers: upper limit during peak (system range: 1-10)
  • Set minContainers = maxContainers to disable horizontal autoscaling

HA requirement: applications must be stateless and handle distributed operation (no local file sessions, no local uploads).

Managed Services (DB, Cache, Shared Storage)

Container count is fixed by deployment mode, set at creation, immutable:

ModeContainersUse case
NON_HA1Development, non-critical
HA3 (on separate physical machines)Production, automatic failover

HA recovery: failed container is disconnected, new one created on different hardware, data synchronized from healthy copies, failed container removed.

PostgreSQL HA exposes read replica port 5433 for distributing SELECT queries.

Configuring Thresholds via zerops_scale

Threshold parameters can be set via the zerops_scale MCP tool, not just import.yml:

zerops_scale serviceHostname="api" minFreeRamGB=0.5 minFreeRamPercent=5 minFreeCpuCores=0.2

All four threshold parameters (minFreeRamGB, minFreeRamPercent, minFreeCpuCores, minFreeCpuPercent) are optional and can be combined with any other scaling parameters in a single call.

Docker Services

  • Run in VMs, not containers. No autoscaling -- resources fixed at creation
  • Changing resources or VM count triggers VM restart (downtime). Disk can only increase
  • Consider runtime services or Linux containers if dynamic scaling is needed

import.yml Syntax

services:
# Runtime with full scaling
- hostname: api
type: nodejs@22
minContainers: 2
maxContainers: 6
verticalAutoscaling:
cpuMode: SHARED
minCpu: 1
maxCpu: 4
startCpuCoreCount: 2
minRam: 0.5
maxRam: 8
minFreeRamGB: 0.125
minFreeRamPercent: 10
minDisk: 1
maxDisk: 20

# Managed DB (vertical only, no container settings)
- hostname: db
type: postgresql@16
mode: HA
verticalAutoscaling:
cpuMode: DEDICATED
minCpu: 1
maxCpu: 4
minRam: 1
maxRam: 16
minDisk: 5
maxDisk: 100

Strategy Presets

Development — SHARED CPU, min resources, 1 container. Cost-effective for dev/staging:

zerops_scale serviceHostname="api" cpuMode="SHARED" minCpu=1 maxCpu=2 minRam=0.25 maxRam=1 minContainers=1 maxContainers=1

Production — DEDICATED CPU, higher minimums, multiple containers for HA:

zerops_scale serviceHostname="api" cpuMode="DEDICATED" minCpu=2 maxCpu=8 minRam=2 maxRam=8 minContainers=2 maxContainers=6

Burst workloads — Wide autoscaling range, SHARED CPU:

zerops_scale serviceHostname="worker" cpuMode="SHARED" minCpu=1 maxCpu=8 minRam=1 maxRam=16 minContainers=1 maxContainers=10

Common Mistakes

DO NOT add verticalAutoscaling to object-storage or shared-storage services in import.yml -- causes import failure. Object storage has a fixed objectStorageSize only. Shared storage is managed automatically.

DO NOT set minContainers or maxContainers for managed services (DB, cache, shared-storage) -- container count is fixed by mode (NON_HA=1, HA=3). Setting these causes import failure.

DO NOT use DEDICATED CPU for low-traffic or dev services -- wastes resources. Use SHARED and switch to DEDICATED only when consistent performance matters.

DO NOT set minFreeRamGB: 0 and minFreeRamPercent: 0 simultaneously -- the API rejects this with "Invalid custom autoscaling value". Always keep at least the default absolute threshold (0.0625 GB).

DO NOT forget that disk never shrinks -- setting a high minDisk is permanent for that container's lifetime.

DO NOT assume horizontal scaling works automatically -- your application must be stateless. File-based sessions, local uploads, and in-memory state break with multiple containers.

See Also

  • zerops://themes/core -- import.yml schema and platform rules (section 9: Scaling basics)
  • zerops://guides/production-checklist -- HA mode, minContainers recommendations
  • zerops://themes/services -- managed service reference and mode constraints