# "Scaling and Autoscaling" ## Keywords scaling, autoscaling, vertical scaling, horizontal scaling, CPU, RAM, disk, containers, SHARED, DEDICATED, cpuMode, minCpu, maxCpu, minRam, maxRam, minDisk, maxDisk, minContainers, maxContainers, minFreeRamGB, minFreeRamPercent, startCpuCoreCount, verticalAutoscaling, HA, NON_HA, OOM, out of memory, scale up, scale down, threshold, Docker VM ## TL;DR Zerops autoscales vertically (CPU/RAM/disk) and horizontally (container count). Runtimes support both. Managed services (DB, cache, shared-storage) support vertical only with fixed container count (NON_HA=1, HA=3). Object-storage and Docker have no autoscaling. Extends grammar.md section 9 with mechanics, thresholds, YAML syntax, and common mistakes. ## When to Scale Which Way | Symptom | Scale type | Why | |---------|-----------|-----| | CPU/memory pressure on existing containers | Vertical (CPU/RAM) | More resources per container | | High request volume, stateless service | Horizontal (containers) | Distribute load across more instances | | Disk filling up | Vertical (disk) | More storage per container | | Latency-sensitive workload on SHARED CPU | CPU mode → DEDICATED | Guaranteed cores, no burstable throttling | ## Applicability Matrix | Service type | Vertical autoscaling | Horizontal scaling | Notes | |---|---|---|---| | **Runtime** (Node.js, Go, PHP, Python, Java, etc.) | Yes | Yes (1-10 containers) | Full autoscaling | | **Linux containers** (Alpine, Ubuntu) | Yes | Yes (1-10 containers) | Same as runtimes | | **Managed DB** (PostgreSQL, MariaDB) | Yes | No (fixed: NON_HA=1, HA=3) | Mode immutable after creation | | **Managed cache** (KeyDB/Valkey) | Yes | No (fixed: NON_HA=1, HA=3) | Mode immutable after creation | | **Shared storage** | No (automatic, not configurable) | No (fixed: NON_HA=1, HA=3) | DO NOT set verticalAutoscaling in import.yml | | **Object storage** | No | No | Fixed size at creation, no verticalAutoscaling | | **Docker** | No (manual, triggers VM restart) | Yes (VM count changeable, triggers restart) | No autoscaling at all | ## Vertical Autoscaling ### CPU Modes | Mode | Behavior | Best for | |---|---|---| | **SHARED** | Physical core shared with up to 10 tenants. Performance ranges 1/10 to 10/10 depending on neighbors | Dev, staging, low-traffic production | | **DEDICATED** | Exclusive full physical core(s). Consistent performance | Production, CPU-intensive workloads | - CPU mode can be changed **once per hour** - **startCpuCoreCount**: cores allocated at container start (default: 2). Increase for apps with heavy initialization ### CPU Scaling Thresholds (DEDICATED mode only) - **Min free CPU cores** (`minFreeCpuCores`): scale-up triggers when free capacity on a single core drops below this fraction, range 0.0-1.0 (default: 0.1 = 10%) - **Min free CPU percent** (`minFreeCpuPercent`): scale-up triggers when total free capacity across all cores drops below this percentage, range 0-100 (default: 0, disabled) ### RAM Dual-Threshold System RAM is monitored every **10 seconds**. Two independent thresholds control scale-up -- whichever provides **more free memory** wins: | Threshold | Field | Default | Behavior | |---|---|---|---| | **Absolute** | `minFreeRamGB` | 0.0625 GB (64 MB) | Scale up when free RAM drops below this fixed amount | | **Percentage** | `minFreeRamPercent` | 0% (disabled) | Scale up when free RAM drops below this % of granted RAM | Both thresholds serve dual purpose: prevent OOM crashes and preserve space for kernel disk caching. Swap is enabled as a safety net but does not replace proper threshold configuration. **Higher wins example**: with 12 GB granted RAM, `minFreeRamGB=0.5` (500 MB buffer) and `minFreeRamPercent=5` (600 MB buffer) — the 600 MB threshold applies. As granted RAM grows, the percentage threshold automatically provides a larger buffer. ### Disk - **Grows only -- never shrinks** (no scale-down). Set `minDisk = maxDisk` to disable. ### Resource Limits (Defaults) | Resource | Min | Max | |---|---|---| | CPU cores | 1 | 8 | | RAM | 0.125 GB | 48 GB | | Disk | 1 GB | 250 GB | PostgreSQL and MariaDB override RAM minimum to **0.25 GB**. ### Scaling Behavior Parameters | Parameter | CPU | RAM | Disk | |---|---|---|---| | Collection interval | 10s | 10s | 10s | | Scale-up window | 20s | 10s | 10s | | Scale-down window | 60s | 120s | 300s | | Scale-up percentile | 60th | 50th | 50th | | Scale-down percentile | 40th | 50th | 50th | | Minimum step | 1 (0.1 cores) | 0.125 GB | 0.5 GB | | Maximum step | 40 | 32 GB | 128 GB | Scaling uses exponential growth: small increments initially, larger jumps under sustained high load. **Scale-up behavior summary:** - **RAM/Disk**: immediate scale-up when free resources drop below threshold (single measurement) - **CPU**: requires 2 consecutive measurements below threshold (~20s window) to avoid spikes - **Scale-down is conservative**: 3-5 consecutive measurements above threshold (60-300s depending on resource) — prevents flapping ### Spike Protection via minRam Autoscaling reacts within 10-20 seconds. Compilation and package installation create RAM spikes faster than scaling can respond. Set `minRam` high enough to absorb the first spike WITHOUT relying on autoscaling: - **Dev services** (compilation on container via SSH): `minRam` must cover the build tool peak — `npm install`, `go build`, `cargo build` spike within seconds - **Stage/prod services** (pre-built artifacts): `minRam` only needs to cover the startup peak (JVM heap allocation, SSR warming) Thresholds (`minFreeRamGB`, `minFreeRamPercent`) handle gradual load growth. They cannot protect against sub-10s spikes that exceed the total allocated RAM. See runtime guides for per-runtime `minRam` recommendations. **Disabling autoscaling**: set **minimum = maximum** for any resource to pin it at a fixed value (e.g., `minRam: 2, maxRam: 2`). ## Horizontal Scaling Applies to **runtimes and Linux containers only**. New containers are added when vertical scaling reaches configured maximums. - **minContainers**: baseline always running (system range: 1-10) - **maxContainers**: upper limit during peak (system range: 1-10) - Set `minContainers = maxContainers` to disable horizontal autoscaling **HA requirement**: applications must be stateless and handle distributed operation (no local file sessions, no local uploads). ### Managed Services (DB, Cache, Shared Storage) Container count is **fixed by deployment mode**, set at creation, **immutable**: | Mode | Containers | Use case | |---|---|---| | `NON_HA` | 1 | Development, non-critical | | `HA` | 3 (on separate physical machines) | Production, automatic failover | HA recovery: failed container is disconnected, new one created on different hardware, data synchronized from healthy copies, failed container removed. PostgreSQL HA exposes read replica port **5433** for distributing SELECT queries. ## Configuring Thresholds via zerops_scale Threshold parameters can be set via the `zerops_scale` MCP tool, not just import.yml: ``` zerops_scale serviceHostname="api" minFreeRamGB=0.5 minFreeRamPercent=5 minFreeCpuCores=0.2 ``` All four threshold parameters (`minFreeRamGB`, `minFreeRamPercent`, `minFreeCpuCores`, `minFreeCpuPercent`) are optional and can be combined with any other scaling parameters in a single call. ## Docker Services - Run in **VMs**, not containers. **No autoscaling** -- resources fixed at creation - Changing resources or VM count triggers **VM restart** (downtime). Disk can only increase - Consider runtime services or Linux containers if dynamic scaling is needed ## import.yml Syntax ```yaml services: # Runtime with full scaling - hostname: api type: nodejs@22 minContainers: 2 maxContainers: 6 verticalAutoscaling: cpuMode: SHARED minCpu: 1 maxCpu: 4 startCpuCoreCount: 2 minRam: 0.5 maxRam: 8 minFreeRamGB: 0.125 minFreeRamPercent: 10 minDisk: 1 maxDisk: 20 # Managed DB (vertical only, no container settings) - hostname: db type: postgresql@16 mode: HA verticalAutoscaling: cpuMode: DEDICATED minCpu: 1 maxCpu: 4 minRam: 1 maxRam: 16 minDisk: 5 maxDisk: 100 ``` ## Strategy Presets **Development** — SHARED CPU, min resources, 1 container. Cost-effective for dev/staging: ``` zerops_scale serviceHostname="api" cpuMode="SHARED" minCpu=1 maxCpu=2 minRam=0.25 maxRam=1 minContainers=1 maxContainers=1 ``` **Production** — DEDICATED CPU, higher minimums, multiple containers for HA: ``` zerops_scale serviceHostname="api" cpuMode="DEDICATED" minCpu=2 maxCpu=8 minRam=2 maxRam=8 minContainers=2 maxContainers=6 ``` **Burst workloads** — Wide autoscaling range, SHARED CPU: ``` zerops_scale serviceHostname="worker" cpuMode="SHARED" minCpu=1 maxCpu=8 minRam=1 maxRam=16 minContainers=1 maxContainers=10 ``` ## Common Mistakes **DO NOT** add `verticalAutoscaling` to **object-storage** or **shared-storage** services in import.yml -- causes import failure. Object storage has a fixed `objectStorageSize` only. Shared storage is managed automatically. **DO NOT** set `minContainers` or `maxContainers` for managed services (DB, cache, shared-storage) -- container count is fixed by `mode` (NON_HA=1, HA=3). Setting these causes import failure. **DO NOT** use `DEDICATED` CPU for low-traffic or dev services -- wastes resources. Use `SHARED` and switch to `DEDICATED` only when consistent performance matters. **DO NOT** set `minFreeRamGB: 0` and `minFreeRamPercent: 0` simultaneously -- the API rejects this with "Invalid custom autoscaling value". Always keep at least the default absolute threshold (0.0625 GB). **DO NOT** forget that disk **never shrinks** -- setting a high `minDisk` is permanent for that container's lifetime. **DO NOT** assume horizontal scaling works automatically -- your application must be stateless. File-based sessions, local uploads, and in-memory state break with multiple containers. ## See Also - zerops://themes/core -- import.yml schema and platform rules (section 9: Scaling basics) - zerops://guides/production-checklist -- HA mode, minContainers recommendations - zerops://themes/services -- managed service reference and mode constraints