Troubleshooting
Use troubleshooting when the agent is confused, a session was interrupted, deploy keeps failing, verification does not match the final answer, or you are taking over manually.
Start from current state, not chat memory. Ask the agent to read live services, saved workflow state, recent deploys, logs, events, and verification output before it edits anything else.
Find where the run is stuck
| Stuck point | Ask for | What you should get |
|---|---|---|
| Agent lost context | Current project status and services in scope. | Runtime target, managed services, last deploy, last verify result, and any saved workflow state. |
| Service setup is unclear | Runtime target and dependency plan. | Which existing services will be used, which missing services are needed, and what needs human approval. |
| Deploy failed | Failure category plus build logs, runtime logs, and recent service events. | A cause or next diagnostic step, not another blind deploy. |
| Runtime is reachable but app behavior fails | The failing behavior check and request-time runtime logs. | Endpoint/UI/job/data evidence tied to the product request. |
| Local app cannot reach services | VPN state, generated .env, and selected runtime/setup. | Whether MCP auth works separately from private service access. |
| Delivery is ambiguous | Delivery mode, git-push state, build integration, or handoff note. | What will happen after proof: direct deploy, git push, CI, package, or production handoff. |
Useful prompt:
Evidence order
The useful evidence depends on the failure category surfaced by deploy or verify tools.
| Category | Read first | Avoid |
|---|---|---|
| build | Build logs, build commands, dependency manifests, deploy file list. | Runtime logs; the runtime did not start yet. |
| start | Prepare/runtime logs, start command, ports, env references. | Rebuilding without checking why the process exited. |
| verify | Failing check detail, HTTP response, request-time runtime logs, stored state. | Calling a green deploy "done" before behavior passes. |
| network | VPN, SSH, DNS, subdomain readiness, service status, transport error. | Editing app code before proving connectivity. |
| config | Field-level rejection, zerops.yaml, setup name, env references, service settings. | Guessing from service names or dashboard memory. |
| credential | The named credential surface: ZCP_API_KEY, git, SSH, managed-service, CI, or external API. | Rotating unrelated secrets. |
| other | Raw events/logs and the exact phase that failed. | Repeating the same attempt after the same unknown reason. |
Failure categories come from the deploy/verify evidence surface. They are not a verdict; they tell the agent where the next useful signal is.
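The table above can be read as a simple lookup from category to first evidence. The sketch below is illustrative only: `EVIDENCE_BY_CATEGORY` and `evidence_for` are hypothetical names, not part of any Zerops tool, and the fallback to `other` for unrecognized categories is an assumption.

```python
# Hypothetical lookup transcribed from the "Read first" column; not a Zerops API.
EVIDENCE_BY_CATEGORY = {
    "build": ["build logs", "build commands", "dependency manifests", "deploy file list"],
    "start": ["prepare/runtime logs", "start command", "ports", "env references"],
    "verify": ["failing check detail", "HTTP response", "request-time runtime logs", "stored state"],
    "network": ["VPN", "SSH", "DNS", "subdomain readiness", "service status", "transport error"],
    "config": ["field-level rejection", "zerops.yaml", "setup name", "env references", "service settings"],
    "credential": ["the named credential surface"],
    "other": ["raw events/logs", "the exact phase that failed"],
}


def evidence_for(category: str) -> list[str]:
    """Return what to read first; unknown categories fall back to 'other' (assumption)."""
    key = category.strip().lower()
    return EVIDENCE_BY_CATEGORY.get(key, EVIDENCE_BY_CATEGORY["other"])
```

Calling `evidence_for("build")` returns the build-side evidence and never suggests runtime logs, which matches the "Avoid" column for that row.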
Local setup checks
Local setup has two separate connections:
- MCP uses ZCP_API_KEY to talk to the Zerops API.
- Your local app and shell use zcli vpn up to reach private service hostnames such as db or cache.
That means MCP can work while the app cannot reach the database.
| Symptom | Check |
|---|---|
| Agent does not list the zerops MCP server. | Relaunch the agent from the directory containing .mcp.json. |
| MCP startup says the token reaches multiple projects. | Replace ZCP_API_KEY with a single-project token. |
| Re-running zcp init made MCP disappear. | Re-add the ZCP_API_KEY env block to .mcp.json and restart the agent. |
| Local app cannot reach db, cache, or storage hostnames. | Run zcli vpn up <project-id> again. |
| Local app reads stale credentials. | Regenerate .env; it is a snapshot, not a live sync. |
| zcp is not found after install. | Add ~/.local/bin or the install target to PATH, then restart the shell/agent. |
For local setup details, use Run locally.
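Because the two connections fail independently, it can help to check them separately. The following is a minimal sketch, not an official diagnostic: the function names are hypothetical, the `.mcp.json` check is a plain text search because the file's schema is not assumed here, and name resolution is only a first signal of VPN reachability, not proof that the service accepts connections.

```python
import socket
from pathlib import Path


def mcp_config_mentions_key(config_path: Path = Path(".mcp.json")) -> bool:
    """True if .mcp.json exists and mentions ZCP_API_KEY anywhere (schema not assumed)."""
    return config_path.is_file() and "ZCP_API_KEY" in config_path.read_text()


def hostname_resolves(hostname: str, resolver=socket.gethostbyname) -> bool:
    """True if the private hostname resolves from this machine (expected only with the VPN up)."""
    try:
        resolver(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def diagnose(config_path=Path(".mcp.json"), hostnames=("db", "cache"),
             resolver=socket.gethostbyname) -> dict:
    """Report the MCP side and the private-network side as separate findings."""
    return {
        "mcp_config_has_key": mcp_config_mentions_key(config_path),
        "unresolved_hostnames": [h for h in hostnames if not hostname_resolves(h, resolver)],
    }
```

A result with `mcp_config_has_key` true and a non-empty `unresolved_hostnames` list is the case described above: MCP can work while the app cannot reach the database.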
Manual takeover
If you take over from the agent, read evidence in this order:
1. Service list and runtime target.
2. Service-scoped events for the runtime in question.
3. Build logs for build failures, runtime logs for start or request failures.
4. Verify output for reachability and requested behavior.
5. Git history only when delivery uses git-push or CI.
Do not inspect every service first. Start with the runtime in scope and expand only when the evidence points to a dependency.
When to stop
Stop the loop when:
- the same failure repeats without new evidence,
- the agent needs an external credential,
- the target runtime or stage choice is ambiguous,
- a destructive action would delete or replace a service,
- production release authority is needed,
- the request no longer fits the current project layout.
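The first condition, a repeated failure with no new evidence, is the easiest one to check mechanically. This is a rough sketch under stated assumptions: `Attempt` and `repeats_without_new_evidence` are hypothetical names, and "no new evidence" is approximated as the last two attempts having the same category and identical evidence text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    category: str  # failure category, e.g. "build" or "verify"
    evidence: str  # the log line or check detail that was read for this attempt


def repeats_without_new_evidence(attempts: list[Attempt]) -> bool:
    """True when the last two attempts failed the same way with identical evidence."""
    if len(attempts) < 2:
        return False
    # Dataclass equality compares category and evidence field by field.
    return attempts[-1] == attempts[-2]
```

The other conditions in the list depend on human judgment or authority, so they are left out of this check on purpose.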
A blocker is acceptable only when it names the runtime in scope, failure category, evidence read, fixes attempted, and the human decision or credential still needed.
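One way to hold a blocker to that standard is to treat it as a record with five required fields. The structure below is a hypothetical sketch; the class and field names are illustrative and do not correspond to any Zerops or ZCP data format.

```python
from dataclasses import dataclass, field


@dataclass
class Blocker:
    runtime: str = ""                 # the runtime in scope
    failure_category: str = ""        # e.g. "build", "verify", "credential"
    evidence_read: list[str] = field(default_factory=list)
    fixes_attempted: list[str] = field(default_factory=list)
    human_input_needed: str = ""      # the decision or credential still needed

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]
```

A blocker is ready to hand to a human only when `missing_fields()` returns an empty list.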
Before destructive recovery, read service-scoped events, logs, and the deploy/verify result, plus git history when delivery uses git-push. Token scope and destructive confirmations are covered in Tokens and credentials.