
Troubleshooting

Use troubleshooting when the agent is confused, a session was interrupted, deploy keeps failing, verification does not match the final answer, or you are taking over manually.

Start from current state, not chat memory:

Read current project status and tell me where things stand before changing anything.

That prompt should make the agent read live services, saved workflow state, recent deploys, logs, events, and verification output before it edits anything.

Find where the run is stuck

Stuck point | Ask for | What you should get
Agent lost context | Current project status and services in scope. | Runtime target, managed services, last deploy, last verify result, and any saved workflow state.
Service setup is unclear | Runtime target and dependency plan. | Which existing services will be used, which missing services are needed, and what needs human approval.
Deploy failed | Failure category plus build logs, runtime logs, and recent service events. | A cause or next diagnostic step, not another blind deploy.
Runtime is reachable but app behavior fails | The failing behavior check and request-time runtime logs. | Endpoint/UI/job/data evidence tied to the product request.
Local app cannot reach services | VPN state, generated .env, and selected runtime/setup. | Whether MCP auth works separately from private service access.
Delivery is ambiguous | Delivery mode, git-push state, build integration, or handoff note. | What will happen after proof: direct deploy, git push, CI, package, or production handoff.

Useful prompt:

Show me the runtime in scope, failure category, evidence read, fixes tried, and the next decision needed.

Evidence order

The useful evidence depends on the failure category surfaced by deploy or verify tools.

Category | Read first | Avoid
build | Build logs, build commands, dependency manifests, deploy file list. | Runtime logs; the runtime did not start yet.
start | Prepare/runtime logs, start command, ports, env references. | Rebuilding without checking why the process exited.
verify | Failing check detail, HTTP response, request-time runtime logs, stored state. | Calling a green deploy "done" before behavior passes.
network | VPN, SSH, DNS, subdomain readiness, service status, transport error. | Editing app code before proving connectivity.
config | Field-level rejection, zerops.yaml, setup name, env references, service settings. | Guessing from service names or dashboard memory.
credential | The named credential surface: ZCP_API_KEY, git, SSH, managed-service, CI, or external API. | Rotating unrelated secrets.
other | Raw events/logs and the exact phase that failed. | Repeating the same attempt after the same unknown reason.

Failure categories come from the deploy/verify evidence surface. They are not a verdict; they tell the agent where the next useful signal is.
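The category-to-evidence mapping above can be sketched as a small lookup. This is only an illustration: the EVIDENCE dictionary and the next_evidence function are hypothetical names, not part of any Zerops tool, and the entries are abbreviated from the table.

```python
# Hypothetical lookup built from the failure categories in this section.
# The keys are the category names from the text; everything else is illustrative.
EVIDENCE = {
    "build": {"read_first": ["build logs", "build commands", "dependency manifests", "deploy file list"],
              "avoid": "runtime logs (the runtime did not start yet)"},
    "start": {"read_first": ["prepare/runtime logs", "start command", "ports", "env references"],
              "avoid": "rebuilding without checking why the process exited"},
    "verify": {"read_first": ["failing check detail", "HTTP response", "request-time runtime logs", "stored state"],
               "avoid": "calling a green deploy done before behavior passes"},
    "network": {"read_first": ["VPN", "SSH", "DNS", "subdomain readiness", "service status", "transport error"],
                "avoid": "editing app code before proving connectivity"},
    "config": {"read_first": ["field-level rejection", "zerops.yaml", "setup name", "env references", "service settings"],
               "avoid": "guessing from service names or dashboard memory"},
    "credential": {"read_first": ["the named credential surface"],
                   "avoid": "rotating unrelated secrets"},
    "other": {"read_first": ["raw events/logs", "the exact phase that failed"],
              "avoid": "repeating the same attempt after the same unknown reason"},
}


def next_evidence(category: str) -> dict:
    """Return what to read first for a category; unknown categories fall back to 'other'."""
    return EVIDENCE.get(category, EVIDENCE["other"])
```

Falling back to "other" for an unrecognized category is a choice made for this sketch; it keeps the lookup pointing at raw events and logs rather than failing.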

Local setup checks

Local setup has two separate connections:

  • MCP uses ZCP_API_KEY to talk to the Zerops API.
  • Your local app and shell use zcli vpn up to reach private service hostnames such as db or cache.

The two connections are independent, so MCP can work while the app cannot reach the database.
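One way to see the two connections separately is to check each on its own. The sketch below is an assumption-heavy illustration: check_local_connections is a hypothetical helper, the hostnames db and cache are the examples from the text, and the token check only confirms that ZCP_API_KEY is set in the given environment (it may live in .mcp.json instead, and a set value says nothing about whether it is valid). Name resolution likewise only shows that a hostname resolves, not that the service accepts connections.

```python
import os
import socket


def check_local_connections(hostnames=("db", "cache"), env=None, resolve=socket.getaddrinfo):
    """Report the MCP token and private-hostname reachability as two independent results.

    hostnames: example private service hostnames; replace with your own.
    env: mapping to read ZCP_API_KEY from (defaults to the process environment).
    resolve: resolver callable, injectable so the check can be exercised offline.
    """
    env = os.environ if env is None else env
    report = {"mcp_token_present": bool(env.get("ZCP_API_KEY")), "hosts": {}}
    for host in hostnames:
        try:
            resolve(host, None)  # raises OSError (socket.gaierror) if the name does not resolve
            report["hosts"][host] = True
        except OSError:
            report["hosts"][host] = False
    return report


if __name__ == "__main__":
    print(check_local_connections())
```

If the token is present but every host reports False, that matches the case described above: MCP may be fine while the VPN side is down.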

Symptom | Check
Agent does not list the zerops MCP server. | Relaunch the agent from the directory containing .mcp.json.
MCP startup says the token reaches multiple projects. | Replace ZCP_API_KEY with a single-project token.
Re-running zcp init made MCP disappear. | Re-add the ZCP_API_KEY env block to .mcp.json and restart the agent.
Local app cannot reach db, cache, or storage hostnames. | Run zcli vpn up <project-id> again.
Local app reads stale credentials. | Regenerate .env; it is a snapshot, not a live sync.
zcp is not found after install. | Add ~/.local/bin or the install target to PATH, then restart the shell/agent.

For local setup details, see Run locally.

Manual takeover

If you take over from the agent, read evidence in this order:

  1. Service list and runtime target.
  2. Service-scoped events for the runtime in question.
  3. Build logs for build failures, runtime logs for start or request failures.
  4. Verify output for reachability and requested behavior.
  5. Git history only when delivery uses git-push or CI.

Do not inspect every service first. Start with the runtime in scope and expand only when the evidence points to a dependency.
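The reading order above can be written down as a short function. This is a sketch only: takeover_steps is a hypothetical helper, the delivery-mode labels "git-push" and "ci" are illustrative strings, and reading both log types for categories other than build, start, and verify is an assumption the text does not state.

```python
def takeover_steps(failure_category: str, delivery_mode: str) -> list[str]:
    """Return the evidence to read, in order, when taking over from the agent."""
    steps = [
        "service list and runtime target",
        "service-scoped events for the runtime in scope",
    ]
    if failure_category == "build":
        steps.append("build logs")
    elif failure_category in ("start", "verify"):
        steps.append("runtime logs")
    else:
        # Assumption for this sketch: when the category is unclear, read both.
        steps.append("build logs and runtime logs")
    steps.append("verify output for reachability and requested behavior")
    # Git history is only relevant when delivery goes through git-push or CI.
    if delivery_mode in ("git-push", "ci"):
        steps.append("git history")
    return steps
```

For example, takeover_steps("build", "direct") yields four steps and omits git history, while any call with "git-push" ends with it.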

When to stop

Stop the loop when:

  • the same failure repeats without new evidence,
  • the agent needs an external credential,
  • the target runtime or stage choice is ambiguous,
  • a destructive action would delete or replace a service,
  • production release authority is needed,
  • the request no longer fits the current project layout.
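The first stop condition, the same failure repeating without new evidence, lends itself to a mechanical check. The sketch below assumes each attempt is recorded as a pair of a failure signature and the set of evidence read so far; should_stop and that record shape are hypothetical, not something the tooling is known to provide.

```python
def should_stop(attempts: list[tuple[str, frozenset[str]]]) -> bool:
    """Return True when the last two attempts failed the same way with no new evidence.

    attempts: (failure_signature, evidence_read) per attempt, oldest first.
    """
    if len(attempts) < 2:
        return False
    (prev_sig, prev_evidence), (last_sig, last_evidence) = attempts[-2], attempts[-1]
    # "No new evidence" means the latest evidence set adds nothing to the previous one.
    return prev_sig == last_sig and last_evidence <= prev_evidence
```

The other stop conditions in the list depend on human judgment or authority and are not modeled here.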

A blocker is acceptable only when it names the runtime in scope, failure category, evidence read, fixes attempted, and the human decision or credential still needed.
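The five required parts of a blocker can be captured in a small record that reports which parts are missing. BlockerReport and its field names are hypothetical and only mirror the sentence above; the sample values are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class BlockerReport:
    runtime_in_scope: str
    failure_category: str
    evidence_read: list[str]
    fixes_attempted: list[str]
    decision_needed: str  # the human decision or credential still needed

    def missing(self) -> list[str]:
        """Names of the fields that are empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_acceptable(self) -> bool:
        """A blocker is acceptable only when every part is filled in."""
        return not self.missing()


# Illustrative values only.
report = BlockerReport(
    runtime_in_scope="api",
    failure_category="credential",
    evidence_read=["runtime logs", "env references"],
    fixes_attempted=["re-checked env references"],
    decision_needed="external API key from the project owner",
)
```

A report with an empty evidence_read list would return ["evidence_read"] from missing(), which makes it clear what the agent still has to supply before stopping.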

Before destructive recovery, read service-scoped events, logs, and the deploy/verify result; when delivery uses git-push, also read git history. Token scope and destructive confirmations are covered in Tokens and credentials.