Security (4 issues):
1. Remove the 172.* Docker-bridge auth bypass — any bridge container inherited
tag:admin (incl /unlock, /progress/reset). Bridge callers now need Tailscale
identity or bearer token. (kua-mcp-core unaffected — reaches engine via
docker exec localhost.)
2. Validate request-supplied source_branch/target_branch on /release (400 on bad
input) before they reach the shell in release().
3. Check .ok on previously-ignored runOnServer results: post_deploy hook
(→partial), no-health-url docker compose ps (→unhealthy); add a catch to
rollback() so a failed rollback records failure instead of hanging 'running'.
4. Replace hardcoded bruno/gal Tailscale IP map with runtime resolution via the
tailscaled LocalAPI over the mounted socket (cached per host).
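
The validation in item 2 could look roughly like the sketch below. The helper names (validateBranch, releaseInputErrors), the allow-list pattern, the extra rejection of a leading '-' or '..', and the assumption that both fields are required are all hypothetical. The real handler may apply different rules.

```typescript
// Assumed allow-list for branch names; the actual service may use another pattern.
const BRANCH_PATTERN = /^[A-Za-z0-9._\/-]+$/;

// Hypothetical helper: returns an error message for a bad value, or null if acceptable.
function validateBranch(field: string, value: unknown): string | null {
  if (typeof value !== "string" || value.length === 0) {
    return `${field} must be a non-empty string`;
  }
  if (!BRANCH_PATTERN.test(value)) {
    return `${field} contains characters outside [A-Za-z0-9._/-]`;
  }
  // A leading '-' could be parsed as a git option; '..' is not valid in a ref name.
  if (value.startsWith("-") || value.includes("..")) {
    return `${field} is not a valid branch name`;
  }
  return null;
}

// Collects validation errors for a /release body. A handler would answer 400
// when this list is non-empty, before anything reaches the shell.
function releaseInputErrors(body: { source_branch?: unknown; target_branch?: unknown }): string[] {
  return [
    validateBranch("source_branch", body.source_branch),
    validateBranch("target_branch", body.target_branch),
  ].filter((e): e is string => e !== null);
}
```

Rejecting by allow-list rather than trying to escape shell metacharacters keeps the check independent of how release() later builds its command.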
Regression fix (ensure-checkout):
- ensureCheckout now probes/clones the GIT ROOT (registry repo_dir), not deploy_dir.
They differ for sub-monorepo apps (coder-core: repo_dir=/root/apps/coder-core,
deploy_dir=.../services/production). Probing deploy_dir/.git falsely reported
MISSING and broke coder-core deploys (e33b1e9 regression). 18 normal apps where
repo_dir==deploy_dir are unchanged.
- ensureCheckout(server, deployDir, repoUrl): clone-if-missing, inside the
per-app acquireLock, called before deploy() git_pull and before rollback()
cd. No-op when .git present (asserts origin==repo_url if set); requires
registry repo_url when absent; refuses to clobber a non-empty non-repo dir.
- rollback(appName, opts): opts.to_ref (validated /^[A-Za-z0-9._/-]+$/,
rejected before any mutation) checks out that ref; default = previous
successful tag from deployHistory. fetch now --prune --tags.
- route POST /api/v1/apps/:app/rollback reads body.to_ref.
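
The ensureCheckout rules above can be summarized as a small decision function. This is a sketch under assumptions: the CheckoutProbe shape, planCheckout, the action names, and the order of the two error checks are hypothetical. The real function runs its probes on the server against the git root rather than receiving them as arguments.

```typescript
// Hypothetical probe result for the git root (registry repo_dir).
interface CheckoutProbe {
  hasGitDir: boolean;        // repo_dir/.git exists
  dirIsEmpty: boolean;       // repo_dir is missing or has no entries
  originUrl: string | null;  // current 'origin' remote, if a repo is present
}

type CheckoutPlan =
  | { action: "noop" }
  | { action: "clone"; repoUrl: string }
  | { action: "error"; reason: string };

function planCheckout(probe: CheckoutProbe, repoUrl: string | null): CheckoutPlan {
  if (probe.hasGitDir) {
    // Checkout exists: only assert the origin when the registry sets repo_url.
    if (repoUrl !== null && probe.originUrl !== repoUrl) {
      return { action: "error", reason: "origin does not match registry repo_url" };
    }
    return { action: "noop" };
  }
  if (repoUrl === null) {
    return { action: "error", reason: "checkout missing and registry has no repo_url" };
  }
  if (!probe.dirIsEmpty) {
    return { action: "error", reason: "refusing to clobber a non-empty non-repo directory" };
  }
  return { action: "clone", repoUrl };
}
```

Keeping the decision separate from the probing makes each rule easy to check without a server.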
The prior expected_image_sha was captured via docker compose images, which returns the image of the existing (pre-recreate) container, not the freshly built one. Switch to docker images ${project}-${service}:latest --quiet --no-trunc, which returns the post-build image SHA. Also normalize the sha256: prefix in the completeSelfRecreate comparison so the expected SHA and the docker inspect SHA match cleanly whether or not either value carries the prefix.
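
A minimal sketch of the prefix normalization, assuming the comparison is done on plain strings. The function names normalizeImageSha and sameImage are hypothetical, and lowercasing and trimming are extra defensive steps not mentioned above.

```typescript
// Strips an optional "sha256:" prefix, surrounding whitespace, and case differences
// so that prefixed and unprefixed forms of the same digest compare equal.
function normalizeImageSha(sha: string): string {
  return sha.trim().toLowerCase().replace(/^sha256:/, "");
}

function sameImage(expected: string, running: string): boolean {
  const a = normalizeImageSha(expected);
  const b = normalizeImageSha(running);
  // An empty value means a lookup failed; never treat that as a match.
  return a.length > 0 && a === b;
}
```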
When kua-deploy is recreating itself (target appName == kua-deploy on same host), the OLD process is about to be killed by the docker daemon mid-flight. Without a handoff, progress would be stuck at deploy:running forever and release-app would poll until timeout.
Self-recreate path: (1) pre-mark progress phase=self_recreate_pending with the freshly built image SHA, deployStartTs, and the stateless services list; (2) fire-and-forget recreateService (do not await its return, since the OLD process is dying anyway); (3) sleep 90s as a ceiling: if we're still alive after that, the recreate failed and we throw.
On startup, completeSelfRecreate() reads progress-kua-deploy.json; if phase is self_recreate_pending, it queries its own container via docker inspect, compares the running image SHA to the pre-recreate expected SHA, checks that StartedAt > recreate_started_at and state=running, then writes phase=succeeded (or failed) plus a verify struct on the deploy step. Idempotent: no-op if no marker is found.
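
The startup check can be sketched as a pure decision over the marker and the inspected container state. The interface shapes, the verify field names, and resolveSelfRecreate are hypothetical. Only phase, expected_image_sha, and recreate_started_at follow names used in this log.

```typescript
// Hypothetical shapes; field names beyond those in the text are assumptions.
interface SelfRecreateMarker {
  phase: string;                // e.g. "self_recreate_pending"
  expected_image_sha: string;
  recreate_started_at: string;  // ISO-8601 timestamp
}

interface ContainerState {
  imageSha: string;   // image ID reported by docker inspect
  startedAt: string;  // State.StartedAt from docker inspect
  status: string;     // State.Status, e.g. "running"
}

interface Verify { image_matches: boolean; restarted: boolean; running: boolean; }

const stripPrefix = (sha: string): string => sha.trim().replace(/^sha256:/, "");

// Docker reports nanosecond fractions; truncate to milliseconds before parsing.
const toMillis = (ts: string): number => Date.parse(ts.replace(/(\.\d{3})\d+/, "$1"));

// Returns null when there is nothing to do, which keeps repeated startups idempotent.
function resolveSelfRecreate(
  marker: SelfRecreateMarker | null,
  c: ContainerState,
): { phase: "succeeded" | "failed"; verify: Verify } | null {
  if (marker === null || marker.phase !== "self_recreate_pending") return null;
  const verify: Verify = {
    image_matches: stripPrefix(c.imageSha) === stripPrefix(marker.expected_image_sha),
    // An unparseable timestamp yields NaN, which compares false and fails the check.
    restarted: toMillis(c.startedAt) > toMillis(marker.recreate_started_at),
    running: c.status === "running",
  };
  const ok = verify.image_matches && verify.restarted && verify.running;
  return { phase: ok ? "succeeded" : "failed", verify };
}
```

Recording each check separately in the verify struct shows which condition failed when the phase ends up as failed.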
Replaces the runOnServer("docker compose up -d --force-recreate") pattern with a one-shot transient docker:cli container that runs OUTSIDE the kua-deploy lifecycle. This solves the self-recreate chicken-and-egg problem: when the target is kua-deploy itself, the recreate completes because the transient container survives kua-deploy stopping (the docker daemon does the actual work).
Secrets are fetched via kua-vault export, written to a 600-perm tempfile on /app/data, and passed via --env-file (the docker CLI reads the file from kua-deploy's perspective, so secrets never appear on the docker run command line). The tempfile is unlinked in finally{}.
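
One way the transient container invocation might be assembled is sketched below. Everything beyond docker:cli, --env-file, and --force-recreate is an assumption: the function name buildRecreateArgs, the options shape, the --rm flag, and the socket and deploy-dir mounts are illustrative, not the actual command.

```typescript
// Hypothetical options for a one-shot recreate; names are illustrative only.
interface RecreateOptions {
  deployDir: string;   // directory containing the compose file
  envFile: string;     // 600-perm tempfile holding exported secrets
  services: string[];  // services to bring up; empty means all
  force: boolean;      // true adds --force-recreate
}

// Builds the argv for `docker run`. Secrets travel only inside envFile,
// so no secret value ever appears in the argument list.
function buildRecreateArgs(o: RecreateOptions): string[] {
  return [
    "run", "--rm",
    "--env-file", o.envFile,
    // Assumed mounts: the daemon socket so the inner CLI can reach the daemon,
    // and the deploy dir at the same path so compose resolves its files.
    "-v", "/var/run/docker.sock:/var/run/docker.sock",
    "-v", `${o.deployDir}:${o.deployDir}`,
    "-w", o.deployDir,
    "docker:cli",
    "docker", "compose", "up", "-d",
    ...(o.force ? ["--force-recreate"] : []),
    ...o.services,
  ];
}
```

Passing an argv array to a spawn-style call, instead of a shell string, also avoids quoting problems in service names or paths.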
Replaces: deploy() stateless recreate (force=true), deploy() stateful up (force=false), rollback() recreate (force=true with all-services svcList).
Build step keeps runOnServer (local exec on bruno) since the build doesn't kill kua-deploy. envPrefix/kvPrefix vars retained for the build command.
Split rationale: kua-deploy used to be a service in coder-core/services/kua-services/docker-compose.yml, which meant every release-app coder-core rebuilt+force-recreated kua-deploy as a side-effect. The recreate-self path is structurally racy (the compose-up process is killed mid-flight when its own container stops), causing silent false-success deploys.
This split makes kua-deploy its own deploy unit (own repo, own compose project, own release-app entry), so coder-core releases no longer touch it. Phase A (transient-container recreateService pattern) will follow to make deliberate kua-deploy self-updates also reliable.
Handoff: v2-deploy-coordination -> kua-deploy-split (.sessions.md 2026-05-21 21:35).