commit 26804c692ed560d16ee52a42e1f371630fc95e3a Author: kua-deploy-split Date: Thu May 21 18:04:45 2026 -0400 feat: initial commit — extracted from coder-core/services/kua-deploy Split rationale: kua-deploy used to be a service in coder-core/services/kua-services/docker-compose.yml, which meant every release-app coder-core rebuilt+force-recreated kua-deploy as a side-effect. The recreate-self path is structurally racy (the compose-up process is killed mid-flight when its own container stops), causing silent false-success deploys. This split makes kua-deploy its own deploy unit (own repo, own compose project, own release-app entry), so coder-core releases no longer touch it. Phase A (transient-container recreateService pattern) will follow to make deliberate kua-deploy self-updates also reliable. Handoff: v2-deploy-coordination -> kua-deploy-split (.sessions.md 2026-05-21 21:35). diff --git a/Dockerfile b/Dockerfile new file mode 100644 index 0000000..25948fe --- /dev/null +++ b/Dockerfile @@ -0,0 +1,18 @@ +FROM node:22-alpine + +RUN apk add --no-cache openssh-client bash docker-cli docker-cli-compose git curl jq \ + && git config --global --add safe.directory '*' + +ENV DOCKER_BUILDKIT=0 +ENV COMPOSE_DOCKER_CLI_BUILD=0 + +WORKDIR /app + +COPY package.json ./ +RUN npm install --production + +COPY . . + +EXPOSE 3200 + +CMD ["node", "server.js"] diff --git a/NOTES-image-digest-pinning.md b/NOTES-image-digest-pinning.md new file mode 100644 index 0000000..b83d617 --- /dev/null +++ b/NOTES-image-digest-pinning.md @@ -0,0 +1,89 @@ +# NOTES — compose-image digest pinning (deferred prevention item #4) + +**Date:** 2026-05-20 +**Author:** v2-deploy-coordination session (kua-deploy-verify branch) +**Status:** design sketch, **not implemented** — needs platform-wide coordination. + +## Why this matters + +Today every app's `docker-compose.yml` references its app image by **tag** — +typically `:latest`. 
`docker compose up -d` (without `--force-recreate`) +is then a legal no-op when the tag string is unchanged, *even if the +digest behind that tag changed*. That is structurally why kua-deploy's +`deploy` step false-success'd on muralla 2026-05-19 — the build pushed +`muralla-muralla:latest` to a new digest, but the running container was +still bound to the previous digest, and `up -d` saw "no change." + +The other prevention items in this branch (post-deploy SHA verify, +runtime-status endpoint, release-app post-verify) **detect** the failure +after the fact. **Digest pinning would prevent it at the compose level** +— the reference itself changes every build, so compose has no choice +but to recreate. + +## The design + +1. **Build step writes the just-built digest to a small artifact**, e.g. + `/root/apps//.deploy/.sha` + contents = `sha256:655fe0c64391...`. Captured immediately after + `docker compose build` from `docker compose images --quiet `. + +2. **Deploy step rewrites compose with the pinned digest** before `up`, + either: + - **In-place rewrite** of `docker-compose.yml`: replace + `image: muralla-muralla:latest` with `image: muralla-muralla@`. + Risk: dirties Bruno's working tree → trips the existing dirty-tree + gate. Mitigation: rewrite a temporary `docker-compose.deploy.yml` + instead, pass via `-f` to compose. + - **Override file**: keep `docker-compose.yml` untouched; generate a + sibling `docker-compose.override.deploy.yml` per build that just + contains `services: : image: muralla-muralla@`. Apply via + `docker compose -f docker-compose.yml -f docker-compose.override.deploy.yml up -d`. + Cleaner; doesn't dirty the primary file. + +3. **Recreate is then guaranteed**: compose sees the `image:` field + changed → must recreate. The post-deploy verifier from this branch + becomes a belt-and-suspenders, not the load-bearing safety. 
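Review note: the override-file variant from step 2 could be generated along the lines below. This is a minimal sketch, not kua-deploy's implementation. The helper name `renderDigestOverride`, its parameters, and the service name `web` are hypothetical, and it assumes the artifact from step 1 already holds a full `sha256:` digest (64 hex characters).

```javascript
// Hypothetical helper: renders the body of docker-compose.override.deploy.yml
// so that one service's image is referenced by digest instead of by tag.
function renderDigestOverride(service, repository, digest) {
  // Reject anything that is not a full sha256 digest, so a truncated or
  // empty artifact file cannot silently produce an unpinned reference.
  if (!/^sha256:[0-9a-f]{64}$/.test(digest)) {
    throw new Error(`Invalid image digest: ${digest}`);
  }
  return [
    'services:',
    `  ${service}:`,
    `    image: ${repository}@${digest}`,
    '',
  ].join('\n');
}

// Example with a made-up digest (64 hex characters), not a real build output.
const exampleDigest = 'sha256:' + 'ab'.repeat(32);
const overrideYaml = renderDigestOverride('web', 'muralla-muralla', exampleDigest);
console.log(overrideYaml);
```

The rendered text would be written next to the primary compose file and applied with the two `-f` flags shown in step 2. Whether a locally built, unpushed image exposes a digest that resolves through the `@` form may depend on the Docker image store in use, so the value captured in step 1 would need checking before this is relied on.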
+ +## Why this is deferred + +- Touches **every app's deploy path** — kua-cashier, muralla, + muralla-socials, atlas, playgram, coder-core, kua-mail, etc. Each + needs its compose conventions checked (some may already pin digests; + some may use private registries with digest-bound auth). +- Interacts with the **dual-SoT** between `bin/webhook-repos.json` and + `services/kua-deploy/deploy-registry.json` (the coordinator broadcast + called this out — cadencia is mid-untangling it). +- The post-deploy verify alone closes the false-success class for now; + digest pinning is the *durable* fix layered on top. +- Stateful services (postgres, redis) explicitly do **not** want digest + pinning at the compose level — they should drift across kua-deploy + releases. The override-file pattern naturally limits pinning to the + app's own services. + +## Recommended owner / sequencing + +- Land this branch (`chore/kua-deploy-sha-verify-and-release-app-post`) + first — verify catches false-success cleanly. +- Sequence after the coordinator's `chore/deploy-mode-default-direct-and-docs` + branch (cadencia) so the webhook→direct migration is in place. +- Then a separate focused branch implements the override-file pattern + in kua-deploy + a per-app smoke test (one app at a time, kua-cashier + is a good first because it's not yet in production traffic). +- Add an integration test that simulates compose-no-op (build new + image, run deploy without bumping tag) and confirms the override-file + variant recreates while the plain-tag variant would not. + +## Open questions + +- Should we keep `:latest` as the *human-readable* tag and *additionally* + pin by digest in the override? (Yes — operators still want + `muralla-muralla:latest` as a stable rollback anchor.) +- Where does the override file live across the deploy lifecycle? + Tempted: in `/root/apps//.deploy/` (git-ignored), rotated per + build, kept for N builds for fast rollback. 
+- Does the existing `release-app` workflow pass an explicit deploy SHA + that the build step could record alongside the image digest? (Yes — + the `deployCommit` variable in kua-deploy server.js already carries + this.) + +— end notes — diff --git a/README.md b/README.md new file mode 100644 index 0000000..e9bef47 --- /dev/null +++ b/README.md @@ -0,0 +1,32 @@ +# kua-deploy + +Authoritative deploy orchestrator for the Kua infrastructure fleet. Receives release triggers (admin API and Forgejo webhooks), runs git-pull → migration gate → docker build → recreate → SHA-verify on managed apps. + +Split out of `coder-core/services/kua-deploy/` on 2026-05-21 to break the self-rebuild loop that ran every coder-core release through this service as a side-effect. + +## Layout + +- `server.js` — Fastify app exposing `/api/v1/apps/:app/deploy`, `/progress`, `/runtime-status`, `/webhook/forgejo`. +- `Dockerfile` — node:22-alpine + docker-cli + ssh + git + kua-vault binary (mounted at runtime). +- `docker-compose.yml` — single-service compose project. Joins `kua-services` + `production_proxy` networks. +- `kua.json` — release-app manifest (`mode: direct`, `server: bruno`). +- `NOTES-image-digest-pinning.md` — design notes for deferred prevention #4. + +## Registry + +`deploy-registry.json` lives in `coder-core/services/kua-deploy/deploy-registry.json` and is bind-mounted in at `/app/deploy-registry.json`. This is a transitional arrangement; a future change can migrate the registry into this repo. + +## Deploying kua-deploy + +Via release-app: + +``` +release-app kua-deploy +``` + +Which goes through `kua-deploy`'s own admin POST `/api/v1/apps/kua-deploy/deploy` and uses the transient-container recreate pattern (Phase A) so the service can replace its own running container without false-success. 
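Review note: for readers who want to see what the admin POST might look like, here is a hedged sketch that only builds the request and does not send it. The route comes from the README above. Everything else is an assumption: the helper name `buildDeployRequest`, the `Bearer` authorization scheme, the `message` body field, and the base URL are all hypothetical, because this part of the diff does not show how the admin token or the request body is checked.

```javascript
// Hypothetical helper: builds (but does not send) the admin deploy request.
// ASSUMPTIONS: Bearer auth and a JSON body with a `message` field.
function buildDeployRequest(baseUrl, app, adminToken, message) {
  // encodeURIComponent keeps an unexpected app name from altering the path.
  const url = new URL(`/api/v1/apps/${encodeURIComponent(app)}/deploy`, baseUrl);
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${adminToken}`,
      },
      body: JSON.stringify({ message }),
    },
  };
}

const request = buildDeployRequest(
  'http://localhost:3200', // assumed base URL for illustration
  'kua-deploy',
  'example-token',         // placeholder, not a real token
  'Release to production'
);
console.log(request.url);
// Sending it would look like: await fetch(request.url, request.options);
```

Keeping request construction separate from sending makes the URL and headers easy to check in a test without touching a live deploy endpoint.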
+ +## See also + +- `services/kua-deploy/NOTES-image-digest-pinning.md` in this repo +- `infra-docs/docs/04-operations/deploy-listener.md` in coder-core (current-state callout + deploy_mode reference) diff --git a/docker-compose.yml b/docker-compose.yml new file mode 100644 index 0000000..5ecd609 --- /dev/null +++ b/docker-compose.yml @@ -0,0 +1,61 @@ +# kua-deploy — extracted from coder-core/services/kua-services/docker-compose.yml on 2026-05-21. +# Run as its own compose project on Bruno, decoupled from coder-core's deploy lifecycle. +# +# Network attachments: +# - kua-services: reach kua-vault, kua-db, kua-mcp-core, etc. +# - production_proxy: reach forgejo (git operations) + Caddy edge labels +# +# Registry: deploy-registry.json is bind-mounted from coder-core's checkout +# during this transition. Future cleanup can migrate it into this repo. +services: + kua-deploy: + build: + context: . + dockerfile: Dockerfile + container_name: kua-deploy + restart: always + environment: + - HOSTNAME=bruno + - NODE_ENV=production + - KUA_VAULT_URL=http://kua-vault:3000 + - KUA_DB_URL=http://kua-db:3100 + - KUA_DB_ADMIN_TOKEN=${KUA_ADMIN_TOKEN:-} + - KUA_ALLOWED_NODES=${KUA_ALLOWED_NODES:-gal,bruno,genesis} + - KUA_DEPLOY_WEBHOOK_SECRET=${KUA_DEPLOY_WEBHOOK_SECRET:-} + - KUA_DEPLOY_ADMIN_TOKEN=${KUA_ADMIN_TOKEN:-} + ports: + - "100.74.17.6:3200:3200" + volumes: + - /var/run/tailscale/tailscaled.sock:/var/run/tailscale/tailscaled.sock:ro + - /var/run/docker.sock:/var/run/docker.sock:ro + - kua-deploy-data:/app/data + - /root/.ssh:/root/.ssh:ro + - /root/apps:/root/apps + - /root/apps/coder-core/services/kua-deploy/deploy-registry.json:/app/deploy-registry.json:ro + - /usr/local/bin/kua-vault:/usr/local/bin/kua-vault:ro + - /root/.config/kua-vault:/root/.config/kua-vault:ro + networks: + - kua-services + - production_proxy + labels: + - "caddy=deploy.kua.cl" + - "caddy.reverse_proxy={{upstreams 3200}}" + healthcheck: + test: ["CMD", "curl", "-sf", 
"http://localhost:3200/health"] + interval: 30s + timeout: 5s + retries: 3 + start_period: 10s + +volumes: + kua-deploy-data: + name: kua-services_kua-deploy-data + external: true + +networks: + kua-services: + name: kua-services + external: true + production_proxy: + name: production_proxy + external: true diff --git a/kua.json b/kua.json new file mode 100644 index 0000000..ad3af78 --- /dev/null +++ b/kua.json @@ -0,0 +1,20 @@ +{ + "name": "kua-deploy", + "git": { + "branch": { + "development": "main", + "production": "production" + } + }, + "deploy": { + "production": { + "mode": "direct", + "server": "bruno" + } + }, + "environments": { + "production": { + "server": "bruno" + } + } +} diff --git a/package-lock.json b/package-lock.json new file mode 100644 index 0000000..92fa501 --- /dev/null +++ b/package-lock.json @@ -0,0 +1,625 @@ +{ + "name": "kua-deploy", + "version": "1.0.0", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "name": "kua-deploy", + "version": "1.0.0", + "dependencies": { + "fastify": "^5.0.0" + } + }, + "node_modules/@fastify/ajv-compiler": { + "version": "4.0.5", + "resolved": "https://registry.npmjs.org/@fastify/ajv-compiler/-/ajv-compiler-4.0.5.tgz", + "integrity": "sha512-KoWKW+MhvfTRWL4qrhUwAAZoaChluo0m0vbiJlGMt2GXvL4LVPQEjt8kSpHI3IBq5Rez8fg+XeH3cneztq+C7A==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "ajv": "^8.12.0", + "ajv-formats": "^3.0.1", + "fast-uri": "^3.0.0" + } + }, + "node_modules/@fastify/error": { + "version": "4.2.0", + "resolved": "https://registry.npmjs.org/@fastify/error/-/error-4.2.0.tgz", + "integrity": "sha512-RSo3sVDXfHskiBZKBPRgnQTtIqpi/7zhJOEmAxCiBcM7d0uwdGdxLlsCaLzGs8v8NnxIRlfG0N51p5yFaOentQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", 
+ "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT" + }, + "node_modules/@fastify/fast-json-stringify-compiler": { + "version": "5.0.3", + "resolved": "https://registry.npmjs.org/@fastify/fast-json-stringify-compiler/-/fast-json-stringify-compiler-5.0.3.tgz", + "integrity": "sha512-uik7yYHkLr6fxd8hJSZ8c+xF4WafPK+XzneQDPU+D10r5X19GW8lJcom2YijX2+qtFF1ENJlHXKFM9ouXNJYgQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "fast-json-stringify": "^6.0.0" + } + }, + "node_modules/@fastify/forwarded": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/@fastify/forwarded/-/forwarded-3.0.1.tgz", + "integrity": "sha512-JqDochHFqXs3C3Ml3gOY58zM7OqO9ENqPo0UqAjAjH8L01fRZqwX9iLeX34//kiJubF7r2ZQHtBRU36vONbLlw==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT" + }, + "node_modules/@fastify/merge-json-schemas": { + "version": "0.2.1", + "resolved": "https://registry.npmjs.org/@fastify/merge-json-schemas/-/merge-json-schemas-0.2.1.tgz", + "integrity": "sha512-OA3KGBCy6KtIvLf8DINC5880o5iBlDX4SxzLQS8HorJAbqluzLRn80UXU0bxZn7UOFhFgpRJDasfwn9nG4FG4A==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "dequal": "^2.0.3" + } + }, + "node_modules/@fastify/proxy-addr": { + "version": "5.1.0", + "resolved": "https://registry.npmjs.org/@fastify/proxy-addr/-/proxy-addr-5.1.0.tgz", + "integrity": "sha512-INS+6gh91cLUjB+PVHfu1UqcB76Sqtpyp7bnL+FYojhjygvOPA9ctiD/JDKsyD9Xgu4hUhCSJBPig/w7duNajw==", + "funding": [ + { + "type": "github", + "url": 
"https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "@fastify/forwarded": "^3.0.0", + "ipaddr.js": "^2.1.0" + } + }, + "node_modules/@pinojs/redact": { + "version": "0.4.0", + "resolved": "https://registry.npmjs.org/@pinojs/redact/-/redact-0.4.0.tgz", + "integrity": "sha512-k2ENnmBugE/rzQfEcdWHcCY+/FM3VLzH9cYEsbdsoqrvzAKRhUZeRNhAZvB8OitQJ1TBed3yqWtdjzS6wJKBwg==", + "license": "MIT" + }, + "node_modules/abstract-logging": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/abstract-logging/-/abstract-logging-2.0.1.tgz", + "integrity": "sha512-2BjRTZxTPvheOvGbBslFSYOUkr+SjPtOnrLP33f+VIWLzezQpZcqVg7ja3L4dBXmzzgwT+a029jRx5PCi3JuiA==", + "license": "MIT" + }, + "node_modules/ajv": { + "version": "8.18.0", + "resolved": "https://registry.npmjs.org/ajv/-/ajv-8.18.0.tgz", + "integrity": "sha512-PlXPeEWMXMZ7sPYOHqmDyCJzcfNrUr3fGNKtezX14ykXOEIvyK81d+qydx89KY5O71FKMPaQ2vBfBFI5NHR63A==", + "license": "MIT", + "dependencies": { + "fast-deep-equal": "^3.1.3", + "fast-uri": "^3.0.1", + "json-schema-traverse": "^1.0.0", + "require-from-string": "^2.0.2" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/epoberezkin" + } + }, + "node_modules/ajv-formats": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/ajv-formats/-/ajv-formats-3.0.1.tgz", + "integrity": "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ==", + "license": "MIT", + "dependencies": { + "ajv": "^8.0.0" + }, + "peerDependencies": { + "ajv": "^8.0.0" + }, + "peerDependenciesMeta": { + "ajv": { + "optional": true + } + } + }, + "node_modules/atomic-sleep": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/atomic-sleep/-/atomic-sleep-1.0.0.tgz", + "integrity": "sha512-kNOjDqAh7px0XWNI+4QbzoiR/nTkHAWNud2uvnJquD1/x5a7EQZMJT0AczqK0Qn67oY/TTQ1LbUKajZpp3I9tQ==", + "license": "MIT", + 
"engines": { + "node": ">=8.0.0" + } + }, + "node_modules/avvio": { + "version": "9.2.0", + "resolved": "https://registry.npmjs.org/avvio/-/avvio-9.2.0.tgz", + "integrity": "sha512-2t/sy01ArdHHE0vRH5Hsay+RtCZt3dLPji7W7/MMOCEgze5b7SNDC4j5H6FnVgPkI1MTNFGzHdHrVXDDl7QSSQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "@fastify/error": "^4.0.0", + "fastq": "^1.17.1" + } + }, + "node_modules/cookie": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/cookie/-/cookie-1.1.1.tgz", + "integrity": "sha512-ei8Aos7ja0weRpFzJnEA9UHJ/7XQmqglbRwnf2ATjcB9Wq874VKH9kfjjirM6UhU2/E5fFYadylyhFldcqSidQ==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/dequal": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/dequal/-/dequal-2.0.3.tgz", + "integrity": "sha512-0je+qPKHEMohvfRTCEo3CrPG6cAzAYgmzKyxRiYSSDkS6eGJdyVJm7WaYA5ECaAD9wLB2T4EEeymA5aFVcYXCA==", + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/fast-decode-uri-component": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/fast-decode-uri-component/-/fast-decode-uri-component-1.0.1.tgz", + "integrity": "sha512-WKgKWg5eUxvRZGwW8FvfbaH7AXSh2cL+3j5fMGzUMCxWBJ3dV3a7Wz8y2f/uQ0e3B6WmodD3oS54jTQ9HVTIIg==", + "license": "MIT" + }, + "node_modules/fast-deep-equal": { + "version": "3.1.3", + "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", + "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==", + "license": "MIT" + }, + "node_modules/fast-json-stringify": { + "version": "6.3.0", + "resolved": "https://registry.npmjs.org/fast-json-stringify/-/fast-json-stringify-6.3.0.tgz", + 
"integrity": "sha512-oRCntNDY/329HJPlmdNLIdogNtt6Vyjb1WuT01Soss3slIdyUp8kAcDU3saQTOquEK8KFVfwIIF7FebxUAu+yA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "@fastify/merge-json-schemas": "^0.2.0", + "ajv": "^8.12.0", + "ajv-formats": "^3.0.1", + "fast-uri": "^3.0.0", + "json-schema-ref-resolver": "^3.0.0", + "rfdc": "^1.2.0" + } + }, + "node_modules/fast-querystring": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/fast-querystring/-/fast-querystring-1.1.2.tgz", + "integrity": "sha512-g6KuKWmFXc0fID8WWH0jit4g0AGBoJhCkJMb1RmbsSEUNvQ+ZC8D6CUZ+GtF8nMzSPXnhiePyyqqipzNNEnHjg==", + "license": "MIT", + "dependencies": { + "fast-decode-uri-component": "^1.0.1" + } + }, + "node_modules/fast-uri": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/fast-uri/-/fast-uri-3.1.0.tgz", + "integrity": "sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "BSD-3-Clause" + }, + "node_modules/fastify": { + "version": "5.8.4", + "resolved": "https://registry.npmjs.org/fastify/-/fastify-5.8.4.tgz", + "integrity": "sha512-sa42J1xylbBAYUWALSBoyXKPDUvM3OoNOibIefA+Oha57FryXKKCZarA1iDntOCWp3O35voZLuDg2mdODXtPzQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "@fastify/ajv-compiler": "^4.0.5", + "@fastify/error": "^4.0.0", + "@fastify/fast-json-stringify-compiler": "^5.0.0", + "@fastify/proxy-addr": "^5.0.0", + "abstract-logging": "^2.0.1", + "avvio": "^9.0.0", + 
"fast-json-stringify": "^6.0.0", + "find-my-way": "^9.0.0", + "light-my-request": "^6.0.0", + "pino": "^9.14.0 || ^10.1.0", + "process-warning": "^5.0.0", + "rfdc": "^1.3.1", + "secure-json-parse": "^4.0.0", + "semver": "^7.6.0", + "toad-cache": "^3.7.0" + } + }, + "node_modules/fastq": { + "version": "1.20.1", + "resolved": "https://registry.npmjs.org/fastq/-/fastq-1.20.1.tgz", + "integrity": "sha512-GGToxJ/w1x32s/D2EKND7kTil4n8OVk/9mycTc4VDza13lOvpUZTGX3mFSCtV9ksdGBVzvsyAVLM6mHFThxXxw==", + "license": "ISC", + "dependencies": { + "reusify": "^1.0.4" + } + }, + "node_modules/find-my-way": { + "version": "9.5.0", + "resolved": "https://registry.npmjs.org/find-my-way/-/find-my-way-9.5.0.tgz", + "integrity": "sha512-VW2RfnmscZO5KgBY5XVyKREMW5nMZcxDy+buTOsL+zIPnBlbKm+00sgzoQzq1EVh4aALZLfKdwv6atBGcjvjrQ==", + "license": "MIT", + "dependencies": { + "fast-deep-equal": "^3.1.3", + "fast-querystring": "^1.0.0", + "safe-regex2": "^5.0.0" + }, + "engines": { + "node": ">=20" + } + }, + "node_modules/ipaddr.js": { + "version": "2.3.0", + "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-2.3.0.tgz", + "integrity": "sha512-Zv/pA+ciVFbCSBBjGfaKUya/CcGmUHzTydLMaTwrUUEM2DIEO3iZvueGxmacvmN50fGpGVKeTXpb2LcYQxeVdg==", + "license": "MIT", + "engines": { + "node": ">= 10" + } + }, + "node_modules/json-schema-ref-resolver": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/json-schema-ref-resolver/-/json-schema-ref-resolver-3.0.0.tgz", + "integrity": "sha512-hOrZIVL5jyYFjzk7+y7n5JDzGlU8rfWDuYyHwGa2WA8/pcmMHezp2xsVwxrebD/Q9t8Nc5DboieySDpCp4WG4A==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "dequal": "^2.0.3" + } + }, + "node_modules/json-schema-traverse": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz", + 
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==", + "license": "MIT" + }, + "node_modules/light-my-request": { + "version": "6.6.0", + "resolved": "https://registry.npmjs.org/light-my-request/-/light-my-request-6.6.0.tgz", + "integrity": "sha512-CHYbu8RtboSIoVsHZ6Ye4cj4Aw/yg2oAFimlF7mNvfDV192LR7nDiKtSIfCuLT7KokPSTn/9kfVLm5OGN0A28A==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "BSD-3-Clause", + "dependencies": { + "cookie": "^1.0.1", + "process-warning": "^4.0.0", + "set-cookie-parser": "^2.6.0" + } + }, + "node_modules/light-my-request/node_modules/process-warning": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/process-warning/-/process-warning-4.0.1.tgz", + "integrity": "sha512-3c2LzQ3rY9d0hc1emcsHhfT9Jwz0cChib/QN89oME2R451w5fy3f0afAhERFZAwrbDU43wk12d0ORBpDVME50Q==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT" + }, + "node_modules/on-exit-leak-free": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/on-exit-leak-free/-/on-exit-leak-free-2.1.2.tgz", + "integrity": "sha512-0eJJY6hXLGf1udHwfNftBqH+g73EU4B504nZeKpz1sYRKafAghwxEJunB2O7rDZkL4PGfsMVnTXZ2EjibbqcsA==", + "license": "MIT", + "engines": { + "node": ">=14.0.0" + } + }, + "node_modules/pino": { + "version": "10.3.1", + "resolved": "https://registry.npmjs.org/pino/-/pino-10.3.1.tgz", + "integrity": "sha512-r34yH/GlQpKZbU1BvFFqOjhISRo1MNx1tWYsYvmj6KIRHSPMT2+yHOEb1SG6NMvRoHRF0a07kCOox/9yakl1vg==", + "license": "MIT", + "dependencies": { + "@pinojs/redact": "^0.4.0", + "atomic-sleep": "^1.0.0", + "on-exit-leak-free": "^2.1.0", + "pino-abstract-transport": "^3.0.0", + "pino-std-serializers": "^7.0.0", + "process-warning": 
"^5.0.0", + "quick-format-unescaped": "^4.0.3", + "real-require": "^0.2.0", + "safe-stable-stringify": "^2.3.1", + "sonic-boom": "^4.0.1", + "thread-stream": "^4.0.0" + }, + "bin": { + "pino": "bin.js" + } + }, + "node_modules/pino-abstract-transport": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/pino-abstract-transport/-/pino-abstract-transport-3.0.0.tgz", + "integrity": "sha512-wlfUczU+n7Hy/Ha5j9a/gZNy7We5+cXp8YL+X+PG8S0KXxw7n/JXA3c46Y0zQznIJ83URJiwy7Lh56WLokNuxg==", + "license": "MIT", + "dependencies": { + "split2": "^4.0.0" + } + }, + "node_modules/pino-std-serializers": { + "version": "7.1.0", + "resolved": "https://registry.npmjs.org/pino-std-serializers/-/pino-std-serializers-7.1.0.tgz", + "integrity": "sha512-BndPH67/JxGExRgiX1dX0w1FvZck5Wa4aal9198SrRhZjH3GxKQUKIBnYJTdj2HDN3UQAS06HlfcSbQj2OHmaw==", + "license": "MIT" + }, + "node_modules/process-warning": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/process-warning/-/process-warning-5.0.0.tgz", + "integrity": "sha512-a39t9ApHNx2L4+HBnQKqxxHNs1r7KF+Intd8Q/g1bUh6q0WIp9voPXJ/x0j+ZL45KF1pJd9+q2jLIRMfvEshkA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT" + }, + "node_modules/quick-format-unescaped": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/quick-format-unescaped/-/quick-format-unescaped-4.0.4.tgz", + "integrity": "sha512-tYC1Q1hgyRuHgloV/YXs2w15unPVh8qfu/qCTfhTYamaw7fyhumKa2yGpdSo87vY32rIclj+4fWYQXUMs9EHvg==", + "license": "MIT" + }, + "node_modules/real-require": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/real-require/-/real-require-0.2.0.tgz", + "integrity": "sha512-57frrGM/OCTLqLOAh0mhVA9VBMHd+9U7Zb2THMGdBUoZVOtGbJzjxsYGDJ3A9AYYCP4hn6y1TVbaOfzWtm5GFg==", + "license": "MIT", + "engines": { + "node": ">= 12.13.0" + } + }, + "node_modules/require-from-string": { + 
"version": "2.0.2", + "resolved": "https://registry.npmjs.org/require-from-string/-/require-from-string-2.0.2.tgz", + "integrity": "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/ret": { + "version": "0.5.0", + "resolved": "https://registry.npmjs.org/ret/-/ret-0.5.0.tgz", + "integrity": "sha512-I1XxrZSQ+oErkRR4jYbAyEEu2I0avBvvMM5JN+6EBprOGRCs63ENqZ3vjavq8fBw2+62G5LF5XelKwuJpcvcxw==", + "license": "MIT", + "engines": { + "node": ">=10" + } + }, + "node_modules/reusify": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/reusify/-/reusify-1.1.0.tgz", + "integrity": "sha512-g6QUff04oZpHs0eG5p83rFLhHeV00ug/Yf9nZM6fLeUrPguBTkTQOdpAWWspMh55TZfVQDPaN3NQJfbVRAxdIw==", + "license": "MIT", + "engines": { + "iojs": ">=1.0.0", + "node": ">=0.10.0" + } + }, + "node_modules/rfdc": { + "version": "1.4.1", + "resolved": "https://registry.npmjs.org/rfdc/-/rfdc-1.4.1.tgz", + "integrity": "sha512-q1b3N5QkRUWUl7iyylaaj3kOpIT0N2i9MqIEQXP73GVsN9cw3fdx8X63cEmWhJGi2PPCF23Ijp7ktmd39rawIA==", + "license": "MIT" + }, + "node_modules/safe-regex2": { + "version": "5.1.0", + "resolved": "https://registry.npmjs.org/safe-regex2/-/safe-regex2-5.1.0.tgz", + "integrity": "sha512-pNHAuBW7TrcleFHsxBr5QMi/Iyp0ENjUKz7GCcX1UO7cMh+NmVK6HxQckNL1tJp1XAJVjG6B8OKIPqodqj9rtw==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "MIT", + "dependencies": { + "ret": "~0.5.0" + }, + "bin": { + "safe-regex2": "bin/safe-regex2.js" + } + }, + "node_modules/safe-stable-stringify": { + "version": "2.5.0", + "resolved": "https://registry.npmjs.org/safe-stable-stringify/-/safe-stable-stringify-2.5.0.tgz", + "integrity": "sha512-b3rppTKm9T+PsVCBEOUR46GWI7fdOs00VKZ1+9c1EWDaDMvjQc6tUwuFyIprgGgTcWoVHSKrU8H31ZHA2e0RHA==", + "license": "MIT", + 
"engines": { + "node": ">=10" + } + }, + "node_modules/secure-json-parse": { + "version": "4.1.0", + "resolved": "https://registry.npmjs.org/secure-json-parse/-/secure-json-parse-4.1.0.tgz", + "integrity": "sha512-l4KnYfEyqYJxDwlNVyRfO2E4NTHfMKAWdUuA8J0yve2Dz/E/PdBepY03RvyJpssIpRFwJoCD55wA+mEDs6ByWA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "BSD-3-Clause" + }, + "node_modules/semver": { + "version": "7.7.4", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.4.tgz", + "integrity": "sha512-vFKC2IEtQnVhpT78h1Yp8wzwrf8CM+MzKMHGJZfBtzhZNycRFnXsHk6E5TxIkkMsgNS7mdX3AGB7x2QM2di4lA==", + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/set-cookie-parser": { + "version": "2.7.2", + "resolved": "https://registry.npmjs.org/set-cookie-parser/-/set-cookie-parser-2.7.2.tgz", + "integrity": "sha512-oeM1lpU/UvhTxw+g3cIfxXHyJRc/uidd3yK1P242gzHds0udQBYzs3y8j4gCCW+ZJ7ad0yctld8RYO+bdurlvw==", + "license": "MIT" + }, + "node_modules/sonic-boom": { + "version": "4.2.1", + "resolved": "https://registry.npmjs.org/sonic-boom/-/sonic-boom-4.2.1.tgz", + "integrity": "sha512-w6AxtubXa2wTXAUsZMMWERrsIRAdrK0Sc+FUytWvYAhBJLyuI4llrMIC1DtlNSdI99EI86KZum2MMq3EAZlF9Q==", + "license": "MIT", + "dependencies": { + "atomic-sleep": "^1.0.0" + } + }, + "node_modules/split2": { + "version": "4.2.0", + "resolved": "https://registry.npmjs.org/split2/-/split2-4.2.0.tgz", + "integrity": "sha512-UcjcJOWknrNkF6PLX83qcHM6KHgVKNkV62Y8a5uYDVv9ydGQVwAHMKqHdJje1VTWpljG0WYpCDhrCdAOYH4TWg==", + "license": "ISC", + "engines": { + "node": ">= 10.x" + } + }, + "node_modules/thread-stream": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/thread-stream/-/thread-stream-4.0.0.tgz", + "integrity": 
"sha512-4iMVL6HAINXWf1ZKZjIPcz5wYaOdPhtO8ATvZ+Xqp3BTdaqtAwQkNmKORqcIo5YkQqGXq5cwfswDwMqqQNrpJA==", + "license": "MIT", + "dependencies": { + "real-require": "^0.2.0" + }, + "engines": { + "node": ">=20" + } + }, + "node_modules/toad-cache": { + "version": "3.7.0", + "resolved": "https://registry.npmjs.org/toad-cache/-/toad-cache-3.7.0.tgz", + "integrity": "sha512-/m8M+2BJUpoJdgAHoG+baCwBT+tf2VraSfkBgl0Y00qIWt41DJ8R5B8nsEw0I58YwF5IZH6z24/2TobDKnqSWw==", + "license": "MIT", + "engines": { + "node": ">=12" + } + } + } +} diff --git a/package.json b/package.json new file mode 100644 index 0000000..6ff7bd3 --- /dev/null +++ b/package.json @@ -0,0 +1,14 @@ +{ + "name": "kua-deploy", + "version": "1.0.0", + "description": "Deploy orchestration, release management, and rollback for Kua infrastructure", + "type": "module", + "main": "server.js", + "scripts": { + "start": "node server.js", + "dev": "node --watch server.js" + }, + "dependencies": { + "fastify": "^5.0.0" + } +} diff --git a/server.js b/server.js new file mode 100644 index 0000000..214ca83 --- /dev/null +++ b/server.js @@ -0,0 +1,1255 @@ +import Fastify from 'fastify'; +import fs from 'fs/promises'; +import path from 'path'; +import crypto from 'crypto'; +import http from 'http'; +import { exec as execCb, execFile as execFileCb } from 'child_process'; +import { promisify } from 'util'; + +const exec = promisify(execCb); +const execFile = promisify(execFileCb); + +// Input validation +const SAFE_MESSAGE_RE = /^[a-zA-Z0-9 _.,!?:;@#/()[\]{}<>='"+*&^%$~`|-]{1,500}$/; +function validateMessage(msg) { + if (!msg || typeof msg !== 'string') return 'Release to production'; + if (!SAFE_MESSAGE_RE.test(msg)) throw new Error('Invalid message: only printable ASCII allowed, max 500 chars'); + return msg; +} + +// --- Configuration --- +const DATA_DIR = path.join(process.cwd(), 'data'); +const LOG_DIR = path.join(process.cwd(), 'logs'); +const AUDIT_LOG_FILE = path.join(LOG_DIR, 'audit.log'); +const DEPLOY_HISTORY_FILE = 
path.join(DATA_DIR, 'deploys.json'); +const REGISTRY_FILE = path.join(process.cwd(), 'deploy-registry.json'); +const ADMIN_TOKEN = process.env.KUA_DEPLOY_ADMIN_TOKEN; +const TAILSCALE_SOCKET = '/var/run/tailscale/tailscaled.sock'; +const HOSTNAME = process.env.HOSTNAME || 'gal'; +const KUA_DB_URL = process.env.KUA_DB_URL || 'http://localhost:3100'; +const KUA_DB_TOKEN = process.env.KUA_DB_ADMIN_TOKEN || ADMIN_TOKEN; +const WEBHOOK_SECRET = process.env.KUA_DEPLOY_WEBHOOK_SECRET || ''; +const DEV_MODE = process.env.NODE_ENV !== 'production'; + +// Per-app webhook rate limiter — max 5 triggers per 60s to prevent spam +const webhookRateLimiter = new Map(); // app -> [timestamps] +const WEBHOOK_RATE_LIMIT = 5; +const WEBHOOK_RATE_WINDOW_MS = 60_000; +function checkWebhookRateLimit(app) { + const now = Date.now(); + const hits = (webhookRateLimiter.get(app) || []).filter(t => now - t < WEBHOOK_RATE_WINDOW_MS); + if (hits.length >= WEBHOOK_RATE_LIMIT) return false; + hits.push(now); + webhookRateLimiter.set(app, hits); + return true; +} +const ALLOWED_NODES = new Set((process.env.KUA_ALLOWED_NODES || 'gal,bruno,genesis').split(',').map(s => s.trim())); + +function isAuthorizedNode(tsIdentity) { + if (tsIdentity.tags?.includes('tag:admin')) return true; + if (ALLOWED_NODES.has(tsIdentity.hostname)) return true; + return false; +} + +const fastify = Fastify({ + logger: true, + // Preserve raw body for webhook HMAC verification + addContentTypeParser: undefined, +}); + +// Override default JSON parser to capture rawBody for webhook HMAC verification. +// We store it on both req.rawBody (legacy) and req.raw.rawBody so both access paths work. 
+fastify.addContentTypeParser('application/json', { parseAs: 'buffer' }, (req, body, done) => { + const raw = body.toString('utf-8'); + req.rawBody = raw; + req.raw.rawBody = raw; + try { done(null, JSON.parse(body)); } catch (err) { done(err); } +}); + +// --- Deploy locks (prevent concurrent deploys per app) --- +const deployLocks = new Map(); +const LOCK_TTL_MS = 20 * 60 * 1000; // 20 minutes — auto-expire stale locks from crashed deploys + +function acquireLock(app) { + const lock = deployLocks.get(app); + if (lock) { + if (Date.now() - lock.acquiredAt < LOCK_TTL_MS) return false; + fastify.log.warn(`Clearing expired deploy lock for ${app} (${Math.round((Date.now() - lock.acquiredAt) / 1000)}s old)`); + } + deployLocks.set(app, { acquiredAt: Date.now(), deployId: crypto.randomUUID() }); + return true; +} + +function releaseLock(app) { + deployLocks.delete(app); +} + +// Returns the active deploy ID for the app, or null if no lock held. +function getDeployId(app) { + return deployLocks.get(app)?.deployId ?? 
null; +} + +// --- Load Registry --- +let registry = { apps: {} }; +async function loadRegistry() { + const data = await fs.readFile(REGISTRY_FILE, 'utf-8'); + registry = JSON.parse(data); + fastify.log.info(`Registry loaded: ${Object.keys(registry.apps).length} apps`); +} + +function getApp(name) { + return registry.apps[name] || null; +} + +function getAllApps() { + return Object.keys(registry.apps); +} + +// --- Deploy History --- +let deployHistory = {}; + +async function loadHistory() { + try { + const data = await fs.readFile(DEPLOY_HISTORY_FILE, 'utf-8'); + deployHistory = JSON.parse(data); + } catch { + deployHistory = {}; + } +} + +async function saveHistory() { + await fs.mkdir(DATA_DIR, { recursive: true }); + await fs.writeFile(DEPLOY_HISTORY_FILE, JSON.stringify(deployHistory, null, 2), 'utf-8'); +} + +function recordDeploy(app, entry) { + if (!deployHistory[app]) deployHistory[app] = []; + deployHistory[app].unshift({ + ...entry, + timestamp: new Date().toISOString(), + id: crypto.randomUUID().slice(0, 8), + }); + // Keep last 50 deploys per app + if (deployHistory[app].length > 50) deployHistory[app] = deployHistory[app].slice(0, 50); + saveHistory().catch(() => {}); +} + +function getLastSuccessfulDeploy(app) { + return (deployHistory[app] || []).find(d => d.result === 'success'); +} + +function progressFilePath(app) { + return path.join(DATA_DIR, `progress-${app}.json`); +} + +async function readProgress(app) { + try { + const raw = await fs.readFile(progressFilePath(app), 'utf-8'); + return JSON.parse(raw); + } catch { + return null; + } +} + +async function writeProgress(app, patch) { + const now = Math.floor(Date.now() / 1000); + const current = (await readProgress(app)) || {}; + const next = { + app, + ...current, + ...patch, + updated_at: now, + }; + if (!next.started_at) next.started_at = now; + await fs.mkdir(DATA_DIR, { recursive: true }); + await fs.writeFile(progressFilePath(app), JSON.stringify(next, null, 2), 'utf-8'); + return next; +} 
+ +async function markProgressPhase(app, phase, patch = {}) { + return writeProgress(app, { + phase, + current_step: phase, + status: patch.status || 'running', + ...patch, + }); +} + +// --- Tailscale Whois --- +async function tailscaleWhois(remoteAddr) { + return new Promise((resolve) => { + const timeout = setTimeout(() => resolve(null), 2000); + const req = http.request({ + socketPath: TAILSCALE_SOCKET, + path: `/localapi/v0/whois?addr=${encodeURIComponent(remoteAddr)}`, + method: 'GET', + }, (res) => { + let data = ''; + res.on('data', (chunk) => { data += chunk; }); + res.on('end', () => { + clearTimeout(timeout); + try { + const parsed = JSON.parse(data); + const node = parsed.Node; + const user = parsed.UserProfile; + if (!node) return resolve(null); + resolve({ + stableId: node.StableID || '', + hostname: node.ComputedName || node.Hostinfo?.Hostname || '', + tags: node.Tags || [], + user: user?.LoginName || '', + }); + } catch { resolve(null); } + }); + }); + req.on('error', () => { clearTimeout(timeout); resolve(null); }); + req.end(); + }); +} + +// --- Auth Hook --- +fastify.addHook('onRequest', async (request, reply) => { + if (request.url === '/health') return; + // Webhook endpoint uses its own auth (HMAC signature verification inside the handler) + if (request.url === '/webhook/forgejo') return; + + // Trust loopback plus the private 172.16.0.0/12 block only; a bare '172.' prefix would also match public addresses. + const isLocalhost = ['127.0.0.1', '::1', '::ffff:127.0.0.1'].includes(request.ip) || /^172\.(1[6-9]|2\d|3[01])\./.test(request.ip); + if (isLocalhost) { + request.identity = { stableId: 'local', hostname: HOSTNAME, tags: ['tag:admin'], user: 'local' }; + return; + } + + const remoteAddr = request.ip + ':' + (request.socket.remotePort || 0); + const tsIdentity = await tailscaleWhois(remoteAddr); + if (tsIdentity) { + if (!isAuthorizedNode(tsIdentity)) return reply.code(403).send({ error: `Node '${tsIdentity.hostname}' not authorized` }); + request.identity = tsIdentity; return; + } + + const authHeader = request.headers.authorization; + const providedToken = 
authHeader?.split('Bearer ')[1]; + if (providedToken && ADMIN_TOKEN) { + const bufP = Buffer.from(providedToken); + const bufA = Buffer.from(ADMIN_TOKEN); + if (bufP.length === bufA.length && crypto.timingSafeEqual(bufP, bufA)) { + request.identity = { stableId: 'admin-token', hostname: 'admin', tags: ['tag:admin'], user: 'admin' }; + return; + } + } + + return reply.code(401).send({ error: 'Unauthorized' }); +}); + +// --- Audit --- +async function audit(entry) { + try { + await fs.mkdir(LOG_DIR, { recursive: true }); + await fs.appendFile(AUDIT_LOG_FILE, JSON.stringify({ ...entry, timestamp: new Date().toISOString() }) + '\n', 'utf-8'); + } catch (err) { + fastify.log.error(`Audit write failed: ${err.message}`); + } +} + +// --- Shell helpers --- +function isLocal(server) { + const host = server.includes('@') ? server.split('@')[1] : server; + return host === HOSTNAME; +} + +function tailscaleIpForServer(server) { + const host = server.includes('@') ? server.split('@')[1] : server; + const ips = { + bruno: '100.74.17.6', + gal: '100.122.129.114', + }; + return ips[host] || ''; +} + +function composeEnvPrefix(server) { + const tailscaleIp = tailscaleIpForServer(server); + return tailscaleIp ? `TAILSCALE_IP=${tailscaleIp} ` : ''; +} + +async function run(cmd, opts = {}) { + const timeout = opts.timeout || 30000; + try { + const { stdout, stderr } = await exec(cmd, { timeout }); + return { ok: true, stdout: stdout.trim(), stderr: stderr.trim() }; + } catch (err) { + return { ok: false, stdout: err.stdout?.trim() || '', stderr: err.stderr?.trim() || '', error: err.message }; + } +} + +// Allow only simple host strings: optional user@, then hostname/IP with dots/hyphens only. 
+const SAFE_HOST_RE = /^([a-zA-Z0-9._-]+@)?[a-zA-Z0-9][a-zA-Z0-9._-]*$/; + +async function runOnServer(server, cmd, opts = {}) { + if (!SAFE_HOST_RE.test(server)) throw new Error(`Unsafe server name rejected: ${JSON.stringify(server)}`); + if (isLocal(server)) return run(cmd, opts); + // Use execFile so no local shell interprets server/cmd (the remote shell still parses cmd) + const timeout = opts.timeout || 30000; + try { + const { stdout, stderr } = await execFile( + 'ssh', + ['-o', 'StrictHostKeyChecking=no', server, cmd], + { timeout }, + ); + return { ok: true, stdout: stdout.trim(), stderr: stderr.trim() }; + } catch (err) { + return { ok: false, stdout: err.stdout?.trim() || '', stderr: err.stderr?.trim() || '', error: err.message }; + } +} + +// --- kua-db integration --- +async function kuaDbSafeCheck(app) { + try { + const res = await fetch(`${KUA_DB_URL}/api/v1/migrations/${encodeURIComponent(app)}/safe-to-deploy?env=production`, { + headers: KUA_DB_TOKEN ? { Authorization: `Bearer ${KUA_DB_TOKEN}` } : {}, + signal: AbortSignal.timeout(60000), + }); + if (!res.ok) return { safe: false, reason: `kua-db returned ${res.status} — blocking deploy (use --force to skip)` }; + return await res.json(); + } catch { + return { safe: false, reason: 'kua-db unreachable — blocking deploy (use --force to skip)' }; + } +} + +async function kuaDbMigrate(app) { + try { + const res = await fetch(`${KUA_DB_URL}/api/v1/migrations/${encodeURIComponent(app)}/apply`, { + method: 'POST', + headers: { 'Content-Type': 'application/json', ...(KUA_DB_TOKEN ? 
{ Authorization: `Bearer ${KUA_DB_TOKEN}` } : {}) }, + body: JSON.stringify({ env: 'production' }), + signal: AbortSignal.timeout(120000), + }); + return await res.json(); + } catch (err) { + return { result: 'error', error: err.message }; + } +} + +// ============================================================================= +// RELEASE ENGINE +// ============================================================================= + +async function release(appName, message = 'Release to production', opts = {}) { + const app = getApp(appName); + if (!app) throw new Error(`Unknown app: ${appName}`); + + message = validateMessage(message); + + const repoDir = app.repo_dir; + const remote = app.git_remote; + const sourceBranch = opts.source_branch ?? app.source_branch; + const deployBranch = opts.target_branch ?? app.deploy_branch; + + const steps = []; + + // Check clean worktree + const dirty = await run(`git -C ${repoDir} status --porcelain --ignore-submodules=dirty`); + if (dirty.stdout) { + throw new Error(`${repoDir} has uncommitted changes — commit or stash first`); + } + + // Fetch + steps.push({ step: 'fetch', status: 'running' }); + const fetchResult = await run(`git -C ${repoDir} fetch ${remote}`, { timeout: 30000 }); + if (!fetchResult.ok) throw new Error(`git fetch failed: ${fetchResult.stderr}`); + steps[steps.length - 1] = { step: 'fetch', status: 'done' }; + + // Push source branch + steps.push({ step: 'push_source', status: 'running' }); + await run(`git -C ${repoDir} checkout ${sourceBranch}`); + const pushResult = await run(`git -C ${repoDir} push ${remote} ${sourceBranch}`, { timeout: 60000 }); + if (!pushResult.ok) throw new Error(`push ${sourceBranch} failed: ${pushResult.stderr}`); + steps[steps.length - 1] = { step: 'push_source', status: 'done' }; + + // Get source commit + const headResult = await run(`git -C ${repoDir} rev-parse --short HEAD`); + const sourceCommit = headResult.stdout; + + // Merge to deploy branch (use execFile to avoid shell 
injection via message) + steps.push({ step: 'merge', status: 'running' }); + const branchExists = await run(`git -C ${repoDir} ls-remote --exit-code --heads ${remote} ${deployBranch}`); + if (branchExists.ok) { + await run(`git -C ${repoDir} checkout ${deployBranch}`); + try { + await execFile('git', ['-C', repoDir, 'merge', `${remote}/${sourceBranch}`, '-m', `[RELEASE] ${message}`], { timeout: 30000 }); + } catch (err) { + throw new Error(`merge failed: ${err.stderr || err.message}`); + } + } else { + await run(`git -C ${repoDir} checkout -B ${deployBranch} ${remote}/${sourceBranch}`); + } + steps[steps.length - 1] = { step: 'merge', status: 'done' }; + + // Tag (use execFile to avoid shell injection via message) + const tag = `prod-${new Date().toISOString().replace(/[-:T]/g, '').slice(0, 14)}`; + try { + await execFile('git', ['-C', repoDir, 'tag', '-a', tag, '-m', `[RELEASE] ${message}`], { timeout: 10000 }); + } catch (err) { + throw new Error(`tag failed: ${err.stderr || err.message}`); + } + + // Push deploy branch + tags + steps.push({ step: 'push_deploy', status: 'running' }); + const pushDeploy = await run(`git -C ${repoDir} push ${remote} ${deployBranch} --tags`, { timeout: 60000 }); + if (!pushDeploy.ok) throw new Error(`push ${deployBranch} failed: ${pushDeploy.stderr}`); + steps[steps.length - 1] = { step: 'push_deploy', status: 'done' }; + + // Return to source branch + await run(`git -C ${repoDir} checkout ${sourceBranch}`); + + await audit({ action: 'release', app: appName, tag, commit: sourceCommit, message }); + + return { + app: appName, + result: 'released', + tag, + commit: sourceCommit, + message, + deploy_mode: app.deploy_mode, + steps, + }; +} + +// ============================================================================= +// DEPLOY ENGINE +// ============================================================================= + +async function deploy(appName, opts = {}) { + const app = getApp(appName); + if (!app) throw new Error(`Unknown 
app: ${appName}`); + + const prod = app.production; + if (!prod) throw new Error(`${appName} has no production config`); + + if (!acquireLock(appName)) { + return { app: appName, result: 'locked', message: 'Deploy already in progress' }; + } + + const steps = []; + let finalResult = 'success'; + const action = opts.action || 'deploy'; + + try { + const server = prod.server; + const deployDir = prod.deploy_dir; + const remote = app.git_remote || 'origin'; + const deployBranch = app.deploy_branch; + await writeProgress(appName, { + action, + triggered_by: opts.triggered_by || 'api', + status: 'running', + phase: 'started', + current_step: 'started', + server, + steps, + }); + + // Step 1: kua-db safe-to-deploy check (if app has migrations) + if (prod.has_migrations && !opts.force) { + steps.push({ step: 'db_safety', status: 'running' }); + await markProgressPhase(appName, 'db_safety', { action, triggered_by: opts.triggered_by || 'api', steps }); + const safety = await kuaDbSafeCheck(appName); + if (!safety.safe) { + steps[steps.length - 1] = { step: 'db_safety', status: 'blocked', reasons: safety.reasons || [safety.reason] }; + finalResult = 'blocked'; + await writeProgress(appName, { + action, + triggered_by: opts.triggered_by || 'api', + status: 'blocked', + phase: 'db_safety_blocked', + current_step: 'db_safety', + result: 'blocked', + reason: 'db_safety', + steps, + finished_at: Math.floor(Date.now() / 1000), + }); + recordDeploy(appName, { result: 'blocked', reason: 'db_safety', steps, action, triggered_by: opts.triggered_by || 'api' }); + return { app: appName, result: 'blocked', steps }; + } + steps[steps.length - 1] = { step: 'db_safety', status: 'passed', note: safety.reason || 'safe' }; + await markProgressPhase(appName, 'db_safety_passed', { action, triggered_by: opts.triggered_by || 'api', steps }); + } + + // Step 2: Git pull on production server + steps.push({ step: 'git_pull', status: 'running' }); + await markProgressPhase(appName, 'git_pull', { 
action, triggered_by: opts.triggered_by || 'api', steps }); + const fetchCmd = `cd ${deployDir} && git fetch --prune ${remote}`; + const fetchRes = await runOnServer(server, fetchCmd, { timeout: 60000 }); + if (!fetchRes.ok) { + steps[steps.length - 1] = { step: 'git_pull', status: 'failed', error: fetchRes.stderr }; + throw new Error(`git fetch failed on ${server}: ${fetchRes.stderr}`); + } + + const checkoutCmd = `cd ${deployDir} && git checkout -B ${deployBranch} ${remote}/${deployBranch}`; + const checkoutRes = await runOnServer(server, checkoutCmd, { timeout: 30000 }); + if (!checkoutRes.ok) { + steps[steps.length - 1] = { step: 'git_pull', status: 'failed', error: checkoutRes.stderr }; + throw new Error(`git checkout failed on ${server}: ${checkoutRes.stderr}`); + } + + // Get current commit + const commitRes = await runOnServer(server, `cd ${deployDir} && git rev-parse --short HEAD`); + const deployCommit = commitRes.stdout; + steps[steps.length - 1] = { step: 'git_pull', status: 'done', commit: deployCommit }; + await markProgressPhase(appName, 'git_pull_done', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + + // Step 2b: Pre-build dirty check — untracked/uncommitted files silently pollute COPY . . + const dirtyRes = await runOnServer(server, `cd ${deployDir} && git status --porcelain`); + if (!dirtyRes.ok || dirtyRes.stdout.trim()) { + const detail = dirtyRes.stdout.trim() || dirtyRes.stderr.trim() || 'git status failed'; + throw new Error(`Working tree is dirty — clean up before deploying: +${detail}`); + } + + // Step 3: Docker build + steps.push({ step: 'build', status: 'running' }); + await markProgressPhase(appName, 'build', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + const kvPrefix = prod.vault + ? 
`kua-vault run --project ${prod.vault.project} --env ${prod.vault.env} --` + : ''; + const envPrefix = composeEnvPrefix(server); + const buildCmd = `cd ${deployDir} && ${envPrefix}${kvPrefix} docker compose build`; + const buildRes = await runOnServer(server, buildCmd, { timeout: 600000 }); + if (!buildRes.ok) { + steps[steps.length - 1] = { step: 'build', status: 'failed', error: buildRes.stderr?.slice(-500) }; + throw new Error('docker compose build failed'); + } + steps[steps.length - 1] = { step: 'build', status: 'done' }; + await markProgressPhase(appName, 'build_done', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + + // ------------------------------------------------------------------ + // Post-deploy verification (added 2026-05-20 after the false-success + // bug that left muralla on a stale image for ~22h: kua-deploy marked + // step:deploy=done while the container kept running the prior image). + // For each STATELESS service (the ones we just told compose to + // --force-recreate), assert that: + // (i) the container's `.Image` SHA equals what `docker compose + // images --quiet ` reports as the expected image, AND + // (ii) the container's `.State.StartedAt` is newer than the + // timestamp captured *before* the `up -d` call. + // Either failing means the recreate did not actually take. Stateful + // services are intentionally NOT image-checked (they shouldn't have + // been recreated); we only assert they are running. + // Mode is env-configurable: KUA_DEPLOY_VERIFY=error|warn|off. 
+ const verifyMode = (process.env.KUA_DEPLOY_VERIFY || 'error').toLowerCase(); + async function verifyStatelessRecreated(server, deployDir, services, deployStartTs) { + if (verifyMode === 'off') return { ok: true, results: [], skipped: true }; + const results = []; + for (const svc of services) { + const exp = await runOnServer(server, `cd ${deployDir} && docker compose images --quiet ${svc} 2>/dev/null | head -1`); + const cid = await runOnServer(server, `cd ${deployDir} && docker compose ps --quiet ${svc} 2>/dev/null | head -1`); + const expectedSha = (exp.stdout || '').trim(); + const containerId = (cid.stdout || '').trim(); + if (!containerId) { + results.push({ service: svc, ok: false, reason: 'no running container after up' }); + continue; + } + const insp = await runOnServer(server, `docker inspect --format '{{.Image}}|{{.State.StartedAt}}' ${containerId}`); + const [actualSha, startedAtStr] = (insp.stdout || '').trim().split('|'); + const startedAt = new Date(startedAtStr || 0); + const imageMatch = !!expectedSha && actualSha === expectedSha; + const freshlyStarted = !isNaN(startedAt) && startedAt >= deployStartTs; + results.push({ + service: svc, ok: imageMatch && freshlyStarted, + expected_image_sha: expectedSha, running_image_sha: actualSha, + started_at: startedAtStr, + deploy_started_at: deployStartTs.toISOString(), + image_match: imageMatch, freshly_started: freshlyStarted, + reason: imageMatch && freshlyStarted ? null + : !imageMatch ? 
`image SHA mismatch (expected ${expectedSha}, running ${actualSha})` + : `container not freshly started (StartedAt ${startedAtStr} < deploy start)`, + }); + } + const ok = results.every(r => r.ok); + return { ok, results, mode: verifyMode }; + } + // ------------------------------------------------------------------ + + // Step 4: Docker up (split stateful/stateless if needed) + steps.push({ step: 'deploy', status: 'running' }); + await markProgressPhase(appName, 'deploy', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + + // Get all services first — needed for both split and auto-detect paths + const svcRes = await runOnServer(server, `cd ${deployDir} && docker compose config --services`); + const allServices = svcRes.stdout.split('\n').filter(Boolean); + + let stateful = prod.stateful_services || []; + if (stateful.length === 0) { + // Auto-detect stateful services from image names so db/redis are never force-recreated + const composeContent = await runOnServer(server, `cat ${deployDir}/docker-compose.yml 2>/dev/null || cat ${deployDir}/docker-compose.yaml 2>/dev/null || echo ""`); + const statefulImagePat = /postgres|mysql|mariadb|mongo|redis|rabbitmq|cassandra|elasticsearch|opensearch/i; + let currentSvc = null; + for (const line of composeContent.stdout.split('\n')) { + const svcMatch = line.match(/^ ([a-zA-Z0-9_-]+)\s*:/); + if (svcMatch) currentSvc = svcMatch[1]; + if (currentSvc && /^\s+image\s*:/.test(line) && statefulImagePat.test(line)) { + stateful.push(currentSvc); + } + } + if (stateful.length > 0) { + fastify.log.warn({ app: appName, inferred_stateful: stateful }, 'Auto-detected stateful services — add to stateful_services in deploy-registry.json to silence this warning'); + } + } + + { + const stateless = allServices.filter(s => !stateful.includes(s)); + const deployStartTs = new Date(); + if (stateless.length > 0) { + const upRes = await runOnServer(server, `cd ${deployDir} && ${envPrefix}${kvPrefix} docker 
compose up -d --force-recreate --remove-orphans ${stateless.join(' ')}`, { timeout: 300000 }); + if (!upRes.ok) { + steps[steps.length - 1] = { step: 'deploy', status: 'failed', error: upRes.stderr?.slice(-500) }; + throw new Error('docker compose up failed for stateless services'); + } + // POST-DEPLOY VERIFY — catches false-success (see helper comment above). + const verify = await verifyStatelessRecreated(server, deployDir, stateless, deployStartTs); + if (!verify.ok) { + const bad = verify.results.filter(r => !r.ok).map(r => `${r.service}: ${r.reason}`).join('; '); + if (verifyMode === 'error') { + steps[steps.length - 1] = { step: 'deploy', status: 'failed', error: `verify: ${bad}`, verify }; + await markProgressPhase(appName, 'deploy', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + throw new Error(`post-deploy verify failed: ${bad}`); + } else { + fastify.log.warn({ app: appName, verify }, 'post-deploy verify failed (warn mode — not blocking)'); + steps[steps.length - 1] = { step: 'deploy', status: 'done', verify_warn: bad }; + } + } + } + if (stateful.length > 0) { + const upRes = await runOnServer(server, `cd ${deployDir} && ${envPrefix}${kvPrefix} docker compose up -d --remove-orphans ${stateful.join(' ')}`, { timeout: 300000 }); + if (!upRes.ok) { + steps[steps.length - 1] = { step: 'deploy', status: 'failed', error: upRes.stderr?.slice(-500) }; + throw new Error('docker compose up failed for stateful services'); + } + } + } + // Spread the existing entry so a verify_warn recorded in warn mode is not overwritten + steps[steps.length - 1] = { ...steps[steps.length - 1], step: 'deploy', status: 'done' }; + await markProgressPhase(appName, 'deploy_done', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + + // Step 5: Run migrations via kua-db (if applicable) + if (prod.has_migrations) { + steps.push({ step: 'migrate', status: 'running' }); + await markProgressPhase(appName, 'migrate', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + const migrateResult = await 
kuaDbMigrate(appName); + const migrateOk = ['success', 'no_pending_migrations', 'already_applied'].includes(migrateResult.result); + if (!migrateOk) { + steps[steps.length - 1] = { step: 'migrate', status: 'failed', error: migrateResult.error || migrateResult.steps?.find(s => s.step === 'migrate')?.error || migrateResult.result }; + finalResult = 'partial'; + await markProgressPhase(appName, 'migrate_failed', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit, result: finalResult }); + } else { + steps[steps.length - 1] = { step: 'migrate', status: 'done', result: migrateResult.result }; + await markProgressPhase(appName, 'migrate_done', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + } + } + + // Step 6: Health check + steps.push({ step: 'health', status: 'running' }); + await markProgressPhase(appName, 'health', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + if (prod.health_url) { + let healthy = false; + for (let i = 0; i < 20; i++) { + try { + const res = await fetch(prod.health_url, { signal: AbortSignal.timeout(5000) }); + if (res.ok) { healthy = true; break; } + } catch { /* retry */ } + await new Promise(r => setTimeout(r, 3000)); + } + if (!healthy) { + steps[steps.length - 1] = { step: 'health', status: 'failed', url: prod.health_url }; + finalResult = 'unhealthy'; + await markProgressPhase(appName, 'health_failed', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit, result: finalResult }); + } else { + steps[steps.length - 1] = { step: 'health', status: 'done', url: prod.health_url }; + await markProgressPhase(appName, 'health_done', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + } + } else { + // No health URL — check containers + const psRes = await runOnServer(server, `cd ${deployDir} && docker compose ps --format json`); + steps[steps.length - 1] = { step: 'health', status: 
'done', note: 'no health URL configured' }; + await markProgressPhase(appName, 'health_done', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + } + + // Step 7: Post-deploy hooks + if (prod.post_deploy) { + steps.push({ step: 'post_deploy', status: 'running' }); + await markProgressPhase(appName, 'post_deploy', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + await runOnServer(server, prod.post_deploy, { timeout: 30000 }); + steps[steps.length - 1] = { step: 'post_deploy', status: 'done' }; + await markProgressPhase(appName, 'post_deploy_done', { action, triggered_by: opts.triggered_by || 'api', steps, commit: deployCommit }); + } + + // Get tag + const tagRes = await runOnServer(server, `cd ${deployDir} && git describe --tags --abbrev=0 2>/dev/null || echo "untagged"`); + const currentTag = tagRes.stdout; + + const entry = { + result: finalResult, + commit: deployCommit, + tag: currentTag, + server, + steps, + action, + triggered_by: opts.triggered_by || 'api', + }; + + await writeProgress(appName, { + action, + triggered_by: opts.triggered_by || 'api', + status: finalResult === 'success' ? 'done' : 'failed', + phase: finalResult === 'success' ? 
'succeeded' : 'completed_with_issues', + current_step: 'done', + result: finalResult, + commit: deployCommit, + tag: currentTag, + server, + steps, + finished_at: Math.floor(Date.now() / 1000), + }); + recordDeploy(appName, entry); + await audit({ action, app: appName, ...entry }); + + return { app: appName, ...entry }; + + } catch (err) { + const entry = { + result: 'failed', + error: err.message, + steps, + action, + triggered_by: opts.triggered_by || 'api', + }; + await writeProgress(appName, { + action, + triggered_by: opts.triggered_by || 'api', + status: 'failed', + phase: 'failed', + current_step: steps[steps.length - 1]?.step || 'unknown', + result: 'failed', + error: err.message, + steps, + finished_at: Math.floor(Date.now() / 1000), + }); + recordDeploy(appName, entry); + await audit({ action: `${action}_failed`, app: appName, error: err.message }); + return { app: appName, ...entry }; + + } finally { + releaseLock(appName); + } +} + +// ============================================================================= +// ROLLBACK ENGINE +// ============================================================================= + +async function rollback(appName) { + const app = getApp(appName); + if (!app) throw new Error(`Unknown app: ${appName}`); + + const prod = app.production; + const server = prod.server; + const deployDir = prod.deploy_dir; + const remote = app.git_remote || 'origin'; + + // Find the previous successful deploy + const history = deployHistory[appName] || []; + const current = history[0]; + const previous = history.find((d, i) => i > 0 && d.result === 'success' && d.tag && d.tag !== 'untagged'); + + if (!previous) { + return { app: appName, result: 'no_rollback_target', message: 'No previous successful deploy with a tag found' }; + } + + if (!acquireLock(appName)) { + return { app: appName, result: 'locked', message: 'Deploy already in progress' }; + } + + try { + const tag = previous.tag; + await writeProgress(appName, { + action: 'rollback', + 
triggered_by: 'api', + status: 'running', + phase: 'rollback_started', + current_step: 'rollback', + rolled_back_to: tag, + rolled_back_from: current?.tag || current?.commit || 'unknown', + }); + + // Checkout the previous tag on production + const checkoutRes = await runOnServer(server, `cd ${deployDir} && git fetch --prune ${remote} && git checkout ${tag}`, { timeout: 60000 }); + if (!checkoutRes.ok) throw new Error(`Checkout ${tag} failed: ${checkoutRes.stderr}`); + + // Rebuild and restart + const kvPrefix = prod.vault + ? `kua-vault run --project ${prod.vault.project} --env ${prod.vault.env} --` + : ''; + const upRes = await runOnServer(server, `cd ${deployDir} && ${composeEnvPrefix(server)}${kvPrefix} docker compose up -d --force-recreate --build`, { timeout: 600000 }); + // Fail here rather than letting the health check pass against the old containers + if (!upRes.ok) throw new Error(`docker compose up failed during rollback to ${tag}: ${upRes.stderr?.slice(-500)}`); + + // Health check + let healthy = true; + if (prod.health_url) { + healthy = false; + for (let i = 0; i < 20; i++) { + try { + const res = await fetch(prod.health_url, { signal: AbortSignal.timeout(5000) }); + if (res.ok) { healthy = true; break; } + } catch { /* retry */ } + await new Promise(r => setTimeout(r, 3000)); + } + } + + const entry = { + result: healthy ? 'success' : 'unhealthy', + action: 'rollback', + rolled_back_to: tag, + rolled_back_from: current?.tag || current?.commit || 'unknown', + server, + triggered_by: 'api', + }; + + recordDeploy(appName, entry); + await audit({ action: 'rollback', app: appName, ...entry }); + await writeProgress(appName, { + action: 'rollback', + triggered_by: 'api', + status: healthy ? 'done' : 'failed', + phase: healthy ? 'rollback_succeeded' : 'rollback_unhealthy', + current_step: 'done', + result: healthy ? 
'success' : 'unhealthy',
+      rolled_back_to: tag,
+      rolled_back_from: current?.tag || current?.commit || 'unknown',
+      server,
+      finished_at: Math.floor(Date.now() / 1000),
+    });
+
+    return { app: appName, ...entry };
+
+  } finally {
+    releaseLock(appName);
+  }
+}
+
+// =============================================================================
+// APP STATUS
+// =============================================================================
+
+async function appStatus(appName) {
+  const app = getApp(appName);
+  if (!app) throw new Error(`Unknown app: ${appName}`);
+
+  const prod = app.production;
+  const server = prod.server;
+  const deployDir = prod.deploy_dir;
+
+  const status = {
+    app: appName,
+    deploy_mode: app.deploy_mode,
+    server,
+    locked: !!deployLocks.get(appName),
+  };
+
+  // Current commit + tag on production
+  try {
+    const commitRes = await runOnServer(server, `cd ${deployDir} && git rev-parse --short HEAD`);
+    status.current_commit = commitRes.stdout;
+    const tagRes = await runOnServer(server, `cd ${deployDir} && git describe --tags --abbrev=0 2>/dev/null || echo "untagged"`);
+    status.current_tag = tagRes.stdout;
+    const branchRes = await runOnServer(server, `cd ${deployDir} && git rev-parse --abbrev-ref HEAD`);
+    status.current_branch = branchRes.stdout;
+  } catch {
+    status.current_commit = 'unreachable';
+  }
+
+  // Latest commit on source branch (dev)
+  try {
+    const devCommitRes = await run(`git -C ${app.repo_dir} rev-parse --short HEAD`);
+    status.dev_commit = devCommitRes.stdout;
+    // Compare only when production was reachable; the 'unreachable' placeholder
+    // would otherwise always report dev as ahead.
+    status.dev_ahead = status.current_commit !== 'unreachable' && status.current_commit !== status.dev_commit;
+  } catch {
+    status.dev_commit = 'unknown';
+  }
+
+  // Last deploy from history
+  const lastDeploy = (deployHistory[appName] || [])[0];
+  if (lastDeploy) {
+    status.last_deploy = {
+      result: lastDeploy.result,
+      timestamp: lastDeploy.timestamp,
+      commit: lastDeploy.commit,
+      tag: lastDeploy.tag,
+      triggered_by: lastDeploy.triggered_by,
+    };
+  }
+
+  const progress = await readProgress(appName);
+  if (progress) {
+    status.progress = progress;
+  }
+
+  // Health
+  if (prod.health_url) {
+    try {
+      const res = await fetch(prod.health_url, { signal: AbortSignal.timeout(5000) });
+      status.healthy = res.ok;
+      status.health_status = res.status;
+    } catch {
+      status.healthy = false;
+      status.health_status = 'unreachable';
+    }
+  }
+
+  return status;
+}
+
+// =============================================================================
+// ROUTES
+// =============================================================================
+
+// Health
+fastify.get('/health', async () => {
+  return { status: 'ok', version: '1.0.0', apps: Object.keys(registry.apps).length };
+});
+
+// --- Webhook ---
+
+// Forgejo webhook receiver (replaces forgejo-webhook.py)
+fastify.post('/webhook/forgejo', async (request, reply) => {
+  // Webhook secret is mandatory in production
+  if (!WEBHOOK_SECRET && !DEV_MODE) {
+    return reply.code(503).send({ error: 'Webhook not configured: KUA_DEPLOY_WEBHOOK_SECRET is not set' });
+  }
+
+  if (WEBHOOK_SECRET) {
+    const sig = request.headers['x-forgejo-signature'] || request.headers['x-gitea-signature'] || '';
+    if (!sig) return reply.code(401).send({ error: 'Missing webhook signature' });
+    // rawBody must have been captured by the content-type parser; reject if it wasn't
+    const rawBody = request.rawBody;
+    if (!rawBody) return reply.code(400).send({ error: 'Could not read raw request body for HMAC verification' });
+    const expected = crypto.createHmac('sha256', WEBHOOK_SECRET).update(rawBody).digest('hex');
+    const bufSig = Buffer.from(sig); const bufExp = Buffer.from(expected);
+    if (bufSig.length !== bufExp.length || !crypto.timingSafeEqual(bufSig, bufExp)) {
+      return reply.code(401).send({ error: 'Invalid webhook signature' });
+    }
+  }
+
+  const data = request.body || {};
+  const ref = data.ref || '';
+  const repoInfo = data.repository || {};
+  const repoName = repoInfo.name || '';
+
+  if (ref !== 'refs/heads/production') {
+    return { ignored: true, reason: `not production branch (got ${ref})` };
+  }
+
+  const app = getApp(repoName);
+  if (!app) {
+    fastify.log.warn(`Webhook for unknown repo: ${repoName}`);
+    return { ignored: true, reason: `unknown repo: ${repoName}` };
+  }
+
+  if (!checkWebhookRateLimit(repoName)) {
+    fastify.log.warn(`Webhook rate limit exceeded for ${repoName}`);
+    return reply.code(429).send({ error: `Rate limit exceeded for ${repoName} — max ${WEBHOOK_RATE_LIMIT} triggers per minute` });
+  }
+
+  fastify.log.info(`Webhook: deploying ${repoName} (push to production)`);
+
+  await writeProgress(repoName, {
+    action: 'deploy',
+    triggered_by: 'webhook',
+    status: 'running',
+    phase: 'webhook_received',
+    current_step: 'webhook_received',
+    ref,
+    repo: repoName,
+  });
+
+  // Deploy async: run in an IIFE with its own try/catch so a rejection is always handled
+  void (async () => {
+    try {
+      const result = await deploy(repoName, { triggered_by: 'webhook' });
+      fastify.log.info(`Deploy ${repoName}: ${result.result}`);
+    } catch (err) {
+      fastify.log.error(`Deploy ${repoName} failed: ${err.message}`);
+    }
+  })();
+
+  return { triggered: true, app: repoName };
+});
+
+// --- Apps ---
+
+// List all apps
+fastify.get('/api/v1/apps', async () => {
+  const results = [];
+  for (const name of getAllApps()) {
+    try {
+      results.push(await appStatus(name));
+    } catch (err) {
+      results.push({ app: name, error: err.message });
+    }
+  }
+  return { apps: results };
+});
+
+// Single app status
+fastify.get('/api/v1/apps/:app', async (request) => {
+  return await appStatus(request.params.app);
+});
+
+// Deploy history
+fastify.get('/api/v1/apps/:app/deploys', async (request) => {
+  const app = request.params.app;
+  const limit = parseInt(request.query.limit, 10) || 20;
+  const history = (deployHistory[app] || []).slice(0, limit);
+  return { app, deploys: history, total: (deployHistory[app] || []).length };
+});
+
+// --- Actions ---
+
+// Release (merge main→production, tag, push — triggers webhook deploy)
+fastify.post('/api/v1/apps/:app/release', async (request) => {
+  const { message, source_branch, target_branch } = request.body || {};
+  return await release(request.params.app, message || 'Release to production', { source_branch, target_branch });
+});
+
+// Direct deploy (skip release, just pull + build + deploy on production)
+fastify.post('/api/v1/apps/:app/deploy', async (request, reply) => {
+  const { app } = request.params;
+  const { force } = request.body || {};
+  if (!getApp(app)) return reply.code(404).send({ ok: false, error: `Unknown app: ${app}` });
+
+  await writeProgress(app, {
+    action: 'deploy',
+    triggered_by: 'api',
+    status: 'running',
+    phase: 'api_received',
+    current_step: 'api_received',
+  });
+
+  // Fire-and-forget — mirrors /webhook/forgejo. A blocking response held the
+  // HTTP connection for the full ~3-min deploy; the kua-mcp-core ssh+curl
+  // pipe tore down on idle (http_code 000 -> "via ssh failed (0): {}") even
+  // though the deploy succeeded server-side. Caller polls /progress.
+  void (async () => {
+    try {
+      const result = await deploy(app, { force: force === true, triggered_by: 'api' });
+      fastify.log.info(`Deploy ${app}: ${result.result}`);
+    } catch (err) {
+      fastify.log.error(`Deploy ${app} failed: ${err.message}`);
+    }
+  })();
+
+  return { triggered: true, app };
+});
+
+// Force rebuild + recreate using the authoritative deploy path
+fastify.post('/api/v1/apps/:app/rebuild', async (request, reply) => {
+  const { app } = request.params;
+  const { force } = request.body || {};
+  if (!getApp(app)) return reply.code(404).send({ ok: false, error: `Unknown app: ${app}` });
+
+  await writeProgress(app, {
+    action: 'rebuild',
+    triggered_by: 'api_rebuild',
+    status: 'running',
+    phase: 'api_received',
+    current_step: 'api_received',
+  });
+
+  // Fire-and-forget — same rationale as /deploy above.
+  void (async () => {
+    try {
+      const result = await deploy(app, {
+        force: force !== false,
+        action: 'rebuild',
+        triggered_by: 'api_rebuild',
+      });
+      fastify.log.info(`Rebuild ${app}: ${result.result}`);
+    } catch (err) {
+      fastify.log.error(`Rebuild ${app} failed: ${err.message}`);
+    }
+  })();
+
+  return { triggered: true, app };
+});
+
+// Rollback
+fastify.post('/api/v1/apps/:app/rollback', async (request) => {
+  return await rollback(request.params.app);
+});
+
+// --- Deploy Progress ---
+
+fastify.get('/api/v1/apps/:app/progress', async (request) => {
+  const { app } = request.params;
+  const data = await readProgress(app);
+  if (!data) {
+    return { ok: false, app, status: 'idle', message: 'No active or recent deployment' };
+  }
+  const now = Math.floor(Date.now() / 1000);
+  const age = now - (data.updated_at || 0);
+  if (age > 300 && data.status !== 'done' && data.status !== 'failed' && data.status !== 'blocked') {
+    return { ok: false, app, stale: true, age_s: age, ...data };
+  }
+  return { ok: true, ...data };
+});
+
+// External progress reporter — used by forgejo-webhook.py to push phase updates
+fastify.patch('/api/v1/apps/:app/progress', async (request, reply) => {
+  const { app } = request.params;
+  if (!getApp(app)) return reply.code(404).send({ ok: false, error: `Unknown app: ${app}` });
+  const patch = request.body || {};
+  const updated = await writeProgress(app, patch);
+  return { ok: true, ...updated };
+});
+
+// GET /api/v1/apps/:app/runtime-status — per-service running-vs-expected
+// image SHA and freshness. Read-only. Powers release-app post-verify and
+// the deploy.status MCP tool's stale-detection field. Added 2026-05-20
+// alongside the post-deploy verify; together they make "kua-deploy says
+// done" actually mean "container is running the just-built image."
+fastify.get('/api/v1/apps/:app/runtime-status', async (request, reply) => {
+  const auth = request.headers.authorization || '';
+  if (!auth.startsWith('Bearer ') || auth.slice(7) !== ADMIN_TOKEN) {
+    return reply.code(401).send({ error: 'unauthorized' });
+  }
+  const appName = request.params.app;
+  const cfg = getApp(appName);
+  if (!cfg) return reply.code(404).send({ error: 'app not in registry' });
+  const prod = cfg.production || {};
+  const server = prod.server || cfg.deploy_server || 'bruno';
+  const deployDir = prod.deploy_dir || cfg.repo_dir;
+  if (!deployDir) return reply.code(400).send({ error: 'no deploy_dir for app' });
+  try {
+    const svcRes = await runOnServer(server, `cd ${deployDir} && docker compose config --services`);
+    const services = (svcRes.stdout || '').split('\n').filter(Boolean);
+    const out = [];
+    let anyStale = false;
+    for (const svc of services) {
+      const exp = await runOnServer(server, `cd ${deployDir} && docker compose images --quiet ${svc} 2>/dev/null | head -1`);
+      const cid = await runOnServer(server, `cd ${deployDir} && docker compose ps --quiet ${svc} 2>/dev/null | head -1`);
+      const expectedSha = (exp.stdout || '').trim();
+      const containerId = (cid.stdout || '').trim();
+      let running_image_sha = null, started_at = null, state = null, health = null;
+      if (containerId) {
+        const insp = await runOnServer(server, `docker inspect --format '{{.Image}}|{{.State.StartedAt}}|{{.State.Status}}|{{if .State.Health}}{{.State.Health.Status}}{{else}}n/a{{end}}' ${containerId}`);
+        const parts = (insp.stdout || '').trim().split('|');
+        running_image_sha = parts[0] || null;
+        started_at = parts[1] || null;
+        state = parts[2] || null;
+        health = parts[3] || null;
+      }
+      const stale = !!expectedSha && !!running_image_sha && expectedSha !== running_image_sha;
+      if (stale) anyStale = true;
+      out.push({ service: svc, container_id: containerId || null, expected_image_sha: expectedSha || null, running_image_sha, started_at, state, health, stale });
+    }
+    return { ok: true, app: appName, server, services: out, any_stale: anyStale, checked_at: new Date().toISOString() };
+  } catch (e) {
+    return reply.code(500).send({ ok: false, error: String(e?.message || e) });
+  }
+});
+
+// Force-clear a stale running progress state (admin only — already requires auth via hook)
+fastify.post('/api/v1/apps/:app/progress/reset', async (request, reply) => {
+  const { app } = request.params;
+  if (!getApp(app)) return reply.code(404).send({ ok: false, error: `Unknown app: ${app}` });
+  const progressFile = progressFilePath(app);
+  try { await fs.rm(progressFile, { force: true }); } catch { /* already gone */ }
+  releaseLock(app); // also clear any in-memory lock
+  return { ok: true, app, message: 'Progress state and lock cleared' };
+});
+
+// Force-release the in-memory deploy lock (admin only — allows recovery without container restart)
+fastify.post('/api/v1/apps/:app/unlock', async (request, reply) => {
+  const { app } = request.params;
+  if (!getApp(app)) return reply.code(404).send({ ok: false, error: `Unknown app: ${app}` });
+  const hadLock = deployLocks.has(app);
+  releaseLock(app);
+  return { ok: true, app, had_lock: hadLock, message: hadLock ? 'Lock released' : 'No lock was held' };
+});
+
+// --- Alerts ---
+
+fastify.get('/api/v1/alerts', async () => {
+  const alerts = [];
+
+  for (const name of getAllApps()) {
+    const app = getApp(name);
+    const prod = app.production;
+
+    // Check health
+    if (prod.health_url) {
+      try {
+        const res = await fetch(prod.health_url, { signal: AbortSignal.timeout(5000) });
+        if (!res.ok) {
+          alerts.push({ app: name, severity: 'critical', type: 'unhealthy', status: res.status, url: prod.health_url });
+        }
+      } catch {
+        alerts.push({ app: name, severity: 'critical', type: 'unreachable', url: prod.health_url });
+      }
+    }
+
+    // Check if dev is ahead
+    try {
+      const devCommit = (await run(`git -C ${app.repo_dir} rev-parse --short HEAD`)).stdout;
+      const prodCommit = (await runOnServer(prod.server, `cd ${prod.deploy_dir} && git rev-parse --short HEAD`)).stdout;
+      if (devCommit && prodCommit && devCommit !== prodCommit) {
+        alerts.push({ app: name, severity: 'info', type: 'dev_ahead', dev: devCommit, prod: prodCommit });
+      }
+    } catch { /* skip if unreachable */ }
+
+    // Check last deploy
+    const lastDeploy = (deployHistory[name] || [])[0];
+    if (lastDeploy?.result === 'failed') {
+      alerts.push({ app: name, severity: 'warning', type: 'last_deploy_failed', timestamp: lastDeploy.timestamp });
+    }
+  }
+
+  return { alerts, count: alerts.length };
+});
+
+// --- Audit ---
+
+fastify.get('/api/v1/audit', async (request) => {
+  const limit = parseInt(request.query.limit, 10) || 50;
+  try {
+    const data = await fs.readFile(AUDIT_LOG_FILE, 'utf-8');
+    // Drop empty lines so an empty log file yields zero entries, not one unparseable entry
+    const lines = data.trim().split('\n').filter(Boolean).reverse();
+    const logs = lines.slice(0, limit).map(l => {
+      try { return JSON.parse(l); } catch { return { error: 'unparseable', raw: l }; }
+    });
+    return { logs, total: lines.length };
+  } catch {
+    return { logs: [], total: 0 };
+  }
+});
+
+// =============================================================================
+// START
+// =============================================================================
+
+process.on('unhandledRejection', (reason, promise) => {
+  fastify.log.error({ reason, promise }, 'Unhandled promise rejection — investigate immediately');
+});
+
+const start = async () => {
+  try {
+    // Fail fast if webhook secret is missing in production
+    if (!DEV_MODE && !WEBHOOK_SECRET) {
+      throw new Error('KUA_DEPLOY_WEBHOOK_SECRET must be set in production — refusing to start');
+    }
+    await loadRegistry();
+    await loadHistory();
+    await fs.mkdir(DATA_DIR, { recursive: true });
+    await fastify.listen({ port: 3200, host: '0.0.0.0' });
+  } catch (err) {
+    fastify.log.error(err);
+    process.exit(1);
+  }
+};
+
+start();
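
Review note: a minimal sketch of the signature check performed in the `/webhook/forgejo` handler, pulled out as standalone functions so it can be exercised in isolation. `signPayload`, `verifySignature`, the secret and the sample payload are hypothetical and not part of this commit. The sketch assumes the sender puts the hex HMAC-SHA256 of the raw request body in the signature header, which is what the handler compares against.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: the hex HMAC-SHA256 a sender would place in the
// X-Forgejo-Signature (or X-Gitea-Signature) header.
function signPayload(secret, rawBody) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Hypothetical helper mirroring the handler's comparison. The length guard
// runs first because crypto.timingSafeEqual throws on unequal lengths.
function verifySignature(secret, rawBody, signature) {
  const expected = Buffer.from(signPayload(secret, rawBody));
  const received = Buffer.from(signature || '');
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}

const secret = 'example-secret'; // placeholder value for illustration only
const body = JSON.stringify({ ref: 'refs/heads/production', repository: { name: 'muralla' } });
const sig = signPayload(secret, body);

console.log(verifySignature(secret, body, sig));       // true
console.log(verifySignature(secret, body + ' ', sig)); // false: body was altered
console.log(verifySignature(secret, body, ''));        // false: signature missing
```

The signature must be computed over the exact bytes received, which is why the handler rejects the request when `request.rawBody` was not captured instead of re-serializing `request.body`.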