kua-deploy/NOTES-image-digest-pinning.md

# NOTES — compose-image digest pinning (deferred prevention item #4)
**Date:** 2026-05-20
**Author:** v2-deploy-coordination session (kua-deploy-verify branch)
**Status:** design sketch, **not implemented** — needs platform-wide coordination.
## Why this matters
Today every app's `docker-compose.yml` references its app image by **tag**,
typically `:latest`. `docker compose up -d` (without `--force-recreate`)
is then a legal no-op when the tag string is unchanged, *even if the
digest behind that tag changed*. That is structurally why kua-deploy's
`deploy` step false-success'd on muralla 2026-05-19 — the build pushed
`muralla-muralla:latest` to a new digest, but the running container was
still bound to the previous digest, and `up -d` saw "no change."
The other prevention items in this branch (post-deploy SHA verify,
runtime-status endpoint, release-app post-verify) **detect** the failure
after the fact. **Digest pinning would prevent it at the compose level**
— the reference itself changes every build, so compose has no choice
but to recreate.
## The design
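1. **Build step writes the just-built digest to a small artifact**, e.g.
   `/root/apps/<app>/.deploy/<svc>.sha`, with contents such as
   `sha256:655fe0c64391...`. Capture it immediately after
   `docker compose build`, e.g. from
   `docker image inspect --format '{{.Id}}' <image>:latest`.
   (`docker compose images --quiet <svc>` lists images used by
   already-created containers, so before the recreate it can still
   report the previous image.) For a locally built image this yields the
   image ID, which is not necessarily a registry digest; whether
   `image: <name>@<sha>` resolves it locally should be verified during
   implementation.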
2. **Deploy step rewrites compose with the pinned digest** before `up`,
   either:
   - **In-place rewrite** of `docker-compose.yml`: replace
     `image: muralla-muralla:latest` with `image: muralla-muralla@<sha>`.
     Risk: dirties Bruno's working tree → trips the existing dirty-tree
     gate. Mitigation: rewrite a temporary `docker-compose.deploy.yml`
     instead, pass via `-f` to compose.
   - **Override file**: keep `docker-compose.yml` untouched; generate a
     sibling `docker-compose.override.deploy.yml` per build that just
     contains `services: <svc>: image: muralla-muralla@<sha>`. Apply via
     `docker compose -f docker-compose.yml -f docker-compose.override.deploy.yml up -d`.
     Cleaner; doesn't dirty the primary file.
3. **Recreate is then guaranteed**: compose sees that the `image:` field
   changed and must recreate. The post-deploy verifier from this branch
   becomes belt-and-suspenders rather than the load-bearing safety check.
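
The three steps above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not kua-deploy's implementation: the helper names (`write_digest_artifact`, `render_override`, `write_override`, `compose_up_argv`) are hypothetical, the service name `muralla` in the usage note is assumed, and how the digest is read from Docker is left out.

```python
import re
from pathlib import Path

# A full sha256 reference: "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def write_digest_artifact(deploy_dir: Path, service: str, digest: str) -> Path:
    """Step 1: record the just-built digest in <deploy_dir>/<service>.sha."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    deploy_dir.mkdir(parents=True, exist_ok=True)
    artifact = deploy_dir / f"{service}.sha"
    artifact.write_text(digest + "\n")
    return artifact


def render_override(service: str, image: str, digest: str) -> str:
    """Step 2: override content that replaces only the service's image field."""
    return (
        "services:\n"
        f"  {service}:\n"
        f"    image: {image}@{digest}\n"
    )


def write_override(app_dir: Path, service: str, image: str) -> Path:
    """Read the recorded digest and write the sibling override file."""
    digest = (app_dir / ".deploy" / f"{service}.sha").read_text().strip()
    override = app_dir / "docker-compose.override.deploy.yml"
    override.write_text(render_override(service, image, digest))
    return override


def compose_up_argv(override: Path) -> list[str]:
    """Step 3: the compose invocation, to be run from the app directory."""
    return [
        "docker", "compose",
        "-f", "docker-compose.yml",
        "-f", override.name,
        "up", "-d",
    ]
```

A deploy step would call `write_digest_artifact(app_dir / ".deploy", "muralla", digest)` after the build, then `write_override(app_dir, "muralla", "muralla-muralla")`, and pass the result of `compose_up_argv(...)` to its process runner. Because the primary `docker-compose.yml` is never written, the dirty-tree gate is not affected.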
## Why this is deferred
- Touches **every app's deploy path** — kua-cashier, muralla,
muralla-socials, atlas, playgram, coder-core, kua-mail, etc. Each
needs its compose conventions checked (some may already pin digests;
some may use private registries with digest-bound auth).
- Interacts with the **dual-SoT** between `bin/webhook-repos.json` and
`services/kua-deploy/deploy-registry.json` (the coordinator broadcast
called this out — cadencia is mid-untangling it).
- The post-deploy verify alone closes the false-success class for now;
digest pinning is the *durable* fix layered on top.
- Stateful services (postgres, redis) explicitly do **not** want digest
pinning at the compose level: their images should stay decoupled from
kua-deploy releases. The override-file pattern naturally limits pinning
to the app's own services.
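
One way to keep pinning limited to the app's own services is to pin only services that declare a `build:` section. The sketch below assumes that heuristic, which these notes do not specify, and takes an already-parsed compose mapping as input. `strip_tag` and `pinned_images` are hypothetical helper names.

```python
def strip_tag(image: str) -> str:
    """Drop a trailing :tag or @digest, keeping any registry host:port prefix."""
    image = image.split("@", 1)[0]
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return image[: image.rindex(":")]
    return image


def pinned_images(compose: dict, digests: dict[str, str]) -> dict[str, str]:
    """Map each locally built service to its digest-pinned image reference.

    Assumption: a service with a `build:` key is the app's own service.
    Services without one (e.g. postgres, redis) are left out, so the
    override never touches them.
    """
    pins = {}
    for name, spec in compose.get("services", {}).items():
        if "build" not in spec:
            continue  # upstream image: leave unpinned
        if name not in digests:
            raise KeyError(f"no recorded digest for built service {name!r}")
        pins[name] = f"{strip_tag(spec['image'])}@{digests[name]}"
    return pins
```

The returned mapping holds only the entries an override file would need, so stateful services keep whatever `image:` the primary compose file gives them.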
## Recommended owner / sequencing
- Land this branch (`chore/kua-deploy-sha-verify-and-release-app-post`)
first — verify catches false-success cleanly.
- Sequence after the coordinator's `chore/deploy-mode-default-direct-and-docs`
branch (cadencia) so the webhook→direct migration is in place.
- Then a separate focused branch implements the override-file pattern
in kua-deploy + a per-app smoke test (one app at a time; kua-cashier
is a good first candidate because it does not yet serve production traffic).
- Add an integration test that simulates compose-no-op (build new
image, run deploy without bumping tag) and confirms the override-file
variant recreates while the plain-tag variant would not.
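
The core assertion of such an integration test could look like the sketch below. It assumes the test captures the JSON printed by `docker inspect <container>`, whose first element carries the container's image ID in its `Image` field. The function names are hypothetical and the IDs in the usage note are made up.

```python
import json


def container_image_id(inspect_output: str) -> str:
    """Extract the image ID from `docker inspect <container>` JSON output."""
    containers = json.loads(inspect_output)
    if not containers:
        raise ValueError("docker inspect returned no containers")
    return containers[0]["Image"]


def runs_image(inspect_output: str, expected_image_id: str) -> bool:
    """True if the inspected container was created from the expected image."""
    return container_image_id(inspect_output) == expected_image_id
```

With `old_id` and `new_id` as the image IDs before and after the rebuild, the test would expect `runs_image(inspect_after_plain_tag_deploy, new_id)` to be false (the no-op case) and `runs_image(inspect_after_override_deploy, new_id)` to be true.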
## Open questions
- Should we keep `:latest` as the *human-readable* tag and *additionally*
pin by digest in the override? (Yes — operators still want
`muralla-muralla:latest` as a stable rollback anchor.)
- Where does the override file live across the deploy lifecycle?
Tentative answer: in `/root/apps/<app>/.deploy/` (git-ignored), rotated
per build, kept for N builds for fast rollback.
- Does the existing `release-app` workflow pass an explicit deploy SHA
that the build step could record alongside the image digest? (Yes —
the `deployCommit` variable in kua-deploy server.js already carries
this.)
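
The "kept for N builds" idea from the second question could be sketched as below. The file naming `override-<build>.yml` is an assumption made for illustration: it relies on `<build>` being a sortable, zero-padded build number or timestamp so that name order matches build order. `rotate_overrides` is a hypothetical helper.

```python
from pathlib import Path


def rotate_overrides(deploy_dir: Path, keep: int) -> list[Path]:
    """Delete all but the `keep` newest override files; return deleted paths.

    Assumes files are named override-<build>.yml, where <build> sorts
    lexicographically in build order (hypothetical naming scheme).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    newest_first = sorted(
        deploy_dir.glob("override-*.yml"),
        key=lambda path: path.name,
        reverse=True,
    )
    stale = newest_first[keep:]
    for path in stale:
        path.unlink()
    return stale
```

A rollback would then pick one of the remaining override files and pass it to compose with `-f`, without rebuilding.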
— end notes —