kua-deploy/NOTES-image-digest-pinning.md

NOTES — compose-image digest pinning (deferred prevention item #4)

Date: 2026-05-20
Author: v2-deploy-coordination session (kua-deploy-verify branch)
Status: design sketch, not implemented — needs platform-wide coordination.

Why this matters

Today every app's docker-compose.yml references its app image by tag, typically :latest. docker compose up -d (without --force-recreate) is then a legal no-op when the tag string is unchanged, even if the digest behind that tag has changed. That is structurally why kua-deploy's deploy step reported a false success on muralla on 2026-05-19: the build moved muralla-muralla:latest to a new digest, but the running container was still bound to the previous digest, and up -d saw "no change."

The other prevention items in this branch (post-deploy SHA verify, runtime-status endpoint, release-app post-verify) detect the failure after the fact. Digest pinning would prevent it at the compose level — the reference itself changes every build, so compose has no choice but to recreate.
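The failure mode can be illustrated with a deliberately simplified model. This is a sketch of the behavior described in these notes, not of compose's real change detection; the RunningContainer type, the recreate rule, and the placeholder digests are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningContainer:
    """What the toy model remembers about the currently running container."""
    image_ref: str     # the image: string the container was created from
    image_digest: str  # the digest that string resolved to at creation time


def recreates_on_reference_change(running: RunningContainer, desired_ref: str) -> bool:
    """Toy rule (assumption): recreate only when the image: string itself differs."""
    return running.image_ref != desired_ref


# Placeholder digests, not real images.
old_digest = "sha256:" + "a" * 64
new_digest = "sha256:" + "b" * 64

running = RunningContainer("muralla-muralla:latest", old_digest)

# Tag unchanged while the digest behind it moved: the toy rule sees nothing to do.
tag_recreates = recreates_on_reference_change(running, "muralla-muralla:latest")

# Digest-pinned reference: the string changes with every build, forcing a recreate.
pinned_recreates = recreates_on_reference_change(running, f"muralla-muralla@{new_digest}")
```

Under this model the tag-based reference never triggers a recreate, while the pinned reference always does, which is the property the design below relies on.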

The design

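  1. Build step writes the just-built digest to a small artifact, e.g.
    /root/apps/<app>/.deploy/<svc>.sha
    containing the digest (sha256:655fe0c64391...). The digest is captured immediately after docker compose build, from docker compose images --quiet <svc>. To verify before implementing: docker compose images lists the images used by created containers, so right after a build it may still report the previous image; inspecting the freshly built tag with docker image inspect may be the safer source.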

  2. Deploy step rewrites compose with the pinned digest before up, either:

    • In-place rewrite of docker-compose.yml: replace image: muralla-muralla:latest with image: muralla-muralla@<sha>. Risk: dirties Bruno's working tree → trips the existing dirty-tree gate. Mitigation: rewrite a temporary docker-compose.deploy.yml instead, pass via -f to compose.
    • Override file: keep docker-compose.yml untouched; generate a sibling docker-compose.override.deploy.yml per build that just contains services: <svc>: image: muralla-muralla@<sha>. Apply via docker compose -f docker-compose.yml -f docker-compose.override.deploy.yml up -d. Cleaner; doesn't dirty the primary file.
  3. Recreate is then guaranteed: compose sees that the image: field changed and must recreate. The post-deploy verifier from this branch becomes a belt-and-suspenders check rather than the load-bearing safety.
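The three steps above could be sketched roughly as follows. This is a minimal illustration, not kua-deploy's implementation: the function names, the service name muralla, and the all-zero placeholder digest are hypothetical, and the sketch assumes the recorded digest is one Docker can resolve for that image name, which still needs checking for locally built images.

```python
import re
import tempfile
from pathlib import Path

# A full sha256 digest: the prefix plus 64 lowercase hex characters.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def validate_digest(digest: str) -> str:
    """Return the stripped digest, or raise if it is truncated or malformed."""
    digest = digest.strip()
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a full sha256 digest: {digest!r}")
    return digest


def write_digest_artifact(deploy_dir: Path, service: str, digest: str) -> Path:
    """Step 1: record the just-built digest as <deploy_dir>/<service>.sha."""
    deploy_dir.mkdir(parents=True, exist_ok=True)
    path = deploy_dir / f"{service}.sha"
    path.write_text(validate_digest(digest) + "\n")
    return path


def render_override(service: str, image_name: str, digest: str) -> str:
    """Step 2: override file content that pins only the app's own service."""
    return (
        "services:\n"
        f"  {service}:\n"
        f"    image: {image_name}@{validate_digest(digest)}\n"
    )


def compose_up_argv(base_file: str, override_file: str) -> list[str]:
    """Step 3: the compose invocation that layers the override on the base file."""
    return ["docker", "compose", "-f", base_file, "-f", override_file, "up", "-d"]


# Demo with a placeholder digest (64 zeros), not a real image.
example_digest = "sha256:" + "0" * 64
with tempfile.TemporaryDirectory() as tmp:
    artifact = write_digest_artifact(Path(tmp) / ".deploy", "muralla", example_digest)
    artifact_contents = artifact.read_text()
override_yaml = render_override("muralla", "muralla-muralla", example_digest)
argv = compose_up_argv("docker-compose.yml", "docker-compose.override.deploy.yml")
```

Validating the digest before writing it means a truncated or empty capture fails the build step loudly instead of producing an override that compose cannot resolve.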

Why this is deferred

  • Touches every app's deploy path — kua-cashier, muralla, muralla-socials, atlas, playgram, coder-core, kua-mail, etc. Each needs its compose conventions checked (some may already pin digests; some may use private registries with digest-bound auth).
  • Interacts with the dual-SoT between bin/webhook-repos.json and services/kua-deploy/deploy-registry.json (the coordinator broadcast called this out — cadencia is mid-untangling it).
  • The post-deploy verify alone closes the false-success class for now; digest pinning is the durable fix layered on top.
  • Stateful services (postgres, redis) explicitly do not want digest pinning at the compose level: their images should be free to drift independently of kua-deploy releases rather than be re-pinned on each one. The override-file pattern naturally limits pinning to the app's own services.

Suggested sequencing

  • Land this branch (chore/kua-deploy-sha-verify-and-release-app-post) first: verify catches false-success cleanly.
  • Sequence after the coordinator's chore/deploy-mode-default-direct-and-docs branch (cadencia) so the webhook→direct migration is in place.
  • Then a separate, focused branch implements the override-file pattern in kua-deploy plus a per-app smoke test, one app at a time. kua-cashier is a good first candidate because it is not yet serving production traffic.
  • Add an integration test that simulates compose-no-op (build new image, run deploy without bumping tag) and confirms the override-file variant recreates while the plain-tag variant would not.

Open questions

  • Should we keep :latest as the human-readable tag and additionally pin by digest in the override? (Yes — operators still want muralla-muralla:latest as a stable rollback anchor.)
  • Where does the override file live across the deploy lifecycle? Tentative answer: in /root/apps/<app>/.deploy/ (git-ignored), rotated per build, and kept for N builds for fast rollback.
  • Does the existing release-app workflow pass an explicit deploy SHA that the build step could record alongside the image digest? (Yes — the deployCommit variable in kua-deploy server.js already carries this.)
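If the override files do end up in the .deploy/ directory and are kept for N builds, rotation could look roughly like the sketch below. The numbered filename scheme (docker-compose.override.deploy.<build>.yml) and the prune_overrides helper are assumptions made for illustration; the notes do not settle on a naming convention.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming scheme: one override file per build, numbered by a build counter.
OVERRIDE_RE = re.compile(r"docker-compose\.override\.deploy\.(\d+)\.yml")


def prune_overrides(deploy_dir: Path, keep: int) -> list[str]:
    """Delete all but the `keep` newest override files; return the deleted names."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    numbered = []
    for path in deploy_dir.iterdir():
        match = OVERRIDE_RE.fullmatch(path.name)
        if match:  # files that do not follow the scheme are left alone
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: item[0], reverse=True)  # newest build first
    stale = [path for _, path in numbered[keep:]]
    for path in stale:
        path.unlink()
    return [path.name for path in stale]


# Demo: five builds on disk, keep the newest three for fast rollback.
with tempfile.TemporaryDirectory() as tmp:
    deploy_dir = Path(tmp)
    for build in range(1, 6):
        (deploy_dir / f"docker-compose.override.deploy.{build}.yml").write_text("services: {}\n")
    (deploy_dir / "muralla.sha").write_text("placeholder\n")  # unrelated file, untouched
    removed = sorted(prune_overrides(deploy_dir, keep=3))
    remaining = sorted(p.name for p in deploy_dir.iterdir())
```

Ordering by an explicit build number rather than file modification time keeps the rotation deterministic even when several builds land within the same second.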

— end notes —