docs(07-03): complete production compose stack and runbook plan

- SUMMARY for 07-03: docker-compose.prod.yaml, deploy/Caddyfile, README runbook
Arthur Belleville 2026-05-15 18:26:01 +02:00
parent f261fb39b8
commit 7bca961bb0

@ -0,0 +1,115 @@
---
phase: 07-deploy-v1
plan: "03"
subsystem: backend-deploy
tags: [go, docker, docker-compose, caddy, tls, runbook, deploy]
dependency_graph:
  requires:
    - 07-02 (Dockerfile producing /app/web and /app/worker; referenced by image + command in compose)
    - 07-01 (RunMigrations called at web startup; D-10 assumption in runbook)
  provides:
    - backend/docker-compose.prod.yaml - production compose stack (D-01..D-04, D-08)
    - "backend/deploy/Caddyfile - TLS reverse proxy config with {$DOMAIN} interpolation (D-04)"
    - backend/README.md Deploy/Rollback/Incident sections - operator runbook (DEPLOY-05)
  affects:
    - backend/docker-compose.prod.yaml (new file)
    - backend/deploy/Caddyfile (new file)
    - backend/README.md (extended)
tech_stack:
  added:
    - docker compose v2 (no `version:` key) - production orchestration on Hetzner VM
    - caddy:2-alpine - TLS termination with automatic Let's Encrypt certificate management
  patterns:
    - "Same image, different command: pattern for web/worker services (D-08)"
    - No postgres host ports binding - internal network only (RESEARCH Pitfall 5, T-07-09)
    - depends_on with service_healthy to prevent goose.Up() racing Postgres init (T-07-12)
    - "{$DOMAIN} Caddy env var interpolation for domain-agnostic Caddyfile"
    - caddy_data named volume for persistent Let's Encrypt certificate storage (T-07-11)
key_files:
  created:
    - backend/docker-compose.prod.yaml
    - backend/deploy/Caddyfile
  modified:
    - backend/README.md
decisions:
  - "No `ports:` on postgres service - internal compose network only; no host access (T-07-09 mitigated)"
  - "Caddyfile uses {$DOMAIN} env var interpolation - domain is operator-configured in .env.prod, Caddyfile stays generic"
  - "caddy_data and caddy_config as named volumes - persists TLS certs across Caddy restarts; loss triggers rate-limit risk (T-07-11 accepted + documented)"
  - "No CMD in compose - command: per service overrides Dockerfile's missing CMD, consistent with D-08 and Plan 02 decision"
  - "Runbook includes break-glass goose down steps - schema rollback edge case documented even though it is not the normal path"
metrics:
  duration: "~5 minutes"
  completed: "2026-05-15"
  tasks: 2
  files: 3
---
# Phase 7 Plan 3: Production compose stack, Caddyfile, and operator runbook Summary
## What Was Built
Production Docker Compose stack with postgres, web, worker, and caddy services; a Caddy reverse proxy config with Let's Encrypt TLS and {$DOMAIN} env var interpolation; and an extended README runbook covering first-time deploy, routine deploys, rollback by image tag, and incident triage procedures.
## Tasks Completed
| Task | Name | Commit | Files |
|------|------|--------|-------|
| 1 | docker-compose.prod.yaml and deploy/Caddyfile | 273f063 | backend/docker-compose.prod.yaml, backend/deploy/Caddyfile |
| 2 | Extend README with Deploy, Rollback, Incident Runbook sections | f261fb3 | backend/README.md |
## Decisions Made
1. Postgres service has no `ports:` directive: the postgres container is reachable only within the Docker Compose internal network. Publishing port 5432 on a VM with a public IP would expose the database to connection attempts from the internet, leaving its own authentication as the only barrier (T-07-09 mitigated per RESEARCH Pitfall 5).
2. `{$DOMAIN}` Caddy env var interpolation is used in the Caddyfile site block — this keeps the Caddyfile generic and domain-agnostic. Operators set `DOMAIN=app.yourdomain.com` in `.env.prod`; Caddy reads it at startup without any Caddyfile editing.
3. `caddy_data` and `caddy_config` are named volumes — Let's Encrypt certificates are stored in `caddy_data`. If this volume is lost, Caddy must re-issue certificates, which risks hitting Let's Encrypt's rate limit of 5 duplicate certificates per week (T-07-11 accepted; recovery steps documented in incident runbook).
4. No CMD to fall back on: the image defines no default command, so each service's compose `command:` (D-08) is the only invocation path. This is consistent with Plan 02's decision to omit CMD from the Dockerfile's final stage.
5. Break-glass schema rollback via external goose CLI — the distroless runtime image has no shell and no goose binary (RESEARCH Pitfall 7). The runbook documents using either a locally-installed goose CLI or the `ghcr.io/kukymbr/goose-docker` image on the compose network as the path for manual `goose down` when needed.
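For the locally installed goose CLI path in decision 5, a guarded wrapper might look like the sketch below. It is not taken from the runbook: `goose_down`, `MIGRATIONS_DIR`, `DB_DSN`, and `CONFIRM` are hypothetical names, and the sketch assumes the database is reachable from wherever goose runs. Because postgres publishes no host port, that reachability has to come from the compose network or some tunnel, neither of which is shown here.

```shell
#!/bin/sh
# Sketch: roll back one migration with the goose CLI, printing the command
# instead of running it unless CONFIRM=yes is set.
# goose_down, MIGRATIONS_DIR, DB_DSN and CONFIRM are hypothetical names.

goose_down() {
    dir=${MIGRATIONS_DIR:?set MIGRATIONS_DIR to the migrations directory}
    dsn=${DB_DSN:?set DB_DSN to the Postgres connection string}

    if [ "${CONFIRM:-no}" = "yes" ]; then
        # goose syntax: goose [options] DRIVER DBSTRING COMMAND
        # "down" rolls the schema back by exactly one version.
        goose -dir "$dir" postgres "$dsn" down
    else
        # The DSN is masked so credentials do not end up in terminal logs.
        echo "dry run: goose -dir $dir postgres <dsn> down"
    fi
}
```

Defaulting to a dry run is a deliberate choice for a break-glass step: the operator sees exactly what would run before opting in with `CONFIRM=yes`.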
## Deviations from Plan
None — plan executed exactly as written.
## Verification
All 10 success criteria checked:
1. `docker-compose.prod.yaml` contains postgres, web, worker, caddy services — PASS
2. postgres service has no `ports:` directive (internal network only) — PASS
3. web service has `command: /app/web` — PASS
4. worker service has `command: /app/worker` — PASS
5. Both web and worker have `depends_on: postgres: condition: service_healthy` — PASS (count: 2)
6. caddy service has `caddy_data` and `caddy_config` named volumes — PASS
7. `backend/deploy/Caddyfile` exists with `{$DOMAIN}` site block and `reverse_proxy web:8080` — PASS
8. `grep -c "^## Deploy\|^## Rollback\|^## Incident Runbook" backend/README.md` returns 3 — PASS
9. README includes `chmod 600 .env.prod` and `openssl rand -hex 32` for SESSION_SECRET — PASS
10. README rollback section documents image tag redeployment (D-11) — PASS
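
Several of the content criteria above can be re-run as a script. The following is a sketch rather than the checks actually executed for this plan: `check_artifacts` is a hypothetical name, the patterns are assumptions modeled on the list, and the `ports:` check assumes services are indented by two spaces and their keys by four.

```shell
#!/bin/sh
# Sketch: re-run some of the success criteria as grep/awk content checks.
# check_artifacts and the exact patterns are illustrative assumptions.

check_artifacts() {
    root=$1   # directory holding docker-compose.prod.yaml, deploy/ and README.md
    compose="$root/docker-compose.prod.yaml"
    caddyfile="$root/deploy/Caddyfile"
    readme="$root/README.md"
    fail=0

    # Criteria 3 and 4: one command per service.
    grep -q 'command: /app/web' "$compose" || { echo "FAIL web command"; fail=1; }
    grep -q 'command: /app/worker' "$compose" || { echo "FAIL worker command"; fail=1; }

    # Criterion 5: exactly two services wait for a healthy postgres.
    n=$(grep -c 'condition: service_healthy' "$compose" || true)
    [ "$n" -eq 2 ] || { echo "FAIL service_healthy count: $n"; fail=1; }

    # Criterion 2: no ports: key inside the postgres service block.
    if awk '
        /^  [A-Za-z0-9_-]+:/ { in_pg = ($1 == "postgres:") }
        in_pg && /^    ports:/ { found = 1 }
        END { exit !found }
    ' "$compose"; then
        echo "FAIL postgres publishes ports"; fail=1
    fi

    # Criterion 7: site block and upstream in the Caddyfile.
    grep -qF '{$DOMAIN}' "$caddyfile" || { echo "FAIL DOMAIN site block"; fail=1; }
    grep -q 'reverse_proxy web:8080' "$caddyfile" || { echo "FAIL reverse_proxy"; fail=1; }

    # Criterion 8: the three runbook H2 sections.
    n=$(grep -cE '^## (Deploy|Rollback|Incident Runbook)' "$readme" || true)
    [ "$n" -eq 3 ] || { echo "FAIL README sections: $n"; fail=1; }

    [ "$fail" -eq 0 ] && echo "PASS"
    return "$fail"
}
```

Calling `check_artifacts backend` from the repository root would print `PASS` or one `FAIL` line per unmet check. These are text matches only, so they complement rather than replace a real `docker compose config` validation.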
Note: `docker compose -f docker-compose.prod.yaml config --quiet` could not be run — Docker Engine
is not installed on this development machine. YAML syntax was validated by manual inspection
and Python content checks; the compose file uses standard compose v2 syntax with no custom
extensions.
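
On a machine that does have Docker, the skipped validation can be wrapped so that it degrades to an explicit skip elsewhere. This is a minimal sketch; `validate_compose` is a hypothetical helper name, and only the `docker compose ... config --quiet` invocation comes from the note above.

```shell
#!/bin/sh
# Sketch: validate the compose file only where Docker Compose v2 is available.
# validate_compose is a hypothetical helper name.

validate_compose() {
    file=$1

    if ! command -v docker >/dev/null 2>&1; then
        echo "SKIP: docker not installed; $file not validated"
        return 0
    fi
    if ! docker compose version >/dev/null 2>&1; then
        echo "SKIP: docker compose v2 plugin not available"
        return 0
    fi

    # --quiet validates the configuration without printing the rendered file.
    if docker compose -f "$file" config --quiet; then
        echo "OK: $file is valid"
    else
        echo "FAIL: $file did not validate"
        return 1
    fi
}
```

Validation may also fail for reasons unrelated to syntax, for example if the compose file references an env file that does not exist on the machine running the check.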
## Known Stubs
None — this plan creates infrastructure config files and documentation only; no application
code stubs introduced.
## Threat Flags
No new threat surface beyond plan's threat_model. Security mitigations applied:
- T-07-09 mitigated: No `ports:` on postgres service in docker-compose.prod.yaml
- T-07-10 mitigated: README runbook includes `chmod 600 .env.prod` instruction
- T-07-11 accepted + documented: caddy_data volume loss recovery steps in incident runbook
- T-07-12 mitigated: `depends_on: postgres: condition: service_healthy` prevents goose.Up() racing Postgres init
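
The T-07-10 mitigation pairs naturally with the `openssl rand -hex 32` step the README documents for SESSION_SECRET. As a sketch under stated assumptions: `write_env_prod` is a hypothetical helper, and `DOMAIN` and `SESSION_SECRET` are the only variable names taken from this summary. A real `.env.prod` presumably needs further variables that are not listed here.

```shell
#!/bin/sh
# Sketch: create .env.prod with owner-only permissions and a random session secret.
# write_env_prod is a hypothetical helper; it overwrites the target file.

write_env_prod() {
    target=$1
    domain=$2

    # 32 random bytes, hex-encoded: a 64-character secret.
    secret=$(openssl rand -hex 32) || return 1

    # Write under a restrictive umask so a newly created file is never
    # group- or world-readable, even briefly.
    (
        umask 077
        printf 'DOMAIN=%s\nSESSION_SECRET=%s\n' "$domain" "$secret" > "$target"
    ) || return 1

    # Make the mode explicit; this also tightens a pre-existing file.
    chmod 600 "$target"
}
```

For example, `write_env_prod .env.prod app.yourdomain.com` would produce a two-line file readable only by its owner.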
## Self-Check: PASSED
- backend/docker-compose.prod.yaml: EXISTS (273f063)
- backend/deploy/Caddyfile: EXISTS (273f063)
- backend/README.md: EXISTS with 3 new H2 sections (f261fb3)
- Commits 273f063 and f261fb3: VERIFIED in git log