diff --git a/.planning/phases/07-deploy-v1/07-03-SUMMARY.md b/.planning/phases/07-deploy-v1/07-03-SUMMARY.md
new file mode 100644
index 0000000..3c5a69a
--- /dev/null
+++ b/.planning/phases/07-deploy-v1/07-03-SUMMARY.md
@@ -0,0 +1,115 @@
+---
+phase: 07-deploy-v1
+plan: "03"
+subsystem: backend-deploy
+tags: [go, docker, docker-compose, caddy, tls, runbook, deploy]
+dependency_graph:
+  requires:
+    - 07-02 (Dockerfile producing /app/web and /app/worker — referenced by image + command in compose)
+    - 07-01 (RunMigrations called at web startup — D-10 assumption in runbook)
+  provides:
+    - backend/docker-compose.prod.yaml — production compose stack (D-01..D-04, D-08)
+    - backend/deploy/Caddyfile — TLS reverse proxy config with {$DOMAIN} interpolation (D-04)
+    - backend/README.md Deploy/Rollback/Incident sections — operator runbook (DEPLOY-05)
+  affects:
+    - backend/docker-compose.prod.yaml (new file)
+    - backend/deploy/Caddyfile (new file)
+    - backend/README.md (extended)
+tech_stack:
+  added:
+    - docker compose v2 (no `version:` key) — production orchestration on Hetzner VM
+    - caddy:2-alpine — TLS termination with automatic Let's Encrypt certificate management
+  patterns:
+    - Same image, different command: pattern for web/worker services (D-08)
+    - No postgres host ports binding — internal network only (RESEARCH Pitfall 5, T-07-09)
+    - depends_on with service_healthy to prevent goose.Up() racing Postgres init (T-07-12)
+    - {$DOMAIN} Caddy env var interpolation for domain-agnostic Caddyfile
+    - caddy_data named volume for persistent Let's Encrypt certificate storage (T-07-11)
+key_files:
+  created:
+    - backend/docker-compose.prod.yaml
+    - backend/deploy/Caddyfile
+  modified:
+    - backend/README.md
+decisions:
+  - "No `ports:` on postgres service — internal compose network only; no host access (T-07-09 mitigated)"
+  - "Caddyfile uses {$DOMAIN} env var interpolation — domain is operator-configured in .env.prod, Caddyfile stays generic"
+  - "caddy_data and caddy_config as named volumes — persists TLS certs across Caddy restarts; loss triggers rate-limit risk (T-07-11 accepted + documented)"
+  - "No CMD in compose — command: per service overrides Dockerfile's missing CMD, consistent with D-08 and Plan 02 decision"
+  - "Runbook includes break-glass goose down steps — schema rollback edge case documented even though it is not the normal path"
+metrics:
+  duration: "~5 minutes"
+  completed: "2026-05-15"
+  tasks: 2
+  files: 3
+---
+
+# Phase 7 Plan 3: Production compose stack, Caddyfile, and operator runbook Summary
+
+## What Was Built
+
+Production Docker Compose stack with postgres, web, worker, and caddy services; a Caddy reverse proxy config with Let's Encrypt TLS and {$DOMAIN} env var interpolation; and an extended README runbook covering first-time deploy, routine deploys, rollback by image tag, and incident triage procedures.
+
+## Tasks Completed
+
+| Task | Name | Commit | Files |
+|------|------|--------|-------|
+| 1 | docker-compose.prod.yaml and deploy/Caddyfile | 273f063 | backend/docker-compose.prod.yaml, backend/deploy/Caddyfile |
+| 2 | Extend README with Deploy, Rollback, Incident Runbook sections | f261fb3 | backend/README.md |
+
+## Decisions Made
+
+1. Postgres service has no `ports:` directive — the postgres container is only reachable within the Docker Compose internal network. Exposing port 5432 on a VM with a public IP would allow unauthenticated internet access (T-07-09 mitigated per RESEARCH Pitfall 5).
+
+2. `{$DOMAIN}` Caddy env var interpolation is used in the Caddyfile site block — this keeps the Caddyfile generic and domain-agnostic. Operators set `DOMAIN=app.yourdomain.com` in `.env.prod`; Caddy reads it at startup without any Caddyfile editing.
+
+3. `caddy_data` and `caddy_config` are named volumes — Let's Encrypt certificates are stored in `caddy_data`. If this volume is lost, Caddy must re-issue certificates, which risks hitting Let's Encrypt's rate limit of 5 duplicate certificates per week (T-07-11 accepted; recovery steps documented in incident runbook).
+
+4. No CMD override in compose — the compose `command:` per service (D-08) is the only invocation path. This is consistent with Plan 02's decision to omit CMD from the Dockerfile's final stage.
+
+5. Break-glass schema rollback via external goose CLI — the distroless runtime image has no shell and no goose binary (RESEARCH Pitfall 7). The runbook documents using either a locally-installed goose CLI or the `ghcr.io/kukymbr/goose-docker` image on the compose network as the path for manual `goose down` when needed.
+
+## Deviations from Plan
+
+None — plan executed exactly as written.
+
+## Verification
+
+All 10 success criteria checked:
+
+1. `docker-compose.prod.yaml` contains postgres, web, worker, caddy services — PASS
+2. postgres service has no `ports:` directive (internal network only) — PASS
+3. web service has `command: /app/web` — PASS
+4. worker service has `command: /app/worker` — PASS
+5. Both web and worker have `depends_on: postgres: condition: service_healthy` — PASS (count: 2)
+6. caddy service has `caddy_data` and `caddy_config` named volumes — PASS
+7. `backend/deploy/Caddyfile` exists with `{$DOMAIN}` site block and `reverse_proxy web:8080` — PASS
+8. `grep -c "^## Deploy\|^## Rollback\|^## Incident Runbook" backend/README.md` returns 3 — PASS
+9. README includes `chmod 600 .env.prod` and `openssl rand -hex 32` for SESSION_SECRET — PASS
+10. README rollback section documents image tag redeployment (D-11) — PASS
+
+Note: `docker compose -f docker-compose.prod.yaml config --quiet` could not be run — Docker Engine
+is not installed on this development machine. YAML syntax was validated by manual inspection
+and Python content checks; the compose file uses standard compose v2 syntax with no custom
+extensions.
+
+## Known Stubs
+
+None — this plan creates infrastructure config files and documentation only; no application
+code stubs introduced.
+
+## Threat Flags
+
+No new threat surface beyond plan's threat_model. Security mitigations applied:
+
+- T-07-09 mitigated: No `ports:` on postgres service in docker-compose.prod.yaml
+- T-07-10 mitigated: README runbook includes `chmod 600 .env.prod` instruction
+- T-07-11 accepted + documented: caddy_data volume loss recovery steps in incident runbook
+- T-07-12 mitigated: `depends_on: postgres: condition: service_healthy` prevents goose.Up() racing Postgres init
+
+## Self-Check: PASSED
+
+- backend/docker-compose.prod.yaml: EXISTS (273f063)
+- backend/deploy/Caddyfile: EXISTS (273f063)
+- backend/README.md: EXISTS with 3 new H2 sections (f261fb3)
+- Commits 273f063 and f261fb3: VERIFIED in git log
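
Reviewer note: the Verification section says the compose file was validated with "Python content checks" because Docker Engine was unavailable, but the summary does not show them. The sketch below is one way such checks might look, using only the standard library and plain text matching rather than a YAML parser. Everything in it is an assumption for illustration: `SAMPLE_COMPOSE` is a hypothetical file shaped after success criteria 1 to 6 (the image names, the healthcheck, and the Caddy ports and mount paths are not taken from the real `backend/docker-compose.prod.yaml`), and `service_blocks` and `check_compose` are hypothetical helper names. It also assumes two-space indentation.

```python
import re

# Hypothetical sample shaped after success criteria 1-6 in the summary.
# It is NOT the contents of backend/docker-compose.prod.yaml.
SAMPLE_COMPOSE = """\
services:
  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
  web:
    image: example/backend:latest
    command: /app/web
    depends_on:
      postgres:
        condition: service_healthy
  worker:
    image: example/backend:latest
    command: /app/worker
    depends_on:
      postgres:
        condition: service_healthy
  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./deploy/Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
volumes:
  caddy_data:
  caddy_config:
"""


def service_blocks(text: str) -> dict[str, str]:
    """Split the top-level `services:` mapping into {service name: body text}.

    Text-based and deliberately simple: assumes two-space indentation, no tabs.
    """
    blocks: dict[str, list[str]] = {}
    in_services = False
    current = None
    for line in text.splitlines():
        if re.match(r"^\S", line):  # a top-level key starts or ends the section
            in_services = line.rstrip() == "services:"
            current = None
            continue
        if not in_services:
            continue
        m = re.match(r"^  ([A-Za-z0-9_-]+):\s*$", line)
        if m:  # a key indented by exactly two spaces names a service
            current = m.group(1)
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return {name: "\n".join(body) for name, body in blocks.items()}


def check_compose(text: str) -> dict[str, bool]:
    """Return a pass/fail flag per criterion (criteria 1-6 of the summary)."""
    svc = service_blocks(text)

    def has_line(service: str, pattern: str) -> bool:
        return bool(re.search(pattern, svc.get(service, ""), re.M))

    return {
        "has_all_services": {"postgres", "web", "worker", "caddy"} <= set(svc),
        "postgres_no_ports": not has_line("postgres", r"^\s*ports:"),
        "web_command": has_line("web", r"^\s*command:\s*/app/web\s*$"),
        "worker_command": has_line("worker", r"^\s*command:\s*/app/worker\s*$"),
        "healthy_gate_count_is_2": text.count("condition: service_healthy") == 2,
        "caddy_volumes": all(
            v in svc.get("caddy", "") for v in ("caddy_data", "caddy_config")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_compose(SAMPLE_COMPOSE).items():
        print(f"{name}: {'PASS' if ok else 'FAIL'}")
```

Checks like these only confirm that expected strings are present or absent. They do not validate YAML syntax or compose semantics, so running `docker compose -f docker-compose.prod.yaml config --quiet` on a machine with Docker installed remains the stronger check.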