diff --git a/.planning/phases/07-deploy-v1/07-RESEARCH.md b/.planning/phases/07-deploy-v1/07-RESEARCH.md new file mode 100644 index 0000000..c2c7e2a --- /dev/null +++ b/.planning/phases/07-deploy-v1/07-RESEARCH.md @@ -0,0 +1,660 @@ +# Phase 7: Deploy v1 - Research + +**Researched:** 2026-05-15 +**Domain:** Go Docker multi-stage build, docker compose, Caddy reverse proxy, goose programmatic migrations, go:embed static assets, health checks +**Confidence:** HIGH + +## Summary + +Phase 7 packages the existing Go backend into a production-ready Docker image, deploys it to a Hetzner VM via plain Docker Compose, and wires Caddy as a TLS-terminating reverse proxy. The phase has five distinct work areas: (1) convert static asset serving from on-disk paths to `go:embed`, (2) add programmatic `goose.Up()` migration call in `cmd/web` startup, (3) build a multi-stage Dockerfile producing `/app/web` and `/app/worker` in a single image, (4) split the existing `/healthz` handler into a liveness route (no DB ping) and a new `/readyz` route (DB ping), and (5) write `docker-compose.prod.yaml`, `deploy/Caddyfile`, and the `backend/README.md` runbook. + +The codebase is well-prepared: `cmd/worker/main.go` already demonstrates the exact programmatic migration pattern (rivermigrate); `cmd/web/main.go` already reads all config from env vars; `signal.NotifyContext` graceful shutdown is in both binaries. The static files are currently served from `./static` on disk via `http.Dir`; they must be switched to `http.FS(embed.FS)` so the final container has zero runtime file dependencies. The existing `HealthzHandler` does a DB ping — that behavior must move to `/readyz`; `/healthz` becomes a pure liveness check. + +**Primary recommendation:** Build in this wave order — Wave 0 (go:embed + `/readyz` split), Wave 1 (goose.Up startup migration), Wave 2 (Dockerfile), Wave 3 (compose + Caddy + env docs), Wave 4 (README runbook). Each wave is independently testable. 
+ + +## User Constraints (from CONTEXT.md) + +### Locked Decisions + +- **D-01:** Production host is a Hetzner VM running Docker Compose. No PaaS, no Kubernetes. +- **D-02:** The full stack runs via plain `docker compose` — no Dokploy or Swarm mode in v1. +- **D-03:** Postgres runs on the VM inside the compose stack, volume-backed. No managed Postgres service for v1. +- **D-04:** Caddy is a service in `docker-compose.prod.yaml`. It proxies to `web:8080` and handles TLS via Let's Encrypt. Config via a bind-mounted Caddyfile. +- **D-05:** Production secrets (`SESSION_SECRET`, `DATABASE_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINT_URL`, `AWS_BUCKET`, `PORT`, `ENV`) are stored in a `.env` file on the Hetzner host (gitignored). `docker compose --env-file .env.prod up` reads it. No SOPS, no Docker secrets API. +- **D-06:** S3-compatible storage in production is Cloudflare R2. R2 credentials live in the host `.env` file. MinIO remains in `compose.yaml` for local dev only. +- **D-07:** A single multi-stage Dockerfile produces one image containing two binaries: `/app/web` (from `cmd/web`) and `/app/worker` (from `cmd/worker`). Both compiled in the builder stage, copied to the final runtime stage. +- **D-08:** `docker-compose.prod.yaml` runs the same image twice: one service with `command: /app/web`, one with `command: /app/worker`. No subcommand dispatcher needed in Go code. +- **D-09:** All static assets (Tailwind-compiled CSS, HTMX JS, Sortable.js, templ-generated HTML) are embedded via `//go:embed` at build time. No volume mounts for assets. +- **D-10:** Migrations run programmatically inside the `web` binary at startup: `web` calls `goose.Up()` via the goose library before binding the HTTP server. +- **D-11:** Rollback strategy: redeploy the previous image tag. Normal rollback = update compose image tag + `docker compose up -d`. `goose down` is documented as a break-glass step only. 
+- **D-12:** `/healthz` — liveness: returns 200 OK immediately if the server is up (no DB ping). Used by Caddy / uptime monitor. +- **D-13:** `/readyz` — readiness: returns 200 OK only if the DB pool is reachable (one `db.Ping()` call). Returns 503 during startup until migrations complete and the pool is healthy. Worker does not expose HTTP. + +### Claude's Discretion + +- Exact Dockerfile base image for the builder stage (e.g., `golang:1.26-alpine` vs `golang:1.26`). +- Final runtime base: `distroless/static` vs `alpine`. +- Caddyfile content (reverse proxy config, TLS directive, HTTPS redirect). +- Whether `docker-compose.prod.yaml` includes a `healthcheck:` directive for the Postgres service. +- Exact docker compose version / syntax used (`compose.yaml` already uses v2 syntax). +- Whether the `web` service in prod compose `depends_on` the `postgres` service with a health condition. + +### Deferred Ideas (OUT OF SCOPE) + +- Dokploy layer +- CI/CD pipeline +- pg_dump backup cron +- MinIO for prod + + + +## Phase Requirements + +| ID | Description | Research Support | +|----|-------------|------------------| +| DEPLOY-01 | Both binaries build into a single multi-stage Docker image | Multi-stage Dockerfile pattern: builder copies both `cmd/web` and `cmd/worker`, runtime stage copies both binaries | +| DEPLOY-02 | Image runs on a single VPS with env-injected config (no Supabase, no GCP) | `docker-compose.prod.yaml` with `env_file` directive; all env vars already read from `os.Getenv` in both binaries | +| DEPLOY-03 | Migrations run on deploy without manual intervention | `goose.Up()` with `embed.FS` inside `cmd/web` startup, mirrors rivermigrate pattern in `cmd/worker` | +| DEPLOY-04 | Health checks (`/healthz`, `/readyz`) and structured logs | `/healthz` already exists (needs liveness-only refactor); `/readyz` is new; JSON slog already live on `ENV=production` | +| DEPLOY-05 | Documented runbook in `backend/README.md` covering local dev, deploy, rollback | Extends 
existing `backend/README.md`; adds deploy, rollback, incident sections | + + +## Architectural Responsibility Map + +| Capability | Primary Tier | Secondary Tier | Rationale | +|------------|-------------|----------------|-----------| +| TLS termination | Caddy (compose service) | — | Caddy owns ACME/Let's Encrypt; web binary speaks plain HTTP on internal network | +| Static asset serving | Web binary (go:embed) | — | D-09: embedded at build time; no runtime file mounts | +| Database migrations | Web binary startup | — | D-10: goose.Up() before HTTP server binds | +| Liveness check (/healthz) | Web binary | — | D-12: no DB dependency; fast 200 as long as server process is up | +| Readiness check (/readyz) | Web binary | — | D-13: DB ping; Caddy or uptime monitor checks this before routing traffic | +| Background jobs | Worker binary | — | Separate container in compose, same image, command: /app/worker | +| Secret injection | Docker Compose env_file | Host .env.prod file | D-05: no secrets API needed | +| Postgres persistence | Postgres compose service | Volume | D-03: volume-backed, not managed | + +## Standard Stack + +### Core +| Library | Version | Purpose | Why Standard | +|---------|---------|---------|--------------| +| `github.com/pressly/goose/v3` | v3.27.1 | Programmatic migrations with embed.FS | Already in go.mod; `SetBaseFS` + `Up` is the idiomatic pattern for embedded migrations [VERIFIED: go.mod] | +| `embed` (stdlib) | Go 1.16+ | Embed static/ and migrations/ into binary | No dependency, available since Go 1.16; project uses Go 1.26 [VERIFIED: go.mod] | +| `io/fs` (stdlib) | Go 1.16+ | `fs.Sub` to strip directory prefix for http.FileServer | Required companion to embed.FS for serving static files [VERIFIED: Go stdlib] | +| `gcr.io/distroless/static-debian12` | nonroot tag | Final container runtime base | Smallest option (~2MiB); no shell; correct for CGO_ENABLED=0 Go binaries [VERIFIED: GoogleContainerTools/distroless GitHub] | +| 
`golang:1.26-alpine` | current | Builder stage base | Matches go.mod version; alpine keeps layer small and avoids glibc issues for distroless final [ASSUMED] | +| `caddy:2-alpine` | current | Reverse proxy + automatic TLS | Official image; Let's Encrypt auto-cert; simple Caddyfile syntax [CITED: caddyserver.com/docs] | + +### Supporting +| Library | Version | Purpose | When to Use | +|---------|---------|---------|-------------| +| `github.com/jackc/pgx/v5/stdlib` | v5.9.2 | Bridge pgxpool → database/sql for goose | goose.Up requires *sql.DB; pgx/v5/stdlib wraps pgxpool conn string into sql.Open | + +**Installation — no new top-level dependencies needed.** `pgx/v5/stdlib` is already a transitive dependency via pgx/v5 (in go.mod). Only needs to be added as a direct import in the migration helper. + +**Version verification:** +``` +github.com/pressly/goose/v3 v3.27.1 [VERIFIED: go.mod] +github.com/jackc/pgx/v5 v5.9.2 [VERIFIED: go.mod] +``` + +## Architecture Patterns + +### System Architecture Diagram + +``` +Internet + │ + ▼ :80/:443 +[Caddy] ─── ACME/Let's Encrypt cert management + │ + │ :8080 (internal Docker network) + ▼ +[web container] ─── cmd/web binary (go:embed: static/ + migrations/) + │ startup: goose.Up() ──► [postgres:5432] + │ /healthz → 200 always (liveness) + │ /readyz → 200 if DB ping ok (readiness) + │ /static/* → http.FS(embed.FS) + │ /tablos/* → HTMX handlers → [postgres:5432] + │ +[worker container] ─── cmd/worker binary (same image, command: /app/worker) + │ startup: rivermigrate.Up() ──► [postgres:5432] + │ river periodic jobs ──► [postgres:5432] + │ orphan cleanup ──────► [Cloudflare R2] + │ +[postgres container] ─── postgres:16-alpine, volume: postgres_data +[caddy_data volume] ─── TLS certificate persistence + +Host .env.prod ──► docker compose --env-file .env.prod +``` + +### Recommended Project Structure +``` +backend/ +├── cmd/ +│ ├── web/main.go # add goose.Up() before http.ListenAndServe +│ └── worker/main.go # unchanged (rivermigrate 
already wired) +├── deploy/ +│ └── Caddyfile # bind-mounted into caddy container at runtime +├── migrations/ # existing SQL files +├── static/ # generated at build time; embedded via go:embed +├── Dockerfile # new: multi-stage, produces /app/web + /app/worker +├── docker-compose.prod.yaml # new: postgres + web + worker + caddy +├── .env.example # update: add R2 vars, DOMAIN, remove TEST_DATABASE_URL note +└── README.md # update: add Deploy, Rollback, Incident sections +``` + +### Pattern 1: go:embed for Static Assets + +**What:** Replace `http.Dir(staticDir)` with an embedded `embed.FS`. The `staticDir string` parameter in `NewRouter` becomes an `fs.FS` parameter. + +**When to use:** Production, where the binary must have zero runtime file dependencies. + +**NewRouter signature change (recommended; keeps tests simple):** accept `fs.FS` instead of `string`: +```go +// Source: Go stdlib io/fs + embed docs +// In NewRouter, change staticDir string → staticFS fs.FS. +// fs.Sub strips the "static" prefix so request paths map directly to files. +staticSub, err := fs.Sub(staticFS, "static") +if err != nil { + panic("static embed sub failed: " + err.Error()) +} +fileHandler := http.FileServer(http.FS(staticSub)) +r.Get("/static/*", http.StripPrefix("/static/", fileHandler).ServeHTTP) +``` + +In `cmd/web/main.go` (the `embed.FS` comes from the package that owns the directive; see the constraint below): +```go +// assets.Static is declared next to the static/ directory, not in cmd/web. +router := web.NewRouter(pool, assets.Static, ...) +``` + +In tests: pass an `fs.FS` that also contains a top-level `static/` directory (for example `os.DirFS` rooted at `backend/`, or an in-memory `fstest.MapFS`), because `NewRouter` applies `fs.Sub(staticFS, "static")`. + +**Constraint:** `//go:embed` patterns are resolved relative to the directory of the source file that contains the directive, and they may not contain `.` or `..` path elements. A directive can therefore only embed files in its own package directory or below it. Because `static/` sits at `backend/static/`, a directive in `cmd/web/main.go` cannot reach it: `//go:embed ../../static` is rejected at build time. 
The cleanest approach is a small Go file at the module root, in `backend/` itself, next to `static/` and `migrations/`: + +```go +// backend/assets.go (must sit in the directory that contains static/ and migrations/) +package assets + +import "embed" + +//go:embed static +var Static embed.FS + +//go:embed migrations/*.sql +var Migrations embed.FS +``` + +Then `cmd/web/main.go` imports that module-root package (its import path is the module path declared in `go.mod`) and passes `assets.Static` to `NewRouter`. + +> NOTE: If a root-level package is undesirable, the alternative is to generate `static/` inside a subpackage directory (for example `backend/assets/static/`) so the directive can reach it. In both layouts the embedded directory must be at or below the directory of the file that holds the directive. Verify during implementation that the build resolves the pattern. + +### Pattern 2: goose.Up() at Web Startup + +**What:** Before binding the HTTP server, run goose migrations programmatically using the embedded SQL files and a `*sql.DB` derived from the existing pgxpool connection string. + +**Source:** [pressly/goose embed docs](https://pressly.github.io/goose/blog/2021/embed-sql-migrations/) [CITED] + +```go +// backend/internal/db/migrate.go (new file) +package db + +import ( + "context" + "database/sql" + "io/fs" + + "github.com/jackc/pgx/v5/pgxpool" + _ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx/v5" database/sql driver + "github.com/pressly/goose/v3" +) + +// RunMigrations opens a sql.DB from the pool's DSN and runs all pending +// goose migrations found under migrations/ in migrationsFS. The caller passes +// the embedded filesystem because go:embed patterns cannot contain "..". 
+func RunMigrations(ctx context.Context, pool *pgxpool.Pool, migrationsFS fs.FS) error { + dsn := pool.Config().ConnConfig.ConnString() + db, err := sql.Open("pgx/v5", dsn) + if err != nil { + return err + } + defer db.Close() + + goose.SetBaseFS(migrationsFS) + if err := goose.SetDialect("postgres"); err != nil { + return err + } + return goose.UpContext(ctx, db, "migrations") +} +``` + +Called in `cmd/web/main.go` after pool creation and before router/server setup: +```go +if err := db.RunMigrations(ctx, pool, assets.Migrations); err != nil { + slog.Error("migrations failed", "err", err) + os.Exit(1) +} +``` + +> IMPORTANT: The `//go:embed migrations/*.sql` directive must sit in a Go file in `backend/`, the directory that contains `migrations/`. A directive in `backend/internal/db/` cannot reach it, because embed patterns may not contain `..`. That is why `RunMigrations` takes the filesystem as a parameter instead of embedding it itself. + +**Idempotency:** `goose.Up()` is idempotent: already-applied versions are skipped via the `goose_db_version` table, so it is safe to call on every startup. + +### Pattern 3: Multi-Stage Dockerfile + +**What:** Single Dockerfile: the builder compiles both binaries with CGO_ENABLED=0, and the distroless runtime stage copies both. + +```dockerfile +# Source: GoogleContainerTools/distroless README + Go multi-stage build docs [CITED] + +# ── Stage 1: Generate assets ────────────────────────────────────────────────── +FROM node:20-alpine AS assets +WORKDIR /app +# Download Tailwind standalone CLI (pinned version from justfile) +# NOTE: Alpine is musl-based; if the glibc linux-x64 binary does not start, +# use the -musl release asset or a Debian-based stage instead. +# static/ must exist before curl writes into it; -f makes HTTP errors fail the build. 
+RUN apk add --no-cache curl && \ + mkdir -p static && \ + curl -fsSL -o /usr/local/bin/tailwindcss \ + "https://github.com/tailwindlabs/tailwindcss/releases/download/v4.0.0/tailwindcss-linux-x64" && \ + chmod +x /usr/local/bin/tailwindcss && \ + curl -fsSL -o static/htmx.min.js "https://unpkg.com/htmx.org@2/dist/htmx.min.js" && \ + curl -fsSL -o static/sortable.min.js "https://cdn.jsdelivr.net/npm/sortablejs@1.15.7/Sortable.min.js" +COPY tailwind.input.css . 
+COPY templates/ templates/ +RUN tailwindcss -i tailwind.input.css -o static/tailwind.css --minify + +# ── Stage 2: Build Go binaries ──────────────────────────────────────────────── +FROM golang:1.26-alpine AS builder +WORKDIR /app +COPY go.mod go.sum ./ +RUN go mod download +COPY . . +COPY --from=assets /app/static ./static + +# templ generate must run before go build (templates compile to .go files) +RUN go install github.com/a-h/templ/cmd/templ@v0.3.1020 && templ generate + +RUN CGO_ENABLED=0 GOOS=linux \ + go build -ldflags="-s -w" -trimpath -o /app/web ./cmd/web +RUN CGO_ENABLED=0 GOOS=linux \ + go build -ldflags="-s -w" -trimpath -o /app/worker ./cmd/worker + +# ── Stage 3: Runtime ────────────────────────────────────────────────────────── +FROM gcr.io/distroless/static-debian12:nonroot +COPY --from=builder /app/web /app/web +COPY --from=builder /app/worker /app/worker +EXPOSE 8080 +# No CMD or ENTRYPOINT — compose overrides with `command: /app/web` or `/app/worker` +``` + +**Planner note on Dockerfile stages:** The assets stage (Tailwind build + JS downloads) could be merged into the Go builder stage to reduce complexity, at the cost of a heavier builder image. Two dedicated stages is cleaner but either approach is valid. + +### Pattern 4: /healthz and /readyz Split + +**What:** Current `HealthzHandler` pings the DB and is registered at `/healthz`. D-12 requires `/healthz` to be a pure liveness check (no DB ping); D-13 requires `/readyz` to do the DB ping. + +**Existing code:** `HealthzHandler(pinger Pinger)` in `handlers.go` — it already uses the `Pinger` interface. Simply: +1. Rename `HealthzHandler` → `ReadyzHandler` (or keep the name and change behavior — see below) +2. Add a new `HealthzHandler` that returns 200 unconditionally +3. 
Register `/healthz` → new liveness handler, `/readyz` → DB-pinging handler + +```go +// Liveness — no dependencies +func HealthzHandler() http.HandlerFunc { + return func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(http.StatusOK) + _, _ = w.Write([]byte(`{"status":"ok"}`)) + } +} + +// Readiness — DB ping +func ReadyzHandler(pinger Pinger) http.HandlerFunc { + return func(w http.ResponseWriter, r *http.Request) { + ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second) + defer cancel() + w.Header().Set("Content-Type", "application/json") + if err := pinger.Ping(ctx); err != nil { + w.WriteHeader(http.StatusServiceUnavailable) + _, _ = w.Write([]byte(`{"status":"degraded","db":"down"}`)) + return + } + w.WriteHeader(http.StatusOK) + _, _ = w.Write([]byte(`{"status":"ok","db":"ok"}`)) + } +} +``` + +**Existing tests:** `TestHealthz_OK` and `TestHealthz_Down` in `handlers_test.go` test the current DB-pinging behavior. These must be updated to test the split: one test for the new liveness `HealthzHandler`, two tests for `ReadyzHandler`. 
+ +### Pattern 5: docker-compose.prod.yaml + +```yaml +# Source: D-02 through D-09; v2 compose syntax matching existing compose.yaml + +services: + postgres: + image: postgres:16-alpine + restart: unless-stopped + environment: + POSTGRES_DB: ${POSTGRES_DB:-xtablo} + POSTGRES_USER: ${POSTGRES_USER:-xtablo} + POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} + volumes: + - postgres_data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-xtablo}"] + interval: 10s + timeout: 5s + retries: 10 + # No ports: published; only reachable within the compose network + + web: + image: ${IMAGE:-ghcr.io/yourusername/xtablo}:${TAG:-latest} + command: /app/web + restart: unless-stopped + env_file: .env.prod + depends_on: + postgres: + condition: service_healthy + expose: + - "8080" + # No ports: published; Caddy handles external traffic + + worker: + image: ${IMAGE:-ghcr.io/yourusername/xtablo}:${TAG:-latest} + command: /app/worker + restart: unless-stopped + env_file: .env.prod + depends_on: + postgres: + condition: service_healthy + + caddy: + image: caddy:2-alpine + restart: unless-stopped + environment: + DOMAIN: ${DOMAIN} # consumed by {$DOMAIN} in the Caddyfile + ports: + - "80:80" + - "443:443" + - "443:443/udp" # HTTP/3 + volumes: + - ./deploy/Caddyfile:/etc/caddy/Caddyfile:ro + - caddy_data:/data + - caddy_config:/config + +volumes: + postgres_data: + caddy_data: + caddy_config: +``` + +### Pattern 6: Caddyfile + +```caddyfile +# Source: caddyserver.com/docs/caddyfile [CITED] +# Place at: backend/deploy/Caddyfile +# Caddy automatically provisions and renews TLS via Let's Encrypt. +# Domain is read from the caddy container's environment via {$DOMAIN}, +# so the caddy service must be given DOMAIN (see Pattern 5). + +{$DOMAIN} { + reverse_proxy web:8080 +} +``` + +Caddy redirects HTTP to HTTPS automatically when the site address is a domain name, so no explicit redirect directive is needed. [CITED: caddyserver.com/docs/automatic-https] + +### Anti-Patterns to Avoid + +- **Volume-mounting static/ at runtime:** D-09 prohibits this. Assets must be embedded. 
A volume mount for assets would break the self-contained binary requirement. +- **Separate goose CLI binary in the image:** D-10 prohibits this. Migrations run inside the web binary via `goose.Up()`. +- **CMD /app/web in Dockerfile:** D-08 says compose overrides the command; having a default CMD is fine as documentation but the planner should use `command:` in compose to make the intent explicit. Prefer no CMD in the Dockerfile so the compose `command:` is the single source of truth. +- **Exposing postgres port to host:** Postgres should only be reachable inside the compose network. Bind a host port only for break-glass debug access, not permanently. +- **Single large env_file commit:** The `.env.prod` on the host is gitignored. The repo only contains `.env.example` updated with new R2 vars. +- **CGO_ENABLED=1 with distroless/static:** distroless/static has no C libraries. CGO must be disabled. +- **Missing `caddy_data` volume:** Without a persistent volume for Caddy's `/data`, TLS certificates are re-issued on every container restart, which will hit Let's Encrypt rate limits. + +## Don't Hand-Roll + +| Problem | Don't Build | Use Instead | Why | +|---------|-------------|-------------|-----| +| TLS certificate lifecycle | Custom ACME client | Caddy automatic HTTPS | ACME, renewal, stapling, redirects all handled transparently | +| Database migration versioning | Custom version table | goose.Up() | Race conditions, rollback tracking, idempotency already solved | +| Static file embedding | Custom asset bundler | `//go:embed` + `http.FS` | Stdlib; zero dependencies; correct path resolution | +| Let's Encrypt rate limit management | Manual cert issuance | Caddy + persistent `caddy_data` volume | Caddy manages staging/prod issuance and renewal automatically | + +**Key insight:** Every custom solution in this domain (cert management, migration versioning, asset bundling) replicates work the standard tools already do correctly with far fewer failure modes. 
+ +## Common Pitfalls + +### Pitfall 1: embed patterns are relative to the Go file and cannot use `..` +**What goes wrong:** `//go:embed ../../static` fails to build. +**Why it happens:** `go:embed` patterns are resolved relative to the directory of the source file that contains the directive, and they may not contain `.` or `..` path elements. A directive can only reach files in its own package directory or below; staying inside the module is not sufficient. +**How to avoid:** Place the directive in a Go file in the directory that contains `static/`. Since `static/` is at `backend/static/`, that means a file in `backend/` itself (for example `backend/assets.go` with `//go:embed static`). An `assets` package at `backend/assets/` cannot use `//go:embed ../static`; it would only work if `static/` were generated at `backend/assets/static/`. Verify the layout during implementation. +**Warning signs:** Build error "pattern ../static: invalid pattern syntax". + +### Pitfall 2: goose needs *sql.DB, not *pgxpool.Pool +**What goes wrong:** Passing a `*pgxpool.Pool` directly to `goose.Up()` fails to compile, because goose's API requires a `*sql.DB` from `database/sql`. +**Why it happens:** goose predates the pgx native pool API and abstracts over `database/sql`. +**How to avoid:** Extract the connection string via `pool.Config().ConnConfig.ConnString()` and open a `*sql.DB` with `sql.Open("pgx/v5", connStr)` after importing `_ "github.com/jackc/pgx/v5/stdlib"`. Close the `*sql.DB` after migrations complete; the pool remains open for application use. +**Warning signs:** Compile error of the form "cannot use pool (variable of type *pgxpool.Pool) as *sql.DB value in argument to goose.Up". + +### Pitfall 3: goose_db_version table collision with test schema +**What goes wrong:** Integration tests that create isolated schemas via `goose.SetTableName` in dev continue to work, but in production the version table must remain the default `goose_db_version` in the `public` schema. 
+**Why it happens:** Tests use `goose.SetTableName("schema.goose_db_version")` to namespace the version table. Production must use the plain default. +**How to avoid:** The migration helper (Pattern 2 above) calls `goose.SetDialect` and `goose.Up` without `SetTableName`. The global goose state is shared — if tests run in the same process as migrations, order matters. Keep the migration helper stateless and only set the base FS and dialect. +**Warning signs:** Production migrations running twice or out of sync with what tests created. + +### Pitfall 4: Missing caddy_data volume = Let's Encrypt rate limit +**What goes wrong:** Restarting Caddy without a persistent `/data` volume triggers a new ACME certificate request on every restart. Let's Encrypt allows 50 certificates per registered domain per week; repeated restarts during setup exhaust the quota. +**Why it happens:** Caddy stores issued certificates in `/data`. Without a named volume, the directory is ephemeral. +**How to avoid:** Always mount a named volume at `/data` and `/config` for the caddy service (see Pattern 5). Test with Let's Encrypt staging (`tls { ca https://acme-staging-v02.api.letsencrypt.org/directory }`) before switching to production. +**Warning signs:** Caddy logs "too many certificates already issued for this domain" or certificate errors after restart. + +### Pitfall 5: web service starts before postgres is ready +**What goes wrong:** The web binary attempts `goose.Up()` immediately at startup; if postgres is still initializing, the DB connection fails and the process exits. +**Why it happens:** Docker Compose `depends_on: service_started` (the default) only waits for the container to start, not for postgres to accept connections. +**How to avoid:** Use `depends_on: postgres: condition: service_healthy` in `docker-compose.prod.yaml`. This requires the postgres service to have a `healthcheck:` directive (see Pattern 5 above). 
The existing `compose.yaml` already uses this pattern — mirror it exactly. +**Warning signs:** web container exits at startup with "db connect failed" or "migrations failed"; postgres logs show "database system is starting up". + +### Pitfall 6: templ-generated .go files not committed = Docker build fails +**What goes wrong:** `go build ./cmd/web` inside Docker fails because `*_templ.go` files are in `.gitignore` and `COPY . .` does not include them. +**Why it happens:** `templ generate` is a dev-time step; the generated files are gitignored per project convention (STATE.md). +**How to avoid:** Run `templ generate` inside the Dockerfile builder stage before `go build`. Install the templ CLI in the builder image at the pinned version from justfile (`v0.3.1020`). +**Warning signs:** Build error "undefined: templates.TablosPage" or similar undefined references to templ-generated component functions. + +### Pitfall 7: distroless has no shell — debugging requires :debug tag +**What goes wrong:** `docker exec -it web sh` fails because distroless/static has no shell. +**Why it happens:** distroless deliberately removes all OS tooling to minimize attack surface. +**How to avoid:** Use the `:debug` tag variant during initial setup: `gcr.io/distroless/static-debian12:debug`. Switch to `:nonroot` for production. Document in runbook how to use an ephemeral debug container (`docker run --rm -it --network container: busybox sh`) when debugging production. +**Warning signs:** `docker exec` returns "OCI runtime exec failed: exec: 'sh': executable file not found". + +### Pitfall 8: /healthz currently pings DB — tests must be updated +**What goes wrong:** After splitting `/healthz` (liveness) from `/readyz` (readiness), the existing `TestHealthz_OK` and `TestHealthz_Down` tests fail because they expect the DB-pinging behavior on `/healthz`. +**Why it happens:** The current `HealthzHandler` does both jobs. D-12/D-13 require them to be separate routes with separate handlers. 
+**How to avoid:** Update `handlers_test.go` in the same plan that refactors the handlers. Add `TestReadyz_OK` and `TestReadyz_Down` mirroring the current test structure; update `TestHealthz_OK` to verify 200 with no pinger dependency. +**Warning signs:** Test failures after the handler refactor: "status = 503; want 200" for the new liveness check. + +## Code Examples + +### go:embed with fs.Sub for static files +```go +// Source: Go stdlib docs, embed + io/fs [CITED: pkg.go.dev/embed] +// In a Go file in backend/ (the directory that contains static/), e.g. backend/assets.go: +package assets + +import "embed" + +//go:embed static +var Static embed.FS + +// In internal/web/router.go, NewRouter now accepts fs.FS: +import "io/fs" + +func NewRouter(pinger Pinger, staticFS fs.FS, ...) http.Handler { + // ... + sub, err := fs.Sub(staticFS, "static") + if err != nil { + panic("static embed sub failed: " + err.Error()) + } + r.Get("/static/*", http.StripPrefix("/static/", + http.FileServer(http.FS(sub))).ServeHTTP) +} +``` + +### goose programmatic migration with pgxpool bridge +```go +// Source: pressly/goose embed docs + pgx/v5/stdlib [CITED: pressly.github.io/goose] +import ( + "database/sql" + "io/fs" + + "github.com/jackc/pgx/v5/pgxpool" + _ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx/v5" driver + "github.com/pressly/goose/v3" +) + +// migrationsFS must contain a top-level migrations/ directory. +func RunMigrations(pool *pgxpool.Pool, migrationsFS fs.FS) error { + dsn := pool.Config().ConnConfig.ConnString() + db, err := sql.Open("pgx/v5", dsn) + if err != nil { + return err + } + defer db.Close() + goose.SetBaseFS(migrationsFS) + if err := goose.SetDialect("postgres"); err != nil { + return err + } + return goose.Up(db, "migrations") +} +``` + +## State of the Art + +| Old Approach | Current Approach | When Changed | Impact | +|--------------|------------------|--------------|--------| +| Separate goose CLI binary | `goose.SetBaseFS` + programmatic `goose.Up` with `embed.FS` | goose v3 (2021) | Binary is self-contained; no CLI tool needed in final image | +| `http.Dir(staticDir)` | `http.FS(embed.FS)` via `fs.Sub` | Go 1.16 (2021) | Binary has no 
runtime file dependencies | +| Separate migration service in compose | Migration in app startup | — | Fewer moving parts; migrations atomic with app start | + +**Deprecated/outdated:** +- `gcr.io/distroless/static` (without Debian variant suffix): The versioned tag `gcr.io/distroless/static-debian12` is preferred. The unversioned `static` tag still works but `static-debian12:nonroot` is more explicit about the security posture. + +## Assumptions Log + +| # | Claim | Section | Risk if Wrong | +|---|-------|---------|---------------| +| A1 | `golang:1.26-alpine` is the correct builder base (go.mod says `go 1.26.1`) | Standard Stack | If Alpine's musl causes subtle issues, switch to `golang:1.26` (debian); distroless is still compatible with CGO_ENABLED=0 | +| A2 | A Go file in `backend/` (the module root) can hold the `//go:embed static` directive without clashing with an existing root-level package | Pattern 1 | If a root-level package is not viable, generate `static/` inside the embedding package's directory (e.g. `backend/assets/static/` or inside `cmd/web/`), since embed patterns cannot use `..` | +| A3 | Tailwind standalone binary download in Docker builder is reliable during CI/CD | Pattern 3 (Dockerfile) | If the external download is flaky, bake the Tailwind binary into the builder image or add it to the repo as a committed artifact | +| A4 | `pool.Config().ConnConfig.ConnString()` reconstructs a DSN compatible with `sql.Open("pgx/v5", ...)` | Pattern 2 (goose bridge) | If ConnString() omits sslmode or other params, pass the original DATABASE_URL env var directly to sql.Open instead | + +## Open Questions + +1. **embed.FS location for migrations/** + - What we know: `//go:embed` patterns are relative to the Go source file's directory and cannot contain `..`, so `backend/internal/db/migrate.go` cannot embed `../../migrations/*.sql`. + - What's unclear: Which package should own the directive, given that `migrations/` is at `backend/migrations/` and `internal/db/` is 2 levels deep. + - Recommendation: Settle this in Wave 1. Put the directive in a Go file in `backend/` (the directory that contains `migrations/`) and pass the resulting `embed.FS` into `RunMigrations`. 
+ +2. **templ generate in Docker build** + - What we know: `*_templ.go` files are gitignored (STATE.md); the build fails without them. + - What's unclear: Whether `RUN go install github.com/a-h/templ/cmd/templ@v0.3.1020 && templ generate` in the builder stage is fast enough or needs caching. + - Recommendation: Use `--mount=type=cache,target=/root/.cache/go-build` on the `go build` steps. `templ generate` itself is fast (pure Go → Go codegen); the `go install` of the templ CLI does compile it, so the same cache mount helps there as well. + +3. **go:embed and files starting with `.` or `_`** + - What we know: By default, `//go:embed` excludes files/dirs starting with `.` or `_`. + - What's unclear: Whether `static/` contains any such files (e.g., `.gitkeep`). + - Recommendation: Check `ls -la backend/static/` during implementation. If such files exist, use `//go:embed all:static`. + +## Environment Availability + +| Dependency | Required By | Available | Version | Fallback | +|------------|------------|-----------|---------|----------| +| Docker / docker compose | Image build, compose stack | [ASSUMED: yes on Hetzner VM] | — | podman compose (used in dev) | +| Go 1.26 | Dockerfile builder stage | ✓ (pulled from registry) | golang:1.26-alpine | — | +| Caddy 2 | TLS proxy | ✓ (pulled from registry) | caddy:2-alpine | — | +| postgres:16-alpine | DB service | ✓ (pulled from registry) | 16-alpine | — | +| gcr.io/distroless/static-debian12 | Final image | ✓ (pulled from registry) | nonroot | alpine (has shell, larger) | + +**Missing dependencies with no fallback:** None identified. + +**Missing dependencies with fallback:** If the Hetzner VM has only `podman`, use `podman compose`; the `docker-compose.prod.yaml` syntax is the same and podman compose supports it. 
+ +## Validation Architecture + +### Test Framework +| Property | Value | +|----------|-------| +| Framework | Go test (stdlib) + httptest | +| Config file | none | +| Quick run command | `cd backend && go test ./internal/web/... -run TestHealthz -v` | +| Full suite command | `cd backend && go test ./...` | + +### Phase Requirements → Test Map +| Req ID | Behavior | Test Type | Automated Command | File Exists? | +|--------|----------|-----------|-------------------|-------------| +| DEPLOY-01 | Docker image builds with both binaries | smoke | `docker build -f backend/Dockerfile backend/ --target builder` | ❌ Wave 2 | +| DEPLOY-02 | web binary reads all config from env | unit | `go test ./cmd/web/... -run TestEnvConfig` | ❌ Wave 3 (optional; main.go logic is simple) | +| DEPLOY-03 | goose.Up() runs on startup without error | unit | `go test ./internal/db/... -run TestRunMigrations` | ❌ Wave 1 | +| DEPLOY-04 | /healthz returns 200 (no pinger) | unit | `go test ./internal/web/... -run TestHealthz` | ✅ (needs refactor) | +| DEPLOY-04 | /readyz returns 200 when DB ok | unit | `go test ./internal/web/... -run TestReadyz_OK` | ❌ Wave 0 | +| DEPLOY-04 | /readyz returns 503 when DB down | unit | `go test ./internal/web/... -run TestReadyz_Down` | ❌ Wave 0 | +| DEPLOY-05 | README runbook exists and covers all sections | manual | read backend/README.md | ❌ Wave 4 | + +### Sampling Rate +- **Per task commit:** `cd backend && go test ./internal/web/... -count=1` +- **Per wave merge:** `cd backend && go test ./... 
-count=1` +- **Phase gate:** Full suite green before `/gsd-verify-work` + +### Wave 0 Gaps +- [ ] `backend/internal/web/handlers.go` — refactor HealthzHandler (liveness) + add ReadyzHandler +- [ ] `backend/internal/web/handlers_test.go` — update TestHealthz_* + add TestReadyz_* +- [ ] `backend/internal/web/router.go` — add `/readyz` route; update `/healthz` to use new liveness handler + +## Security Domain + +### Applicable ASVS Categories + +| ASVS Category | Applies | Standard Control | +|---------------|---------|-----------------| +| V2 Authentication | no | (sessions already implemented in Phase 2) | +| V3 Session Management | no | (already implemented in Phase 2) | +| V4 Access Control | no | (health endpoints are public by design — no auth on /healthz or /readyz) | +| V5 Input Validation | no | (no new user input in this phase) | +| V6 Cryptography | no | (SESSION_SECRET already handled; TLS delegated to Caddy) | +| V14 Configuration | yes | Secrets in host .env file (D-05); file must be chmod 600; not committed to git | + +### Known Threat Patterns for deploy phase + +| Pattern | STRIDE | Standard Mitigation | +|---------|--------|---------------------| +| Secrets in Docker image layers | Information Disclosure | Never COPY .env into image; use `env_file:` at runtime in compose | +| .env.prod committed to git | Information Disclosure | .env.prod in .gitignore; .env.example has no real values | +| Health endpoint information leakage | Information Disclosure | /healthz and /readyz return minimal JSON; no version strings, no stack traces | +| Postgres exposed to internet | Elevation of Privilege | No `ports:` directive on postgres service; only accessible within compose network | +| Caddy data volume not backed up | Denial of Service | Document in runbook that caddy_data volume loss requires waiting for cert re-issuance (or restoring from backup) | + +## Sources + +### Primary (HIGH confidence) +- `backend/go.mod` — verified goose v3.27.1, pgx/v5 v5.9.2, Go 
1.26.1 +- `backend/cmd/web/main.go` — verified env var reading, pgxpool, static file serving pattern +- `backend/cmd/worker/main.go` — verified rivermigrate startup pattern (model for goose.Up) +- `backend/internal/web/router.go` + `handlers.go` — verified current healthz handler +- `backend/compose.yaml` — verified postgres healthcheck pattern to mirror in prod compose +- [pressly/goose embed docs](https://pressly.github.io/goose/blog/2021/embed-sql-migrations/) — programmatic Up with embed.FS +- [pkg.go.dev/embed](https://pkg.go.dev/embed) — embed.FS stdlib documentation + +### Secondary (MEDIUM confidence) +- [GoogleContainerTools/distroless GitHub](https://github.com/GoogleContainerTools/distroless) — distroless/static-debian12 verified, CGO_ENABLED=0 requirement +- [caddyserver.com/docs/caddyfile/patterns](https://caddyserver.com/docs/caddyfile/patterns) — reverse proxy Caddyfile pattern + +### Tertiary (LOW confidence) +- WebSearch results on Caddy + Docker Compose — consistent with official docs; using official Caddyfile reference as primary + +## Metadata + +**Confidence breakdown:** +- Standard stack: HIGH — all libraries already in go.mod; only embed.FS and fs.Sub are new patterns +- Architecture: HIGH — compose + Caddy + distroless is well-established; codebase already has all the pieces +- Pitfalls: HIGH — derived from codebase inspection (embed path constraints, goose/pgx bridge, templ codegen, existing healthz tests) + +**Research date:** 2026-05-15 +**Valid until:** 2026-08-15 (stable ecosystem — go:embed, goose v3, Caddy 2 are stable)