docs(07): research phase domain

Arthur Belleville 2026-05-15 17:46:37 +02:00
parent e14fd36fdc
commit 588c03dae2


@ -0,0 +1,660 @@
# Phase 7: Deploy v1 - Research
**Researched:** 2026-05-15
**Domain:** Go Docker multi-stage build, docker compose, Caddy reverse proxy, goose programmatic migrations, go:embed static assets, health checks
**Confidence:** HIGH
## Summary
Phase 7 packages the existing Go backend into a production-ready Docker image, deploys it to a Hetzner VM via plain Docker Compose, and wires Caddy as a TLS-terminating reverse proxy. The phase has five distinct work areas: (1) convert static asset serving from on-disk paths to `go:embed`, (2) add programmatic `goose.Up()` migration call in `cmd/web` startup, (3) build a multi-stage Dockerfile producing `/app/web` and `/app/worker` in a single image, (4) split the existing `/healthz` handler into a liveness route (no DB ping) and a new `/readyz` route (DB ping), and (5) write `docker-compose.prod.yaml`, `deploy/Caddyfile`, and the `backend/README.md` runbook.
The codebase is well-prepared: `cmd/worker/main.go` already demonstrates the exact programmatic migration pattern (rivermigrate); `cmd/web/main.go` already reads all config from env vars; `signal.NotifyContext` graceful shutdown is in both binaries. The static files are currently served from `./static` on disk via `http.Dir`; they must be switched to `http.FS(embed.FS)` so the final container has zero runtime file dependencies. The existing `HealthzHandler` does a DB ping — that behavior must move to `/readyz`; `/healthz` becomes a pure liveness check.
**Primary recommendation:** Build in this wave order — Wave 0 (go:embed + `/readyz` split), Wave 1 (goose.Up startup migration), Wave 2 (Dockerfile), Wave 3 (compose + Caddy + env docs), Wave 4 (README runbook). Each wave is independently testable.
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
- **D-01:** Production host is a Hetzner VM running Docker Compose. No PaaS, no Kubernetes.
- **D-02:** The full stack runs via plain `docker compose` — no Dokploy or Swarm mode in v1.
- **D-03:** Postgres runs on the VM inside the compose stack, volume-backed. No managed Postgres service for v1.
- **D-04:** Caddy is a service in `docker-compose.prod.yaml`. It proxies to `web:8080` and handles TLS via Let's Encrypt. Config via a bind-mounted Caddyfile.
- **D-05:** Production secrets (`SESSION_SECRET`, `DATABASE_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINT_URL`, `AWS_BUCKET`, `PORT`, `ENV`) are stored in a `.env` file on the Hetzner host (gitignored). `docker compose --env-file .env.prod up` reads it. No SOPS, no Docker secrets API.
- **D-06:** S3-compatible storage in production is Cloudflare R2. R2 credentials live in the host `.env` file. MinIO remains in `compose.yaml` for local dev only.
- **D-07:** A single multi-stage Dockerfile produces one image containing two binaries: `/app/web` (from `cmd/web`) and `/app/worker` (from `cmd/worker`). Both compiled in the builder stage, copied to the final runtime stage.
- **D-08:** `docker-compose.prod.yaml` runs the same image twice: one service with `command: /app/web`, one with `command: /app/worker`. No subcommand dispatcher needed in Go code.
- **D-09:** All static assets (Tailwind-compiled CSS, HTMX JS, Sortable.js, templ-generated HTML) are embedded via `//go:embed` at build time. No volume mounts for assets.
- **D-10:** Migrations run programmatically inside the `web` binary at startup: `web` calls `goose.Up()` via the goose library before binding the HTTP server.
- **D-11:** Rollback strategy: redeploy the previous image tag. Normal rollback = update compose image tag + `docker compose up -d`. `goose down` is documented as a break-glass step only.
- **D-12:** `/healthz` — liveness: returns 200 OK immediately if the server is up (no DB ping). Used by Caddy / uptime monitor.
- **D-13:** `/readyz` — readiness: returns 200 OK only if the DB pool is reachable (one `db.Ping()` call). Returns 503 during startup until migrations complete and the pool is healthy. Worker does not expose HTTP.
### Claude's Discretion
- Exact Dockerfile base image for the builder stage (e.g., `golang:1.26-alpine` vs `golang:1.26`).
- Final runtime base: `distroless/static` vs `alpine`.
- Caddyfile content (reverse proxy config, TLS directive, HTTPS redirect).
- Whether `docker-compose.prod.yaml` includes a `healthcheck:` directive for the Postgres service.
- Exact docker compose version / syntax used (`compose.yaml` already uses v2 syntax).
- Whether the `web` service in prod compose `depends_on` the `postgres` service with a health condition.
### Deferred Ideas (OUT OF SCOPE)
- Dokploy layer
- CI/CD pipeline
- pg_dump backup cron
- MinIO for prod
</user_constraints>
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| DEPLOY-01 | Both binaries build into a single multi-stage Docker image | Multi-stage Dockerfile pattern: builder copies both `cmd/web` and `cmd/worker`, runtime stage copies both binaries |
| DEPLOY-02 | Image runs on a single VPS with env-injected config (no Supabase, no GCP) | `docker-compose.prod.yaml` with `env_file` directive; all env vars already read from `os.Getenv` in both binaries |
| DEPLOY-03 | Migrations run on deploy without manual intervention | `goose.Up()` with `embed.FS` inside `cmd/web` startup, mirrors rivermigrate pattern in `cmd/worker` |
| DEPLOY-04 | Health checks (`/healthz`, `/readyz`) and structured logs | `/healthz` already exists (needs liveness-only refactor); `/readyz` is new; JSON slog already live on `ENV=production` |
| DEPLOY-05 | Documented runbook in `backend/README.md` covering local dev, deploy, rollback | Extends existing `backend/README.md`; adds deploy, rollback, incident sections |
</phase_requirements>
## Architectural Responsibility Map
| Capability | Primary Tier | Secondary Tier | Rationale |
|------------|-------------|----------------|-----------|
| TLS termination | Caddy (compose service) | — | Caddy owns ACME/Let's Encrypt; web binary speaks plain HTTP on internal network |
| Static asset serving | Web binary (go:embed) | — | D-09: embedded at build time; no runtime file mounts |
| Database migrations | Web binary startup | — | D-10: goose.Up() before HTTP server binds |
| Liveness check (/healthz) | Web binary | — | D-12: no DB dependency; fast 200 as long as server process is up |
| Readiness check (/readyz) | Web binary | — | D-13: DB ping; Caddy or uptime monitor checks this before routing traffic |
| Background jobs | Worker binary | — | Separate container in compose, same image, command: /app/worker |
| Secret injection | Docker Compose env_file | Host .env.prod file | D-05: no secrets API needed |
| Postgres persistence | Postgres compose service | Volume | D-03: volume-backed, not managed |
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `github.com/pressly/goose/v3` | v3.27.1 | Programmatic migrations with embed.FS | Already in go.mod; `SetBaseFS` + `Up` is the idiomatic pattern for embedded migrations [VERIFIED: go.mod] |
| `embed` (stdlib) | Go 1.16+ | Embed static/ and migrations/ into binary | No dependency, available since Go 1.16; project uses Go 1.26 [VERIFIED: go.mod] |
| `io/fs` (stdlib) | Go 1.16+ | `fs.Sub` to strip directory prefix for http.FileServer | Required companion to embed.FS for serving static files [VERIFIED: Go stdlib] |
| `gcr.io/distroless/static-debian12` | nonroot tag | Final container runtime base | Smallest option (~2MiB); no shell; correct for CGO_ENABLED=0 Go binaries [VERIFIED: GoogleContainerTools/distroless GitHub] |
| `golang:1.26-alpine` | current | Builder stage base | Matches go.mod version; alpine keeps layer small and avoids glibc issues for distroless final [ASSUMED] |
| `caddy:2-alpine` | current | Reverse proxy + automatic TLS | Official image; Let's Encrypt auto-cert; simple Caddyfile syntax [CITED: caddyserver.com/docs] |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| `github.com/jackc/pgx/v5/stdlib` | v5.9.2 | Bridge pgxpool → database/sql for goose | goose.Up requires *sql.DB; pgx/v5/stdlib wraps pgxpool conn string into sql.Open |
**Installation: no new top-level dependencies needed.** `pgx/v5/stdlib` is a package inside the `github.com/jackc/pgx/v5` module that go.mod already requires, so no `go get` is needed. It only has to be added as a blank import in the migration helper.
**Version verification:**
```
github.com/pressly/goose/v3 v3.27.1 [VERIFIED: go.mod]
github.com/jackc/pgx/v5 v5.9.2 [VERIFIED: go.mod]
```
## Architecture Patterns
### System Architecture Diagram
```
Internet
▼ :80/:443
[Caddy] ─── ACME/Let's Encrypt cert management
│ :8080 (internal Docker network)
[web container] ─── cmd/web binary (go:embed: static/ + migrations/)
│ startup: goose.Up() ──► [postgres:5432]
│ /healthz → 200 always (liveness)
│ /readyz → 200 if DB ping ok (readiness)
│ /static/* → http.FS(embed.FS)
│ /tablos/* → HTMX handlers → [postgres:5432]
[worker container] ─── cmd/worker binary (same image, command: /app/worker)
│ startup: rivermigrate.Up() ──► [postgres:5432]
│ river periodic jobs ──► [postgres:5432]
│ orphan cleanup ──────► [Cloudflare R2]
[postgres container] ─── postgres:16-alpine, volume: postgres_data
[caddy_data volume] ─── TLS certificate persistence
Host .env.prod ──► docker compose --env-file .env.prod
```
### Recommended Project Structure
```
backend/
├── cmd/
│ ├── web/main.go # add goose.Up() before http.ListenAndServe
│ └── worker/main.go # unchanged (rivermigrate already wired)
├── deploy/
│ └── Caddyfile # bind-mounted into caddy container at runtime
├── migrations/ # existing SQL files
├── static/ # generated at build time; embedded via go:embed
├── Dockerfile # new: multi-stage, produces /app/web + /app/worker
├── docker-compose.prod.yaml # new: postgres + web + worker + caddy
├── .env.example # update: add R2 vars, DOMAIN, remove TEST_DATABASE_URL note
└── README.md # update: add Deploy, Rollback, Incident sections
```
### Pattern 1: go:embed for Static Assets
**What:** Replace `http.Dir(staticDir)` with an embedded `embed.FS`. The `staticDir string` parameter in `NewRouter` becomes an `fs.FS` parameter.
**When to use:** Production — binary has zero runtime file dependencies.
**NewRouter signature change (recommended, keeps tests free of embedding):** accept `fs.FS` instead of `string`:
```go
// Source: Go stdlib io/fs + embed docs
// Declared in a Go file whose directory contains static/ (see Constraint below).
//go:embed static
var StaticFiles embed.FS

// In NewRouter, change staticDir string → staticFS fs.FS
staticSub, err := fs.Sub(staticFS, "static")
if err != nil {
	panic("static sub-filesystem: " + err.Error())
}
fileHandler := http.FileServer(http.FS(staticSub))
r.Get("/static/*", http.StripPrefix("/static/", fileHandler).ServeHTTP)
```
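In `cmd/web/main.go`, import the package that declares the embedded filesystem and pass it to the router. The directive cannot be written in `cmd/web/main.go` itself, because `static/` is not inside `cmd/web/`:
```go
import "backend/assets" // package holding the //go:embed directive
// ...
router := web.NewRouter(pool, assets.Static, ...)
```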
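In tests: pass `os.DirFS(".")` from the directory that contains `static/`, or an in-memory `fstest.MapFS` with a `static/` prefix, to avoid embedding during unit test runs. `os.DirFS("./static")` is already rooted inside `static/`, so a router that calls `fs.Sub(staticFS, "static")` would find no files.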
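A minimal sketch of how an `fs.FS`-based static route could be exercised without embedding anything, using an in-memory `fstest.MapFS` in place of the real `embed.FS`. The names `staticHandler`, `fetch`, and `fakeStatic` are hypothetical, and the sketch uses plain `net/http` rather than the project's router.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// staticHandler serves the "static" subdirectory of root under /static/.
// Hypothetical helper; the real code wires this inside NewRouter.
func staticHandler(root fs.FS) (http.Handler, error) {
	sub, err := fs.Sub(root, "static")
	if err != nil {
		return nil, err
	}
	return http.StripPrefix("/static/", http.FileServer(http.FS(sub))), nil
}

// fetch issues one GET against h and returns the status code and body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

// fakeStatic mimics the layout of an embed.FS rooted one level above static/.
func fakeStatic() fs.FS {
	return fstest.MapFS{
		"static/tailwind.css": &fstest.MapFile{Data: []byte("body{}")},
	}
}

func main() {
	h, err := staticHandler(fakeStatic())
	if err != nil {
		panic(err)
	}
	code, body := fetch(h, "/static/tailwind.css")
	fmt.Println(code, body)
}
```

Because the handler only depends on `fs.FS`, the same code path is covered whether the files come from `embed.FS`, `os.DirFS`, or a map in memory.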
**Constraint:** `//go:embed` patterns are resolved relative to the directory of the source file that contains the directive, and they may not contain `.` or `..` path elements. A directive can therefore only embed files in its own package directory or below it. `static/` currently sits at `backend/static/`, so neither `cmd/web/main.go` (`../../static`) nor a sibling `backend/assets/` package (`../static`) can reach it. Two layouts work:
1. Put the embed file at the module root, next to `static/` (for example `backend/embed.go`).
2. Keep an `assets` package and generate the assets into it, so the directory becomes `backend/assets/static/`:
```go
// backend/assets/assets.go (requires static/ at backend/assets/static/)
package assets

import "embed"

//go:embed static
var Static embed.FS
```
Then `cmd/web/main.go` imports `backend/assets` and passes `assets.Static` to `NewRouter`.
> NOTE: Whichever layout is chosen, the Tailwind output path and the JS download targets in the justfile and Dockerfile must point at the directory that is actually embedded. Confirm with `go build ./...` during implementation.
### Pattern 2: goose.Up() at Web Startup
**What:** Before binding the HTTP server, call goose programmatic migrations using the embedded SQL files and a `*sql.DB` derived from the existing pgxpool connection string.
**Source:** [pressly/goose embed docs](https://pressly.github.io/goose/blog/2021/embed-sql-migrations/) [CITED]
```go
// backend/internal/db/migrate.go (new file)
package db

import (
	"context"
	"database/sql"
	"io/fs"

	"github.com/jackc/pgx/v5/pgxpool"
	_ "github.com/jackc/pgx/v5/stdlib" // register "pgx/v5" driver
	"github.com/pressly/goose/v3"
)

// RunMigrations opens a sql.DB from the pool's DSN and runs all pending
// goose migrations found under "migrations" in migrationsFS.
// The filesystem is passed in because go:embed patterns cannot contain "..",
// so the directive must sit in a Go file next to migrations/, not here.
func RunMigrations(ctx context.Context, pool *pgxpool.Pool, migrationsFS fs.FS) error {
	dsn := pool.Config().ConnConfig.ConnString()
	db, err := sql.Open("pgx/v5", dsn)
	if err != nil {
		return err
	}
	defer db.Close()

	goose.SetBaseFS(migrationsFS)
	if err := goose.SetDialect("postgres"); err != nil {
		return err
	}
	return goose.UpContext(ctx, db, "migrations")
}
```
Called in `cmd/web/main.go` after pool creation and before router/server setup:
```go
// migrationsFS: an embed.FS declared with `//go:embed migrations/*.sql`
// in a Go file located in backend/ (the directory containing migrations/).
if err := db.RunMigrations(ctx, pool, migrationsFS); err != nil {
	slog.Error("migrations failed", "err", err)
	os.Exit(1)
}
```
> IMPORTANT: The embed directive cannot live in `backend/internal/db/` or in `cmd/web/main.go`: a pattern such as `../../migrations/*.sql` is rejected because embed patterns may not contain `..`. Declare it in a Go file in `backend/` (pattern `migrations/*.sql`) or in a small package inside `backend/migrations/` (pattern `*.sql`, then call goose with directory `.`), and pass the resulting filesystem into `RunMigrations`.
**Idempotency:** `goose.Up()` is idempotent — already-applied versions are skipped via the `goose_db_version` table. Safe to call on every startup.
### Pattern 3: Multi-Stage Dockerfile
**What:** Single Dockerfile, builder compiles both binaries with CGO_ENABLED=0, distroless runtime copies both.
```dockerfile
# Source: GoogleContainerTools/distroless README + Go multi-stage build docs [CITED]
# ── Stage 1: Generate assets ──────────────────────────────────────────────────
FROM node:20-alpine AS assets
WORKDIR /app
# Download Tailwind standalone CLI (pinned version from justfile)
# static/ must exist before curl writes into it; -f makes HTTP errors fail the build
RUN apk add --no-cache curl && \
    mkdir -p static && \
    curl -fsSL -o /usr/local/bin/tailwindcss \
    "https://github.com/tailwindlabs/tailwindcss/releases/download/v4.0.0/tailwindcss-linux-x64" && \
    chmod +x /usr/local/bin/tailwindcss && \
    curl -fsSL -o static/htmx.min.js "https://unpkg.com/htmx.org@2/dist/htmx.min.js" && \
    curl -fsSL -o static/sortable.min.js "https://cdn.jsdelivr.net/npm/sortablejs@1.15.7/Sortable.min.js"
COPY tailwind.input.css .
COPY templates/ templates/
RUN tailwindcss -i tailwind.input.css -o static/tailwind.css --minify
# ── Stage 2: Build Go binaries ────────────────────────────────────────────────
FROM golang:1.26-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
COPY --from=assets /app/static ./static
# templ generate must run before go build (templates compile to .go files)
RUN go install github.com/a-h/templ/cmd/templ@v0.3.1020 && templ generate
RUN CGO_ENABLED=0 GOOS=linux \
go build -ldflags="-s -w" -trimpath -o /app/web ./cmd/web
RUN CGO_ENABLED=0 GOOS=linux \
go build -ldflags="-s -w" -trimpath -o /app/worker ./cmd/worker
# ── Stage 3: Runtime ──────────────────────────────────────────────────────────
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/web /app/web
COPY --from=builder /app/worker /app/worker
EXPOSE 8080
# No CMD or ENTRYPOINT — compose overrides with `command: /app/web` or `/app/worker`
```
**Planner note on Dockerfile stages:** The assets stage (Tailwind build + JS downloads) could be merged into the Go builder stage to reduce complexity, at the cost of a heavier builder image. Two dedicated stages is cleaner but either approach is valid.
### Pattern 4: /healthz and /readyz Split
**What:** Current `HealthzHandler` pings the DB and is registered at `/healthz`. D-12 requires `/healthz` to be a pure liveness check (no DB ping); D-13 requires `/readyz` to do the DB ping.
**Existing code:** `HealthzHandler(pinger Pinger)` in `handlers.go` — it already uses the `Pinger` interface. Simply:
1. Rename `HealthzHandler` → `ReadyzHandler` (or keep the name and change behavior; see below)
2. Add a new `HealthzHandler` that returns 200 unconditionally
3. Register `/healthz` → new liveness handler, `/readyz` → DB-pinging handler
```go
// Liveness: no dependencies
func HealthzHandler() http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte(`{"status":"ok"}`))
	}
}

// Readiness: DB ping with a 2-second timeout
func ReadyzHandler(pinger Pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		w.Header().Set("Content-Type", "application/json")
		if err := pinger.Ping(ctx); err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			_, _ = w.Write([]byte(`{"status":"degraded","db":"down"}`))
			return
		}
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte(`{"status":"ok","db":"ok"}`))
	}
}
```
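D-13 also asks `/readyz` to return 503 until migrations complete. With migrations running before the HTTP server binds, that state is never observable from outside; the port is simply closed. If the server were started first, a readiness gate could look roughly like the sketch below. `gatedReadyz`, `newGate`, `fakePinger`, and `statusOf` are hypothetical names, and the `Pinger` interface is assumed to match the one the existing handlers use.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// Pinger is assumed to mirror the interface used by the existing handlers.
type Pinger interface {
	Ping(ctx context.Context) error
}

// fakePinger is a test double that returns err from every Ping.
type fakePinger struct{ err error }

func (f fakePinger) Ping(context.Context) error { return f.err }

var errDown = errors.New("db down")

// newGate returns a flag that startup code sets to true after migrations.
func newGate() *atomic.Bool { return new(atomic.Bool) }

// gatedReadyz answers 503 until ready is true, then 200 only while the
// database responds to a ping within two seconds.
func gatedReadyz(ready *atomic.Bool, p Pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "starting", http.StatusServiceUnavailable)
			return
		}
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if err := p.Ping(ctx); err != nil {
			http.Error(w, "db down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// statusOf issues one GET /readyz against h and returns the status code.
func statusOf(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	ready := newGate()
	h := gatedReadyz(ready, fakePinger{})
	fmt.Println("before migrations:", statusOf(h))
	ready.Store(true)
	fmt.Println("after migrations:", statusOf(h))
	fmt.Println("db down:", statusOf(gatedReadyz(ready, fakePinger{err: errDown})))
}
```

The same fake-pinger approach works for testing the plain `ReadyzHandler`; whether the gate is worth the extra state depends on whether the server should accept connections while migrations run.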
**Existing tests:** `TestHealthz_OK` and `TestHealthz_Down` in `handlers_test.go` test the current DB-pinging behavior. These must be updated to test the split: one test for the new liveness `HealthzHandler`, two tests for `ReadyzHandler`.
### Pattern 5: docker-compose.prod.yaml
```yaml
# Source: D-02 through D-09; v2 compose syntax matching existing compose.yaml
services:
  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: ${POSTGRES_DB:-xtablo}
      POSTGRES_USER: ${POSTGRES_USER:-xtablo}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-xtablo}"]
      interval: 10s
      timeout: 5s
      retries: 10
    # No ports: published; only reachable within the compose network

  web:
    image: ${IMAGE:-ghcr.io/yourusername/xtablo}:${TAG:-latest}
    command: /app/web
    restart: unless-stopped
    env_file: .env.prod
    depends_on:
      postgres:
        condition: service_healthy
    expose:
      - "8080"
    # No ports: Caddy handles external traffic

  worker:
    image: ${IMAGE:-ghcr.io/yourusername/xtablo}:${TAG:-latest}
    command: /app/worker
    restart: unless-stopped
    env_file: .env.prod
    depends_on:
      postgres:
        condition: service_healthy

  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    environment:
      DOMAIN: ${DOMAIN} # consumed by {$DOMAIN} in the Caddyfile
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp" # HTTP/3
    volumes:
      - ./deploy/Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config

volumes:
  postgres_data:
  caddy_data:
  caddy_config:
```
### Pattern 6: Caddyfile
```caddyfile
# Source: caddyserver.com/docs/caddyfile [CITED]
# Place at: backend/deploy/Caddyfile
# Caddy automatically provisions and renews TLS via Let's Encrypt.
# Domain is read from env via {$DOMAIN} interpolation.
{$DOMAIN} {
reverse_proxy web:8080
}
```
For HTTPS redirect (HTTP → HTTPS), Caddy handles this automatically when a domain name is specified — no explicit redirect directive is needed. [CITED: caddyserver.com/docs/automatic-https]
### Anti-Patterns to Avoid
- **Volume-mounting static/ at runtime:** D-09 prohibits this. Assets must be embedded. A volume mount for assets would break the self-contained binary requirement.
- **Separate goose CLI binary in the image:** D-10 prohibits this. Migrations run inside the web binary via `goose.Up()`.
- **CMD /app/web in Dockerfile:** D-08 makes the compose `command:` the way to select a binary. A default CMD would work, but it would hide which binary a service runs. Prefer no CMD in the Dockerfile so the compose `command:` is the single source of truth.
- **Exposing postgres port to host:** Postgres should only be reachable inside the compose network. Bind a host port only for break-glass debug access, not permanently.
- **Single large env_file commit:** The `.env.prod` on the host is gitignored. The repo only contains `.env.example` updated with new R2 vars.
- **CGO_ENABLED=1 with distroless/static:** distroless/static has no C libraries. CGO must be disabled.
- **Missing `caddy_data` volume:** Without a persistent volume for Caddy's `/data`, TLS certificates are re-issued on every container restart, which will hit Let's Encrypt rate limits.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| TLS certificate lifecycle | Custom ACME client | Caddy automatic HTTPS | ACME, renewal, stapling, redirects all handled transparently |
| Database migration versioning | Custom version table | goose.Up() | Race conditions, rollback tracking, idempotency already solved |
| Static file embedding | Custom asset bundler | `//go:embed` + `http.FS` | Stdlib; zero dependencies; correct path resolution |
| Let's Encrypt rate limit management | Manual cert issuance | Caddy + persistent `caddy_data` volume | Caddy manages staging/prod issuance and renewal automatically |
**Key insight:** Every custom solution in this domain (cert management, migration versioning, asset bundling) replicates work the standard tools already do correctly with far fewer failure modes.
## Common Pitfalls
### Pitfall 1: embed.FS path relative to Go file, not working directory
**What goes wrong:** `//go:embed ../../static` fails to compile. Embed patterns may not contain `.` or `..` path elements, so a directive can never reach a parent or sibling directory.
**Why it happens:** `go:embed` patterns are resolved relative to the directory of the source file containing the directive and can only match files in that directory or below it.
**How to avoid:** Place the embed directive in a Go file whose directory contains `static/`. With `static/` at `backend/static/`, that means a file directly in `backend/`; a sibling package such as `backend/assets/` cannot use `//go:embed ../static`, even though the path stays inside the module. Alternatively, generate the assets into the embedding package (for example `backend/assets/static/`). Verify the path during implementation.
**Warning signs:** Build error "pattern ../static: invalid pattern syntax".
### Pitfall 2: goose needs *sql.DB, not *pgxpool.Pool
**What goes wrong:** Attempting to pass pgxpool.Pool directly to goose.Up() fails to compile — goose's API requires `*database/sql.DB`.
**Why it happens:** goose predates the pgx native pool API and abstracts over `database/sql`.
**How to avoid:** Extract the connection string via `pool.Config().ConnConfig.ConnString()` and open a `*sql.DB` with `sql.Open("pgx/v5", connStr)` after importing `_ "github.com/jackc/pgx/v5/stdlib"`. Close the sql.DB after migrations complete — the pool remains open for application use.
**Warning signs:** Compile error "cannot use pool (type *pgxpool.Pool) as type *sql.DB".
### Pitfall 3: goose_db_version table collision with test schema
**What goes wrong:** Integration tests that create isolated schemas via `goose.SetTableName` in dev continue to work, but in production the goose_db_version table name must remain the default `goose_db_version` in the `public` schema.
**Why it happens:** Tests use `goose.SetTableName("schema.goose_db_version")` to namespace the version table. Production must use the plain default.
**How to avoid:** The migration helper (Pattern 2 above) calls `goose.SetDialect` and `goose.Up` without `SetTableName`. The global goose state is shared — if tests run in the same process as migrations, order matters. Keep the migration helper stateless and only set the base FS and dialect.
**Warning signs:** Production migrations running twice or out of sync with what tests created.
### Pitfall 4: Missing caddy_data volume = Let's Encrypt rate limit
**What goes wrong:** Restarting Caddy without a persistent `/data` volume triggers a new ACME certificate request on every restart. Let's Encrypt allows 50 certificates per registered domain per week, and only 5 duplicate certificates (same set of hostnames) per week; repeated restarts during setup exhaust the duplicate quota quickly.
**Why it happens:** Caddy stores issued certificates in `/data`. Without a named volume, the directory is ephemeral.
**How to avoid:** Always mount a named volume at `/data` and `/config` for the caddy service (see Pattern 5). Test with Let's Encrypt staging (`tls { ca https://acme-staging-v02.api.letsencrypt.org/directory }`) before switching to production.
**Warning signs:** Caddy logs "too many certificates already issued for this domain" or certificate errors after restart.
### Pitfall 5: web service starts before postgres is ready
**What goes wrong:** The web binary attempts `goose.Up()` immediately at startup; if postgres is still initializing, the DB connection fails and the process exits.
**Why it happens:** Docker Compose `depends_on: service_started` (the default) only waits for the container to start, not for postgres to accept connections.
**How to avoid:** Use `depends_on: postgres: condition: service_healthy` in `docker-compose.prod.yaml`. This requires the postgres service to have a `healthcheck:` directive (see Pattern 5 above). The existing `compose.yaml` already uses this pattern — mirror it exactly.
**Warning signs:** web container exits at startup with "db connect failed" or "migrations failed"; postgres logs show "database system is starting up".
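As a complement to the compose health condition, the web binary could also tolerate a briefly unavailable database by retrying the first ping before running migrations. The sketch below is one possible shape, not the project's code; `waitFor`, `failNTimes`, and `tryWait` are hypothetical helpers, and the attempt count and delay are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitFor calls ping until it succeeds, the attempt budget is used up,
// or ctx is cancelled. It waits delay between attempts, not before the first.
func waitFor(ctx context.Context, ping func(context.Context) error, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(delay):
			}
		}
		if err = ping(ctx); err == nil {
			return nil
		}
	}
	return fmt.Errorf("database not reachable after %d attempts: %w", attempts, err)
}

// failNTimes returns a ping function that fails n times, then succeeds.
func failNTimes(n int) func(context.Context) error {
	calls := 0
	return func(context.Context) error {
		calls++
		if calls <= n {
			return errors.New("connection refused")
		}
		return nil
	}
}

// tryWait runs waitFor against a simulated database with a short delay.
func tryWait(failures, attempts int) error {
	return waitFor(context.Background(), failNTimes(failures), attempts, time.Millisecond)
}

func main() {
	fmt.Println("recovers:", tryWait(2, 5))
	fmt.Println("gives up:", tryWait(9, 3))
}
```

In the real binary the `ping` argument would be the pool's ping method. With `restart: unless-stopped` the container is restarted anyway, so this is an optional refinement rather than a requirement.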
### Pitfall 6: templ-generated .go files not committed = Docker build fails
**What goes wrong:** `go build ./cmd/web` inside Docker fails because `*_templ.go` files are in `.gitignore` and `COPY . .` does not include them.
**Why it happens:** `templ generate` is a dev-time step; the generated files are gitignored per project convention (STATE.md).
**How to avoid:** Run `templ generate` inside the Dockerfile builder stage before `go build`. Install the templ CLI in the builder image at the pinned version from justfile (`v0.3.1020`).
**Warning signs:** Build error "undefined: templates.TablosPage" or similar undefined references to templ-generated component functions.
### Pitfall 7: distroless has no shell — debugging requires :debug tag
**What goes wrong:** `docker exec -it web sh` fails because distroless/static has no shell.
**Why it happens:** distroless deliberately removes all OS tooling to minimize attack surface.
**How to avoid:** Use the `:debug` tag variant during initial setup: `gcr.io/distroless/static-debian12:debug`. Switch to `:nonroot` for production. Document in runbook how to use an ephemeral debug container (`docker run --rm -it --network container:<id> busybox sh`) when debugging production.
**Warning signs:** `docker exec` returns "OCI runtime exec failed: exec: 'sh': executable file not found".
### Pitfall 8: /healthz currently pings DB — tests must be updated
**What goes wrong:** After splitting `/healthz` (liveness) from `/readyz` (readiness), the existing `TestHealthz_OK` and `TestHealthz_Down` tests fail because they expect the DB-pinging behavior on `/healthz`.
**Why it happens:** The current `HealthzHandler` does both jobs. D-12/D-13 require them to be separate routes with separate handlers.
**How to avoid:** Update `handlers_test.go` in the same plan that refactors the handlers. Add `TestReadyz_OK` and `TestReadyz_Down` mirroring the current test structure; update `TestHealthz_OK` to verify 200 with no pinger dependency.
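**Warning signs:** After the handler refactor, `handlers_test.go` no longer compiles because `HealthzHandler` takes no pinger, or `TestHealthz_Down` fails with a message like "status = 200; want 503" because the liveness route never returns 503.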
## Code Examples
### go:embed with fs.Sub for static files
```go
// Source: Go stdlib docs, embed + io/fs [CITED: pkg.go.dev/embed]
// In backend/assets/assets.go (static/ must live at backend/assets/static/,
// because embed patterns cannot contain ".."):
package assets

import "embed"

//go:embed static
var Static embed.FS

// In internal/web/router.go, NewRouter now accepts fs.FS (import "io/fs"):
func NewRouter(pinger Pinger, staticFS fs.FS, ...) http.Handler {
	// ...
	sub, err := fs.Sub(staticFS, "static")
	if err != nil {
		panic("static embed sub failed: " + err.Error())
	}
	r.Get("/static/*", http.StripPrefix("/static/",
		http.FileServer(http.FS(sub))).ServeHTTP)
	// ...
	return r
}
```
### goose programmatic migration with pgxpool bridge
```go
// Source: pressly/goose embed docs + pgx/v5/stdlib [CITED: pressly.github.io/goose]
import (
	"database/sql"
	"io/fs"

	"github.com/jackc/pgx/v5/pgxpool"
	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx/v5" database/sql driver
	"github.com/pressly/goose/v3"
)

// RunMigrations applies pending migrations from the "migrations" directory of
// migrationsFS using a short-lived *sql.DB; the pgxpool stays open for the app.
func RunMigrations(pool *pgxpool.Pool, migrationsFS fs.FS) error {
	dsn := pool.Config().ConnConfig.ConnString()
	db, err := sql.Open("pgx/v5", dsn)
	if err != nil {
		return err
	}
	defer db.Close()

	goose.SetBaseFS(migrationsFS)
	if err := goose.SetDialect("postgres"); err != nil {
		return err
	}
	return goose.Up(db, "migrations")
}
```
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Separate goose CLI binary | `goose.SetBaseFS` + programmatic `goose.Up` with `embed.FS` | goose v3 (2021) | Binary is self-contained; no CLI tool needed in final image |
| `http.Dir(staticDir)` | `http.FS(embed.FS)` via `fs.Sub` | Go 1.16 (2021) | Binary has no runtime file dependencies |
| Separate migration service in compose | Migration in app startup | — | Fewer moving parts; migrations atomic with app start |
**Deprecated/outdated:**
- `gcr.io/distroless/static` (without Debian variant suffix): The versioned tag `gcr.io/distroless/static-debian12` is preferred. The unversioned `static` tag still works but `static-debian12:nonroot` is more explicit about the security posture.
## Assumptions Log
| # | Claim | Section | Risk if Wrong |
|---|-------|---------|---------------|
| A1 | `golang:1.26-alpine` is the correct builder base (go.mod says `go 1.26.1`) | Standard Stack | If Alpine's musl causes subtle issues, switch to `golang:1.26` (debian); distroless is still compatible with CGO_ENABLED=0 |
| A2 | `//go:embed` can reference `static/` from an `assets` package only if the directory lives inside that package (`backend/assets/static/`); `../static` is rejected | Pattern 1 | If the assets cannot be generated into the package directory, place the embed file in `backend/` next to `static/` or use a different package layout |
| A3 | Tailwind standalone binary download in Docker builder is reliable during CI/CD | Pattern 3 (Dockerfile) | If the external download is flaky, bake the Tailwind binary into the builder image or add it to the repo as a committed artifact |
| A4 | `pool.Config().ConnConfig.ConnString()` reconstructs a DSN compatible with `sql.Open("pgx/v5", ...)` | Pattern 2 (goose bridge) | If ConnString() omits sslmode or other params, pass the original DATABASE_URL env var directly to sql.Open instead |
## Open Questions
1. **embed.FS path for migrations/ in internal/db/migrate.go**
- What we know: `//go:embed` patterns are relative to the directory of the Go source file and may not contain `..` at all, so a directive can only embed files in its own directory or below.
- What follows: `backend/internal/db/migrate.go` cannot embed `../../migrations/*.sql`; the compiler rejects the pattern.
- Recommendation: Declare the embed in a Go file in `backend/` (pattern `migrations/*.sql`) or in a small package inside `backend/migrations/` (pattern `*.sql`), and pass the resulting filesystem into `RunMigrations`. Confirm with `go build ./...` during Wave 1.
2. **templ generate in Docker build**
- What we know: `*_templ.go` files are gitignored (STATE.md); the build fails without them.
- What's unclear: Whether `RUN go install github.com/a-h/templ/cmd/templ@v0.3.1020 && templ generate` in the builder stage is fast enough or needs caching.
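- Recommendation: Use `RUN --mount=type=cache,target=/root/.cache/go-build` (BuildKit) on the `go install` and `go build` steps. `templ generate` itself is fast (templ → Go codegen, no compilation); the slower part is compiling the templ CLI during `go install`, which the cache mount also covers.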
3. **go:embed and files starting with `.` or `_`**
- What we know: When a pattern names a directory, `//go:embed` skips files and directories whose names start with `.` or `_`; the `all:` prefix includes them.
- What's unclear: Whether `static/` contains any such files (e.g., `.gitkeep`).
- Recommendation: Check `ls -la backend/static/` during implementation. If such files exist, use `//go:embed all:static`.
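One way to check what actually ended up in an embedded filesystem is to walk it and print the paths, for example in a test or a one-off debug command. This is a sketch under assumptions: `listFiles` and `sampleFS` are hypothetical names, and `fstest.MapFS` stands in for the real `embed.FS`, which would be passed to `listFiles` the same way.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// listFiles returns the path of every non-directory entry under root,
// in the lexical order produced by fs.WalkDir.
func listFiles(root fs.FS) ([]string, error) {
	var paths []string
	err := fs.WalkDir(root, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			paths = append(paths, p)
		}
		return nil
	})
	return paths, err
}

// sampleFS imitates an embedded static/ directory with two assets.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"static/tailwind.css": &fstest.MapFile{Data: []byte("body{}")},
		"static/htmx.min.js":  &fstest.MapFile{Data: []byte("/* htmx */")},
	}
}

func main() {
	paths, err := listFiles(sampleFS())
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Comparing this list with the contents of `static/` on disk shows directly whether any dot-prefixed or underscore-prefixed file was skipped.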
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| Docker / docker compose | Image build, compose stack | [ASSUMED: yes on Hetzner VM] | — | podman compose (used in dev) |
| Go 1.26 | Dockerfile builder stage | ✓ (pulled from registry) | golang:1.26-alpine | — |
| Caddy 2 | TLS proxy | ✓ (pulled from registry) | caddy:2-alpine | — |
| postgres:16-alpine | DB service | ✓ (pulled from registry) | 16-alpine | — |
| gcr.io/distroless/static-debian12 | Final image | ✓ (pulled from registry) | nonroot | alpine (has shell, larger) |
**Missing dependencies with no fallback:** None identified.
**Missing dependencies with fallback:** If the Hetzner VM has only `podman`, use `podman compose`; it reads the same `docker-compose.prod.yaml` syntax.
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | Go test (stdlib) + httptest |
| Config file | none |
| Quick run command | `cd backend && go test ./internal/web/... -run TestHealthz -v` |
| Full suite command | `cd backend && go test ./...` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| DEPLOY-01 | Docker image builds with both binaries | smoke | `docker build -f backend/Dockerfile backend/ --target builder` | ❌ Wave 2 |
| DEPLOY-02 | web binary reads all config from env | unit | `go test ./cmd/web/... -run TestEnvConfig` | ❌ Wave 3 (optional; main.go logic is simple) |
| DEPLOY-03 | goose.Up() runs on startup without error | unit | `go test ./internal/db/... -run TestRunMigrations` | ❌ Wave 1 |
| DEPLOY-04 | /healthz returns 200 (no pinger) | unit | `go test ./internal/web/... -run TestHealthz` | ✅ (needs refactor) |
| DEPLOY-04 | /readyz returns 200 when DB ok | unit | `go test ./internal/web/... -run TestReadyz_OK` | ❌ Wave 0 |
| DEPLOY-04 | /readyz returns 503 when DB down | unit | `go test ./internal/web/... -run TestReadyz_Down` | ❌ Wave 0 |
| DEPLOY-05 | README runbook exists and covers all sections | manual | read backend/README.md | ❌ Wave 4 |
### Sampling Rate
- **Per task commit:** `cd backend && go test ./internal/web/... -count=1`
- **Per wave merge:** `cd backend && go test ./... -count=1`
- **Phase gate:** Full suite green before `/gsd-verify-work`
### Wave 0 Gaps
- [ ] `backend/internal/web/handlers.go` — refactor HealthzHandler (liveness) + add ReadyzHandler
- [ ] `backend/internal/web/handlers_test.go` — update TestHealthz_* + add TestReadyz_*
- [ ] `backend/internal/web/router.go` — add `/readyz` route; update `/healthz` to use new liveness handler
## Security Domain
### Applicable ASVS Categories
| ASVS Category | Applies | Standard Control |
|---------------|---------|-----------------|
| V2 Authentication | no | (sessions already implemented in Phase 2) |
| V3 Session Management | no | (already implemented in Phase 2) |
| V4 Access Control | no | (health endpoints are public by design — no auth on /healthz or /readyz) |
| V5 Input Validation | no | (no new user input in this phase) |
| V6 Cryptography | no | (SESSION_SECRET already handled; TLS delegated to Caddy) |
| V14 Configuration | yes | Secrets in host .env file (D-05); file must be chmod 600; not committed to git |
### Known Threat Patterns for deploy phase
| Pattern | STRIDE | Standard Mitigation |
|---------|--------|---------------------|
| Secrets in Docker image layers | Information Disclosure | Never COPY .env into image; use `env_file:` at runtime in compose |
| .env.prod committed to git | Information Disclosure | .env.prod in .gitignore; .env.example has no real values |
| Health endpoint information leakage | Information Disclosure | /healthz and /readyz return minimal JSON; no version strings, no stack traces |
| Postgres exposed to internet | Elevation of Privilege | No `ports:` directive on postgres service; only accessible within compose network |
| Caddy data volume not backed up | Denial of Service | Document in runbook that caddy_data volume loss requires waiting for cert re-issuance (or restoring from backup) |
## Sources
### Primary (HIGH confidence)
- `backend/go.mod` — verified goose v3.27.1, pgx/v5 v5.9.2, Go 1.26.1
- `backend/cmd/web/main.go` — verified env var reading, pgxpool, static file serving pattern
- `backend/cmd/worker/main.go` — verified rivermigrate startup pattern (model for goose.Up)
- `backend/internal/web/router.go` + `handlers.go` — verified current healthz handler
- `backend/compose.yaml` — verified postgres healthcheck pattern to mirror in prod compose
- [pressly/goose embed docs](https://pressly.github.io/goose/blog/2021/embed-sql-migrations/) — programmatic Up with embed.FS
- [pkg.go.dev/embed](https://pkg.go.dev/embed) — embed.FS stdlib documentation
### Secondary (MEDIUM confidence)
- [GoogleContainerTools/distroless GitHub](https://github.com/GoogleContainerTools/distroless) — distroless/static-debian12 verified, CGO_ENABLED=0 requirement
- [caddyserver.com/docs/caddyfile/patterns](https://caddyserver.com/docs/caddyfile/patterns) — reverse proxy Caddyfile pattern
### Tertiary (LOW confidence)
- WebSearch results on Caddy + Docker Compose — consistent with official docs; using official Caddyfile reference as primary
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all libraries already in go.mod; only embed.FS and fs.Sub are new patterns
- Architecture: HIGH — compose + Caddy + distroless is well-established; codebase already has all the pieces
- Pitfalls: HIGH — derived from codebase inspection (embed path constraints, goose/pgx bridge, templ codegen, existing healthz tests)
**Research date:** 2026-05-15
**Valid until:** 2026-08-15 (stable ecosystem — go:embed, goose v3, Caddy 2 are stable)