xtablo-source/backend/README.md
2026-05-15 21:41:22 +02:00

# Xtablo backend
Go + HTMX + Postgres. Phase 1: Walking Skeleton.
This README is the contract for FOUND-05: a developer with the prerequisites below
should be able to clone the repo, follow the Quickstart, and see the HTMX-driven
page within ~5 minutes.
## Prerequisites
Install these on your dev machine before starting:
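- **Go** ≥ 1.22 (this project's `go.mod` declares 1.26; with the default `GOTOOLCHAIN` setting, an older `go` command downloads that toolchain automatically)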
- **just** — task runner (`brew install just` on macOS, `cargo install just`, or see
<https://github.com/casey/just>)
- **podman** with `podman compose` (preferred per D-11) **or** **docker** with
`docker compose`
- **curl**
- **git**
You do **not** need to install `goose`, `templ`, `sqlc`, `air`, the Tailwind CLI, or
`htmx.min.js` yourself. `just bootstrap` installs the Go tools into `$GOBIN` and
downloads the Tailwind binary and HTMX script into local, gitignored
paths.
## Quickstart
Clone-to-running-page in ~5 minutes. Run from inside `backend/`.
```
cd backend
cp .env.example .env # adjust DATABASE_URL if Postgres is not on localhost:5432
just bootstrap # installs goose/templ/sqlc/air; bootstrap-downloads tailwindcss + htmx.min.js
just db-up # starts postgres via podman compose (see fallback below)
just migrate up # applies migrations from ./migrations
just dev # terminal 1: brings up db, runs generate, then air on :8080
# in a SECOND terminal:
just styles-watch # rebuilds static/tailwind.css on .templ / .go changes
# open http://localhost:8080
```
The page should render with a "Fetch server time" button. Clicking it swaps an
ISO-8601 timestamp into the page via HTMX. If the page shows "No time fetched
yet." and nothing happens on click, see Troubleshooting.
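
For orientation, the server side of that button click can be sketched with the standard library alone. This is a minimal sketch, not the project's handler: the real route is built with chi and a templ fragment, and the element id `server-time` and the `fixedClock` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/http/httptest"
	"time"
)

// fixedClock is a hypothetical stand-in for time.Now so the output is stable.
func fixedClock() time.Time {
	return time.Date(2026, 5, 15, 19, 41, 22, 0, time.UTC)
}

// timeFragment returns a handler that writes a small HTML fragment holding
// the server time in RFC 3339 form (a profile of ISO-8601).
func timeFragment(now func() time.Time) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		ts := now().UTC().Format(time.RFC3339)
		// The element id is an assumption; the real template may differ.
		fmt.Fprintf(w, `<span id="server-time">%s</span>`, html.EscapeString(ts))
	}
}

// render issues one in-memory GET and returns the response body.
func render(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/demo/time", nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(render(timeFragment(fixedClock)))
}
```

HTMX replaces the target element with whatever fragment the endpoint returns, so the handler only needs to emit the inner markup, not a full page.
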
`bootstrap` is the slowest step (Go tool installs + two HTTP downloads). It only
needs to run once per clone.
## docker compose fallback
`compose.yaml` is portable across podman and docker — the service definition is
identical. If you don't have podman:
- Replace `podman compose` with `docker compose` mentally throughout this README.
- The `just db-up` / `just db-down` recipes call `podman compose` directly. Run
`docker compose up -d postgres` / `docker compose down` instead, and continue
with the rest of the Quickstart unchanged.
(Decision D-11.)
## Project layout
```
backend/
  cmd/
    web/main.go         # HTTP server entry point
    worker/main.go      # background worker: river periodic jobs (Phase 6)
  internal/
    db/                 # pgxpool wiring + sqlc-generated queries
    web/                # chi router, handlers, middleware, design-system
      ui/               # custom templ component library (Button, Card, Badge)
    session/            # placeholder, Phase 2
    tablos/             # placeholder, Phase 3
    tasks/              # placeholder, Phase 4
    files/              # placeholder, Phase 5
  migrations/           # goose .sql migrations
  templates/            # .templ files (layout, index, fragments)
  static/
    htmx.min.js         # downloaded by `just bootstrap`; gitignored; no runtime CDN
    tailwind.css        # generated by the Tailwind standalone CLI
  bin/                  # gitignored: tailwindcss CLI binary, etc.
  .air.toml             # air live-reload config
  .env.example          # committed; copy to .env
  compose.yaml          # local Postgres
  go.mod / go.sum
  justfile              # task runner recipes; the source of truth for commands
  sqlc.yaml
  tailwind.input.css
  README.md
```
HTMX is served from `/static/htmx.min.js` at runtime — no CDN. The justfile's
bootstrap-time `unpkg.com` URL is the single authoritative version pin (D-10).
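
Serving a local copy under `/static/` can be sketched as follows. This is an illustration under stated assumptions, not the project's router code: it uses `net/http`'s `ServeMux` instead of chi, and an in-memory `fstest.MapFS` (via the hypothetical `demoFS` helper) in place of the real `static/` directory.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// staticHandler serves files from fsys under the /static/ URL prefix.
func staticHandler(fsys fs.FS) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.FS(fsys))))
	return mux
}

// demoFS is a hypothetical in-memory stand-in for the static/ directory.
func demoFS() fs.FS {
	return fstest.MapFS{
		"htmx.min.js": &fstest.MapFile{Data: []byte("/* htmx placeholder */")},
	}
}

// statusFor issues one in-memory GET and returns the HTTP status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := staticHandler(demoFS())
	fmt.Println("/static/htmx.min.js ->", statusFor(h, "/static/htmx.min.js"))
	fmt.Println("/static/missing.js  ->", statusFor(h, "/static/missing.js"))
}
```

Against a real checkout, `os.DirFS("static")` would be one way to supply the file system argument.
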
## Environment variables
`backend/.env` is gitignored; `backend/.env.example` is committed and lists the
keys consumed by `cmd/web` and `cmd/worker`. Local Just recipes load
`backend/.env` automatically, so `just dev` will pick up provider credentials
such as `GOOGLE_CLIENT_ID`.
| Variable | Description | Default |
| ------------------------ | ------------------------------------------------------------------------ | ---------------------------------------------------------------- |
| `DATABASE_URL` | Postgres DSN used by the web + worker binaries and by `just migrate` | `postgres://xtablo:xtablo@localhost:5432/xtablo?sslmode=disable` |
| `PORT` | HTTP port for `cmd/web` | `8080` |
| `ENV` | `development` enables slog's text handler; `production` switches to JSON | `development` |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | blank |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | blank |
| `GOOGLE_REDIRECT_URL` | Google callback URL, usually `/auth/google/callback` | `http://localhost:8080/auth/google/callback` |
Google config is optional in local development. When it is missing, the login
and signup pages keep the Google button visible but disabled with a
not-configured label. No real provider secrets should be committed to
`.env.example`. Apple sign-in is disabled in the current product surface.
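
The defaults in the table can be read as a small loading routine. The sketch below is an assumption about shape, not the project's config package: the `Config` type, `Load`, `fromMap`, and `GoogleEnabled` are hypothetical names, and treating Google sign-in as enabled only when both the client ID and secret are set is a guess at the rule behind the disabled button.

```go
package main

import (
	"fmt"
	"os"
)

// Config mirrors the variables listed in the table; the type is hypothetical.
type Config struct {
	DatabaseURL       string
	Port              string
	Env               string
	GoogleClientID    string
	GoogleSecret      string
	GoogleRedirectURL string
}

// lookupFunc has the same shape as os.LookupEnv so tests can swap in a map.
type lookupFunc func(key string) (string, bool)

// fromMap adapts a map to lookupFunc.
func fromMap(m map[string]string) lookupFunc {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

// orDefault returns the variable's value, or def when unset or empty.
func orDefault(lookup lookupFunc, key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// Load applies the documented defaults to whatever the environment provides.
func Load(lookup lookupFunc) Config {
	return Config{
		DatabaseURL:       orDefault(lookup, "DATABASE_URL", "postgres://xtablo:xtablo@localhost:5432/xtablo?sslmode=disable"),
		Port:              orDefault(lookup, "PORT", "8080"),
		Env:               orDefault(lookup, "ENV", "development"),
		GoogleClientID:    orDefault(lookup, "GOOGLE_CLIENT_ID", ""),
		GoogleSecret:      orDefault(lookup, "GOOGLE_CLIENT_SECRET", ""),
		GoogleRedirectURL: orDefault(lookup, "GOOGLE_REDIRECT_URL", "http://localhost:8080/auth/google/callback"),
	}
}

// GoogleEnabled is an assumed rule: both credentials must be present.
func (c Config) GoogleEnabled() bool {
	return c.GoogleClientID != "" && c.GoogleSecret != ""
}

func main() {
	cfg := Load(os.LookupEnv)
	fmt.Printf("port=%s env=%s google=%t\n", cfg.Port, cfg.Env, cfg.GoogleEnabled())
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the defaults testable without touching the process environment.
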
## Common commands
Every command in this table is a recipe in `backend/justfile`.
| Recipe | What it does | When to use |
| ----------------------------------------------- | ---------------------------------------------------------------------------- | -------------------------------------------------------- |
| `just bootstrap` | Installs Go CLI tools (`goose`, `templ`, `sqlc`, `air`); bootstrap-downloads `bin/tailwindcss` and `static/htmx.min.js` | Once per clone; re-run after deleting `bin/` or `static/htmx.min.js` |
| `just db-up` | Starts the local Postgres container | Before `just migrate up` / `just dev` if not already running |
| `just db-down` | Stops the local Postgres container | When you're done for the day |
| `just migrate up` / `migrate down` / `migrate status` | Applies / reverts / inspects goose migrations against `DATABASE_URL` | After `just db-up`, or any time you change `migrations/` |
| `just generate` | One-shot: `templ generate`, `sqlc generate`, Tailwind compile to `static/tailwind.css` | After editing `.templ`, query SQL, or `tailwind.input.css` |
| `just styles-watch` | Tailwind standalone CLI in `--watch` mode | In a second terminal alongside `just dev` (D-14) |
| `just dev` | Loads `backend/.env`, brings up Postgres, runs `just generate`, then runs `air` for Go live-reload on `:8080` | Main dev loop, terminal 1 |
| `just test` | `templ generate` then `go test ./...` | Before committing |
| `just lint` | `go vet ./...` and `gofmt -l` check | Before committing |
| `just build` | Generates assets, then builds `bin/web` and `bin/worker` | Producing release binaries locally |
| `just clean` | Removes `bin/`, `tmp/`, `static/htmx.min.js`, `static/tailwind.css`, and `*_templ.go` files | Reset to a fresh-clone state without dropping the Postgres volume |
## Running the Worker
`cmd/worker` is the background job processor. It runs river periodic jobs against
the same Postgres as `cmd/web`. Start it with:
```
just worker
```
This requires `just db-up` (handled automatically as a dependency) and MinIO
running (used by the orphan-file cleanup job). If MinIO is not running, the worker
will exit on startup with "file store init failed".
### What to expect
- Structured logs appear immediately at startup.
- A `"worker ready"` log line appears within a few seconds after `rivermigrate`
and S3 init complete.
- A `"worker heartbeat"` log line appears almost immediately: the heartbeat job
is configured with `RunOnStart: true`, so it fires on the first scheduler tick,
which happens within seconds of startup.
- Subsequent heartbeat logs appear every ~1 minute.
- The orphan-file cleanup job runs every hour (no `RunOnStart` — first run is
~1 hour after startup).
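
The timing in the list above follows from one rule: a job with `RunOnStart` runs at startup and then every interval, while a job without it waits one full interval first. The sketch below models only that rule. It is not River code, `runOffsets` is a hypothetical helper, and it ignores the few seconds of scheduler latency mentioned above.

```go
package main

import "fmt"

// runOffsets returns the first n run times, in seconds after startup, for a
// job that repeats every intervalSec seconds.
func runOffsets(intervalSec int, runOnStart bool, n int) []int {
	offsets := make([]int, 0, n)
	next := intervalSec // without RunOnStart the first run is one interval away
	if runOnStart {
		next = 0
	}
	for len(offsets) < n {
		offsets = append(offsets, next)
		next += intervalSec
	}
	return offsets
}

func main() {
	fmt.Println("heartbeat:", runOffsets(60, true, 3))
	fmt.Println("cleanup:  ", runOffsets(3600, false, 3))
}
```
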
### Single-worker constraint
**Run only one worker process at a time (v1).** River uses advisory locks for
leader election, and concurrent `rivermigrate` runs are unsafe. Do not run multiple
worker instances against the same database in this version.
### Graceful shutdown
Send SIGINT (Ctrl+C) and observe:
```
{"level":"INFO","msg":"shutting down"}
{"level":"INFO","msg":"shutdown complete"}
```
The worker calls `riverClient.StopAndCancel` with a 10-second timeout, which
cancels in-flight job contexts and waits for goroutines to exit before closing
the pool.
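
The bounded-stop pattern described here can be sketched without River. In the example below, `stopFunc` and `fakeStop` are hypothetical stand-ins for a blocking stop call that honors its context; only the `context.WithTimeout` wrapping is the point.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// stopFunc is a hypothetical stand-in for a blocking stop call that honors ctx.
type stopFunc func(ctx context.Context) error

// stopWithTimeout gives stop at most timeout to finish.
func stopWithTimeout(stop stopFunc, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return stop(ctx)
}

// fakeStop simulates in-flight work that needs `work` to drain.
func fakeStop(work time.Duration) stopFunc {
	return func(ctx context.Context) error {
		select {
		case <-time.After(work):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// outcome runs one simulated shutdown and reports how it ended.
func outcome(workMs, timeoutMs int) string {
	work := time.Duration(workMs) * time.Millisecond
	timeout := time.Duration(timeoutMs) * time.Millisecond
	if err := stopWithTimeout(fakeStop(work), timeout); err != nil {
		return "timed out"
	}
	return "shutdown complete"
}

func main() {
	fmt.Println(outcome(10, 500)) // work drains well inside the deadline
	fmt.Println(outcome(500, 20)) // deadline expires first
}
```

In a real worker the trigger would typically come from `signal.NotifyContext` listening for `os.Interrupt`, with the timeout set to the 10 seconds mentioned above.
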
### Observing failed job retries
River logs each failure via the `SlogErrorHandler`. A failed job produces a log
line like:
```
{"level":"ERROR","msg":"job error","job_id":42,"job_kind":"heartbeat","attempt":1,"max_attempts":25,"err":"..."}
```
River retries up to 25 times with exponential backoff (`attempts^4` + jitter).
After 25 failed attempts the job is moved to the discarded state in `river_job`.
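
To get a feel for what `attempts^4` means in practice, the schedule can be tabulated. This is a worked calculation, not River's code: jitter is left out, and `backoffSeconds` and `totalWaitSeconds` are hypothetical helper names.

```go
package main

import "fmt"

// backoffSeconds is the un-jittered wait after the given failed attempt.
func backoffSeconds(attempt int) int {
	return attempt * attempt * attempt * attempt
}

// totalWaitSeconds sums the waits between attempts when every attempt fails.
// No wait follows the final attempt, because the job is discarded instead.
func totalWaitSeconds(maxAttempts int) int {
	total := 0
	for attempt := 1; attempt < maxAttempts; attempt++ {
		total += backoffSeconds(attempt)
	}
	return total
}

func main() {
	for _, a := range []int{1, 2, 3, 10, 24} {
		fmt.Printf("after failed attempt %2d: wait about %d s\n", a, backoffSeconds(a))
	}
	days := float64(totalWaitSeconds(25)) / 86400
	fmt.Printf("total wait before discard: about %.1f days\n", days)
}
```

Ignoring jitter, a job that fails every time stays in the retry cycle for roughly 20 days before it is discarded, so a persistently failing job remains visible in `river_job` for a long while.
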
## Troubleshooting
The three issues most likely to trip you up on a fresh clone:
- **"Fresh clone fails to build with `undefined: templates.Index`"** — Templ
generates `*_templ.go` files from `.templ` sources, and those generated files
are not committed. Run `just generate` (or `just dev`, which calls it) before
invoking `go build` directly. (Pitfall 1.)
- **"First request to `/healthz` returns 503 right after `just db-up`"** — The
Postgres container needs ~5 to 10 seconds to become healthy after `podman compose
up -d` returns. Check `podman compose ps` (or `docker compose ps`) for the
`healthy` status, or just wait and retry. Subsequent calls succeed. The 503
during warm-up is correct behavior, not a bug. (Pitfall 2.)
- **"Tailwind classes used in `.templ` files don't appear in the compiled CSS"** —
Tailwind v4 only scans content paths declared via `@source` in
`tailwind.input.css`. Confirm the file contains `@source
"../templates/**/*.templ";` (and equivalent globs for `internal/web/**/*.go`).
Re-run `just styles-watch` so the watcher picks up the config change.
(Pitfall 3.)
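
For the `/healthz` warm-up case, "wait and retry" can be automated with a small polling loop. The sketch below is one possible shape, not part of the repo: `waitHealthy` and `warmingUp` are hypothetical helpers, and the demo talks to an in-process test server that returns 503 twice before turning healthy.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitHealthy calls check up to maxTries times, pausing between failures, and
// reports how many calls were made and whether a 200 was seen.
func waitHealthy(check func() int, maxTries int, pause func()) (int, bool) {
	for try := 1; try <= maxTries; try++ {
		if check() == http.StatusOK {
			return try, true
		}
		if try < maxTries {
			pause()
		}
	}
	return maxTries, false
}

// warmingUp returns a check that reports 503 for the first n calls, then 200.
func warmingUp(n int) func() int {
	calls := 0
	return func() int {
		calls++
		if calls <= n {
			return http.StatusServiceUnavailable
		}
		return http.StatusOK
	}
}

func main() {
	// Test server standing in for the app while Postgres is still starting.
	var hits atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if hits.Add(1) <= 2 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	check := func() int {
		resp, err := http.Get(srv.URL + "/healthz")
		if err != nil {
			return 0 // treat transport errors as "not healthy yet"
		}
		resp.Body.Close()
		return resp.StatusCode
	}
	tries, ok := waitHealthy(check, 10, func() { time.Sleep(50 * time.Millisecond) })
	fmt.Printf("healthy=%t after %d tries\n", ok, tries)
}
```
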
If something else is wrong and you want a clean slate without dropping the
Postgres volume:
```
just clean # removes bin/, tmp/, static/htmx.min.js, static/tailwind.css, *_templ.go
just bootstrap # re-download tools and assets
just dev # back to a working state
```
Run `just db-down` first if you also want to drop the Postgres container.
## What Phase 1 ships (and doesn't)
**Ships:**
- Project scaffold (`go.mod`, justfile, `.air.toml`, `tailwind.input.css`,
`sqlc.yaml`, `compose.yaml`)
- Local Postgres via `compose.yaml` (`pg_isready` healthcheck)
- goose migration pipeline (`migrations/0001_init.sql` is a no-op bootstrap)
- chi router with `/`, `/healthz`, `/demo/time`, `/static/*`
- slog-based structured logging with RequestID middleware
- Graceful HTTP shutdown
- pgxpool wiring exercised by `/healthz`
- templ + HTMX demo (root page + `hx-get` round-trip to a templ fragment)
- Custom templ design-system package at `internal/web/ui/` (Button, Card, Badge)
- Live-reload dev loop (`just dev` + `just styles-watch`)
- `cmd/worker` skeleton (boot, log, idle, shutdown)
**Does not ship — deferred:**
- Authentication, sessions, users → Phase 2
- Tablos CRUD → Phase 3
- Tasks / kanban → Phase 4
- File uploads + R2/S3 → Phase 5
- Real worker jobs → Phase 6
- Production deploy, Dockerfile, `/readyz` → Phase 7
## Deploy
The production host is a Hetzner VM running plain Docker Compose (D-01, D-02). No
Kubernetes or managed orchestration is needed — `docker compose up -d` on the VM is
the entire deployment mechanism. Postgres runs inside the compose stack (D-03); there
is no external managed database.
### Prerequisites
Install on the production VM before first deploy:
- **Docker** ≥ 24 with the **Docker Compose** plugin (`docker compose` — not the
standalone `docker-compose` binary)
- **git** (optional — useful for pulling the repo directly onto the VM)
No other runtimes are needed. Go, Node, and all build tooling run in the Dockerfile's
multi-stage build and are not required on the VM.
### First-time setup
Run all commands on the VM via SSH unless noted otherwise.
1. **SSH to the VM.**
```
ssh user@<vm-ip>
```
2. **Copy the `backend/` directory to the VM** (or clone the repo).
```
# Option A — rsync from local machine:
rsync -av --exclude '.git' backend/ user@<vm-ip>:~/xtablo/
# Option B — clone the repo directly on the VM:
git clone <repo-url> ~/xtablo && cd ~/xtablo/backend
```
3. **Create `.env.prod`** by copying `.env.example` and filling in real values.
```
cp .env.example .env.prod
chmod 600 .env.prod # restrict read access — file contains secrets (T-07-10)
```
Mandatory variables to set in `.env.prod`:
| Variable | Value |
|---|---|
| `DATABASE_URL` | `postgres://xtablo:<POSTGRES_PASSWORD>@postgres:5432/xtablo?sslmode=disable` (internal compose network — hostname is `postgres`) |
| `POSTGRES_PASSWORD` | Strong random password (also used by the postgres service). Example: `openssl rand -hex 24` |
| `POSTGRES_USER` | `xtablo` (or your custom user; must match `DATABASE_URL`) |
| `POSTGRES_DB` | `xtablo` (or your custom db; must match `DATABASE_URL`) |
| `SESSION_SECRET` | 32 random bytes hex-encoded. Generate with: `openssl rand -hex 32` |
| `S3_ENDPOINT` | R2 endpoint URL: `https://<account-id>.r2.cloudflarestorage.com` |
| `S3_BUCKET` | R2 bucket name |
| `S3_ACCESS_KEY` | R2 API token key ID |
| `S3_SECRET_KEY` | R2 API token secret |
| `S3_USE_PATH_STYLE` | `false` for Cloudflare R2 (virtual-hosted-style URLs) |
| `S3_REGION` | `auto` or `us-east-1` (R2 accepts both) |
| `MAX_UPLOAD_SIZE_MB` | `25` (or your preferred limit) |
| `ENV` | `production` (activates JSON slog handler) |
| `PORT` | `8080` |
| `DOMAIN` | `app.yourdomain.com` (Caddy reads this for TLS) |
Do **not** include `TEST_DATABASE_URL` in `.env.prod` — it is a dev/test-only
variable and is not used by the runtime binaries.
4. **Build the Docker image** (from inside `backend/` — either locally or on the VM).
```
# From inside backend/
docker build -f Dockerfile -t ghcr.io/yourusername/xtablo:v0.1.0 .
```
If building locally, push to a registry and pull on the VM:
```
docker push ghcr.io/yourusername/xtablo:v0.1.0
# On the VM:
docker pull ghcr.io/yourusername/xtablo:v0.1.0
```
5. **Set image coordinates as environment variables** (used by `docker-compose.prod.yaml`).
```
export IMAGE=ghcr.io/yourusername/xtablo
export TAG=v0.1.0
```
6. **Start the stack.**
```
docker compose -f docker-compose.prod.yaml --env-file .env.prod up -d
```
The postgres service must pass its healthcheck before web and worker start.
Migrations run automatically at web startup via `goose.Up()` (D-10).
7. **Verify the deployment.**
```
curl https://app.yourdomain.com/healthz # → {"status":"ok"}
curl https://app.yourdomain.com/readyz # → {"status":"ok","db":"ok"}
```
If the domain is not yet configured, use the VM's public IP temporarily with
HTTP (Caddy will not yet have a certificate):
```
curl http://<vm-ip>:80/healthz
```
8. **Let's Encrypt staging (for initial TLS testing).**
To avoid hitting Let's Encrypt production rate limits (5 duplicate certificates
per week per domain) during initial setup, uncomment the staging global block in
`deploy/Caddyfile`:
```
{
acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}
```
Restart Caddy after editing (`docker compose -f docker-compose.prod.yaml restart caddy`),
verify TLS works (browsers will show a staging cert warning — that is expected),
then remove the global block and clear the `caddy_data` volume to issue a real
production certificate.
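
Two of the `.env.prod` values from step 3 are easy to get subtly wrong: the `DATABASE_URL` must carry the same user, password, and database name as the `POSTGRES_*` variables, and the password must be URL-escaped if it contains reserved characters. The sketch below shows one way to derive both values. `buildDSN` and `newSecret` are hypothetical helpers, not project code, and `newSecret(32)` is meant as the equivalent of `openssl rand -hex 32`.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/url"
)

// buildDSN assembles a Postgres URL and percent-escapes the credentials.
func buildDSN(user, password, host, db string) string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     host,
		Path:     "/" + db,
		RawQuery: "sslmode=disable",
	}
	return u.String()
}

// newSecret returns n random bytes, hex-encoded (2n characters).
func newSecret(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	// "p@ss" is a placeholder password chosen to show the escaping.
	fmt.Println(buildDSN("xtablo", "p@ss", "postgres:5432", "xtablo"))

	secret, err := newSecret(32)
	if err != nil {
		fmt.Println("random source failed:", err)
		return
	}
	fmt.Println("SESSION_SECRET length:", len(secret))
}
```

A password produced by `openssl rand -hex 24` contains only hex digits and needs no escaping, which is one reason that form is convenient here.
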
### Deploying a new version
1. **Build and tag the new image** (same as first-time, with a new tag):
```
docker build -f Dockerfile -t ghcr.io/yourusername/xtablo:v0.2.0 .
docker push ghcr.io/yourusername/xtablo:v0.2.0 # if using a registry
```
2. **On the VM** — update `TAG` in `.env.prod`:
```
# Edit .env.prod:
TAG=v0.2.0
```
Or export it in the current shell instead of editing the file (a value exported in the shell overrides the one in `.env.prod`):
```
export TAG=v0.2.0
```
3. **Pull and recreate only the changed services:**
```
docker compose -f docker-compose.prod.yaml --env-file .env.prod up -d
```
Compose recreates only the web and worker containers (their image tag changed).
Postgres and Caddy are unaffected. Migrations run automatically at web startup
(D-10) — `goose.Up()` is idempotent and skips already-applied migrations.
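
The reason repeated startups are safe can be shown with a toy model of version tracking. This is a conceptual sketch only and does not reflect goose's implementation: `migration` and `applyPending` are hypothetical, the map stands in for a version table in the database, and `0002_example.sql` is an invented file name.

```go
package main

import "fmt"

// migration is a simplified record: a version number and a file name.
type migration struct {
	version int
	name    string
}

// applyPending runs every migration whose version is not yet recorded in
// applied, in slice order, and returns the names it ran.
func applyPending(applied map[int]bool, all []migration) []string {
	var ran []string
	for _, m := range all {
		if applied[m.version] {
			continue // already recorded, so skip it
		}
		applied[m.version] = true // a real tool records this in the database
		ran = append(ran, m.name)
	}
	return ran
}

func main() {
	applied := map[int]bool{}
	all := []migration{{1, "0001_init.sql"}, {2, "0002_example.sql"}}

	fmt.Println("first start: ", applyPending(applied, all))
	fmt.Println("second start:", applyPending(applied, all))
}
```

The second call finds nothing to do, which is the property the deploy flow relies on when web restarts with an unchanged set of migrations.
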
## Rollback
Rollback means redeploying the previous image tag (D-11). No special tooling is
required — it is the same as deploying a new version, but with an older tag.
1. **On the VM** — set `TAG` to the previous tag in `.env.prod` (or inline):
```
export TAG=v0.1.0
```
2. **Redeploy:**
```
docker compose -f docker-compose.prod.yaml --env-file .env.prod up -d
```
Compose recreates web and worker with the old image. The rollback is complete.
### Schema rollback (break-glass)
`goose.Up()` only applies pending forward migrations, so rolling back to a previous
binary does not run `goose down` and the schema stays at the newer version. In most
cases this is fine: the old binary ignores columns it does not know about.
If a migration introduced a schema change that is **incompatible** with the old binary
(e.g. a NOT NULL column without a default that the old binary does not supply), run a
manual goose down as a break-glass step:
1. Connect to Postgres inside the container:
```
docker exec -it <postgres-container-name> psql -U xtablo -d xtablo
```
(Find the container name with `docker compose -f docker-compose.prod.yaml ps`.)
2. The production image is distroless — the `goose` CLI is not inside the runtime
container. Install the goose CLI separately on the VM or use the goose Docker
image against the internal network:
```
# Install goose CLI on the VM:
go install github.com/pressly/goose/v3/cmd/goose@latest
goose -dir ./migrations postgres "$DATABASE_URL" down
```
Or use an ephemeral container on the same compose network:
```
docker run --rm --network <compose-network> \
-e GOOSE_DRIVER=postgres \
-e GOOSE_DBSTRING="postgres://xtablo:<password>@postgres:5432/xtablo?sslmode=disable" \
-v $(pwd)/migrations:/migrations \
ghcr.io/kukymbr/goose-docker:latest \
goose -dir /migrations down
```
After reverting the migration, the old binary will start cleanly.
## Incident Runbook
### /readyz returns 503
`/readyz` pings Postgres. A 503 means the web container cannot reach the database.
1. Check container status:
```
docker compose -f docker-compose.prod.yaml ps
```
2. If `postgres` is down or unhealthy, restart it:
```
docker compose -f docker-compose.prod.yaml up -d postgres
```
Then restart web and worker (they will wait for postgres to be healthy):
```
docker compose -f docker-compose.prod.yaml up -d
```
3. Check web logs for the actual error:
```
docker compose -f docker-compose.prod.yaml logs web --tail=50
```
All application logs are JSON when `ENV=production` is set. Look for
`"level":"ERROR"` lines with a `"msg":"db ping failed"` or similar.
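
For context when reading those logs, a readiness endpoint of this kind usually amounts to a database ping behind a short timeout. The sketch below is an assumption about shape, not the project's handler: the `pinger` interface, the `fakeDB` type, the 2-second timeout, and the 503 response body are all hypothetical. Only the 200 body matches the output shown in the Deploy section.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is the one method the handler needs from a connection pool.
type pinger interface {
	Ping(ctx context.Context) error
}

// fakeDB is a hypothetical stand-in for the pool.
type fakeDB struct{ down bool }

func (f fakeDB) Ping(ctx context.Context) error {
	if f.down {
		return errors.New("connection refused")
	}
	return nil
}

// readyz answers 200 when the ping succeeds and 503 otherwise.
func readyz(db pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// The 2-second budget is an assumption, not a documented value.
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		w.Header().Set("Content-Type", "application/json")
		if err := db.Ping(ctx); err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			fmt.Fprint(w, `{"status":"unavailable","db":"down"}`)
			return
		}
		fmt.Fprint(w, `{"status":"ok","db":"ok"}`)
	}
}

// probe issues one in-memory GET and returns status code and body.
func probe(db pinger) (int, string) {
	rec := httptest.NewRecorder()
	readyz(db).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(fakeDB{})
	fmt.Println(code, body)
	code, body = probe(fakeDB{down: true})
	fmt.Println(code, body)
}
```
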
### Caddy TLS certificate errors
1. Check caddy logs:
```
docker compose -f docker-compose.prod.yaml logs caddy --tail=50
```
2. If you see "too many certificates already issued for" (Let's Encrypt rate limit,
RESEARCH Pitfall 4):
- Caddy hit the 5 duplicate certificates per week limit for the domain.
- Confirm the `caddy_data` named volume exists and is mounted — if the volume was
accidentally deleted, Caddy cannot reuse the cached certificate and must
re-issue on every restart, quickly exhausting the rate limit.
- Recovery options:
- Wait up to 1 week for the rate limit window to reset.
- Switch to the Let's Encrypt staging endpoint temporarily (see
"Let's Encrypt staging" in the First-time setup section above).
- Restore from a `caddy_data` volume backup if available.
3. If the `caddy_data` volume was lost:
```
# Verify the volume still exists:
docker volume ls | grep caddy_data
# If missing, the volume must be recreated (certificates will be re-issued):
docker compose -f docker-compose.prod.yaml up -d caddy
```
### Checking logs
Follow logs for any service:
```
docker compose -f docker-compose.prod.yaml logs web --tail=100 --follow
docker compose -f docker-compose.prod.yaml logs worker --tail=100 --follow
docker compose -f docker-compose.prod.yaml logs caddy --tail=100 --follow
docker compose -f docker-compose.prod.yaml logs postgres --tail=50
```
All application logs are JSON in production (`ENV=production` activates the slog
JSON handler). Pipe through `jq` for readable output:
```
docker compose -f docker-compose.prod.yaml logs web --follow --no-log-prefix | jq .
```
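
The `ENV`-driven switch between text and JSON output can be sketched with `log/slog` directly. This is a minimal illustration, not the project's logging setup: `newLogger` and `sample` are hypothetical names, and the rule "exactly `production` selects JSON" is inferred from the environment variable table.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newLogger picks the JSON handler for production and the text handler
// for every other value of env.
func newLogger(env string, w io.Writer) *slog.Logger {
	if env == "production" {
		return slog.New(slog.NewJSONHandler(w, nil))
	}
	return slog.New(slog.NewTextHandler(w, nil))
}

// sample logs one line with the logger for env and returns what was written.
func sample(env string) string {
	var buf bytes.Buffer
	newLogger(env, &buf).Info("worker ready", "env", env)
	return buf.String()
}

func main() {
	fmt.Print(sample("development"))
	fmt.Print(sample("production"))
}
```

Only the production line is a JSON object, which is why the `jq` pipeline above assumes `ENV=production`.
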
### Debugging the distroless container
The runtime image (`gcr.io/distroless/static-debian12:nonroot`) has **no shell**
(RESEARCH Pitfall 7). You cannot `docker exec -it <web-container> sh`.
To debug network or filesystem issues, attach an ephemeral busybox container to the
same network:
```
# Find the web container ID:
docker compose -f docker-compose.prod.yaml ps
# Attach busybox to the web container's network namespace:
docker run --rm -it --network container:<web-container-id> busybox sh
```
From the busybox shell you can run `wget`, `nc`, `ping`, etc. to diagnose
connectivity. To inspect the compose network directly (e.g. reach `postgres:5432`):
```
docker run --rm -it \
--network $(docker inspect <web-container-id> --format '{{range .NetworkSettings.Networks}}{{.NetworkID}}{{end}}') \
busybox sh
```