# Xtablo backend
Go + HTMX + Postgres. Phase 1: Walking Skeleton.
This README is the contract for FOUND-05: a developer with the prerequisites below
should be able to clone the repo, follow the Quickstart, and see the HTMX-driven
page within ~5 minutes.
## Prerequisites
Install these on your dev machine before starting:
- **Go** ≥ 1.22 (this project's `go.mod` declares 1.26)
- **just** — task runner (`brew install just` on macOS, `cargo install just`, or see
<https://github.com/casey/just>)
- **podman** with `podman compose` (preferred per D-11) **or** **docker** with
`docker compose`
- **curl**
- **git**
You do **not** need to install `goose`, `templ`, `sqlc`, `air`, the Tailwind CLI, or
`htmx.min.js` yourself. `just bootstrap` installs the Go tools into `$GOBIN` and
downloads the Tailwind binary and HTMX script into local, gitignored
paths.
## Quickstart
Clone-to-running-page in ~5 minutes. Run from inside `backend/`.
```
cd backend
cp .env.example .env # adjust DATABASE_URL if Postgres is not on localhost:5432
just bootstrap # installs goose/templ/sqlc/air; bootstrap-downloads tailwindcss + htmx.min.js
just db-up # starts postgres via podman compose (see fallback below)
just migrate up # applies migrations from ./migrations
just dev # terminal 1: brings up db, runs generate, then air on :8080
# in a SECOND terminal:
just styles-watch # rebuilds static/tailwind.css on .templ / .go changes
# open http://localhost:8080
```
The page should render with a "Fetch server time" button. Clicking it swaps an
ISO-8601 timestamp into the page via HTMX. If the page shows "No time fetched
yet." and nothing happens on click, see Troubleshooting.
`bootstrap` is the slowest step (Go tool installs + two HTTP downloads). It only
needs to run once per clone.
## docker compose fallback
`compose.yaml` is portable across podman and docker — the service definition is
identical. If you don't have podman:
- Replace `podman compose` with `docker compose` mentally throughout this README.
- The `just db-up` / `just db-down` recipes call `podman compose` directly. Run
`docker compose up -d postgres` / `docker compose down` instead, and continue
with the rest of the Quickstart unchanged.
(Decision D-11.)
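The podman-or-docker choice above can also be made by a script rather than mentally. The following is a minimal sketch, not part of the justfile: `first_available` is a hypothetical helper name, and the only thing it assumes is that both engines accept the same `compose up -d postgres` invocation, as this section states.

```shell
# Print the first command name from the argument list that exists on PATH.
# Returns non-zero (and prints nothing) if none of them is installed.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example (not run here): prefer podman, fall back to docker.
# engine=$(first_available podman docker) && "$engine" compose up -d postgres
```

Listing podman first keeps the preference from D-11 while still letting a docker-only machine continue with the Quickstart.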
## Project layout
```
backend/
  cmd/
    web/main.go        # HTTP server entry point
    worker/main.go     # background worker; river periodic jobs (Phase 6)
  internal/
    db/                # pgxpool wiring + sqlc-generated queries
    web/               # chi router, handlers, middleware, design-system
      ui/              # custom templ component library (Button, Card, Badge)
    session/           # placeholder (Phase 2)
    tablos/            # placeholder (Phase 3)
    tasks/             # placeholder (Phase 4)
    files/             # placeholder (Phase 5)
  migrations/          # goose .sql migrations
  templates/           # .templ files (layout, index, fragments)
  static/
    htmx.min.js        # bootstrap-downloaded by `just bootstrap`; gitignored; no runtime CDN
    tailwind.css       # generated by the Tailwind standalone CLI
  bin/                 # gitignored; tailwindcss CLI binary, etc.
  .air.toml            # air live-reload config
  .env.example         # committed; copy to .env
  compose.yaml         # local Postgres
  go.mod / go.sum
  justfile             # task runner recipes; the source of truth for commands
  sqlc.yaml
  tailwind.input.css
  README.md
```
HTMX is served from `/static/htmx.min.js` at runtime — no CDN. The justfile's
bootstrap-time `unpkg.com` URL is the single authoritative version pin (D-10).
## Environment variables
`backend/.env` is gitignored; `backend/.env.example` is committed and lists the
keys consumed by `cmd/web` and `cmd/worker`. Local Just recipes load
`backend/.env` automatically, so `just dev` will pick up provider credentials
such as `GOOGLE_CLIENT_ID`.
| Variable | Description | Default |
| ------------------------ | ------------------------------------------------------------------------ | ---------------------------------------------------------------- |
| `DATABASE_URL` | Postgres DSN used by the web + worker binaries and by `just migrate` | `postgres://xtablo:xtablo@localhost:5432/xtablo?sslmode=disable` |
| `PORT` | HTTP port for `cmd/web` | `8080` |
| `ENV` | `development` enables slog's text handler; `production` switches to JSON | `development` |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | blank |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | blank |
| `GOOGLE_REDIRECT_URL` | Google callback URL, usually `/auth/google/callback` | `http://localhost:8080/auth/google/callback` |
Google config is optional in local development. When it is missing, the login
and signup pages keep the Google button visible but disabled with a
not-configured label. No real provider secrets should be committed to
`.env.example`. Apple sign-in is disabled in the current product surface.
## Common commands
Every command in this table is a recipe in `backend/justfile`.
| Recipe | What it does | When to use |
| ----------------------------------------------- | ---------------------------------------------------------------------------- | -------------------------------------------------------- |
| `just bootstrap` | Installs Go CLI tools (`goose`, `templ`, `sqlc`, `air`); bootstrap-downloads `bin/tailwindcss` and `static/htmx.min.js` | Once per clone; re-run after deleting `bin/` or `static/htmx.min.js` |
| `just db-up` | Starts the local Postgres container | Before `just migrate up` / `just dev` if not already running |
| `just db-down` | Stops the local Postgres container | When you're done for the day |
| `just migrate up` / `migrate down` / `migrate status` | Applies / reverts / inspects goose migrations against `DATABASE_URL` | After `just db-up`, or any time you change `migrations/` |
| `just generate` | One-shot: `templ generate`, `sqlc generate`, Tailwind compile to `static/tailwind.css` | After editing `.templ`, query SQL, or `tailwind.input.css` |
| `just styles-watch` | Tailwind standalone CLI in `--watch` mode | In a second terminal alongside `just dev` (D-14) |
| `just dev` | Loads `backend/.env`, brings up Postgres, runs `just generate`, then runs `air` for Go live-reload on `:8080` | Main dev loop, terminal 1 |
| `just test` | `templ generate` then `go test ./...` | Before committing |
| `just lint` | `go vet ./...` and `gofmt -l` check | Before committing |
| `just build` | Generates assets, then builds `bin/web` and `bin/worker` | Producing release binaries locally |
| `just clean` | Removes `bin/`, `tmp/`, `static/htmx.min.js`, `static/tailwind.css`, and `*_templ.go` files | Reset to a fresh-clone state without dropping the Postgres volume |
## Running the Worker
`cmd/worker` is the background job processor. It runs river periodic jobs against
the same Postgres as `cmd/web`. Start it with:
```
just worker
```
This requires `just db-up` (handled automatically as a dependency) and MinIO
running (used by the orphan-file cleanup job). If MinIO is not running, the worker
will exit on startup with "file store init failed".
### What to expect
- Structured logs appear immediately at startup.
- A `"worker ready"` log line appears within a few seconds after `rivermigrate`
and S3 init complete.
- A `"worker heartbeat"` log line appears almost immediately (the heartbeat job
is configured with `RunOnStart: true`, so it fires on the first scheduler tick
which happens within seconds of startup).
- Subsequent heartbeat logs appear every ~1 minute.
- The orphan-file cleanup job runs every hour (no `RunOnStart` — first run is
~1 hour after startup).
### Single-worker constraint
**Run only one worker process at a time (v1).** River uses advisory locks for
leader election and concurrent rivermigrate runs are unsafe. Do not run multiple
worker instances against the same database in this version.
### Graceful shutdown
Send SIGINT (Ctrl+C) and observe:
```
{"level":"INFO","msg":"shutting down"}
{"level":"INFO","msg":"shutdown complete"}
```
The worker calls `riverClient.StopAndCancel` with a 10-second timeout, which
cancels in-flight job contexts and waits for goroutines to exit before closing
the pool.
### Observing failed job retries
River logs each failure via the `SlogErrorHandler`. A failed job produces a log
line like:
```
{"level":"ERROR","msg":"job error","job_id":42,"job_kind":"heartbeat","attempt":1,"max_attempts":25,"err":"..."}
```
River makes up to 25 attempts per job (the `max_attempts` value in the log line),
with exponential backoff between them (`attempts^4` + jitter). After the 25th
failed attempt the job is moved to the discarded state in `river_job`.
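To get a feel for how quickly the `attempts^4` backoff grows, the delays can be tabulated. This is a rough sketch only: it ignores the jitter, the helper names are made up for illustration, and it assumes the result is in seconds, which this README does not state.

```shell
# Approximate wait after a given failed attempt, computed as attempt^4.
# Jitter is ignored and the unit (seconds) is an assumption.
retry_delay_seconds() {
  attempt=$1
  echo $((attempt * attempt * attempt * attempt))
}

# Print the approximate delay for attempts 1..N.
print_retry_schedule() {
  last=$1
  i=1
  while [ "$i" -le "$last" ]; do
    printf 'attempt %s: ~%ss\n' "$i" "$(retry_delay_seconds "$i")"
    i=$((i + 1))
  done
}

# Example (not run here): show the first five delays.
# print_retry_schedule 5
```

Under these assumptions the first retries come quickly (1s, then 16s, then 81s), while later attempts are spaced hours or days apart, so a job that keeps failing can stay in the retry cycle for a long time before it is discarded.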
## Troubleshooting
The three issues most likely to trip you up on a fresh clone:
- **"Fresh clone fails to build with `undefined: templates.Index`"** — Templ
generates `*_templ.go` files from `.templ` sources, and those generated files
are not committed. Run `just generate` (or `just dev`, which calls it) before
invoking `go build` directly. (Pitfall 1.)
- **"First request to `/healthz` returns 503 right after `just db-up`"** — The
Postgres container needs ~5-10 seconds to become healthy after `podman compose
up -d` returns. Check `podman compose ps` (or `docker compose ps`) for the
`healthy` status, or just wait and retry. Subsequent calls succeed. The 503
during warm-up is correct behavior, not a bug. (Pitfall 2.)
- **"Tailwind classes used in `.templ` files don't appear in the compiled CSS"** —
Tailwind v4 only scans content paths declared via `@source` in
`tailwind.input.css`. Confirm the file contains `@source
"../templates/**/*.templ";` (and equivalent globs for `internal/web/**/*.go`).
Re-run `just styles-watch` so the watcher picks up the config change.
(Pitfall 3.)
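The "wait and retry" advice for the `/healthz` warm-up in the second item can be scripted. The sketch below is one possible shape, not something the justfile provides: `retry` is a hypothetical helper, and the try count and delay in the example are arbitrary values rather than tuned ones.

```shell
# Run a command until it succeeds, up to max_tries times, sleeping
# delay seconds between failures.
# Returns 0 on the first success, 1 if every try fails.
retry() {
  max_tries=$1
  delay=$2
  shift 2
  try=1
  while true; do
    "$@" && return 0
    if [ "$try" -ge "$max_tries" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}

# Example (not run here): poll /healthz for up to ~20 seconds.
# curl's -f flag makes it exit non-zero on HTTP errors such as 503.
# retry 10 2 curl -fsS http://localhost:8080/healthz
```

Because the helper takes an arbitrary command, the same loop could wrap a `podman compose ps` check instead of `curl` if that is preferred.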
If something else is wrong and you want a clean slate without dropping the
Postgres volume:
```
just clean # removes bin/, tmp/, static/htmx.min.js, static/tailwind.css, *_templ.go
just bootstrap # re-download tools and assets
just dev # back to a working state
```
Run `just db-down` first if you also want to drop the Postgres container.
## What Phase 1 ships (and doesn't)
**Ships:**
- Project scaffold (`go.mod`, justfile, `.air.toml`, `tailwind.input.css`,
`sqlc.yaml`, `compose.yaml`)
- Local Postgres via `compose.yaml` (`pg_isready` healthcheck)
- goose migration pipeline (`migrations/0001_init.sql` is a no-op bootstrap)
- chi router with `/`, `/healthz`, `/demo/time`, `/static/*`
- slog-based structured logging with RequestID middleware
- Graceful HTTP shutdown
- pgxpool wiring exercised by `/healthz`
- templ + HTMX demo (root page + `hx-get` round-trip to a templ fragment)
- Custom templ design-system package at `internal/web/ui/` (Button, Card, Badge)
- Live-reload dev loop (`just dev` + `just styles-watch`)
- `cmd/worker` skeleton (boot, log, idle, shutdown)
**Does not ship — deferred:**
- Authentication, sessions, users → Phase 2
- Tablos CRUD → Phase 3
- Tasks / kanban → Phase 4
- File uploads + R2/S3 → Phase 5
- Real worker jobs → Phase 6
- Production deploy, Dockerfile, `/readyz` → Phase 7
## Deploy
The production host is a Hetzner VM running plain Docker Compose (D-01, D-02). No
Kubernetes or managed orchestration is needed — `docker compose up -d` on the VM is
the entire deployment mechanism. Postgres runs inside the compose stack (D-03); there
is no external managed database.
### Prerequisites
Install on the production VM before first deploy:
- **Docker** ≥ 24 with the **Docker Compose** plugin (`docker compose` — not the
standalone `docker-compose` binary)
- **git** (optional — useful for pulling the repo directly onto the VM)
No other runtimes are needed. Go, Node, and all build tooling run in the Dockerfile's
multi-stage build and are not required on the VM.
### First-time setup
Run all commands on the VM via SSH unless noted otherwise.
1. **SSH to the VM.**
```
ssh user@<vm-ip>
```
2. **Copy the `backend/` directory to the VM** (or clone the repo).
```
# Option A — rsync from local machine:
rsync -av --exclude '.git' backend/ user@<vm-ip>:~/xtablo/
# Option B — clone the repo directly on the VM:
git clone <repo-url> ~/xtablo && cd ~/xtablo/backend
```
3. **Create `.env.prod`** by copying `.env.example` and filling in real values.
```
cp .env.example .env.prod
chmod 600 .env.prod # restrict read access — file contains secrets (T-07-10)
```
Mandatory variables to set in `.env.prod`:
| Variable | Value |
|---|---|
| `DATABASE_URL` | `postgres://xtablo:<POSTGRES_PASSWORD>@postgres:5432/xtablo?sslmode=disable` (internal compose network — hostname is `postgres`) |
| `POSTGRES_PASSWORD` | Strong random password (also used by the postgres service). Example: `openssl rand -hex 24` |
| `POSTGRES_USER` | `xtablo` (or your custom user; must match `DATABASE_URL`) |
| `POSTGRES_DB` | `xtablo` (or your custom db; must match `DATABASE_URL`) |
| `SESSION_SECRET` | 32 random bytes hex-encoded. Generate with: `openssl rand -hex 32` |
| `S3_ENDPOINT` | R2 endpoint URL: `https://<account-id>.r2.cloudflarestorage.com` |
| `S3_BUCKET` | R2 bucket name |
| `S3_ACCESS_KEY` | R2 API token key ID |
| `S3_SECRET_KEY` | R2 API token secret |
| `S3_USE_PATH_STYLE` | `false` for Cloudflare R2 (virtual-hosted-style URLs) |
| `S3_REGION` | `auto` or `us-east-1` (R2 accepts both) |
| `MAX_UPLOAD_SIZE_MB` | `25` (or your preferred limit) |
| `ENV` | `production` (activates JSON slog handler) |
| `PORT` | `8080` |
| `DOMAIN` | `app.yourdomain.com` (Caddy reads this for TLS) |
Do **not** include `TEST_DATABASE_URL` in `.env.prod` — it is a dev/test-only
variable and is not used by the runtime binaries.
4. **Build the Docker image** (from inside `backend/` — either locally or on the VM).
```
# From inside backend/
docker build -f Dockerfile -t ghcr.io/yourusername/xtablo:v0.1.0 .
```
If building locally, push to a registry and pull on the VM:
```
docker push ghcr.io/yourusername/xtablo:v0.1.0
# On the VM:
docker pull ghcr.io/yourusername/xtablo:v0.1.0
```
5. **Set image coordinates as environment variables** (used by `docker-compose.prod.yaml`).
```
export IMAGE=ghcr.io/yourusername/xtablo
export TAG=v0.1.0
```
6. **Start the stack.**
```
docker compose -f docker-compose.prod.yaml --env-file .env.prod up -d
```
The postgres service must pass its healthcheck before web and worker start.
Migrations run automatically at web startup via `goose.Up()` (D-10).
7. **Verify the deployment.**
```
curl https://app.yourdomain.com/healthz # → {"status":"ok"}
curl https://app.yourdomain.com/readyz # → {"status":"ok","db":"ok"}
```
If the domain is not yet configured, use the VM's public IP temporarily with
HTTP (Caddy will not yet have a certificate):
```
curl http://<vm-ip>:80/healthz
```
8. **Let's Encrypt staging (for initial TLS testing).**
To avoid hitting Let's Encrypt production rate limits (5 duplicate certificates
per week per domain) during initial setup, uncomment the staging global block in
`deploy/Caddyfile`:
```
{
acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}
```
Restart Caddy after editing (`docker compose -f docker-compose.prod.yaml restart caddy`),
verify TLS works (browsers will show a staging cert warning — that is expected),
then remove the global block and clear the `caddy_data` volume to issue a real
production certificate.
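Since step 3 lists a fair number of mandatory variables, a quick pre-flight check before step 6 can catch an empty or forgotten key. The following is a minimal sketch under stated assumptions: `missing_env_keys` is a hypothetical helper, and it assumes `.env.prod` uses plain `KEY=value` lines without `export` prefixes or quoting.

```shell
# Print every required key that is missing or has an empty value in an
# env file of KEY=value lines. Returns 1 if anything was reported.
missing_env_keys() {
  env_file=$1
  shift
  rc=0
  for key in "$@"; do
    # Match "KEY=" followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      printf '%s\n' "$key"
      rc=1
    fi
  done
  return "$rc"
}

# Example (not run here), with a subset of the mandatory variables:
# missing_env_keys .env.prod DATABASE_URL POSTGRES_PASSWORD SESSION_SECRET DOMAIN
```

The check only confirms that a value is present; it does not validate that, for example, the password inside `DATABASE_URL` matches `POSTGRES_PASSWORD`.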
### Deploying a new version
1. **Build and tag the new image** (same as first-time, with a new tag):
```
docker build -f Dockerfile -t ghcr.io/yourusername/xtablo:v0.2.0 .
docker push ghcr.io/yourusername/xtablo:v0.2.0 # if using a registry
```
2. **On the VM** — update `TAG` in `.env.prod`:
```
# Edit .env.prod:
TAG=v0.2.0
```
Or export it in the current shell instead of editing the file:
```
export TAG=v0.2.0
```
3. **Pull and recreate only the changed services:**
```
docker compose -f docker-compose.prod.yaml --env-file .env.prod up -d
```
Compose recreates only the web and worker containers (their image tag changed).
Postgres and Caddy are unaffected. Migrations run automatically at web startup
(D-10) — `goose.Up()` is idempotent and skips already-applied migrations.
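Editing `TAG` by hand in step 2 is easy to get wrong, for instance by leaving two `TAG=` lines behind. The sketch below shows one way to script it. `set_env_tag` is a hypothetical helper, and it assumes `.env.prod` holds at most plain `TAG=value` lines and that moving `TAG` to the end of the file is acceptable.

```shell
# Set TAG=<value> in an env file: drop any existing TAG line and append
# the new one. Uses a temp file instead of `sed -i`, whose syntax
# differs between GNU and BSD sed.
set_env_tag() {
  env_file=$1
  new_tag=$2
  tmp_file=$(mktemp)
  # grep exits 1 when no lines remain; that is not an error here.
  grep -v '^TAG=' "$env_file" > "$tmp_file" || true
  printf 'TAG=%s\n' "$new_tag" >> "$tmp_file"
  # Overwrite in place so the file keeps its permissions (e.g. 600).
  cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}

# Example (not run here):
# set_env_tag .env.prod v0.2.0
```

The same call with the previous tag would cover a rollback. If `TAG` is also exported in the shell, Compose is expected to prefer the shell value over the file, so it may be worth running `unset TAG` before relying on `.env.prod`.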
## Rollback
Rollback means redeploying the previous image tag (D-11). No special tooling is
required — it is the same as deploying a new version, but with an older tag.
1. **On the VM** — set `TAG` to the previous tag in `.env.prod` (or inline):
```
export TAG=v0.1.0
```
2. **Redeploy:**
```
docker compose -f docker-compose.prod.yaml --env-file .env.prod up -d
```
Compose recreates web and worker with the old image. The rollback is complete.
### Schema rollback (break-glass)
`goose.Up()` is idempotent — rolling back to a previous binary does not automatically
run `goose down`. In most cases this is fine: the old binary ignores columns it does
not know about.
If a migration introduced a schema change that is **incompatible** with the old binary
(e.g. a NOT NULL column without a default that the old binary does not supply), run a
manual goose down as a break-glass step:
1. Connect to Postgres inside the container:
```
docker exec -it <postgres-container-name> psql -U xtablo -d xtablo
```
(Find the container name with `docker compose -f docker-compose.prod.yaml ps`.)
2. The production image is distroless — the `goose` CLI is not inside the runtime
container. Install the goose CLI separately on the VM or use the goose Docker
image against the internal network:
```
# Install goose CLI on the VM:
go install github.com/pressly/goose/v3/cmd/goose@latest
goose -dir ./migrations postgres "$DATABASE_URL" down
```
Or use an ephemeral container on the same compose network:
```
docker run --rm --network <compose-network> \
-e GOOSE_DRIVER=postgres \
-e GOOSE_DBSTRING="postgres://xtablo:<password>@postgres:5432/xtablo?sslmode=disable" \
-v $(pwd)/migrations:/migrations \
ghcr.io/kukymbr/goose-docker:latest \
goose -dir /migrations down
```
After reverting the migration, the old binary will start cleanly.
## Incident Runbook
### /readyz returns 503
`/readyz` pings Postgres. A 503 means the web container cannot reach the database.
1. Check container status:
```
docker compose -f docker-compose.prod.yaml ps
```
2. If `postgres` is down or unhealthy, restart it:
```
docker compose -f docker-compose.prod.yaml up -d postgres
```
Then restart web and worker (they will wait for postgres to be healthy):
```
docker compose -f docker-compose.prod.yaml up -d
```
3. Check web logs for the actual error:
```
docker compose -f docker-compose.prod.yaml logs web --tail=50
```
All application logs are JSON when `ENV=production` is set. Look for
`"level":"ERROR"` lines with a `"msg":"db ping failed"` or similar.
### Caddy TLS certificate errors
1. Check caddy logs:
```
docker compose -f docker-compose.prod.yaml logs caddy --tail=50
```
2. If you see "too many certificates already issued for" (Let's Encrypt rate limit,
RESEARCH Pitfall 4):
- Caddy hit the 5 duplicate certificates per week limit for the domain.
- Confirm the `caddy_data` named volume exists and is mounted — if the volume was
accidentally deleted, Caddy cannot reuse the cached certificate and must
re-issue on every restart, quickly exhausting the rate limit.
- Recovery options:
- Wait up to 1 week for the rate limit window to reset.
- Switch to the Let's Encrypt staging endpoint temporarily (see
"Let's Encrypt staging" in the First-time setup section above).
- Restore from a `caddy_data` volume backup if available.
3. If the `caddy_data` volume was lost:
```
# Verify the volume still exists:
docker volume ls | grep caddy_data
# If missing, the volume must be recreated (certificates will be re-issued):
docker compose -f docker-compose.prod.yaml up -d caddy
```
### Checking logs
Follow logs for any service:
```
docker compose -f docker-compose.prod.yaml logs web --tail=100 --follow
docker compose -f docker-compose.prod.yaml logs worker --tail=100 --follow
docker compose -f docker-compose.prod.yaml logs caddy --tail=100 --follow
docker compose -f docker-compose.prod.yaml logs postgres --tail=50
```
All application logs are JSON in production (`ENV=production` activates the slog
JSON handler). Pipe through `jq` for readable output:
```
docker compose -f docker-compose.prod.yaml logs web --follow --no-log-prefix | jq .
```
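On a VM where `jq` is not installed, a plain text filter can still narrow the JSON logs down to errors. This is a sketch that assumes the log lines are compact JSON with no space after the colon, as in the samples shown in this README. `only_errors` is a hypothetical helper name.

```shell
# Keep only log lines that contain "level":"ERROR".
# Reads stdin, writes matching lines to stdout. The exit status is
# always 0, so an empty result does not abort a script using `set -e`.
only_errors() {
  grep -F '"level":"ERROR"' || true
}

# Example (not run here):
# docker compose -f docker-compose.prod.yaml logs web --no-log-prefix | only_errors
```

This is a substring match rather than real JSON parsing, so a message that happens to quote the same text would also be kept. Use `jq` where it is available.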
### Debugging the distroless container
The runtime image (`gcr.io/distroless/static-debian12:nonroot`) has **no shell**
(RESEARCH Pitfall 7). You cannot `docker exec -it <web-container> sh`.
To debug network or filesystem issues, attach an ephemeral busybox container to the
same network:
```
# Find the web container ID:
docker compose -f docker-compose.prod.yaml ps
# Attach busybox to the web container's network namespace:
docker run --rm -it --network container:<web-container-id> busybox sh
```
From the busybox shell you can run `wget`, `nc`, `ping`, etc. to diagnose
connectivity. To inspect the compose network directly (e.g. reach `postgres:5432`):
```
docker run --rm -it \
--network $(docker inspect <web-container-id> --format '{{range .NetworkSettings.Networks}}{{.NetworkID}}{{end}}') \
busybox sh
```