From 9d1e24fc7e6e5e3a5e88d6a9d207275151234ceb Mon Sep 17 00:00:00 2001
From: Arthur Belleville
Date: Fri, 15 May 2026 15:13:06 +0200
Subject: [PATCH] docs(06): capture phase context

---
 .../phases/06-background-worker/06-CONTEXT.md | 113 +++++++++++++++++
 .../06-background-worker/06-DISCUSSION-LOG.md | 116 ++++++++++++++++++
 2 files changed, 229 insertions(+)
 create mode 100644 .planning/phases/06-background-worker/06-CONTEXT.md
 create mode 100644 .planning/phases/06-background-worker/06-DISCUSSION-LOG.md

diff --git a/.planning/phases/06-background-worker/06-CONTEXT.md b/.planning/phases/06-background-worker/06-CONTEXT.md
new file mode 100644
index 0000000..bce1a8a
--- /dev/null
+++ b/.planning/phases/06-background-worker/06-CONTEXT.md
@@ -0,0 +1,113 @@
+# Phase 6: Background Worker - Context
+
+**Gathered:** 2026-05-15
+**Status:** Ready for planning
+
+
+## Phase Boundary
+
+A `cmd/worker` binary runs alongside `cmd/web` against the same Postgres instance, processes periodic jobs via a river-backed queue, and proves the setup end-to-end with two real jobs: a heartbeat and an orphan-file cleanup. The worker skeleton already exists (`backend/cmd/worker/main.go`) — this phase replaces its noop body with the real river runtime.
+
+Delivers WORK-01, WORK-02, WORK-03, WORK-04. **Not in scope:** web-side job enqueueing (event-triggered jobs from HTTP handlers), multi-worker leader election, per-job HTTP admin UI, CLI subcommands for job management, Redis or any non-Postgres queue backend.
+
+
+
+
+## Implementation Decisions
+
+### Queue Library
+- **D-01:** Use **river** (github.com/riverqueue/river) as the job queue. Postgres-native, adds one Go dependency, provides built-in retry/backoff and advisory locking. Matches the single-VPS / Postgres-only constraint.
+- **D-02:** River's schema is managed via **`rivermigrate`** run programmatically at worker startup (before the client starts listening). No goose migration needed for river's internal tables — river owns its own migration path.
+- **D-03:** Phase 6 uses **periodic jobs only** (river's `PeriodicJob`). Web-side enqueueing from HTTP handlers is deferred to a later phase when a real trigger exists (e.g. post-upload processing).
+
+### Proof-of-Life Jobs
+- **D-04:** Two jobs ship in Phase 6:
+  1. **Heartbeat** — logs a structured "worker heartbeat" line every **1 minute**. Proves the scheduler is running; observable in logs during development.
+  2. **Orphan-file cleanup** — runs every **1 hour**. Finds `tablo_files` rows whose owning tablo no longer exists (hard-deleted) and deletes both the DB row and the corresponding S3 object. Uses the same DB pool and S3 client established in Phase 5.
+- **D-05:** Both jobs are registered as `river.PeriodicJob` entries in the river client constructor at worker startup.
+
+### Failed Job Visibility
+- **D-06:** Failed jobs are surfaced via **structured logs only**. River emits a log event on each failure and on discard (max retries exceeded). Log fields include: job ID, job type, error message, attempt count, next retry time. WORK-04's "visible via a simple CLI surface" is satisfied by log observability — no dedicated CLI command or admin route in Phase 6.
+
+### Job Scheduling Model
+- **D-07:** A **single `river.Client`** is created in `cmd/worker/main.go`, all periodic jobs registered at startup, client started, binary blocks on SIGINT/SIGTERM. No external coordination layer.
+- **D-08:** **Single-worker constraint** — only one worker instance should run in v1. README documents: do not run multiple worker processes until leader election is added. River's advisory locking exists but is not relied on in this phase.
+
+### Claude's Discretion
+- Exact log fields emitted by the heartbeat job (beyond a basic "heartbeat" message — e.g. worker uptime, job count).
+- Whether the orphan-file cleanup job logs a summary after each run (rows deleted, S3 objects deleted, errors encountered).
+- River client configuration details: worker concurrency, max attempts before discard, queue name.
+- Whether the orphan detection query uses a LEFT JOIN or a NOT IN / NOT EXISTS pattern — planner's call.
+
+
+
+
+## Canonical References
+
+**Downstream agents MUST read these before planning or implementing.**
+
+### Requirements
+- `.planning/REQUIREMENTS.md` §Worker (WORK-01..04) — The 4 worker requirements this phase delivers
+- `.planning/PROJECT.md` — Core value statement and constraints (single binary + background worker, same Postgres, single VPS)
+- `.planning/ROADMAP.md` §Phase 6 — Success criteria and user-in-loop decisions
+
+### Prior Phase Context (locked decisions that constrain this phase)
+- `.planning/phases/05-files/05-CONTEXT.md` — D-01..D-06 (S3 client setup, MinIO in compose.yaml, file deletion pattern) — orphan-file cleanup job reuses the same S3 client and key format
+- `.planning/phases/01-foundation/01-CONTEXT.md` — `cmd/web` and `cmd/worker` entrypoints, goose migration conventions, justfile targets
+
+### Codebase Entry Points
+- `backend/cmd/worker/main.go` — Existing skeleton: pgxpool connect + slog + graceful shutdown. Phase 6 replaces the noop body with the river client.
+- `backend/internal/db/` — Shared DB pool (`db.NewPool`), sqlc-generated types. Worker reuses these.
+- `backend/internal/files/store.go` — S3 client and file operations. Orphan-cleanup job imports this package.
+- `backend/internal/db/queries/files.sql` — sqlc queries for tablo_files. Orphan-cleanup query added here.
+- `backend/compose.yaml` — MinIO already present from Phase 5. No new services needed.
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- `db.NewPool(ctx, dsn)` — already called in the worker skeleton; river client wraps the same pool.
+- `web.NewSlogHandler(env, os.Stdout)` — structured logging setup already in the worker; river's logger adapter should use the same slog default.
+- `backend/internal/files/store.go` — S3 delete operation already implemented in Phase 5; orphan-cleanup job calls `store.DeleteFile(ctx, key)` directly.
+- `signal.NotifyContext` pattern — already in `cmd/worker/main.go`; river client's `Stop()` hooks into the same context cancellation.
+
+### Established Patterns
+- Handler/store separation: domain logic lives in `internal//store.go`, not in cmd packages. The orphan-cleanup job's DB query lives in `internal/db/queries/files.sql` (sqlc), not inline in cmd/worker.
+- goose migrations numbered sequentially. Phase 5 adds `0005_files.sql`. If any app-level schema change is needed for worker (unlikely — river manages its own tables), it would be `0006_*.sql`.
+- `just generate` runs sqlc after any `.sql` query change.
+- `backend/.env.example` — new env vars (if any, e.g. worker-specific config) should be documented here.
+
+### Integration Points
+- `backend/cmd/worker/main.go` — Replace noop body with: `rivermigrate.New(riverpgxv5.New(pool), nil)` and `Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{})` → construct `river.Client` with periodic job registrations → `client.Start(ctx)` → `<-ctx.Done()` → `client.Stop(stopCtx)`, where `stopCtx` is a fresh timeout context because `ctx` is already cancelled at that point.
+- `backend/internal/db/queries/files.sql` — Add orphan detection query: find `tablo_files` rows where the owning `tablo_id` no longer exists in `tablos`.
+- `backend/go.mod` — Add `github.com/riverqueue/river` and `github.com/riverqueue/river/riverdriver/riverpgxv5`.
+- `backend/justfile` — Add `worker` target for local dev (`just worker` starts the worker binary).
+
+
+
+
+## Specific Ideas
+
+- Heartbeat interval: **1 minute** in production and local dev (frequent enough to observe in logs quickly).
+- Orphan-file cleanup interval: **1 hour** — orphans don't accumulate fast at v1 scale; safe to run hourly.
+- S3 object key format for orphan cleanup: `files/{tablo_id}/{uuid}` (locked by Phase 5 D-04). The cleanup job reads the key directly from the `tablo_files.s3_key` column, so no reconstruction is needed.
+- The orphan-cleanup job should log a per-run summary: how many orphan rows found, how many S3 objects deleted, how many errors. Useful for verifying the job ran correctly.
+
+
+
+
+## Deferred Ideas
+
+- **Web-side job enqueueing** (river client in `cmd/web`) — deferred to a later phase when a real event-triggered use case appears (e.g. post-upload thumbnail generation, email dispatch).
+- **Multiple worker instances / leader election** — deferred; single-worker constraint documented for v1.
+- **Job admin UI or CLI subcommand** (`backend list-failed-jobs`) — deferred; log observability satisfies v1.
+- **Redis / asynq** — explicitly out of scope; Postgres-only stack.
+
+
+
+---
+
+*Phase: 6-Background-Worker*
+*Context gathered: 2026-05-15*
diff --git a/.planning/phases/06-background-worker/06-DISCUSSION-LOG.md b/.planning/phases/06-background-worker/06-DISCUSSION-LOG.md
new file mode 100644
index 0000000..c01db7c
--- /dev/null
+++ b/.planning/phases/06-background-worker/06-DISCUSSION-LOG.md
@@ -0,0 +1,116 @@
+# Phase 6: Background Worker - Discussion Log
+
+> **Audit trail only.** Do not use as input to planning, research, or execution agents.
+> Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.
+
+**Date:** 2026-05-15
+**Phase:** 6-background-worker
+**Areas discussed:** Queue library, Proof-of-life job, Failed job surface, Job scheduling model
+
+---
+
+## Queue Library
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| river (Postgres-native) | Postgres-native job queue, one Go dep, built-in retry/backoff | ✓ |
+| Hand-rolled pg_notify | Zero new deps, full control, but owns retry/backoff yourself | |
+| asynq (Redis) | Adds Redis to compose.yaml, richer UI but conflicts with Postgres-only thesis | |
+
+**User's choice:** river
+
+**Migration management:**
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| rivermigrate at startup (programmatic) | Run rivermigrate before client starts; zero manual SQL | ✓ |
+| Embed into goose migrations | Copy river SQL into 0006_river.sql; manual sync on upgrades | |
+
+**User's choice:** rivermigrate at startup
+
+**Scheduling scope:**
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Periodic only for Phase 6 | Prove wiring with scheduled jobs; web-side enqueueing deferred | ✓ |
+| Both periodic + web-side enqueue | Wire river client into cmd/web for full flow | |
+
+**User's choice:** Periodic only
+
+**Notes:** User selected the recommended option for all three queue-library sub-decisions, confirming Postgres-only constraint alignment.
+
+---
+
+## Proof-of-Life Job
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Orphan-file cleanup | Finds and deletes tablo_files rows/S3 objects where tablo was deleted | |
+| Heartbeat / noop | Logs on schedule; proves scheduler but no domain value | ✓ (initial) |
+| Signed-URL prewarm | Pre-generates download URLs for recent files | |
+
+**Follow-up (heartbeat only vs. heartbeat + cleanup):**
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Heartbeat only | Minimal Phase 6 | |
+| Heartbeat + orphan-file cleanup | Both jobs; heartbeat proves scheduling, cleanup proves DB+S3 | ✓ |
+
+**Intervals:**
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Cleanup: hourly; Heartbeat: every minute | Frequent heartbeat visibility, safe hourly cleanup | ✓ |
+| Both every minute | More log noise, maximum dev observability | |
+
+**User's choice:** Heartbeat (1 min) + orphan-file cleanup (1 hr)
+
+**Notes:** User initially chose heartbeat-only for proof, then selected adding cleanup as a second job. Cleanup is the more meaningful domain job; heartbeat proves the periodic scheduler plumbing.
+
+---
+
+## Failed Job Surface
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| CLI subcommand: backend list-failed-jobs | Queries river_jobs for failed rows; satisfies WORK-04 literally | |
+| Structured logs only | Rich log fields on failure; redefines WORK-04 as log observability | ✓ |
+| Admin HTTP route on worker | GET /admin/jobs/failed on separate port | |
+
+**User's choice:** Structured logs only — WORK-04's "CLI surface" interpreted as log observability.
+
+**Notes:** User rejected a follow-up push to add a CLI command alongside logs, then confirmed the logs-only decision on second presentation. Decision is deliberate.
+
+---
+
+## Job Scheduling Model
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Single river.Client with all periodic jobs at startup | One client, PeriodicJob registrations, block on shutdown | ✓ |
+| Separate goroutines with time.Ticker | Hand-rolled scheduling outside river | |
+
+**Concurrency model:**
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Document single-worker constraint | README notes: one worker only in v1 | ✓ |
+| Rely on river's advisory locks | River prevents duplicate execution across instances | |
+
+**User's choice:** Single river.Client; document single-worker constraint in README.
+
+---
+
+## Claude's Discretion
+
+- Exact log fields emitted by heartbeat job (uptime, job count, etc.)
+- Whether orphan-cleanup logs a per-run summary
+- River client configuration (concurrency, max attempts, queue name)
+- Orphan detection query pattern (LEFT JOIN vs NOT EXISTS)
+
+## Deferred Ideas
+
+- Web-side job enqueueing (river client in cmd/web) — future phase when event-triggered use case appears
+- Multiple worker instances / leader election — after v1 ships
+- Job admin UI or CLI subcommand — logs satisfy v1 observability needs
+- Redis / asynq — explicitly out of scope (Postgres-only constraint)