diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md index 6529eca..8d472d0 100644 --- a/.planning/ROADMAP.md +++ b/.planning/ROADMAP.md @@ -153,8 +153,13 @@ Plans: **Plans:** 3 plans Plans: +**Wave 1** - [ ] 06-01-PLAN.md — Wave 1: go get river + ListOrphanFiles sqlc query + internal/jobs/ package (HeartbeatWorker, OrphanCleanupWorker, SlogErrorHandler) + unit tests (WORK-01 partial, WORK-02, WORK-03, WORK-04) + +**Wave 2** *(blocked on Wave 1 completion)* - [ ] 06-02-PLAN.md — Wave 2: replace cmd/worker/main.go with full river wiring (rivermigrate + Client + periodic jobs + graceful shutdown) + just worker target + README section (WORK-01, WORK-02, WORK-03) + +**Wave 3** *(blocked on Wave 2 completion)* - [ ] 06-03-PLAN.md — Wave 3: Human-verify checkpoint — run just worker, observe heartbeat logs, verify graceful shutdown ### Phase 7: Deploy v1 diff --git a/.planning/STATE.md b/.planning/STATE.md index 3bd32d8..55c944d 100644 --- a/.planning/STATE.md +++ b/.planning/STATE.md @@ -3,13 +3,13 @@ gsd_state_version: 1.0 milestone: v1.0 milestone_name: milestone status: ready_to_plan -last_updated: "2026-05-15T13:13:11.604Z" +last_updated: "2026-05-15T14:30:16.643Z" progress: total_phases: 7 completed_phases: 5 - total_plans: 22 + total_plans: 25 completed_plans: 22 - percent: 100 + percent: 88 --- # STATE diff --git a/.planning/phases/06-background-worker/06-01-PLAN.md b/.planning/phases/06-background-worker/06-01-PLAN.md index 9e8e984..3e9b0be 100644 --- a/.planning/phases/06-background-worker/06-01-PLAN.md +++ b/.planning/phases/06-background-worker/06-01-PLAN.md @@ -24,11 +24,11 @@ requirements: must_haves: truths: - - "river and riverpgxv5 are importable from backend/internal/jobs/ without build errors" - - "HeartbeatWorker.Work() returns nil and logs 'worker heartbeat' with job_id and attempt fields" - - "OrphanCleanupWorker.Work() iterates orphan rows, deletes S3 object then DB row, logs per-run summary" - - "SlogErrorHandler implements river.ErrorHandler (both HandleError and HandlePanic compile)" - - "ListOrphanFiles sqlc query is generated and returns []ListOrphanFilesRow with {ID, TabloID, S3Key}" + - "D-01: river and riverpgxv5 are importable from backend/internal/jobs/ without build errors (go.mod contains github.com/riverqueue/river)" + - "D-04: HeartbeatWorker.Work() returns nil and logs 'worker heartbeat' with job_id and attempt fields" + - "D-04: OrphanCleanupWorker.Work() iterates orphan rows, deletes S3 object then DB row, logs per-run summary" + - "D-06: SlogErrorHandler implements river.ErrorHandler (both HandleError and HandlePanic compile)" + - "D-04: ListOrphanFiles sqlc query is generated and returns []ListOrphanFilesRow with {ID, TabloID, S3Key}" - "Unit tests for HeartbeatWorker, OrphanCleanupWorker, and SlogErrorHandler pass" artifacts: - path: "backend/go.mod" @@ -173,8 +173,8 @@ From backend/internal/db/sqlc/files.sql.go (generated pattern — ListOrphanFile - OrphanCleanupWorker struct embeds river.WorkerDefaults[OrphanCleanupArgs]; fields: pool *pgxpool.Pool, store files.FileStorer - NewOrphanCleanupWorker(pool *pgxpool.Pool, store files.FileStorer) *OrphanCleanupWorker constructor - Work(ctx, job): calls sqlc.New(w.pool).ListOrphanFiles(ctx); loops over results; deletes S3 object first (w.store.Delete(ctx, f.S3Key)), then calls sqlc.New(w.pool).DeleteTabloFile(ctx, sqlc.DeleteTabloFileParams{ID: f.ID, TabloID: f.TabloID}) for each; increments deleted/errCount counters; logs slog.Info("orphan cleanup complete", "orphans_found", len(orphans), "deleted", deleted, "errors", errCount); returns nil (does not fail the job for partial errors — logs them) - - TestOrphanCleanupWorker_NoOrphans: use mock store + real test DB (skip if no DB), confirm Work returns nil and logs "orphan cleanup complete" with orphans_found=0 - - TestOrphanCleanupWorker_DeletesOrphan: insert a tablo_files row with nonexistent tablo_id directly (bypassing FK), run Work, confirm the row is gone and s3 delete was called + - TestOrphanCleanupWorker_NoOrphans: pure unit test using a mock DB querier (implement a mockQuerier that satisfies the sqlc Querier interface, returning an empty []ListOrphanFilesRow) and a mock FileStorer; confirm Work returns nil + - TestOrphanCleanupWorker_DeletesOrphan: pure unit test using a mock DB querier that returns one fake ListOrphanFilesRow{ID: someUUID, TabloID: otherUUID, S3Key: "orphan-key"} and a mock FileStorer that records Delete calls; run Work and assert (a) mock FileStorer.Delete was called with "orphan-key", (b) mock querier.DeleteTabloFile was called with the matching ID and TabloID, (c) Work returns nil. Do NOT use a real test DB for this test — the FK constraint on tablo_files.tablo_id references tablos(id), making a real-DB orphan insert unreliable without complex setup. The mock approach gives deterministic coverage of the loop logic without DB dependency. SlogErrorHandler: - SlogErrorHandler struct with no fields @@ -191,7 +191,7 @@ From backend/internal/db/sqlc/files.sql.go (generated pattern — ListOrphanFile error_handler.go: Define SlogErrorHandler struct (no fields). Implement HandleError and HandlePanic per the behavior spec. Import "context", "github.com/riverqueue/river", "github.com/riverqueue/river/rivertype", "log/slog". - For the test files: heartbeat_test.go and error_handler_test.go use pure unit tests (no DB needed). orphan_cleanup_test.go: copy the setupTestDB helper from backend/internal/auth/testdb_test.go (it reads TEST_DATABASE_URL or DATABASE_URL, creates an isolated schema, runs goose migrations, returns pool + cleanup); use a fake in-memory FileStorer mock for the S3 side. Integration test for orphan cleanup inserts a tablo_files row with a uuid that has no matching tablos row (insert directly with no FK enforcement by first creating the file row while tablo exists, then hard-deleting the tablo row using `DELETE FROM tablos WHERE id = $1`; this triggers cascade but that's fine — alternatively, use a nonexistent UUID that was never inserted into tablos, which won't violate the FK constraint since you're inserting into tablo_files with a foreign key — instead, use testDB.Exec("INSERT INTO tablo_files (id, tablo_id, s3_key, filename, content_type, size_bytes) VALUES (gen_random_uuid(), gen_random_uuid(), 'orphan-key', 'orphan.txt', 'text/plain', 0)") to bypass the FK at test time; if the FK check prevents this, wrap in a deferred constraint or skip and only test the no-orphans path). + For the test files: heartbeat_test.go and error_handler_test.go use pure unit tests (no DB needed). orphan_cleanup_test.go uses pure unit tests with mock implementations — do NOT use a real test DB. Define a mockQuerier struct in the test file implementing the methods ListOrphanFiles(ctx) and DeleteTabloFile(ctx, params) used by OrphanCleanupWorker.Work; the mock records calls so tests can assert on them. Define a mockFileStorer struct implementing files.FileStorer with a Delete method that records the keys it was called with. TestOrphanCleanupWorker_NoOrphans: mockQuerier returns empty slice, assert Work returns nil. TestOrphanCleanupWorker_DeletesOrphan: mockQuerier returns one row, assert Delete was called with the correct S3 key and DeleteTabloFile was called with the correct params, assert Work returns nil. The verify command is: go test ./internal/jobs/... -run TestOrphanCleanupWorker -v -count=1. After writing all files: cd backend && go test ./internal/jobs/... @@ -199,7 +199,7 @@ From backend/internal/db/sqlc/files.sql.go (generated pattern — ListOrphanFile cd /Users/arthur.belleville/Documents/perso/projects/xtablo-source/backend && go build ./internal/jobs/... && go test ./internal/jobs/... -v -count=1 - go build ./internal/jobs/... exits 0. go test ./internal/jobs/... passes (or skips DB-dependent tests when no DB is available). All three files exist with correct package declarations and exported types. + go build ./internal/jobs/... exits 0. go test ./internal/jobs/... passes with no skips required (all tests are pure unit tests with mocks). All three files exist with correct package declarations and exported types. - `cd backend && go build ./internal/jobs/...` exits 0 @@ -210,7 +210,7 @@ From backend/internal/db/sqlc/files.sql.go (generated pattern — ListOrphanFile - `grep "SlogErrorHandler" backend/internal/jobs/error_handler.go` returns a line - `grep "HandleError" backend/internal/jobs/error_handler.go` returns a line - `grep "HandlePanic" backend/internal/jobs/error_handler.go` returns a line - - `cd backend && go test ./internal/jobs/... -v -count=1` exits 0 (DB tests may skip, but must not fail) + - `cd backend && go test ./internal/jobs/... -v -count=1` exits 0 (all tests pass — no DB required) - `cd backend && go build ./...` exits 0 (whole module compiles) @@ -238,7 +238,7 @@ From backend/internal/db/sqlc/files.sql.go (generated pattern — ListOrphanFile After Plan 01 is complete: - `cd backend && go build ./...` exits 0 -- `cd backend && go test ./internal/jobs/... -v` passes (skips if no DB, but does not error) +- `cd backend && go test ./internal/jobs/... -v` passes (all unit tests with mocks — no DB required) - `grep "riverqueue/river" backend/go.mod` shows at least 2 lines (river + riverpgxv5) - `grep "ListOrphanFiles" backend/internal/db/sqlc/files.sql.go` returns at least 1 line - `grep "HeartbeatWorker" backend/internal/jobs/heartbeat.go` returns 1 line @@ -249,7 +249,7 @@ After Plan 01 is complete: - river importable from internal/jobs/ without compile errors - HeartbeatWorker, OrphanCleanupWorker, SlogErrorHandler defined with correct signatures - ListOrphanFiles sqlc query generated with ListOrphanFilesRow type -- Unit tests for all three job types pass +- Unit tests for all three job types pass (pure mock-based, no DB dependency) - go build ./... exits 0 — whole module compiles clean diff --git a/.planning/phases/06-background-worker/06-02-PLAN.md b/.planning/phases/06-background-worker/06-02-PLAN.md index 4ec472a..4f52178 100644 --- a/.planning/phases/06-background-worker/06-02-PLAN.md +++ b/.planning/phases/06-background-worker/06-02-PLAN.md @@ -17,15 +17,15 @@ requirements: must_haves: truths: - - "cmd/worker/main.go runs rivermigrate at startup before constructing river.Client" - - "cmd/worker/main.go registers HeartbeatWorker and OrphanCleanupWorker via river.AddWorker" - - "cmd/worker/main.go passes slog.Default() as river.Config.Logger" - - "cmd/worker/main.go starts HeartbeatWorker as a PeriodicJob every 1 minute with RunOnStart: true" - - "cmd/worker/main.go starts OrphanCleanupWorker as a PeriodicJob every 1 hour with RunOnStart: false" - - "Graceful shutdown calls riverClient.StopAndCancel with a 10-second timeout context" + - "D-02: cmd/worker/main.go runs rivermigrate at startup before constructing river.Client" + - "D-05: cmd/worker/main.go registers HeartbeatWorker and OrphanCleanupWorker via river.AddWorker" + - "D-07: cmd/worker/main.go passes slog.Default() as river.Config.Logger (single river.Client)" + - "D-03: cmd/worker/main.go starts HeartbeatWorker as a PeriodicJob every 1 minute with RunOnStart: true (periodic jobs only)" + - "D-03: cmd/worker/main.go starts OrphanCleanupWorker as a PeriodicJob every 1 hour with RunOnStart: false" + - "D-07: Graceful shutdown calls riverClient.StopAndCancel with a 10-second timeout context" - "go build ./cmd/worker/... exits 0" - - "just worker target exists in justfile and passes MinIO dev defaults as env vars" - - "README.md documents how to run the worker locally" + - "D-08: just worker target exists in justfile; README.md documents single-worker constraint" + - "README.md documents how to run the worker locally alongside the web binary" artifacts: - path: "backend/cmd/worker/main.go" provides: "full river wiring — rivermigrate + river.Client + periodic jobs + graceful shutdown" @@ -214,7 +214,7 @@ CRITICAL ORDERING CONSTRAINT (RESEARCH Pitfall 2 + PATTERNS.md): - `grep "^worker:" backend/justfile` returns exactly 1 line - - `grep "S3_ENDPOINT" backend/justfile` returns at least 2 lines (one in dev equivalent path, one in worker target) + - `grep -A 20 "^worker:" backend/justfile | grep -c "S3_ENDPOINT"` returns 1 (S3_ENDPOINT confirmed inside the worker target block) - `grep "go run ./cmd/worker" backend/justfile` returns a line inside the worker target - `grep "just worker" backend/README.md` returns at least 1 line - `grep -i "single.*worker\|one worker\|multiple worker" backend/README.md` returns a line (D-08 documented) diff --git a/.planning/phases/06-background-worker/06-03-PLAN.md b/.planning/phases/06-background-worker/06-03-PLAN.md index 94c96d9..84f1a20 100644 --- a/.planning/phases/06-background-worker/06-03-PLAN.md +++ b/.planning/phases/06-background-worker/06-03-PLAN.md @@ -56,8 +56,10 @@ Output: Developer confirmation that WORK-01 through WORK-04 are satisfied in a l - `"worker ready"` — confirms binary started without errors - `"worker heartbeat"` — RunOnStart:true fires the heartbeat immediately on first tick; expect this within ~10 seconds - 5. Wait ~60 seconds for a second heartbeat log line — confirms the periodic scheduler is running. - Look for: `msg="worker heartbeat" job_id=... attempt=1` + 5. Wait until you have observed AT LEAST 2 separate `"worker heartbeat"` log lines separated by approximately 60 seconds. + The first fires on startup (RunOnStart); the second confirms the PeriodicJob scheduler is running correctly and is not just a one-shot. + Look for: `msg="worker heartbeat" job_id=... attempt=1` appearing twice with ~60 seconds between entries. + Do not proceed to the next step until both heartbeats have been observed. 6. Send SIGINT: press Ctrl+C in the worker terminal. 7. Confirm these log lines appear in order: @@ -73,7 +75,8 @@ Output: Developer confirmation that WORK-01 through WORK-04 are satisfied in a l 8. `cd backend && go test ./... && go build ./...` — must exit 0 - Type "approved" if all log lines appeared and shutdown was clean. + Do not approve until you have observed at least 2 "worker heartbeat" log lines separated by approximately 60 seconds — this confirms the PeriodicJob scheduler is running correctly, not just the RunOnStart fire. + Type "approved" once both heartbeats are visible and shutdown was clean. Type "issues: [describe]" if any step failed or was unclear. @@ -99,7 +102,7 @@ Manual verification only (this plan IS the verification step). Pass criteria: 1. `"worker ready"` appears in logs within 5 seconds of `just worker` 2. `"worker heartbeat"` appears within 10 seconds (RunOnStart fires first heartbeat immediately) -3. A second `"worker heartbeat"` appears ~60 seconds later (confirms scheduler is running) +3. A second `"worker heartbeat"` appears ~60 seconds later (confirms scheduler is running — REQUIRED before approval) 4. Ctrl+C produces `"shutting down"` then `"shutdown complete"` with no panic 5. `cd backend && go test ./... && go build ./...` exits 0 @@ -107,6 +110,7 @@ Manual verification only (this plan IS the verification step). Pass criteria: - Worker starts, connects to Postgres, applies river migrations, and logs "worker ready" - Heartbeat job fires within seconds of startup (RunOnStart: true) +- Second heartbeat fires ~60 seconds later confirming the scheduler loop is active - Graceful shutdown works: SIGINT → "shutting down" → "shutdown complete" - Full test suite green: go test ./... && go build ./... exits 0 - All WORK-01 through WORK-04 requirements satisfied as observable behaviors diff --git a/.planning/phases/06-background-worker/06-RESEARCH.md b/.planning/phases/06-background-worker/06-RESEARCH.md index 295ffe5..015edd0 100644 --- a/.planning/phases/06-background-worker/06-RESEARCH.md +++ b/.planning/phases/06-background-worker/06-RESEARCH.md @@ -632,17 +632,17 @@ worker-logs: --- -## Open Questions +## Open Questions (RESOLVED) 1. **S3 env var names for worker** - What we know: Phase 5 wired S3 env vars in `cmd/web`. Worker needs the same vars. - What's unclear: Whether the planner reuses the same env var names or introduces `WORKER_S3_*` variants. - - Recommendation: Reuse the same env var names (`S3_ENDPOINT`, `S3_BUCKET`, etc.) — both binaries share the same `.env.example` pattern. + - RESOLVED: Reuse the same env var names (`S3_ENDPOINT`, `S3_BUCKET`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`, `S3_REGION`, `S3_USE_PATH_STYLE`) — both binaries share the same `.env.example` and compose.yaml MinIO defaults. 2. **DeleteTabloFile query takes (id, tablo_id) — orphan rows have no owning tablo** - What we know: `DeleteTabloFile` has signature `WHERE id = $1 AND tablo_id = $2`. - What's unclear: For a true orphan (tablo deleted bypassing cascade), the `tablo_id` column still holds the old UUID even though the tablo row is gone. `$2` matches the stored value, so the DELETE still works. - - Recommendation: No change needed — the query works correctly. Planner should note this in task comments. + - RESOLVED: No change needed — the query works correctly for orphan rows. The `tablo_id` FK column retains the old UUID value; passing it as `$2` to `DeleteTabloFile` succeeds. Both fields (`ID`, `TabloID`) are available from `ListOrphanFilesRow`. ---