SC-3 (failed-job CLI surface) and SC-4 (web binary enqueue) not yet implemented. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
14 KiB
| phase | verified | status | score | overrides_applied | gaps | human_verification | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 06-background-worker | 2026-05-15T17:00:00Z | gaps_found | 3/5 roadmap success criteria verified | 0 |
|
|
Phase 6: Background Worker Verification Report
Phase Goal: Implement a background worker binary using River for job queuing — heartbeat job, orphan file cleanup job, error handler, graceful shutdown, and developer-runnable via just worker.
Verified: 2026-05-15T17:00:00Z
Status: gaps_found
Re-verification: No — initial verification
Goal Achievement
Observable Truths (Roadmap Success Criteria)
| # | Truth | Status | Evidence |
|---|---|---|---|
| SC-1 | cmd/worker starts, connects to Postgres, registers handlers; structured logging and graceful shutdown work |
VERIFIED | cmd/worker/main.go implements full startup sequence: db.NewPool, rivermigrate, signal.NotifyContext, river.Client with slog.Default(), StopAndCancel(10s); go build ./... exits 0; live verification in Plan 03 checkpoint |
| SC-2 | At least one real job runs on a schedule and is observable in logs | VERIFIED | HeartbeatWorker (PeriodicInterval 1min, RunOnStart:true) and OrphanCleanupWorker (1hr) registered; heartbeat logs "worker heartbeat" with job_id and attempt; Plan 03 checkpoint confirms two heartbeat log lines ~60s apart in live environment |
| SC-3 | A failing job is retried with backoff and visible via a simple CLI surface (backend list-failed-jobs or admin route) |
FAILED | SlogErrorHandler logs errors correctly. No list-failed-jobs CLI command exists. No admin route exists. README documents the river_job table and log output as the observation mechanism, but the ROADMAP SC explicitly requires a CLI surface or admin route. |
| SC-4 | Web binary can enqueue a job; worker picks it up within a few seconds | FAILED | cmd/web has zero river dependency. grep "river" backend/cmd/web/main.go returns nothing. grep -r "river" backend/internal/web/ returns nothing. Web-to-worker job enqueue wiring is entirely absent. |
| SC-5 | README documents how to run the worker locally alongside the web binary | VERIFIED | README has "Running the Worker" section with just worker command, expected log sequence, single-worker constraint (D-08), and failed-job retry observation instructions |
Score: 3/5 roadmap success criteria verified
Required Artifacts
| Artifact | Expected | Status | Details |
|---|---|---|---|
backend/go.mod |
river v0.37.0 + riverpgxv5 | VERIFIED | Lines 42-46: github.com/riverqueue/river v0.37.0 // indirect, riverpgxv5 v0.37.0 // indirect present. Marked // indirect because no main package directly imports them yet — river deps show as indirect until go.sum is recomputed with direct imports from cmd/worker. Build succeeds regardless. |
backend/internal/jobs/heartbeat.go |
HeartbeatArgs + HeartbeatWorker | VERIFIED | HeartbeatArgs with Kind()="heartbeat"; HeartbeatWorker embeds river.WorkerDefaults[HeartbeatArgs]; Work() logs slog.Info("worker heartbeat", "job_id", "attempt") and returns nil |
backend/internal/jobs/orphan_cleanup.go |
OrphanCleanupArgs + OrphanCleanupWorker + constructor | VERIFIED | OrphanCleanupArgs Kind()="orphan_file_cleanup"; OrphanCleanupWorker with pool, store, fileQuerier; NewOrphanCleanupWorker wires sqlc.New(pool) as querier; Work() does S3-delete-before-DB-delete loop with error counting |
backend/internal/jobs/error_handler.go |
SlogErrorHandler implementing river.ErrorHandler | VERIFIED | SlogErrorHandler with HandleError (logs job_id, job_kind, attempt, max_attempts, err) and HandlePanic (logs job_id, job_kind, panic, trace); both return nil |
backend/internal/db/sqlc/files.sql.go |
ListOrphanFiles + ListOrphanFilesRow{ID, TabloID, S3Key} | VERIFIED | Generated: type ListOrphanFilesRow struct { ID uuid.UUID; TabloID uuid.UUID; S3Key string } and func (q *Queries) ListOrphanFiles(ctx context.Context) ([]ListOrphanFilesRow, error) at lines 132-140 |
backend/cmd/worker/main.go |
Full river wiring — rivermigrate + periodic jobs + graceful shutdown | VERIFIED | All required elements present: rivermigrate.New at startup, signal.NotifyContext AFTER startup I/O, river.AddWorker for both workers, river.NewClient with slog.Default() Logger, two PeriodicJobs (heartbeat RunOnStart:true, orphan cleanup no RunOnStart), StopAndCancel(10s), explicit pool.Close() |
backend/justfile |
worker: target with MinIO dev defaults | VERIFIED | worker: db-up target at line 116 with DATABASE_URL, S3_ENDPOINT, S3_BUCKET, S3_REGION, S3_ACCESS_KEY, S3_SECRET_KEY, S3_USE_PATH_STYLE env vars on separate continuation lines; runs go run ./cmd/worker |
backend/README.md |
Worker local dev instructions | VERIFIED | "Running the Worker" section with just worker, single-worker constraint, expected log sequence, failed job observation |
Key Link Verification
| From | To | Via | Status | Details |
|---|---|---|---|---|
cmd/worker/main.go |
internal/jobs/ |
jobs.HeartbeatWorker, jobs.NewOrphanCleanupWorker, jobs.SlogErrorHandler |
WIRED | All three imported and used at lines 93-94, 101 |
cmd/worker/main.go |
rivermigrate |
rivermigrate.New(riverpgxv5.New(pool), nil).Migrate(ctx, DirectionUp, nil) |
WIRED | Lines 56-68; result ranges over res.Versions with slog.Info |
cmd/worker/main.go |
river.Client |
river.NewClient(riverpgxv5.New(pool), &river.Config{...}) |
WIRED | Line 97-116; all config fields present |
orphan_cleanup.go |
backend/internal/db/sqlc/files.sql.go |
q.ListOrphanFiles(ctx) via fileQuerier interface |
WIRED | Interface defined at lines 22-25; production wiring via sqlc.New(pool) in constructor |
orphan_cleanup.go |
backend/internal/files/store.go |
w.store.Delete(ctx, f.S3Key) |
WIRED | Line 63; FileStorer.Delete called before DB delete |
Data-Flow Trace (Level 4)
| Artifact | Data Variable | Source | Produces Real Data | Status |
|---|---|---|---|---|
orphan_cleanup.go Work() |
orphans []ListOrphanFilesRow |
q.ListOrphanFiles(ctx) via sqlc query against tablo_files WHERE NOT EXISTS (SELECT 1 FROM tablos WHERE id = tf.tablo_id) |
Yes — live DB query with NOT EXISTS subquery | FLOWING |
heartbeat.go Work() |
job.ID, job.Attempt |
*river.Job[HeartbeatArgs] injected by river runtime |
Yes — populated by river from river_job table row |
FLOWING |
Behavioral Spot-Checks
| Behavior | Command | Result | Status |
|---|---|---|---|
| All jobs unit tests pass | cd backend && go test ./internal/jobs/... -v -count=1 |
7/7 PASS (TestSlogErrorHandler_HandleError, TestSlogErrorHandler_HandlePanic, TestHeartbeatWorker, TestHeartbeatArgs_Kind, TestOrphanCleanupWorker_NoOrphans, TestOrphanCleanupWorker_DeletesOrphan, TestOrphanCleanupArgs_Kind) | PASS |
| Full module builds | cd backend && go build ./... |
exit 0 | PASS |
| worker target exists in justfile | grep "^worker:" backend/justfile |
line 116: worker: db-up |
PASS |
| rivermigrate.New present | grep "rivermigrate.New" backend/cmd/worker/main.go |
line 56 | PASS |
| Web binary has no river wiring | grep -r "river" backend/cmd/web/ backend/internal/web/ |
0 results | FAIL (SC-4 gap confirmed) |
| list-failed-jobs CLI absent | grep -r "list.failed|list-failed" backend/ |
0 results | FAIL (SC-3 gap confirmed) |
Probe Execution
Step 7c: SKIPPED — no scripts/*/tests/probe-*.sh files exist for this phase. The phase declared Plan 03 as a human-verify checkpoint (autonomous: false).
Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|---|---|---|---|---|
| WORK-01 | 06-01, 06-02 | cmd/worker binary connects to Postgres and runs a job queue |
PARTIALLY SATISFIED | Binary connects to Postgres and runs river periodic jobs. However SC-4 (web binary enqueues a job) is part of the WORK-01 observable proof and is absent. |
| WORK-02 | 06-01, 06-02 | At least one real job runs end-to-end | SATISFIED | HeartbeatWorker and OrphanCleanupWorker both run on schedules; live verification confirmed in Plan 03 checkpoint |
| WORK-03 | 06-01, 06-02 | Worker has structured logging and graceful shutdown matching web binary | SATISFIED | slog.Default() used as Logger; StopAndCancel(10s) + explicit pool.Close(); signal.NotifyContext for SIGINT/SIGTERM |
| WORK-04 | 06-01 | Failed jobs retried with backoff and visible in a simple admin/CLI surface | PARTIALLY SATISFIED | River's built-in retry with exponential backoff (attempts^4) is active. SlogErrorHandler logs each failure. The river_job table persists discarded jobs. No CLI command or admin route provides the "visible in a simple admin/CLI surface" guarantee the requirement names. |
Note on traceability discrepancy: REQUIREMENTS.md line 123 still shows WORK-01..04 | Phase 6 | Pending despite all four requirements being marked [x] in the requirements list (lines 58-61). The traceability table was not updated after completion.
Anti-Patterns Found
| File | Line | Pattern | Severity | Impact |
|---|---|---|---|---|
backend/go.mod |
42-46 | river deps marked // indirect |
Info | Build succeeds; indirect marks will self-correct when go mod tidy is run after cmd/worker imports are confirmed. Not a blocker. |
No TBD, FIXME, XXX, placeholder, or empty-return stubs found in any phase-modified files.
Human Verification Required
1. Live Worker Run
Test: Run cd backend && just worker against local Postgres + MinIO
Expected: "worker ready" within 5s, "worker heartbeat" within 10s (RunOnStart:true), second "worker heartbeat" ~60s later, Ctrl+C produces "shutting down" then "shutdown complete" with clean exit
Why human: Scheduler live-run behavior, actual Postgres + MinIO connectivity, and signal handling require a running process
Gaps Summary
Two roadmap success criteria are not met:
SC-3 (WORK-04 admin/CLI surface): The ROADMAP explicitly names "a simple CLI surface (backend list-failed-jobs or admin route)". The implementation provides SlogErrorHandler that logs errors to structured slog output and river retries automatically, but there is no list-failed-jobs subcommand in the worker binary and no admin HTTP route. Visibility of failed jobs requires querying the river_job table directly or reading logs — not a CLI surface. This gap is well-understood: SlogErrorHandler provides observability, but the SC contract asks for an addressable surface (CLI command or HTTP route).
SC-4 (web binary enqueues a job): The ROADMAP requires "Web binary can enqueue a job; worker picks it up within a few seconds." The web binary (cmd/web) has zero river dependency. This is a material gap — it means the two-binary architecture's job-dispatch path is unproven. All phase work focused on the worker side; the web-side enqueue path was never wired.
These two gaps share a root cause: the plans focused on WORK-01 through WORK-03 (worker binary, periodic jobs, graceful shutdown) and treated WORK-04 as satisfied by logging alone, while SC-3 and SC-4 both require active wiring that extends beyond the worker binary itself.
Verified: 2026-05-15T17:00:00Z Verifier: Claude (gsd-verifier)