2026-05-23 15:16:44 +00:00
1 changed files with 256 additions and 0 deletions
--- a/.planning/phases/06-background-worker/06-REVIEW.md
+++ b/.planning/phases/06-background-worker/06-REVIEW.md
@ -0,0 +1,256 @@
+---
+phase: 06-background-worker
+reviewed: 2026-05-15T00:00:00Z
+depth: standard
+files_reviewed: 11
+files_reviewed_list:
+  - backend/README.md
+  - backend/cmd/worker/main.go
+  - backend/go.mod
+  - backend/internal/db/queries/files.sql
+  - backend/internal/jobs/error_handler.go
+  - backend/internal/jobs/error_handler_test.go
+  - backend/internal/jobs/heartbeat.go
+  - backend/internal/jobs/heartbeat_test.go
+  - backend/internal/jobs/orphan_cleanup.go
+  - backend/internal/jobs/orphan_cleanup_test.go
+  - backend/justfile
+findings:
+  critical: 1
+  warning: 5
+  info: 4
+  total: 10
+status: issues_found
+---
+
+# Phase 06: Code Review Report
+
+**Reviewed:** 2026-05-15T00:00:00Z
+**Depth:** standard
+**Files Reviewed:** 11
+**Status:** issues_found
+
+## Summary
+
+This phase implements a background worker binary (`cmd/worker`) that runs two River periodic jobs: a heartbeat (proof-of-life) and an orphan-file cleanup job (delete S3 objects + DB rows for files whose owning tablo no longer exists). The architecture is sound — signal-context ordering, graceful shutdown, and the DB→S3-first deletion order are all correct. However, there is one critical security defect (hardcoded cryptographic secret in version control), several warnings around missing input validation and silent failure modes, and a handful of code-quality issues including a misleading comment that contradicts the actual logic.
+
+---
+
+## Critical Issues
+
+### CR-01: Cryptographic session secret committed to version control
+
+**File:** `backend/justfile:113`
+**Issue:** A full 64-hex-character session secret (`191affeb1624de1f0e07bd5cfab14cd655510a24f7e673bd784ea56847890caf`) is hardcoded in the committed `justfile`. Anyone with read access to the repository can use this value to forge sessions. Even when labeled "development," committing a real-looking secret trains developers to treat the value as safe to reuse, and it will eventually be copy-pasted into staging or production `.env` files.
+
+**Fix:** Remove the literal secret from the justfile. Replace it with a placeholder that fails loudly if unset:
+
+```bash
+# In justfile dev recipe — read from .env so the value is never committed:
+dev: db-up
+    just generate
+    set -a && . .env && set +a && \
+    air -c .air.toml
+```
+
+And in `.env.example`:
+```
+# Generate with: openssl rand -hex 32
+SESSION_SECRET=REPLACE_ME
+```
+
+If a default is truly required for zero-config onboarding, generate it dynamically at bootstrap time and write it to `.env`:
+```bash
+# In bootstrap recipe, only if .env does not already contain SESSION_SECRET:
+grep -q SESSION_SECRET .env 2>/dev/null || \
+  printf '\nSESSION_SECRET=%s\n' "$(openssl rand -hex 32)" >> .env
+```
+
+---
+
+## Warnings
+
+### WR-01: No validation of required S3 environment variables before calling NewStore
+
+**File:** `backend/cmd/worker/main.go:71-85`
+**Issue:** `S3_ENDPOINT`, `S3_BUCKET`, `S3_ACCESS_KEY`, and `S3_SECRET_KEY` are read from the environment with no presence check. If any are unset or empty, `files.NewStore` is called with empty strings. The AWS SDK constructs a client with an empty endpoint and empty credentials without returning an error — the worker starts and logs "worker ready", but the orphan cleanup job then fails at runtime on its first execution (one hour later by default). The failure is invisible at startup.
+
+This is a worse failure mode than DATABASE_URL (which exits immediately on line 34). S3 config errors are deferred an hour and produce only a log line.
+
+**Fix:** Add explicit checks before calling `NewStore`:
+
+```go
+s3Endpoint := os.Getenv("S3_ENDPOINT")
+s3Bucket   := os.Getenv("S3_BUCKET")
+s3AccessKey := os.Getenv("S3_ACCESS_KEY")
+s3SecretKey := os.Getenv("S3_SECRET_KEY")
+
+for name, val := range map[string]string{
+    "S3_ENDPOINT":   s3Endpoint,
+    "S3_BUCKET":     s3Bucket,
+    "S3_ACCESS_KEY": s3AccessKey,
+    "S3_SECRET_KEY": s3SecretKey,
+} {
+    if val == "" {
+        slog.Error("required env var is unset", "var", name)
+        os.Exit(1)
+    }
+}
+```
+
+---
+
+### WR-02: Dead code branch masks unclear struct ownership contract
+
+**File:** `backend/internal/jobs/orphan_cleanup.go:34, 51-54`
+**Issue:** The struct field comment says `querier` is "nil in production; set by NewOrphanCleanupWorker for testability." This is factually wrong — `NewOrphanCleanupWorker` always sets `querier` (line 43). The nil-fallback branch in `Work` (lines 51-54) is therefore unreachable in all execution paths that go through the constructor, and the `pool` field on the struct is stored solely for this dead branch.
+
+The comment inversion creates two concrete risks:
+1. A future developer reading the comment believes `querier` is optional and may construct `OrphanCleanupWorker{}` directly with a nil querier, expecting the nil-guard to save them — but if `pool` is also zero, the fallback panics.
+2. The `pool` field is retained on the struct even though it is never needed if `querier` is always set at construction time.
+
+**Fix:** Remove the nil-guard branch and the `pool` field entirely, and correct the comment:
+
+```go
+// OrphanCleanupWorker deletes S3 objects and their DB rows for tablo_files
+// whose owning tablo no longer exists. Construct with NewOrphanCleanupWorker.
+type OrphanCleanupWorker struct {
+    river.WorkerDefaults[OrphanCleanupArgs]
+    store   files.FileStorer
+    querier fileQuerier
+}
+
+func NewOrphanCleanupWorker(pool *pgxpool.Pool, store files.FileStorer) *OrphanCleanupWorker {
+    return &OrphanCleanupWorker{
+        store:   store,
+        querier: sqlc.New(pool),
+    }
+}
+
+func (w *OrphanCleanupWorker) Work(ctx context.Context, job *river.Job[OrphanCleanupArgs]) error {
+    orphans, err := w.querier.ListOrphanFiles(ctx)
+    // ...
+}
+```
+
+Update `cmd/worker/main.go` accordingly (the pool argument to `NewOrphanCleanupWorker` can stay; it is simply not stored on the struct).
+
+---
+
+### WR-03: Orphan cleanup silently swallows all partial errors — no observable failure signal
+
+**File:** `backend/internal/jobs/orphan_cleanup.go:47-92`
+**Issue:** The `Work` method always returns `nil`, even when every file in the orphan set fails to delete (`errCount == len(orphans)`). The README acknowledges this ("always returns nil so river does not retry a partial run"), but the implication is that a complete and persistent failure (e.g., S3 bucket misconfigured, DB connection lost mid-job) produces only a log line and is then forgotten. The job will re-run hourly and keep emitting errors indefinitely with no escalation path.
+
+There is a meaningful distinction between "some files failed, partial success is acceptable" and "every file failed, something is systemically wrong." The latter should cause River to retry.
+
+**Fix:** Consider returning an error when `errCount > 0` and `deleted == 0` (total failure), while still returning `nil` for partial successes:
+
+```go
+slog.Info("orphan cleanup complete",
+    "orphans_found", len(orphans),
+    "deleted", deleted,
+    "errors", errCount,
+)
+// Surface a total failure so River retries with backoff.
+if errCount > 0 && deleted == 0 && len(orphans) > 0 {
+    return fmt.Errorf("orphan cleanup: all %d deletions failed", errCount)
+}
+return nil
+```
+
+---
+
+### WR-04: MinIO credentials hardcoded in justfile (committed to version control)
+
+**File:** `backend/justfile:122-123`
+**Issue:** `S3_ACCESS_KEY='minioadmin'` and `S3_SECRET_KEY='minioadmin'` are hardcoded in the committed justfile. While these are the MinIO default credentials, committing them normalizes the practice of putting secrets in source files. If the same MinIO instance is ever exposed (even inadvertently) or the credentials are reused elsewhere, this creates a real exposure vector.
+
+**Fix:** Source these from `.env` in the `worker` recipe, consistent with how other secrets should be handled:
+
+```just
+worker: db-up
+    set -a && . .env && set +a && \
+    go run ./cmd/worker
+```
+
+Document the required `.env` keys in `.env.example`:
+```
+S3_ENDPOINT=http://localhost:9000
+S3_BUCKET=xtablo
+S3_REGION=us-east-1
+S3_ACCESS_KEY=minioadmin
+S3_SECRET_KEY=minioadmin
+S3_USE_PATH_STYLE=true
+```
+
+---
+
+### WR-05: River and its driver packages declared as indirect dependencies
+
+**File:** `backend/go.mod:43-46`
+**Issue:** `github.com/riverqueue/river`, `github.com/riverqueue/river/riverdriver`, `github.com/riverqueue/river/riverdriver/riverpgxv5`, `github.com/riverqueue/river/rivershared`, and `github.com/riverqueue/river/rivertype` are all marked `// indirect`. These packages are used directly in `cmd/worker/main.go` and `internal/jobs/`. Indirect marks are normally set by `go mod tidy` when a package is pulled in transitively, not when imported directly. This suggests `go mod tidy` has not been run since these imports were added, which means the `go.sum` file may also be inconsistent.
+
+**Fix:** Run `go mod tidy` from `backend/`. The river packages will move from `indirect` to direct, and `go.sum` will be updated to match:
+
+```bash
+cd backend && go mod tidy
+```
+
+---
+
+## Info
+
+### IN-01: Test suite missing coverage for error paths in OrphanCleanupWorker
+
+**File:** `backend/internal/jobs/orphan_cleanup_test.go`
+**Issue:** Three error paths in `Work` have no test coverage: (a) `ListOrphanFiles` returns an error, (b) `store.Delete` fails for one file (errCount increments, DB delete is skipped for that file), and (c) `q.DeleteTabloFile` fails after S3 delete succeeds (errCount increments). The current tests only cover the happy path and the no-orphans case.
+
+**Fix:** Add table-driven subtests for each path, asserting: `Work` returns a non-nil error for `ListOrphanFiles` failures, `errCount` is reflected in log output (or inferred by checking which mock calls were made) for partial failures.
+
+---
+
+### IN-02: Misleading struct field comment is wrong
+
+**File:** `backend/internal/jobs/orphan_cleanup.go:34`
+**Issue:** The inline comment `// nil in production; set by NewOrphanCleanupWorker for testability` is the opposite of correct: `NewOrphanCleanupWorker` always sets the field (so it is never nil in production), and test code constructs the struct directly with a mock querier (so it is also never nil in tests). The comment should be removed or corrected as part of the WR-02 fix.
+
+**Fix:** Replace with: `// injected at construction time; tests supply a mock via struct literal.`
+
+---
+
+### IN-03: go.mod declares Go 1.26.1 but README states Go ≥ 1.22
+
+**File:** `backend/go.mod:3`, `backend/README.md:12`
+**Issue:** `go.mod` declares `go 1.26.1`, but the README prerequisite section says "Go ≥ 1.22". Go 1.26 is not yet a released version as of the knowledge cutoff. This discrepancy will confuse contributors: the toolchain constraint in `go.mod` takes precedence and will reject older Go versions with a toolchain mismatch error, making the README claim misleading.
+
+**Fix:** Align the README prerequisite with the actual `go.mod` toolchain directive:
+
+```markdown
+- **Go** — version matching the `go` directive in `backend/go.mod` (currently 1.26.1)
+```
+
+---
+
+### IN-04: No log output when rivermigrate has nothing to apply
+
+**File:** `backend/cmd/worker/main.go:66-68`
+**Issue:** The migration loop `for _, v := range res.Versions` only logs when versions were applied. On a normal restart (migrations already applied), this block produces no output, and there is no confirmation that the migration check ran at all. This makes it harder to distinguish "migrations ran and found nothing new" from "migration code was skipped."
+
+**Fix:** Add a log line for the zero-versions case:
+
+```go
+if len(res.Versions) == 0 {
+    slog.Info("river migrations: nothing to apply")
+} else {
+    for _, v := range res.Versions {
+        slog.Info("river migration applied", "version", v.Version)
+    }
+}
+```
+
+---
+
+_Reviewed: 2026-05-15T00:00:00Z_
+_Reviewer: Claude (gsd-code-reviewer)_
+_Depth: standard_