arthur/xtablo-source

Fork 0

Arthur Belleville 5e8dd4e4e4

docs(06): add code review report

2026-05-15 16:46:28 +02:00

12 KiB

Raw Blame History

phase

reviewed

depth

files_reviewed

files_reviewed_list

findings

status

06-background-worker

2026-05-15T00:00:00Z

standard

backend/README.md

backend/cmd/worker/main.go

backend/go.mod

backend/internal/db/queries/files.sql

backend/internal/jobs/error_handler.go

backend/internal/jobs/error_handler_test.go

backend/internal/jobs/heartbeat.go

backend/internal/jobs/heartbeat_test.go

backend/internal/jobs/orphan_cleanup.go

backend/internal/jobs/orphan_cleanup_test.go

backend/justfile

critical	warning	info	total
1	5	4	10

issues_found

Phase 06: Code Review Report

Reviewed: 2026-05-15T00:00:00Z Depth: standard Files Reviewed: 11 Status: issues_found

Summary

This phase implements a background worker binary (cmd/worker) that runs two River periodic jobs: a heartbeat (proof-of-life) and an orphan-file cleanup job (delete S3 objects + DB rows for files whose owning tablo no longer exists). The architecture is sound — signal-context ordering, graceful shutdown, and the DB→S3-first deletion order are all correct. However, there is one critical security defect (hardcoded cryptographic secret in version control), several warnings around missing input validation and silent failure modes, and a handful of code-quality issues including a misleading comment that contradicts the actual logic.

Critical Issues

CR-01: Cryptographic session secret committed to version control

File: backend/justfile:113 Issue: A full 64-hex-character session secret (191affeb1624de1f0e07bd5cfab14cd655510a24f7e673bd784ea56847890caf) is hardcoded in the committed justfile. Anyone with read access to the repository can use this value to forge sessions. Even when labeled "development," committing a real-looking secret trains developers to treat the value as safe to reuse, and it will eventually be copy-pasted into staging or production .env files.

Fix: Remove the literal secret from the justfile. Replace it with a placeholder that fails loudly if unset:

# In justfile dev recipe — read from .env so the value is never committed:
dev: db-up
    just generate
    set -a && . .env && set +a && \
    air -c .air.toml

And in .env.example:

# Generate with: openssl rand -hex 32
SESSION_SECRET=REPLACE_ME

If a default is truly required for zero-config onboarding, generate it dynamically at bootstrap time and write it to .env:

# In bootstrap recipe, only if .env does not already contain SESSION_SECRET:
grep -q SESSION_SECRET .env 2>/dev/null || \
  printf '\nSESSION_SECRET=%s\n' "$(openssl rand -hex 32)" >> .env

Warnings

WR-01: No validation of required S3 environment variables before calling NewStore

File: backend/cmd/worker/main.go:71-85 Issue: S3_ENDPOINT, S3_BUCKET, S3_ACCESS_KEY, and S3_SECRET_KEY are read from the environment with no presence check. If any are unset or empty, files.NewStore is called with empty strings. The AWS SDK constructs a client with an empty endpoint and empty credentials without returning an error — the worker starts and logs "worker ready", but the orphan cleanup job then fails at runtime on its first execution (one hour later by default). The failure is invisible at startup.

This is a worse failure mode than DATABASE_URL (which exits immediately on line 34). S3 config errors are deferred an hour and produce only a log line.

Fix: Add explicit checks before calling NewStore:

s3Endpoint := os.Getenv("S3_ENDPOINT")
s3Bucket   := os.Getenv("S3_BUCKET")
s3AccessKey := os.Getenv("S3_ACCESS_KEY")
s3SecretKey := os.Getenv("S3_SECRET_KEY")

for name, val := range map[string]string{
    "S3_ENDPOINT":   s3Endpoint,
    "S3_BUCKET":     s3Bucket,
    "S3_ACCESS_KEY": s3AccessKey,
    "S3_SECRET_KEY": s3SecretKey,
} {
    if val == "" {
        slog.Error("required env var is unset", "var", name)
        os.Exit(1)
    }
}

WR-02: Dead code branch masks unclear struct ownership contract

File: backend/internal/jobs/orphan_cleanup.go:34, 51-54 Issue: The struct field comment says querier is "nil in production; set by NewOrphanCleanupWorker for testability." This is factually wrong — NewOrphanCleanupWorker always sets querier (line 43). The nil-fallback branch in Work (lines 51-54) is therefore unreachable in all execution paths that go through the constructor, and the pool field on the struct is stored solely for this dead branch.

The comment inversion creates two concrete risks:

A future developer reading the comment believes querier is optional and may construct OrphanCleanupWorker{} directly with a nil querier, expecting the nil-guard to save them — but if pool is also zero, the fallback panics.
The pool field is retained on the struct even though it is never needed if querier is always set at construction time.

Fix: Remove the nil-guard branch and the pool field entirely, and correct the comment:

// OrphanCleanupWorker deletes S3 objects and their DB rows for tablo_files
// whose owning tablo no longer exists. Construct with NewOrphanCleanupWorker.
type OrphanCleanupWorker struct {
    river.WorkerDefaults[OrphanCleanupArgs]
    store   files.FileStorer
    querier fileQuerier
}

func NewOrphanCleanupWorker(pool *pgxpool.Pool, store files.FileStorer) *OrphanCleanupWorker {
    return &OrphanCleanupWorker{
        store:   store,
        querier: sqlc.New(pool),
    }
}

func (w *OrphanCleanupWorker) Work(ctx context.Context, job *river.Job[OrphanCleanupArgs]) error {
    orphans, err := w.querier.ListOrphanFiles(ctx)
    // ...
}

Update cmd/worker/main.go accordingly (the pool argument to NewOrphanCleanupWorker can stay; it is simply not stored on the struct).

WR-03: Orphan cleanup silently swallows all partial errors — no observable failure signal

File: backend/internal/jobs/orphan_cleanup.go:47-92 Issue: The Work method always returns nil, even when every file in the orphan set fails to delete (errCount == len(orphans)). The README acknowledges this ("always returns nil so river does not retry a partial run"), but the implication is that a complete and persistent failure (e.g., S3 bucket misconfigured, DB connection lost mid-job) produces only a log line and is then forgotten. The job will re-run hourly and keep emitting errors indefinitely with no escalation path.

There is a meaningful distinction between "some files failed, partial success is acceptable" and "every file failed, something is systemically wrong." The latter should cause River to retry.

Fix: Consider returning an error when errCount > 0 and deleted == 0 (total failure), while still returning nil for partial successes:

slog.Info("orphan cleanup complete",
    "orphans_found", len(orphans),
    "deleted", deleted,
    "errors", errCount,
)
// Surface a total failure so River retries with backoff.
if errCount > 0 && deleted == 0 && len(orphans) > 0 {
    return fmt.Errorf("orphan cleanup: all %d deletions failed", errCount)
}
return nil

WR-04: MinIO credentials hardcoded in justfile (committed to version control)

File: backend/justfile:122-123 Issue: S3_ACCESS_KEY='minioadmin' and S3_SECRET_KEY='minioadmin' are hardcoded in the committed justfile. While these are the MinIO default credentials, committing them normalizes the practice of putting secrets in source files. If the same MinIO instance is ever exposed (even inadvertently) or the credentials are reused elsewhere, this creates a real exposure vector.

Fix: Source these from .env in the worker recipe, consistent with how other secrets should be handled:

worker: db-up
    set -a && . .env && set +a && \
    go run ./cmd/worker

Document the required .env keys in .env.example:

S3_ENDPOINT=http://localhost:9000
S3_BUCKET=xtablo
S3_REGION=us-east-1
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_USE_PATH_STYLE=true

WR-05: River and its driver packages declared as indirect dependencies

File: backend/go.mod:43-46 Issue: github.com/riverqueue/river, github.com/riverqueue/river/riverdriver, github.com/riverqueue/river/riverdriver/riverpgxv5, github.com/riverqueue/river/rivershared, and github.com/riverqueue/river/rivertype are all marked // indirect. These packages are used directly in cmd/worker/main.go and internal/jobs/. Indirect marks are normally set by go mod tidy when a package is pulled in transitively, not when imported directly. This suggests go mod tidy has not been run since these imports were added, which means the go.sum file may also be inconsistent.

Fix: Run go mod tidy from backend/. The river packages will move from indirect to direct, and go.sum will be updated to match:

cd backend && go mod tidy

Info

IN-01: Test suite missing coverage for error paths in OrphanCleanupWorker

File: backend/internal/jobs/orphan_cleanup_test.go Issue: Three error paths in Work have no test coverage: (a) ListOrphanFiles returns an error, (b) store.Delete fails for one file (errCount increments, DB delete is skipped for that file), and (c) q.DeleteTabloFile fails after S3 delete succeeds (errCount increments). The current tests only cover the happy path and the no-orphans case.

Fix: Add table-driven subtests for each path, asserting: Work returns a non-nil error for ListOrphanFiles failures, errCount is reflected in log output (or inferred by checking which mock calls were made) for partial failures.

IN-02: Misleading struct field comment is wrong

File: backend/internal/jobs/orphan_cleanup.go:34 Issue: The inline comment // nil in production; set by NewOrphanCleanupWorker for testability is the opposite of correct: NewOrphanCleanupWorker always sets the field (so it is never nil in production), and test code constructs the struct directly with a mock querier (so it is also never nil in tests). The comment should be removed or corrected as part of the WR-02 fix.

Fix: Replace with: // injected at construction time; tests supply a mock via struct literal.

IN-03: go.mod declares Go 1.26.1 but README states Go ≥ 1.22

File: backend/go.mod:3, backend/README.md:12 Issue: go.mod declares go 1.26.1, but the README prerequisite section says "Go ≥ 1.22". Go 1.26 is not yet a released version as of the knowledge cutoff. This discrepancy will confuse contributors: the toolchain constraint in go.mod takes precedence and will reject older Go versions with a toolchain mismatch error, making the README claim misleading.

Fix: Align the README prerequisite with the actual go.mod toolchain directive:

- **Go** — version matching the `go` directive in `backend/go.mod` (currently 1.26.1)

IN-04: No log output when rivermigrate has nothing to apply

File: backend/cmd/worker/main.go:66-68 Issue: The migration loop for _, v := range res.Versions only logs when versions were applied. On a normal restart (migrations already applied), this block produces no output, and there is no confirmation that the migration check ran at all. This makes it harder to distinguish "migrations ran and found nothing new" from "migration code was skipped."

Fix: Add a log line for the zero-versions case:

if len(res.Versions) == 0 {
    slog.Info("river migrations: nothing to apply")
} else {
    for _, v := range res.Versions {
        slog.Info("river migration applied", "version", v.Version)
    }
}

Reviewed: 2026-05-15T00:00:00Z Reviewer: Claude (gsd-code-reviewer) Depth: standard

12 KiB Raw Blame History