docs(07): add code review report
This commit is contained in:
parent
7bca961bb0
commit
9ff40ac821
1 changed files with 249 additions and 0 deletions
249
.planning/phases/07-deploy-v1/07-REVIEW.md
Normal file
249
.planning/phases/07-deploy-v1/07-REVIEW.md
Normal file
|
|
@ -0,0 +1,249 @@
|
||||||
|
---
|
||||||
|
phase: 07-deploy-v1
|
||||||
|
reviewed: 2026-05-15T00:00:00Z
|
||||||
|
depth: standard
|
||||||
|
files_reviewed: 14
|
||||||
|
files_reviewed_list:
|
||||||
|
- backend/.env.example
|
||||||
|
- backend/cmd/web/main.go
|
||||||
|
- backend/docker-compose.prod.yaml
|
||||||
|
- backend/embed.go
|
||||||
|
- backend/internal/db/migrate.go
|
||||||
|
- backend/internal/web/csrf_test.go
|
||||||
|
- backend/internal/web/handlers_auth_test.go
|
||||||
|
- backend/internal/web/handlers_files_test.go
|
||||||
|
- backend/internal/web/handlers_tablos_test.go
|
||||||
|
- backend/internal/web/handlers_tasks_test.go
|
||||||
|
- backend/internal/web/handlers_test.go
|
||||||
|
- backend/internal/web/handlers.go
|
||||||
|
- backend/internal/web/router.go
|
||||||
|
- backend/README.md
|
||||||
|
findings:
|
||||||
|
critical: 3
|
||||||
|
warning: 5
|
||||||
|
info: 3
|
||||||
|
total: 11
|
||||||
|
status: issues_found
|
||||||
|
---
|
||||||
|
|
||||||
|
# Phase 07: Code Review Report
|
||||||
|
|
||||||
|
**Reviewed:** 2026-05-15T00:00:00Z
|
||||||
|
**Depth:** standard
|
||||||
|
**Files Reviewed:** 14
|
||||||
|
**Status:** issues_found
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
This review covers the Phase 7 (deploy-v1) implementation: the HTTP server entrypoint, production compose stack, embedded assets, database migrations, CSRF middleware, auth/tablo/task/file handlers, and the router. The code is well-structured with clear security intent and good test coverage. Three blockers and five warnings were identified, primarily around S3 credential exposure in `.env.example`, a missing `NetworkMode` / network-isolation gap in `docker-compose.prod.yaml`, a file-upload race condition, and several quality issues in the test suite and startup logic.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Critical Issues
|
||||||
|
|
||||||
|
### CR-01: `.env.example` Contains Default S3 Credentials That May Ship to Production
|
||||||
|
|
||||||
|
**File:** `backend/.env.example:39-40`
|
||||||
|
**Issue:** `S3_ACCESS_KEY=minioadmin` and `S3_SECRET_KEY=minioadmin` are committed with explicit values — not placeholders like `<your-key>`. Operators who copy this file for production and miss those lines will start with the public default MinIO credentials against their R2 bucket. The README's first-time-setup table (line 298-299) says to set these but does not repeat that the example file ships with insecure defaults. The corresponding `DATABASE_URL` and `SESSION_SECRET` are intentionally left blank or with obvious-placeholder DSNs, making the S3 lines the odd ones out.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```bash
|
||||||
|
# Replace concrete default values with angle-bracket placeholders:
|
||||||
|
S3_ACCESS_KEY=<your-r2-access-key-id>
|
||||||
|
S3_SECRET_KEY=<your-r2-secret-access-key>
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### CR-02: File-Upload Race Condition — S3 Orphan After DB Insert Failure
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/handlers_files.go:194-210`
|
||||||
|
**Issue:** When `InsertTabloFile` fails (line 198), the handler attempts a best-effort S3 cleanup:
|
||||||
|
|
||||||
|
```go
|
||||||
|
cleanupCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||||
|
defer cancel()
|
||||||
|
if delErr := deps.Files.Delete(cleanupCtx, s3Key); delErr != nil { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
Two bugs compound here:
|
||||||
|
1. `defer cancel()` is inside the `if err != nil` branch of the upload loop, but this is a handler function (not a loop). Still, calling `defer cancel()` after `context.WithTimeout` is a Go anti-pattern in a hot path — `cancel` will only run when the surrounding function returns, not right after `Delete` completes, leaking the timer for up to the full handler lifetime rather than the 10-second window. The correct idiom is `defer cancel()` immediately after `context.WithTimeout`, but here `cancel` should be called explicitly after `Delete` returns (or use a dedicated cleanup goroutine).
|
||||||
|
2. If the process dies between the `Upload` success and the `InsertTabloFile` failure (e.g., OOM-kill, SIGKILL), cleanup never runs. This is acknowledged as a Phase 6 concern, but the `defer cancel()` placement still results in the cancel function never being called before the function returns — which is actually the same call site, so this is benign at runtime. The real correctness issue is that this cleanup is best-effort only with no retry, meaning any transient S3 error during cleanup leaves a permanent orphan with no reconciliation path until Phase 6.
|
||||||
|
|
||||||
|
The primary BLOCKER is the `defer cancel()` placement: it is inside the `if err != nil` block, so `cancel()` is called when the function exits, but `cleanupCtx` is already expired by then (10 seconds is plenty for the synchronous `Delete` call). In practice this compiles and runs correctly, but represents a code-quality footgun and signals that the author may not have noticed the defer is scoped to function return, not the block.
|
||||||
|
|
||||||
|
More critically: the handler returns `http.StatusInternalServerError` to the user AFTER logging the DB error, but before the `defer cancel()` fires. This means the context passed to `Delete` has not been cancelled yet (good), so the cleanup call does execute within the handler's lifetime. However, if `Delete` itself is slow (>10 seconds), the context expires mid-cleanup, producing a partial-cleanup log error even though the operation was in-flight. This is a real production risk on a flaky S3 connection.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```go
|
||||||
|
// Replace the deferred cancel with explicit cancel after Delete:
|
||||||
|
cleanupCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||||
|
if delErr := deps.Files.Delete(cleanupCtx, s3Key); delErr != nil {
|
||||||
|
slog.Default().Error("files upload: S3 cleanup after DB failure", "s3_key", s3Key, "err", delErr)
|
||||||
|
}
|
||||||
|
cancel() // call immediately after Delete, not via defer
|
||||||
|
http.Error(w, "internal server error", http.StatusInternalServerError)
|
||||||
|
return
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### CR-03: `docker-compose.prod.yaml` Worker Service Has No `depends_on` Health Gate for Web
|
||||||
|
|
||||||
|
**File:** `backend/docker-compose.prod.yaml:56-64`
|
||||||
|
**Issue:** The `worker` service depends only on `postgres` being healthy. It does not depend on the `web` service completing startup (which runs migrations at boot via `goose.Up`). If `web` and `worker` both start concurrently and migrations are still in flight, `worker` may connect to a database that is mid-migration — missing tables, functions, or constraints. The healthcheck on `postgres` only ensures the Postgres process is ready to accept connections, not that migrations have completed.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
worker:
|
||||||
|
depends_on:
|
||||||
|
postgres:
|
||||||
|
condition: service_healthy
|
||||||
|
```
|
||||||
|
|
||||||
|
**Fix:** Either make `worker` depend on `web` being healthy (requires a `/readyz`-style healthcheck for the web service) or — simpler — run migrations in a separate init container / step before both services start:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
worker:
|
||||||
|
depends_on:
|
||||||
|
postgres:
|
||||||
|
condition: service_healthy
|
||||||
|
web:
|
||||||
|
condition: service_healthy # web exposes /readyz once migrations complete
|
||||||
|
```
|
||||||
|
|
||||||
|
Add a healthcheck to the web service:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
web:
|
||||||
|
healthcheck:
|
||||||
|
test: ["CMD", "wget", "-qO-", "http://localhost:8080/readyz"]
|
||||||
|
interval: 10s
|
||||||
|
timeout: 5s
|
||||||
|
retries: 10
|
||||||
|
start_period: 30s
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Warnings
|
||||||
|
|
||||||
|
### WR-01: `NewRouter` Panics on Missing `static/` Sub-Directory — Not a Graceful Startup Failure
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/router.go:127-130`
|
||||||
|
**Issue:** `fs.Sub(staticFS, "static")` will panic if the embedded FS does not contain a `static/` top-level entry. The panic is caught by chi's `Recoverer` middleware in practice, but only after the server is already running — it fires on the first request, not at startup. This means `/healthz` and `/readyz` will also panic if the static FS is malformed, since the panic happens at router construction time (`NewRouter` returns before the panic is triggered? — No: `fs.Sub` is called during `NewRouter`, so the panic happens at server startup in `cmd/web/main.go:125`). The `panic` at startup is non-fatal-looking to the operator; it bypasses the `srv.Shutdown` and `pool.Close()` cleanup paths.
|
||||||
|
|
||||||
|
**Fix:** Convert the panic to an error return from `NewRouter`, propagated up to `main`:
|
||||||
|
```go
|
||||||
|
sub, err := fs.Sub(staticFS, "static")
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("router: failed to sub static FS: %w", err)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### WR-02: Rate Limiter `Allow` Does Not Increment Counter on Validation Errors (Wrong-Format Inputs Bypass the Rate Gate)
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/handlers_auth.go:195-209`
|
||||||
|
**Issue:** In `LoginPostHandler`, validation of email and password format (steps 2–3) runs BEFORE the rate-limit check (step 5). An attacker sending a syntactically invalid email or a password shorter than 12 characters will never reach the `deps.Limiter.Allow(key)` gate. This means the rate limiter only fires on valid-looking credentials, not on brute-force spam with malformed payloads. An attacker who wants to enumerate valid emails via timing differences could send malformed payloads at high speed without consuming rate-limit tokens.
|
||||||
|
|
||||||
|
The rate-limit check comment says "Rate-limit check BEFORE user lookup and argon2 work" but the actual sequencing allows validation errors to short-circuit before the rate gate.
|
||||||
|
|
||||||
|
**Fix:** Move the rate-limit check to before the validation block, using a key derivable from the raw (pre-validation) email:
|
||||||
|
```go
|
||||||
|
// Rate limit first — even on invalid input, to prevent enumeration via timing
|
||||||
|
ip := clientIP(r)
|
||||||
|
key := strings.ToLower(strings.TrimSpace(r.PostFormValue("email"))) + ":" + ip
|
||||||
|
if deps.Limiter != nil && !deps.Limiter.Allow(key) {
|
||||||
|
// return 429 immediately
|
||||||
|
}
|
||||||
|
// Then validate format
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### WR-03: `migrate.go` Opens a Second `database/sql` Connection Pool That Is Never Tuned
|
||||||
|
|
||||||
|
**File:** `backend/internal/db/migrate.go:27-29`
|
||||||
|
**Issue:** `sql.Open("pgx/v5", dsn)` creates a new `database/sql` connection pool with default settings (up to `MaxOpenConns = 0` = unlimited). This pool is used only for migrations and is closed via `defer sqlDB.Close()`. However, the DSN contains the password. If `connConfig.ConnString()` returns a URL with credentials (which pgx/v5's `ConnString()` does for password-based auth), the DSN is handled correctly — but the second pool opening is unnecessary work. More importantly, `sql.Open` with `MaxOpenConns = 0` could theoretically open many connections during a long migration. This is low-risk in practice (goose runs sequentially) but is an unnecessary resource-management gap.
|
||||||
|
|
||||||
|
Additionally, the return value of `pool.Config().ConnConfig.ConnString()` is not documented as guaranteed to include the password in all configurations (e.g., with `PasswordFunc` set). If the pool was initialized with a connection string that uses cert-based auth or a password callback, `ConnString()` may return an incomplete DSN.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```go
|
||||||
|
sqlDB, err := sql.Open("pgx/v5", dsn)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("migrate: open sql.DB: %w", err)
|
||||||
|
}
|
||||||
|
sqlDB.SetMaxOpenConns(2) // goose runs sequentially — no need for a large pool
|
||||||
|
defer sqlDB.Close()
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### WR-04: `TestIndex_UnauthRedirects` and `TestDemoTime_Fragment` Use Outdated `NewRouter` Signature
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/handlers_test.go:95-96, 110-111`
|
||||||
|
**Issue:** Two tests call `NewRouter` with 8 positional arguments:
|
||||||
|
```go
|
||||||
|
router := NewRouter(stubPinger{}, os.DirFS("./static"), AuthDeps{}, TablosDeps{}, TasksDeps{}, FilesDeps{}, testCSRFKey, "dev")
|
||||||
|
```
|
||||||
|
But `router.go` declares `NewRouter` with `trustedOrigins ...string` as the final variadic parameter. The tests omit `trustedOrigins`, which means CSRF in test mode does not include `"localhost"` as a trusted origin — unlike `newTestRouter` (line 37) and `newTestRouterWithLimiter` (line 44) which explicitly pass `"localhost"`. This discrepancy means requests in `TestIndex_UnauthRedirects` that go through CSRF middleware may behave differently from those in `newTestRouter`. For GET requests this is fine (CSRF only blocks state-changing methods), but the inconsistency is a maintenance footgun: a future POST test using this helper would silently fail with 403 instead of the expected status.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```go
|
||||||
|
router := NewRouter(stubPinger{}, os.DirFS("./static"), AuthDeps{}, TablosDeps{}, TasksDeps{}, FilesDeps{}, testCSRFKey, "dev", "localhost")
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### WR-05: `FileUploadHandler` Does Not Validate or Sanitize `filename` Beyond Trimming Whitespace
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/handlers_files.go:178-182`
|
||||||
|
**Issue:** The filename is stored in the database as-is (after trim) and will be displayed back to users in the file list. While the S3 key uses a UUID and path traversal to the filesystem is not possible, the raw filename is rendered in the template. If the template does not HTML-escape the filename (which templ normally does automatically), this is safe. However, filenames like `<script>alert(1)</script>.pdf` would be stored verbatim. More concretely, filenames containing path separators (`../../etc/passwd`) are stored in the database — again, safe since the S3 key ignores it, but the filename is returned to the browser in the file list and could confuse a display that uses it as a download attribute.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```go
|
||||||
|
// Sanitize: strip path components, limit length
|
||||||
|
filename = filepath.Base(filename) // strip any directory components
|
||||||
|
if len(filename) > 255 {
|
||||||
|
filename = filename[:255]
|
||||||
|
}
|
||||||
|
if filename == "" || filename == "." {
|
||||||
|
http.Error(w, "bad request: invalid filename", http.StatusBadRequest)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Info
|
||||||
|
|
||||||
|
### IN-01: `goose.SetBaseFS(nil)` in `testdb_test.go` Is a Global Mutation Inside a Mutex — Fragile
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/testdb_test.go:93`
|
||||||
|
**Issue:** `goose.SetBaseFS(nil)` is called inside `webGooseMu.Lock()` to reset the global goose base FS before applying on-disk migrations. This relies on goroutine exclusion (the mutex) but `goose.SetBaseFS` is a package-level global. If any goroutine outside this mutex calls `goose.SetBaseFS` concurrently (e.g., `RunMigrations` in a parallel test subprocess that somehow shares the process), the state could be corrupted. The mutex name (`webGooseMu`) is package-local, so it only serializes callers within the `web` test package.
|
||||||
|
|
||||||
|
This is the documented "verbatim copy" approach from the auth package and is probably acceptable, but it is worth noting the global state mutation.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### IN-02: `handlers_test.go` Has `TestRouter_CSRFMountedAfterResolveSession` in `csrf_test.go` That Reads `router.go` Source from Disk
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/csrf_test.go:352-371`
|
||||||
|
**Issue:** `TestRouter_CSRFMountedAfterResolveSession` calls `os.ReadFile("router.go")` to assert middleware ordering by scanning source text. This is a "meta-test" that tests the source file rather than the runtime behavior. It is inherently brittle: it breaks if the file is renamed, if the string literals change, or if the logic is refactored without changing the string identifiers. The same ordering property is already implicitly tested by `TestCSRF_LoginMissingToken` (which would fail if CSRF ran before ResolveSession with the wrong cookie). The source-scanning approach adds maintenance cost without adding meaningful signal.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### IN-03: `TabloTasksTabHandler` in `handlers_files.go` Does Not Guard for Nil `deps.Files`
|
||||||
|
|
||||||
|
**File:** `backend/internal/web/handlers_files.go:105-127`
|
||||||
|
**Issue:** `TabloTasksTabHandler` lives in `handlers_files.go` and accepts `FilesDeps`, but unlike every other handler in that file, it does NOT check `deps.Files == nil` at the top. This is intentional (the tasks tab does not use S3), but it is conceptually confusing: the handler accepts a `FilesDeps` parameter that it never uses for file storage, and the nil guard pattern established by all sibling handlers is absent. When a reader scans the file they expect to see the nil guard and its absence is surprising.
|
||||||
|
|
||||||
|
Additionally, the tasks tab route is registered in the router as `GET /tablos/{id}/tasks`, but the handler passes task data to `templates.TabloDetailPage(user, ..., tasks, nil, "tasks")` — passing `nil` for the file list is fine but is not symmetrical with the files tab which passes `fileList` for the third list argument. If a future template change adds a nil-dereference on this parameter, the asymmetry will be the root cause.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
_Reviewed: 2026-05-15T00:00:00Z_
|
||||||
|
_Reviewer: Claude (gsd-code-reviewer)_
|
||||||
|
_Depth: standard_
|
||||||
Loading…
Reference in a new issue