docs: add self-hosted chat design spec (Stream Chat replacement)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
1268a268c1
commit
973d745753
1 changed files with 283 additions and 0 deletions
283
docs/superpowers/specs/2026-04-11-self-hosted-chat-design.md
Normal file
283
docs/superpowers/specs/2026-04-11-self-hosted-chat-design.md
Normal file
|
|
@ -0,0 +1,283 @@
|
|||
# Self-Hosted Chat: Stream Chat Replacement
|
||||
|
||||
**Date**: 2026-04-11
|
||||
**Status**: Approved
|
||||
|
||||
## Motivation
|
||||
|
||||
Replace Stream Chat (stream-chat + stream-chat-react) with a self-hosted solution to achieve:
|
||||
|
||||
- **Cost control** — eliminate Stream's per-MAU pricing
|
||||
- **Data ownership** — messages stored in our own Postgres
|
||||
- **Vendor independence** — remove dependency on Stream's backend
|
||||
|
||||
## Architecture
|
||||
|
||||
Three layers:
|
||||
|
||||
1. **Cloudflare Durable Objects** — one DO per tablo channel, managing WebSocket connections and real-time message broadcast. Uses the Hibernatable WebSocket API (idle rooms cost nothing).
|
||||
2. **Cloudflare Worker** — authenticates WebSocket upgrades (validates Supabase JWTs), routes connections to the correct DO, and serves REST endpoints for message history and channel metadata.
|
||||
3. **Supabase Postgres** — stores messages, read state. Channels and membership are implicit (tablos and tablo/org membership).
|
||||
|
||||
This is a separate Cloudflare Worker from the main app, with its own DO bindings and environment variables.
|
||||
|
||||
## Data Flow
|
||||
|
||||
### Sending a message
|
||||
|
||||
```
|
||||
User sends WS message
|
||||
→ DO receives
|
||||
→ assigns server timestamp + message ID
|
||||
→ broadcasts to all connected WS clients in the room
|
||||
→ writes to Postgres (async, non-blocking via Supabase REST API)
|
||||
```
|
||||
|
||||
### Loading history (reconnect / initial load)
|
||||
|
||||
```
|
||||
Client connects
|
||||
→ Worker authenticates JWT
|
||||
→ Worker checks tablo membership
|
||||
→ routes to DO
|
||||
→ DO accepts WebSocket
|
||||
Client requests history
|
||||
→ Worker queries Postgres
|
||||
→ returns paginated messages
|
||||
```
|
||||
|
||||
## Data Model
|
||||
|
||||
### New tables
|
||||
|
||||
```sql
|
||||
-- Messages
|
||||
CREATE TABLE messages (
|
||||
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
channel_id uuid NOT NULL REFERENCES tablos(id) ON DELETE CASCADE,
|
||||
user_id uuid NOT NULL REFERENCES auth.users(id),
|
||||
text text NOT NULL,
|
||||
created_at timestamptz NOT NULL DEFAULT now(),
|
||||
updated_at timestamptz,
|
||||
deleted_at timestamptz
|
||||
);
|
||||
|
||||
CREATE INDEX idx_messages_channel_created ON messages(channel_id, created_at DESC);
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Read state (last read position per user per channel)
|
||||
CREATE TABLE channel_read_state (
|
||||
user_id uuid NOT NULL REFERENCES auth.users(id),
|
||||
channel_id uuid NOT NULL REFERENCES tablos(id) ON DELETE CASCADE,
|
||||
last_read_at timestamptz NOT NULL DEFAULT now(),
|
||||
PRIMARY KEY (user_id, channel_id)
|
||||
);
|
||||
```
|
||||
|
||||
### No new tables needed for
|
||||
|
||||
- **Channels** — tablos are the channels (tablo ID = channel ID)
|
||||
- **Membership** — existing tablo/org membership tables
|
||||
- **Users** — existing Supabase auth + user tables
|
||||
|
||||
### Unread count query
|
||||
|
||||
```sql
|
||||
SELECT COUNT(*) FROM messages
|
||||
WHERE channel_id = :channel_id
|
||||
AND created_at > (
|
||||
SELECT last_read_at FROM channel_read_state
|
||||
WHERE user_id = :user_id AND channel_id = :channel_id
|
||||
)
|
||||
AND deleted_at IS NULL;
|
||||
```
|
||||
|
||||
### Soft deletes
|
||||
|
||||
Messages use `deleted_at` (not hard deletion), consistent with the tablo soft-delete pattern.
|
||||
|
||||
## Durable Objects — Real-time Layer
|
||||
|
||||
One DO per tablo channel, identified by tablo ID.
|
||||
|
||||
### WebSocket lifecycle
|
||||
|
||||
- **Connect**: Worker passes authenticated user ID as a tag via `acceptWebSocket(ws, [userId])`. DO tracks connected users.
|
||||
- **Message**: DO parses incoming message, assigns server timestamp and ID, broadcasts to all other connected sockets via `getWebSockets()`.
|
||||
- **Close**: DO cleans up. If no connections remain, it hibernates (zero cost).
|
||||
|
||||
### WebSocket message types
|
||||
|
||||
**Client → Server:**
|
||||
|
||||
| Type | Payload | Persisted |
|
||||
|------|---------|-----------|
|
||||
| `message.send` | `{ text, clientId }` | Yes |
|
||||
| `typing.start` | `{}` | No |
|
||||
| `typing.stop` | `{}` | No |
|
||||
| `presence.ping` | `{}` | No |
|
||||
|
||||
**Server → Client:**
|
||||
|
||||
| Type | Payload |
|
||||
|------|---------|
|
||||
| `message.new` | `{ id, userId, text, createdAt, clientId }` |
|
||||
| `typing` | `{ userId, isTyping }` |
|
||||
| `presence.update` | `{ userId, status }` |
|
||||
| `error` | `{ code, message }` |
|
||||
|
||||
### What the DO does NOT do
|
||||
|
||||
- No message history queries (Worker + Postgres)
|
||||
- No channel membership management (existing API)
|
||||
- No authentication (Worker validates JWT before routing)
|
||||
|
||||
### Presence
|
||||
|
||||
The DO knows who's connected via `getWebSockets()`. On connect/disconnect, it broadcasts `presence.update` to the room.
|
||||
|
||||
### Typing indicators
|
||||
|
||||
Purely ephemeral. Client sends `typing.start`, DO broadcasts to others, never persisted.
|
||||
|
||||
## Cloudflare Worker — API & Routing
|
||||
|
||||
### Endpoints
|
||||
|
||||
```
|
||||
GET /chat/ws/:channelId # WebSocket upgrade → routed to DO
|
||||
GET /chat/channels/:channelId/messages?before=<ts>&limit=50 # Paginated history
|
||||
POST /chat/channels/:channelId/read # Mark channel as read
|
||||
GET /chat/unread # Unread counts for current user
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
Every request (including WebSocket upgrade) includes the Supabase JWT in the `Authorization` header. The Worker validates the JWT using Supabase's public JWT secret (standard JWT verification, no Supabase SDK needed).
|
||||
|
||||
### Membership check
|
||||
|
||||
Before routing a WebSocket connection to a DO, the Worker verifies the user is a member of the tablo by querying existing membership data via the Supabase REST API.
|
||||
|
||||
### Postgres access
|
||||
|
||||
Both the Worker (for history/unread queries) and the DO (for message persistence) use Supabase's PostgREST API with the service role key. The DO calls PostgREST directly — it does not route writes through the Worker. No direct Postgres connection needed at this scale.
|
||||
|
||||
## Frontend
|
||||
|
||||
### Chat client hook — `useChat`
|
||||
|
||||
```typescript
|
||||
const {
|
||||
messages, // Message[] — history + live messages merged
|
||||
sendMessage, // (text: string) => void
|
||||
isConnected, // boolean
|
||||
typingUsers, // string[] — user IDs currently typing
|
||||
onlineUsers, // string[] — user IDs currently connected
|
||||
loadMoreMessages, // () => void — pagination trigger
|
||||
hasMoreMessages, // boolean
|
||||
markAsRead, // () => void
|
||||
} = useChat(channelId)
|
||||
```
|
||||
|
||||
Internally:
|
||||
|
||||
1. Opens WebSocket to `/chat/ws/:channelId` with JWT
|
||||
2. Fetches initial message history via REST
|
||||
3. Merges incoming WebSocket messages with history (dedup via `clientId`)
|
||||
4. Sends typing events with debounce (start on keypress, stop after 2s idle)
|
||||
5. Calls `markAsRead` when channel is visible/focused
|
||||
|
||||
### Unread counts — `useChatUnread`
|
||||
|
||||
Polls `GET /chat/unread` on interval (every 30s) or on window focus. Replaces `useTabloDiscussionUnread`.
|
||||
|
||||
### UI components — chatscope
|
||||
|
||||
Using `@chatscope/chat-ui-kit-react`, styled to match existing Radix/Tailwind design system:
|
||||
|
||||
| chatscope component | Replaces |
|
||||
|---------------------|----------|
|
||||
| `ConversationList` + `Conversation` | Stream `ChannelList` + `ChannelPreview` |
|
||||
| `ChatContainer` + `MessageList` + `Message` | Stream `Channel` + `MessageList` |
|
||||
| `MessageInput` | Stream `MessageInput` |
|
||||
| `TypingIndicator` | New (wasn't explicitly used before) |
|
||||
| `Avatar` + `StatusIndicator` | `ChannelBadge` online dot |
|
||||
|
||||
### Styling
|
||||
|
||||
Override chatscope's default CSS theme to match existing Tailwind/Radix design system (colors, fonts, border radius, spacing).
|
||||
|
||||
## Migration — What Changes in Existing Code
|
||||
|
||||
### Removed from `apps/api`
|
||||
|
||||
- Stream server client initialization in `middleware.ts`
|
||||
- `stream-chat` dependency
|
||||
- Stream token generation in `GET /api/v1/users/me` (`user.ts:64`)
|
||||
- `STREAM_CHAT_API_KEY` and `STREAM_CHAT_API_SECRET` config variables
|
||||
- All `streamServerClient.channel()` calls in `tablo.ts`
|
||||
- All `streamServerClient.upsertUser()` calls in `user.ts`, `helpers.ts`, `tablo.ts`
|
||||
- All `channel.addMembers()` / `channel.removeMembers()` calls
|
||||
|
||||
### Replaced with Postgres operations
|
||||
|
||||
- **Tablo creation**: No extra chat setup needed. The tablo IS the channel.
|
||||
- **Add/remove member**: Already handled by tablo/org membership. No chat-specific call.
|
||||
- **Update channel name**: Not needed. Channel name = tablo name, read from `tablos` table.
|
||||
- **Delete channel**: `ON DELETE CASCADE` on `messages.channel_id` handles cleanup.
|
||||
- **Upsert Stream user**: Removed entirely. No user sync needed.
|
||||
|
||||
### Removed from `apps/main`
|
||||
|
||||
- `stream-chat` and `stream-chat-react` dependencies
|
||||
- `VITE_STREAM_CHAT_API_KEY` from all env files
|
||||
- `ChatProvider.tsx`
|
||||
- `streamToken` from user data flow
|
||||
- `ChannelPreview.tsx`, `CustomChannelHeader.tsx`, `ChannelBadge.tsx` (replaced by chatscope equivalents)
|
||||
- `useChannelFromUrl`, `useTabloDiscussionUnread` hooks (replaced by `useChat`, `useChatUnread`)
|
||||
|
||||
### Unchanged
|
||||
|
||||
- Tablo CRUD, membership management, auth — all stay in existing API
|
||||
- Chat page routes (`/chat`, `/chat/:channelId`) — same URLs, new implementation
|
||||
|
||||
## Error Handling & Edge Cases
|
||||
|
||||
### WebSocket disconnection / reconnect
|
||||
|
||||
- `useChat` implements automatic reconnection with exponential backoff
|
||||
- On reconnect, fetches messages with `?after=<lastMessageTimestamp>` to fill the gap
|
||||
- Messages sent while disconnected are queued locally and sent on reconnect (or shown as "failed to send" after timeout)
|
||||
|
||||
### Message ordering
|
||||
|
||||
- DO assigns server timestamps, which are authoritative
|
||||
- Optimistic messages use `clientId` for dedup — replaced by server echo on arrival
|
||||
- History from Postgres is ordered by `created_at DESC`
|
||||
|
||||
### Membership enforcement
|
||||
|
||||
- Worker checks tablo membership before allowing WebSocket connection
|
||||
- If a user is removed while connected, they are evicted on next reconnect (acceptable at small scale)
|
||||
|
||||
### Postgres write failures
|
||||
|
||||
- DO retries up to 3 times
|
||||
- Messages already delivered via WebSocket — user experience unaffected
|
||||
- Persistent failure is logged as an operational alert
|
||||
|
||||
### Deploys
|
||||
|
||||
- DO eviction on deploy closes all WebSockets
|
||||
- Clients reconnect and fetch the gap from Postgres via standard reconnect flow
|
||||
|
||||
## Scale Considerations
|
||||
|
||||
Designed for small scale (under 100 concurrent users, occasional messages). At this scale:
|
||||
|
||||
- A single DO per room handles all connections easily
|
||||
- Supabase PostgREST is sufficient (no connection pooling needed)
|
||||
- Polling for unread counts (every 30s) is fine
|
||||
- No need for message queues, caches, or read replicas
|
||||
Loading…
Reference in a new issue