Self-Hosted Chat: Stream Chat Replacement
Date: 2026-04-11 Status: Approved
Motivation
Replace Stream Chat (stream-chat + stream-chat-react) with a self-hosted solution to achieve:
- Cost control — eliminate Stream's per-MAU pricing
- Data ownership — messages stored in our own Postgres
- Vendor independence — remove dependency on Stream's backend
Architecture
Three layers:
- Cloudflare Durable Objects: one DO per tablo channel, managing WebSocket connections and real-time message broadcast. Uses the WebSocket Hibernation API, so idle rooms incur no duration charges.
- Cloudflare Worker: authenticates WebSocket upgrades (validates Supabase JWTs), routes connections to the correct DO, and serves REST endpoints for message history and channel metadata.
- Supabase Postgres: stores messages and read state. Channels and membership are implicit (tablos and tablo/org membership).
This is a separate Cloudflare Worker from the main app, with its own DO bindings and environment variables.
Data Flow
Sending a message
User sends WS message
→ DO receives
→ assigns server timestamp + message ID
→ broadcasts to all connected WS clients in the room
→ writes to Postgres (async, non-blocking via Supabase REST API)
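The stamping step in this flow can be sketched as a pure function. This is a minimal illustration, not the actual DO code: `stampMessage`, its parameter names, and the injected `newId` and `now` functions are hypothetical, and the field names are taken from the message payloads and the messages table described in this document.

```typescript
// Shapes follow the message.send / message.new payloads and the messages table.
interface SendPayload { text: string; clientId: string }
interface MessageNew {
  type: "message.new";
  id: string;
  userId: string;
  text: string;
  createdAt: string;
  clientId: string;
}
interface MessageRow {
  id: string;
  channel_id: string;
  user_id: string;
  text: string;
  created_at: string;
}

// Hypothetical helper: the DO would call this once per incoming message.send.
// The same id and timestamp go to the broadcast event and to the Postgres row,
// so the live message and the stored message cannot drift apart.
function stampMessage(
  payload: SendPayload,
  ctx: { channelId: string; userId: string },
  newId: () => string, // assumption: e.g. () => crypto.randomUUID() inside the DO
  now: () => Date = () => new Date(),
): { event: MessageNew; row: MessageRow } {
  const id = newId();
  const createdAt = now().toISOString();
  return {
    event: {
      type: "message.new",
      id,
      userId: ctx.userId,
      text: payload.text,
      createdAt,
      clientId: payload.clientId,
    },
    row: {
      id,
      channel_id: ctx.channelId,
      user_id: ctx.userId,
      text: payload.text,
      created_at: createdAt,
    },
  };
}
```

The DO would send `event` to the room's sockets first and then hand `row` to the asynchronous Postgres write.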
Loading history (reconnect / initial load)
Client connects
→ Worker authenticates JWT
→ Worker checks tablo membership
→ routes to DO
→ DO accepts WebSocket
Client requests history
→ Worker queries Postgres
→ returns paginated messages
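As a rough sketch of the history query, the Worker could build a PostgREST URL like the one below. The helper name `historyUrl`, the selected columns, and the example base URL are assumptions; the filter operators (`eq.`, `lt.`, `is.null`) and the `order` and `limit` parameters are standard PostgREST query syntax.

```typescript
// Hypothetical helper: builds the PostgREST URL for one page of channel history.
function historyUrl(
  baseUrl: string, // e.g. the Supabase project URL
  channelId: string,
  opts: { before?: string; limit?: number } = {},
): string {
  const params = new URLSearchParams();
  params.set("select", "id,user_id,text,created_at,updated_at");
  params.set("channel_id", `eq.${channelId}`);
  params.set("deleted_at", "is.null"); // skip soft-deleted messages
  if (opts.before) params.set("created_at", `lt.${opts.before}`);
  params.set("order", "created_at.desc"); // matches idx_messages_channel_created
  params.set("limit", String(opts.limit ?? 50));
  return `${baseUrl}/rest/v1/messages?${params.toString()}`;
}
```

The request itself would also need the `apikey` and `Authorization` headers carrying the service role key.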
Data Model
New tables
-- Messages
CREATE TABLE messages (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
channel_id uuid NOT NULL REFERENCES tablos(id) ON DELETE CASCADE,
user_id uuid NOT NULL REFERENCES auth.users(id),
text text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz,
deleted_at timestamptz
);
CREATE INDEX idx_messages_channel_created ON messages(channel_id, created_at DESC);
-- Read state (last read position per user per channel)
CREATE TABLE channel_read_state (
user_id uuid NOT NULL REFERENCES auth.users(id),
channel_id uuid NOT NULL REFERENCES tablos(id) ON DELETE CASCADE,
last_read_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (user_id, channel_id)
);
No new tables needed for
- Channels — tablos are the channels (tablo ID = channel ID)
- Membership — existing tablo/org membership tables
- Users — existing Supabase auth + user tables
Unread count query
SELECT COUNT(*) FROM messages
WHERE channel_id = :channel_id
AND created_at > COALESCE((
SELECT last_read_at FROM channel_read_state
WHERE user_id = :user_id AND channel_id = :channel_id
), '-infinity'::timestamptz) -- no read-state row yet: count every message
AND deleted_at IS NULL;
Soft deletes
Messages use deleted_at (not hard deletion), consistent with the tablo soft-delete pattern.
Durable Objects — Real-time Layer
One DO per tablo channel, identified by tablo ID.
WebSocket lifecycle
- Connect: The Worker forwards the authenticated user ID to the DO, which accepts the socket with that ID as a tag via acceptWebSocket(ws, [userId]). The DO tracks connected users.
- Message: The DO parses the incoming message, assigns a server timestamp and ID, and broadcasts to all connected sockets returned by getWebSockets(), including the sender, whose optimistic copy is reconciled by clientId.
- Close: The DO cleans up. If no connections remain, it hibernates (zero cost).
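The connect and close steps can be sketched against a small interface that mirrors the three Durable Object state methods involved (`acceptWebSocket`, `getWebSockets`, `getTags`). The `ChatRoom` class, the `Socket` and `RoomState` stand-in types, and the "online"/"offline" status values are assumptions for illustration. In a real DO these methods would be called from `fetch` and the `webSocketClose` handler.

```typescript
// Minimal structural stand-ins for the Cloudflare runtime objects.
interface Socket { send(data: string): void }
interface RoomState {
  acceptWebSocket(ws: Socket, tags?: string[]): void;
  getWebSockets(tag?: string): Socket[];
  getTags(ws: Socket): string[];
}

class ChatRoom {
  private state: RoomState;
  constructor(state: RoomState) { this.state = state; }

  // Called once the Worker has authenticated the user and forwarded the upgrade.
  connect(ws: Socket, userId: string): void {
    this.state.acceptWebSocket(ws, [userId]); // the tag survives hibernation
    this.broadcast({ type: "presence.update", userId, status: "online" }, ws);
  }

  // Would be invoked from the DO's webSocketClose handler.
  disconnect(ws: Socket): void {
    const [userId] = this.state.getTags(ws);
    // Announce "offline" only if the user has no other open socket (e.g. a second tab).
    const stillHere = this.state.getWebSockets(userId).some((s) => s !== ws);
    if (!stillHere) {
      this.broadcast({ type: "presence.update", userId, status: "offline" }, ws);
    }
  }

  // Who is connected, derived from socket tags rather than in-memory state.
  onlineUsers(): string[] {
    const ids = new Set<string>();
    for (const s of this.state.getWebSockets()) {
      for (const tag of this.state.getTags(s)) ids.add(tag);
    }
    return Array.from(ids);
  }

  private broadcast(event: object, except?: Socket): void {
    const data = JSON.stringify(event);
    for (const s of this.state.getWebSockets()) {
      if (s !== except) s.send(data);
    }
  }
}
```

Deriving the user list from tags means nothing has to be rebuilt when the DO wakes from hibernation.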
WebSocket message types
Client → Server:
| Type | Payload | Persisted |
|---|---|---|
| message.send | { text, clientId } | Yes |
| typing.start | {} | No |
| typing.stop | {} | No |
| presence.ping | {} | No |
Server → Client:
| Type | Payload |
|---|---|
| message.new | { id, userId, text, createdAt, clientId } |
| typing | { userId, isTyping } |
| presence.update | { userId, status } |
| error | { code, message } |
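The two message tables map naturally onto discriminated unions. The sketch below is one possible typing plus a defensive parser for client frames. The type names, the `parseClientMessage` helper, the string types for `status` and `code`, and the rule that empty text is rejected are assumptions, since the document does not specify them.

```typescript
type ClientMessage =
  | { type: "message.send"; text: string; clientId: string }
  | { type: "typing.start" }
  | { type: "typing.stop" }
  | { type: "presence.ping" };

type ServerMessage =
  | { type: "message.new"; id: string; userId: string; text: string; createdAt: string; clientId: string }
  | { type: "typing"; userId: string; isTyping: boolean }
  | { type: "presence.update"; userId: string; status: string }
  | { type: "error"; code: string; message: string };

// Returns null for anything that is not a well-formed client message, so the
// DO can answer with an `error` frame instead of throwing.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  const type = fields.type;
  const text = fields.text;
  const clientId = fields.clientId;

  if (type === "message.send") {
    // Assumption: empty messages are rejected.
    if (typeof text !== "string" || text.length === 0) return null;
    if (typeof clientId !== "string") return null;
    return { type, text, clientId };
  }
  if (type === "typing.start" || type === "typing.stop" || type === "presence.ping") {
    return { type }; // unknown extra fields are dropped
  }
  return null;
}
```

For example, `parseClientMessage('{"type":"typing.start"}')` yields `{ type: "typing.start" }`, while malformed JSON or an unknown `type` yields `null`.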
What the DO does NOT do
- No message history queries (Worker + Postgres)
- No channel membership management (existing API)
- No authentication (Worker validates JWT before routing)
Presence
The DO knows who's connected via getWebSockets(). On connect/disconnect, it broadcasts presence.update to the room.
Typing indicators
Purely ephemeral. The client sends typing.start or typing.stop, the DO rebroadcasts it to the other sockets as a typing event, and nothing is persisted.
Cloudflare Worker — API & Routing
Endpoints
GET /chat/ws/:channelId # WebSocket upgrade → routed to DO
GET /chat/channels/:channelId/messages?before=<ts>&limit=50 # Paginated history; after=<ts> fills reconnect gaps
POST /chat/channels/:channelId/read # Mark channel as read
GET /chat/unread # Unread counts for current user
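A minimal sketch of how the Worker might dispatch these four endpoints, assuming no router library is used. The `Route` type and the `matchRoute` helper are hypothetical names; only the paths and methods come from the endpoint list.

```typescript
type Route =
  | { name: "ws"; channelId: string }
  | { name: "history"; channelId: string }
  | { name: "markRead"; channelId: string }
  | { name: "unread" };

// Maps an HTTP method and URL pathname to one of the chat routes, or null.
function matchRoute(method: string, pathname: string): Route | null {
  const parts = pathname.split("/").filter(Boolean); // "/chat/ws/abc" -> ["chat", "ws", "abc"]
  if (parts[0] !== "chat") return null;

  if (method === "GET" && parts.length === 3 && parts[1] === "ws") {
    return { name: "ws", channelId: parts[2] };
  }
  if (method === "GET" && parts.length === 2 && parts[1] === "unread") {
    return { name: "unread" };
  }
  if (parts.length === 4 && parts[1] === "channels") {
    if (method === "GET" && parts[3] === "messages") {
      return { name: "history", channelId: parts[2] };
    }
    if (method === "POST" && parts[3] === "read") {
      return { name: "markRead", channelId: parts[2] };
    }
  }
  return null;
}
```

In the Worker's `fetch` handler this would be called as `matchRoute(request.method, new URL(request.url).pathname)`, with a 404 response for `null`.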
Authentication
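Every REST request includes the Supabase JWT in the Authorization header. Browsers cannot set custom headers on a WebSocket upgrade, so the upgrade request has to carry the token another way, for example in the Sec-WebSocket-Protocol header or a query parameter. The Worker validates the JWT with the Supabase project's JWT secret, which is a shared signing secret and must stay server-side, or with the project's public signing key if asymmetric JWT signing is enabled (standard JWT verification, no Supabase SDK needed).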
Membership check
Before routing a WebSocket connection to a DO, the Worker verifies the user is a member of the tablo by querying existing membership data via the Supabase REST API.
Postgres access
Both the Worker (for history/unread queries) and the DO (for message persistence) use Supabase's PostgREST API with the service role key. The DO calls PostgREST directly — it does not route writes through the Worker. No direct Postgres connection needed at this scale.
Frontend
Chat client hook — useChat
const {
messages, // Message[] — history + live messages merged
sendMessage, // (text: string) => void
isConnected, // boolean
typingUsers, // string[] — user IDs currently typing
onlineUsers, // string[] — user IDs currently connected
loadMoreMessages, // () => void — pagination trigger
hasMoreMessages, // boolean
markAsRead, // () => void
} = useChat(channelId)
Internally:
- Opens WebSocket to /chat/ws/:channelId with JWT
- Fetches initial message history via REST
- Merges incoming WebSocket messages with history (dedup via clientId)
- Sends typing events with debounce (start on keypress, stop after 2s idle)
- Calls markAsRead when channel is visible/focused
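The merge step could look roughly like the sketch below. It assumes history rows carry no clientId (the messages table has no such column), so history and live messages are matched by id, while an optimistic message and its server echo are matched by clientId. The `ChatMessage` shape and the `mergeMessages` name are hypothetical.

```typescript
interface ChatMessage {
  id?: string;       // absent on optimistic messages that have not been echoed yet
  clientId?: string; // absent on rows loaded from history
  userId: string;
  text: string;
  createdAt: string; // ISO 8601 timestamp
}

// Merges incoming messages (history page or WebSocket events) into the current
// list, replacing duplicates in place and returning the result oldest-first.
function mergeMessages(current: ChatMessage[], incoming: ChatMessage[]): ChatMessage[] {
  const result = current.slice();
  for (const msg of incoming) {
    const i = result.findIndex(
      (m) =>
        (msg.id !== undefined && m.id === msg.id) ||
        (msg.clientId !== undefined && m.clientId === msg.clientId),
    );
    if (i >= 0) {
      // The incoming copy wins, so the server id and timestamp replace optimistic values.
      result[i] = { ...result[i], ...msg };
    } else {
      result.push(msg);
    }
  }
  return result.sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}
```

Because the function is idempotent, the same history page or echo can be applied twice (for example after a reconnect) without producing duplicates.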
Unread counts — useChatUnread
Polls GET /chat/unread on interval (every 30s) or on window focus. Replaces useTabloDiscussionUnread.
UI components — chatscope
Using @chatscope/chat-ui-kit-react, styled to match existing Radix/Tailwind design system:
| chatscope component | Replaces |
|---|---|
| ConversationList + Conversation | Stream ChannelList + ChannelPreview |
| ChatContainer + MessageList + Message | Stream Channel + MessageList |
| MessageInput | Stream MessageInput |
| TypingIndicator | New (wasn't explicitly used before) |
| Avatar + StatusIndicator | ChannelBadge online dot |
Styling
Override chatscope's default CSS theme to match existing Tailwind/Radix design system (colors, fonts, border radius, spacing).
Migration — What Changes in Existing Code
Removed from apps/api
- Stream server client initialization in middleware.ts
- stream-chat dependency
- Stream token generation in GET /api/v1/users/me (user.ts:64)
- STREAM_CHAT_API_KEY and STREAM_CHAT_API_SECRET config variables
- All streamServerClient.channel() calls in tablo.ts
- All streamServerClient.upsertUser() calls in user.ts, helpers.ts, tablo.ts
- All channel.addMembers() / channel.removeMembers() calls
Replaced with Postgres operations
- Tablo creation: No extra chat setup needed. The tablo IS the channel.
- Add/remove member: Already handled by tablo/org membership. No chat-specific call.
- Update channel name: Not needed. Channel name = tablo name, read from tablos table.
- Delete channel: ON DELETE CASCADE on messages.channel_id handles cleanup.
- Upsert Stream user: Removed entirely. No user sync needed.
Removed from apps/main
- stream-chat and stream-chat-react dependencies
- VITE_STREAM_CHAT_API_KEY from all env files
- ChatProvider.tsx
- streamToken from user data flow
- ChannelPreview.tsx, CustomChannelHeader.tsx, ChannelBadge.tsx (replaced by chatscope equivalents)
- useChannelFromUrl, useTabloDiscussionUnread hooks (replaced by useChat, useChatUnread)
Unchanged
- Tablo CRUD, membership management, auth — all stay in existing API
- Chat page routes (/chat, /chat/:channelId): same URLs, new implementation
Error Handling & Edge Cases
WebSocket disconnection / reconnect
- useChat implements automatic reconnection with exponential backoff
- On reconnect, fetches messages with ?after=<lastMessageTimestamp> to fill the gap
- Messages sent while disconnected are queued locally and sent on reconnect (or shown as "failed to send" after timeout)
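One common way to compute the reconnect delay is exponential backoff with full jitter, sketched below. The function name and the 500 ms base and 30 s cap are assumptions, not values from this design.

```typescript
// Delay before reconnect attempt number `attempt` (0-based).
// The ceiling doubles on each attempt up to maxMs, and the actual delay is drawn
// uniformly below the ceiling so that clients dropped by the same deploy do not
// all reconnect at the same instant.
function backoffDelay(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30000,
  random: () => number = Math.random, // injectable for tests
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

The hook would reset `attempt` to 0 after a successful connection and then issue the gap-filling history request.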
Message ordering
- DO assigns server timestamps, which are authoritative
- Optimistic messages use clientId for dedup and are replaced by the server echo on arrival
- History from Postgres is ordered by created_at DESC
Membership enforcement
- Worker checks tablo membership before allowing WebSocket connection
- If a user is removed while connected, they keep receiving messages until their socket closes; the membership check then rejects them on the next reconnect (acceptable at small scale)
Postgres write failures
- DO retries up to 3 times
- Live recipients are unaffected because the message was already broadcast over WebSocket, but a message that never persists will be missing from history after a reload
- Persistent failure is logged as an operational alert
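The retry behaviour might be sketched as follows. Reading "up to 3 times" as three attempts in total is an assumption, as are the helper name, the linear 200 ms delay step, and the use of console.error as the alerting hook. The `write` callback stands in for the PostgREST insert.

```typescript
// Runs `write` until it succeeds or maxAttempts is exhausted.
// Never throws: the caller has already broadcast the message, so a persistence
// failure is reported through the return value and the log, not an exception.
async function persistWithRetry(
  write: () => Promise<void>,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ ok: boolean; attempts: number }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await write();
      return { ok: true, attempts: attempt };
    } catch (err) {
      if (attempt === maxAttempts) {
        // Assumption: log output is what feeds the operational alert.
        console.error("chat: message persistence failed", err);
        break;
      }
      await sleep(200 * attempt); // short pause before the next try
    }
  }
  return { ok: false, attempts: maxAttempts };
}
```

A caller would pass a function that POSTs the message row and throws on a non-2xx response.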
Deploys
- DO eviction on deploy closes all WebSockets
- Clients reconnect and fetch the gap from Postgres via standard reconnect flow
Scale Considerations
Designed for small scale (under 100 concurrent users, occasional messages). At this scale:
- A single DO per room handles all connections easily
- Supabase PostgREST is sufficient (no connection pooling needed)
- Polling for unread counts (every 30s) is fine
- No need for message queues, caches, or read replicas