Building Vaartalaap — real-time collab rooms with Yjs, WebRTC, and zero accounts
Vaartalaap (वार्तालाप — Hindi for conversation) is a side project I’ve been chipping away at: a single-link, no-account collaboration room that bundles a collaborative code editor, a whiteboard, rich-text notes, mesh video, and chat. Open it, share the URL, start working. Rooms expire after 30 days. That’s it.
Live: vaartalaapclient.vercel.app
Repo: github.com/iyashwantsaini/Vaartalaap
Docs: ARCHITECTURE.md · DEPLOYMENT.md
Why I built it
Most “interview portals” are heavy. They want signups, organizations, billing, calendar integrations, GDPR cookie banners. Sometimes you just need a room with a code editor and a video call for an hour. So I built one — small, disposable, dark-theme, zero friction.
Along the way it turned into a sandbox for things I wanted to learn properly: CRDTs, WebRTC mesh signalling, and a real-time canvas that behaves under flaky networks.

At a glance
| surface | tech | sync mechanism |
|---|---|---|
| code editor | CodeMirror 6 + Yjs CRDT | y-codemirror.next + awareness cursors |
| whiteboard | Canvas 2D | Socket.IO stroke broadcast |
| notes | Tiptap rich-text | Socket.IO debounced doc:change |
| chat | MUI list | Socket.IO append-only |
| video / audio | WebRTC mesh (≤6 peers) | Socket.IO signalling, Open Relay TURN |
| theme | MUI v6 | localStorage + OS pref, dark/light toggle |
Frontend is a React 18 + Vite 5 SPA. Backend is Express 4 + Socket.IO 4 with MongoDB for room metadata and CRDT snapshots. TypeScript end-to-end, ~93% of the codebase.
System topology
flowchart LR
subgraph Browsers["Clients (browsers)"]
A[Tab A<br/>React SPA]
B[Tab B<br/>React SPA]
C[Tab C<br/>React SPA]
end
subgraph Server["Node server<br/>(Express + Socket.IO)"]
REST[REST<br/>/api/rooms]
WS[Socket.IO<br/>rooms + relay]
YS[Yjs service<br/>in-mem Y.Doc cache]
end
subgraph Mongo["MongoDB Atlas"]
R[(rooms<br/>TTL 30d)]
Y[(yjsDocs<br/>state: Binary)]
end
TURN[(Open Relay<br/>STUN + TURN)]
A -- HTTPS --> REST
B -- HTTPS --> REST
A <-- WebSocket --> WS
B <-- WebSocket --> WS
C <-- WebSocket --> WS
REST --> R
WS --> R
WS --> YS
YS --> Y
A <-. WebRTC P2P (SRTP) .-> B
A <-. WebRTC P2P (SRTP) .-> C
B <-. WebRTC P2P (SRTP) .-> C
A -.- TURN
B -.- TURN
C -.- TURN
Two properties matter here:
- The server is a relay only — for signalling, presence, Yjs ops, and CRUD snapshots. Media never traverses it. That’s why the backend can run on Render’s free tier and not melt.
- Two MongoDB collections. rooms holds REST snapshots + presence-friendly metadata. yjsDocs holds binary CRDT state per (roomId, docName), written by Y.encodeStateAsUpdate. Both have TTL indexes for the 30-day expiry.
The interesting parts
1. Yjs for the editor (no central source of truth)
The first version used a “last-write-wins” approach over Socket.IO. It looked fine in demos and broke instantly with two people typing. I rewrote it on Yjs, a CRDT library that lets every peer hold its own copy of the document and merges concurrent edits deterministically.
The wiring:
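import * as Y from 'yjs';
import { Awareness } from 'y-protocols/awareness';
import { yCollab } from 'y-codemirror.next';
import { EditorView, basicSetup } from 'codemirror';
import { EditorState } from '@codemirror/state';

const ydoc = new Y.Doc();
const ytext = ydoc.getText('code'); // shared text type that backs the editor
// Assumption: awareness is created next to the doc; the snippet does not show its origin.
const awareness = new Awareness(ydoc);
const view = new EditorView({
  state: EditorState.create({
    doc: ytext.toString(),
    extensions: [
      basicSetup,
      yCollab(ytext, awareness), // binds CodeMirror to Y.Text + remote cursors
    ],
  }),
});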
The server runs no merge logic of its own: it applies each binary update to its cached Y.Doc, relays it, and persists the result. All conflict resolution lives inside Yjs itself. Awareness packets (cursor positions, selections, peer colors) are relayed but never persisted; they’re transient by design.
Pipeline per keystroke:
sequenceDiagram
autonumber
participant L as Local CodeMirror
participant YT as Y.Text (client)
participant CS as client/lib/yjs
participant WS as Socket.IO
participant SY as services/yjsService
participant SD as Server Y.Doc cache
participant DB as MongoDB yjsDocs
L->>YT: keystroke (CodeMirror binding)
YT->>CS: update (Uint8Array)
CS->>WS: yjs:update {roomId, docName, update}
WS->>SY: applyUpdate(roomId, docName, update)
SY->>SD: Y.applyUpdate(doc, update)
SY-->>DB: debounced 1.5s — encodeStateAsUpdate()
WS-->>WS: socket.to(room).emit yjs:update (excl sender)
The doc cache is Map<roomId+docName, Y.Doc>, lazy-hydrated from Mongo on
first reference. There’s a hard cap of MAX_DOC_BYTES = 1 MB per doc, far
above what any reasonable interview session needs; past that, further
updates are dropped.
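The per-doc 1.5 s debounce in the pipeline can be pictured as a small timer table keyed by room and doc name. What follows is a minimal sketch, not the repo's yjsService: DocDebouncer, touch, and the injectable TimerApi are hypothetical names, chosen so the example runs without Yjs or MongoDB.

```typescript
// Hypothetical sketch of a per-document debounce; names are not from the repo.
type TimerHandle = unknown;

interface TimerApi {
  set(fn: () => void, ms: number): TimerHandle;
  clear(handle: TimerHandle): void;
}

// Real timers by default; a fake clock can be injected instead.
const realTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class DocDebouncer {
  private pending = new Map<string, TimerHandle>();
  private persist: (key: string) => void;
  private delayMs: number;
  private timers: TimerApi;

  constructor(persist: (key: string) => void, delayMs = 1500, timers: TimerApi = realTimers) {
    this.persist = persist;
    this.delayMs = delayMs;
    this.timers = timers;
  }

  // Call after every applied update; only the last call in a burst persists.
  touch(roomId: string, docName: string): void {
    const key = `${roomId}:${docName}`;
    const existing = this.pending.get(key);
    if (existing !== undefined) this.timers.clear(existing);
    const handle = this.timers.set(() => {
      this.pending.delete(key);
      this.persist(key);
    }, this.delayMs);
    this.pending.set(key, handle);
  }
}
```

Because every touch resets the timer, a continuous burst of edits delays the write until the burst ends; a production version might add a maximum wait.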
2. The Yjs seeding race I actually shipped a fix for
When a fresh room opens, the editor needs an initial template (e.g. a
#include <iostream> skeleton). The naive approach is “if the doc is empty,
insert the template locally.” But CRDTs are unforgiving: if two clients
open the same fresh room simultaneously, both insert the template into
their local Y.Doc, both broadcast updates, and the merge produces a
duplicated buffer (the template, twice).
The fix is server-authoritative seeding. The client sends:
socket.emit('yjs:seed-if-empty', { docName, textKey, text }, (ack) => { ... });
The server checks ytext.length === 0 and, iff the doc is empty, inserts the
template inside a Y.transact, all in one synchronous step. Node is
single-threaded, so as long as there is no await between the check and the
insert, no explicit lock is needed. It then broadcasts the resulting
yjs:update to every peer in the room. The first request wins; the rest get
{seeded: false} and pick up the canonical seed via the broadcast.
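The check-then-insert step can be sketched without Yjs. This is an illustration under stated assumptions, not the server's handler: a plain string in a Map stands in for the Y.Text, the textKey field is left out, and seedIfEmpty and its broadcast callback are hypothetical names.

```typescript
// Stand-in for the server's doc cache: docName -> text.
// In the real app this would be a Y.Text inside a cached Y.Doc.
const docs = new Map<string, string>();

interface SeedRequest { docName: string; text: string; }
interface SeedAck { seeded: boolean; }

// There is no await between the emptiness check and the write, so two
// requests handled back to back cannot both see an empty document.
function seedIfEmpty(
  req: SeedRequest,
  broadcast: (docName: string, text: string) => void,
): SeedAck {
  const current = docs.get(req.docName) ?? '';
  if (current.length > 0) return { seeded: false };
  docs.set(req.docName, req.text);
  broadcast(req.docName, req.text); // every peer receives the canonical seed
  return { seeded: true };
}
```

Two clients sending the same template get one {seeded: true} and one {seeded: false}, and the template lands in the doc exactly once.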
There’s a related Socket.IO subtlety I learned the hard way:
socket.join(roomId) must run synchronously before any await in the
room:join handler. Socket.IO does not pause event delivery while a handler
is suspended on await, so deferring the join until after a Mongo round-trip
causes the client’s immediately-following yjs:sync-request to be silently
dropped (because socket.rooms.has(roomId) === false at that instant).
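The ordering issue comes down to how async functions run: everything before the first await executes in the same tick as the event. A minimal sketch, using a fake socket rather than Socket.IO itself; FakeSocket, loadRoom, and both handler names are hypothetical.

```typescript
// Minimal stand-in for a Socket.IO socket: only the room set matters here.
interface FakeSocket {
  rooms: Set<string>;
  join(room: string): void;
}

function makeSocket(): FakeSocket {
  const rooms = new Set<string>();
  return { rooms, join: (room) => { rooms.add(room); } };
}

// Hypothetical stand-in for the Mongo round-trip in room:join.
const loadRoom = (roomId: string): Promise<{ id: string }> =>
  Promise.resolve({ id: roomId });

// Risky ordering: the join only happens once the await resumes.
async function joinAfterAwait(socket: FakeSocket, roomId: string): Promise<void> {
  await loadRoom(roomId);
  socket.join(roomId);
}

// Safe ordering: the join runs in the same tick the event arrives.
async function joinBeforeAwait(socket: FakeSocket, roomId: string): Promise<void> {
  socket.join(roomId);
  await loadRoom(roomId);
}
```

Any event processed while joinAfterAwait is suspended sees a socket that is not yet in the room, which matches the dropped yjs:sync-request described above.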
3. WebRTC mesh — works for ≤6, breaks at 8
I went mesh (every peer connects to every other peer) instead of an SFU
because (a) I didn’t want to operate media servers, and (b) the target use
case is 1-on-1 interviews. Up to ~6 it’s smooth on a decent connection.
Beyond that, upstream bandwidth becomes the bottleneck: each peer uploads
its camera stream to every other peer (n−1 copies), and the room as a whole
holds n(n−1)/2 connections, so the cost adds up fast.
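The arithmetic behind the mesh limit is short enough to write down. A minimal sketch; meshCost is a hypothetical helper, and the 1000 kbps camera bitrate in the usage note is an assumed figure, not a measurement from the app.

```typescript
interface MeshCost {
  connections: number;       // peer connections in the whole room
  uploadsPerPeer: number;    // copies of its own stream each peer sends
  uploadKbpsPerPeer: number; // upstream bandwidth each peer needs
}

function meshCost(peers: number, streamKbps: number): MeshCost {
  const uploadsPerPeer = Math.max(peers - 1, 0);
  return {
    connections: (peers * uploadsPerPeer) / 2, // n(n-1)/2
    uploadsPerPeer,
    uploadKbpsPerPeer: uploadsPerPeer * streamKbps,
  };
}
```

At an assumed 1000 kbps per stream, meshCost(6, 1000) gives 15 connections and 5000 kbps of upstream per peer, while meshCost(8, 1000) gives 28 connections and 7000 kbps.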
Signalling is a thin Socket.IO relay:
sequenceDiagram
autonumber
participant A
participant S as Server (signaling)
participant B
Note over A,B: Joiner (A) vs existing peer (B). Server has no media path.
A->>S: rtc:join
S-->>B: peers:update (A appeared)
A->>S: rtc:signal {to:B, sdp:offer}
S-->>B: rtc:signal {from:A, sdp:offer}
B->>S: rtc:signal {to:A, sdp:answer}
S-->>A: rtc:signal {from:B, sdp:answer}
loop ICE candidates
A-->>S: rtc:signal candidate
S-->>B: rtc:signal candidate
B-->>S: rtc:signal candidate
S-->>A: rtc:signal candidate
end
Note over A,B: SRTP media flows direct (or via TURN if symmetric NAT)
Open Relay handles STUN + TURN for free. Without TURN, anyone behind a symmetric NAT (most corporate networks) just can’t connect. With it, those peers fall back to a relayed path with ~30–80 ms extra latency. STUN-only fallback kicks in when TURN is unreachable, which covers most home networks.
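On the client, the STUN and TURN choice ends up in the iceServers list passed to RTCPeerConnection. A minimal sketch of building that list; buildIceServers and TurnConfig are hypothetical, and the URLs and credentials are left as parameters because the real Open Relay values are not shown here.

```typescript
// Local copy of the RTCIceServer shape so the sketch type-checks outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface TurnConfig {
  urls: string[];
  username: string;
  credential: string;
}

// STUN is always listed; TURN is added only when credentials are available.
function buildIceServers(stunUrls: string[], turn?: TurnConfig): IceServer[] {
  const servers: IceServer[] = [{ urls: stunUrls }];
  if (turn) {
    servers.push({ urls: turn.urls, username: turn.username, credential: turn.credential });
  }
  return servers;
}

// In the browser (placeholder URL, not a real endpoint):
// new RTCPeerConnection({ iceServers: buildIceServers(['stun:stun.example.org:3478'], turnConfig) });
```

ICE gathers candidates from every listed server and picks a working pair, so leaving TURN out simply removes the relayed candidates.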
4. The “tab refresh shouldn’t kick me out” problem
If you refresh during a call your socket disconnects and the server normally
removes you from the participant list. Other peers tear down their
RTCPeerConnections. Two seconds later when you come back, everyone has to
renegotiate. Annoying — and on a flaky connection, it makes the room feel
broken.
The fix is a 3-second grace period, tracked on the server with a per-socket membership map, a cross-socket reference count, and a table of pending removal timers:
stateDiagram-v2
[*] --> Connected: socket connects
Connected --> InRoom: room:join
InRoom --> Pending: socket disconnects
Pending --> InRoom: same pid re-acquires<br/>within 3s
Pending --> Removed: 3s elapsed
Removed --> [*]: broadcast peers:update
socketMemberships : Map<sid, Map<roomId, pid>>
participantRefs : Map<roomId+pid, count>
pendingRemovals : Map<roomId+pid, Timer>
acquireRef increments participantRefs and clears any pending removal
timer. releaseRef decrements; when the count hits zero, schedule a 3-second
timer. If a new socket re-acquires the same pid within that window, the
timer is cancelled and nobody downstream sees the disconnect at all.
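socket.on('disconnect', () => {
  // Release every (room, pid) pair this socket was holding.
  for (const [roomId, pid] of socketMemberships.get(socket.id) ?? []) {
    releaseRef(roomId, pid);
  }
  socketMemberships.delete(socket.id);
});

function releaseRef(roomId, pid) {
  const key = `${roomId}:${pid}`;
  const remaining = (participantRefs.get(key) ?? 1) - 1;
  if (remaining > 0) { participantRefs.set(key, remaining); return; }
  participantRefs.delete(key);
  // Last socket for this pid is gone: wait 3 s before telling anyone.
  pendingRemovals.set(key, setTimeout(async () => {
    pendingRemovals.delete(key);
    await db.rooms.updateOne({ id: roomId }, { $pull: { participants: { id: pid } } });
    io.to(roomId).emit('peers:update', currentPeers(roomId));
  }, 3000));
}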
Same room, same camera tile, no re-handshake. Feels invisible — which is the goal.
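The acquire side, which the snippet above leaves out, is where the timer gets cancelled. Here is a self-contained sketch of both halves; PresenceTracker and its method names are hypothetical, and the onRemoved callback stands in for the Mongo update plus the peers:update broadcast.

```typescript
type Timer = ReturnType<typeof setTimeout>;

class PresenceTracker {
  private refs = new Map<string, number>();
  private pending = new Map<string, Timer>();
  private onRemoved: (roomId: string, pid: string) => void;
  private graceMs: number;

  constructor(onRemoved: (roomId: string, pid: string) => void, graceMs = 3000) {
    this.onRemoved = onRemoved;
    this.graceMs = graceMs;
  }

  // A socket joined as pid: cancel any pending removal, then count it.
  acquireRef(roomId: string, pid: string): void {
    const key = `${roomId}:${pid}`;
    const timer = this.pending.get(key);
    if (timer !== undefined) {
      clearTimeout(timer);
      this.pending.delete(key);
    }
    this.refs.set(key, (this.refs.get(key) ?? 0) + 1);
  }

  // A socket for pid went away: start the grace timer once none remain.
  releaseRef(roomId: string, pid: string): void {
    const key = `${roomId}:${pid}`;
    const count = this.refs.get(key);
    if (count === undefined) return; // nothing to release
    if (count > 1) { this.refs.set(key, count - 1); return; }
    this.refs.delete(key);
    this.pending.set(key, setTimeout(() => {
      this.pending.delete(key);
      this.onRemoved(roomId, pid);
    }, this.graceMs));
  }

  hasPendingRemoval(roomId: string, pid: string): boolean {
    return this.pending.has(`${roomId}:${pid}`);
  }
}
```

A refresh shows up as releaseRef followed by acquireRef for the same pid within the window, so onRemoved never fires and other peers see nothing.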
5. Per-window identity via window.name
Two tabs in the same room used to collide because they shared localStorage
and would overwrite each other’s participant ID. The fix is one line:
window.name is per-tab, survives soft reloads, but doesn’t leak to new
windows.
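// window.name is per-tab and survives reloads, so it scopes the storage key.
// rand6() is a helper (not shown) that returns a short random string.
const winId = window.name || (window.name = `vaa-${rand6()}`);
const storageKey = `vaartalaap:pid:${roomId}:${winId}`;
let pid = localStorage.getItem(storageKey);
if (pid === null) {
  pid = crypto.randomUUID();
  localStorage.setItem(storageKey, pid);
}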
So two tabs of the same room never collide on participant ID, and a refresh keeps the same ID — which is exactly what the 3-second grace timer needs to recognise the rejoin.
6. Whiteboard auto-contrast
If one peer is on dark mode and another flips to light, naïve white strokes become invisible on a white background. The renderer resolves stroke color against the current canvas background:
- Pure white on light bg → re-rendered as dark ink.
- Pure black on dark bg → re-rendered as light ink.
- Strokes within ±0.15 luminance of bg → treated as eraser (drawn as bg).
- Default new ink = #2979ff (blue), high contrast in both modes.
Not rocket science, but it’s the kind of thing you only notice when someone joins your call and immediately switches themes mid-session.
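The rules above can be sketched as a pure function applied in the order listed. This is an illustration, not the renderer's code: it assumes WCAG-style relative luminance, 6-digit hex colors, and placeholder ink values (DARK_INK and LIGHT_INK are made up here).

```typescript
const DARK_INK = '#111111';  // assumed replacement for white on a light canvas
const LIGHT_INK = '#f5f5f5'; // assumed replacement for black on a dark canvas

// Relative luminance of a #rrggbb color, from 0 (black) to 1 (white).
function luminance(hex: string): number {
  const n = parseInt(hex.slice(1), 16);
  const lin = (c: number): number => {
    const s = c / 255;
    return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin((n >> 16) & 255) + 0.7152 * lin((n >> 8) & 255) + 0.0722 * lin(n & 255);
}

function resolveStrokeColor(stroke: string, bg: string): string {
  const color = stroke.toLowerCase();
  const bgLum = luminance(bg);
  const bgIsLight = bgLum > 0.5;
  if (color === '#ffffff' && bgIsLight) return DARK_INK;   // white on light
  if (color === '#000000' && !bgIsLight) return LIGHT_INK; // black on dark
  if (Math.abs(luminance(color) - bgLum) <= 0.15) return bg; // near-bg acts as eraser
  return color;
}
```

Resolving at render time rather than at draw time means the stored stroke keeps its original color, so each peer can pick what is visible on its own background.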

7. 3-tier code execution fallback
Running code from a browser is annoying because every free service rate-limits you eventually. Vaartalaap tries them in order:
- Wandbox: fastest, supports the most languages.
- CodeX: fallback when Wandbox returns 429 or 5xx.
- Local agent: last-resort sandbox with a small whitelist.
Each provider conforms to a CodeExecutor interface:
interface CodeExecutor {
supports(lang: string): boolean;
run(opts: { lang: string; src: string; stdin: string }): Promise<{ stdout: string; stderr: string }>;
}
Adding a new backend is ~30 lines. Run history is capped at 8 entries client-side so it doesn’t eat memory during a long session.
sequenceDiagram
autonumber
participant U as User
participant FE as CodeWorkbench
participant EX as lib/codeExecutor
participant W as Wandbox
participant CX as CodeX
participant AG as Agent fallback
U->>FE: Run (Ctrl+Enter)
FE->>EX: executeCode(lang, src, stdin)
EX->>W: POST /compile.json
alt Wandbox 200 OK
W-->>EX: {output}
else
EX->>CX: POST /exec
alt CodeX 200 OK
CX-->>EX: {output}
else
EX->>AG: POST /agent
AG-->>EX: {output}
end
end
EX-->>FE: combined stdout+stderr
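The chain in the diagram reduces to a loop over executors. A minimal sketch, assuming each provider throws (or rejects) on a 429, a 5xx, or a network error; runWithFallback is a hypothetical name and the interface is repeated so the block stands alone.

```typescript
interface RunOptions { lang: string; src: string; stdin: string; }
interface RunResult { stdout: string; stderr: string; }

interface CodeExecutor {
  supports(lang: string): boolean;
  run(opts: RunOptions): Promise<RunResult>;
}

// Try providers in order; skip ones that do not support the language and
// move on when one fails. Throws only if every candidate fails.
async function runWithFallback(
  executors: CodeExecutor[],
  opts: RunOptions,
): Promise<RunResult> {
  const errors: string[] = [];
  for (const executor of executors) {
    if (!executor.supports(opts.lang)) continue;
    try {
      return await executor.run(opts);
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  const reason = errors.length > 0 ? errors.join('; ') : `no provider supports ${opts.lang}`;
  throw new Error(`all executors failed: ${reason}`);
}
```

Collecting the error messages keeps the final failure debuggable, since the user otherwise only sees that the last provider gave up.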
8. Performance budget
Some hot-path numbers I tuned to:
| hot path | target | mechanism |
|---|---|---|
| code keystroke → peer keystroke | <150 ms | Yjs update batched on next tick, WS broadcast |
| whiteboard stroke ack | <50 ms local, <200 ms peer | optimistic local draw + WS broadcast |
| code → DB snapshot | 300 ms debounce | roomService.persistDocuments |
| Yjs → DB | 1.5 s debounce | yjsService per-doc timer |
| reconnect after refresh | 3 s grace before kick | server pendingRemovals |
Real-time event matrix (abridged)
flowchart LR
subgraph Client
direction TB
c1[doc:change]
c2[wb:stroke]
c4[chat:send]
c5[yjs:sync-request]
c6[yjs:update]
c7[yjs:awareness]
c8[rtc:signal]
c10[yjs:seed-if-empty]
end
subgraph Server
direction TB
s1[doc:state]
s2[wb:state]
s3[chat:new]
s4[yjs:sync-response]
s5[yjs:update]
s6[yjs:awareness]
s7[rtc:signal]
s8[peers:update]
end
c1 -- broadcast --> s1
c2 --> s2
c4 --> s3
c5 -- reply only --> s4
c6 -- relay + persist --> s5
c7 -- relay --> s6
c8 -- targeted --> s7
| direction | event | payload | persisted? |
|---|---|---|---|
| C → S | room:join | {roomId, name, participantId} | yes (rooms) |
| C → S | doc:change | {kind, value} (kind = code/notes/lang) | yes, debounced 300 ms |
| C → S | wb:stroke | Stroke | yes |
| C → S | chat:send | {text} | yes (capped 200 msgs) |
| C → S | yjs:update | Uint8Array | yes, debounced 1.5 s |
| C → S | yjs:awareness | Uint8Array | no (transient) |
| C → S | yjs:seed-if-empty | {docName, textKey, text} → ack {seeded} | yes (winner only) |
| C → S | rtc:signal | {to, sdp or candidate} | no |
| S → C | peers:update | Participant[] | — |
| S → C | room:state | RoomSnapshot | — |
Repo layout
Vaartalaap/
├── apps/
│   ├── client/              React 18 + Vite 5 SPA
│   │   └── src/
│   │       ├── components/  CodeWorkbench, CollabCodeEditor, Whiteboard,
│   │       │                Notepad, ChatPanel, CallPanel, ColorModeToggle…
│   │       ├── routes/      Landing, Room
│   │       ├── lib/         api, socket, yjs, codeExecutor, languages
│   │       └── styles/      muiTheme factory, ColorModeProvider
│   └── server/              Express 4 + Socket.IO 4
│       └── src/
│           ├── config/      env (Zod), db, socket
│           ├── routes/      roomRoutes
│           ├── services/    roomService, yjsService
│           └── lib/         logger
├── packages/shared/         Shared types (RoomSnapshot, Stroke, ChatMsg…)
└── docs/
    ├── ARCHITECTURE.md
    └── DEPLOYMENT.md
Deployment — what actually works on $0
| layer | provider | plan | notes |
|---|---|---|---|
| static SPA | Vercel | Hobby | auto SSL + global CDN |
| API + WebSocket | Render | Free Web Service | cold-start ~30 s; upgrade to Starter avoids it |
| database | MongoDB Atlas | M0 free | 512 MB, shared CPU |
| TURN | openrelay.metered.ca | free | adds latency vs paid |
Both Vercel and Render listen to GitHub webhooks themselves — no GitHub
Actions tokens or secrets. CI runs tsc + vite build on every push as a
regression guard before the providers build.
Gotchas I actually hit
These cost me hours each, so they go in the post:
- Render strips devDeps because it sets NODE_ENV=production. Build command must include --include=dev or tsc can’t find @types/express.
- Vercel’s Vite preset hardcodes outputDirectory=dist and ignores vercel.json. Set Framework Preset to Other instead, then vercel.json controls everything.
- CORS does exact-match. CLIENT_ORIGIN must have no trailing slash, or every request gets rejected with a confusing CORS error.
- Node ESM strict resolver requires .js extensions on every relative import in the compiled output. import './config/env' fails; import './config/env.js' works.
- Render’s free tier has no static egress IP, so MongoDB Atlas Network Access has to allow 0.0.0.0/0. Lock down with the connection string’s user/password instead.
- Pin Node 20 on Render. The default Node 24 broke our TS deprecation flags; setting NODE_VERSION=20.18.0 fixed it.
- VITE_* env vars are baked at build time. Changing them in the Vercel dashboard does nothing until you trigger a redeploy.
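The trailing-slash gotcha is easy to guard against in code instead of relying on a clean env var. A minimal sketch; normalizeOrigin and isAllowedOrigin are hypothetical helpers, not part of the repo or of the cors package.

```typescript
// Strip whitespace and trailing slashes so the configured value matches the
// browser's Origin header, which never ends in a slash.
function normalizeOrigin(raw: string): string {
  return raw.trim().replace(/\/+$/, '');
}

// Exact-match check against the normalized CLIENT_ORIGIN value.
function isAllowedOrigin(requestOrigin: string | undefined, clientOrigin: string): boolean {
  return requestOrigin !== undefined && requestOrigin === normalizeOrigin(clientOrigin);
}
```

With this in place, a CLIENT_ORIGIN of https://app.example.com/ (a placeholder domain) still matches requests from https://app.example.com.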
Failure modes & graceful degradation
| failure | mitigation |
|---|---|
| TURN unreachable | STUN-only — works on most home NATs |
| Mongo down | sockets keep broadcasting; persistence skipped, errors logged |
| Render cold start | first request waits ~30s, subsequent are warm |
| Wandbox / CodeX down | 3-tier fallback chain |
| WebSocket blocked by proxy | Socket.IO auto-falls-back to long-poll |
Security boundary (what’s in scope, what isn’t)
- In scope: Helmet headers, CORS allow-list, express-rate-limit (200 req / 15 min), Zod validation on env + REST bodies + socket payloads, size caps (chat 200 msgs, yjsDoc 1 MB, strokes capped per room), 30-day TTL purge.
- Out of scope: authn / authz (rooms are anyone-with-link, by design), end-to-end encryption (WebRTC media is SRTP, but the server can read REST + WS payloads), abuse prevention beyond rate limits.
If you put PII through this, that’s on you. It’s an interview / pairing tool, not a vault.
What I’d change next
- Persist Yjs snapshots more aggressively. Right now there’s a 1.5s debounce; under heavy bursts you can lose the last 1.5s or so of edits if the server dies at the wrong moment.
- Switch video to an SFU (mediasoup or livekit) once I want classroom-sized rooms. Mesh is fine for the actual use case.
- Replay mode — Yjs already gives you the full edit history, so a scrubber over a code session is mostly free.
- Self-hosted coturn to drop the dependency on Open Relay (which rate-limits under load).
If you’ve read this far, just go open a room and try it. If it breaks — open an issue, I’d love the bug report.