Building Vaartalaap — real-time collab rooms with Yjs, WebRTC, and zero accounts

Vaartalaap (वार्तालाप — Hindi for conversation) is a side project I’ve been chipping away at: a single-link, no-account collaboration room that bundles a collaborative code editor, a whiteboard, rich-text notes, mesh video, and chat. Open it, share the URL, start working. Rooms expire after 30 days. That’s it.

Live: vaartalaapclient.vercel.app
Repo: github.com/iyashwantsaini/Vaartalaap
Docs: ARCHITECTURE.md · DEPLOYMENT.md

Why I built it

Most “interview portals” are heavy. They want signups, organizations, billing, calendar integrations, GDPR cookie banners. Sometimes you just need a room with a code editor and a video call for an hour. So I built one — small, disposable, dark-theme, zero friction.

Along the way it turned into a sandbox for things I wanted to learn properly: CRDTs, WebRTC mesh signalling, and a real-time canvas that behaves under flaky networks.

[Image: Vaartalaap landing page]

At a glance

surface	tech	sync mechanism
code editor	CodeMirror 6 + Yjs CRDT	`y-codemirror.next` + awareness cursors
whiteboard	Canvas 2D	Socket.IO stroke broadcast
notes	Tiptap rich-text	Socket.IO debounced `doc:change`
chat	MUI list	Socket.IO append-only
video / audio	WebRTC mesh (≤6 peers)	Socket.IO signalling, Open Relay TURN
theme	MUI v6	localStorage + OS pref, dark/light toggle

Frontend is a React 18 + Vite 5 SPA. Backend is Express 4 + Socket.IO 4 with MongoDB for room metadata and CRDT snapshots. TypeScript end-to-end, ~93% of the codebase.

System topology

flowchart LR
  subgraph Browsers["Clients (browsers)"]
    A[Tab A<br/>React SPA]
    B[Tab B<br/>React SPA]
    C[Tab C<br/>React SPA]
  end
  subgraph Server["Node server<br/>(Express + Socket.IO)"]
    REST[REST<br/>/api/rooms]
    WS[Socket.IO<br/>rooms + relay]
    YS[Yjs service<br/>in-mem Y.Doc cache]
  end
  subgraph Mongo["MongoDB Atlas"]
    R[(rooms<br/>TTL 30d)]
    Y[(yjsDocs<br/>state: Binary)]
  end
  TURN[(Open Relay<br/>STUN + TURN)]
  A -- HTTPS --> REST
  B -- HTTPS --> REST
  A <-- WebSocket --> WS
  B <-- WebSocket --> WS
  C <-- WebSocket --> WS
  REST --> R
  WS --> R
  WS --> YS
  YS --> Y
  A <-. WebRTC P2P (SRTP) .-> B
  A <-. WebRTC P2P (SRTP) .-> C
  B <-. WebRTC P2P (SRTP) .-> C
  A -.- TURN
  B -.- TURN
  C -.- TURN

Two properties matter here:

The server is a relay only — for signalling, presence, Yjs ops, and CRUD snapshots. Media never traverses it. That’s why the backend can run on Render’s free tier and not melt.
Two MongoDB collections. rooms holds REST snapshots + presence-friendly metadata. yjsDocs holds binary CRDT state per (roomId, docName), written by Y.encodeStateAsUpdate. Both have TTL indexes for the 30-day expiry.

The interesting parts

1. Yjs for the editor (no central source of truth)

The first version used a “last-write-wins” approach over Socket.IO. It looked fine in demos and broke instantly with two people typing. I rewrote it on Yjs, a CRDT library that lets every peer hold its own copy of the document and merges concurrent edits deterministically.

The wiring:

const ydoc = new Y.Doc();
const ytext = ydoc.getText('code');

const view = new EditorView({
  state: EditorState.create({
    doc: ytext.toString(),
    extensions: [
      basicSetup,
      yCollab(ytext, awareness),
    ],
  }),
});

The server does no conflict resolution of its own. It applies each binary update to its cached Y.Doc, relays it to the other peers, and persists the merged state; all merging logic lives inside Yjs itself. Awareness packets (cursor positions, selections, peer colors) are relayed but never persisted; they’re transient by design.

Pipeline per keystroke:

sequenceDiagram
  autonumber
  participant L as Local CodeMirror
  participant YT as Y.Text (client)
  participant CS as client/lib/yjs
  participant WS as Socket.IO
  participant SY as services/yjsService
  participant SD as Server Y.Doc cache
  participant DB as MongoDB yjsDocs
  L->>YT: keystroke (CodeMirror binding)
  YT->>CS: update (Uint8Array)
  CS->>WS: yjs:update {roomId, docName, update}
  WS->>SY: applyUpdate(roomId, docName, update)
  SY->>SD: Y.applyUpdate(doc, update)
  SY-->>DB: debounced 1.5s — encodeStateAsUpdate()
  WS-->>WS: io.to(room).emit yjs:update (excl sender)

The doc cache is Map<roomId+docName, Y.Doc>, lazy-hydrated from Mongo on first reference. There’s a hard cap of MAX_DOC_BYTES = 1 MB per doc; once a doc reaches it, further updates are dropped. That limit is far above what any reasonable interview session needs.
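A minimal sketch of the lazy-hydration idea, with no Yjs or Mongo dependency so it runs on its own. `LazyDocCache` and the `hydrate` callback are hypothetical names, and caching the promise (rather than the loaded value) is an assumption of this sketch, chosen so that two sockets referencing a cold doc at the same moment share one load.

```typescript
// Generic lazy cache keyed by `${roomId}:${docName}`; V would be a Y.Doc in the real service.
class LazyDocCache<V> {
  private readonly docs = new Map<string, Promise<V>>();
  private readonly hydrate: (roomId: string, docName: string) => Promise<V>;

  constructor(hydrate: (roomId: string, docName: string) => Promise<V>) {
    this.hydrate = hydrate;
  }

  get(roomId: string, docName: string): Promise<V> {
    const key = `${roomId}:${docName}`;
    let doc = this.docs.get(key);
    if (doc === undefined) {
      // Store the promise itself so concurrent first references share one load.
      doc = this.hydrate(roomId, docName);
      this.docs.set(key, doc);
    }
    return doc;
  }
}

// Usage with a stub loader standing in for the Mongo read.
let loadCount = 0;
const cache = new LazyDocCache(async (roomId, docName) => {
  loadCount += 1;
  return { id: `${roomId}:${docName}` };
});

Promise.all([cache.get("room-1", "code"), cache.get("room-1", "code")]).then(([a, b]) => {
  console.log(a === b, loadCount);
});
```

A real version would also evict a key whose load rejected, so a transient database error does not stay cached.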

2. The Yjs seeding race I actually shipped a fix for

When a fresh room opens, the editor needs an initial template (e.g. a #include <iostream> skeleton). The naive approach is “if the doc is empty, insert the template locally.” But CRDTs are unforgiving: if two clients open the same fresh room simultaneously, both insert the template into their local Y.Doc, both broadcast updates, and the merge produces a duplicated buffer (the template, twice).

The fix is server-authoritative seeding. The client sends:

socket.emit('yjs:seed-if-empty', { docName, textKey, text }, (ack) => { ... });

The server checks ytext.length === 0 and, only if the text is empty, inserts the template inside a Y.transact, then broadcasts the resulting yjs:update to every peer in the room. This is atomic as long as the check and the insert run synchronously with no await between them: Node is single-threaded, so no other handler can interleave and no explicit lock is needed. The first request wins; the rest get {seeded: false} and pick up the canonical seed via the broadcast.
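A minimal sketch of that check-then-insert step. `TextLike`, `PlainText`, and `seedIfEmpty` are hypothetical names; Y.Text exposes a `length` getter and `insert(index, text)`, so it would fit the `TextLike` shape. The real handler would also wrap the insert in a transaction and broadcast the resulting update.

```typescript
// Hypothetical minimal shape; Y.Text has a `length` getter and `insert(index, text)`.
interface TextLike {
  readonly length: number;
  insert(index: number, text: string): void;
}

// Synchronous on purpose: with no `await` between the check and the insert,
// no other socket handler can run in between on Node's single thread.
function seedIfEmpty(target: TextLike, template: string): { seeded: boolean } {
  if (target.length !== 0) return { seeded: false };
  target.insert(0, template);
  return { seeded: true };
}

// Tiny in-memory stand-in so the sketch runs without Yjs.
class PlainText implements TextLike {
  private value = "";
  get length(): number {
    return this.value.length;
  }
  insert(index: number, text: string): void {
    this.value = this.value.slice(0, index) + text + this.value.slice(index);
  }
  toString(): string {
    return this.value;
  }
}

const shared = new PlainText();
const first = seedIfEmpty(shared, "#include <iostream>\n");
const second = seedIfEmpty(shared, "#include <iostream>\n");
console.log(first.seeded, second.seeded); // true false
```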

There’s a related Socket.IO subtlety I learned the hard way: socket.join(roomId) must run synchronously before any await in the room:join handler. Socket.IO does not pause event delivery while a handler is suspended on await, so deferring the join until after a Mongo round-trip causes the client’s immediately-following yjs:sync-request to be silently dropped (because socket.rooms.has(roomId) === false at that instant).
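The ordering issue can be reproduced without Socket.IO. In this sketch, `RoomSocket`, `makeSocket`, `loadRoom`, `onRoomJoin`, and `onSyncRequest` are all hypothetical stand-ins. It also assumes the default in-memory adapter, where `socket.join` takes effect synchronously.

```typescript
// Hypothetical stand-in for the part of a Socket.IO socket that matters here.
interface RoomSocket {
  rooms: Set<string>;
  join(roomId: string): void;
}

function makeSocket(): RoomSocket {
  const rooms = new Set<string>();
  return { rooms, join: (roomId: string) => { rooms.add(roomId); } };
}

// Placeholder for the Mongo round-trip; resolves on a later tick.
const loadRoom = (roomId: string): Promise<{ id: string }> =>
  new Promise((resolve) => setTimeout(() => resolve({ id: roomId }), 5));

async function onRoomJoin(socket: RoomSocket, roomId: string): Promise<void> {
  socket.join(roomId);    // runs synchronously, before the first await
  await loadRoom(roomId); // other events can be delivered while this is pending
  // ...emit room:state, update presence, and so on.
}

// A handler that gates on membership, like the yjs:sync-request check described above.
function onSyncRequest(socket: RoomSocket, roomId: string): "served" | "dropped" {
  return socket.rooms.has(roomId) ? "served" : "dropped";
}

const socket = makeSocket();
void onRoomJoin(socket, "room-1");            // not awaited: simulates back-to-back events
console.log(onSyncRequest(socket, "room-1")); // served
```

Moving `socket.join(roomId)` below the `await` would make the same call log `dropped`, which is the bug described above.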

3. WebRTC mesh — works for ≤6, breaks at 8

I went mesh (every peer connects to every other peer) instead of an SFU because (a) I didn’t want to operate media servers, and (b) the target use case is 1-on-1 interviews. Up to ~6 it’s smooth on a decent connection. Beyond that, upstream bandwidth becomes the bottleneck: a room of n peers has n(n−1)/2 connections in total, and each peer uploads its camera stream n−1 times, so per-peer upstream grows linearly while the room’s total traffic grows as O(n²).
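To put numbers on that, here is the full-mesh arithmetic as a small helper. `meshStats` is a hypothetical name and the 1.5 Mbps camera bitrate is an assumed figure for illustration, not a measurement from the app.

```typescript
// Full-mesh arithmetic for a room of `peers` participants.
function meshStats(peers: number, streamMbps: number) {
  const links = (peers * (peers - 1)) / 2;       // total peer connections in the room
  const uploadsPerPeer = Math.max(peers - 1, 0); // each peer sends its stream to every other peer
  return { links, uploadsPerPeer, upstreamMbps: uploadsPerPeer * streamMbps };
}

for (const n of [2, 4, 6, 8]) {
  console.log(n, meshStats(n, 1.5));
}
```

At 6 peers that is 15 connections and 7.5 Mbps of upload per participant under the assumed bitrate. At 8 it is 28 connections and 10.5 Mbps, which is more than many home uplinks sustain.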

Signalling is a thin Socket.IO relay:

sequenceDiagram
  autonumber
  participant A
  participant S as Server (signaling)
  participant B
  Note over A,B: Joiner (A) vs existing peer (B). Server has no media path.
  A->>S: rtc:join
  S-->>B: peers:update (A appeared)
  A->>S: rtc:signal {to:B, sdp:offer}
  S-->>B: rtc:signal {from:A, sdp:offer}
  B->>S: rtc:signal {to:A, sdp:answer}
  S-->>A: rtc:signal {from:B, sdp:answer}
  loop ICE candidates
    A-->>S: rtc:signal candidate
    S-->>B: rtc:signal candidate
    B-->>S: rtc:signal candidate
    S-->>A: rtc:signal candidate
  end
  Note over A,B: SRTP media flows direct (or via TURN if symmetric NAT)

Open Relay handles STUN + TURN for free. Without TURN, anyone behind a symmetric NAT (most corporate networks) just can’t connect. With it, those peers fall back to a relayed path with ~30–80 ms extra latency. STUN-only fallback kicks in when TURN is unreachable, which covers most home networks.
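For reference, an ICE server list for this kind of setup might look like the sketch below. `IceServer` and `buildIceServers` are hypothetical names, the ports and the `?transport=tcp` variant are assumptions to check against the provider’s current documentation, and the credentials are placeholders.

```typescript
// Local copy of the fields used from the browser's RTCIceServer type,
// so the sketch compiles outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Ports and the TCP transport variant are assumptions, not values taken from the app.
function buildIceServers(host: string, username: string, credential: string): IceServer[] {
  return [
    { urls: `stun:${host}:80` },
    {
      urls: [`turn:${host}:80`, `turn:${host}:443`, `turn:${host}:443?transport=tcp`],
      username,
      credential,
    },
  ];
}

const iceServers = buildIceServers("openrelay.metered.ca", "<turn-username>", "<turn-credential>");
console.log(JSON.stringify(iceServers, null, 2));
// In the browser: new RTCPeerConnection({ iceServers })
```

ICE gathers direct, STUN-derived, and TURN-relayed candidates and uses the best pair that actually connects. When the TURN server cannot be reached, only the relayed candidates are missing, which is why listing both gives the STUN-only fallback for free.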

4. The “tab refresh shouldn’t kick me out” problem

If you refresh during a call your socket disconnects and the server normally removes you from the participant list. Other peers tear down their RTCPeerConnections. Two seconds later when you come back, everyone has to renegotiate. Annoying — and on a flaky connection, it makes the room feel broken.

The fix is a 3-second grace period, tracked on the server with a per-socket membership map, a cross-socket reference count, and a map of pending removal timers:

stateDiagram-v2
  [*] --> Connected: socket connects
  Connected --> InRoom: room:join
  InRoom --> Pending: socket disconnects
  Pending --> InRoom: same pid re-acquires<br/>within 3s
  Pending --> Removed: 3s elapsed
  Removed --> [*]: broadcast peers:update

socketMemberships  : Map<sid, Map<roomId, pid>>
participantRefs    : Map<roomId+pid, count>
pendingRemovals    : Map<roomId+pid, Timer>

acquireRef increments participantRefs and clears any pending removal timer. releaseRef decrements; when the count hits zero, schedule a 3-second timer. If a new socket re-acquires the same pid within that window, the timer is cancelled and nobody downstream sees the disconnect at all.

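socket.on('disconnect', () => {
  // Release every (room, participant) pair this socket was holding.
  for (const [roomId, pid] of socketMemberships.get(socket.id) ?? []) {
    releaseRef(roomId, pid);
  }
  socketMemberships.delete(socket.id);
});

function releaseRef(roomId: string, pid: string) {
  const key = `${roomId}:${pid}`;
  const remaining = (participantRefs.get(key) ?? 1) - 1;
  if (remaining > 0) {
    participantRefs.set(key, remaining); // another socket still holds this pid
    return;
  }
  participantRefs.delete(key);
  // Last socket gone: wait 3 s before telling anyone, in case this is a refresh.
  pendingRemovals.set(key, setTimeout(async () => {
    pendingRemovals.delete(key);
    await db.rooms.updateOne({ id: roomId }, { $pull: { participants: { id: pid } } });
    io.to(roomId).emit('peers:update', currentPeers(roomId));
  }, 3000));
}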
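The other half, `acquireRef`, is what cancels a pending removal. A minimal standalone sketch using the same map names follows. The room and participant IDs are made up, and the removal timer is planted by hand to simulate a socket that has just disconnected.

```typescript
const participantRefs = new Map<string, number>();
const pendingRemovals = new Map<string, ReturnType<typeof setTimeout>>();

function acquireRef(roomId: string, pid: string): void {
  const key = `${roomId}:${pid}`;
  const pending = pendingRemovals.get(key);
  if (pending !== undefined) {
    clearTimeout(pending); // the refresh came back inside the grace window
    pendingRemovals.delete(key);
  }
  participantRefs.set(key, (participantRefs.get(key) ?? 0) + 1);
}

// Simulate a refresh: the old socket is gone and a removal is already scheduled.
let removed = false;
pendingRemovals.set("room-1:p1", setTimeout(() => { removed = true; }, 3000));

acquireRef("room-1", "p1"); // new socket, same pid
console.log(pendingRemovals.size, participantRefs.get("room-1:p1"), removed); // 0 1 false
```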

Same room, same camera tile, no re-handshake. Feels invisible — which is the goal.

5. Per-window identity via `window.name`

Two tabs in the same room used to collide because they shared localStorage and would overwrite each other’s participant ID. The fix is one line: window.name is per-tab, survives soft reloads, but doesn’t leak to new windows.

const winId = window.name || (window.name = `vaa-${rand6()}`);
const storageKey = `vaartalaap:pid:${roomId}:${winId}`;
let pid = localStorage.getItem(storageKey);
if (pid === null) {
  // First visit from this tab: mint an ID and remember it for refreshes.
  pid = crypto.randomUUID();
  localStorage.setItem(storageKey, pid);
}

So two tabs of the same room never collide on participant ID, and a refresh keeps the same ID — which is exactly what the 3-second grace timer needs to recognise the rejoin.

6. Whiteboard auto-contrast

If one peer is on dark mode and another flips to light, naïve white strokes become invisible on a white background. The renderer resolves stroke color against the current canvas background:

Pure white on light bg → re-rendered as dark ink.
Pure black on dark bg → re-rendered as light ink.
Strokes within ±0.15 luminance of bg → treated as eraser (drawn as bg).
Default new ink = #2979ff (blue) — high contrast in both modes.
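Those rules fit in one small function. The sketch below makes several assumptions that the rules above do not specify: the exact dark and light ink colours, a luminance cutoff of 0.5 for calling a background light, and the WCAG relative-luminance formula as the luminance measure. All names are hypothetical.

```typescript
// Assumed constants; the rules above do not give the exact ink colours or cutoff.
const DARK_INK = "#111111";
const LIGHT_INK = "#f5f5f5";
const NEAR_BG = 0.15;

// One sRGB channel of a #rrggbb colour, linearised per the WCAG formula.
function linearChannel(hex: string, offset: number): number {
  const c = parseInt(hex.slice(offset, offset + 2), 16) / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance in [0, 1].
function luminance(hex: string): number {
  return (
    0.2126 * linearChannel(hex, 1) +
    0.7152 * linearChannel(hex, 3) +
    0.0722 * linearChannel(hex, 5)
  );
}

function resolveStrokeColor(stroke: string, bg: string): string {
  const s = stroke.toLowerCase();
  const bgIsLight = luminance(bg) > 0.5;
  if (s === "#ffffff" && bgIsLight) return DARK_INK;
  if (s === "#000000" && !bgIsLight) return LIGHT_INK;
  if (Math.abs(luminance(s) - luminance(bg)) <= NEAR_BG) return bg; // behaves like an eraser
  return stroke;
}

console.log(resolveStrokeColor("#ffffff", "#ffffff")); // #111111
console.log(resolveStrokeColor("#2979ff", "#121212")); // #2979ff
```

Because the stored stroke keeps its original colour and only the rendered colour changes, flipping the theme back restores the original ink.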

Not rocket science, but it’s the kind of thing you only notice when someone joins your call and immediately switches themes mid-session.

[Image: Whiteboard with auto-contrast strokes]

7. 3-tier code execution fallback

Running code from a browser is annoying because every free service rate-limits you eventually. Vaartalaap tries them in order:

Wandbox — fastest, supports the most languages.
CodeX — fallback when Wandbox returns 429 or 5xx.
Local agent — last-resort sandbox with a small whitelist.

Each provider conforms to a CodeExecutor interface:

interface CodeExecutor {
  supports(lang: string): boolean;
  run(opts: { lang: string; src: string; stdin: string }): Promise<{ stdout: string; stderr: string }>;
}

Adding a new backend is ~30 lines. Run history is capped at 8 entries client-side so it doesn’t eat memory during a long session.

sequenceDiagram
  autonumber
  participant U as User
  participant FE as CodeWorkbench
  participant EX as lib/codeExecutor
  participant W as Wandbox
  participant CX as CodeX
  participant AG as Agent fallback
  U->>FE: Run (Ctrl+Enter)
  FE->>EX: executeCode(lang, src, stdin)
  EX->>W: POST /compile.json
  alt Wandbox 200 OK
    W-->>EX: {output}
  else
    EX->>CX: POST /exec
    alt CodeX 200 OK
      CX-->>EX: {output}
    else
      EX->>AG: POST /agent
      AG-->>EX: {output}
    end
  end
  EX-->>FE: combined stdout+stderr
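The chain in the diagram reduces to a loop over providers. In this sketch, `runWithFallback` and the two stub executors are hypothetical. It also assumes each provider turns a 429 or 5xx response into a thrown error, which is what moves the loop on to the next one.

```typescript
interface RunOptions { lang: string; src: string; stdin: string }
interface RunResult { stdout: string; stderr: string }
interface CodeExecutor {
  supports(lang: string): boolean;
  run(opts: RunOptions): Promise<RunResult>;
}

// Try each provider in order; a thrown error moves on to the next one.
async function runWithFallback(executors: CodeExecutor[], opts: RunOptions): Promise<RunResult> {
  const failures: string[] = [];
  for (const executor of executors) {
    if (!executor.supports(opts.lang)) continue;
    try {
      return await executor.run(opts);
    } catch (err) {
      failures.push(err instanceof Error ? err.message : String(err));
    }
  }
  const reason = failures.length > 0 ? failures.join("; ") : `no provider supports ${opts.lang}`;
  throw new Error(`all executors failed: ${reason}`);
}

// Stub providers standing in for the real backends.
const rateLimited: CodeExecutor = {
  supports: () => true,
  run: async () => { throw new Error("429 Too Many Requests"); },
};
const healthy: CodeExecutor = {
  supports: (lang) => lang === "cpp",
  run: async ({ src }) => ({ stdout: `ran ${src.length} bytes`, stderr: "" }),
};

runWithFallback([rateLimited, healthy], { lang: "cpp", src: "int main(){}", stdin: "" })
  .then((result) => console.log(result.stdout));
```

Collecting the failure messages keeps the final error useful when every tier is down, instead of surfacing only the last provider’s complaint.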

8. Performance budget

Some hot-path numbers I tuned to:

hot path	target	mechanism
code keystroke → peer keystroke	<150 ms	Yjs update batched on next tick, WS broadcast
whiteboard stroke ack	<50 ms local, <200 ms peer	optimistic local draw + WS broadcast
code → DB snapshot	300 ms debounce	`roomService.persistDocuments`
Yjs → DB	1.5 s debounce	`yjsService` per-doc timer
reconnect after refresh	3 s grace before kick	server `pendingRemovals`
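Both debounce rows follow the same pattern: one timer per key, restarted on every change. A minimal sketch, where `createKeyedDebounce`, the key format, and the stub flush callback are hypothetical:

```typescript
// Per-key trailing debounce: every call restarts that key's timer,
// and `flush` runs once the key has been quiet for `delayMs`.
function createKeyedDebounce(delayMs: number, flush: (key: string) => void) {
  const timers = new Map<string, ReturnType<typeof setTimeout>>();
  return function touch(key: string): void {
    const existing = timers.get(key);
    if (existing !== undefined) clearTimeout(existing);
    timers.set(key, setTimeout(() => {
      timers.delete(key);
      flush(key);
    }, delayMs));
  };
}

// Usage with the 1.5 s figure from the table.
const persisted: string[] = [];
const markDirty = createKeyedDebounce(1500, (key) => {
  persisted.push(key); // real code would encode the doc state and write it to Mongo here
  console.log("flushed", key);
});
markDirty("room-1:code");
markDirty("room-1:code"); // restarts the timer, so only one write is scheduled
```

A purely trailing debounce like this never fires while updates keep arriving faster than the delay. Adding a maximum wait is one way to bound how stale the stored copy can get.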

Real-time event matrix (abridged)

flowchart LR
  subgraph Client
    direction TB
    c1[doc:change]
    c2[wb:stroke]
    c4[chat:send]
    c5[yjs:sync-request]
    c6[yjs:update]
    c7[yjs:awareness]
    c8[rtc:signal]
    c10[yjs:seed-if-empty]
  end
  subgraph Server
    direction TB
    s1[doc:state]
    s2[wb:state]
    s3[chat:new]
    s4[yjs:sync-response]
    s5[yjs:update]
    s6[yjs:awareness]
    s7[rtc:signal]
    s8[peers:update]
  end
  c1 -- broadcast --> s1
  c2 --> s2
  c4 --> s3
  c5 -- reply only --> s4
  c6 -- relay + persist --> s5
  c7 -- relay --> s6
  c8 -- targeted --> s7

direction	event	payload	persisted?
C → S	`room:join`	`{roomId, name, participantId}`	yes (rooms)
C → S	`doc:change`	`{kind, value}` (kind = code/notes/lang)	yes, debounced 300 ms
C → S	`wb:stroke`	`Stroke`	yes
C → S	`chat:send`	`{text}`	yes (capped 200 msgs)
C → S	`yjs:update`	`Uint8Array`	yes, debounced 1.5 s
C → S	`yjs:awareness`	`Uint8Array`	no (transient)
C → S	`yjs:seed-if-empty`	`{docName, textKey, text}` → ack `{seeded}`	yes (winner only)
C → S	`rtc:signal`	`{to, sdp|candidate}`	no
S → C	`peers:update`	`Participant[]`	—
S → C	`room:state`	`RoomSnapshot`	—

Repo layout

Vaartalaap/
├── apps/
│   ├── client/                 React 18 + Vite 5 SPA
│   │   └── src/
│   │       ├── components/     CodeWorkbench, CollabCodeEditor, Whiteboard,
│   │       │                   Notepad, ChatPanel, CallPanel, ColorModeToggle…
│   │       ├── routes/         Landing, Room
│   │       ├── lib/            api, socket, yjs, codeExecutor, languages
│   │       └── styles/         muiTheme factory, ColorModeProvider
│   └── server/                 Express 4 + Socket.IO 4
│       └── src/
│           ├── config/         env (Zod), db, socket
│           ├── routes/         roomRoutes
│           ├── services/       roomService, yjsService
│           └── lib/            logger
├── packages/shared/            Shared types (RoomSnapshot, Stroke, ChatMsg…)
└── docs/
    ├── ARCHITECTURE.md
    └── DEPLOYMENT.md

Deployment — what actually works on $0

layer	provider	plan	notes
static SPA	Vercel	Hobby	auto SSL + global CDN
API + WebSocket	Render	Free Web Service	cold-start ~30 s; upgrade to Starter avoids it
database	MongoDB Atlas	M0 free	512 MB, shared CPU
TURN	openrelay.metered.ca	free	adds latency vs paid

Both Vercel and Render listen to GitHub webhooks themselves — no GitHub Actions tokens or secrets. CI runs tsc + vite build on every push as a regression guard before the providers build.

Gotchas I actually hit

These cost me hours each, so they go in the post:

Render strips devDeps because it sets NODE_ENV=production. The install step of the build command must pass --include=dev (for example npm install --include=dev) or tsc can’t find @types/express.
Vercel’s Vite preset hardcodes outputDirectory=dist and ignores vercel.json. Set Framework Preset to Other instead, then vercel.json controls everything.
CORS does exact-match. CLIENT_ORIGIN must have no trailing slash, or every request gets rejected with a confusing CORS error.
Node ESM strict resolver requires .js extensions on every relative import in the compiled output. import './config/env' fails; import './config/env.js' works.
Render’s free tier has no static egress IP, so MongoDB Atlas Network Access has to allow 0.0.0.0/0. Lock down with the connection string’s user/password instead.
Pin Node 20 on Render. The default Node 24 broke our TS deprecation flags; setting NODE_VERSION=20.18.0 fixed it.
VITE_* env vars are baked at build time. Changing them in the Vercel dashboard does nothing until you trigger a redeploy.
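The trailing-slash CORS gotcha can be defused at startup by normalising the configured value before it reaches the allow-list. A minimal sketch, with `normalizeOrigin` as a hypothetical helper:

```typescript
// Reduce a configured origin to scheme://host[:port], dropping any trailing slash or path.
function normalizeOrigin(raw: string): string {
  return new URL(raw.trim()).origin;
}

console.log(normalizeOrigin("https://vaartalaapclient.vercel.app/"));
// https://vaartalaapclient.vercel.app
```

`new URL` throws on a malformed value, so a typo in CLIENT_ORIGIN fails loudly at boot instead of showing up later as a CORS error in the browser.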

Failure modes & graceful degradation

failure	mitigation
TURN unreachable	STUN-only — works on most home NATs
Mongo down	sockets keep broadcasting; persistence skipped, errors logged
Render cold start	first request waits ~30s, subsequent are warm
Wandbox / CodeX down	3-tier fallback chain
WebSocket blocked by proxy	Socket.IO auto-falls-back to long-poll

Security boundary (what’s in scope, what isn’t)

In scope: Helmet headers, CORS allow-list, express-rate-limit (200 req / 15 min), Zod validation on env + REST bodies + socket payloads, size caps (chat 200 msgs, yjsDoc 1 MB, strokes capped per room), 30-day TTL purge.
Out of scope: authn / authz (rooms are anyone-with-link, by design), end-to-end encryption (WebRTC media is SRTP, but the server can read REST + WS payloads), abuse prevention beyond rate limits.

If you put PII through this, that’s on you. It’s an interview / pairing tool, not a vault.

What I’d change next

Persist Yjs snapshots more aggressively. Right now there’s a 1.5s debounce, so if the server dies at the wrong moment you can lose everything since the last flush, roughly the final 1.5s of edits.
Switch video to an SFU (mediasoup or livekit) once I want classroom-sized rooms. Mesh is fine for the actual use case.
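Replay mode. Yjs updates are small incremental deltas, so logging them alongside the merged snapshot (which on its own does not keep a replayable timeline) would make a scrubber over a code session mostly free.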
Self-hosted coturn to drop the dependency on Open Relay (which rate-limits under load).

If you’ve read this far, just go open a room and try it. If it breaks — open an issue, I’d love the bug report.