
Reverse Engineering FaceTime

FaceTime links looked simple enough that I initially underestimated them. A URL opens a web page, a guest types a name, the host taps approve, and media starts. If this were a normal browser-first calling product, I would expect a room id, a WebSocket, SDP, ICE, and maybe TURN. FaceTime did not fit that shape. The browser was not joining a room; it was asking a native Apple-controlled conversation to let it in.

The interesting part of this project was not that FaceTime eventually uses WebRTC. That is the least surprising part. The interesting part was the bridge around it: FaceTime link material, native host authority, IDS-flavoured web registration, WebCourier push delivery, LetMeIn admission, Quick Relay allocation, downlink stream selection, media-key material recovery, and SFrame encryption around encoded frames.

I am keeping real account material, handles, tokens, private keys, device identifiers, and live traffic out of the series. The snippets below are reconstructed/redacted shapes from my notes and local artifacts, not credentials and not a bypass guide. The point is to document the architecture I recovered and the reasoning path that got me there.

The map I ended up with

I found it useful to stop thinking in product screens and start thinking in protocol boundaries. Each boundary had a different authority, a different failure space, and a different evidence trail.

| Boundary | What I was trying to prove | Artifact that made it concrete |
|---|---|---|
| Link resolution | A FaceTime URL is not the call; it is an entry handle into a conversation/link object. | ConversationLink and conversation fields in the web bundle. |
| Admission | The browser cannot promote itself from waiting room to participant. | LetMeInState, LetMeInResult, host-visible pending/approved transitions. |
| Identity/routing | The browser still needs Apple-style signaling reachability. | IDS web register/query endpoints and WebCourier connect material. |
| Delivery | Push-like messages reach the browser through a WebSocket fabric. | wss://webcourier.../websocket/anon/<hex(protobuf)>. |
| Relay | Media path allocation is separate from being admitted. | QuickRelayWebProtocolMessage and allocation/status enums. |
| Downlink | Receiving media is an explicit subscription problem. | sessionInfoRequest carrying SubscribedStream entries. |
| Key state | A late joiner needs current media-key material, not only SDP. | MKM/prekey timing constants and recovery flows. |
| Media crypto | FaceTime encrypts encoded frames outside normal RTP payload visibility. | Dedicated SFrame worker and sender/receiver transformer names. |
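
The URL shape in the Delivery row can be sketched in a few lines. This is a minimal illustration, not recovered code: the function name `anon_websocket_url`, the `host` parameter, and the sample bytes are all hypothetical, the real host stays elided, and lowercase hex is my assumption about the encoding. The only thing taken from the notes is the shape itself: serialized protobuf bytes, hex-encoded, as the final path segment.

```python
def anon_websocket_url(host: str, connect_blob: bytes) -> str:
    """Build a URL of the shape wss://<host>/websocket/anon/<hex(protobuf)>.

    host         -- placeholder; the real host is not reproduced here
    connect_blob -- already-serialized protobuf bytes, opaque to this sketch
    """
    if not connect_blob:
        raise ValueError("connect material must not be empty")
    # bytes.hex() yields lowercase hex; the case used on the wire is an assumption.
    return f"wss://{host}/websocket/anon/{connect_blob.hex()}"


# Placeholder bytes, not real connect material.
print(anon_websocket_url("webcourier.example.invalid", b"\x0a\x02hi"))
```

Treating the blob as opaque bytes is deliberate: the sketch makes no claim about which fields the message carries.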

The rest of the series follows the order in which I actually worked through it. Part 0 is the first model and the first bad assumptions. Part 1 is the host/native side, where admission and conversation authority became obvious. Part 2 is the browser runtime, where the concrete protocol vocabulary leaked through minified JavaScript.

The thing that changed the project

The big shift was realizing that FaceTime Web is not a separate lightweight web product. It is a web-shaped participant inside a larger Apple calling stack. That explains why the runtime contains concepts that look too serious for a simple web invite: IDS web registration, push topics, WebCourier reconnect tokens, Quick Relay material stores, MKM rolling windows, AVC blob resend requests, downlink bandwidth allocation, and SFrame queue backpressure.

A simplified call path from my notes looked like this:

open link
  -> parse/resolve conversation link material
  -> create a web-reachable identity/routing surface
  -> connect WebCourier delivery
  -> submit LetMeIn request
  -> wait for host authority
  -> allocate Quick Relay participant resources
  -> exchange/query key material and media blobs
  -> subscribe to streams
  -> transform encoded media through SFrame

The arrows matter because every step can fail independently. A guest can have a valid link and still not be admitted. A guest can be admitted and still fail relay allocation. A guest can receive relay traffic and still be unable to decrypt because the current MKM is missing or stale. A guest can have key material and still not know which stream ids to request. That separation is what makes the reverse engineering interesting.
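
One way to keep that separation straight while reading traces is to model the path as ordered stages and ask which one is the first not yet satisfied. The sketch below is a note-taking aid under my own naming: the `Stage` members and `first_blocked_stage` are hypothetical labels for the steps in the call path, not identifiers from the web bundle, and the strictly linear ordering is a simplification.

```python
from enum import IntEnum
from typing import Optional, Set


class Stage(IntEnum):
    """Hypothetical labels for the steps of the simplified call path."""
    RESOLVE_LINK = 1
    CREATE_IDENTITY = 2
    CONNECT_DELIVERY = 3
    SUBMIT_LETMEIN = 4
    HOST_APPROVAL = 5
    ALLOCATE_RELAY = 6
    EXCHANGE_KEYS = 7
    SUBSCRIBE_STREAMS = 8
    SFRAME_TRANSFORM = 9


def first_blocked_stage(completed: Set[Stage]) -> Optional[Stage]:
    """Return the earliest stage not in `completed`, or None if all passed."""
    for stage in Stage:  # Enum iteration follows definition order
        if stage not in completed:
            return stage
    return None


# A guest who was admitted by the host but never got relay resources.
admitted = {s for s in Stage if s <= Stage.HOST_APPROVAL}
print(first_blocked_stage(admitted).name)

# A guest with relay traffic but no usable key material.
relayed = admitted | {Stage.ALLOCATE_RELAY}
print(first_blocked_stage(relayed).name)
```

The two examples correspond to the "admitted but no relay" and "relay but cannot decrypt" cases: each trace gets pinned to a single boundary instead of being filed under a generic "call failed".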
