Reverse Engineering FaceTime (Part 1): The Mac Was The Oracle
Part 0 ended with a suspicion: the browser was not the source of truth. Part 1 is where that became the working model.
The Mac behaved like an oracle. The browser could knock, but the Mac decided whether the knock became a participant. That sounds obvious from the UI, but it matters technically. If the host is authoritative, then the reverse engineering target is not just the browser page. It is the conversation state transition that moves a guest from requested to admitted to active.
I used a small harness around FaceTime behavior while keeping the writeup clean of private account material. The interesting pieces were not credentials or local identifiers. The interesting pieces were the state edges: link creation, guest request, pending member, approval, active participant, media readiness, and teardown.
Link creation leaked the first role split
Creating a FaceTime link from the native side produced a shareable object, but it did not produce a call in the browser sense. The link existed before the guest existed. The guest request existed before the guest was active. The host approval existed before all media was fully ready.
The native-side timeline in my notes eventually looked like this:
```
host creates link
  -> conversation/link object exists
  -> guest opens link
  -> guest sends LetMeIn-style request
  -> host sees pending participant
  -> host approves or rejects
  -> conversation message marks approval
  -> relay/key/media work catches up
  -> participant becomes active
```
That ordering is the most important takeaway from the host-side pass. It is very easy to compress this into “host accepts guest.” That phrase hides four different stages.
| Stage | What I considered proof |
|---|---|
| Link exists | Native FaceTime can share/open the URL before a browser participant is active. |
| Knock exists | Browser appears as a request/pending state, not as media. |
| Approval exists | Host action changes admission state, but media may still be preparing. |
| Active exists | Participant becomes part of the call topology and stream/key logic. |
| Teardown exists | Ending/leaving uses call lifecycle semantics separate from link validity. |
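The stage ordering above can be sketched as a small state machine. This is an illustration, not recovered code: the `Stage` names, the `ALLOWED_EDGES` map, and the `advance`/`replay` helpers are hypothetical labels for the edges in my notes, and the real system almost certainly has more states than this.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names for the host-side timeline."""
    LINK_CREATED = auto()
    REQUESTED = auto()
    PENDING = auto()
    APPROVED = auto()
    MEDIA_READY = auto()
    ACTIVE = auto()
    DENIED = auto()


# Forward edges only; each stage maps to the stages it may move to next.
ALLOWED_EDGES = {
    Stage.LINK_CREATED: {Stage.REQUESTED},
    Stage.REQUESTED: {Stage.PENDING},
    Stage.PENDING: {Stage.APPROVED, Stage.DENIED},
    Stage.APPROVED: {Stage.MEDIA_READY},
    Stage.MEDIA_READY: {Stage.ACTIVE},
    Stage.ACTIVE: set(),
    Stage.DENIED: set(),
}


def advance(current: Stage, nxt: Stage) -> Stage:
    """Return nxt if the edge is allowed, otherwise raise ValueError."""
    if nxt not in ALLOWED_EDGES[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt


def replay(observed: list[Stage]) -> Stage:
    """Replay observed stages in order, starting from LINK_CREATED."""
    state = Stage.LINK_CREATED
    for nxt in observed:
        state = advance(state, nxt)
    return state
```

Replaying `[REQUESTED, PENDING, APPROVED]` ends at `APPROVED`, not `ACTIVE`. Trying to jump from `PENDING` straight to `ACTIVE` raises, which is the whole point: "host accepts guest" skips edges that exist.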
The admission boundary
The cleanest way to describe FaceTime admission is that the browser is a petitioner. It sends enough identity/name/link material to become visible, but it does not own the final transition. The host conversation does.
In reconstructed form, the admission object I cared about looked like this:
```
let_me_in_request {
  link_context: redacted_conversation_link
  display_name: redacted_guest_name
  web_routing_material: redacted_webcourier_or_ids_projection
  requested_capability: join_as_lightweight_guest
}
host_decision {
  conversation_group: redacted_uuid
  requester: redacted_guest_handle
  result: ALLOWED | DENIED | CANCELLED
  delegation: optional_redacted_delegation
}
```
The exact live payloads are not published here, but this reconstructed shape captures the technical split I observed: request material is not participant authority. A request can be well-formed and still be denied. A request can be allowed and still need relay allocation, key material, and stream subscription before media works.
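To make "request material is not participant authority" concrete, here is a minimal sketch that models the two reconstructed objects as Python dataclasses. The field names mirror the reconstructed shape above. The `is_well_formed` and `is_admitted` helpers are assumptions added for illustration and do not correspond to any recovered function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOWED = "ALLOWED"
    DENIED = "DENIED"
    CANCELLED = "CANCELLED"


@dataclass(frozen=True)
class LetMeInRequest:
    link_context: str
    display_name: str
    web_routing_material: str
    requested_capability: str = "join_as_lightweight_guest"

    def is_well_formed(self) -> bool:
        # Hypothetical check: every required field is non-empty.
        return all([self.link_context, self.display_name, self.web_routing_material])


@dataclass(frozen=True)
class HostDecision:
    conversation_group: str
    requester: str
    result: Decision
    delegation: Optional[str] = None


def is_admitted(request: LetMeInRequest, decision: Optional[HostDecision]) -> bool:
    """A valid request alone is not enough; the host must explicitly allow it."""
    return (
        request.is_well_formed()
        and decision is not None
        and decision.result is Decision.ALLOWED
    )
```

Even when `is_admitted` returns `True`, that only covers the admission boundary. Relay allocation, key material, and stream subscription are deliberately outside this sketch.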
Mapping host behavior to recovered fields
The web schema later gave names to the native-side states I was watching. These were the fields that made the host behavior less mystical:
| Field / enum | Why it mattered on the native side |
|---|---|
| ConversationMessage.isLetMeInApproved | Approval is explicit state, not merely an absence of denial. |
| ConversationMessage.letMeInDelegationHandle | Host-side delegation/identity material is modeled, not improvised in UI. |
| ConversationMessage.letMeInDelegationUUID | Approval can carry a specific delegation identifier. |
| ConversationMessage.activeParticipants[] | Active media participants are represented separately from invited/requesting members. |
| ConversationParticipant.avcData | A participant needs media negotiation data, not just admission. |
| ConversationMessage.unicastConnectorBlob | There is connector material outside the visible UI state. |
| LetMeInState.ALLOWED vs DONE | Host decision and completed join are different phases. |
This table changed how I debugged failures. If a guest was stuck after host approval, I stopped calling it an admission failure. It might be missing avcData, missing MKM, waiting on participant allocation, or failing downlink subscription. The host had said yes, but the rest of the stack still had work to do.
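That debugging habit can be written down as a checklist. The sketch below is hypothetical: `PostApprovalState` and its boolean flags are names I made up to stand in for "did this piece arrive yet", and the order of the checks is only the order the text lists them in, not a claim about the real dependency order.

```python
from dataclasses import dataclass


@dataclass
class PostApprovalState:
    """Hypothetical flags for work that remains after the host says yes."""
    has_avc_data: bool = False
    has_mkm: bool = False
    participant_allocated: bool = False
    downlink_subscribed: bool = False


def missing_after_approval(state: PostApprovalState) -> list[str]:
    """List the post-approval prerequisites that are still missing."""
    checks = [
        ("avcData", state.has_avc_data),
        ("MKM", state.has_mkm),
        ("participant allocation", state.participant_allocated),
        ("downlink subscription", state.downlink_subscribed),
    ]
    return [name for name, ok in checks if not ok]
```

An empty list means the guest is not stuck on any of these four items. A non-empty list names a media-side problem, which is a different bug report from "admission failed".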
Conversation and call state were separate enough to matter
Another useful distinction was conversation state versus call/media state. The conversation can know about members, links, lightweight guests, invited handles, and approval. Media setup needs codecs, stream groups, SSRCs, key material, and relay allocation. Those are not the same object.
A simplified split from my notes:
```
conversation layer:
  group uuid
  link
  members
  lightweight members
  active participants
  let-me-in decision
  invitation/delegation state
media layer:
  avc blob
  stream groups
  codec payload types
  relay participant allocation
  subscribed stream ids
  MKM/prekey material
  SFrame sender/receiver contexts
```
That split explains why the UI can show a participant name before media is healthy. It also explains why late join is hard: the new participant has to be attached to both the conversation layer and the media/key layer.
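A toy model of the two layers shows how a participant can be visible without working media. Everything here is an assumption made for illustration: `ConversationLayer` and `MediaLayer` are heavily simplified, and the single `ready` flag per participant collapses the AVC blob, relay allocation, and key state into one boolean.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationLayer:
    group_uuid: str
    members: set[str] = field(default_factory=set)
    active_participants: set[str] = field(default_factory=set)


@dataclass
class MediaLayer:
    # Participant handle -> True once media/key setup for that participant is complete.
    ready: dict[str, bool] = field(default_factory=dict)


def visible_without_media(conv: ConversationLayer, media: MediaLayer) -> list[str]:
    """Participants the conversation layer knows about but the media layer cannot serve yet."""
    return sorted(p for p in conv.active_participants if not media.ready.get(p, False))
```

A late joiner shows up in `active_participants` first and only later in `ready`. Until both are true, `visible_without_media` keeps returning that handle.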
The native side as a state oracle
When I say “the Mac was the oracle,” I do not mean the Mac did every job. I mean it was the easiest observable authority for host decisions. In practice, I treated the native side as the oracle for these questions:
| Question | Native side answered? | Browser/runtime still needed? |
|---|---|---|
| Does the link exist? | Yes | Link parsing/resolution. |
| Is a guest asking to join? | Yes | Web registration/delivery. |
| Was the guest approved? | Yes | LetMeIn state propagation. |
| Is the guest active in the call? | Partially | Relay/session info and media setup. |
| Can media decrypt? | No | MKM/prekey/SFrame state. |
The important lesson was that “approved by host” is a necessary condition, not a sufficient condition. A lot of protocol state becomes visible only after the host boundary has been crossed.
Reconstructing a useful trace
The trace I wish I had at the beginning looked like this. This is intentionally redacted and structural, because the values do not matter; the edges do.
```
T+0000 native: create link
  conversation.link = <redacted>
  guestModeEnabled = true
T+0040 web: open link
  prepare browser identity/routing material
  connect delivery surface
T+0060 web -> host: let_me_in_request
  displayName = <redacted>
  link = <redacted>
T+0065 native: pending participant visible
  state ~= REQUESTING
T+0110 native -> web: decision
  result = ALLOWED
  isLetMeInApproved = true
T+0110+ web: participant/media phase
  allocate relay
  obtain/refresh AVC blob
  exchange/recover MKM/prekey
  subscribe streams
  start SFrame transforms
```
This trace is why I do not like summaries that jump from “approved” to “joined.” The interesting bugs live in the plus sign after T+0110.
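The same trace can be held as data and split at the decision, which makes the "plus sign" an explicit list. This is a structural sketch only: `TraceEvent` and `split_at_approval` are hypothetical names, the timestamps are dropped because the post-approval steps have no published offsets, and matching on the word "decision" is a convention of this example.

```python
from typing import NamedTuple


class TraceEvent(NamedTuple):
    actor: str  # who acted, e.g. "native" or "web"
    step: str   # redacted, structural description of the step


TRACE = [
    TraceEvent("native", "create link"),
    TraceEvent("web", "open link"),
    TraceEvent("web -> host", "let_me_in_request"),
    TraceEvent("native", "pending participant visible"),
    TraceEvent("native -> web", "decision ALLOWED"),
    TraceEvent("web", "allocate relay"),
    TraceEvent("web", "obtain/refresh AVC blob"),
    TraceEvent("web", "exchange/recover MKM/prekey"),
    TraceEvent("web", "subscribe streams"),
    TraceEvent("web", "start SFrame transforms"),
]


def split_at_approval(trace: list[TraceEvent]) -> tuple[list[TraceEvent], list[TraceEvent]]:
    """Return (events up to and including the decision, events after it)."""
    for i, event in enumerate(trace):
        if event.step.startswith("decision"):
            return trace[: i + 1], trace[i + 1:]
    return trace, []
```

Half of the events in this trace fall after the decision, and every one of them is browser-side work.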
Failure taxonomy from the host-side pass
Once I separated those stages, failures became easier to classify.
| Symptom | Layer I would inspect first |
|---|---|
| Link opens but cannot knock | Link parsing, web registration, delivery setup. |
| Host never sees the guest | WebCourier/IDS delivery or malformed request material. |
| Host denies/cancels | Admission result, not media. |
| Host approves but browser stays stuck | Relay allocation, participant activation, key/material recovery. |
| Browser shows participant but no media | AVC blob, downlink subscription, MKM/SFrame decryption. |
| Media starts then stalls | SFrame worker queue, bandwidth/downlink tiering, key roll/recovery. |
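The taxonomy is small enough to keep as a lookup table next to the harness. The sketch below copies the rows above verbatim into a dictionary. The `Symptom` member names and the `first_layers` helper are hypothetical and exist only to make the table queryable.

```python
from enum import Enum


class Symptom(Enum):
    CANNOT_KNOCK = "link opens but cannot knock"
    HOST_NEVER_SEES_GUEST = "host never sees the guest"
    HOST_DENIES = "host denies/cancels"
    STUCK_AFTER_APPROVAL = "host approves but browser stays stuck"
    PARTICIPANT_NO_MEDIA = "browser shows participant but no media"
    MEDIA_STALLS = "media starts then stalls"


# Symptom -> layers to inspect first, in the order the table lists them.
FIRST_LAYERS = {
    Symptom.CANNOT_KNOCK: ("link parsing", "web registration", "delivery setup"),
    Symptom.HOST_NEVER_SEES_GUEST: ("WebCourier/IDS delivery", "malformed request material"),
    Symptom.HOST_DENIES: ("admission result",),
    Symptom.STUCK_AFTER_APPROVAL: ("relay allocation", "participant activation", "key/material recovery"),
    Symptom.PARTICIPANT_NO_MEDIA: ("AVC blob", "downlink subscription", "MKM/SFrame decryption"),
    Symptom.MEDIA_STALLS: ("SFrame worker queue", "bandwidth/downlink tiering", "key roll/recovery"),
}


def first_layers(symptom: Symptom) -> tuple[str, ...]:
    """Return the layers the taxonomy says to inspect first for a symptom."""
    return FIRST_LAYERS[symptom]
```

For example, `first_layers(Symptom.HOST_DENIES)` returns only `("admission result",)`, which keeps a denial from being misfiled as a media bug.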
That taxonomy shaped Part 2. The web runtime had to explain the right side of the table: delivery, relay, stream subscription, and media crypto. That is where the most technical artifacts were hiding.