
Reverse Engineering FaceTime (Part 1): The Mac Was The Oracle

Part 0 ended with a suspicion: the browser was not the source of truth. Part 1 is where that became the working model.

The Mac behaved like an oracle. The browser could knock, but the Mac decided whether the knock became a participant. That sounds obvious from the UI, but it matters technically. If the host is authoritative, then the reverse engineering target is not just the browser page. It is the conversation state transition that moves a guest from requested to admitted to active.

I used a small harness around FaceTime behavior while keeping the writeup clean of private account material. The interesting pieces were not credentials or local identifiers. The interesting pieces were the state edges: link creation, guest request, pending member, approval, active participant, media readiness, and teardown.

Creating a FaceTime link from the native side produced a shareable object, but it did not produce a call in the browser sense. The link existed before the guest existed. The guest request existed before the guest was active. The host approval existed before all media was fully ready.

The native-side timeline in my notes eventually looked like this:

host creates link
  -> conversation/link object exists
  -> guest opens link
  -> guest sends LetMeIn-style request
  -> host sees pending participant
  -> host approves or rejects
  -> conversation message marks approval
  -> relay/key/media work catches up
  -> participant becomes active

That ordering is the most important takeaway from the host-side pass. It is very easy to compress this into “host accepts guest.” That phrase hides four different stages.

| Stage | What I considered proof |
|---|---|
| Link exists | Native FaceTime can share/open the URL before a browser participant is active. |
| Knock exists | Browser appears as a request/pending state, not as media. |
| Approval exists | Host action changes admission state, but media may still be preparing. |
| Active exists | Participant becomes part of the call topology and stream/key logic. |
| Teardown exists | Ending/leaving uses call lifecycle semantics separate from link validity. |
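
The stage ordering can be written down as a small per-guest state machine. This is a minimal sketch, not FaceTime's implementation: the `Stage` names, the `ALLOWED_EDGES` table, and `advance` are hypothetical and only mirror the timeline above. The point it illustrates is that there is no edge from a pending knock straight to an active participant.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names; they mirror the timeline, not Apple's enums."""
    LINK_EXISTS = auto()
    KNOCK_PENDING = auto()
    APPROVED = auto()
    ACTIVE = auto()
    REJECTED = auto()    # assumption: covers both denied and cancelled knocks
    TORN_DOWN = auto()


# Allowed edges for one guest. There is deliberately no
# KNOCK_PENDING -> ACTIVE edge: approval comes first, and ACTIVE
# is only reached after the relay/key/media work has caught up.
ALLOWED_EDGES = {
    Stage.LINK_EXISTS: {Stage.KNOCK_PENDING},
    Stage.KNOCK_PENDING: {Stage.APPROVED, Stage.REJECTED},
    Stage.APPROVED: {Stage.ACTIVE, Stage.TORN_DOWN},
    Stage.ACTIVE: {Stage.TORN_DOWN},
    Stage.REJECTED: set(),
    Stage.TORN_DOWN: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return target if the edge is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_EDGES[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Calling `advance(Stage.KNOCK_PENDING, Stage.ACTIVE)` raises `ValueError`, which is the code-level version of "host accepts guest" being more than one step.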

The admission boundary

The cleanest way to describe FaceTime admission is that the browser is a petitioner. It sends enough identity/name/link material to become visible, but it does not own the final transition. The host conversation does.

In reconstructed form, the admission object I cared about looked like this:

let_me_in_request {
  link_context:          redacted_conversation_link
  display_name:          redacted_guest_name
  web_routing_material:  redacted_webcourier_or_ids_projection
  requested_capability:  join_as_lightweight_guest
}
 
host_decision {
  conversation_group: redacted_uuid
  requester:          redacted_guest_handle
  result:             ALLOWED | DENIED | CANCELLED
  delegation:         optional_redacted_delegation
}

The exact live payloads are not published here, but this reconstructed shape captures the technical split I observed: request material is not participant authority. A request can be well-formed and still be denied. A request can be allowed and still need relay allocation, key material, and stream subscription before media works.
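
The split between request material and participant authority can be sketched with two plain record types. The field names follow the reconstructed shapes above; the class names, `is_well_formed`, and `is_admitted` are hypothetical helpers for illustration and say nothing about the real wire format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Result(Enum):
    ALLOWED = "ALLOWED"
    DENIED = "DENIED"
    CANCELLED = "CANCELLED"


@dataclass(frozen=True)
class LetMeInRequest:
    link_context: str
    display_name: str
    web_routing_material: str
    requested_capability: str = "join_as_lightweight_guest"

    def is_well_formed(self) -> bool:
        # Only checks that the fields are populated; it grants nothing.
        return all([self.link_context, self.display_name, self.web_routing_material])


@dataclass(frozen=True)
class HostDecision:
    conversation_group: str
    requester: str
    result: Result
    delegation: Optional[str] = None


def is_admitted(request: LetMeInRequest, decision: Optional[HostDecision]) -> bool:
    """Admission comes from the host decision, never from the request alone."""
    return (
        request.is_well_formed()
        and decision is not None
        and decision.result is Result.ALLOWED
    )


req = LetMeInRequest("<redacted-link>", "<redacted-name>", "<redacted-routing>")
denied = HostDecision("<redacted-uuid>", "<redacted-handle>", Result.DENIED)
print(req.is_well_formed(), is_admitted(req, denied))  # True False
```

Even when `is_admitted` returns `True`, that only covers the admission boundary; relay allocation, key material, and stream subscription are outside this sketch.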

Mapping host behavior to recovered fields

The web schema later gave names to the native-side states I was watching. These were the fields that made the host behavior less mystical:

| Field / enum | Why it mattered on the native side |
|---|---|
| ConversationMessage.isLetMeInApproved | Approval is explicit state, not merely an absence of denial. |
| ConversationMessage.letMeInDelegationHandle | Host-side delegation/identity material is modeled, not improvised in UI. |
| ConversationMessage.letMeInDelegationUUID | Approval can carry a specific delegation identifier. |
| ConversationMessage.activeParticipants[] | Active media participants are represented separately from invited/requesting members. |
| ConversationParticipant.avcData | A participant needs media negotiation data, not just admission. |
| ConversationMessage.unicastConnectorBlob | There is connector material outside the visible UI state. |
| LetMeInState.ALLOWED vs DONE | Host decision and completed join are different phases. |

This table changed how I debugged failures. If a guest was stuck after host approval, I stopped calling it an admission failure. It might be missing avcData, missing MKM, waiting on participant allocation, or failing downlink subscription. The host had said yes, but the rest of the stack still had work to do.
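
That debugging habit can be expressed as a checklist that only runs once approval is known. This is a sketch under assumptions: `PostApprovalState` and its boolean flags are hypothetical stand-ins for whatever a harness can actually observe, and the four checks are simply the four causes named in the paragraph above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PostApprovalState:
    """Hypothetical flags for the work that remains after the host says yes."""
    approved: bool
    has_avc_data: bool = False
    has_mkm: bool = False
    participant_allocated: bool = False
    downlink_subscribed: bool = False


def missing_after_approval(state: PostApprovalState) -> List[str]:
    """List what is still outstanding; 'admission' only if the host has not approved."""
    if not state.approved:
        return ["admission"]
    missing = []
    if not state.has_avc_data:
        missing.append("avcData")
    if not state.has_mkm:
        missing.append("MKM")
    if not state.participant_allocated:
        missing.append("participant allocation")
    if not state.downlink_subscribed:
        missing.append("downlink subscription")
    return missing


stuck = PostApprovalState(approved=True, has_avc_data=True)
print(missing_after_approval(stuck))
# ['MKM', 'participant allocation', 'downlink subscription']
```

An approved guest never produces `"admission"` in the result, which keeps the label honest when the real problem is further down the stack.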

Conversation and call state were separate enough to matter

Another useful distinction was conversation state versus call/media state. The conversation can know about members, links, lightweight guests, invited handles, and approval. Media setup needs codecs, stream groups, SSRCs, key material, and relay allocation. Those are not the same object.

A simplified split from my notes:

conversation layer:
  group uuid
  link
  members
  lightweight members
  active participants
  let-me-in decision
  invitation/delegation state
 
media layer:
  avc blob
  stream groups
  codec payload types
  relay participant allocation
  subscribed stream ids
  MKM/prekey material
  SFrame sender/receiver contexts

That split explains why the UI can show a participant name before media is healthy. It also explains why late join is hard: the new participant has to be attached to both the conversation layer and the media/key layer.

The native side as a state oracle

When I say “the Mac was the oracle,” I do not mean the Mac did every job. I mean it was the easiest observable authority for host decisions. In practice, I treated the native side as the oracle for these questions:

| Question | Native side answered? | Browser/runtime still needed? |
|---|---|---|
| Does the link exist? | Yes | Link parsing/resolution. |
| Is a guest asking to join? | Yes | Web registration/delivery. |
| Was the guest approved? | Yes | LetMeIn state propagation. |
| Is the guest active in the call? | Partially | Relay/session info and media setup. |
| Can media decrypt? | No | MKM/prekey/SFrame state. |

The important lesson was that “approved by host” is a necessary condition, not a sufficient condition. A lot of protocol state becomes visible only after the host boundary has been crossed.

Reconstructing a useful trace

The trace I wish I had at the beginning looked like this. This is intentionally redacted and structural, because the values do not matter; the edges do.

T+0000  native: create link
        conversation.link = <redacted>
        guestModeEnabled = true
 
T+0040  web: open link
        prepare browser identity/routing material
        connect delivery surface
 
T+0060  web -> host: let_me_in_request
        displayName = <redacted>
        link = <redacted>
 
T+0065  native: pending participant visible
        state ~= REQUESTING
 
T+0110  native -> web: decision
        result = ALLOWED
        isLetMeInApproved = true
 
T+0110+ web: participant/media phase
        allocate relay
        obtain/refresh AVC blob
        exchange/recover MKM/prekey
        subscribe streams
        start SFrame transforms

This trace is why I do not like summaries that jump from “approved” to “joined.” The interesting bugs live in the plus sign after T+0110.
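
A trace in this shape is easy to check mechanically. The sketch below assumes events are reduced to `(offset, name)` pairs; the event names, `first_broken_edge`, and `after_approval` are hypothetical, and the offsets after 110 are placeholders because the trace above only records them as "T+0110+".

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]  # (trace offset, event name); unit is whatever the trace uses

# Admission edges in the order the trace above shows them.
ADMISSION_ORDER = ["create_link", "open_link", "let_me_in_request",
                   "pending_visible", "decision_allowed"]


def first_broken_edge(events: List[Event]) -> Optional[str]:
    """Return the first admission event that is missing or out of order, else None."""
    first_seen: Dict[str, int] = {}
    for offset, name in events:
        first_seen.setdefault(name, offset)
    previous = -1
    for name in ADMISSION_ORDER:
        if name not in first_seen or first_seen[name] < previous:
            return name
        previous = first_seen[name]
    return None


def after_approval(events: List[Event]) -> List[str]:
    """Names of events at or after the approval offset, excluding approval itself."""
    approval = next((t for t, n in events if n == "decision_allowed"), None)
    if approval is None:
        return []
    return [n for t, n in events if t >= approval and n != "decision_allowed"]


TRACE = [
    (0, "create_link"), (40, "open_link"), (60, "let_me_in_request"),
    (65, "pending_visible"), (110, "decision_allowed"),
    # Placeholder offsets: the source trace does not time these steps.
    (111, "allocate_relay"), (112, "obtain_avc_blob"),
    (113, "exchange_mkm"), (114, "subscribe_streams"), (115, "start_sframe"),
]
```

`first_broken_edge(TRACE)` returns `None` for this trace, and `after_approval(TRACE)` returns the five media-phase steps, which is the part a summary that jumps from "approved" to "joined" leaves out.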

Failure taxonomy from the host-side pass

Once I separated those stages, failures became easier to classify.

| Symptom | Layer I would inspect first |
|---|---|
| Link opens but cannot knock | Link parsing, web registration, delivery setup. |
| Host never sees the guest | WebCourier/IDS delivery or malformed request material. |
| Host denies/cancels | Admission result, not media. |
| Host approves but browser stays stuck | Relay allocation, participant activation, key/material recovery. |
| Browser shows participant but no media | AVC blob, downlink subscription, MKM/SFrame decryption. |
| Media starts then stalls | SFrame worker queue, bandwidth/downlink tiering, key roll/recovery. |
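
Read top to bottom, the table is a triage order. The following sketch walks it in that order and returns the first matching row. The `Observation` fields are hypothetical names for what a harness might record, and the "awaiting host decision" branch is an added assumption for the case where the host has seen the guest but not yet decided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    knock_sent: bool
    host_saw_guest: bool
    host_result: Optional[str]  # "ALLOWED", "DENIED", "CANCELLED", or None
    participant_shown: bool
    media_started: bool
    media_stalled: bool = False


def first_layer_to_inspect(o: Observation) -> str:
    """Return the table row matching the earliest stage that failed."""
    if not o.knock_sent:
        return "link parsing, web registration, delivery setup"
    if not o.host_saw_guest:
        return "WebCourier/IDS delivery or malformed request material"
    if o.host_result in ("DENIED", "CANCELLED"):
        return "admission result, not media"
    if o.host_result != "ALLOWED":
        return "awaiting host decision"  # assumption: not a row in the table
    if not o.participant_shown:
        return "relay allocation, participant activation, key/material recovery"
    if not o.media_started:
        return "AVC blob, downlink subscription, MKM/SFrame decryption"
    if o.media_stalled:
        return "SFrame worker queue, bandwidth/downlink tiering, key roll/recovery"
    return "no symptom from the table"
```

Checking the stages in order means a media-layer answer is only ever returned once delivery and admission have been ruled out.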

That taxonomy shaped Part 2. The web runtime had to explain the right side of the table: delivery, relay, stream subscription, and media crypto. That is where the most technical artifacts were hiding.

References