Reverse Engineering FaceTime (Part 2): WebCourier, IDS, Quick Relay, and SFrame

The browser bundle was where FaceTime stopped being a black box and started leaving labels everywhere.

Native reversing told me that the Mac owned admission. The web runtime told me what the browser had to speak after it knocked. This part is the real protocol notebook: CSP endpoint leaks, WebCourier connection shapes, IDS web register/query, recovered protobuf fields, LetMeIn enums, Quick Relay envelopes, downlink subscriptions, late-join repair, MKM timers, codec tiers, and the SFrame worker.

Same rule as before: no real tokens, no real account identifiers, no device handles, no private keys, and no live traffic dumps. The snippets are reconstructed/redacted shapes from artifacts I recovered. I am publishing the architecture, not operational secrets.

1. The CSP was an infrastructure leak

The /join shell was small enough to inspect before touching the main bundle. The useful thing was the Content-Security-Policy. A CSP is supposed to constrain the browser, but for a reverse engineer it also acts like a rough infrastructure index.

The interesting connect-src entries grouped into a few families:

connect-src blob: 'self' data: *.apple.com
  wss://webcourier.sandbox.push.apple.com
  wss://webcourier.push.apple.com
  wss://webcourier-carry.push.apple.com
  wss://webcourier-tc.push.apple.com
  wss://webcourier.qa2.push.apple.com
  *.icloud.com *.icloud.com.cn *.cdn-apple.com

That one block gave me the first useful split: FaceTime Web talks to Apple web services, WebCourier push hosts, iCloud/IDS-style surfaces, CDN-served runtime assets, and separate media/telemetry configuration. It was not one magical signaling socket.

| CSP clue | Reverse-engineering interpretation |
| --- | --- |
| webcourier.*.push.apple.com | Browser delivery is push-like and environment-specific. |
| *.icloud.com / *.icloud.com.cn | Identity/query surfaces are not purely under facetime.apple.com. |
| CDN bundle path | The protocol vocabulary can be recovered from static JavaScript assets. |
| Worker allowance | Encoded-frame processing can be split into dedicated worker code. |
| Store-bag style config | Runtime behavior can be tuned without shipping a new app binary. |

I started with the CSP because it is hard for minification to hide hostnames. Names can be mangled. Endpoints usually survive.
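
The grouping step can be sketched in a few lines of Python. This is only an illustration of how I read the policy: the CSP string is the connect-src block quoted above, while the function names and the three bucket names are my own and do not come from the bundle.

```python
from urllib.parse import urlparse

# The connect-src block quoted above, joined into one directive string.
CSP = (
    "connect-src blob: 'self' data: *.apple.com "
    "wss://webcourier.sandbox.push.apple.com "
    "wss://webcourier.push.apple.com "
    "wss://webcourier-carry.push.apple.com "
    "wss://webcourier-tc.push.apple.com "
    "wss://webcourier.qa2.push.apple.com "
    "*.icloud.com *.icloud.com.cn *.cdn-apple.com"
)


def connect_sources(csp: str) -> list:
    """Return the source expressions of the connect-src directive."""
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0] == "connect-src":
            return parts[1:]
    return []


def group_sources(sources: list) -> dict:
    """Bucket sources into WebSocket hosts, wildcard domains, and the rest."""
    groups = {"websocket": [], "wildcard": [], "other": []}
    for src in sources:
        if src.startswith("wss://"):
            groups["websocket"].append(urlparse(src).hostname)
        elif src.startswith("*."):
            groups["wildcard"].append(src[2:])
        else:
            groups["other"].append(src)
    return groups


groups = group_sources(connect_sources(CSP))
for family, hosts in groups.items():
    print(family, hosts)
```

The WebSocket bucket is the one that matters for the rest of this part: every entry in it is a WebCourier host.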

2. Hardcoded endpoints made the layers obvious

The bundle contained enough concrete URL material to separate identity, delivery, telemetry, and FaceTime-host environments.

IDS web register:
  https://gatewayws.icloud.com/identity/WebObjects/TDIdentityService.woa/wa/webregister
  https://identity{env}.ess.apple.com/WebObjects/TDIdentityService.woa/wa/webregister
 
IDS web query:
  https://gatewayws.icloud.com/query/WebObjects/QueryService.woa/wa/webquery
  https://query{env}.ess.apple.com/WebObjects/QueryService.woa/wa/webquery
 
RTC store bag:
  https://mediaservices.cdn-apple.com/store_bags/ft/v1/rtc_storebag.json
 
WebCourier:
  wss://<webcourier-env>/websocket
  wss://<webcourier-env>/websocket/anon/<hex(protobuf)>
  wss://<webcourier-env>/websocket?tok=<base64(token)>

The register/query split matters. Registration answers “how can this browser be reached and represented?” Query answers “what do I need to know about other handles/material?” WebCourier answers “how do async FaceTime events reach a browser tab?” None of these are SDP.

The environments also mattered:

| Environment | WebCourier host |
| --- | --- |
| qa2 | webcourier.qa2.push.apple.com |
| prod | webcourier.push.apple.com |
| carry | webcourier-carry.push.apple.com |
| teamcarry | webcourier-tc.push.apple.com |
| sandbox | webcourier.sandbox.push.apple.com |

The browser runtime could override environment selection with URL flags such as env and pushEnv. That was another useful sign that this was not a toy client. It had internal/testing surfaces baked into the same runtime path, gated by host and flags.
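
To make the three WebCourier connection shapes concrete, here is a small sketch that builds them from the environment table. The function name and its arguments are hypothetical, the payload and token are dummy bytes, and the exact base64 variant and escaping used for `tok` are assumptions on my part (I use standard base64 and percent-encode it).

```python
import base64
from urllib.parse import quote

# Environment -> host, copied from the table above.
WEBCOURIER_HOSTS = {
    "qa2": "webcourier.qa2.push.apple.com",
    "prod": "webcourier.push.apple.com",
    "carry": "webcourier-carry.push.apple.com",
    "teamcarry": "webcourier-tc.push.apple.com",
    "sandbox": "webcourier.sandbox.push.apple.com",
}


def webcourier_url(env, anon_payload=None, token=None):
    """Build one of the three recovered URL shapes (hypothetical helper)."""
    base = f"wss://{WEBCOURIER_HOSTS[env]}/websocket"
    if anon_payload is not None:
        # Anonymous shape: hex of an already-serialized protobuf message.
        return f"{base}/anon/{anon_payload.hex()}"
    if token is not None:
        # Token shape: base64 of the token; variant and escaping are assumed.
        tok = base64.b64encode(token).decode("ascii")
        return f"{base}?tok={quote(tok, safe='')}"
    return base


print(webcourier_url("prod"))
print(webcourier_url("qa2", anon_payload=b"\x22\x00"))
print(webcourier_url("sandbox", token=b"demo"))
```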

3. WebCourier looked like APNs semantics in a browser transport

A browser tab cannot be a normal APNs device client, but FaceTime still needs asynchronous delivery. Apple’s APNs is the native push infrastructure for delivering notifications to devices [2]. FaceTime Web appeared to use WebCourier as a web-compatible projection of push delivery.

The anonymous connection shape was the first artifact that made this concrete:

wss://<host>/websocket/anon/<hex>
 
where:
  <hex> = hex( protobuf( AnonymousUserConnectMessage, payload ) )

The recovered bootstrap schema was small but meaningful:

AnonymousUserConnectMessage {
  4: connectWithIdsRegisterAndQuery  // message
  5: reconnectMessage                // message
}
 
ConnectedResponseMessage {
  1: pushToken                 // string
  2: timePresenceReceivedSec   // uint32
  3: reconnectToken            // string
}
 
AcknowledgementMessage {
  1: topic  // bytes
  2: id     // uint32
}

This gives the delivery loop a very Apple shape: connect anonymously with structured register/query material, receive a push token/reconnect token, subscribe/acknowledge by topic, and reconnect without rebuilding the entire session if possible.
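
The schemas above are ordinary protobuf, so the wire shape can be reproduced with a few helper functions. The sketch below hand-rolls only the two wire types those schemas need (varint and length-delimited); the helper names are mine, the sample bytes are made up, and a real client would use generated protobuf classes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2)."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_fields(data: bytes) -> dict:
    """Decode a flat message made of varint and length-delimited fields."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields


# AnonymousUserConnectMessage with field 4 set to a placeholder sub-message.
connect = encode_len_field(4, b"\x01\x02")
print(connect.hex())

# A made-up ConnectedResponseMessage: fields 1 and 3 strings, field 2 uint32.
response = bytes.fromhex("0a03746f6b10ac021a027265")
print(decode_fields(response))
```

The hex string printed for `connect` is the kind of value that would sit in the `<hex>` slot of the anonymous URL.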

The topics were also not generic:

| Symbol | Topic |
| --- | --- |
| FaceTimeMulti | com.apple.private.alloy.facetime.multi |
| Willow | com.apple.private.alloy.willow |
| QuickRelay | com.apple.private.alloy.quickrelay |
| QuickRelaySignaling | com.apple.private.alloy.quickrelay.web.signaling |

The topic names gave away the layer split. FaceTime conversation signaling and Quick Relay signaling ride through the same delivery world, but they are not the same protocol family.

4. The bundle gave names to the unknown bytes

I did not try to understand the whole minified bundle top-down. I searched for nouns that sounded too specific to be UI code:

WebCourier
IDSWebRegisterRequest
IDSWebQueryRequest
ConnectedResponseMessage
ConversationMessage
ConversationLink
LetMeInState
LetMeInResult
QuickRelayWebProtocolMessage
ParticipantAllocate
SubscribedStream
MKM
PreKey
SFrameVideoRTCRtpReceiverTransformer

That vocabulary became the index. Once I had names, I could attach behavior to protocol families instead of staring at byte arrays.

| Recovered name | What it labeled |
| --- | --- |
| ConversationMessage | Host/conversation state transitions and admission-carrying messages. |
| IDSWebRegisterRequest | Browser-side registration into Apple identity/routing surfaces. |
| WebClientPushMessage | Delivery payload entering from WebCourier. |
| QuickRelayWebProtocolMessage | Relay allocation, material exchange, session info, stats, probe, and data messages. |
| VCMediaNegotiationBlob | Media capabilities, streams, codecs, settings, and stream groups. |
| SKEBlob / DevicePreKey / MKM | Key exchange and media-key material families. |

This is why bundle reversing was more valuable than a blind packet capture. A capture gives timing and bytes. The bundle gives field names and enum names.

5. Conversation messages explained the waiting room

The most useful conversation object had fields that aligned almost perfectly with the host-side behavior from Part 1.

ConversationMessage {                         // selected fields
  1:  version                         uint32
  2:  type                            int32
  3:  shouldSuppressInCallUI          bool
  4:  activeParticipants[]            ConversationParticipant
  5:  conversationGroupUUIDString     string
  10: nickname                        string
  11: link                            ConversationLink
  13: isLetMeInApproved               bool
  14: encryptedMessage                EncryptedConversationMessage
  15: letMeInDelegationHandle         string
  16: letMeInDelegationUUID           string
  17: enclosedEncryptedType           int32
  19: invitationPreferences[]         ConversationInvitationPreference
  20: removedMembers[]                ConversationMember
  21: unicastConnectorBlob            bytes
  23: activeLightweightParticipants[] ConversationParticipant
  24: guestModeEnabled                bool
}

This is not a random collection of UI properties. It encodes the semantics I kept seeing externally: a link, guest mode, active participants, approval, delegation, encrypted enclosed messages, and connector material.

The LetMeIn enum made the state machine explicit:

LetMeInResult:
  0 = CANCELLED
  1 = ALLOWED
  2 = DENIED
 
LetMeInState:
  0 = PENDING
  1 = REQUESTING
  2 = FAILED
  3 = CANCELLED
  4 = ALLOWED
  5 = DONE
  6 = DENIED

The important detail is that ALLOWED is not DONE. Approval is an admission result; joining is completed only after more machinery runs.
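
The ALLOWED versus DONE distinction is easy to encode. The enum values below are the recovered ones; the mapping from a host result to the next state and the two helper functions are my own assumption based on the names, not something I recovered as code.

```python
from enum import IntEnum


class LetMeInResult(IntEnum):
    CANCELLED = 0
    ALLOWED = 1
    DENIED = 2


class LetMeInState(IntEnum):
    PENDING = 0
    REQUESTING = 1
    FAILED = 2
    CANCELLED = 3
    ALLOWED = 4
    DONE = 5
    DENIED = 6


# Assumed mapping: a host result moves the request into the same-named state.
RESULT_TO_STATE = {
    LetMeInResult.CANCELLED: LetMeInState.CANCELLED,
    LetMeInResult.ALLOWED: LetMeInState.ALLOWED,
    LetMeInResult.DENIED: LetMeInState.DENIED,
}


def apply_result(result: LetMeInResult) -> LetMeInState:
    """State reached when the host's decision arrives (assumed)."""
    return RESULT_TO_STATE[result]


def is_join_complete(state: LetMeInState) -> bool:
    """Only DONE counts as joined; ALLOWED still has relay and key work ahead."""
    return state is LetMeInState.DONE


state = apply_result(LetMeInResult.ALLOWED)
print(state.name, is_join_complete(state))
```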

6. FaceTime signaling still had old Apple command vocabulary

The command table recovered from the bundle placed FaceTime Web inside Apple’s broader messaging/calling protocol family. These were the FaceTime-multi IDS/Nice command values that mattered most:

| Code | Command |
| --- | --- |
| 206 | NiceGroupSessionInternalMessage |
| 207 | NiceGroupSessionJoin |
| 208 | NiceGroupSessionLeave |
| 209 | NiceGroupSessionUpdate |
| 210 | NiceGroupSessionPrekey |
| 211 | NiceGroupSessionMKM |
| 227 | NiceMessage |
| 232 | NiceSessionInvitation |
| 233 | NiceSessionAccept |
| 234 | NiceSessionDecline |
| 237 | NiceSessionEnd |
| 239 | NiceGroupSessionMessage |
| 243 | NiceProtobuf |

210 and 211 were especially important because they separated prekey distribution from MKM distribution. That distinction shows up again in late-join recovery. A participant can be known and even active while key material is still being repaired.

A simplified delivery path from my notes:

WebCourier message
  -> topic hash / topic id
  -> WebClientPushMessage
  -> Nice / Madrid / FaceTime multi command
  -> ConversationMessage or encrypted conversation payload
  -> state-machine update

This is exactly the kind of layered dispatch minified code tries to make unpleasant. The enum table makes it readable again.
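
Once the command table is a lookup, the dispatch layer reads like a switch statement. The codes and names below are the recovered ones; the four routing buckets and the function name are hypothetical labels I use in my notes, not identifiers from the bundle.

```python
# Command code -> name, copied from the recovered table.
NICE_COMMANDS = {
    206: "NiceGroupSessionInternalMessage",
    207: "NiceGroupSessionJoin",
    208: "NiceGroupSessionLeave",
    209: "NiceGroupSessionUpdate",
    210: "NiceGroupSessionPrekey",
    211: "NiceGroupSessionMKM",
    227: "NiceMessage",
    232: "NiceSessionInvitation",
    233: "NiceSessionAccept",
    234: "NiceSessionDecline",
    237: "NiceSessionEnd",
    239: "NiceGroupSessionMessage",
    243: "NiceProtobuf",
}

# 210 and 211 carry prekey and MKM material respectively.
KEY_MATERIAL_COMMANDS = {210, 211}


def classify_command(code: int) -> str:
    """Route a command code into a coarse bucket (bucket names are mine)."""
    name = NICE_COMMANDS.get(code)
    if name is None:
        return "unknown"
    if code in KEY_MATERIAL_COMMANDS:
        return "key-material"
    if name.startswith("NiceGroupSession"):
        return "group-session"
    if name.startswith("NiceSession"):
        return "session"
    return "generic"


for code in (207, 210, 211, 232, 243, 999):
    print(code, NICE_COMMANDS.get(code, "?"), classify_command(code))
```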

7. Quick Relay was a separate protocol, not a side effect

Once the guest is allowed, media still needs a relay path. The runtime did not represent this as a tiny helper call. It had a large QuickRelayWebProtocolMessage envelope with many numbered branches.

QuickRelayWebProtocolMessage {                 // selected fields
  1:   uuid
  2:   error
  10:  participantAllocateRequest
  11:  participantAllocateResponse
  12:  unallocbindRequest
  13:  unallocbindResponse
  20:  infoRequest
  21:  infoResponse
  24:  putmaterialRequest
  25:  putmaterialResponse
  26:  putmaterialIndication
  27:  getmaterialRequest
  28:  getmaterialResponse
  40:  sessionInfoRequest
  41:  sessionInfoResponse
  42:  sessionInfoUpdate
  50:  dataMessage
  60:  streamCompoundRequest
  61:  streamCompoundResponse
  81:  participantUpdateRequest
  82:  participantUpdateResponse
  83:  participantUpdateUpdate
  100: participantStatsRequest
  101: participantStatsResponse
  120: participantProbeRequest
  121: participantProbeResponse
}

That schema is dense, but it reveals the design. Quick Relay is not only allocation. It carries material exchange, session info, stream subscription, participant updates, stats, probes, reporting, and data messages.

Allocation had its own status/error family:

| Code | Meaning |
| --- | --- |
| 0 | OK |
| 1001 | InvalidField |
| 5001 | InternalError |
| 5004 | Busy |
| 5008 | MissingKey |
| 5010 | Block |

The MissingKey result is the one that makes the relay/key boundary obvious. A relay failure can be caused by cryptographic material state, not only by networking.

8. Media negotiation was carried in an AVC blob

The media side had a central VCMediaNegotiationBlob. This was one of the artifacts that stopped me from treating media setup as “just SDP.” The blob had FaceTime-specific settings, stream groups, multiway streams, captions, moments, bandwidth, and timing fields.

VCMediaNegotiationBlob {
  1:  allowDynamicMaxBitrate                    bool
  2:  allowsContentsChangeWithAspectPreservation bool
  3:  audioSettings                             AudioSettings
  4:  videoSettings                             VideoSettings
  5:  screenSettings                            VideoSettings
  6:  userAgent                                 string
  7:  basebandCodec                             string
  8:  basebandCodecSampleRate                   uint32
  9:  bandwidthSettings[]                       BandwidthSettings
  10: captionsSettings                          CaptionsSettings
  11: multiwayAudioStreams[]                    MultiwayAudioStream
  12: momentsSettings                           MomentsSettings
  13: ntpTime                                   uint64
  14: blobVersion                               uint32
  15: multiwayVideoStream[]                     MultiwayVideoStream
  16: mediaControlInfoVersion                   uint32
  17: faceTimeSettings                          FaceTimeSettings
  18: multiwayScreenStream[]                    MultiwayVideoStream
  19: streamGroups[]                            StreamGroup
}

The late-join code later confirmed why this matters. If a participant appears and the client does not have that participant’s AVC blob, it may know the participant exists but still not know enough about stream groups/codecs to subscribe and decode correctly.

9. The codec tiers were exact, not vibes

The runtime carried concrete audio/video tier tables. That was useful because a lot of call quality debugging otherwise turns into folklore.

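| Audio tier | Stream index | Quality index | Encode kbps | Network kbps | ptime | Codec |
| --- | --- | --- | --- | --- | --- | --- |
| low | 0 | 27 | 13.2 | 26 | 60 | Opus16K, PT 96 |
| high | 1 | 162 | 48 | 69 | 40 | Opus24K, PT 97, in-band FEC |

| Video tier | Stream index | Quality | Resolution marker | Encode kbps | Network kbps | FPS |
| --- | --- | --- | --- | --- | --- | --- |
| low | 0 | 93 | 192 | 50 | 62 | 15 |
| mid | 1 | 400 | 320 | 200 | 230 | 30 |
| high | 2 | 1250 | 720 | 800 | 872 | 30 |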

The RTP/codec map also had concrete payload IDs and extensions:

| Item | Value |
| --- | --- |
| H.264 payload type | 123 |
| RTX payload type | 124 |
| Opus16K payload type | 96 |
| Opus24K payload type | 97 |
| CVO extension | id 1, urn:3gpp:video-orientation |
| AudioLevel extension | id 2, urn:ietf:params:rtp-hdrext:ssrc-audio-level |
| TWCC extension | id 3, transport-wide congestion control |
| AbsoluteSendTime extension | id 6, abs-send-time |

This is the level where the browser runtime became testable. I could force or reason about quality tiers instead of guessing what “high” meant.

10. Downlink subscriptions were actively managed

Receiving media was not just “open peer connection and wait.” The web client had a downlink controller that decided which peer streams to request and at what tier.

The loop looked like this:

setRemoteParticipants()
participantRequestConfigurationChanged()
availableDownlinkBandwidthChanged()
SFrameQueueOccupancyChanged()
connectionChanged()
  -> updateStreamsDebounced()
  -> _updateStreams()
       infos = getStreamInfos()
       requestStreams(infos)
       rearmDownlinkBandwidthCheckTimer()

The per-peer desired tiers were selected from UI and policy:

getVideoTier(peer):
  if tile not visible: return null
  if video disabled or paused: return null
  tier = width <= 192 ? low : width <= 320 ? mid : high
  tier = capByUserAgent(tier)
  tier = capBySFrameQueueOccupancy(tier)
  return tier
 
getAudioTier(peer):
  if AutoQualityAudioStreams: return high
  if LowQualityAudioStreams:  return low
  return high

Then the bandwidth allocator flattened stream candidates and greedily fit them under the available estimate. The important property was audio-first behavior. Active talkers’ audio was prioritized, low-quality tiers were not starved, and video upgrades only happened when the delta could fit.
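
A minimal sketch of that greedy, audio-first fit is below. It is my reconstruction of the behavior, not the recovered allocator: the three-pass structure, the peer dictionaries, and the function name are assumptions, and the kbps costs are the network kbps figures from the tier tables.

```python
# Network kbps per tier, taken from the tier tables.
AUDIO_KBPS = {"low": 26, "high": 69}
VIDEO_KBPS = {"low": 62, "mid": 230, "high": 872}
VIDEO_ORDER = ["low", "mid", "high"]


def allocate(peers, budget_kbps):
    """Greedy fit: audio first, then low video, then upgrades by delta."""
    plan = {p["id"]: {"audio": None, "video": None} for p in peers}
    remaining = budget_kbps

    # Pass 1: audio, active talkers first; fall back to the low tier.
    for peer in sorted(peers, key=lambda p: not p["talking"]):
        for tier in (peer["audio"], "low"):
            if AUDIO_KBPS[tier] <= remaining:
                plan[peer["id"]]["audio"] = tier
                remaining -= AUDIO_KBPS[tier]
                break

    # Pass 2: give every peer that wants video the low tier if it fits.
    for peer in peers:
        if peer["video"] is not None and VIDEO_KBPS["low"] <= remaining:
            plan[peer["id"]]["video"] = "low"
            remaining -= VIDEO_KBPS["low"]

    # Pass 3: upgrade one step at a time, only when the delta fits.
    for step in ("mid", "high"):
        for peer in peers:
            current = plan[peer["id"]]["video"]
            if current is None:
                continue
            is_next = VIDEO_ORDER.index(step) == VIDEO_ORDER.index(current) + 1
            wanted = VIDEO_ORDER.index(peer["video"]) >= VIDEO_ORDER.index(step)
            delta = VIDEO_KBPS[step] - VIDEO_KBPS[current]
            if is_next and wanted and delta <= remaining:
                plan[peer["id"]]["video"] = step
                remaining -= delta

    return plan, remaining


peers = [
    {"id": "a", "talking": False, "audio": "high", "video": "high"},
    {"id": "b", "talking": True, "audio": "high", "video": "mid"},
]
plan, left = allocate(peers, 500)
print(plan, left)
```

With a 500 kbps estimate both peers keep high-tier audio, peer a gets mid video, peer b stays at low video, and 70 kbps is left unallocated because no further upgrade delta fits.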

The actual subscription went through Quick Relay sessionInfoRequest field 40. The per-peer unit was SubscribedStream:

SubscribedStream {
  1: wildcardSubscription  bool
  2: peerParticipantId     bytes/string-ish id
  3: peerStreamIds[]       repeated stream id
  4: isSeamlessTransition  bool
}

That tiny structure explains a lot. The relay does not need to decrypt media. It only needs to know which encrypted streams to forward. SFrame keeps the media opaque while Quick Relay handles forwarding and subscription state.

11. Late join was three repair loops, not one join event

Late join is where FaceTime Web stopped looking like a thin client. When a participant becomes active after others are already in the call, the browser has to repair topology, media description, and key state.

The recovered flow looked like this:

setActiveParticipants(newIds)
  -> diff old active set vs new active set
  -> for newly active peer:
       sendMKMToParticipant(peer.pushToken)
  -> recoverPreKeys(localJustBecameActive)
       if active peer has no prekey:
         attemptMaterialRecovery(peer)
         sendPreKeyRecoveryRequest(peer)
  -> recoverAVCBlobs(activeIds)
       if active peer has no avcBlob:
         attemptMaterialRecovery(peer)
         sendAVCBlobRequest(peer)
  -> handleActiveParticipantsChange()
       recompute downlink subscriptions

I like this flow because it is brutally practical. A late participant needs three things:

| Repair target | Why it is necessary |
| --- | --- |
| MKM / prekey material | Without current key material, encrypted frames are noise. |
| AVC blob | Without media negotiation data, stream groups/codecs/SSRC mapping are unknown. |
| Downlink subscription | Without SubscribedStream updates, the relay may not forward the right streams. |

This is also why “approved but no media” is not one bug. It can be missing material, missing AVC data, stale subscription state, or SFrame recovery timing out.
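
The three repair loops can be written as a planner that turns a participant diff into an ordered action list. The action names mirror the recovered flow; the `Peer` record, the `plan_repairs` function, and the idea of returning a list instead of sending messages are simplifications of mine.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    participant_id: str
    has_prekey: bool
    has_avc_blob: bool


def plan_repairs(old_active: set, new_active: set, peers: dict) -> list:
    """Return (action, participant_id) pairs in the order of the recovered flow."""
    actions = []

    # Loop 1: newly active peers need our MKM.
    for pid in sorted(new_active - old_active):
        actions.append(("sendMKMToParticipant", pid))

    # Loop 2: active peers without a prekey trigger prekey recovery.
    for pid in sorted(new_active):
        if not peers[pid].has_prekey:
            actions.append(("attemptMaterialRecovery", pid))
            actions.append(("sendPreKeyRecoveryRequest", pid))

    # Loop 3: active peers without an AVC blob trigger a blob request.
    for pid in sorted(new_active):
        if not peers[pid].has_avc_blob:
            actions.append(("attemptMaterialRecovery", pid))
            actions.append(("sendAVCBlobRequest", pid))

    # Finally the downlink subscriptions are recomputed for the new set.
    actions.append(("recomputeDownlinkSubscriptions", "*"))
    return actions


peers = {
    "a": Peer("a", has_prekey=True, has_avc_blob=True),
    "b": Peer("b", has_prekey=False, has_avc_blob=True),
}
for action in plan_repairs({"a"}, {"a", "b"}, peers):
    print(action)
```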

12. MKM timers exposed the key lifecycle

Key material had a real lifecycle. The config object in the runtime gave exact timing constants for prekeys, MKM distribution, MKM rolling, recovery, and query behavior.

| Constant | Value | Meaning |
| --- | --- | --- |
| PreKeyExpireDuration | 3,600,000 ms | Prekey validity, one hour. |
| PreviousPrekeyExpireDuration | 420,000 ms | Grace period for the previous prekey. |
| InitialMKMDistributeDuration | 600,000 ms | First MKM distribution window. |
| MKMExpireDuration | 1,800,000 ms | MKM validity. |
| MKMRollDuration | 1,200,000 ms | Expected media-key roll cadence. |
| MKMRollWaitDuration | 5,000 ms | Settle time around a roll. |
| MKMPreRecoveryInterval | 10,000 ms | Recovery probe interval before visible failure. |
| DecryptionTimeout | 30,000 ms | Give up decrypting under current assumptions. |
| DecryptionMKMRecoveryTimeout | 10,000 ms | MKM recovery deadline. |
| ParticipantAllocateResponseTimeout | 300,000 ms | Relay allocation wait. |
| ActiveParticipantsResponseTimeout | 10,000 ms | Active participant query wait. |
| LastParticipantRemainingTimeout | 15,000 ms | Solo-in-call grace window. |
| IDSQueryTimeout | 20,000 ms | Per-query timeout, with retries/backoff. |

Numbers like these are useful because they make behavior falsifiable. If decryption has been broken for longer than 30 seconds, I should not be debugging it the same way as a 200 ms race. If relay allocation is still pending, that is a different timeout budget from IDS query.
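
As a debugging aid, the constants can be turned into a small triage helper. The values are the recovered ones; the bucket names and the way I combine MKMPreRecoveryInterval with DecryptionTimeout are my own reading of the timers, not recovered logic.

```python
# Subset of the recovered timing constants, in milliseconds.
TIMERS_MS = {
    "MKMExpireDuration": 1_800_000,
    "MKMRollDuration": 1_200_000,
    "MKMPreRecoveryInterval": 10_000,
    "DecryptionTimeout": 30_000,
}


def roll_overlap_ms() -> int:
    """How long an MKM stays valid after the next roll is expected."""
    return TIMERS_MS["MKMExpireDuration"] - TIMERS_MS["MKMRollDuration"]


def decryption_phase(failing_for_ms: int) -> str:
    """Bucket a decryption failure by how long it has lasted (buckets are mine)."""
    if failing_for_ms < TIMERS_MS["MKMPreRecoveryInterval"]:
        return "transient"
    if failing_for_ms < TIMERS_MS["DecryptionTimeout"]:
        return "recovery-window"
    return "past-decryption-timeout"


print(roll_overlap_ms())
print(decryption_phase(200), decryption_phase(15_000), decryption_phase(45_000))
```

The overlap works out to ten minutes, which is the slack between the expected roll cadence and hard MKM expiry.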

13. SFrame lived in a dedicated worker

The media encryption boundary was clean. FaceTime Web used WebRTC encoded transforms, which let a web app transform encoded frames before sending or after receiving [3]. SFrame is a frame-level encryption design for end-to-end encrypted media [4].

The worker chunk exposed four transformer names:

SFrameVideoRTCRtpSenderTransformer
SFrameAudioRTCRtpSenderTransformer
SFrameVideoRTCRtpReceiverTransformer
SFrameAudioRTCRtpReceiverTransformer

The cipher split was also concrete:

| Media kind | Cipher class | MAC length |
| --- | --- | --- |
| audio | AES_CM_128_HMAC_SHA256_32 | 4 bytes |
| video | AES_CM_128_HMAC_SHA256_80 | 10 bytes |
| data | AES_CM_128_HMAC_SHA256_80 | 10 bytes |

The derivation path used HKDF-SHA256 labels that were readable in the worker:

secret32:
  ikm  = secret[0..16]
  salt = secret[16..32]
 
HKDF labels:
  "SFrameSaltKey"
  "SFrameEncryptionKey"
  "SFrameAuthenticationKey"
 
per_ssrc_key:
  HKDF-SHA256(ikm = mkm, salt = mks, info = ssrc[4..]) -> 256 bits
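
To show what that derivation shape looks like in practice, here is a standard-library HKDF-SHA256 (extract-then-expand, as in RFC 5869) applied to the recovered inputs. Several details are assumptions: that each label is passed as the HKDF info string, that the labeled outputs are 16 bytes each, and that the SSRC-derived info is supplied as opaque bytes. The secrets are dummy values.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF with SHA-256: extract a PRK, then expand it to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


LABELS = ("SFrameSaltKey", "SFrameEncryptionKey", "SFrameAuthenticationKey")


def derive_labeled_keys(secret32: bytes, key_len: int = 16) -> dict:
    """Split the 32-byte secret into ikm/salt and derive one key per label.

    key_len = 16 is an assumption; the recovered notes do not give lengths.
    """
    ikm, salt = secret32[:16], secret32[16:32]
    return {label: hkdf_sha256(ikm, salt, label.encode(), key_len) for label in LABELS}


def per_ssrc_key(mkm: bytes, mks: bytes, ssrc_info: bytes) -> bytes:
    """256-bit per-SSRC key; ssrc_info stands in for the SSRC-derived info bytes."""
    return hkdf_sha256(mkm, mks, ssrc_info, 32)


dummy_secret = bytes(range(32))
keys = derive_labeled_keys(dummy_secret)
print({label: key.hex() for label, key in keys.items()})
print(per_ssrc_key(b"\x01" * 16, b"\x02" * 16, b"\x00\x00\x30\x39").hex())
```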

The SFrame header parser exposed a compact frame-level format:

header byte:
  bit 7     signature flag       // rejected by this path
  bit 3     extended key id flag // required
  bits 0-2  counter length - 1
  bits 4-6  key id length - 1
 
serialized:
  header || keyId || counter || encrypted_payload || mac

That is not “WebRTC encryption” in the generic sense. That is a dedicated application-layer frame transform with key ids, counters, replay checks, and per-SSRC key derivation.
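
A parser for that layout fits in a few lines. It follows the bit positions exactly as listed above; the big-endian integer decoding, the `SFrame` record, and the function name are assumptions, and the MAC length is passed in because it depends on the media kind.

```python
from dataclasses import dataclass


@dataclass
class SFrame:
    key_id: int
    counter: int
    ciphertext: bytes
    mac: bytes


def parse_sframe(frame: bytes, mac_len: int) -> SFrame:
    """Split header || keyId || counter || encrypted_payload || mac."""
    if not frame:
        raise ValueError("empty frame")
    header = frame[0]
    if header & 0x80:
        raise ValueError("signature flag is rejected by this path")
    if not header & 0x08:
        raise ValueError("extended key id flag is required")
    counter_len = (header & 0x07) + 1          # bits 0-2
    key_id_len = ((header >> 4) & 0x07) + 1    # bits 4-6
    body_start = 1 + key_id_len + counter_len
    mac_start = len(frame) - mac_len
    if mac_start < body_start:
        raise ValueError("frame too short")
    # Big-endian decoding is an assumption.
    key_id = int.from_bytes(frame[1:1 + key_id_len], "big")
    counter = int.from_bytes(frame[1 + key_id_len:body_start], "big")
    return SFrame(key_id, counter, frame[body_start:mac_start], frame[mac_start:])


# Header 0x09: extended flag set, 1-byte key id, 2-byte counter.
sample = bytes([0x09, 0x07, 0x01, 0x00]) + b"payload" + b"\xaa" * 10
print(parse_sframe(sample, mac_len=10))
```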

14. Replay protection and recovery were part of the media path

The receiver context kept more state than just “current key.” The worker tracked a max received counter, a 1024-entry replay window, key stats, and MKM recovery timers.

ReceiverContext {
  keyring
  maxReceivedCounter
  replayWindowBuffer[1024]
  keyStats
  mkmRecoveryTimers
}
 
checkReplayWindow(ctr):
  if ctr <= maxReceivedCounter and (maxReceivedCounter - ctr) >= 1024:
      reject
  if replayWindowBuffer[ctr & 1023] already seen:
      reject
  accept
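
A runnable version of that check needs one decision the pseudocode leaves open: how a slot records "already seen". In this sketch each slot stores the last counter accepted at that index, which is my assumption; the 1024-entry size and the two rejection rules come from the recovered logic.

```python
WINDOW = 1024


class ReplayWindow:
    """Sliding replay check keyed by frame counter."""

    def __init__(self):
        self.max_received = -1
        # Each slot remembers the last counter accepted at that index (assumed).
        self.slots = [None] * WINDOW

    def check_and_record(self, ctr: int) -> bool:
        """Return True if the counter is fresh, recording it as seen."""
        # Rule 1: reject counters that fell out of the window.
        if ctr <= self.max_received and self.max_received - ctr >= WINDOW:
            return False
        # Rule 2: reject counters already recorded in their slot.
        slot = ctr & (WINDOW - 1)
        if self.slots[slot] == ctr:
            return False
        self.slots[slot] = ctr
        self.max_received = max(self.max_received, ctr)
        return True


window = ReplayWindow()
results = [window.check_and_record(c) for c in (5, 5, 4, 2000, 5, 977, 976)]
print(results)
```

The second 5 is a duplicate, the third 5 and 976 are rejected as too old once 2000 has been seen, and 977 is still exactly inside the window.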

When the worker could not decrypt, it did not just fail silently. It emitted recovery-oriented messages back to the main runtime:

inbound commands:
  sframeInitialize
  updateKeyRing
  resetReceiver
  setSenderKey
  decrypt
  encrypt
  heartbeatRequest
  senderTransformUpdate
  receiverTransformUpdate
 
outbound events:
  mkmNeedsRecovery
  maximumDecryptionTimeoutReached
  streamStartedDecrypting
  stallReport
  decryptSuccessStatusChanged
  heartbeatResponse
  logMessage

That worker protocol explains why MKM recovery shows up in the larger call flow. Decryption failure is not just a media error; it feeds back into key-material recovery.

15. SFrame queue pressure fed back into video quality

One of the coolest details was that the SFrame worker was not isolated from adaptation. The downlink controller could throttle video tier selection based on SFrame queue occupancy.

The worker had bounded queues:

| Queue/config detail | Value / behavior |
| --- | --- |
| Audio queue | 30 frames |
| Video receiver queue | 15 frames |
| Max time span | 500 ms |
| Scheduler cycle | 48 ms time-sliced pump |
| Backpressure signal | Average queue occupancy reported to downlink logic |

That creates a feedback loop:

video frames arrive encrypted
  -> SFrame receiver queue backs up
  -> average occupancy crosses high watermark
  -> downlink caps max video tier mid/low
  -> subscription request changes
  -> less encrypted video load arrives
  -> queue drains
  -> cap relaxes

This is not just bandwidth adaptation. It is crypto/decode pipeline pressure feeding into stream selection.
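
The loop above is a classic hysteresis controller. In the sketch below the watermark values (0.8 and 0.4) and the one-step-per-update behavior are hypothetical, since I did not recover the thresholds; only the direction of the feedback comes from the runtime.

```python
TIERS = ["low", "mid", "high"]


class QueuePressureCap:
    """Caps the maximum video tier based on average SFrame queue occupancy."""

    def __init__(self, high_watermark: float = 0.8, low_watermark: float = 0.4):
        # Both watermarks are placeholder values, not recovered constants.
        self.high_watermark = high_watermark
        self.low_watermark = low_watermark
        self.cap = "high"

    def update(self, avg_occupancy: float) -> str:
        """Lower the cap under pressure, relax it once the queue drains."""
        index = TIERS.index(self.cap)
        if avg_occupancy >= self.high_watermark and index > 0:
            index -= 1
        elif avg_occupancy <= self.low_watermark and index < len(TIERS) - 1:
            index += 1
        self.cap = TIERS[index]
        return self.cap

    def apply(self, desired: str) -> str:
        """Return the desired tier, limited by the current cap."""
        return min(desired, self.cap, key=TIERS.index)


cap = QueuePressureCap()
history = [cap.update(x) for x in (0.9, 0.9, 0.6, 0.2)]
print(history, cap.apply("high"), cap.apply("low"))
```

Occupancy between the two watermarks leaves the cap unchanged, which keeps the subscription from flapping while the queue drains.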

16. URL flags were a built-in test surface

The runtime parsed a large flag schema from the URL/query/hash. A few flags were immediately useful for understanding what the client could vary:

| Flag | Effect |
| --- | --- |
| aq | Force audio quality tier (a, h, l). |
| vq | Force video quality tier (a, h, m, l). |
| multitier | Publish multiple media tiers. |
| singleSsrc | Collapse media to a single SSRC path. |
| twcc | Toggle transport-wide congestion control extension. |
| nack | Toggle RTCP NACK behavior. |
| usePushForKeyExchange | Use IDS push path for key exchange. |
| useQRForKeyExchange | Use Quick Relay for key exchange. |
| tcpFallback | Allow relay TCP fallback. |
| noNativeRedirect | Avoid bouncing to the native app. |
| forceWebQrip | Force web QR-initiated participant path. |
| allocateon | Force relay allocation target. |

I do not treat these as public product features. I treat them as reverse-engineering affordances. Flags reveal the variables Apple engineers expected to test: tiers, SSRC shape, congestion feedback, key-exchange path, relay fallback, and native redirect behavior.

17. Error spaces mapped the system better than prose

Enums are underrated. They are the places where engineers admit what can fail.

| Error/result family | Layer |
| --- | --- |
| FTResponseCode | General FaceTime response/error surface. |
| IDSSemanticErrorCode | Identity/query/routing semantics. |
| WebCourierCloseCode | Delivery-channel closure. |
| QRSessionAllocationStatusCode | Relay allocation. |
| RelayFailedErrorCode | Relay/media path failure. |
| TUCallDisconnectedReason | Call lifecycle/disconnect semantics. |
| EndReasonBadLink | Link/capability invalidity. |

This table became my debugging map. If a failure was in LetMeIn, I did not blame SFrame. If the relay returned MissingKey, I did not treat it like a bad URL. If the SFrame worker emitted maximumDecryptionTimeoutReached, I looked at MKM/prekey recovery rather than WebCourier connection state.

18. The final runtime model

By the end of the web pass, my model looked like this:

GET /join
  -> CSP reveals allowed infra
  -> load facetime2 bundle + worker chunks
  -> parse flags/env/store-bag config
  -> register/query IDS web surfaces
  -> connect WebCourier anon/reconnect/token path
  -> subscribe/ack push topics
  -> receive FaceTime multi / Quick Relay messages
  -> submit or track LetMeIn request
  -> host decision transitions LetMeIn state
  -> allocate Quick Relay participant
  -> exchange/query material with QR + IDS paths
  -> obtain/recover AVC blobs
  -> publish media tiers
  -> send sessionInfoRequest with SubscribedStream entries
  -> run encoded frames through SFrame worker
  -> recover MKM/prekeys when decryption fails or participants join late

That is the version I wish I had at the start. It keeps the boundaries clear. FaceTime Web is not only a WebRTC demo. It is a browser participant inside a native Apple calling system, using web-shaped versions of Apple identity, push delivery, relay signaling, media negotiation, and frame encryption.

What made the reversing satisfying was that the artifacts lined up across layers. The UI said “waiting.” The native side showed host authority. The schema said LetMeInState. The relay envelope said participantAllocateRequest and sessionInfoRequest. The media blob said stream groups. The worker said mkmNeedsRecovery.

At that point the system stopped looking magical. It looked like protocols.

References