Reverse Engineering FaceTime (Part 2): WebCourier, IDS, Quick Relay, and SFrame
The browser bundle was where FaceTime stopped being a black box and started leaving labels everywhere.
Native reversing told me that the Mac owned admission. The web runtime told me what the browser had to speak after it knocked. This part is the real protocol notebook: CSP endpoint leaks, WebCourier connection shapes, IDS web register/query, recovered protobuf fields, LetMeIn enums, Quick Relay envelopes, downlink subscriptions, late-join repair, MKM timers, codec tiers, and the SFrame worker.
Same rule as before: no real tokens, no real account identifiers, no device handles, no private keys, and no live traffic dumps. The snippets are reconstructed/redacted shapes from artifacts I recovered. I am publishing the architecture, not operational secrets.
1. The CSP was an infrastructure leak
The /join shell was small enough to inspect before touching the main bundle. The useful thing was the Content-Security-Policy. A CSP is supposed to constrain the browser, but for a reverse engineer it also acts like a rough infrastructure index.
The interesting connect-src entries grouped into a few families:
connect-src blob: 'self' data: *.apple.com
wss://webcourier.sandbox.push.apple.com
wss://webcourier.push.apple.com
wss://webcourier-carry.push.apple.com
wss://webcourier-tc.push.apple.com
wss://webcourier.qa2.push.apple.com
*.icloud.com *.icloud.com.cn *.cdn-apple.com
That one block gave me the first useful split: FaceTime Web talks to Apple web services, WebCourier push hosts, iCloud/IDS-style surfaces, CDN-served runtime assets, and separate media/telemetry configuration. It was not one magical signaling socket.
| CSP clue | Reverse-engineering interpretation |
|---|---|
| webcourier.*.push.apple.com | Browser delivery is push-like and environment-specific. |
| *.icloud.com / *.icloud.com.cn | Identity/query surfaces are not purely under facetime.apple.com. |
| CDN bundle path | The protocol vocabulary can be recovered from static JavaScript assets. |
| Worker allowance | Encoded-frame processing can be split into dedicated worker code. |
| Store-bag style config | Runtime behavior can be tuned without shipping a new app binary. |
I started with the CSP because it is hard for minification to hide hostnames. Names can be mangled. Endpoints usually survive.
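As a minimal sketch of that indexing step, the connect-src value can be split on whitespace and bucketed by host suffix. The bucket names and the `classify_source` helper are my own labels for illustration, not identifiers recovered from the bundle.

```python
from collections import defaultdict

# Redacted connect-src value, copied from the listing above.
CONNECT_SRC = (
    "blob: 'self' data: *.apple.com "
    "wss://webcourier.sandbox.push.apple.com "
    "wss://webcourier.push.apple.com "
    "wss://webcourier-carry.push.apple.com "
    "wss://webcourier-tc.push.apple.com "
    "wss://webcourier.qa2.push.apple.com "
    "*.icloud.com *.icloud.com.cn *.cdn-apple.com"
)


def classify_source(source: str) -> str:
    """Bucket one CSP source expression (bucket names are illustrative)."""
    if source.startswith("'") or source.endswith(":"):
        return "keyword-or-scheme"  # 'self', blob:, data:
    host = source.split("://", 1)[-1]  # drop a leading scheme such as wss://
    if host.endswith(".push.apple.com"):
        return "webcourier-push"
    if host.endswith(".icloud.com") or host.endswith(".icloud.com.cn"):
        return "icloud-identity"
    if host.endswith(".cdn-apple.com"):
        return "cdn-assets"
    if host.endswith(".apple.com"):
        return "apple-web"
    return "other"


def group_connect_src(value: str) -> dict[str, list[str]]:
    """Group every source expression in a connect-src value by family."""
    groups = defaultdict(list)
    for source in value.split():
        groups[classify_source(source)].append(source)
    return dict(groups)


families = group_connect_src(CONNECT_SRC)
for family, sources in families.items():
    print(family, sources)
```

The order of the suffix checks matters: the push hosts also end in .apple.com, so the more specific suffix has to be tested first.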
2. Hardcoded endpoints made the layers obvious
The bundle contained enough concrete URL material to separate identity, delivery, telemetry, and FaceTime-host environments.
IDS web register:
https://gatewayws.icloud.com/identity/WebObjects/TDIdentityService.woa/wa/webregister
https://identity{env}.ess.apple.com/WebObjects/TDIdentityService.woa/wa/webregister
IDS web query:
https://gatewayws.icloud.com/query/WebObjects/QueryService.woa/wa/webquery
https://query{env}.ess.apple.com/WebObjects/QueryService.woa/wa/webquery
RTC store bag:
https://mediaservices.cdn-apple.com/store_bags/ft/v1/rtc_storebag.json
WebCourier:
wss://<webcourier-env>/websocket
wss://<webcourier-env>/websocket/anon/<hex(protobuf)>
wss://<webcourier-env>/websocket?tok=<base64(token)>
The register/query split matters. Registration answers “how can this browser be reached and represented?” Query answers “what do I need to know about other handles/material?” WebCourier answers “how do async FaceTime events reach a browser tab?” None of these are SDP.
The environments also mattered:
| Environment | WebCourier host |
|---|---|
| qa2 | webcourier.qa2.push.apple.com |
| prod | webcourier.push.apple.com |
| carry | webcourier-carry.push.apple.com |
| teamcarry | webcourier-tc.push.apple.com |
| sandbox | webcourier.sandbox.push.apple.com |
The browser runtime could override environment selection with URL flags such as env and pushEnv. That was another useful sign that this was not a toy client. It had internal/testing surfaces baked into the same runtime path, gated by host and flags.
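A small sketch of how the environment table and the WebCourier URL shapes fit together. The helper names are hypothetical, the default of "prod" is my assumption, and the base64 variant used for the token is also an assumption: the bundle may use a URL-safe alphabet or percent-encode the result.

```python
import base64

# Host table copied from the environment table above.
WEBCOURIER_HOSTS = {
    "qa2": "webcourier.qa2.push.apple.com",
    "prod": "webcourier.push.apple.com",
    "carry": "webcourier-carry.push.apple.com",
    "teamcarry": "webcourier-tc.push.apple.com",
    "sandbox": "webcourier.sandbox.push.apple.com",
}


def webcourier_host(env: str = "prod") -> str:
    """Resolve an environment name; unknown names raise KeyError instead of guessing."""
    return WEBCOURIER_HOSTS[env]


def anon_url(env: str, connect_message: bytes) -> str:
    """Shape: wss://<host>/websocket/anon/<hex(protobuf)>."""
    return f"wss://{webcourier_host(env)}/websocket/anon/{connect_message.hex()}"


def token_url(env: str, token: bytes) -> str:
    """Shape: wss://<host>/websocket?tok=<base64(token)>.

    Assumption: standard base64 alphabet, no extra URL escaping.
    """
    tok = base64.b64encode(token).decode("ascii")
    return f"wss://{webcourier_host(env)}/websocket?tok={tok}"


# Placeholder bytes only; these are not real protobuf payloads or tokens.
print(anon_url("sandbox", b"\x22\x00"))
print(token_url("prod", b"\x00\x01\x02"))
```

Keeping the host lookup strict makes a mistyped environment fail loudly rather than silently falling back to production.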
3. WebCourier looked like APNs semantics in a browser transport
A browser tab cannot be a normal APNs device client, but FaceTime still needs asynchronous delivery. Apple’s APNs is the native push infrastructure for delivering notifications to devices.[2] FaceTime Web appeared to use WebCourier as a web-compatible projection of push delivery.
The anonymous connection shape was the first artifact that made this concrete:
wss://<host>/websocket/anon/<hex>
where:
<hex> = hex( protobuf( AnonymousUserConnectMessage, payload ) )
The recovered bootstrap schema was small but meaningful:
AnonymousUserConnectMessage {
4: connectWithIdsRegisterAndQuery // message
5: reconnectMessage // message
}
ConnectedResponseMessage {
1: pushToken // string
2: timePresenceReceivedSec // uint32
3: reconnectToken // string
}
AcknowledgementMessage {
1: topic // bytes
2: id // uint32
}
This gives the delivery loop a very Apple shape: connect anonymously with structured register/query material, receive a push token/reconnect token, subscribe/acknowledge by topic, and reconnect without rebuilding the entire session if possible.
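To show how field numbers like these map onto bytes, here is a minimal protobuf wire-format reader that handles only varints (wire type 0) and length-delimited fields (wire type 2). It is a generic decoder sketch, not code from the bundle, and the sample bytes are hand-built placeholders rather than captured traffic.

```python
# Field names for ConnectedResponseMessage, from the recovered schema above.
CONNECTED_RESPONSE_FIELDS = {
    1: "pushToken",
    2: "timePresenceReceivedSec",
    3: "reconnectToken",
}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return value, pos
        shift += 7


def decode_fields(buf: bytes) -> dict[int, object]:
    """Decode top-level fields; only wire types 0 and 2 are supported."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            fields[number], pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            fields[number] = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


# Hand-built placeholder: field 1 = "tok", field 2 = 300, field 3 = "re".
sample = b"\x0a\x03tok" + b"\x10\xac\x02" + b"\x1a\x02re"
named = {CONNECTED_RESPONSE_FIELDS.get(n, n): v for n, v in decode_fields(sample).items()}
print(named)
```

Length-delimited values are left as raw bytes here, because the wire format alone does not say whether a field is a string, bytes, or a nested message; that is exactly the gap the recovered field names fill.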
The topics were also not generic:
| Symbol | Topic |
|---|---|
| FaceTimeMulti | com.apple.private.alloy.facetime.multi |
| Willow | com.apple.private.alloy.willow |
| QuickRelay | com.apple.private.alloy.quickrelay |
| QuickRelaySignaling | com.apple.private.alloy.quickrelay.web.signaling |
The topic names gave away the layer split. FaceTime conversation signaling and Quick Relay signaling ride through the same delivery world, but they are not the same protocol family.
4. The bundle gave names to the unknown bytes
I did not try to understand the whole minified bundle top-down. I searched for nouns that sounded too specific to be UI code:
WebCourier
IDSWebRegisterRequest
IDSWebQueryRequest
ConnectedResponseMessage
ConversationMessage
ConversationLink
LetMeInState
LetMeInResult
QuickRelayWebProtocolMessage
ParticipantAllocate
SubscribedStream
MKM
PreKey
SFrameVideoRTCRtpReceiverTransformer
That vocabulary became the index. Once I had names, I could attach behavior to protocol families instead of staring at byte arrays.
| Recovered name | What it labeled |
|---|---|
| ConversationMessage | Host/conversation state transitions and admission-carrying messages. |
| IDSWebRegisterRequest | Browser-side registration into Apple identity/routing surfaces. |
| WebClientPushMessage | Delivery payload entering from WebCourier. |
| QuickRelayWebProtocolMessage | Relay allocation, material exchange, session info, stats, probe, and data messages. |
| VCMediaNegotiationBlob | Media capabilities, streams, codecs, settings, and stream groups. |
| SKEBlob / DevicePreKey / MKM | Key exchange and media-key material families. |
This is why bundle reversing was more valuable than a blind packet capture. A capture gives timing and bytes. The bundle gives field names and enum names.
5. Conversation messages explained the waiting room
The most useful conversation object had fields that aligned almost perfectly with the host-side behavior from Part 1.
ConversationMessage { // selected fields
1: version uint32
2: type int32
3: shouldSuppressInCallUI bool
4: activeParticipants[] ConversationParticipant
5: conversationGroupUUIDString string
10: nickname string
11: link ConversationLink
13: isLetMeInApproved bool
14: encryptedMessage EncryptedConversationMessage
15: letMeInDelegationHandle string
16: letMeInDelegationUUID string
17: enclosedEncryptedType int32
19: invitationPreferences[] ConversationInvitationPreference
20: removedMembers[] ConversationMember
21: unicastConnectorBlob bytes
23: activeLightweightParticipants[] ConversationParticipant
24: guestModeEnabled bool
}
This is not a random collection of UI properties. It encodes the semantics I kept seeing externally: a link, guest mode, active participants, approval, delegation, encrypted enclosed messages, and connector material.
The two LetMeIn enums made the state machine explicit:
LetMeInResult:
  0 = CANCELLED
  1 = ALLOWED
  2 = DENIED
LetMeInState:
  0 = PENDING
  1 = REQUESTING
  2 = FAILED
  3 = CANCELLED
  4 = ALLOWED
  5 = DONE
  6 = DENIED
The important detail is that ALLOWED is not DONE. Approval is an admission result; joining is completed only after more machinery runs.
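The ALLOWED versus DONE distinction is easy to encode. In this sketch the enum values come from the recovered tables, while `TERMINAL_STATES`, `apply_result`, `is_joined`, and `needs_more_work` are hypothetical helpers reflecting my reading of the state machine, not recovered code.

```python
from enum import IntEnum


class LetMeInResult(IntEnum):
    CANCELLED = 0
    ALLOWED = 1
    DENIED = 2


class LetMeInState(IntEnum):
    PENDING = 0
    REQUESTING = 1
    FAILED = 2
    CANCELLED = 3
    ALLOWED = 4
    DONE = 5
    DENIED = 6


# Assumption: which states count as terminal is an interpretation, not a recovered table.
TERMINAL_STATES = {
    LetMeInState.FAILED,
    LetMeInState.CANCELLED,
    LetMeInState.DONE,
    LetMeInState.DENIED,
}


def apply_result(result: LetMeInResult) -> LetMeInState:
    """Map a host decision onto the guest-side state (hypothetical helper)."""
    return {
        LetMeInResult.CANCELLED: LetMeInState.CANCELLED,
        LetMeInResult.ALLOWED: LetMeInState.ALLOWED,
        LetMeInResult.DENIED: LetMeInState.DENIED,
    }[result]


def is_joined(state: LetMeInState) -> bool:
    """Only DONE means the join actually completed."""
    return state is LetMeInState.DONE


def needs_more_work(state: LetMeInState) -> bool:
    """ALLOWED is non-terminal: relay, key, and media setup still have to run."""
    return state not in TERMINAL_STATES


state = apply_result(LetMeInResult.ALLOWED)
print(state.name, is_joined(state), needs_more_work(state))
```

A UI that treats ALLOWED as "in the call" would show a joined state while relay allocation and key exchange are still outstanding.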
6. FaceTime signaling still had old Apple command vocabulary
The command table recovered from the bundle placed FaceTime Web inside Apple’s broader messaging/calling protocol family. These were the FaceTime-multi IDS/Nice command values that mattered most:
| Code | Command |
|---|---|
| 206 | NiceGroupSessionInternalMessage |
| 207 | NiceGroupSessionJoin |
| 208 | NiceGroupSessionLeave |
| 209 | NiceGroupSessionUpdate |
| 210 | NiceGroupSessionPrekey |
| 211 | NiceGroupSessionMKM |
| 227 | NiceMessage |
| 232 | NiceSessionInvitation |
| 233 | NiceSessionAccept |
| 234 | NiceSessionDecline |
| 237 | NiceSessionEnd |
| 239 | NiceGroupSessionMessage |
| 243 | NiceProtobuf |
210 and 211 were especially important because they separated prekey distribution from MKM distribution. That distinction shows up again in late-join recovery. A participant can be known and even active while key material is still being repaired.
A simplified delivery path from my notes:
WebCourier message
-> topic hash / topic id
-> WebClientPushMessage
-> Nice / Madrid / FaceTime multi command
-> ConversationMessage or encrypted conversation payload
-> state-machine update
This is exactly the kind of layered dispatch minified code tries to make unpleasant. The enum table makes it readable again.
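One way to make the command table usable is to turn it into an enum and route on it. The numeric values below are from the table above; the Python member names and the routing buckets returned by `route` are hypothetical groupings for illustration.

```python
from enum import IntEnum


class NiceCommand(IntEnum):
    GROUP_SESSION_INTERNAL_MESSAGE = 206
    GROUP_SESSION_JOIN = 207
    GROUP_SESSION_LEAVE = 208
    GROUP_SESSION_UPDATE = 209
    GROUP_SESSION_PREKEY = 210
    GROUP_SESSION_MKM = 211
    MESSAGE = 227
    SESSION_INVITATION = 232
    SESSION_ACCEPT = 233
    SESSION_DECLINE = 234
    SESSION_END = 237
    GROUP_SESSION_MESSAGE = 239
    PROTOBUF = 243


# Prekey and MKM distribution are kept apart, matching the 210/211 split.
KEY_MATERIAL = {
    NiceCommand.GROUP_SESSION_PREKEY: "prekey",
    NiceCommand.GROUP_SESSION_MKM: "mkm",
}


def route(command_code: int) -> str:
    """Pick a handler family for a command code (bucket names are illustrative)."""
    try:
        command = NiceCommand(command_code)
    except ValueError:
        return "unknown"
    if command in KEY_MATERIAL:
        return f"key-material:{KEY_MATERIAL[command]}"
    if command.name.startswith("GROUP_SESSION"):
        return "group-session"
    if command.name.startswith("SESSION"):
        return "session"
    return "generic"


for code in (207, 210, 211, 233, 243, 999):
    print(code, route(code))
```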
7. Quick Relay was a separate protocol, not a side effect
Once the guest is allowed, media still needs a relay path. The runtime did not represent this as a tiny helper call. It had a large QuickRelayWebProtocolMessage envelope with many numbered branches.
QuickRelayWebProtocolMessage { // selected fields
1: uuid
2: error
10: participantAllocateRequest
11: participantAllocateResponse
12: unallocbindRequest
13: unallocbindResponse
20: infoRequest
21: infoResponse
24: putmaterialRequest
25: putmaterialResponse
26: putmaterialIndication
27: getmaterialRequest
28: getmaterialResponse
40: sessionInfoRequest
41: sessionInfoResponse
42: sessionInfoUpdate
50: dataMessage
60: streamCompoundRequest
61: streamCompoundResponse
81: participantUpdateRequest
82: participantUpdateResponse
83: participantUpdateUpdate
100: participantStatsRequest
101: participantStatsResponse
120: participantProbeRequest
121: participantProbeResponse
}
That schema is dense, but it reveals the design. Quick Relay is not only allocation. It carries material exchange, session info, stream subscription, participant updates, stats, probes, reporting, and data messages.
Allocation had its own status/error family:
| Code | Meaning |
|---|---|
| 0 | OK |
| 1001 | InvalidField |
| 5001 | InternalError |
| 5004 | Busy |
| 5008 | MissingKey |
| 5010 | Block |
The MissingKey result is the one that makes the relay/key boundary obvious. A relay failure can be caused by cryptographic material state, not only by networking.
8. Media negotiation was carried in an AVC blob
The media side had a central VCMediaNegotiationBlob. This was one of the artifacts that stopped me from treating media setup as “just SDP.” The blob had FaceTime-specific settings, stream groups, multiway streams, captions, moments, bandwidth, and timing fields.
VCMediaNegotiationBlob {
1: allowDynamicMaxBitrate bool
2: allowsContentsChangeWithAspectPreservation bool
3: audioSettings AudioSettings
4: videoSettings VideoSettings
5: screenSettings VideoSettings
6: userAgent string
7: basebandCodec string
8: basebandCodecSampleRate uint32
9: bandwidthSettings[] BandwidthSettings
10: captionsSettings CaptionsSettings
11: multiwayAudioStreams[] MultiwayAudioStream
12: momentsSettings MomentsSettings
13: ntpTime uint64
14: blobVersion uint32
15: multiwayVideoStream[] MultiwayVideoStream
16: mediaControlInfoVersion uint32
17: faceTimeSettings FaceTimeSettings
18: multiwayScreenStream[] MultiwayVideoStream
19: streamGroups[] StreamGroup
}
The late-join code later confirmed why this matters. If a participant appears and the client does not have that participant’s AVC blob, it may know the participant exists but still not know enough about stream groups/codecs to subscribe and decode correctly.
9. The codec tiers were exact, not vibes
The runtime carried concrete audio/video tier tables. That was useful because a lot of call quality debugging otherwise turns into folklore.
| Audio tier | Stream index | Quality index | Encode kbps | Network kbps | ptime | Codec |
|---|---|---|---|---|---|---|
| low | 0 | 27 | 13.2 | 26 | 60 | Opus16K, PT 96 |
| high | 1 | 162 | 48 | 69 | 40 | Opus24K, PT 97, in-band FEC |

| Video tier | Stream index | Quality | Resolution marker | Encode kbps | Network kbps | FPS |
|---|---|---|---|---|---|---|
| low | 0 | 93 | 192 | 50 | 62 | 15 |
| mid | 1 | 400 | 320 | 200 | 230 | 30 |
| high | 2 | 1250 | 720 | 800 | 872 | 30 |
The RTP/codec map also had concrete payload IDs and extensions:
| Item | Value |
|---|---|
| H.264 payload type | 123 |
| RTX payload type | 124 |
| Opus16K payload type | 96 |
| Opus24K payload type | 97 |
| CVO extension | id 1, urn:3gpp:video-orientation |
| AudioLevel extension | id 2, urn:ietf:params:rtp-hdrext:ssrc-audio-level |
| TWCC extension | id 3, transport-wide congestion control |
| AbsoluteSendTime extension | id 6, abs-send-time |
This is the level where the browser runtime became testable. I could force or reason about quality tiers instead of guessing what “high” meant.
10. Downlink was a subscription algorithm
Receiving media was not just “open peer connection and wait.” The web client had a downlink controller that decided which peer streams to request and at what tier.
The loop looked like this:
setRemoteParticipants()
participantRequestConfigurationChanged()
availableDownlinkBandwidthChanged()
SFrameQueueOccupancyChanged()
connectionChanged()
  -> updateStreamsDebounced()
    -> _updateStreams()
         infos = getStreamInfos()
         requestStreams(infos)
         rearmDownlinkBandwidthCheckTimer()
The per-peer desired tiers were selected from UI and policy:
getVideoTier(peer):
  if tile not visible: return null
  if video disabled or paused: return null
  tier = width <= 192 ? low : width <= 320 ? mid : high
  tier = capByUserAgent(tier)
  tier = capBySFrameQueueOccupancy(tier)
  return tier
getAudioTier(peer):
  if AutoQualityAudioStreams: return high
  if LowQualityAudioStreams: return low
  return high
Then the bandwidth allocator flattened stream candidates and greedily fit them under the available estimate. The important property was audio-first behavior. Active talkers’ audio was prioritized, low-quality tiers were not starved, and video upgrades only happened when the delta could fit.
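The tier choice and the greedy fit can be sketched together. The kbps figures are the network rates from the tier tables earlier in the post. Everything else is an assumption made to produce something runnable: the `Peer` type, the four-pass ordering, and using list order as a stand-in for active-talker priority. The user-agent and SFrame-queue caps are left out.

```python
from dataclasses import dataclass

# Network kbps per tier, copied from the tier tables earlier in the post.
AUDIO_KBPS = {"low": 26, "high": 69}
VIDEO_KBPS = {"low": 62, "mid": 230, "high": 872}
VIDEO_ORDER = ["low", "mid", "high"]


@dataclass
class Peer:
    peer_id: str
    tile_width: int
    visible: bool = True
    video_enabled: bool = True


def desired_video_tier(peer: Peer):
    """Width-based tier choice; user-agent and queue-occupancy caps are omitted."""
    if not peer.visible or not peer.video_enabled:
        return None
    if peer.tile_width <= 192:
        return "low"
    return "mid" if peer.tile_width <= 320 else "high"


def allocate(peers: list[Peer], budget_kbps: int):
    """Greedy fit. Assumption: list order stands in for priority (active talkers first)."""
    remaining = budget_kbps
    plan = {p.peer_id: {"audio": None, "video": None} for p in peers}
    for p in peers:  # pass 1: audio before any video, at the low tier
        if remaining >= AUDIO_KBPS["low"]:
            plan[p.peer_id]["audio"] = "low"
            remaining -= AUDIO_KBPS["low"]
    for p in peers:  # pass 2: base video tier for tiles that want video
        if desired_video_tier(p) and remaining >= VIDEO_KBPS["low"]:
            plan[p.peer_id]["video"] = "low"
            remaining -= VIDEO_KBPS["low"]
    audio_delta = AUDIO_KBPS["high"] - AUDIO_KBPS["low"]
    for p in peers:  # pass 3: audio upgrades when the delta fits
        if plan[p.peer_id]["audio"] == "low" and remaining >= audio_delta:
            plan[p.peer_id]["audio"] = "high"
            remaining -= audio_delta
    for step in ("mid", "high"):  # pass 4: video upgrades, one step at a time
        for p in peers:
            want, have = desired_video_tier(p), plan[p.peer_id]["video"]
            if want is None or have is None:
                continue
            if VIDEO_ORDER.index(want) < VIDEO_ORDER.index(step):
                continue  # this peer does not want a tier this high
            if VIDEO_ORDER.index(have) != VIDEO_ORDER.index(step) - 1:
                continue  # an earlier upgrade did not fit
            delta = VIDEO_KBPS[step] - VIDEO_KBPS[have]
            if remaining >= delta:
                plan[p.peer_id]["video"] = step
                remaining -= delta
    return plan, remaining


plan, left = allocate(
    [Peer("a", 640), Peer("b", 192), Peer("c", 320, visible=False)], 500
)
print(plan, left)
```

With a 500 kbps estimate, every peer gets audio first, visible tiles get a base video tier, and the large tile is upgraded only as far as the remaining budget allows.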
The actual subscription went through Quick Relay sessionInfoRequest field 40. The per-peer unit was SubscribedStream:
SubscribedStream {
1: wildcardSubscription bool
2: peerParticipantId bytes/string-ish id
3: peerStreamIds[] repeated stream id
4: isSeamlessTransition bool
}
That tiny structure explains a lot. The relay does not need to decrypt media. It only needs to know which encrypted streams to forward. SFrame keeps the media opaque while Quick Relay handles forwarding and subscription state.
11. Late join was three repair loops, not one join event
Late join is where FaceTime Web stopped looking like a thin client. When a participant becomes active after others are already in the call, the browser has to repair topology, media description, and key state.
The recovered flow looked like this:
setActiveParticipants(newIds)
  -> diff old active set vs new active set
  -> for newly active peer:
       sendMKMToParticipant(peer.pushToken)
  -> recoverPreKeys(localJustBecameActive)
       if active peer has no prekey:
         attemptMaterialRecovery(peer)
         sendPreKeyRecoveryRequest(peer)
  -> recoverAVCBlobs(activeIds)
       if active peer has no avcBlob:
         attemptMaterialRecovery(peer)
         sendAVCBlobRequest(peer)
  -> handleActiveParticipantsChange()
       recompute downlink subscriptions
I like this flow because it is brutally practical. A late participant needs three things:
| Repair target | Why it is necessary |
|---|---|
| MKM / prekey material | Without current key material, encrypted frames are noise. |
| AVC blob | Without media negotiation data, stream groups/codecs/SSRC mapping are unknown. |
| Downlink subscription | Without SubscribedStream updates, the relay may not forward the right streams. |
This is also why “approved but no media” is not one bug. It can be missing material, missing AVC data, stale subscription state, or SFrame recovery timing out.
12. MKM timers exposed the key lifecycle
Key material had a real lifecycle. The config object in the runtime gave exact timing constants for prekeys, MKM distribution, MKM rolling, recovery, and query behavior.
| Constant | Value | Meaning |
|---|---|---|
| PreKeyExpireDuration | 3,600,000 ms | Prekey validity, one hour. |
| PreviousPrekeyExpireDuration | 420,000 ms | Grace period for the previous prekey. |
| InitialMKMDistributeDuration | 600,000 ms | First MKM distribution window. |
| MKMExpireDuration | 1,800,000 ms | MKM validity. |
| MKMRollDuration | 1,200,000 ms | Expected media-key roll cadence. |
| MKMRollWaitDuration | 5,000 ms | Settle time around a roll. |
| MKMPreRecoveryInterval | 10,000 ms | Recovery probe interval before visible failure. |
| DecryptionTimeout | 30,000 ms | Give up decrypting under current assumptions. |
| DecryptionMKMRecoveryTimeout | 10,000 ms | MKM recovery deadline. |
| ParticipantAllocateResponseTimeout | 300,000 ms | Relay allocation wait. |
| ActiveParticipantsResponseTimeout | 10,000 ms | Active participant query wait. |
| LastParticipantRemainingTimeout | 15,000 ms | Solo-in-call grace window. |
| IDSQueryTimeout | 20,000 ms | Per-query timeout, with retries/backoff. |
Numbers like these are useful because they make behavior falsifiable. If decryption has been broken for longer than 30 seconds, I should not be debugging it the same way as a 200 ms race. If relay allocation is still pending, that is a different timeout budget from IDS query.
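A few lines of arithmetic make the budgets concrete. The values are a subset of the table above. The `exceeded` helper is hypothetical, and it assumes each deadline is measured from the same starting point, which the constants alone do not confirm.

```python
# Subset of the recovered timing constants, in milliseconds.
CONSTANTS_MS = {
    "PreKeyExpireDuration": 3_600_000,
    "MKMExpireDuration": 1_800_000,
    "MKMRollDuration": 1_200_000,
    "DecryptionTimeout": 30_000,
    "DecryptionMKMRecoveryTimeout": 10_000,
    "IDSQueryTimeout": 20_000,
    "ParticipantAllocateResponseTimeout": 300_000,
}


def minutes(name: str) -> float:
    """Convert a named constant to minutes for easier reading."""
    return CONSTANTS_MS[name] / 60_000


def exceeded(elapsed_ms: int, names: list[str]) -> list[str]:
    """List which of the named deadlines have already passed.

    Assumption: every deadline starts counting at the same moment.
    """
    return [name for name in names if elapsed_ms >= CONSTANTS_MS[name]]


# A key roll is expected before the current MKM expires.
roll_margin_ms = CONSTANTS_MS["MKMExpireDuration"] - CONSTANTS_MS["MKMRollDuration"]

decrypt_deadlines = ["DecryptionMKMRecoveryTimeout", "DecryptionTimeout"]
print(minutes("PreKeyExpireDuration"), minutes("MKMRollDuration"), roll_margin_ms)
print(exceeded(200, decrypt_deadlines))
print(exceeded(12_000, decrypt_deadlines))
print(exceeded(31_000, decrypt_deadlines))
```

The roll cadence of 20 minutes against a 30 minute MKM validity leaves a 10 minute margin, so a single missed roll does not by itself imply an expired key.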
13. SFrame lived in a dedicated worker
The media encryption boundary was clean. FaceTime Web used WebRTC encoded transforms, which let a web app transform encoded frames before sending or after receiving.[3] SFrame is a frame-level encryption design for end-to-end encrypted media.[4]
The worker chunk exposed four transformer names:
SFrameVideoRTCRtpSenderTransformer
SFrameAudioRTCRtpSenderTransformer
SFrameVideoRTCRtpReceiverTransformer
SFrameAudioRTCRtpReceiverTransformer
The cipher split was also concrete:
| Media kind | Cipher class | MAC length |
|---|---|---|
| audio | AES_CM_128_HMAC_SHA256_32 | 4 bytes |
| video | AES_CM_128_HMAC_SHA256_80 | 10 bytes |
| data | AES_CM_128_HMAC_SHA256_80 | 10 bytes |
The derivation path used HKDF-SHA256 labels that were readable in the worker:
secret32:
  ikm = secret[0..16]
  salt = secret[16..32]
HKDF labels:
  "SFrameSaltKey"
  "SFrameEncryptionKey"
  "SFrameAuthenticationKey"
per_ssrc_key:
  HKDF-SHA256(ikm = mkm, salt = mks, info = ssrc[4..]) -> 256 bits
The SFrame header parser exposed a compact frame-level format:
header byte:
  bit 7 signature flag // rejected by this path
  bit 3 extended key id flag // required
  bits 0-2 counter length - 1
  bits 4-6 key id length - 1
serialized:
  header || keyId || counter || encrypted_payload || mac
That is not “WebRTC encryption” in the generic sense. That is a dedicated application-layer frame transform with key ids, counters, replay checks, and per-SSRC key derivation.
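Both the derivation notes and the header layout can be exercised with the standard library alone. The HKDF function follows RFC 5869 and the header parser follows the bit layout exactly as listed above. Several details are assumptions: that the labels are passed as the HKDF info parameter, the output lengths chosen here, and big-endian encoding of key id and counter. The secret and frame bytes are placeholders.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def split_secret32(secret: bytes) -> tuple[bytes, bytes]:
    """Split a 32-byte secret into (ikm, salt) halves, as in the notes above."""
    if len(secret) != 32:
        raise ValueError("expected a 32-byte secret")
    return secret[:16], secret[16:]


def parse_sframe_header(frame: bytes) -> tuple[int, int, int]:
    """Return (key_id, counter, header_length) using the layout listed above."""
    first = frame[0]
    if first & 0x80:
        raise ValueError("signature flag set; this path rejects it")
    if not first & 0x08:
        raise ValueError("extended key id flag is required")
    counter_len = (first & 0x07) + 1          # bits 0-2
    key_id_len = ((first >> 4) & 0x07) + 1    # bits 4-6
    end_key = 1 + key_id_len
    end_ctr = end_key + counter_len
    if len(frame) < end_ctr:
        raise ValueError("frame shorter than its declared header")
    # Assumption: both integers are big-endian.
    key_id = int.from_bytes(frame[1:end_key], "big")
    counter = int.from_bytes(frame[end_key:end_ctr], "big")
    return key_id, counter, end_ctr


# Placeholder secret, not real key material. Output lengths are assumptions.
ikm, salt = split_secret32(bytes(range(32)))
enc_key = hkdf_sha256(ikm, salt, b"SFrameEncryptionKey", 16)
auth_key = hkdf_sha256(ikm, salt, b"SFrameAuthenticationKey", 32)

# 0x1A = key id length 2, extended flag set, counter length 3.
header = parse_sframe_header(bytes([0x1A, 0x01, 0x02, 0x00, 0x00, 0x05]) + b"ciphertext")
print(enc_key.hex(), header)
```

Distinct labels give independent keys from the same secret, which is the property that lets one MKM feed separate encryption and authentication keys.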
14. Replay protection and recovery were part of the media path
The receiver context kept more state than just “current key.” The worker tracked a max received counter, a 1024-entry replay window, key stats, and MKM recovery timers.
ReceiverContext {
keyring
maxReceivedCounter
replayWindowBuffer[1024]
keyStats
mkmRecoveryTimers
}
checkReplayWindow(ctr):
  if ctr <= maxReceivedCounter and (maxReceivedCounter - ctr) >= 1024:
    reject
  if replayWindowBuffer[ctr & 1023] already seen:
    reject
  accept
When the worker could not decrypt, it did not just fail silently. It emitted recovery-oriented messages back to the main runtime:
inbound commands:
  sframeInitialize
  updateKeyRing
  resetReceiver
  setSenderKey
  decrypt
  encrypt
  heartbeatRequest
  senderTransformUpdate
  receiverTransformUpdate
outbound events:
  mkmNeedsRecovery
  maximumDecryptionTimeoutReached
  streamStartedDecrypting
  stallReport
  decryptSuccessStatusChanged
  heartbeatResponse
  logMessage
That worker protocol explains why MKM recovery shows up in the larger call flow. Decryption failure is not just a media error; it feeds back into key-material recovery.
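The checkReplayWindow pseudocode earlier in this section can be made concrete with a fixed-size ring. The 1024-entry size is from the recovered context. Storing the accepted counter value in each slot is my assumption about how "already seen" is tracked; the worker may use a bitmap or something else.

```python
WINDOW = 1024  # replay window size from the recovered ReceiverContext


class ReplayWindow:
    """Sliding replay check keyed by frame counter (a sketch, not the worker's code)."""

    def __init__(self) -> None:
        self.max_received = -1
        # Assumption: each slot remembers the counter last accepted there.
        self.seen: list[int | None] = [None] * WINDOW

    def check_and_update(self, ctr: int) -> bool:
        """Return True and record ctr if it is fresh, False if it must be rejected."""
        if ctr <= self.max_received and self.max_received - ctr >= WINDOW:
            return False  # older than the window can track
        slot = ctr & (WINDOW - 1)
        if self.seen[slot] == ctr:
            return False  # exact counter already accepted: replay
        self.seen[slot] = ctr
        self.max_received = max(self.max_received, ctr)
        return True


window = ReplayWindow()
results = [
    window.check_and_update(5),     # fresh
    window.check_and_update(5),     # replay
    window.check_and_update(3),     # out of order but inside the window
    window.check_and_update(2000),  # jumps the window forward
    window.check_and_update(976),   # exactly 1024 behind: too old
    window.check_and_update(977),   # 1023 behind: still acceptable
]
print(results)
```

Comparing the stored counter rather than a bare flag means a slot left over from a much older frame cannot cause a false replay rejection after the window has moved on.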
15. SFrame queue pressure fed back into video quality
One of the coolest details was that the SFrame worker was not isolated from adaptation. The downlink controller could throttle video tier selection based on SFrame queue occupancy.
The worker had bounded queues:
| Queue/config detail | Value / behavior |
|---|---|
| Audio queue | 30 frames |
| Video receiver queue | 15 frames |
| Max time span | 500 ms |
| Scheduler cycle | 48 ms time-sliced pump |
| Backpressure signal | Average queue occupancy reported to downlink logic |
That creates a feedback loop:
video frames arrive encrypted
-> SFrame receiver queue backs up
-> average occupancy crosses high watermark
-> downlink caps max video tier mid/low
-> subscription request changes
-> less encrypted video load arrives
-> queue drains
-> cap relaxes
This is not just bandwidth adaptation. It is crypto/decode pipeline pressure feeding into stream selection.
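The feedback loop can be sketched as a cap with hysteresis. The 15-frame video queue depth is from the table above. The watermark values and the one-step-at-a-time behavior are hypothetical, chosen only to make the loop runnable; I did not recover the real thresholds.

```python
VIDEO_ORDER = ["low", "mid", "high"]
VIDEO_QUEUE_FRAMES = 15   # video receiver queue depth from the table above
HIGH_WATERMARK = 0.75     # hypothetical threshold
LOW_WATERMARK = 0.25      # hypothetical threshold


def occupancy(avg_queued_frames: float) -> float:
    """Average queue depth expressed as a fraction of capacity."""
    return avg_queued_frames / VIDEO_QUEUE_FRAMES


def next_cap(current_cap: str, avg_occupancy: float) -> str:
    """Step the cap down under pressure and back up once the queue drains."""
    index = VIDEO_ORDER.index(current_cap)
    if avg_occupancy >= HIGH_WATERMARK and index > 0:
        return VIDEO_ORDER[index - 1]
    if avg_occupancy <= LOW_WATERMARK and index < len(VIDEO_ORDER) - 1:
        return VIDEO_ORDER[index + 1]
    return current_cap  # between watermarks: hold steady


def cap_tier(desired, cap: str):
    """Apply the cap to a desired tier; None means no video is requested."""
    if desired is None:
        return None
    return min(desired, cap, key=VIDEO_ORDER.index)


cap = "high"
for frames in (14, 13, 8, 2, 1):
    cap = next_cap(cap, occupancy(frames))
    print(frames, cap, cap_tier("high", cap))
```

The gap between the two watermarks keeps the cap from flapping when occupancy hovers near a single threshold.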
16. URL flags were a built-in test surface
The runtime parsed a large flag schema from the URL/query/hash. A few flags were immediately useful for understanding what the client could vary:
| Flag | Effect |
|---|---|
| aq | Force audio quality tier (a, h, l). |
| vq | Force video quality tier (a, h, m, l). |
| multitier | Publish multiple media tiers. |
| singleSsrc | Collapse media to a single SSRC path. |
| twcc | Toggle transport-wide congestion control extension. |
| nack | Toggle RTCP NACK behavior. |
| usePushForKeyExchange | Use IDS push path for key exchange. |
| useQRForKeyExchange | Use Quick Relay for key exchange. |
| tcpFallback | Allow relay TCP fallback. |
| noNativeRedirect | Avoid bouncing to the native app. |
| forceWebQrip | Force web QR-initiated participant path. |
| allocateon | Force relay allocation target. |
I do not treat these as public product features. I treat them as reverse-engineering affordances. Flags reveal the variables Apple engineers expected to test: tiers, SSRC shape, congestion feedback, key-exchange path, relay fallback, and native redirect behavior.
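A sketch of reading such flags from both the query string and the hash. The flag names and the allowed tier letters are from the table above. The precedence rule (hash overrides query), the example URL, and the value syntax for the non-tier flags are assumptions; the real parser may differ.

```python
from urllib.parse import parse_qs, urlsplit

# Flag names from the table above.
KNOWN_FLAGS = {
    "aq", "vq", "multitier", "singleSsrc", "twcc", "nack",
    "usePushForKeyExchange", "useQRForKeyExchange", "tcpFallback",
    "noNativeRedirect", "forceWebQrip", "allocateon",
}

# Allowed tier letters, as listed in the table.
TIER_VALUES = {"aq": {"a", "h", "l"}, "vq": {"a", "h", "m", "l"}}


def parse_flags(url: str) -> dict[str, str]:
    """Collect known flags from the query string and the fragment.

    Assumption: a flag in the fragment overrides the same flag in the query.
    """
    parts = urlsplit(url)
    flags: dict[str, str] = {}
    for source in (parts.query, parts.fragment):
        for key, values in parse_qs(source, keep_blank_values=True).items():
            if key not in KNOWN_FLAGS:
                continue  # ignore anything outside the recovered flag list
            value = values[-1]
            if key in TIER_VALUES and value not in TIER_VALUES[key]:
                raise ValueError(f"invalid tier {value!r} for flag {key}")
            flags[key] = value
    return flags


# Placeholder URL on a reserved TLD; not a real join link.
flags = parse_flags("https://example.invalid/join?vq=l&twcc=0&aq=l#aq=h&unknown=1")
print(flags)
```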
17. Error spaces mapped the system better than prose
Enums are underrated. They are the places where engineers admit what can fail.
| Error/result family | Layer |
|---|---|
| FTResponseCode | General FaceTime response/error surface. |
| IDSSemanticErrorCode | Identity/query/routing semantics. |
| WebCourierCloseCode | Delivery-channel closure. |
| QRSessionAllocationStatusCode | Relay allocation. |
| RelayFailedErrorCode | Relay/media path failure. |
| TUCallDisconnectedReason | Call lifecycle/disconnect semantics. |
| EndReasonBadLink | Link/capability invalidity. |
This table became my debugging map. If a failure was in LetMeIn, I did not blame SFrame. If the relay returned MissingKey, I did not treat it like a bad URL. If the SFrame worker emitted maximumDecryptionTimeoutReached, I looked at MKM/prekey recovery rather than WebCourier connection state.
18. The final runtime model
By the end of the web pass, my model looked like this:
GET /join
-> CSP reveals allowed infra
-> load facetime2 bundle + worker chunks
-> parse flags/env/store-bag config
-> register/query IDS web surfaces
-> connect WebCourier anon/reconnect/token path
-> subscribe/ack push topics
-> receive FaceTime multi / Quick Relay messages
-> submit or track LetMeIn request
-> host decision transitions LetMeIn state
-> allocate Quick Relay participant
-> exchange/query material with QR + IDS paths
-> obtain/recover AVC blobs
-> publish media tiers
-> send sessionInfoRequest with SubscribedStream entries
-> run encoded frames through SFrame worker
-> recover MKM/prekeys when decryption fails or participants join late
That is the version I wish I had at the start. It keeps the boundaries clear. FaceTime Web is not only a WebRTC demo. It is a browser participant inside a native Apple calling system, using web-shaped versions of Apple identity, push delivery, relay signaling, media negotiation, and frame encryption.
What made the reversing satisfying was that the artifacts lined up across layers. The UI said “waiting.” The native side showed host authority. The schema said LetMeInState. The relay envelope said participantAllocateRequest and sessionInfoRequest. The media blob said stream groups. The worker said mkmNeedsRecovery.
At that point the system stopped looking magical. It looked like protocols.