Phase D.6.0-B findings: "quiescent state" was observation-window bias; CDS disconnects ~30 s after handshake

Update 2026-04-19: hypothesis H1 CONFIRMED

D.6.1-B tested H1 (zero UUID as session gate) by swapping the 16-byte placeholder for a fresh RFC 4122 v4 UUID per session. Result: CDS held the connection ESTABLISHED for the full 130 s observation window, with no EOF and only TCP keepalives on the wire. The "CDS disconnects at 23-36 s" behaviour documented below was entirely caused by the zero UUID. H2/H3/H4 no longer need testing. See ../iter-07-uuid-patched/findings.md for the full breakthrough decode and D.6.2's next axis (what CDS expects from us now that the handshake is accepted).

Status: verified — 2026-04-19

First D.6 research step. We deployed the D.6.0-A verbose-logging Go backend (commit 5d47ae5) on havoc, ran three interactive devicectl triggers meant to push CDS into a post-handshake RemoteXPC service exchange, and captured the full session. The core finding is not the post-handshake request we expected to see. It is that CDS does not sit silently after our handshake: it closes the TCP connection ~30 s after our #8 big Handshake lands, then retries with a fresh session (same bytes, same result). Every frame of every session decoded cleanly through internal/xpc/ and the D.6.0-A verbose logger.

TL;DR

The iter-4 / D.5 "quiescent state" finding needs to be reinterpreted: CDS's silence after our #8 big Handshake is not "happy and waiting" — it is "evaluating our Handshake, then deciding it's unusable, then disconnecting." In iter-1 through iter-5 we always pkill'd the listener within ~5-13 s of handshake completion, well before the natural CDS timeout at ~30 s. We never observed the timeout because we never waited for it.

D.6.0-B waited. Natural timeout observed three times in the same run:

session	handshake complete	CDS EOF	delta
1	02:13:51	02:14:27	36 s
2	02:14:27	02:14:59	32 s
3	02:14:59	02:15:22	23 s

The client-facing symptom in devicectl: com.apple.Mercury.error 1000 "The connection was interrupted".

None of our three triggers (device info, manage pair, or device info processes) succeeded in pushing CDS past the handshake, because each one starts by opening a fresh session that just runs through our byte-exact handshake again and then is torn down by CDS.

So D.6.0-B did not capture a "first post-handshake request" — there is no such request to capture until we fix whatever makes CDS reject the session. What it did capture is the failure mode that iter-1 through iter-5 could never surface under their short pkill windows.

What the sessions look like in verbose

Each of the three sessions emits the same 38-line verbose trace. Session 1 excerpt (identical to iter-5 smoke, PLUS the EOF line at the end that iter-5 never waited for):

[02:13:51] accepted: remote=[::1]:54675 session=1
[02:13:51] session=1 server handshake sent (SETTINGS + WINDOW_UPDATE)
[02:13:51] session=1 frame type=SETTINGS stream=0 len=12 flags=0x00
[02:13:51] session=1 sent SETTINGS-ACK
[02:13:51] session=1 frame type=WINDOW_UPDATE stream=0 len=4 flags=0x00
[02:13:51] session=1 frame type=HEADERS stream=1 len=0 flags=0x04
[02:13:51] session=1 dispatcher: emit #3 HEADERS(s1, len=0)
[02:13:51] session=1 frame type=SETTINGS stream=0 len=0 flags=0x01
[02:13:51] session=1 recv SETTINGS-ACK
[02:13:51] session=1 frame type=DATA stream=1 len=44 flags=0x00
[02:13:51] session=1 verbose recv   <empty dict>
[02:13:51] session=1 verbose send   <empty dict>
[02:13:51] session=1 dispatcher: emit #4 DATA(s1, 44 B empty-dict)
[02:13:51] session=1 frame type=HEADERS stream=3 len=0 flags=0x04
[02:13:51] session=1 dispatcher: emit #6 HEADERS(s3, len=0)
[02:13:51] session=1 frame type=DATA stream=1 len=24 flags=0x00
[02:13:51] session=1 verbose recv   [no payload]     # flags=0x00000201
[02:13:51] session=1 verbose send   [no payload]
[02:13:51] session=1 dispatcher: emit #5 DATA(s1, 24 B sync)
[02:13:51] session=1 frame type=DATA stream=3 len=24 flags=0x00
[02:13:51] session=1 verbose recv   [no payload]     # flags=0x00400001 INIT_HANDSHAKE
[02:13:51] session=1 verbose send   [no payload]
[02:13:51] session=1 dispatcher: emit #7 DATA(s3, 24 B INIT_HANDSHAKE mirror)
[02:13:51] session=1 verbose send DATA(s1, 14124 B): flags=0x00000101 msgid=2
[02:13:51] session=1 verbose send   MessageType => string "Handshake"
[02:13:51] session=1 verbose send   MessagingProtocolVersion => uint64 7
[02:13:51] session=1 verbose send   Services => <dict 62 entries>
[02:13:51] session=1 verbose send   Properties => <dict 46 entries>
[02:13:51] session=1 verbose send   UUID => <uuid 00000000000000000000000000000000>
[02:13:51] session=1 dispatcher: emit #8 DATA(s1, 14124 B big Handshake)
[02:14:27] session=1 read loop: peer closed (EOF)  ← 36 seconds of silence, then CDS gives up

Sessions 2 and 3 are byte-identical up to the dispatcher output (our Go backend is deterministic) and end with EOF after 32 s and 23 s respectively. The three retries are CDS's own client-side logic reconnecting after each Mercury error.

Why this reinterprets iter-4 and the D.5 smoke

iter-4's findings.md said:

CDS enters a quiescent state after receiving our 9-frame reply (#0–#8) and simply waits. [...] The TCP connection stays open indefinitely from CDS's side; iter-4 was ended by pkill.

And the D.5 smoke said:

Go framer did NOT return EOF naturally; exited via our pkill at t+13s (log line: read loop: local close). Same as iter-4 behavior (iter-4 listener was also killed manually).

Both are literally true (CDS did not close within the observation window) but interpretively incomplete (the observation window was too short to see CDS's decision). The "quiescent" language implied stable acceptance; the actual behaviour is delayed rejection.

This does not invalidate the iter-1 → iter-4 dispatch-table work. The HTTP/2 framing is correct, the XPC envelope is correct, and the frame sequence matches what a real iPhone emits in the Q2 pcap. The problem lies in the semantic content of the #8 Handshake (or in whatever CDS validates after #8). Our frame is structurally accepted by CDS's h2c + RemoteXPC parsers (no PROTOCOL_ERROR, no RST_STREAM, no GOAWAY); it is semantically rejected once CDS tries to use it for anything.
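The structural check described above can be illustrated with a minimal sketch that decodes the fixed 9-byte HTTP/2 frame header and flags the two frame types that carry an explicit rejection. PROTOCOL_ERROR is an error code, so on the wire it would arrive inside an RST_STREAM or GOAWAY frame. The names `frameHeader`, `parseFrameHeader` and `isRejection` are illustrative and are not taken from the iosmux backend; the output format only imitates the verbose log lines.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameHeader mirrors the fixed 9-byte HTTP/2 frame header.
type frameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8
	Flags    uint8
	StreamID uint32 // 31 bits; the reserved high bit is masked off
}

// typeNames covers the frame types defined by the HTTP/2 specification.
var typeNames = map[uint8]string{
	0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
	0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
	0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}

// parseFrameHeader decodes the first 9 bytes of b.
func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < 9 {
		return frameHeader{}, errors.New("short frame header")
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
	}, nil
}

// isRejection reports whether the frame is RST_STREAM or GOAWAY.
func (h frameHeader) isRejection() bool {
	return h.Type == 0x3 || h.Type == 0x7
}

// String renders the header in the style of the verbose log.
func (h frameHeader) String() string {
	name, ok := typeNames[h.Type]
	if !ok {
		name = fmt.Sprintf("0x%02x", h.Type)
	}
	return fmt.Sprintf("frame type=%s stream=%d len=%d flags=0x%02x",
		name, h.StreamID, h.Length, h.Flags)
}

func main() {
	// Header of a 44-byte DATA frame on stream 1, as seen in the trace.
	raw := []byte{0x00, 0x00, 0x2c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01}
	h, err := parseFrameHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(h, "rejection:", h.isRejection())
}
```

A session that ends in a bare EOF without `isRejection` ever returning true matches the "structurally accepted, semantically rejected" reading.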

Candidate hypotheses for the rejection

These are ranked by likelihood, NOT verified. Each is one test away from confirm-or-refute.

H1: the placeholder UUID

The top-level UUID field in our #8 big Handshake is 16 zero bytes — redacted at source in iphone_replay_bytes.py (see the REDACTED_AT_SOURCE constant). Real iPhones emit a session-bound UUID there. CDS may use this field as a session identifier and refuse to progress when it is the zero UUID.

Cheap to test: synthesize a valid random UUID on each session, patch it into the Handshake bytes before emit, re-run.
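A minimal sketch of that patch, under stated assumptions: the UUID's byte offset inside the Handshake fixture is passed in by the caller (it is not given here), and `newUUIDv4` and `patchUUID` are hypothetical helper names rather than functions from the backend. The generator sets the RFC 4122 version and variant bits on 16 random bytes.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// newUUIDv4 returns 16 random bytes with the RFC 4122 version (4)
// and variant (binary 10) bits set.
func newUUIDv4() ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	u[6] = u[6]&0x0f | 0x40 // version 4
	u[8] = u[8]&0x3f | 0x80 // RFC 4122 variant
	return u, nil
}

// patchUUID returns a copy of handshake with the 16 bytes at offset
// replaced by u. It refuses to overwrite anything other than the
// all-zero placeholder, which guards against a wrong offset.
func patchUUID(handshake []byte, offset int, u [16]byte) ([]byte, error) {
	if offset < 0 || offset+16 > len(handshake) {
		return nil, errors.New("offset out of range")
	}
	for _, b := range handshake[offset : offset+16] {
		if b != 0 {
			return nil, errors.New("bytes at offset are not the zero placeholder")
		}
	}
	out := append([]byte(nil), handshake...)
	copy(out[offset:], u[:])
	return out, nil
}

func main() {
	// Stand-in buffer and offset; the real fixture is the 14124-byte Handshake.
	fixture := make([]byte, 32)
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	patched, err := patchUUID(fixture, 8, u)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", patched[8:24])
}
```

Copying the fixture for each session leaves the embedded bytes untouched, so every new connection starts from the same placeholder and gets its own UUID.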

H2: stale or mismatched Services dict

Services advertises 62 services with specific port numbers. Real tunneld binds those ports on the real iPhone; CDS later tries to connect to some of them. Our backend does not bind any of them. If CDS post-handshake-validates by trying to connect to com.apple.lockdown.remote.trusted on the advertised port and getting ECONNREFUSED, it would disconnect.

Test: grep the pcap (iosmux-d6.pcap) for any post-handshake TCP SYN to a port in the Services dict. If yes, hypothesis is likely. If no, it's ruled out.
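The per-packet predicate behind that check could look like the sketch below. It assumes the raw TCP header has already been extracted from each pcap record (pcap and IPv6 parsing are out of scope here), and the port number used in `main` is a made-up stand-in, not a value from the Services dict. `synToAdvertised` is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// synToAdvertised inspects a raw TCP header and reports the destination
// port plus whether the segment is an initial SYN (SYN set, ACK clear)
// aimed at one of the advertised ports.
func synToAdvertised(tcp []byte, advertised map[uint16]bool) (uint16, bool) {
	if len(tcp) < 20 { // minimum TCP header size
		return 0, false
	}
	dst := binary.BigEndian.Uint16(tcp[2:4])
	flags := tcp[13]
	initialSYN := flags&0x02 != 0 && flags&0x10 == 0
	return dst, initialSYN && advertised[dst]
}

func main() {
	// Hypothetical advertised port; real values come from the Services dict.
	advertised := map[uint16]bool{49153: true}

	hdr := make([]byte, 20)
	binary.BigEndian.PutUint16(hdr[2:4], 49153) // destination port
	hdr[13] = 0x02                              // SYN only

	port, hit := synToAdvertised(hdr, advertised)
	fmt.Println(port, hit)
}
```

Excluding segments with ACK set keeps SYN-ACK replies from being counted as connection attempts.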

H3: missing heartbeat / keepalive

pymobiledevice3's XPC flag enum has PING (0x00000002) and there's a HEARTBEAT_REQUEST / HEARTBEAT_RESPONSE pair (0x00010000 / 0x00020000) documented but not used in the iter-0 pcap. CDS may expect periodic server-initiated PING frames after handshake, and the ~30 s timeout is CDS's own keepalive window expiring.

Test: patch the dispatcher to emit a PING frame every 10 s on stream 0 after #8, see if that extends the window.
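One way to sketch that test, reading "PING frame on stream 0" as an HTTP/2 PING (frame type 0x6) rather than a RemoteXPC message with the PING flag; that reading is an assumption. `pingFrame` and `pingLoop` are hypothetical names, and a real dispatcher would also need to serialize these writes with its other output and tolerate the PING ACK frames the peer sends back.

```go
package main

import (
	"bytes"
	"context"
	"encoding/binary"
	"fmt"
	"io"
	"time"
)

// pingFrame builds an HTTP/2 PING frame: a 9-byte header (length 8,
// type 0x6, no flags, stream 0) followed by 8 opaque bytes.
func pingFrame(seq uint64) []byte {
	f := make([]byte, 17)
	f[2] = 8   // low byte of the 24-bit length
	f[3] = 0x6 // PING
	binary.BigEndian.PutUint64(f[9:], seq)
	return f
}

// pingLoop writes one PING per interval until ctx is cancelled or a
// write fails.
func pingLoop(ctx context.Context, w io.Writer, interval time.Duration) error {
	t := time.NewTicker(interval)
	defer t.Stop()
	for seq := uint64(1); ; seq++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
			if _, err := w.Write(pingFrame(seq)); err != nil {
				return err
			}
		}
	}
}

func main() {
	// Short interval and an in-memory writer so the sketch finishes quickly;
	// the experiment itself would use 10*time.Second and the session's conn.
	var buf bytes.Buffer
	ctx, cancel := context.WithTimeout(context.Background(), 55*time.Millisecond)
	defer cancel()
	_ = pingLoop(ctx, &buf, 10*time.Millisecond)
	fmt.Printf("wrote %d bytes\n", buf.Len())
}
```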

H4: the tunneld UDID mismatch

Tunneld advertises its UDID in the JSON response CDS reads before connecting. Our #8 Handshake has a zeroed UniqueDeviceID inside the dict. CDS may be cross-validating and disconnecting on mismatch.

Test: patch the Handshake bytes so UniqueDeviceID matches the tunneld-advertised UDID. Same shape of change as H1 (in-place byte patch on the embedded fixture).
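A generic in-place patch helper is sketched below. It insists on an equal-length replacement so that every other offset and length field in the fixture stays valid, and on exactly one match so the wrong bytes are never overwritten. The placeholder and replacement strings in `main` are invented for illustration (the real encoding of the zeroed UniqueDeviceID is not shown here), and `patchInPlace` is a hypothetical name.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// patchInPlace overwrites the single occurrence of placeholder in buf
// with value. Both must have the same length, so the buffer size and
// all surrounding offsets are unchanged.
func patchInPlace(buf, placeholder, value []byte) error {
	if len(placeholder) == 0 || len(placeholder) != len(value) {
		return errors.New("placeholder and value must be non-empty and equal in length")
	}
	switch n := bytes.Count(buf, placeholder); {
	case n == 0:
		return errors.New("placeholder not found")
	case n > 1:
		return fmt.Errorf("placeholder is ambiguous: %d matches", n)
	}
	i := bytes.Index(buf, placeholder)
	copy(buf[i:], value)
	return nil
}

func main() {
	// Invented stand-ins for the fixture, the zeroed field and the real UDID.
	fixture := []byte("..UniqueDeviceID=0000000000..")
	err := patchInPlace(fixture, []byte("0000000000"), []byte("ABCDEF0123"))
	fmt.Println(string(fixture), err)
}
```

If the placeholder is a short run of zero bytes it is likely to match more than once, in which case the ambiguity error signals that a fixed offset is the safer approach.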

Implications

Iter-1 through iter-4's Q3 "closed under replay" conclusion needs a footnote, not a retraction. The replay-layer investigation IS closed — we proved the HTTP/2 + XPC framing is correct. But "reached quiescent" should have said "reached a semantically-invalid handshake that CDS will discard after 30 s". We update iter-04 and iter-05-go-backend findings with that clarification.

Phase D.6.1 does NOT start with a handler, because there is no incoming request to handle. It starts with fixing the handshake so CDS does not disconnect. The four hypotheses above are the first-level experiment candidates.

Phase D's original theory, that our Go backend replies with real data sourced from a real iPhone-adjacent capture, remains correct. The test we ran today used the redacted iter-01 capture as the reply source. Redaction at source invalidates some of those fields (UUID zeroed, UDID zeroed, etc.). The backend needs a non-redacted source for session-bound fields. That source is one of:

The tunneld JSON response (authoritative for UDID)
A fresh random value per session (legitimate for UUID, which is session-scoped by protocol definition)
The actual services that an iosmux-backend-owned tunneld replacement knows how to speak (future D.7 scope)

What's open after this

Confirm which of H1 / H2 / H3 / H4 is load-bearing (one dispatch-bisect iteration per hypothesis, same cadence as iter-1 through iter-4)
Once the 30 s timeout goes away, THEN the original D.6.0 goal becomes tractable — observe what CDS sends when it believes our handshake

Artifacts

Under iter-06-pair-trigger/:

verbose-session.log — 8,936 B, 116 lines total across 3 sessions. All handshake-only; no post-handshake frames. Pre-scan confirmed zero identifiers leaked; no redaction applied (file stored as produced).
iosmux-d6.pcap — 56,099 B loopback capture across the three sessions. Scanned for UDID/GUID/hostname patterns: 0 matches. Safe to commit as-is per iter-02/03/04 policy.

How to reproduce

On havoc (assumes the iter-4-era SPIKE tunneld is still running):

# Cross-compile + deploy (with verbose)
GOOS=darwin GOARCH=amd64 CGO_ENABLED=0 go build \
    -o /tmp/iosmux-backend-darwin ./cmd/iosmux-backend/
scp /tmp/iosmux-backend-darwin havoc:/tmp/iosmux-backend
ssh havoc 'chmod +x /tmp/iosmux-backend'

# Run verbose backend
ssh havoc 'nohup env IOSMUX_BACKEND_VERBOSE=1 \
    /tmp/iosmux-backend -listen "[::1]:34719" \
    > /tmp/iosmux-backend.log 2>&1 &'

# Trigger CDS
ssh havoc-root 'killall CoreDeviceService'
ssh havoc 'devicectl device info details --device <UDID>'

# WAIT: this is the critical step that iter-1 through iter-5 missed.
# Do not pkill the backend for at least 60 seconds.
sleep 60

# Then collect
ssh havoc 'pkill -f iosmux-backend'
scp havoc:/tmp/iosmux-backend.log /local/path/

The session log should contain at least one peer closed (EOF) entry with a delta of 20-40 s from the corresponding accepted: line. That's the reproducible timeout signature.
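Checking for that signature can be automated with a small scanner, sketched here. It assumes the log line shapes shown in the session 1 trace (`[HH:MM:SS] accepted: ... session=N` and `[HH:MM:SS] session=N read loop: peer closed (EOF)`) and that a run does not cross midnight; `eofDeltas` is a hypothetical helper, not part of the backend.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
	"time"
)

var (
	acceptedRe = regexp.MustCompile(`^\[(\d{2}:\d{2}:\d{2})\] accepted: .*session=(\d+)`)
	eofRe      = regexp.MustCompile(`^\[(\d{2}:\d{2}:\d{2})\] session=(\d+) read loop: peer closed \(EOF\)`)
)

// eofDeltas maps each session id to the seconds between its
// "accepted:" line and its "peer closed (EOF)" line.
func eofDeltas(log string) (map[string]int, error) {
	const layout = "15:04:05"
	accepted := map[string]time.Time{}
	deltas := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if m := acceptedRe.FindStringSubmatch(line); m != nil {
			t, err := time.Parse(layout, m[1])
			if err != nil {
				return nil, err
			}
			accepted[m[2]] = t
		} else if m := eofRe.FindStringSubmatch(line); m != nil {
			t, err := time.Parse(layout, m[1])
			if err != nil {
				return nil, err
			}
			if start, ok := accepted[m[2]]; ok {
				deltas[m[2]] = int(t.Sub(start).Seconds())
			}
		}
	}
	return deltas, sc.Err()
}

func main() {
	// First and last lines of the session 1 trace.
	sample := "[02:13:51] accepted: remote=[::1]:54675 session=1\n" +
		"[02:14:27] session=1 read loop: peer closed (EOF)\n"
	deltas, err := eofDeltas(sample)
	if err != nil {
		panic(err)
	}
	for id, d := range deltas {
		fmt.Printf("session %s: EOF after %d s (in 20-40 s window: %v)\n",
			id, d, d >= 20 && d <= 40)
	}
}
```

A session with no entry in the returned map never reached EOF within the capture, which is what a fixed handshake would be expected to produce.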