Stage S2 — Honest pair flow¶
Status: reviewed — forward plan, active
Phases A, B, and C have landed. Phase C's self-research queue (Q1-Q5) still has open items. Phase D (Go-only Option δ backend) is specified but not yet implemented.
Date: session 10 (post S2.B landing)
Supersedes: the original Option α/γ framing of this document, which Phase B empirically falsified.
Philosophy: CDS must be told the truth about device state, and the code it runs to make things happen must either actually work, or be actively shimmed through a backend we know works. No bypass, no lies.
The Phase A realization (unchanged, landed)¶
Stage S1 finished with devicectl list devices returning the virtual
iPhone end-to-end. Opening Xcode's Devices window revealed that four
pieces of DeviceInfo had been force-written by the inject to make
CDS believe the device was further along its lifecycle than it was:
- DeviceInfo.pairingState = .paired
- DeviceInfo.preparednessState = .all
- DeviceInfo.areDeveloperDiskImageServicesAvailable = true
- DeviceInfo.state = .connected
Phase A removed the first three outright. The fourth (state) was
temporarily removed and then restored when empirical test showed
Xcode filters devices whose DeviceInfo.state defaults to
.unavailable (tag 0) — the link physically exists because an
RSDDeviceWrapper with a live tunnel is in process memory, so
.connected is a truthful link-state value, not a lie about
pairing. See inject/iosmux_inject.m around the visibility_setter
/ state_setter block and the tombstone comment there.
Phase A landed in commit 18331ca as "Stage 2.A: stop lying about
device state".
The Phase B realization (new, course-correcting)¶
With the three lies removed, the Pair-button smoke test was run.
Findings are fully documented in docs/research/s2b-pair-attempt-log.md;
the short version is:
- Xcode correctly shows a Pair button, state=connected (no DDI), pairingState=pairingInProgress: all the honest pre-pair values.
- Clicking Pair causes CDS to invoke PairActionImplementation through our S1.B passthrough trampoline. No crash, no SIGSEGV.
- CDS's next step is to open an nw_connection on utun4 directly to [fdc5:8480:2949::1]:57346 (the RSD endpoint address it fetched from tunneld's HTTP API) and speak HTTP/2 on it.
- The connection is immediately reset by the peer. Retries fail identically. The com.apple.remotepairing subsystem never emits a single event, because CDS never gets past the TCP-level failure.
- The tunnel itself is not broken. A pymobiledevice3 Python client using the same tunneld instance and the same RSD endpoint successfully pulls the full handshake, queries DDI mount state, and lists the iPhone's root filesystem via the DVT developer service. User-confirmed: DDI mount, service calls, and filesystem access have all worked repeatedly via pymobiledevice3 in previous sessions.
The pre-Phase-B assumption (Option α: "let the natural CDS pair path
work through the existing tunnel") is now empirically false. CDS and
pymobiledevice3 are two independent client implementations of the
RSD-family transport, and CDS cannot speak to a channel produced by
pymobiledevice3 tunneld. Option γ ("audit our existing DYLD_INTERPOSE
so the pair service is forwarded correctly") is also falsified: the
pair path does not flow through remote_service_create_connected_socket
or any other MobileDevice/RSD C API at all — it goes straight through
Network.framework (nw_connection), bypassing everything we
currently interpose.
Option β (transplant Linux host's pair record) remains rejected. The problem is not pairing, the problem is that CDS does not speak the tunnel's transport; handing it a pair record would not give it a different transport layer.
Option δ — CDS → pymobiledevice3 backend shim¶
This is the new direction. The short statement is:
Stop trying to make CDS talk to the iPhone. Make CDS talk to pymobiledevice3, and let pymobiledevice3 talk to the iPhone.
The precedent already exists in the current inject:
iosmux_md_proxy.m interposes MDRemoteServiceSupport.retrievePropertiesForUUID: and retrieveNameForUUID: and answers them from an in-memory cache built from the pymobiledevice3 handshake. CDS never actually queries the device for properties; the inject serves them from Python-captured data. This works. No crash, no lie, just a translation layer.
Option δ generalizes that pattern to every CDS → iPhone call. When CDS tries to open a service socket, do a pair handshake, mount a DDI, install an app, or attach a debugger, the request is caught by our inject and routed to a local pymobiledevice3 session that does the real work on its existing working tunnel. The return value (socket fd, response bytes, Codable envelope, whatever) is handed back to CDS in the shape CDS expects.
This reframes Stage 2 from "observe the real pair flow" to "build the smallest shim that makes CDS's attempts succeed by delegating them". It is more work than α/γ would have been, but α/γ were physically impossible given the transport mismatch — this is the first plan that is actually compatible with the ground truth.
Phase B — done¶
Deliverable was docs/research/s2b-pair-attempt-log.md. Landed
alongside the earlier rewrite of this plan. Identified three
blockers — X (client-side protocol), Y (which PID runs the call),
Z (sandbox SCM_RIGHTS capability) — that Phase C was tasked with
resolving.
Phase C — self-experiment (done) + remaining research queue¶
What the self-experiment closed¶
Full write-up in
docs/research/s2c-self-experiment/FINDINGS.md.
Short version:
- X (client side): CDS speaks plain prior-knowledge HTTP/2 cleartext (h2c) to the tunnel endpoint. No TLS, no ALPN. The 24-byte HTTP/2 connection preface arrives first, then a stock SETTINGS frame whose values (MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=1 MiB) match pymobiledevice3's own reverse-engineered observations byte for byte. Above HTTP/2, CDS uses RemoteXPC DATA frames on stream 1 (ROOT_CHANNEL) and stream 3 (REPLY_CHANNEL) with XpcWrapper / XpcPayload framing whose magic bytes (0x290bb092, 0x42133742) match pymobiledevice3's remote/xpc_message.py exactly. The empty-HEADERS stream-open pattern CDS uses is unusual but compatible with golang.org/x/net/http2 at the framer level.
- Y: every nw_connection_* call observed in the experiment came from the same CoreDeviceService PID our LC_LOAD_DYLIB inject loaded into. No helper process, no delegation to remotepairingd. The earlier "three CoreDeviceService PIDs" observation turned out to be concurrent launchd-managed instances of the same binary (user session / system session / log-stream-ephemeral), and the one handling each request is whichever the caller's XPC connection resolves to. Our current injection reaches it.
- Z: CDS's sandbox permits, at runtime, each of the four Shape B primitives we probed: socket(AF_UNIX, SOCK_STREAM), socketpair(), bind() to /tmp/*.sock, and sendmsg() with an SCM_RIGHTS ancillary block. All returned errno=0. The devil's advocate attack ranked "sandbox denies SCM_RIGHTS" as a 55% chance of killing Shape B; it is now 0%, empirically verified.
Shape B is therefore architecturally viable. The hook site exists in the right process, the sandbox allows the required primitives, and the client-side transport is standard enough for a Go HTTP/2 server to speak.
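The client-side observations under X can be illustrated with a small, standard-library-only Go sketch: it checks for the 24-byte HTTP/2 client preface, decodes a 9-byte frame header, and names the two streams the way this document does. It is an illustration only, not the planned backend code (which is meant to use the golang.org/x/net/http2 framer); the function names StripPreface, ParseFrameHeader, and ChannelName are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// clientPreface is the fixed 24-byte HTTP/2 connection preface (RFC 9113, section 3.4).
const clientPreface = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

// HTTP/2 frame type codes used in this sketch (RFC 9113, section 6).
const (
	frameData     = 0x0
	frameHeaders  = 0x1
	frameSettings = 0x4
)

// FrameHeader is the fixed 9-byte header that precedes every HTTP/2 frame.
type FrameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8
	Flags    uint8
	StreamID uint32 // 31-bit stream identifier
}

// StripPreface verifies that buf starts with the client preface and returns the remainder.
func StripPreface(buf []byte) ([]byte, error) {
	if !bytes.HasPrefix(buf, []byte(clientPreface)) {
		return nil, errors.New("missing HTTP/2 client preface")
	}
	return buf[len(clientPreface):], nil
}

// ParseFrameHeader decodes one 9-byte frame header from the start of b.
func ParseFrameHeader(b []byte) (FrameHeader, error) {
	if len(b) < 9 {
		return FrameHeader{}, errors.New("short frame header")
	}
	return FrameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff, // drop the reserved bit
	}, nil
}

// ChannelName maps a stream ID to the RemoteXPC channel names used in this document.
func ChannelName(streamID uint32) string {
	switch streamID {
	case 1:
		return "ROOT_CHANNEL"
	case 3:
		return "REPLY_CHANNEL"
	default:
		return "UNKNOWN"
	}
}

func main() {
	// Synthetic input (not a capture): preface followed by an empty SETTINGS frame header.
	wire := append([]byte(clientPreface), 0, 0, 0, frameSettings, 0, 0, 0, 0, 0)
	rest, err := StripPreface(wire)
	if err != nil {
		panic(err)
	}
	h, err := ParseFrameHeader(rest)
	if err != nil {
		panic(err)
	}
	fmt.Printf("type=%d len=%d stream=%d\n", h.Type, h.Length, h.StreamID)
}
```

Feeding the spike listener's raw session logs through a parser like this would be one way to turn them into regression fixtures.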
Still open — research queue Q1-Q5¶
The self-experiment does NOT tell us what the server side of
the conversation should look like — our listener never replies to
CDS's first DATA frame, so the session stalls at that point. Five
research questions (Q1..Q5) are tracked in a dedicated standalone
file: plan-stage2-phase-c-queue.md.
All five are self-research tasks on havoc (no friend, no guessing).
Short summary of what each question covers:
- Q1: decode the captured 44-byte XPC payload via pymobiledevice3's own XpcWrapper / XpcPayload parser to classify the first RemoteXPC message CDS sends.
- Q2: tcpdump on utun4 during a working pymobiledevice3 remote rsd-info round-trip to see whether the tunnel interface carries plaintext HTTP/2 or TLS-encrypted bytes.
- Q3: incremental dialog bisection via the spike listener: reply to CDS's first DATA frame with a sourced response, observe the next request, repeat until the full pair handshake is walked or a named empirical gap is hit.
- Q4: grep pymobiledevice3 for any server-side / fixture / mock code we can reuse instead of hand-crafting Q3 replies.
- Q5: last-resort post-TLS byte capture of a live pymobiledevice3 session if Q2 shows utun4 is encrypted (SSLKEYLOGFILE + Wireshark, or a recv() hook as a pymobiledevice3 patch under the existing overlay).
The queue file contains the full empirical methodology, done criteria, and decision tree for each question. Any answer that cannot be reached empirically is labelled UNKNOWN, not guessed. Only after all five questions are exhausted and still leave a gap does the deferred friend-capture fallback re-enter consideration.
Still open — carried over from earlier¶
C.7 is backlog-priority until we see Phase D behavior
We assume the DeviceIdentifier.uuid(UUID, String) second slot
does not affect pair-flow correctness, only identity-UI
presentation. This is unverified — the assumption would be
promoted to verified by either a disasm note in Phase D's
CoreDevice work or by an empirical test that toggles the string
and observes Xcode's reaction. Until then treat any Phase D bug
that names this slot as potentially related.
C.7 DeviceIdentifier.uuid(UUID, String) second slot — what
does CDS do with the String payload? This is a small identity-UI
question that only matters once the pair flow otherwise works. It
stays in the backlog and gets answered by the same disasm work
that will show up in Phase D if we discover a related behavior
while implementing the backend.
Closed by Phases B + C¶
- "iOS 17+ pair service endpoint: find name and port" → the endpoint is the unified RSD endpoint [tunnel-address]:tunnel-port, not a per-service port.
- "does CDS reach the pair service via remote_service_create_connected_socket" → no, it goes through nw_connection directly.
- "what does default unpaired DeviceInfo look like on a real iPhone" → answered empirically: removing our three lies produces Xcode's "unpaired, connected, no DDI, Pair button visible" UI, which is exactly a fresh iPhone's baseline.
- "does the physical iPhone accept pair from a new host while already paired with Linux" → moot; Option δ routes through the existing pair record that pymobiledevice3 tunneld already holds.
- "CDS's own service-open call chain for PairAction" → resolved empirically by the wire logger. The path is PairActionImplementation.invoke → Swift async continuation → RemotePairing.framework → Network.framework nw_connection_create(_with_connected_socket). All in-process in the same CDS instance.
- "CDS's expected client-side protocol" → plain prior-knowledge h2c + RemoteXPC DATA frames, per FINDINGS.md §X.
- "Best interposition point" → the spike override in pymobiledevice3 tunneld is already sufficient to redirect CDS without an inject hook on nw_connection_create_* at all, and the Go backend will own the redirected listener endpoint directly. See Phase D for the concrete shape.
- "Backend architecture: embedded Python vs sibling daemon vs Go" → Go, not Python, as a first-class principle. Rationale in Phase D.
- "Surface area of pair flow in pymobiledevice3" → known: the RemoteXPC message types we need to handle are the ones CDS actually sends on streams 1 and 3 of its h2c session. Q1-Q3 will enumerate them one by one.
- "pymobiledevice3 pair record storage and survival" → already answered in an earlier research pass: Ed25519 pair records live in ~/.pymobiledevice3/remote_{udid}.plist, survive tunneld restarts, and trigger the physical Trust prompt only on the very first pair. The Go backend does not touch pair records directly; the existing pymobiledevice3 tunneld subprocess keeps owning that state.
Phase D — implementation (shape locked by Phase C findings)¶
Phase D delivers the Option δ shim. Its shape is now fixed by what the self-experiment found.
Principle: new code is Go, Python stays where it already is¶
All new production code for Phase D is Go. The project has been Go-based from the start; introducing a second production language on a component we plan to delete in Phase E (the "remove Python" milestone) would create exactly the inertia we want to avoid. Python involvement is strictly limited to:
- pymobiledevice3 remote tunneld: the existing subprocess we already run on havoc. It keeps owning the TLS 1.2 PSK handshake to the real iPhone, pair identity management, and the utun tunnel lifecycle. Go talks to it over HTTP on loopback, exactly as it does today. We do not ship any NEW Python production code.
- Spike/research tooling in docs/research/s2c-self-experiment/ (the Python HTTP/2 listener, one-shot capture scripts). These are research artifacts, never run in production, and are deleted with the rest of Stage S2's spike infrastructure when the Go backend ships.
If a future step genuinely needs server-side XPC logic that is
only implemented in pymobiledevice3 today, the contained fallback
is a narrow RPC subprocess (single pymobiledevice3-backed
Python process driven by Go over a Unix socket with a few RPC
methods) — but only as a contained compatibility shim with an
explicit Go porting plan per method. We do not write a full
Python backend and call it "temporary".
Components¶
The Go backend is a new command under
cmd/iosmux-backend/ in this repo. It is a single long-lived
process, launched from iosmux-restore.sh (or a LaunchAgent on
havoc once we productize), that:
- Listens on a loopback TCP endpoint, the one tunneld advertises via its HTTP API response. During spike/research this is driven by the IOSMUX_SPIKE env-var patch (docs/patches/pymobiledevice3/0001-iosmux-spike-tunneld-endpoint-override.patch); in production the same mechanism can be generalized to an always-on IOSMUX_BACKEND_ADDR override, or tunneld can be replaced altogether by a Go component that owns the HTTP API. Either way, CDS learns the shim's address through the same GET / mechanism it already uses.
- Speaks prior-knowledge h2c using golang.org/x/net/http2. Not net/http.Server (that wants a full request/response model) but the lower-level Framer API, because RemoteXPC uses persistent streams and DATA frames rather than HTTP semantics. Example prior art: Caddy's h2c handler and gRPC-Go's h2c mode, both in production use.
- Implements RemoteXPC on top of the raw frame layer:
  - XpcWrapper / XpcPayload encode+decode (magic bytes, flags, size, msg_id), ported to Go from pymobiledevice3/remote/xpc_message.py with the decoded bytes in docs/research/s2c-self-experiment/FINDINGS.md as a ground-truth test vector.
  - bplist16 decoding for XPC object payloads. howett.net/plist (already in go.mod) handles bplist00 and may handle bplist16 too; if not, the missing bits are a small extension.
  - Stream routing: messages on stream 1 = ROOT_CHANNEL, stream 3 = REPLY_CHANNEL, one open per channel, handled as two persistent async loops. Matches the empirically observed stream topology from the S2.C capture.
- Delegates the actual device work to its internal session, which is held open against tunneld's HTTP API for the lifetime of the backend. Every CDS request is translated into either:
  - A native Go implementation of the equivalent RemoteXPC call (for calls we fully understand), or
  - A pymobiledevice3-backed fallback over a narrow subprocess RPC (for calls that are only implemented in pymobiledevice3 today, pending a Go port per call).
- Materializes replies as XpcWrapper-framed DATA frames written back to CDS on the correct stream. Every reply must come from real data (Phase A's "never lie about state" rule stays binding here).
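As a rough illustration of the XpcWrapper item above, here is a minimal Go sketch of a header codec. The layout is an assumption, not something this document confirms: four little-endian fields in the order magic (u32), flags (u32), size (u64), msg_id (u64). Both the field widths and the byte order must be checked against pymobiledevice3/remote/xpc_message.py and the FINDINGS.md hex blob before any of this is trusted. The expected magic is passed in by the caller rather than hardcoded, and all type and function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wrapperHeaderLen ASSUMES the layout u32 magic, u32 flags, u64 size, u64 msg_id.
const wrapperHeaderLen = 24

// WrapperHeader holds the four fields this document lists for XpcWrapper.
type WrapperHeader struct {
	Magic uint32
	Flags uint32
	Size  uint64 // exact meaning (what it counts) is unverified here
	MsgID uint64
}

// ParseWrapperHeader decodes the assumed little-endian header and checks the magic
// against wantMagic, which the caller takes from a verified source.
func ParseWrapperHeader(b []byte, wantMagic uint32) (WrapperHeader, error) {
	if len(b) < wrapperHeaderLen {
		return WrapperHeader{}, errors.New("short XpcWrapper header")
	}
	h := WrapperHeader{
		Magic: binary.LittleEndian.Uint32(b[0:4]),
		Flags: binary.LittleEndian.Uint32(b[4:8]),
		Size:  binary.LittleEndian.Uint64(b[8:16]),
		MsgID: binary.LittleEndian.Uint64(b[16:24]),
	}
	if h.Magic != wantMagic {
		return h, fmt.Errorf("bad wrapper magic 0x%08x, want 0x%08x", h.Magic, wantMagic)
	}
	return h, nil
}

// EncodeWrapperHeader is the inverse of ParseWrapperHeader under the same assumptions.
func EncodeWrapperHeader(h WrapperHeader) []byte {
	b := make([]byte, wrapperHeaderLen)
	binary.LittleEndian.PutUint32(b[0:4], h.Magic)
	binary.LittleEndian.PutUint32(b[4:8], h.Flags)
	binary.LittleEndian.PutUint64(b[8:16], h.Size)
	binary.LittleEndian.PutUint64(b[16:24], h.MsgID)
	return b
}

func main() {
	// Placeholder values for a round trip; none of these come from a real capture.
	const placeholderMagic = 0x11223344
	in := WrapperHeader{Magic: placeholderMagic, Flags: 0x1, Size: 20, MsgID: 7}
	out, err := ParseWrapperHeader(EncodeWrapperHeader(in), placeholderMagic)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", out)
}
```

A round-trip property like this is only a sanity check; the real acceptance test remains decoding the captured bytes to the same object graph pymobiledevice3's parser produces.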
Concrete sub-tasks¶
- D.0: Checkpoint current work (this commit).
- D.1: Finish the Phase C research queue (Q1-Q5 in the section above) until either the server-side protocol is fully understood or a concrete empirical gap is named.
- D.2: Bootstrap cmd/iosmux-backend/ with a skeleton that binds a configurable address, logs every accepted connection, and shuts down cleanly on SIGTERM. No HTTP/2 yet. Verify it fits the existing go build ./... flow.
- D.3: Wire the Go HTTP/2 framer to the listener. Replicate the empirically observed server handshake (SETTINGS{MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=1 MiB}, WINDOW_UPDATE(983041), auto-SETTINGS-ACK). Reuse the existing spike listener's session logs as regression fixtures.
- D.4: Port XpcWrapper / XpcPayload / bplist16 decoders to Go. The hex blob in FINDINGS.md §X is a unit test vector: the decoder either produces the same object graph pymobiledevice3's parser produces (Q1 decode) or the decoder is wrong.
- D.5: Implement the ROOT_CHANNEL state machine: accept the first XPC request, dispatch to a stubbed handler, emit a reply, watch CDS proceed to the next step. Dialog-bisect forward using the Q3 methodology.
- D.6: Per-service handlers (pair, DDI mount, app install, etc.) land one at a time, each gated on real empirical data from either Q2/Q5 captures or dialog bisection.
- D.7: Remove the IOSMUX_SPIKE tunneld patch once the Go backend owns the address CDS is told to connect to; replace with either a generalized tunneld override or a tunneld replacement owned by iosmux itself.
Non-goals (still binding):
- Faking any reply out of thin air. Every byte the backend sends on the wire must either come from a real data source or be a well-defined framing artifact (header magic, length field, etc.). If a request arrives for a shape we do not know, we halt and research it — we do not invent a reply.
- Setting any DeviceInfo field to a value we have not observed from a real source.
- Hooking any Swift action-dispatch path beyond the existing S1.B passthrough trampoline. The S2.C findings show the inject hook is unnecessary for redirecting the transport (the tunneld address override does that alone), so the inject stays minimal.
Phase E — post-pair (DDI, developer mode, install, debug)¶
Same shape as the old Phase E, except all of it flows through the same Option δ shim. DDI mount, developer-mode query, app install, debug attach — each is its own set of pymobiledevice3 calls exposed to the backend, each is its own pair of CDS-interposition points. We may find additional missing services to proxy as we walk the flow.
Risks and unknowns we are accepting¶
- Server-side protocol gap. The S2.C capture gives us every byte CDS sends but no byte CDS expects back. Until Q1-Q5 close this gap, we do not know if dialog bisection can walk the entire pair handshake without hitting a request for which neither we nor pymobiledevice3 have a reference. If it does hit one, the fallback is the deferred friend-capture track — but that is the only concession.
- Go http2 framer edge cases. CDS opens streams with HEADERS len=0, which is a minor RFC deviation. The net/http server rejects this; the lower-level Framer API accepts it. Phase D.3 must drive the framer directly, not go through http2.Server{}.ServeConn with a standard Handler. If some other frame-level quirk we have not yet seen exists, the spike listener's capture-and-halt methodology will surface it; no production code runs on unverified assumptions.
- Backend lifecycle. The Go backend must survive CDS restarts, respond quickly enough that CDS does not time out, and not leak tunnel state across device reconnects. The simplest answer: one long-lived backend process launched at the same time as tunneld, per havoc session, managed by the same iosmux-restore.sh flow that already owns tunneld's lifecycle.
- bplist16 coverage in howett.net/plist. If the existing Go plist library does not handle bplist16 (the XPC variant), D.4 gains a "port bplist16 from pymobiledevice3" sub-task. Bounded, roughly 200-400 lines of Go at most.
- pymobiledevice3 version drift. We pin upstream via docs/patches/pymobiledevice3/UPSTREAM_VERSION and treat any upgrade as a deliberate research task. Patches under docs/patches/pymobiledevice3/ are all research-scoped; none of them define production behavior.
Open questions that do not need research but need decisions¶
- Do we keep tunneld as an independent long-running Python subprocess and make the Go backend talk to it over its HTTP API, or do we eventually bring the tunneld state machine into Go alongside the backend so there is one less moving part? The Python version stays for Stage S2; Go replacement is a separate track.
- Where does the Go backend's logging go, and who watches it? A file under /tmp/iosmux-backend.log with log/slog-structured records is probably enough for the spike runs; productization adds proper rotation later.
- At what point do we freeze "Stage 2 is done"? Pair working is the minimum bar; DDI + developer mode + install + debug are incremental over the same backend and can slip into Stage 3 if the pair plumbing takes longer than expected.
What comes AFTER this roadmap¶
After Phase E works end-to-end:
- Promote from PoC to proper config (remove hardcoded UUIDs, service names, paths).
- Productize iosmux-backend into a daemon that ships with the project.
- Write user-facing docs.
- Binary releases, supported OS matrix, upgrade safety across macOS / iOS versions.
None of that matters until the backend shim proves it can drive the pair flow honestly end to end.