Stage S2 — Honest pair flow

Status: reviewed — forward plan, active

Phases A, B, and C have landed. Phase C's self-research queue (Q1-Q5) still has open items. Phase D (Go-only Option δ backend) is specified but not yet implemented.

Date: session 10 (post S2.B landing)

Supersedes: the original Option α/γ framing of this document, which Phase B empirically falsified.

Philosophy: CDS must be told the truth about device state, and the code it runs to make things happen must either actually work, or be actively shimmed through a backend we know works. No bypass, no lies.

The Phase A realization (unchanged, landed)

Stage S1 finished with devicectl list devices returning the virtual iPhone end-to-end. Opening Xcode's Devices window revealed that four pieces of DeviceInfo had been force-written by the inject to make CDS believe the device was further along its lifecycle than it was:

  • DeviceInfo.pairingState = .paired
  • DeviceInfo.preparednessState = .all
  • DeviceInfo.areDeveloperDiskImageServicesAvailable = true
  • DeviceInfo.state = .connected

Phase A removed the first three outright. The fourth (state) was temporarily removed, then restored when an empirical test showed that Xcode filters out devices whose DeviceInfo.state defaults to .unavailable (tag 0). The link physically exists, because an RSDDeviceWrapper with a live tunnel is in process memory, so .connected is a truthful link-state value and not a lie about pairing. See inject/iosmux_inject.m around the visibility_setter / state_setter block and the tombstone comment there.

Phase A landed in commit 18331ca as "Stage 2.A: stop lying about device state".

The Phase B realization (new, course-correcting)

With the three lies removed, the Pair-button smoke test was run. Findings are fully documented in docs/research/s2b-pair-attempt-log.md; the short version is:

  • Xcode correctly shows a Pair button, state=connected (no DDI), pairingState=pairingInProgress — all the honest pre-pair values.
  • Clicking Pair causes CDS to invoke PairActionImplementation through our S1.B passthrough trampoline. No crash, no SIGSEGV.
  • CDS's next step is to open an nw_connection on utun4 directly to [fdc5:8480:2949::1]:57346 (the RSD endpoint address it fetched from tunneld's HTTP / API) and speak HTTP/2 on it.
  • The connection is immediately reset by the peer, and every retry fails the same way. The com.apple.remotepairing subsystem never emits a single event, because CDS never gets past the TCP-level failure.
  • The tunnel itself is not broken. A pymobiledevice3 Python client using the same tunneld instance and the same RSD endpoint successfully pulls the full handshake, queries DDI mount state, and lists the iPhone's root filesystem via the DVT developer service. User-confirmed: DDI mount, service calls, and filesystem access have all worked repeatedly via pymobiledevice3 in previous sessions.

The pre-Phase-B assumption (Option α: "let the natural CDS pair path work through the existing tunnel") is now empirically false. CDS and pymobiledevice3 are two independent client implementations of the RSD-family transport, and CDS cannot speak to a channel produced by pymobiledevice3 tunneld. Option γ ("audit our existing DYLD_INTERPOSE so the pair service is forwarded correctly") is also falsified: the pair path does not flow through remote_service_create_connected_socket or any other MobileDevice/RSD C API at all — it goes straight through Network.framework (nw_connection), bypassing everything we currently interpose.

Option β (transplant Linux host's pair record) remains rejected. The problem is not pairing, the problem is that CDS does not speak the tunnel's transport; handing it a pair record would not give it a different transport layer.

Option δ — CDS → pymobiledevice3 backend shim

This is the new direction. The short statement is:

Stop trying to make CDS talk to the iPhone. Make CDS talk to pymobiledevice3, and let pymobiledevice3 talk to the iPhone.

The precedent already exists in the current inject:

  • iosmux_md_proxy.m interposes MDRemoteServiceSupport.retrievePropertiesForUUID: and retrieveNameForUUID: and answers them from an in-memory cache built from the pymobiledevice3 handshake. CDS never actually queries the device for properties — the inject serves them from Python-captured data. This works. No crash, no lie — just a translation layer.

Option δ generalizes that pattern to every CDS → iPhone call. When CDS tries to open a service socket, do a pair handshake, mount a DDI, install an app, or attach a debugger, the request is caught by our inject and routed to a local pymobiledevice3 session that does the real work on its existing working tunnel. The return value (socket fd, response bytes, Codable envelope, whatever) is handed back to CDS in the shape CDS expects.

This reframes Stage 2 from "observe the real pair flow" to "build the smallest shim that makes CDS's attempts succeed by delegating them". It is more work than α/γ would have been, but α/γ were physically impossible given the transport mismatch — this is the first plan that is actually compatible with the ground truth.

Phase B — done

Deliverable was docs/research/s2b-pair-attempt-log.md. Landed alongside the earlier rewrite of this plan. Identified three blockers — X (client-side protocol), Y (which PID runs the call), Z (sandbox SCM_RIGHTS capability) — that Phase C was tasked with resolving.

Phase C — self-experiment (done) + remaining research queue

What the self-experiment closed

Full write-up in docs/research/s2c-self-experiment/FINDINGS.md. Short version:

  • X (client side) — CDS speaks plain prior-knowledge HTTP/2 cleartext (h2c) to the tunnel endpoint. No TLS, no ALPN. The 24-byte HTTP/2 connection preface arrives first, then a stock SETTINGS frame whose values (MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=1 MiB) match pymobiledevice3's own reverse observations byte for byte. Above HTTP/2, CDS uses RemoteXPC DATA frames on stream 1 (ROOT_CHANNEL) and stream 3 (REPLY_CHANNEL) with XpcWrapper / XpcPayload framing whose magic bytes (0x29b00b92, 0x42133742) match pymobiledevice3's remote/xpc_message.py exactly. The empty-HEADERS stream-open pattern CDS uses is unusual but compatible with golang.org/x/net/http2 at the framer level.
  • Y — every nw_connection_* call observed in the experiment came from the same CoreDeviceService PID our LC_LOAD_DYLIB inject loaded into. No helper process, no delegation to remotepairingd. The earlier "three CoreDeviceService PIDs" observation turned out to be concurrent launchd-managed instances of the same binary (user session / system session / log-stream-ephemeral), and the one handling each request is whichever the caller's XPC connection resolves to. Our current injection reaches it.
  • Z — CDS's sandbox permits, at runtime, each of the four Shape B primitives we probed: socket(AF_UNIX, SOCK_STREAM), socketpair(), bind() to /tmp/*.sock, and sendmsg() with an SCM_RIGHTS ancillary block. All returned errno=0. The devil's advocate attack ranked "sandbox denies SCM_RIGHTS" as a 55% chance of killing Shape B — it is now 0%, empirically verified.

Shape B is therefore architecturally viable. The hook site exists in the right process, the sandbox allows the required primitives, and the client-side transport is standard enough for a Go HTTP/2 server to speak.
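The Z primitives can also be exercised outside CDS. The following is a minimal sketch, not the probe used in the self-experiment: it passes the write end of a pipe across a socketpair() inside an SCM_RIGHTS ancillary block and then writes through the received duplicate. The names passFD and roundTrip are hypothetical, and a successful run in an ordinary process says nothing about the CDS sandbox by itself.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// passFD sends fd across a fresh AF_UNIX socketpair inside an SCM_RIGHTS
// ancillary block and returns the duplicate descriptor seen by the receiver.
func passFD(fd int) (int, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return -1, fmt.Errorf("socketpair: %w", err)
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	// One byte of ordinary payload carries the ancillary data with it.
	if err := syscall.Sendmsg(pair[0], []byte{0}, syscall.UnixRights(fd), nil, 0); err != nil {
		return -1, fmt.Errorf("sendmsg: %w", err)
	}

	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit descriptor
	_, oobn, _, _, err := syscall.Recvmsg(pair[1], buf, oob, 0)
	if err != nil {
		return -1, fmt.Errorf("recvmsg: %w", err)
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, fmt.Errorf("parse control message: %w", err)
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, fmt.Errorf("parse rights: %w", err)
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 descriptor, got %d", len(fds))
	}
	return fds[0], nil
}

// roundTrip passes a pipe's write end through passFD, writes msg to the
// received duplicate, and returns what arrives on the pipe's read end.
func roundTrip(msg string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()
	defer w.Close()

	dup, err := passFD(int(w.Fd()))
	if err != nil {
		return "", err
	}
	_, werr := syscall.Write(dup, []byte(msg))
	syscall.Close(dup)
	if werr != nil {
		return "", werr
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("hello")
	if err != nil {
		fmt.Println("fd passing failed:", err)
		return
	}
	fmt.Println("read through passed descriptor:", got)
}
```

The same send and receive halves would sit in different processes in a real Shape B setup; they share one process here only to keep the sketch self-contained.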

Still open — research queue Q1-Q5

The self-experiment does NOT tell us what the server side of the conversation should look like — our listener never replies to CDS's first DATA frame, so the session stalls at that point. Five research questions (Q1..Q5) are tracked in a dedicated standalone file: plan-stage2-phase-c-queue.md. All five are self-research tasks on havoc (no friend, no guessing). Short summary of what each question covers:

  • Q1 — decode the captured 44-byte XPC payload via pymobiledevice3's own XpcWrapper / XpcPayload parser to classify the first RemoteXPC message CDS sends.
  • Q2 — tcpdump on utun4 during a working pymobiledevice3 remote rsd-info round-trip to see whether the tunnel interface carries plaintext HTTP/2 or TLS-encrypted bytes.
  • Q3 — incremental dialog bisection via the spike listener: reply to CDS's first DATA frame with a sourced response, observe the next request, repeat until the full pair handshake is walked or a named empirical gap is hit.
  • Q4 — grep pymobiledevice3 for any server-side / fixture / mock code we can reuse instead of hand-crafting Q3 replies.
  • Q5 — last-resort post-TLS byte capture of a live pymobiledevice3 session if Q2 shows utun4 is encrypted (SSLKEYLOGFILE + Wireshark, or a recv() hook as a pymobiledevice3 patch under the existing overlay).

The queue file contains the full empirical methodology, done criteria, and decision tree for each question. Any answer that cannot be reached empirically is labelled UNKNOWN, not guessed. Only after all five questions are exhausted and still leave a gap does the deferred friend-capture fallback re-enter consideration.

Still open — carried over from earlier

C.7 is backlog-priority until we see Phase D behavior

We assume the DeviceIdentifier.uuid(UUID, String) second slot does not affect pair-flow correctness, only identity-UI presentation. This is unverified — the assumption would be promoted to verified by either a disasm note in Phase D's CoreDevice work or by an empirical test that toggles the string and observes Xcode's reaction. Until then treat any Phase D bug that names this slot as potentially related.

C.7 DeviceIdentifier.uuid(UUID, String) second slot — what does CDS do with the String payload? This is a small identity-UI question that only matters once the pair flow otherwise works. It stays in the backlog and gets answered by the same disasm work that will show up in Phase D if we discover a related behavior while implementing the backend.

Closed by Phases B + C

  • "iOS 17+ pair service endpoint — find name and port" → the endpoint is the unified RSD endpoint [tunnel-address]:tunnel-port, not a per-service port.
  • "does CDS reach the pair service via remote_service_create_connected_socket" → no, it goes through nw_connection directly.
  • "what does default unpaired DeviceInfo look like on a real iPhone" → answered empirically: removing our three lies produces Xcode's "unpaired, connected, no DDI, Pair button visible" UI, which is exactly a fresh iPhone's baseline.
  • "does the physical iPhone accept pair from a new host while already paired with Linux" → moot; Option δ routes through the existing pair record that pymobiledevice3 tunneld already holds.
  • "CDS's own service-open call chain for PairAction" → resolved empirically by the wire logger. The path is PairActionImplementation.invoke → Swift async continuation → RemotePairing.framework → Network.framework nw_connection_create(_with_connected_socket). All in-process in the same CDS instance.
  • "CDS's expected client-side protocol" → plain prior-knowledge h2c + RemoteXPC DATA frames, per FINDINGS.md §X.
  • "Best interposition point" → the spike override in pymobiledevice3 tunneld is already sufficient to redirect CDS without an inject hook on nw_connection_create_* at all, and the Go backend will own the redirected listener endpoint directly. See Phase D for the concrete shape.
  • "Backend architecture: embedded Python vs sibling daemon vs Go" → Go, not Python, as a first-class principle. Rationale in Phase D.
  • "Surface area of pair flow in pymobiledevice3" → known: the RemoteXPC message types we need to handle are the ones CDS actually sends on streams 1 and 3 of its h2c session. Q1-Q3 will enumerate them one by one.
  • "pymobiledevice3 pair record storage and survival" → already answered in an earlier research pass: Ed25519 pair records live in ~/.pymobiledevice3/remote_{udid}.plist, survive tunneld restarts, trigger the physical Trust prompt only on the very first pair. The Go backend does not touch pair records directly — the existing pymobiledevice3 tunneld subprocess keeps owning that state.

Phase D — implementation (shape locked by Phase C findings)

Phase D delivers the Option δ shim. Its shape is now fixed by what the self-experiment found.

Principle: new code is Go, Python stays where it already is

All new production code for Phase D is Go. The project has been Go-based from the start; introducing a second production language on a component we plan to delete in Phase E (the "remove Python" milestone) would create exactly the inertia we want to avoid. Python involvement is strictly limited to:

  • pymobiledevice3 remote tunneld — the existing subprocess we already run on havoc. It keeps owning TLS 1.2 PSK handshake to the real iPhone, pair identity management, and the utun tunnel lifecycle. Go talks to it over HTTP on loopback, exactly as it does today. We do not ship any NEW Python production code.
  • Spike/research tooling in docs/research/s2c-self-experiment/ (the Python HTTP/2 listener, one-shot capture scripts). These are research artifacts, never run in production, deleted with the rest of Stage S2's spike infrastructure when the Go backend ships.

If a future step genuinely needs server-side XPC logic that is only implemented in pymobiledevice3 today, the contained fallback is a narrow RPC subprocess (single pymobiledevice3-backed Python process driven by Go over a Unix socket with a few RPC methods) — but only as a contained compatibility shim with an explicit Go porting plan per method. We do not write a full Python backend and call it "temporary".

Components

The Go backend is a new command under cmd/iosmux-backend/ in this repo. It is a single long-lived process, launched from iosmux-restore.sh (or a LaunchAgent on havoc once we productize), that:

  1. Listens on a loopback TCP endpoint — the one tunneld advertises via its HTTP API response. During spike/research this is driven by the IOSMUX_SPIKE env-var patch (docs/patches/pymobiledevice3/0001-iosmux-spike-tunneld-endpoint-override.patch); in production the same mechanism can be generalized to an always-on IOSMUX_BACKEND_ADDR override, or tunneld can be replaced altogether by a Go component that owns the HTTP API. Either way, CDS learns the shim's address through the same GET / mechanism it already uses.
  2. Speaks prior-knowledge h2c using golang.org/x/net/http2. Not net/http.Server (that wants a full request/response model) but the lower-level Framer interface, because RemoteXPC uses persistent streams and DATA frames rather than HTTP semantics. Example prior art: Caddy's h2c handler, gRPC-Go's h2c mode — both production.
  3. Implements RemoteXPC on top of the raw frame layer:
     • XpcWrapper / XpcPayload encode+decode (magic bytes, flags, size, msg_id) — ported to Go from pymobiledevice3/remote/xpc_message.py with the decoded bytes in research/s2c-self-experiment/FINDINGS.md as a ground-truth test vector.
     • bplist16 decoding for XPC object payloads. howett.net/plist (already in go.mod) handles bplist00 and may handle bplist16 too; if not, the missing bits are a small extension.
     • Stream routing — messages on stream 1 = ROOT_CHANNEL, stream 3 = REPLY_CHANNEL, one open per channel, handled as two persistent async loops. Matches the empirically-observed stream topology from the S2.C capture.
  4. Delegates the actual device work to its internal session, which is held open against tunneld's HTTP API for the lifetime of the backend. Every CDS request is translated into either:
     • A native Go implementation of the equivalent RemoteXPC call (for calls we fully understand), or
     • A pymobiledevice3-backed fallback over a narrow subprocess RPC (for calls that are only implemented in pymobiledevice3 today, pending a Go port per call).
  5. Materializes replies as XpcWrapper-framed DATA frames written back to CDS on the correct stream. Every reply must come from real data (Phase A's "never lie about state" rule stays binding here).
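The stream-routing component can be illustrated without any HTTP/2 library. The sketch below decodes the standard 9-byte HTTP/2 frame header by hand and maps stream IDs to the two channel names used in this plan. It is a stdlib-only illustration of the byte layout, not the planned implementation (which drives the golang.org/x/net/http2 Framer); parseFrameHeader and channelFor are hypothetical names, and the END_HEADERS flag on the sample frame is an assumption, not a captured value.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const frameHeaderLen = 9 // fixed HTTP/2 frame header size

// frameHeader mirrors the fields of an HTTP/2 frame header.
type frameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8  // 0x0 DATA, 0x1 HEADERS, 0x4 SETTINGS, 0x8 WINDOW_UPDATE
	Flags    uint8
	StreamID uint32 // 31 bits; the reserved high bit is masked off
}

// parseFrameHeader decodes the first nine bytes of b.
func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < frameHeaderLen {
		return frameHeader{}, errors.New("short frame header")
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
	}, nil
}

// channelFor maps a stream ID to the channel name used in this plan.
// Any other stream is reported as unknown so the caller can halt on it.
func channelFor(streamID uint32) (string, bool) {
	switch streamID {
	case 1:
		return "ROOT_CHANNEL", true
	case 3:
		return "REPLY_CHANNEL", true
	}
	return "", false
}

func main() {
	// A HEADERS frame with len=0 on stream 1: the empty stream-open pattern.
	// The 0x4 (END_HEADERS) flag is an illustrative assumption.
	raw := []byte{0x00, 0x00, 0x00, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01}
	h, err := parseFrameHeader(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	name, ok := channelFor(h.StreamID)
	fmt.Printf("type=%#x len=%d stream=%d channel=%s known=%v\n",
		h.Type, h.Length, h.StreamID, name, ok)
}
```

Reporting unknown streams instead of defaulting them to a channel keeps the routing consistent with the capture-and-halt methodology.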

Concrete sub-tasks

  • D.0 — Checkpoint current work (this commit).
  • D.1 — Finish the Phase C research queue (Q1-Q5 in the section above) until either the server-side protocol is fully understood or a concrete empirical gap is named.
  • D.2 — Bootstrap cmd/iosmux-backend/ with a skeleton that binds a configurable address, logs every accepted connection, and shuts down cleanly on SIGTERM. No HTTP/2 yet. Verify it fits the existing go build ./... flow.
  • D.3 — Wire the Go HTTP/2 framer to the listener. Replicate the empirically-observed server handshake (SETTINGS{MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=1 MiB}, WINDOW_UPDATE(983041), auto-SETTINGS-ACK). Reuse the existing spike listener's session logs as regression fixtures.
  • D.4 — Port XpcWrapper / XpcPayload / bplist16 decoders to Go. The hex blob in FINDINGS.md §X is a unit test vector: the decoder either produces the same object graph pymobiledevice3's parser produces (Q1 decode) or the decoder is wrong.
  • D.5 — Implement the ROOT_CHANNEL state machine: accept the first XPC request, dispatch to a stubbed handler, emit a reply, watch CDS proceed to the next step. Dialog-bisect forward using the Q3 methodology.
  • D.6 — Per-service handlers (pair, DDI mount, app install, etc.) land one at a time, each gated on real empirical data from either Q2/Q5 captures or dialog bisection.
  • D.7 — Remove the IOSMUX_SPIKE tunneld patch once the Go backend owns the address CDS is told to connect to; replace with either a generalized tunneld override or a tunneld replacement owned by iosmux itself.
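The D.3 handshake values can be checked at the byte level. The sketch below hand-encodes the three server frames named in D.3 using only the standard library, and shows where 983041 comes from: 1 MiB (1048576) minus the HTTP/2 default window of 65535. It assumes the WINDOW_UPDATE is connection-level (stream 0), which the increment suggests but this document does not state. The function names are hypothetical; production code would likely use the Framer's WriteSettings, WriteWindowUpdate, and WriteSettingsAck methods instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	// clientPreface is the 24-byte HTTP/2 connection preface sent by clients.
	clientPreface = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

	frameSettings     = 0x4
	frameWindowUpdate = 0x8
	flagAck           = 0x1

	settingMaxConcurrentStreams = 0x3
	settingInitialWindowSize    = 0x4

	defaultWindow = 65535   // HTTP/2 default flow-control window
	targetWindow  = 1 << 20 // 1 MiB, as observed in the capture
)

// appendFrame appends a 9-byte frame header followed by payload to dst.
func appendFrame(dst []byte, typ, flags uint8, streamID uint32, payload []byte) []byte {
	n := len(payload)
	dst = append(dst, byte(n>>16), byte(n>>8), byte(n), typ, flags)
	dst = binary.BigEndian.AppendUint32(dst, streamID&0x7fffffff)
	return append(dst, payload...)
}

// settingsFrame encodes SETTINGS{MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=1 MiB}.
func settingsFrame() []byte {
	var p []byte
	p = binary.BigEndian.AppendUint16(p, settingMaxConcurrentStreams)
	p = binary.BigEndian.AppendUint32(p, 100)
	p = binary.BigEndian.AppendUint16(p, settingInitialWindowSize)
	p = binary.BigEndian.AppendUint32(p, targetWindow)
	return appendFrame(nil, frameSettings, 0, 0, p)
}

// windowUpdateFrame raises the window from the default to 1 MiB.
// Stream 0 (connection level) is an assumption of this sketch.
func windowUpdateFrame() []byte {
	p := binary.BigEndian.AppendUint32(nil, targetWindow-defaultWindow)
	return appendFrame(nil, frameWindowUpdate, 0, 0, p)
}

// settingsAckFrame encodes an empty SETTINGS frame with the ACK flag set.
func settingsAckFrame() []byte {
	return appendFrame(nil, frameSettings, flagAck, 0, nil)
}

func main() {
	fmt.Println("preface bytes:", len(clientPreface))
	fmt.Println("window increment:", targetWindow-defaultWindow)
	fmt.Printf("SETTINGS      % x\n", settingsFrame())
	fmt.Printf("WINDOW_UPDATE % x\n", windowUpdateFrame())
	fmt.Printf("SETTINGS ACK  % x\n", settingsAckFrame())
}
```

Byte strings like these can be compared directly against the spike listener's session logs when those are reused as regression fixtures.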

Non-goals (still binding):

  • Faking any reply out of thin air. Every byte the backend sends on the wire must either come from a real data source or be a well-defined framing artifact (header magic, length field, etc.). If a request arrives for a shape we do not know, we halt and research it — we do not invent a reply.
  • Setting any DeviceInfo field to a value we have not observed from a real source.
  • Hooking any Swift action-dispatch path beyond the existing S1.B passthrough trampoline. The S2.C findings show the inject hook is unnecessary for redirecting the transport — the tunneld address override does that alone — so the inject stays minimal.
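The "halt, do not invent" rule combines naturally with the native-versus-fallback split from the component list. The following is one possible shape for that dispatch, offered as a sketch under assumptions: the dispatcher type, the handler signature, and the request names are all hypothetical, and the echo handlers in main are placeholders standing in for handlers backed by real data.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownShape marks a request with no known handler. The caller is
// expected to stop and research it, never to synthesize a reply.
var errUnknownShape = errors.New("unknown request shape: halt and research")

// handler turns one decoded request into reply bytes from a real source.
type handler func(req []byte) ([]byte, error)

// dispatcher prefers native Go handlers and falls back to handlers backed
// by the narrow pymobiledevice3 RPC subprocess.
type dispatcher struct {
	native   map[string]handler
	fallback map[string]handler
}

// dispatch returns the reply and which backend produced it.
func (d dispatcher) dispatch(name string, req []byte) ([]byte, string, error) {
	if h, ok := d.native[name]; ok {
		reply, err := h(req)
		return reply, "native", err
	}
	if h, ok := d.fallback[name]; ok {
		reply, err := h(req)
		return reply, "fallback", err
	}
	return nil, "", fmt.Errorf("%w: %q", errUnknownShape, name)
}

// isUnknownShape reports whether err means "no handler, halt".
func isUnknownShape(err error) bool {
	return errors.Is(err, errUnknownShape)
}

func main() {
	// Placeholder handlers: they echo the request only to keep the sketch
	// runnable. Real handlers must return data from a real source.
	echo := func(req []byte) ([]byte, error) { return req, nil }
	d := dispatcher{
		native:   map[string]handler{"example.native": echo},
		fallback: map[string]handler{"example.fallback": echo},
	}
	for _, name := range []string{"example.native", "example.fallback", "example.unseen"} {
		_, source, err := d.dispatch(name, []byte{0x01})
		if isUnknownShape(err) {
			fmt.Printf("%s: halting (%v)\n", name, err)
			continue
		}
		fmt.Printf("%s: handled by %s backend\n", name, source)
	}
}
```

Returning the backend name alongside the reply makes it easy to log which calls still depend on the Python fallback and therefore still need a Go port.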

Phase E — post-pair (DDI, developer mode, install, debug)

Same shape as the old Phase E, except all of it flows through the same Option δ shim. DDI mount, developer-mode query, app install, debug attach — each is its own set of pymobiledevice3 calls exposed to the backend, each is its own pair of CDS-interposition points. We may find additional missing services to proxy as we walk the flow.

Risks and unknowns we are accepting

  • Server-side protocol gap. The S2.C capture gives us every byte CDS sends but no byte CDS expects back. Until Q1-Q5 close this gap, we do not know if dialog bisection can walk the entire pair handshake without hitting a request for which neither we nor pymobiledevice3 have a reference. If it does hit one, the fallback is the deferred friend-capture track — but that is the only concession.
  • Go http2 framer edge cases. CDS opens streams with HEADERS len=0, which is a minor RFC deviation. net/http Server rejects this; the lower-level Framer API accepts it. Phase D.3 must drive the framer directly, not go through http2.Server{}.ServeConn with a standard Handler. If some other frame-level quirk we have not yet seen exists, the spike listener's capture-and-halt methodology will surface it — no production code runs on unverified assumptions.
  • Backend lifecycle. The Go backend must survive CDS restarts, respond quickly enough that CDS does not time out, and not leak tunnel state across device reconnects. The simplest answer: one long-lived backend process launched at the same time as tunneld, per havoc session, managed by the same iosmux-restore.sh flow that already owns tunneld's lifecycle.
  • bplist16 coverage in howett.net/plist. If the existing Go plist library does not handle bplist16 (the XPC variant), D.4 gains a "port bplist16 from pymobiledevice3" sub-task. Bounded, ~200-400 LOC Go at most.
  • pymobiledevice3 version drift. We pin upstream via docs/patches/pymobiledevice3/UPSTREAM_VERSION and treat any upgrade as a deliberate research task. Patches under docs/patches/pymobiledevice3/ are all research-scoped; none of them define production behavior.

Open questions that do not need research but need decisions

  • Do we keep tunneld as an independent long-running Python subprocess and make the Go backend talk to it over its HTTP API, or do we eventually bring the tunneld state machine into Go alongside the backend so there is one less moving part? The Python version stays for Stage S2; Go replacement is a separate track.
  • Where does the Go backend's logging go, and who watches it? A file under /tmp/iosmux-backend.log with log/slog-structured records is probably enough for the spike runs; productization adds proper rotation later.
  • At what point do we freeze "Stage 2 is done"? Pair working is the minimum bar; DDI + developer mode + install + debug are incremental over the same backend and can slip into Stage 3 if the pair plumbing takes longer than expected.
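For the logging question, a small sketch of what log/slog-structured records written to /tmp/iosmux-backend.log could look like. The field names ("remote", "stream") and the helper names are assumptions made for illustration, not a decided schema, and the JSON handler is just one of the two handlers the standard library ships.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
	"os"
	"strings"
)

// newBackendLogger returns a JSON-structured logger writing to w.
func newBackendLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: slog.LevelDebug}))
}

// renderAcceptRecord formats one "connection accepted" record in memory,
// which is convenient for inspecting the record shape.
func renderAcceptRecord(remote string, streamID int) string {
	var buf bytes.Buffer
	newBackendLogger(&buf).Info("connection accepted", "remote", remote, "stream", streamID)
	return buf.String()
}

// hasField reports whether a rendered JSON record contains the given key.
func hasField(record, key string) bool {
	return strings.Contains(record, `"`+key+`":`)
}

func main() {
	// Append to the log file named in this plan; fall back to stderr if the
	// file cannot be opened.
	var out io.Writer = os.Stderr
	f, err := os.OpenFile("/tmp/iosmux-backend.log", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err == nil {
		defer f.Close()
		out = f
	}
	logger := newBackendLogger(out)
	logger.Info("connection accepted", "remote", "127.0.0.1:57346", "stream", 1)

	fmt.Print(renderAcceptRecord("127.0.0.1:57346", 1))
}
```

One JSON object per line keeps the file greppable during spike runs and leaves rotation as a later productization concern.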

What comes AFTER this roadmap

After Phase E works end-to-end:

  • Promote from PoC to proper config (remove hardcoded UUIDs, service names, paths).
  • Productize iosmux-backend into a daemon that ships with the project.
  • Write user-facing docs.
  • Binary releases, supported OS matrix, upgrade safety across macOS / iOS versions.

None of that matters until the backend shim proves it can drive the pair flow honestly end to end.