
Forward Roadmap — Consolidated Plan

Date: 2026-04-13 (session 7)
Supersedes: plan-next-steps.md (which is now a historical snapshot of workstream progress through session 6)

This document consolidates the findings from:

  • Phase 2 test results (wrapper fix deployed, check-in now hangs) — phase2-wrapper-fix-test-results.md
  • Behavior analysis (CDS+0xB896 confirmed as the hang cause) — phase2-behavior-analysis.md
  • Code audit (16 more bugs found of various severity) — code-audit-findings.md
  • Action interception strategy (Mercury Codable vs flat dict) — action-interception-full-picture.md

Current state of the repo

Confirmed working

  • Device registration pipeline: inject builds, dyld loads our dylib, all hooks install without errors
  • RSD connection to iPhone via pymobiledevice3 tunnel (62 services in Handshake on iOS 26.4.1)
  • OS_remote_device construction and attachment to SDR via handleDiscoveredSDR
  • RSDDeviceWrapper init completes and returns a real pointer
  • DYLD_INTERPOSE on 7 shared-cache functions (linked against RemoteServiceDiscovery.framework and RemoteXPC.framework)
  • CDS stays alive across repeated devicectl calls after the wrapper fix

Confirmed broken (verified)

  1. CDS+0xB896 hook (mov al, 1; ret) swallows every incoming XPC message via invoke(anyOf:) prefilter, including DeviceManagerCheckInRequest. devicectl list devices times out with "connection interrupted" after 15s.

  2. Wrapper read overflow at sdr+104 (FIXED in commit 59ca5cd). The load that populated g_hook_wrapper read at offset 104 of an 88-byte SDR object, 16 bytes past its end. The fix holds: g_hook_wrapper is now a valid pointer.
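
The overflow in item 2 is the kind of mistake a one-line bounds check catches before the read happens. A minimal sketch, assuming the object size is known at the call site (on macOS it could come from malloc_size() or class_getInstanceSize(); which one applies to the SDR object is not established here) and assuming an 8-byte pointer read. The helper name read_in_bounds is hypothetical and does not exist in iosmux_inject.m.

```c
#include <stddef.h>

/* Return 1 if reading `width` bytes at `offset` stays inside an object
 * of `obj_size` bytes, 0 otherwise. Written to avoid size_t overflow. */
static int read_in_bounds(size_t obj_size, size_t offset, size_t width)
{
    return width <= obj_size && offset <= obj_size - width;
}
```

With the numbers from the bug, read_in_bounds(88, 104, 8) is 0, so the read at sdr+104 would have been rejected.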

Suspected broken (code audit, not yet runtime-verified)

Three bugs of the exact same class as the wrapper read overflow (code audit C1, H1, H6) plus several likely Swift ABI mistakes (H3, H4). All currently "work" by accident. See code-audit-findings.md for full details.

The broader concern: if one latent memory bug hid for multiple sessions, the other listed bugs could bite us any time Apple ships an update or we reorder a hook.

Trust posture

We cannot trust the inject code at face value until we eliminate at least the CRITICAL audit findings. Every further test is noise until we do this — each new experiment could be blocked by a bug we already know exists but haven't fixed.

The approach going forward must be:

  1. Fix CRITICAL bugs first — they are cheap and their presence invalidates any test result
  2. Then make the minimal intended change (remove Step 19)
  3. Then test
  4. Then one step at a time

Stage 0 — Safety belt (fix CRITICAL audit findings first)

Effort: ~1 hour of mechanical fixes, no research needed.

| #    | Fix | File:Line |
|------|-----|-----------|
| S0.1 | C2: compute orig_target from rel32 of callq instead of hardcoded cds_base + 0xB0ECE | iosmux_inject.m:1298 |
| S0.2 | C1: replace immediate bake of g_hook_wrapper with mov rax, [rip+global] pattern. Assert g_hook_wrapper != NULL before installing hook pages. | iosmux_inject.m:1207–1213, 1339–1345 |
| S0.3 | C4: wrap iosmux_hook_with_wrapper and iosmux_with_wrapper_replacement in #if 0 with explanation comment | iosmux_inject.m:498–524 |
| S0.4 | C5: add "r12","r13","rbx" to clobber list on the updateIdentifier asm block | iosmux_inject.m:1014–1031 |
| S0.5 | C6: Block_copy before invoking the connected_callback block, Block_release after | iosmux_inject.m:1065–1078 |
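
S0.1 comes down to decoding an x86-64 near call: the opcode byte E8 is followed by a little-endian signed rel32, and the target is the address of the next instruction plus that displacement. A minimal sketch under that assumption (the callsite really is an E8 call, not an indirect one); the helper name call_rel32_target is hypothetical and not taken from iosmux_inject.m.

```c
#include <stdint.h>

/* Decode the target of an x86-64 near call (E8 rel32).
 * insn      : the 5 instruction bytes as read from the callsite
 * insn_addr : runtime address of the first byte of the instruction
 * Returns 0 and stores the target on success, -1 if this is not E8. */
static int call_rel32_target(const uint8_t insn[5], uint64_t insn_addr,
                             uint64_t *target)
{
    if (insn == 0 || target == 0 || insn[0] != 0xE8)
        return -1;

    /* Assemble the little-endian displacement byte by byte. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);
    int32_t rel = (int32_t)raw;

    /* The displacement is relative to the end of the 5-byte call. */
    *target = insn_addr + 5 + (uint64_t)(int64_t)rel;
    return 0;
}
```

Rejecting anything that is not E8 matters here: if an OS update moves the callsite, the fix should fail loudly instead of computing a garbage orig_target.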

Gate: after Stage 0, verify the build still works and inject still completes init successfully. No behavior change expected — these are defensive fixes only.
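
For S0.2, the hook page has to load g_hook_wrapper at run time instead of carrying a value baked in at install time. The standard encoding of mov rax, [rip+disp32] is 48 8B 05 followed by a little-endian disp32, where the displacement is measured from the end of the 7-byte instruction. A minimal sketch of emitting it; the helper name emit_mov_rax_rip is hypothetical, and how the bytes are placed into the hook page is left to the existing code.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `mov rax, [rip+disp32]` (48 8B 05 disp32).
 * insn_addr   : address the instruction will execute at
 * global_addr : address of the 8-byte global to load (e.g. &g_hook_wrapper)
 * Returns 7 (bytes written) or 0 if the global is out of rel32 range. */
static size_t emit_mov_rax_rip(uint8_t out[7], uint64_t insn_addr,
                               uint64_t global_addr)
{
    /* RIP points at the next instruction, 7 bytes after insn_addr. */
    int64_t disp = (int64_t)(global_addr - (insn_addr + 7));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;

    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0x48;                       /* REX.W */
    out[1] = 0x8B;                       /* MOV r64, r/m64 */
    out[2] = 0x05;                       /* ModRM: rax, [rip+disp32] */
    out[3] = (uint8_t)(d & 0xFF);
    out[4] = (uint8_t)((d >> 8) & 0xFF);
    out[5] = (uint8_t)((d >> 16) & 0xFF);
    out[6] = (uint8_t)((d >> 24) & 0xFF);
    return 7;
}
```

A return value of 0 should be treated as fatal at install time, alongside the g_hook_wrapper != NULL assertion: a hook page allocated more than 2 GiB away from the global cannot use this form.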

Stage 1 — Unblock check-in (the intended next test)

Effort: 5 minutes to disable + test.

S1.1 Disable Step 19 (CDS+0xB896 hook) entirely

Simplest option: wrap the whole Step 19 block in if (0) with an explanatory comment pointing at phase2-behavior-analysis.md. Functionally equivalent to a passthrough trampoline but with zero extra machinery.

S1.2 Validate check-in

Tests in order:

  1. Build + deploy dylib
  2. killall CoreDeviceService to force relaunch on next client
  3. Open a CDS system log stream in one terminal: log stream --predicate 'process == "CoreDeviceService"' --style compact --info
  4. Run devicectl list devices in another terminal
  5. Verify within 2 seconds:
     • devicectl returns successfully with device listed
     • CDS log shows Client connected: [pid] (no name). Handling DeviceManagerCheckInRequest
     • CDS log shows Published DeviceManagerCheckInCompleteEvent
  6. Run devicectl list devices several more times to confirm stability

S1.3 Stage 1 exit criteria

All three positive checks pass. No SIGSEGV. No timeout. If any fail, fall through to Stage 1 fallback.

S1.4 Stage 1 fallback

If check-in STILL fails after Stage 1:

  • Look at the CDS log to determine WHERE it's blocking:
    • Log shows Handling DeviceManagerCheckInRequest → blocker is downstream (typed handler, ServiceDeviceManager, publish path). Next suspect: handleDiscoveredSDR side effects on our injected SDR.
    • Log does NOT show Handling DeviceManagerCheckInRequest → blocker is still in the pre-dispatch path. Next suspects:
      1. DYLD_INTERPOSE on one of the 7 functions interfering with initialisation
      2. CDS+0xCCB0 or CDS+0x5E2D0 hook firing in unexpected context
      3. One of the Stage 0 fixes introduced a new bug
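
The triage above is mechanical enough to encode, which keeps us from misreading a log under time pressure. A minimal sketch that classifies a captured CDS log by the two marker strings quoted in S1.2; it assumes the log text was saved to a buffer first, and the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum checkin_verdict {
    CHECKIN_OK,                  /* request handled and completion published */
    CHECKIN_BLOCKED_DOWNSTREAM,  /* handled, but never published */
    CHECKIN_BLOCKED_PREDISPATCH  /* request never reached the handler */
};

/* Classify a captured CoreDeviceService log by the S1.2 marker lines. */
static enum checkin_verdict classify_checkin_log(const char *log)
{
    if (log == NULL ||
        strstr(log, "Handling DeviceManagerCheckInRequest") == NULL)
        return CHECKIN_BLOCKED_PREDISPATCH;

    if (strstr(log, "Published DeviceManagerCheckInCompleteEvent") == NULL)
        return CHECKIN_BLOCKED_DOWNSTREAM;

    return CHECKIN_OK;
}
```

CHECKIN_OK only covers the two log checks. The devicectl exit status from S1.2 still has to be verified separately.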

Stage 2 — Xcode visibility smoke test

Effort: 5 minutes to observe.

  1. With Stage 1 verified, open Xcode → Window → Devices and Simulators
  2. Observe whether the device appears in the sidebar
  3. Capture which fields Xcode shows (name, OS version, busy/paired state)
  4. Do NOT click Pair — that's a separate gated test
  5. Once the Stage 3 logging interpose exists, repeat this step to capture Xcode's XPC traffic to CDS

Stage 2 exit criteria

Device row appears in Xcode. Its state may be "connecting" or similar — that's OK at this point. The goal is just to confirm Xcode sees it.

Stage 3 — Logging-only Mercury interpose

Effort: 2-4 hours including capture + analysis.

S3.1 Add a pure observational hook

Interpose xpc_connection_send_message and xpc_connection_send_message_with_reply (not set_event_handler: we want to capture both directions). Log only messages on connections whose peer name is "com.apple.CoreDevice.CoreDeviceService".

For matching messages:

  1. xpc_copy_description(msg) → write to /tmp/iosmux_mercury.log
  2. Also attempt to introspect the mangledTypeName value and log it separately
  3. Pass through unchanged — absolutely no behavior change
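
The three steps above can be sketched as a single interposer. This is a sketch under assumptions, not the planned implementation: it covers only xpc_connection_send_message, it assumes xpc_connection_get_name() returns the service name for the connections we care about (it may return NULL for peer connections, in which case the filter needs another source), and the iosmux_* names are hypothetical. The mangledTypeName extraction is left out because the envelope layout is not known until Stage 3.

```c
#include <stdio.h>
#include <string.h>

#define IOSMUX_CDS_SERVICE "com.apple.CoreDevice.CoreDeviceService"
#define IOSMUX_MERCURY_LOG "/tmp/iosmux_mercury.log"

/* Decide whether a connection name is the one we want to log. */
static int iosmux_should_log(const char *conn_name)
{
    return conn_name != NULL && strcmp(conn_name, IOSMUX_CDS_SERVICE) == 0;
}

/* Append one record to an open log stream. Returns 0 on success. */
static int iosmux_log_record(FILE *out, const char *direction,
                             const char *desc)
{
    if (out == NULL || direction == NULL || desc == NULL)
        return -1;
    if (fprintf(out, "[%s] %s\n", direction, desc) < 0)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

#ifdef __APPLE__
#include <stdlib.h>
#include <xpc/xpc.h>

#ifndef DYLD_INTERPOSE
#define DYLD_INTERPOSE(_replacement, _replacee) \
    __attribute__((used)) static struct { \
        const void *replacement; \
        const void *replacee; \
    } _interpose_##_replacee \
      __attribute__((section("__DATA,__interpose"))) = { \
        (const void *)(unsigned long)&_replacement, \
        (const void *)(unsigned long)&_replacee \
    };
#endif

/* Observational only: log a description, then forward the message as-is. */
static void iosmux_send_message(xpc_connection_t conn, xpc_object_t msg)
{
    if (iosmux_should_log(xpc_connection_get_name(conn))) {
        FILE *out = fopen(IOSMUX_MERCURY_LOG, "a");
        char *desc = xpc_copy_description(msg);  /* caller must free */
        if (out != NULL && desc != NULL)
            iosmux_log_record(out, "send", desc);
        free(desc);
        if (out != NULL)
            fclose(out);
    }
    xpc_connection_send_message(conn, msg);      /* pass through unchanged */
}
DYLD_INTERPOSE(iosmux_send_message, xpc_connection_send_message)
#endif /* __APPLE__ */
```

Opening and closing the log file per message is slow but keeps the hook stateless, which fits a capture session measured in minutes. Nothing in the logging path changes what is sent.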

S3.2 Capture the traffic we care about

With the interpose active:

  1. devicectl list devices — captures DeviceManagerCheckInRequest, ProvisioningProvidersListRequest, the CompleteEvent, etc.
  2. Open Xcode Devices window — captures whatever Xcode's device-manager does on initial connect
  3. (Without clicking Pair) look for any acquireusageassertion traffic — this might be triggered by just opening the window

S3.3 Build the envelope catalog

Deliverable: docs/research/mercury-envelope-catalog.md with actual xpc_copy_description output for each captured message. For each envelope:

  • Identify the top-level keys
  • Identify the mangledTypeName value
  • Extract the Codable payload shape (as much as we can see)

This replaces all our current guesses about the wire format.

Stage 3 exit criteria

At least the following envelopes captured in full:

  • DeviceManagerCheckInRequest + CompleteEvent
  • ProvisioningProvidersListRequest (the "Ignoring" message)
  • acquireusageassertion request + reply (if Xcode sends one)
  • Any Pair-flow messages Xcode sends before we click the button

Stage 4 — Targeted acquireusageassertion interceptor

Effort: 4-8 hours depending on envelope complexity.

Primary target: acquireusageassertion. If Xcode gets a successful reply, its client-side _shadowUseAssertion is set, hasConnection=true, and the Pair button disappears entirely. We then never need to handle PairAction — which is good because PairAction has a ChallengeAnswer sub-protocol we can't currently implement.

Approach selection (gated on Stage 3 output)

Decide based on envelope complexity:

Option A — XPC-level Mercury Codable faker

Interpose xpc_connection_set_event_handler (per research A+B), wrap the handler for com.apple.CoreDevice.CoreDeviceService connections. For messages with a mangledTypeName matching CoreDevice.AcquireBUsageAssertionActionDeclaration (or similar — TBD from Stage 3), forge a reply dict by cloning a known-good reply's byte layout and substituting UUIDs. Send via xpc_connection_send_message on the reply connection obtained from xpc_dictionary_get_remote_connection(reply).

Pros: doesn't need Swift async ABI work. Cons: requires mimicking exact Codable byte layout.
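
The "clone and substitute UUIDs" step of Option A can be prototyped independently of XPC. A minimal sketch, assuming UUIDs appear in the captured reply as raw 16-byte values; that is unverified until the Stage 3 catalog exists, and if they turn out to be encoded as strings or as XPC UUID objects this needs a different approach. The function name substitute_uuid is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IOSMUX_UUID_LEN 16

/* Replace every occurrence of old_uuid in buf with new_uuid, in place.
 * Returns the number of substitutions made. */
static size_t substitute_uuid(uint8_t *buf, size_t len,
                              const uint8_t old_uuid[IOSMUX_UUID_LEN],
                              const uint8_t new_uuid[IOSMUX_UUID_LEN])
{
    size_t hits = 0;
    size_t i = 0;

    if (buf == NULL || old_uuid == NULL || new_uuid == NULL)
        return 0;

    while (i + IOSMUX_UUID_LEN <= len) {
        if (memcmp(buf + i, old_uuid, IOSMUX_UUID_LEN) == 0) {
            memcpy(buf + i, new_uuid, IOSMUX_UUID_LEN);
            hits++;
            i += IOSMUX_UUID_LEN;   /* skip past the replaced value */
        } else {
            i++;
        }
    }
    return hits;
}
```

The hit count is worth logging: a count of zero on a known-good reply means the raw-bytes assumption is wrong and Option A needs rethinking before any reply is sent.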

Option B — Swift-level ActionImplementation.invoke() hook

Hook the specific AcquireBUsageAssertionActionDeclaration.invoke() function (find offset in CoreDeviceUtilities, patch callsites in CDS binary). At this level, the action is already decoded into a typed Swift object and we interact with a strongly-typed continuation.

Pros: no Codable wire format work. Cons: Swift async ABI, need to synthesize and resume a continuation correctly from C code.

S4.1 Prototype + deploy

Implement chosen option, deploy, test that:

  1. CDS still stays alive
  2. devicectl list devices still works
  3. Xcode Devices window now shows device in a different (more "connected") state
  4. The Pair button disappears from Xcode Devices UI
  5. _shadowUseAssertion is set on the Xcode side (observable via DVTCoreDeviceCore logging or lldb attach to Xcode)

Stage 4 exit criteria

Pair button no longer shown. Xcode treats the device as usable. No crashes.

Stage 5 — DeviceInfo completeness and service wiring

Effort: multiple days.

Now that Xcode considers the device usable, fill in the holes:

S5.1 DeviceInfo fields

Set the 11 DeviceInfo fields listed in pair-button-and-cfnetwork.md (UNVERIFIED list — set one at a time and observe Xcode behavior):

  • transportType, platform, deviceType, reality, osVersion, osBuild, udid, authenticationType, developerModeStatus, bootState, isMobileDeviceOnly

Source values from the RSD Handshake Properties dict we already parse.
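
Setting fields one at a time is easier to keep honest if each experiment step is derived from a single counter. A minimal sketch, using the field names from the list above in the order given; the ordering itself, the bitmask representation, and the function names are assumptions for illustration and say nothing about how DeviceInfo is actually stored.

```c
#include <stddef.h>
#include <stdint.h>

/* Field names as listed in S5.1 (UNVERIFIED list). */
static const char *const k_deviceinfo_fields[] = {
    "transportType", "platform", "deviceType", "reality", "osVersion",
    "osBuild", "udid", "authenticationType", "developerModeStatus",
    "bootState", "isMobileDeviceOnly"
};

#define DEVICEINFO_FIELD_COUNT \
    (sizeof k_deviceinfo_fields / sizeof k_deviceinfo_fields[0])

/* Step 0 sets nothing; step n enables the first n fields. */
static uint32_t deviceinfo_mask_for_step(size_t step)
{
    if (step > DEVICEINFO_FIELD_COUNT)
        step = DEVICEINFO_FIELD_COUNT;
    return (uint32_t)((1u << step) - 1u);
}

/* Return 1 if the field at `index` is enabled in `mask`. */
static int deviceinfo_field_enabled(uint32_t mask, size_t index)
{
    return index < DEVICEINFO_FIELD_COUNT && ((mask >> index) & 1u) != 0;
}
```

Recording the step number next to each Xcode observation makes it possible to say exactly which field caused a change in behavior.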

S5.2 Service routing verification

Ensure that when Xcode asks CDS to open a service socket (e.g. for com.apple.streaming_zip_conduit.shim.remote or com.apple.internal.dt.remote.debugproxy), our existing DYLD_INTERPOSE on remote_service_create_connected_socket actually gets invoked and returns a working TCP socket to the tunnel.

Test: try to install a simple app from Xcode. Even if it doesn't complete, we should see the service connection reach iPhone via the tunnel.
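
When debugging S5.2 it helps to confirm the tunnel endpoint is reachable independently of the interpose. A minimal sketch of connecting a TCP socket to a numeric address and port with plain POSIX calls; the helper name tunnel_connect is hypothetical, the address and port are whatever the pymobiledevice3 tunnel reports, and this deliberately says nothing about the signature of remote_service_create_connected_socket.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect a TCP socket to a numeric host (IPv4 or IPv6) and numeric port.
 * Returns a connected fd, or -1 on any failure. No DNS lookups are done. */
static int tunnel_connect(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *ai;
    int fd = -1;

    if (host == NULL || port == NULL)
        return -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                  /* connected */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

If this helper connects but the service request from Xcode never arrives at the iPhone, the problem is in the interpose or in CDS routing, not in the tunnel.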

S5.3 Forward vs handle action routing

Come back to the list from Stage 4 research and actually implement forwarding for the 8 "forward" actions (createservicesocket, appinstall, transferfiles, etc.). Each needs its own Mercury reply construction if Option A was chosen in Stage 4, or its own invoke() hook if Option B.

Stage 6 — Integration verification

End-to-end:

  1. Install a test app from Xcode to the "remote" iPhone
  2. Attach debugger from Xcode
  3. Set a breakpoint, run the app, hit the breakpoint
  4. Stop cleanly

This is the ultimate goal. If we hit it, we have working Xcode integration.

Risk register

| Risk | Impact | Mitigation |
|------|--------|------------|
| Stage 0 fix introduces a new bug | Stage 1 test is invalid | Test Stage 0 output by running inject once, checking log for successful registration. No behavior change expected. |
| Stage 1 removal of CDS+0xB896 hook re-crashes on Pair | Pair path regresses to Session 5 state | Stage 1 does NOT click Pair. That's for later. Current state is non-functional anyway. |
| Stage 3 interpose misses something | Envelope catalog incomplete | Can re-run with more capture. Observational only, no side effects. |
| Stage 4 Mercury Codable forging fails | Stuck on Pair button | Fall back to Option B (Swift invoke hook) |
| Swift async continuation ABI is too hard | Option B fails | Consider DVTCoreDeviceCore-side interpose on Xcode process (last resort) |

Deferred items

Medium-severity audit findings (M1-M10) are deferred until we reach Stage 5. They're unlikely to bite in the current Stage 1-4 path but should be fixed before productisation.

Other deferred:

  • create_service_endpoint 128-listener array fix
  • Wire decoder input validation
  • Go relay blocking HTTP calls (Go relay is deprecated anyway)

Stopping rule

At any stage, if reality diverges from prediction:

  1. STOP. Do not layer more changes on unknown state.
  2. Run a minimal diagnostic (log stream, lldb attach, etc.)
  3. Document finding in a new research doc
  4. Update this roadmap if necessary
  5. Only then resume

The wrapper bug is a direct consequence of violating this rule previously.