Phase 2 Behavior Analysis: Hypothesis Verified

Date: 2026-04-13 (session 7)

Status: Hypothesis confirmed via our own documented evidence. Ready to act.

Question

Why does devicectl list devices time out with "connection interrupted" after deploying the wrapper fix, even though CDS is alive (PID 1301, exit code 0, no crash)? The hypothesis was that the CDS+0xB896 hook (mov al, 1; ret) blocks check-in because invoke(anyOf:usingContentsOf:) is reached for every message, not only for Pair.

Answer: hypothesis CONFIRMED

Evidence from docs/research/coredevice-xpc-protocol.md:

Evidence line 1: dispatcher uses `invoke` as first-gate action decoder

"CoreDeviceService uses Mercury.XPCMessageDispatcher<Mercury.SystemXPCPeerConnection> to dispatch incoming messages. Messages that don't match ActionDeclarations are logged as 'Ignoring XPC message since it was not an ActionDeclaration'."

So Mercury runs every incoming message through the invoke(anyOf:) decoder first. Only after the decoder returns false does the dispatcher fall through to the typed Codable path (where DeviceManagerCheckInRequest and friends get handled).

Evidence line 2: live trace from 2026-04-06

T+0.004  CoreDeviceService: peer connected [pid:33895]
         "Ignoring XPC message since it was not an ActionDeclaration"
         (This is the ProvisioningProvidersListRequest)

T+0.026  devicectl: "DeviceManager sending check-in request: 487F370F-..."

T+0.027  CoreDeviceService: "ServiceDeviceManager - Client connected: [33895]
         (no name). Handling DeviceManagerCheckInRequest..."

The trace shows:

1. First message (ProvisioningProvidersListRequest) is rejected by invoke(anyOf:) → falls through → logged as "Ignoring"
2. Second message (CheckInRequest) also goes through invoke(anyOf:) → rejected → falls through → routed to ServiceDeviceManager

Both non-action messages traverse invoke(anyOf:) on entry. So do action messages (which the decoder would accept). Our hook mov al, 1; ret makes EVERY message look like a successfully handled action → dispatcher skips the typed-Codable fallback → check-in never reaches ServiceDeviceManager → no response is ever sent → devicectl times out.
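The failure mode can be modeled in a few lines of Python. This is a toy sketch, not Mercury's implementation: the function names, the string message kinds, and the return labels are all hypothetical. Only the ordering (action decoder first, typed path second) and the always-true hook come from the notes above.

```python
from typing import Callable, List, Tuple

# Hypothetical: "Pair" is the only action named in these notes.
ACTION_KINDS = {"Pair"}


def real_action_decoder(message: str) -> bool:
    """Stand-in for invoke(anyOf:): True only when the message is an action."""
    return message in ACTION_KINDS


def hooked_action_decoder(message: str) -> bool:
    """Stand-in for the CDS+0xB896 hook (mov al, 1; ret): always True."""
    return True


def dispatch(message: str,
             decoder: Callable[[str], bool],
             fell_through: List[str]) -> str:
    """Two-stage dispatch: action decoder first, typed path second."""
    if decoder(message):
        # The dispatcher believes an action handled the message,
        # so the typed-Codable fallback is never consulted.
        return "handled-as-action"
    fell_through.append(message)
    return "fell-through-to-typed-path"


def run(decoder: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Send the three message kinds discussed above through one decoder."""
    fell_through: List[str] = []
    messages = (
        "ProvisioningProvidersListRequest",
        "DeviceManagerCheckInRequest",
        "Pair",
    )
    results = [dispatch(m, decoder, fell_through) for m in messages]
    return results, fell_through
```

With `real_action_decoder`, both non-action messages land in `fell_through`; with `hooked_action_decoder`, the list stays empty, which corresponds to the check-in request never reaching ServiceDeviceManager.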

Evidence line 3: "Client connected"/"Handling CheckInRequest" absent from new logs

In the Phase 2 test, the system log shows:

CoreDeviceService[1301:7079] activating connection: ... peer[1342]
(15 seconds of silence)
CoreDeviceService[1301:7118] invalidated because the client process (pid 1342)
    either cancelled the connection or exited

No "Client connected", no "Handling DeviceManagerCheckInRequest", no "Published DeviceManagerCheckInCompleteEvent". The invoke(anyOf:) prefilter ate the message before it reached the typed-Codable dispatcher path.
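The same check can be run mechanically against a captured log. The sketch below is a plain substring scan, assuming the log has already been saved as text; the function and constant names are hypothetical, and the marker strings and excerpts are copied from the traces quoted in these notes.

```python
from typing import List

# Marker strings taken verbatim from the notes above.
CHECK_IN_MARKERS = (
    "Client connected",
    "Handling DeviceManagerCheckInRequest",
    "Published DeviceManagerCheckInCompleteEvent",
)

# Phase 2 excerpt as quoted above (with the hook active).
PHASE2_EXCERPT = """\
CoreDeviceService[1301:7079] activating connection: ... peer[1342]
CoreDeviceService[1301:7118] invalidated because the client process (pid 1342)
    either cancelled the connection or exited
"""

# One line from the 2026-04-06 trace (before the hook existed).
# The quoted excerpt stops before any completion event.
EARLIER_EXCERPT = (
    'CoreDeviceService: "ServiceDeviceManager - Client connected: [33895] '
    '(no name). Handling DeviceManagerCheckInRequest..."'
)


def missing_markers(log_text: str) -> List[str]:
    """Return the check-in markers that do not appear in log_text."""
    return [marker for marker in CHECK_IN_MARKERS if marker not in log_text]
```

Calling `missing_markers(PHASE2_EXCERPT)` returns all three markers, while the 2026-04-06 line is missing only the completion event, which that excerpt does not include.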

Why did older sessions appear to work?

Not possibility 2 (different path), not possibility 3 (cached). It was possibility 1 with an order-of-operations twist:

Step 19 (CDS+0xB896 hook) was added in Session 5. Earlier sessions where "device visible in Xcode" was observed did NOT have this hook. Check-in worked normally through Mercury → ServiceDeviceManager, devicectl got real replies, Xcode saw the device.

Then Sessions 5-6 added Step 19 on top of an existing latent bug (SDR+104 read overflow). The wrapper hooks crashed CDS (SIGSEGV) fast enough that the silent Step 19 bug was masked: nobody noticed check-in failing because the process was dead first.

Commit 59ca5cd fixed the loud crash. Now the quiet bug is exposed: dispatcher prefilter silently swallows everything.

So: there was no regression. Both bugs existed. We fixed the loud one and uncovered the quiet one.

Recommended minimal next change

Comment out Step 19 (CDS+0xB896 hook) entirely.

Rationale:

- Functionally identical to a passthrough trampoline (movabs + jmp to orig is just a no-op layer)
- Simpler, eliminates hook-related failure modes (rel32 reach, page protection, etc.)
- Reverts dispatcher to pre-Session-5 known-good behavior
- Zero regression risk: check-in already failed with the hook active, so it can't be worse without it
- The Pair click crash that Step 19 was meant to suppress has a different root cause that the wrapper fix may have already addressed
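For reference, the byte-level shapes behind the first two rationale points can be written out. This is a sketch under assumptions: it presumes x86-64 (implied by the mnemonics), it picks r11 as the scratch register for the trampoline because the notes do not say which register is used, and all Python names are hypothetical.

```python
import struct

# mov al, 1 ; ret  (the always-true stub installed by Step 19)
ALWAYS_TRUE_STUB = bytes([0xB0, 0x01, 0xC3])


def passthrough_trampoline(orig_addr: int) -> bytes:
    """Encode: movabs r11, orig_addr ; jmp r11.

    r11 is an assumed scratch register, not taken from the notes.
    The trampoline only forwards to the original function, so
    behavior matches having no hook at all.
    """
    movabs_r11 = b"\x49\xBB" + struct.pack("<Q", orig_addr)  # REX.W+B, B8+r
    jmp_r11 = b"\x41\xFF\xE3"                                # REX.B, FF /4
    return movabs_r11 + jmp_r11


def fits_rel32(next_insn_addr: int, target_addr: int) -> bool:
    """True if target is reachable by a jmp/call with a signed 32-bit
    displacement measured from the address of the next instruction."""
    displacement = target_addr - next_insn_addr
    return -(2 ** 31) <= displacement < 2 ** 31
```

The absolute movabs form needs no reach check, whereas any rel32-based hook has to pass `fits_rel32`; removing the hook avoids both concerns.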

Expected outcome:

- CDS stays alive (wrapper fix is independent)
- devicectl list devices returns within ~2s
- CDS log shows "Client connected" + "Handling DeviceManagerCheckInRequest" + "Published DeviceManagerCheckInCompleteEvent"
- Xcode Devices window shows the device
- DO NOT click Pair yet; that's the next gated test after validation

If the hypothesis is wrong and check-in still fails after removing Step 19:

- The CDS log will show whether we reached "Handling DeviceManagerCheckInRequest"
- If yes → blocker is somewhere after Mercury dispatch (typed handler)
- If no → blocker is earlier (in DYLD_INTERPOSE functions or xpc_connection setup)
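The fallback decision above reduces to a small triage function over the CDS log text. This is a minimal sketch: the function name and return strings are hypothetical, and the first branch (check-in completed) is an added convenience for the success case rather than part of the fallback plan.

```python
def next_direction(cds_log: str) -> str:
    """Map a CDS log captured after removing Step 19 to a next step."""
    if "Published DeviceManagerCheckInCompleteEvent" in cds_log:
        # Added case: check-in finished, so the hypothesis held.
        return "check-in completed"
    if "Handling DeviceManagerCheckInRequest" in cds_log:
        # The request reached the typed handler but no completion followed.
        return "blocker after Mercury dispatch (typed handler)"
    # The request never reached the typed handler.
    return "blocker earlier (DYLD_INTERPOSE functions or xpc_connection setup)"
```

Feeding it the Phase 2 log from above (no markers at all) would select the "earlier" branch, which is the expected answer only while the hook is still installed.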

Either answer is a clear next diagnostic direction.

Forward roadmap outline

See plan-forward-roadmap.md for the consolidated plan. Key stages:

1. Remove Step 19, validate check-in
2. Xcode visibility check (observe, don't click Pair)
3. Logging-only Mercury interpose to capture real envelopes
4. Targeted acquireusageassertion interceptor (single most important action for hasConnection=true / removing Pair button)
5. DeviceInfo completeness + service wiring
6. End-to-end install/debug verification