
AUA history — D24..D35a empirical localisation era (2026-04-30..2026-05-01)

Status: verified — historical archive

D24 (post-D23 dynamic re-probe — common funnel) + D30 (AFM peer keep-alive deploy — peer lifetime is not the gate) + D31 (lldb v2 — peer[1013] race, SUCCESS path observed) + D32 (lldb to live CDS — kernel-driven invalidation) + D33 (devicectl injection feasibility) + D34/D35a (Heisenbug discovery). Archived from aua-side-channel-mechanism.md on 2026-05-02 anchor refactor.

D24 update: common funnel finding (post-D23 deploy, 2026-04-30 evening)

D24 dynamic re-probe with apparatus already in post-D23 state (P-1a NOP-evict deployed). Goal: localise residual errorCode 3 that survives D23. Three pre-registered hypotheses (W1 secondary-validation, W2 separate-cache, W3 analytics-gate) — D24 falsified all three; named the actual mechanism W4.

W4 — assertionq dispatcher is the common funnel; side-channel peer is the second input

Empirical (notes/d24 §F.1 stack walk, §L unified-log correlation):

  1. D23 P-1a NOP-evict still effective — bp4 (the D23-NOP'd removeCachedXPCConnection symbol) NEVER fired during the D24 trigger run. Action-XPC connection eviction path is fully suppressed.
  2. XPCError 1001 IS BACK — but on peer[2019] of a SIDE-CHANNEL XPC connection (different connection from the action XPC). Mercury's XPCSideChannel-installed event handler delivers it (unified.log line 47, t=14:34:57.171). Verbatim: [com.apple.dt.coredevice:useassertion] Recieved error from side channel peer: XPCError(errorCode: 1001, "The connection was invalidated.")
  3. CD's [useassertion] subsystem subscribes to the Mercury event and posts a block to com.apple.dt.coredevice.remotedevice.default.assertionq.
  4. The assertionq dispatcher runs ___30eb0 AGAIN, but at a DIFFERENT internal call site this time:
    • D21 (cache-eviction input): ___30eb0+2034 → ___324a0+98 → swift_continuation_throwingResume
    • D24 (side-channel input): ___30eb0+1910 → ___324a0+56 → swift_continuation_throwingResumeWithError
    • Same wrapper functions, different internal RetPC offsets — proves the dispatcher chain is a SHARED FUNNEL with multiple input paths.
  5. errorCode 3 user-visible at unified.log line 50 (t=14:34:57.285): Failed to acquire usage assertion ... CoreDeviceError(errorCode: 3, "Failed to acquire assertion"). Literal string Failed to acquire assertion lives in CoreDevice __cstring at file offset 0x369860 (notes/d24 §J.1).

D24 fix vector ranking (notes/d24 §N)

Q-1a wins for the same reason D23 P-1a won: hook the FUNNEL, not the inputs. The dispatcher receives errors from at least two upstream paths (cache eviction + side-channel teardown); patching the funnel addresses both with one hook. Expected D25 implementation pattern: NOP-style or wrap-callback patch at CD+0x31621 (= ___30eb0+1905, the call instruction whose return PC is the captured frame-3 RetPC CD+0x31626).
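
The address arithmetic behind the Q-1a patch site can be checked in a few lines. This is a minimal sketch that assumes the +1905 and +1910 offsets are decimal (as the CD+0x31621 and CD+0x31626 values imply); the variable names are illustrative only.

```python
# File offsets inside CoreDevice (CD). Names are illustrative, not from the notes.
FUNC_30EB0 = 0x30EB0            # entry of ___30eb0

call_site = FUNC_30EB0 + 1905   # decimal offset of the call instruction
ret_pc = FUNC_30EB0 + 1910      # decimal offset of the captured frame-3 RetPC

# The gap between the two is the length of the call instruction in bytes.
print(hex(call_site), hex(ret_pc), ret_pc - call_site)
```

Reading the offsets as decimal reproduces both addresses quoted above, and the difference between them is the size of the call instruction that a NOP-style patch would have to cover.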

If D25 Q-1a deploys cleanly and works, D23 P-1a NOP-evict on removeCachedXPCConnection becomes redundant (funnel hook already covers cache-eviction input). Decision to revert/retain D23 deferred to post-D25 acceptance test.

W1/W2/W3 falsifications (notes/d24 §M)

  • W1 (post-success validation on reply payload) — rejected. Action XPC reply decodes as success() with no error key (notes/d24 §I.2). Construction happens AFTER reply-decode path completes.
  • W2 (separate cache or registry) — rejected. Regex bps for UsageAssertionCache, checkAssertion, validateAssertion, registerAssertion matched ZERO symbols in CoreDevice (notes/d24 §C). No second cache.
  • W3 (analytics-side gate) — rejected. Analytics emit at unified.log lines 44-45 fires BETWEEN the side-channel error (.171) and the user-visible error (.285) — analytics is passive observer, not a gate.

Apparatus integrity (D24 also non-destructive)

iosmux_inject.dylib sha256 still 52df2cc643dbc6c24d3debd6eba8682ded9fd8cf5d1f9ee3c57962e0171b4803 (D23 v2). CoreDevice sha256 unchanged. iPhone (iosmux) connected (no DDI). Zero CDS/devicectl crashes during D24 probe. D23 hook still installed at runtime (verified by inject-log lines 562-564 in notes/d24 §A).

D30 update: AFM peer keep-alive — FAILED (2026-04-30 evening)

D30 was the first empirical test of a claim made in the section "Why GAMBIT's AFM endpoint emit alone does not satisfy V2": "AFM is emitted over an outbound connection iosmux creates with xpc_connection_create_from_endpoint. Apple expects a persistent inbound peer connection FROM CDS to live for the assertion's lifetime." D30 tested only the lifetime half of that statement: keep the peer alive past the success() reply and observe whether errorCode 3 still fires.

Hypothesis

Explicit xpc_connection_cancel + xpc_release of the AFM peer connection (the send_barrier { cancel } ; release pattern introduced by commit 28d7d89) makes Apple's AUA wrapper at CD+0x31750 observe XPCError(connectionInvalid, 1001) on the same Mercury.XPCSideChannel-installed event handler that delivers AFM, and InProgressClientAssertion.state transitions to .invalidated(error) instead of .fulfilled → continuation resumes with .failure → CoreDeviceError(errorCode: 3) thrown. Note: D20 iter-3 (c49429b, reverted) had previously tested "no peer at all" (suppressed AFM emit entirely) → identical 9-13 ms latency to errorCode 3. D30 is the genuinely-untested case "open + send + keep alive forever".

Fix candidate (deployed)

Three edits to inject/iosmux_gambit.m:

  1. Globals at top of TU (after LOG macro): static NSMutableArray<id> *g_afm_retained_peers = nil; static NSLock *g_afm_retained_lock = nil;
  2. Replace gambit_emit_afm_via_endpoint cancel-after-send block: instead of xpc_connection_send_barrier(ep_conn, ^{ xpc_connection_cancel(ep_conn); }); xpc_release(ep_conn); — append (__bridge id)ep_conn into g_afm_retained_peers under g_afm_retained_lock. ARC keeps refcount > 0 for the lifetime of the CDS process. No xpc_release.
  3. Init lock + array at top of gambit_install_hook (the constructor entry called from iosmux_inject_init).

Full source diffs preserved in commit history of inject/iosmux_gambit.m (subsequently reverted at distillation time).

Apparatus state at deploy:

  • New build sha: 6c9b07bad7ac383c8a66d4660a1e04e41411608ff65286cce0eb808afde89f49 (D23 v2 was 52df2cc6...4803).
  • Size delta: 193680 − 193664 = +16 bytes. Tiny because NSLock and NSMutableArray are existing Foundation classes (no new ObjC class metadata emitted): only a handful of objc_msgSend call-sites plus 2 zero-init static pointer slots in __DATA. Build sha differs from baseline → code change landed (not a cached artefact).
  • CDS respawned via killall CoreDeviceService (PID 749 → 1940). Both hook install paths (AUA keepalive NOP at 0x10d817be0 + GAMBIT trampoline at 0x10c637f50) succeeded rc=0. ASLR slide differed from prior run (0x10d80a000 vs 0x111765000) — install code computed new slide correctly.
  • Backup of D23 v2 dylib at /home/op/backups/iosmux/d30-pre-deploy/iosmux_inject.dylib.d23v2 (sha verified 52df2cc6...4803).
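
The hook address in the list above can be cross-checked against the ASLR value. This is a minimal sketch under two assumptions that the D30 notes do not state outright: that 0x10d80a000 is the CoreDevice load base for this run, and that the AUA keepalive NOP targets removeCachedXPCConnection at CD+0xdbe0 (the offset recorded in the D33 section of this document). The helper name is hypothetical.

```python
def runtime_address(image_base: int, file_offset: int) -> int:
    """Runtime address of a symbol given the image load base and its file offset."""
    return image_base + file_offset

CD_BASE_D30 = 0x10D80A000               # assumption: CoreDevice base in the D30 run
REMOVE_CACHED_XPC_CONNECTION = 0xDBE0   # CD+0xdbe0, per the D33 section

nop_site = runtime_address(CD_BASE_D30, REMOVE_CACHED_XPC_CONNECTION)
print(hex(nop_site))
```

Under those assumptions the result equals the logged install address 0x10d817be0, which is consistent with the install code having recomputed the slide for the respawned process.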

Verdict: FAILURE

Same outcome class as D23 v2 baseline. Citations from the captured 60s unified-log window (timestamps verbatim from havoc):

Citation 1 — failure is the only AUA outcome present, no Successfully acquired line anywhere:

2026-04-30 21:29:52.008620+0700  localhost devicectl[2001]:
  (CoreDevice) [com.apple.dt.coredevice:useassertion]
  Failed to acquire usage assertion on device
  E8A190DD-64F5-44A4-8D57-28E99E316D60 due to error:
  CoreDeviceError(errorCode: 3, errorUserInfo:
    ["NSLocalizedDescription": "Failed to acquire assertion"])

errorCode 3, identical NSLocalizedDescription string, ~10 ms after success() reply at 21:29:51.998775. Latency identical to D23-v2 baseline.
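
The "~10 ms" figure follows directly from the two log timestamps (the failure line above and the success() reply line quoted further down). A minimal sketch using only the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f%z"   # unified-log timestamp layout used in the citations

reply = datetime.strptime("2026-04-30 21:29:51.998775+0700", FMT)
failure = datetime.strptime("2026-04-30 21:29:52.008620+0700", FMT)

delta = failure - reply
print(f"{delta.total_seconds() * 1000:.3f} ms")
```

The delta is 9845 microseconds, which sits inside the 9-13 ms band reported for earlier runs.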

Citation 2 — GAMBIT path executed correctly: AFM emitted on the same peer the brief identifies, the keep-alive code path DID run (no cancel barrier scheduled, peer kept in g_afm_retained_peers):

[inject] GAMBIT: synthesised reply for aid=com.apple.coredevice.action.acquireusageassertion label=acquireusageassertion
[inject] GAMBIT: emitted AFM endpoint=0x7fc754107cb0 did=E8A190DD-64F5-44A4-8D57-28E99E316D60

(/tmp/iosmux-d30-inject-tail.log lines 49-50, immediately prior to AUA failure at 21:29:52.008.)

The action reply was a success() per devicectl's own log:

2026-04-30 21:29:51.998775+0700  localhost devicectl[2001]:
  (CoreDeviceUtilities) [com.apple.dt.coredevice:action]
  Received reply from action
  (type=AcquireDeviceUsageAssertionActionDeclaration,
   invocation=33377F43-05D2-4117-9C02-BF4C1EE7848C):
  success()

So the synchronous action-RPC path is fine. AUA wrapper's later state-machine resolution to .fulfilled does NOT depend on the AFM peer connection's lifetime.

Critical absence — XPCError 1001 path eliminated

D24's W4 evidence chain showed peer[2019] of the side-channel surfaces XPCError(errorCode: 1001, "The connection was invalidated.") on Mercury's event handler, and CD's [useassertion] subsystem subscribes to that and posts to the assertionq dispatcher. In the D30 unified-log window: zero XPCError 1001 lines, zero peer[2019] references, zero Recieved error from side channel peer strings. The 1001 surfacing was caused by the cancel-after-send pattern; D30 eliminated the cancel; the 1001 surfacing stopped. But errorCode 3 still fires.

This proves the dispatcher chain has a third uncatalogued input source (input #3) that produces Failed to acquire assertion even when (a) action XPC reply is success(), (b) cache-eviction is NOP'd (D23 P-1a), and (c) AFM peer never invalidates (D30).

What D30 ruled out

  • Hypothesis: AFM peer cancel-after-send is what causes AUA wrapper to see connection invalidate and resolve .invalidated(error). RULED OUT. With cancel removed and peer held for entire CDS process lifetime, AUA outcome is byte-identical (same errorCode 3, same NSLocalizedDescription). Peer lifetime is not the gate.

What D30 leaves on the table (open hypothesis space for D31 research probe)

Listed exhaustively in the D30 REOPEN admonition near the top of this doc: AFM identifier value mismatch, Mercury XPCSideChannel framing layer, AFM timing race vs success-callback, separate notification channel (darwin notify / property update), DSS field validation (monotonicIdentifier etc).

Apparatus integrity (D30 also non-destructive)

  • iPhone (iosmux) still connected (no DDI) — no degradation.
  • CoreDevice sha256 unchanged (bea205e2c64622d144bcc7664ee104083d0e192aca206739cca345dc7c420495).
  • Zero new diagnostic reports in /Library/Logs/DiagnosticReports/ — final dump's 5 entries are byte-identical to baseline (apfsd, simdiskimaged, shutdown_stall noise types only). No CDS / devicectl / remoted crash.
  • Backup at /home/op/backups/iosmux/d30-pre-deploy/iosmux_inject.dylib.d23v2 intact (sha 52df2cc6...4803) — D23 v2 baseline restorable via single scp + killall CoreDeviceService.

Distillation status

D30 raw notes (notes/d30-afm-peer-keepalive.md, ~32 KB, gitignored) deleted at distillation time per feedback_notes_are_temporary_buffer. Empirical findings, FALSIFIED hypothesis, citations, and apparatus state preserved in this section + the REOPEN admonition near the top of this doc + the Q-D66-15 D30 Resolution log entry in docs/plans/d66-research-questions.md.

D31 update: input #3 IS peer[1013] race — same dispatcher as input #2; SUCCESS path observed (2026-05-01)

D31 lldb dynamic probe localised the third input source named in the D30 admonition. The empirical finding changes the architectural question from "what's the third gate" to "how to win a race that is deterministically winnable".

Construction site for input #3

| Symbol | Module | File offset | Role |
| --- | --- | --- | --- |
| CoreDeviceError.init(code:userInfo:) | CoreDeviceUtilities | +0xbabf0 | Final ctor; receives code:Int32 = 3 in rdi |
| Swift.Error<...CoreDevice._Error>.init(Int32, String) | CoreDeviceUtilities | +0xc10c6 | Inner-call call-site |
| Swift.Error<...CoreDevice._Error>.init(τ_0_0, Optional<String>) | CoreDeviceUtilities | +0xc2780 | Outer-wrap adds userInfo["NSLocalizedDescription"] = "Failed to acquire assertion" |
| static Swift.Error<...CoreDevice._Error>.xpcError.getter | CoreDeviceUtilities | +0xc2978 | Bridge from XPCError to CoreDeviceError |

Caller chain (input #3 anchor inside CoreDevice)

| Symbol | Module | File offset | PC at hit | Role |
| --- | --- | --- | --- | --- |
| ___lldb_unnamed_symbol_30230 | CoreDevice | +0x30230 | +0x30368 (inner) / +0x303cd (outer wrap) | Input #3 synthesis caller: calls xpcError.getter then wraps with NSLocalizedDescription |
| ___lldb_unnamed_symbol_31700 | CoreDevice | +0x31700 | +0x31734 | Outer caller: invokes ___30230 |

Throw funnel — same dispatcher as D24 W4 input #2

D31 confirms input #3 traverses the same ___30eb0/___3bc80/___324a0 dispatcher as input #2; only the internal RetPC offset differs (build-dependent):

| Symbol | File offset (entry) | PC at HIT (D31, current build) | D24 measurement (input #2, prior build) |
| --- | --- | --- | --- |
| ___324a0 | +0x324a0 | +0x324d8 (offset +0x38) | +0x324f8 (offset +0x56) |
| ___3bc80 | +0x3bc80 | +0x3bcbd (offset +0x3d) | +0x3bcbd (offset +0x3d) |
| ___30eb0 | +0x30eb0 | +0x31626 (offset +0x776) | +0x317c0 (offset +0x910). D24 reported "+1910", which is decimal and equals +0x776, so the offsets reconcile; the prior doc had ambiguous formatting |
| ___1e300 | +0x1e300 | +0x1e319 (offset +0x19) | +0x1e319 (offset +0x19) |
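
The entry, offset and PC columns above can be checked mechanically. This is a minimal sketch that transcribes the table and flags any row where entry + offset does not equal the stated PC; it also shows the decimal reading of the D24 offsets. The suggestion that D24's "+56" was decimal is an inference from the "+1910" note, not something the notes state.

```python
# (symbol, entry, pc_at_hit, stated_offset, measurement) transcribed from the table
rows = [
    ("___324a0", 0x324A0, 0x324D8, 0x38, "D31"),
    ("___324a0", 0x324A0, 0x324F8, 0x56, "D24"),
    ("___3bc80", 0x3BC80, 0x3BCBD, 0x3D, "D31"),
    ("___3bc80", 0x3BC80, 0x3BCBD, 0x3D, "D24"),
    ("___30eb0", 0x30EB0, 0x31626, 0x776, "D31"),
    ("___30eb0", 0x30EB0, 0x317C0, 0x910, "D24"),
    ("___1e300", 0x1E300, 0x1E319, 0x19, "D31"),
    ("___1e300", 0x1E300, 0x1E319, 0x19, "D24"),
]

# Rows whose stated PC does not equal entry + stated offset.
inconsistent = [(sym, m) for sym, entry, pc, off, m in rows if entry + off != pc]
print(inconsistent)

# Decimal readings of the offsets D24 originally reported.
print(hex(1910), hex(56))
```

Only the D24 cell for ___324a0 fails the check (0x324a0 + 0x56 is 0x324f6, not 0x324f8). Read as decimal, 1910 is 0x776 and 56 is 0x38, which would make both D24 offsets match the D31 column.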

Inputs #2 and #3 share the same final dispatcher; they differ only in WHICH XPC peer's invalidation drives the 1001.

THE PEER

The XPC peer that drives the 1001 in input #3:

<SystemXPCPeerConnection 0x... { <connection: 0x... {
    name = com.apple.xpc.anonymous.0x...peer[1013].0x...,
    listener = false, pid = 1013, euid = 501, egid = 20,
    asid = 100024 } }>

pid = 1013 is the CoreDeviceService daemon. This is the primary CDS↔devicectl side-channel connection (distinct from the AFM peer[2019] that D30 retained).

D30's AFM-peer keep-alive addressed peer[2019]. peer[1013] was untouched. Its post-reply invalidation continued to fire and continued to be caught by the same [useassertion] observer that synthesises errorCode 3.

Race condition — success is deterministically winnable

D31 captured two outcomes under identical apparatus state:

  • Run #2 (success): [useassertion] Successfully acquired usage assertion E8A190DD-... followed by graceful invalidation. The 1001 STILL fires post-reply, but the success-resume thread won the race.
  • Run #3 (failure, iter 1 of 5-iteration retry): [useassertion] Failed to acquire usage assertion ... CoreDeviceError(errorCode: 3). The side-channel observer drain pump won the race.

Two competing threads in devicectl:

  1. Thread A (success-resume path): drives swift_continuation_throwingResume (no error variant). Triggered by GAMBIT-synthesised success(...) reply unwinding through CoreDevice's reply-handling code. Stack: ___324a0 → ___3bc80 → ___30eb0 → ___1e300 → _dispatch_call_block_and_release.

  2. Thread B (side-channel observer drain pump): catches XPCError 1001 from peer[1013] invalidation, synthesises CoreDeviceError(code:3, userInfo:[NSLocalizedDescription: "Failed to acquire assertion"]) via the construction chain above. Stack: CDU +0x55510 / +0x55540 / +0x52510 / +0x52630 / +0x54060 / +0x56350 / +0x54ab0 family on _dispatch_lane_serial_drain → ___31700+0x34 → ___30230+0x138 (inner, PC CD+0x30368) / +0x19d (outer, PC CD+0x303cd) → ctor chain.

Whichever thread first calls into the AUA continuation wins. Run #2 proves Thread A CAN win deterministically if the post-reply teardown path is short-circuited or delayed long enough.
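
The "first caller wins" behaviour can be modelled as a one-shot continuation guarded by a lock. This is a conceptual sketch only: OneShotContinuation and its methods are hypothetical names, and nothing here is claimed to match how CoreDevice or the Swift runtime implements the AUA continuation.

```python
import threading

class OneShotContinuation:
    """Toy model: the first resume settles the outcome, later resumes are ignored."""

    def __init__(self):
        self._lock = threading.Lock()
        self._outcome = None

    def _settle(self, outcome) -> bool:
        with self._lock:
            if self._outcome is not None:
                return False          # someone else already won the race
            self._outcome = outcome
            return True

    def resume(self, value) -> bool:              # Thread A: success path
        return self._settle(("success", value))

    def resume_throwing(self, error) -> bool:     # Thread B: error path
        return self._settle(("failure", error))

    @property
    def outcome(self):
        return self._outcome

# Run #2 ordering: the success path arrives first, the later 1001-driven error is dropped.
c = OneShotContinuation()
won_a = c.resume("usage assertion")
won_b = c.resume_throwing("CoreDeviceError(errorCode: 3)")
print(won_a, won_b, c.outcome)
```

In this model the late error is harmless once the success path has settled the outcome, which is the property Run #2 exhibited; swapping the two calls reproduces the Run #3 ordering.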

Image base addresses (current build, CDS PID 1013)

For lldb breakpoint resolution and offset arithmetic on the post-D30-revert apparatus:

| Image | __TEXT base | sha256 |
| --- | --- | --- |
| CoreDeviceService.xpc | 0x102b1d000 | (Apple, unchanged) |
| CoreDevice.framework | 0x104c7b000 | bea205e2...0495 |
| CoreDeviceUtilities.framework | 0x103a99000 | (Apple, unchanged) |
| CoreDeviceInternal.framework | 0x1032ad000 | (Apple, unchanged) |
| iosmux_inject.dylib | 0x1032d4000 | 52df2cc6...4803 |

Process locality

Client-side (devicectl). Both BP2 hits and the throw at BP1 fire inside the devicectl process. The CDS daemon (PID 1013) reply is not the immediate trigger — the trigger is the post-reply peer invalidation propagated from CDS to devicectl via the side-channel.

This reconciles with D30 unified-log line 14 (devicectl[2001]: ... Failed to acquire usage assertion ...).

Strategy implications for D32

Three layered options ranked by stability:

  1. Suppress peer[1013] invalidation upstream of the 1001 emission — find what triggers peer[1013] invalidation CDS-side and block at AUA reply time. Most stable: closes the race at its cause instead of chasing downstream symptoms. Selected for D32 per dispatcher decision (rationale: prior D23/D30 NOP attempts pulled chained downstream gates one after another; cause-side blocking is the only architecturally clean exit).

  2. NOP the side-channel observer drain pump (CDU +0x55510 family) — risky. Same pump may handle legitimate errors elsewhere. Blast radius unknown.

  3. NOP CD+0x30230 synthesis — most localised but not AUA-specific. CD+0x303cd path also reached for non-AUA errors. Likely breaks other CoreDevice surfaces.

D32 (next): research probe to localise peer[1013] invalidation source CDS-side.

Apparatus integrity (D31 also non-destructive)

  • iosmux_inject.dylib sha unchanged (52df2cc6...4803).
  • CoreDevice sha unchanged (bea205e2...0495).
  • iPhone (iosmux) connected (no DDI).
  • CDS PID stable at 1013 across all 3 lldb attaches (lldb attached to devicectl, not CDS — no daemon respawn).
  • Zero new diagnostic reports.
  • Backup at /home/op/backups/iosmux/d30-pre-deploy/iosmux_inject.dylib.d23v2 intact.

Distillation status

D31 raw notes (notes/d31-aua-errorcode3-localisation.md, ~575 lines, gitignored) deleted at distillation time per feedback_notes_are_temporary_buffer. Empirical findings + race-condition evidence + construction site + caller chain + strategy disposition preserved in this section + the Q-D66-15 D31 Resolution log entry in docs/plans/d66-research-questions.md.

D32 update: peer[1013] invalidation is KERNEL-driven via MACH_NOTIFY_NO_SENDERS — no CDS-side hook point exists (2026-05-01)

D32 lldb dynamic probe (general-purpose research agent, attached to live CDS PID 1013, 12 BPs covering xpc_connection_cancel, _xpc_connection_cancel, xpc_remote_connection_cancel, _xpc_connection_dispose, Mercury XPCSideChannel.{deinit,__deallocating_deinit}, Mercury RemoteXPCConnection.{deinit,__deallocating_deinit}, RemoteXPCConnection.unsafePeer(from:forServiceNamed:), XPCSideChannel.send(message:), CoreDevice.XPCSideChannel.sendCancelledMessage(), and the CoreDeviceUtilities.invoke(anyOf:usingContentsOf:) action sentinel) falsified D31's "cause-side suppression CDS-side" strategy direction. The peer[1013] invalidation has NO CDS-application-level instruction site.

What D32 found

Two distinct XPC connection lifecycle traces during AUA window:

Connection A (iosmux's AFM peer, address 0x7f89c6e075e0): cancelled by iosmux itself via __gambit_emit_afm_via_endpoint_block_invoke_2 → xpc_connection_cancel → _xpc_connection_cancel. Confirms current D23 v2 baseline state (cancel-after-send pattern, post-D30-revert). Connection A lifetime is irrelevant to errorCode 3 (D30 already proved this).

Connection B (peer[1013], address 0x7f89c6a09040): NEVER cancelled by any CDS-side application code. Cancel chain has zero CD/CDU/Mercury frames. Triggered by kernel-delivered MACH_NOTIFY_NO_SENDERS mach notification.

The kernel-driven invalidation chain

Frame  Function                                  Module / offset
#00    _xpc_connection_cancel                    libxpc + 0x4d2f7
#01    do_mach_notify_no_senders +0x3c           libxpc + 0x407c0
#02    _Xmach_notify_no_senders +0x21            libxpc + 0x40761
#03    notify_server +0x4e                       libxpc + 0x3ff32
#04    _xpc_connection_pass2mig +0x8e            libxpc + 0x3fe77
#05    _xpc_connection_mach_event +0x4d5         libxpc + 0x39755
#06+   _dispatch_client_callout4                 libdispatch
       _dispatch_mach_msg_invoke +0x1b7
       _dispatch_lane_serial_drain

This chain is entirely inside libxpc.dylib (system library), responding to a mach kernel notification. There is no CoreDevice / CoreDeviceUtilities / Mercury / CoreDeviceService function in the invalidation chain. There is no iosmux_inject function in the chain either.

Secondary cleanup chain at t+126ms confirms peer identity via _xpc_connection_remove_peer_impl +0x3c → _xpc_connection_remove_peer — this is internal libxpc code that removes an anonymous accepted peer from a listener's peer table. The connection IS exactly the name = com.apple.xpc.anonymous.0x...peer[1013]... shape D31 documented.

What this means

MACH_NOTIFY_NO_SENDERS fires when the OTHER side (devicectl) drops its last mach send right on the connection's underlying mach port. The trigger is in devicectl's process, not in CDS. When devicectl finishes processing the AUA reply and releases its handle to peer[1013], the kernel delivers the notification to CDS, libxpc tears down its peer table entry, and devicectl-side immediately observes its own connection handle's invalidation on the side-channel event handler — XPCError(errorCode: 1001, "The connection was invalidated.").

peer[1013] cannot be kept alive CDS-side without devicectl also keeping its end alive. devicectl's release is governed by its own process-internal logic — no CDS-side hook can reach it.
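
The no-senders mechanic can be illustrated with a toy model. This is not the Mach API: ReceivePort, add_send_right and drop_send_right are hypothetical names, and the only behaviour modelled is that the notification fires when the last send right on the port is dropped.

```python
class ReceivePort:
    """Toy model of a receive right that requested a no-senders notification."""

    def __init__(self, on_no_senders):
        self._send_rights = 0
        self._on_no_senders = on_no_senders

    def add_send_right(self) -> None:
        self._send_rights += 1

    def drop_send_right(self) -> None:
        if self._send_rights == 0:
            raise ValueError("no send rights left to drop")
        self._send_rights -= 1
        if self._send_rights == 0:
            self._on_no_senders()     # delivered to the receiver side (CDS in the text)

events = []
port = ReceivePort(lambda: events.append("no-senders"))
port.add_send_right()      # devicectl holds the connection
port.add_send_right()      # a second reference to the same port
port.drop_send_right()     # still one sender left: nothing happens
port.drop_send_right()     # last sender gone: notification fires
print(events)
```

The receiver has no say in when the count reaches zero, which is why a hook on the CDS side cannot delay the invalidation in this picture.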

What D31's preferred strategy actually requires

D31's "suppress peer[1013] invalidation upstream of 1001 emission" strategy is sound in principle but the upstream lives in devicectl, not in CDS. Cause-side suppression therefore requires a new injection surface — a separate inject dylib loaded into devicectl via DYLD_INSERT_LIBRARIES or equivalent. Pre-existing project plan documented in ADR-0009 §Decision Option C consequences and stage2.md Phase D.6.6 alternatives ("P-1b: devicectl-side inject"); previously rejected as "REJECTED — overkill, 12-20 hours, new injection mechanism". The empirical D32 finding reopens that rejection because the previously-preferred CDS-side path is now empirically falsified.

Three remaining strategy options for D33

  1. devicectl-side inject — new dylib, loaded via DYLD_INSERT_LIBRARIES (or LC_LOAD_DYLIB on a patched devicectl copy following the CDS pattern). Hook the function inside devicectl that releases peer[1013]'s handle; defer the release past AUA continuation resolution. Most stable architectural exit but expands inject footprint to a new process. devicectl may have hardened runtime / library validation that needs codesign accommodation.

  2. NOP side-channel observer drain pump in devicectl (CDU +0x55510 family). Hook the drain pump that catches XPCError 1001 and short-circuit when the error is from peer[1013] in AUA context. More targeted than blanket-NOPing the pump — surgical hook on xpcError.getter (CDU+0xc2978) entry filtered by error code + caller context. Blast radius bounded to AUA path.

  3. NOP CD+0x30230 synthesis in devicectl. Same critique as D31 — not AUA-specific, breaks other CoreDevice surfaces.

D32 deliverable per brief: source localised (in libxpc, not in CDS application code). Strategy decision for D33 deferred to dispatcher.

Apparatus integrity (D32 also non-destructive)

  • iosmux_inject.dylib sha unchanged (52df2cc6...4803).
  • CoreDevice sha unchanged (bea205e2...0495).
  • iPhone (iosmux) connected (no DDI).
  • CDS PID stable at 1013 across lldb attach/detach cycles (running-attach worked first try; no need for launch-under-lldb fallback).
  • Zero new diagnostic reports.

Distillation status

D32 raw notes (notes/d32-peer1013-invalidation-source.md, ~349 lines, gitignored) deleted at distillation time per feedback_notes_are_temporary_buffer. Empirical findings + kernel-driven invalidation chain + Connection A/B distinction + strategy disposition preserved in this section + the Q-D66-15 D32 Resolution log entry in docs/plans/d66-research-questions.md.

D33 update: devicectl injection feasibility (2026-05-01)

D33 lldb dynamic probe (general-purpose research agent, lldb attach to launching devicectl, three sub-probes covering codesign/entitlements survey, DYLD_INSERT_LIBRARIES empirical test, and BP-based Option 1/2 hook target localisation) closed three concrete questions before D34 implementation work begins.

Q1 — DYLD_INSERT_LIBRARIES injection: FEASIBLE

devicectl is at /Library/Developer/PrivateFrameworks/CoreDevice.framework/Versions/A/Resources/bin/devicectl, sha256 4fede2dd...bf6c, Mach-O universal [x86_64 + arm64e], signed by Apple com.apple.CoreDevice.devicectl, TeamIdentifier 59GAB85EFG.

Codesign properties relevant to injection:

  • Library validation flag (0x2000) PRESENT.
  • Hardened runtime flag NOT present (runtime flag absent).
  • Three Apple-private entitlements (com.apple.private.CoreDevice.takeDeviceSysdiagnose, com.apple.private.sysdiagnose, com.apple.videoconference.allow-conferencing) — none restrict dyld inserts.
  • NO com.apple.security.cs.allow-dyld-environment-variables (not needed because hardened runtime is off).
  • NO com.apple.security.cs.disable-library-validation (LV is on, no explicit disable).
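
The two flag checks in the list above amount to bit tests on the code-signing flags word. A minimal sketch, using the CS_REQUIRE_LV and CS_RUNTIME values from XNU's code-signing headers; the describe helper is hypothetical, and the sample flags words are illustrative because the notes record only which bits were set, not the full value.

```python
CS_REQUIRE_LV = 0x00002000   # library validation required
CS_RUNTIME = 0x00010000      # hardened runtime

def describe(flags: int) -> dict:
    """Report which injection-relevant code-signing bits are set."""
    return {
        "library_validation": bool(flags & CS_REQUIRE_LV),
        "hardened_runtime": bool(flags & CS_RUNTIME),
    }

# Illustrative flags word matching the survey: LV present, runtime absent.
print(describe(0x2000))
```

A binary with both bits set would report True for both keys; the combination observed for devicectl is the first True and the second False.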

Empirical test: ad-hoc-signed no-op probe dylib /tmp/iosmux-d33-probe.dylib loaded into devicectl via DYLD_INSERT_LIBRARIES=... env var. Probe constructor printed banner to stderr ([d33] DYLD_INSERT_LIBRARIES probe loaded pid=3533); devicectl exited 0; no DYLD_INSERT_LIBRARIES ignored / library validation / AMFI rejection. Library-validation flag does NOT block ad-hoc signed dylibs in this configuration.

Conclusion: D34 does NOT need to design an insert_dylib LC_LOAD_DYLIB patch path. Standard DYLD_INSERT_LIBRARIES injection works.

Q2 — Option 1 hook target REFRAMED — peer release is libxpc-internal, not app code

Original D32 framing assumed Option 1 = "hook devicectl's release of peer[] handle, defer past AUA continuation". D33 P3 lldb capture (filtered xpc_release BP for connections matching xpc_connection_get_pid == CDS-PID) found:

  • Two peer connections from CDS to devicectl per AUA invocation:
  • 0x7faad11056d0 — first peer, carries AUA reply (HITs 1-7 in P3)
  • 0x7faad11043c0 — second peer, side-channel "result peer" named peer[1013] (HIT 8, the cancel-event recipient)
  • Second peer's xpc_release fires from a libdispatch worker thread:
#0  xpc_release+0x0
#1  libxpc.dylib`_xpc_connection_mach_event+0x418
#2  libdispatch.dylib`_dispatch_client_callout4+0x7
#3  libdispatch.dylib`_dispatch_mach_cancel_invoke+0x40
#4  libdispatch.dylib`_dispatch_mach_invoke+0x399
#5  libdispatch.dylib`_dispatch_root_queue_drain_deferred_wlh+0x113
#6  libdispatch.dylib`_dispatch_workloop_worker_thread+0x367

ZERO frames in the release stack pass through CD/CDU/Mercury Swift app code. The release is a passive libxpc Mach-event reaction to CDS-side dropping the peer (the symmetric image of D32's CDS-side kernel-driven invalidation).

The actionable Option 1 hook target that the agent identified: Mercury.SystemXPCPeerConnection.__deallocating_deinit at Mercury+0x4ea10 (sister .deinit at +0x4e9d0). BUT this is a Swift type's deinit chain that fires AFTER libxpc has already released the connection — hooking it doesn't prevent the release, just modifies post-release Swift cleanup.

Crucial reframing: Option 1 cannot achieve "prevent peer release" as the user originally hoped. The release is libxpc-internal Mach handling. Both Option 1 and Option 2 are now symptom-suppression strategies (intercepting different layers of the post-release error propagation), not cause-prevention.

Q3 — Option 2 hook target viability: CONFIRMED

CDU __TEXT base in current devicectl run: 0x1020a4000 (slide differs per launch — must be resolved at runtime). All D31 §E offsets resolve to expected symbols:

| File offset | Resolved symbol | Notes |
| --- | --- | --- |
| CDU+0xbabf0 | CoreDeviceUtilities.CoreDeviceError.init(code:userInfo:) | NAMED Swift symbol, direct dlsym target |
| CDU+0xc2940 | static Swift.Error<...>.xpcError.getter (entry) | NAMED Swift symbol |
| CDU+0xc2978 | static Swift.Error<...>.xpcError.getter +0x38 | (same function, +0x38 PC) |
| CDU+0x55510 | ___lldb_unnamed_symbol_55510 | Unnamed Swift function (drain pump) |
| CDU+0x54ab0 | ___lldb_unnamed_symbol_54ab0 | Unnamed Swift function (drain pump) |
| Mercury+0x57b40 | Mercury.XPCError.xpcError.getter | NAMED, the wire-XPCError layer |
| Mercury+0x5cd90 | Mercury generic xpcError getter variant | NAMED |

Named symbols are dlsym-resolvable directly. Unnamed pump symbols are reachable via slide arithmetic from a sibling named symbol — the same approach already proven for the CDS-side inject's CD+0xdbe0 NOP-evict.
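
Sibling-relative resolution is plain subtraction and addition. This is a minimal sketch: resolve_via_sibling is a hypothetical helper, the named symbol's runtime address would come from dlsym in a real inject, and here it is derived from the CDU base quoted above purely for illustration. It assumes the offsets in the table are relative to the image's __TEXT base.

```python
def resolve_via_sibling(named_runtime_addr: int,
                        named_file_offset: int,
                        target_file_offset: int) -> int:
    """Recover the image base from a named symbol, then locate an unnamed one."""
    image_base = named_runtime_addr - named_file_offset
    return image_base + target_file_offset

CDU_BASE = 0x1020A4000                      # base in the D33 devicectl run
init_addr = CDU_BASE + 0xBABF0              # stand-in for a dlsym result
pump_addr = resolve_via_sibling(init_addr, 0xBABF0, 0x55510)
print(hex(pump_addr))
```

Because the base is recomputed from a live symbol address on every launch, the per-launch slide never has to be hard-coded.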

Bonus finding: CoreDevice.ActionConnectionCache.removeCachedXPCConnection at CD+0xdbe0 is ALSO present in devicectl process — the SAME class type with the SAME method exists on both sides of the CDS↔devicectl boundary. iosmux's CDS-side inject NOPs this in CDS context; the devicectl-side instance currently runs unmodified. Whether devicectl-side removeCachedXPCConnection is part of the AUA failure path is a possible follow-up but out of scope for D33.

Strategy comparison for D34

| Dimension | Option 1 (Mercury deinit) | Option 2 (CDU error path) |
| --- | --- | --- |
| Symptom vs cause | Symptom-side (Swift cleanup chain after libxpc release) | Symptom-side (error construction) |
| Hook point named | Yes: Mercury.SystemXPCPeerConnection.__deallocating_deinit | Yes: CoreDeviceError.init, xpcError.getter |
| Discrimination needed | Yes: must filter for CDS-PID-peer connections in hook | Yes: must filter for errorCode == 1001 AND AUA invocation context |
| Side-effects | Memory-leak risk (preventing deinit chain leaves connection-bookkeeping graph in CD/Mercury caches) | Lower: error construction is stateless; suppression doesn't keep dead state alive |
| Failure mode | Hook misses → connection still released (deinit was already too late anyway) | Hook misses → error surfaces normally |
| Implementation complexity | Higher: Swift deinit + refcount semantics on Mercury internal type | Lower: Swift function prologue rewrite, well-understood pattern |

Agent recommendation: Option 2. Rationale:

  1. Both options are symptom-suppression — Option 1's "cause-side" framing was empirically falsified (release is libxpc-internal, not app code; Mercury deinit fires AFTER libxpc release). Given symmetric symptom-side scope, simpler symbol set wins.

  2. Named Swift symbols are dlsym-resolvable: CoreDeviceError.init(code:userInfo:) and xpcError.getter are NOT stripped. dlsym → patch prologue → trampoline back. Same proven pattern as CDS-side gambit_install_hook from inject/iosmux_gambit.m.

  3. No refcount/deinit hazards — Option 1 prevents a Swift type's deinit chain; Mercury's internal connection-bookkeeping graph may rely on the deinit cascade, and preventing it could leak Mercury state across AUA invocations. Option 2 hooks stateless error-construction functions — no state-machine entanglement.

  4. Filter discrimination is bounded — Option 2 filter: code == 1001 AND caller-stack contains AUA path symbols. Option 1 filter: peer's xpc_connection_get_pid == CDS-PID AND deinit context is AUA-related. Both feasible but Option 2's discriminator is on values (errorCode integer) not on object pointers (xpc_pid).
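
The Option 2 discriminator from item 4 can be written as a small predicate. This is a hedged sketch of the decision logic only: should_suppress and AUA_CALLERS are hypothetical, the caller list is assumed to come from some stack-walking step that is not shown, and the two symbol names are the D31 caller-chain anchors rather than a verified AUA-only set.

```python
from typing import Iterable

# D31 caller-chain anchors in CoreDevice; treated here as AUA markers (assumption).
AUA_CALLERS = frozenset({
    "___lldb_unnamed_symbol_30230",
    "___lldb_unnamed_symbol_31700",
})

def should_suppress(error_code: int, caller_symbols: Iterable[str]) -> bool:
    """Suppress only XPCError 1001 raised while an AUA caller is on the stack."""
    if error_code != 1001:
        return False
    return any(sym in AUA_CALLERS for sym in caller_symbols)

print(should_suppress(1001, ["_dispatch_lane_serial_drain",
                             "___lldb_unnamed_symbol_31700"]))
```

Checking the integer first keeps the common case cheap; the stack inspection only matters for 1001, which is where the blast-radius concern lies.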

Apparatus integrity (D33 also non-destructive)

  • iosmux_inject.dylib sha unchanged (52df2cc6...4803).
  • CoreDevice sha unchanged (bea205e2...0495).
  • devicectl sha unchanged (4fede2dd...bf6c) — Apple binary untouched throughout the probe.
  • iPhone (iosmux) connected (no DDI).
  • CDS PID stable at 1013 across all probe steps.
  • Zero new diagnostic reports.
  • Test probe dylib at /tmp/iosmux-d33-probe.dylib on havoc — will not survive reboot.

Distillation status

D33 raw notes (notes/d33-devicectl-feasibility.md, ~562 lines, gitignored) deleted at distillation time per feedback_notes_are_temporary_buffer. Empirical findings + injection feasibility verdict + Option 1 reframing + Option 2 verification + comparison table + recommendation preserved in this section + the Q-D66-15 D33 Resolution log entry in docs/plans/d66-research-questions.md.

D34/D35a update: Heisenbug discovery (2026-05-01)

D34 (lldb attach to launching devicectl, BPs on connection lifecycle) and D35a (follow-up lldb probe for cancel-initiation trigger) both ran on 2026-05-01. Together they REFINE the picture but leave the cause-side trigger partly UNRESOLVED.

D34 — peer[] full lifecycle captured

D34's run captured peer[<CDS-PID>] = 0x7fdbe6404340:

| t (s) | event | source |
|---|---|---|
| 103.5 | BIRTH _xpc_connection_init | libxpc internal |
| 107.36 | xpc_retain ×2 | inside _dispatch_mach_cancel_invoke handler |
| 107.510 (HIT #18) | xpc_connection_cancel | libxpc: _xpc_connection_mach_event+0x310, ZERO Swift/app frames |
| 107.556-108.500 | release/dispose cascade | libxpc cleanup |
| 111.269 (HIT #23) | Swift DeviceUsageAssertion.deinit chain | runJobInEstablishedExecutorContext → devicectl+sym_100016970+0x395 → CoreDevice.DeviceUsageAssertion.deinit+0x227 → Swift._IndexBox.__deallocating_deinit → Mercury.SystemXPCPeerConnection.__deallocating_deinit+0x2d → [OS_xpc_connection _xref_dispose] → _xpc_connection_last_xref_cancel → xpc_connection_cancel |

Critical finding: the HIT #23 Swift chain fires about 3.76 seconds AFTER the HIT #18 libxpc cancel (t=111.269 vs t=107.510). The Swift DeviceUsageAssertion.deinit (with its _IndexBox capture of Mercury.SystemXPCPeerConnection) is late lateral cleanup, NOT the trigger. By the time the Swift Task continuation runs and DeviceUsageAssertion goes out of scope, libxpc has already cancelled the connection.

H1 (Swift action wrapper holds peer) is structurally true: CoreDevice.DeviceUsageAssertion does capture peer[] via the _IndexBox of Mercury.SystemXPCPeerConnection. But hooking its deinit does not prevent HIT #18's earlier libxpc cancel.
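The ordering argument rests on simple timestamp arithmetic over the D34 lifecycle events. A small sketch, using only the timestamps recorded for that run (the gap helper and the shortened labels are illustrative names):

```python
# (t_seconds, label) pairs copied from the D34 lifecycle capture;
# the cascade range is represented by its start time.
D34_TIMELINE = [
    (103.5, "BIRTH _xpc_connection_init"),
    (107.36, "xpc_retain x2"),
    (107.510, "HIT #18 xpc_connection_cancel (libxpc)"),
    (107.556, "release/dispose cascade begins"),
    (111.269, "HIT #23 Swift DeviceUsageAssertion.deinit chain"),
]


def gap(timeline, earlier_tag, later_tag):
    """Seconds between the first events whose labels contain the given tags."""
    t_earlier = next(t for t, label in timeline if earlier_tag in label)
    t_later = next(t for t, label in timeline if later_tag in label)
    return round(t_later - t_earlier, 3)


swift_lag = gap(D34_TIMELINE, "HIT #18", "HIT #23")
peer_lifetime = gap(D34_TIMELINE, "BIRTH", "HIT #18")
print(f"HIT #23 trails HIT #18 by {swift_lag} s")
print(f"peer lived {peer_lifetime} s before the libxpc cancel")
```

A positive swift_lag means the Swift deinit chain cannot be the cause of the cancel observed at HIT #18, because it runs after it.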

D35a — Heisenbug: lldb perturbation alters AUA path

D35a's brief was to find the trigger of HIT #18 by BP'ing on dispatch_source_cancel family + _xpc_connection_mach_event ENTRY (capturing dispatch_mach_reason_t in $rsi). Empirical result: path-A failure mode (D34 reference) did NOT reproduce under lldb instrumentation — the failure took a DIFFERENT path.

Path-A vs Path-B (NEW empirical finding):

| | Path-A (D34 reference) | Path-B (D35a under lldb) |
|---|---|---|
| Main XPC reply | success(...) | success(...) |
| Successfully acquired log | YES ([useassertion] Successfully acquired usage assertion E8A190DD-...) | NO (skipped; analytics shows executionDuration=0) |
| Side-channel peer error log | YES ([useassertion] Recieved error from side channel peer: XPCError 1001 ... peer[1013].0x...) | NO (log line absent entirely) |
| Final outcome | Failed to acquire usage assertion ... CoreDeviceError errorCode 3 | Failed to acquire usage assertion ... CoreDeviceError errorCode 3 |
| Path | success → side-channel cancel → 1001 → errorCode 3 | success → errorCode 3 directly, no side-channel error |
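The two paths differ only in which [useassertion] log lines are present, so a captured log can be classified mechanically. The sketch below assumes the run's log lines are available as plain strings; classify_run and its return labels are hypothetical names, and the sample lines are abbreviated stand-ins rather than captured output.

```python
ACQUIRED = "Successfully acquired usage assertion"
SIDE_CHANNEL_ERR = "Recieved error from side channel peer"  # spelling as logged
FAILED = "Failed to acquire usage assertion"


def classify_run(log_lines):
    """Label a run as path-A, path-B, mixed, or no-failure from its log text."""
    text = "\n".join(log_lines)
    if FAILED not in text:
        return "no-failure"
    acquired = ACQUIRED in text
    side_err = SIDE_CHANNEL_ERR in text
    if acquired and side_err:
        return "path-A"   # success -> side-channel cancel -> 1001 -> errorCode 3
    if not acquired and not side_err:
        return "path-B"   # success -> errorCode 3 directly
    return "mixed"        # a combination not described in the comparison


# Abbreviated stand-in lines, not verbatim captures.
run_a = [
    "[useassertion] Successfully acquired usage assertion ...",
    "[useassertion] Recieved error from side channel peer: XPCError 1001 ...",
    "Failed to acquire usage assertion ... errorCode 3",
]
run_b = ["Failed to acquire usage assertion ... errorCode 3"]
print(classify_run(run_a), classify_run(run_b))
```

Returning "mixed" instead of forcing a choice keeps any third failure path visible if one turns up in later runs.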

D35a observed dispatch_channel_cancel ONCE at t=3.958, called from devicectl-internal Swift Concurrency code: runJobInEstablishedExecutorContext → devicectl+sym_100095660+0x75 → devicectl+sym_100092c60+0x548. This is app-level dispatch cancel initiation, but it operates on a generic dispatch_channel_t, not specifically on peer[].

D35a's _xpc_connection_mach_event BP fired 17 times, and none of the hits matched the CDS-PID filter (xpc_connection_get_pid returned a non-CDS PID for all of them). The side-channel peer connection either was not created at all in this run, or was created but the filter mechanism did not recognize it.

Implication: HIT #18's libxpc cancel handler runs only when path-A is taken. Under the timing perturbation of lldb BP-based instrumentation, path-B fires instead and the side-channel peer is never observed (either not instantiated, or not matched by the PID filter), which explains why $rsi could never be captured.

Verdict on H1/H2/H3/H4 from D35a: NONE EMPIRICALLY CONFIRMED

The original four hypotheses for HIT #18 trigger remain UNRESOLVED:

  • H1 (GAMBIT cancel-after-send) — neither confirmed nor falsified; couldn't observe under lldb.
  • H2 (Apple-CDS cleanup separate from D23 path) — same.
  • H3 (libxpc internal heuristic) — same.
  • H4 (devicectl Swift dispatch_source_cancel) — weak partial signal (one dispatch_channel_cancel hit), but on the wrong target.

Plus a NEW finding: errorCode 3 has at least two coalesce paths in production. GAMBIT-defeat may need to address both, not just the side-channel peer error one.

Recommendation for D36 — switch observation method

D35a recommendation: abandon lldb BP-based instrumentation for HIT #18 trigger. lldb's per-BP overhead (SIGSTOP/SIGCONT cycle, Python callback eval) perturbs the timing enough to suppress path-A reliably. Alternatives:

  1. dtrace pid provider on devicectl: dtrace -n 'pid$target:libxpc:_xpc_connection_mach_event:entry { @[arg0, arg1] = count(); }' (run with -p <pid> or -c <cmd> so that $target resolves). Lower per-probe overhead (no SIGSTOP); records arg0=conn and arg1=reason as aggregation keys in DTrace's in-kernel buffers. Bypasses lldb's per-hit Python-callback latency.

  2. CDS-side inject capture — extend GAMBIT logging in iosmux_inject_dylib to log every mach msg emitted via xpc_connection_create_from_endpoint peer (destination port name, msgh_id, msgh_bits). Sidesteps devicectl-side observation entirely. Correlates with devicectl os_log stream timing for indirect proof of path-A trigger.

  3. os_log predicate stream with subsystem:com.apple.dt.coredevice category:useassertion at debug level (log stream --level debug), which captures internal trace points that do not print at the default level. May reveal whether the side-channel peer was created in path-A vs not in path-B without lldb in the loop.

Strongest combination: (1) dtrace + (3) os_log, both running concurrently with a normal xcrun devicectl device info details invocation, no lldb. dtrace's lower overhead preserves path-A timing; os_log gives userspace-level observability without affecting trace.
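A sketch of how the two observers in the combined setup could be assembled as argument vectors, without launching anything. build_observers is a hypothetical helper; the dtrace probe text is the one from option 1, and the subsystem and category strings are taken from the [com.apple.dt.coredevice:useassertion] prefix of the logged lines. It assumes dtrace is run as root with -c, and that xcrun execs devicectl in place so that $target binds to the devicectl process; if that does not hold, the resolved devicectl path would have to be passed instead.

```python
import shlex

# Probe text as given in the D36 recommendation list.
DTRACE_PROBE = (
    "pid$target:libxpc:_xpc_connection_mach_event:entry "
    "{ @[arg0, arg1] = count(); }"
)


def build_observers(target_argv, subsystem, category):
    """Return (dtrace_argv, log_argv) for one lldb-free observation run."""
    # -c makes dtrace launch the command itself and bind $target to its PID.
    dtrace_argv = ["dtrace", "-n", DTRACE_PROBE, "-c", shlex.join(target_argv)]
    predicate = f'subsystem == "{subsystem}" AND category == "{category}"'
    log_argv = ["log", "stream", "--level", "debug", "--predicate", predicate]
    return dtrace_argv, log_argv


dtrace_argv, log_argv = build_observers(
    ["xcrun", "devicectl", "device", "info", "details"],
    subsystem="com.apple.dt.coredevice",
    category="useassertion",
)
print(shlex.join(dtrace_argv))
print(shlex.join(log_argv))
```

Starting the log stream before the dtrace-launched command keeps the early [useassertion] lines from being missed; the argv lists can be handed to subprocess.Popen unchanged.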

Apparatus integrity (D34 + D35a both non-destructive)

  • iosmux_inject.dylib sha unchanged (52df2cc6...4803).
  • CoreDevice sha unchanged (bea205e2...0495).
  • devicectl sha unchanged (4fede2dd...bf6c).
  • iPhone (iosmux) connected (no DDI).
  • CDS PID stable at 1013 across both probes.
  • Zero new diagnostic reports.

Distillation status

D34 raw notes (~2.9 KB stub, §A only populated; rest analyzed inline in dispatcher chat) and D35a raw notes (notes/d35a-libxpc-cancel-trigger.md, ~17 KB, §A-§H populated, §I PENDING) deleted at distillation time per feedback_notes_are_temporary_buffer. Empirical findings + path-A vs path-B distinction + Heisenbug observation + D36 method recommendations preserved in this section + the Q-D66-15 D34/D35a Resolution log entry in docs/plans/d66-research-questions.md.

See also