Open-Frontier Register

All addresses, counts, and sidecar figures on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id md5 89edbbe81c5b328a958fe628a9f2207d, 781,691,048 bytes, ELF x86-64, not stripped). Other versions will differ.

Abstract

Every page in this book asserts things about an unstripped 745 MB binary, and a reconstruction is only auditable if it states its own edges as precisely as its interior. This appendix is that edge: the honest catalog of what is not yet fully resolved, kept as a live register rather than a disclaimer. It is the inverse of the rest of the wiki — instead of "here is the algorithm," each entry is "here is the question, here is exactly what evidence would close it, and here is the page that would own the answer."

The frontier sorts into five categories, and the distinction matters because they close by different means. Decompilation walls are functions Hex-Rays declined to lift: most are import stubs with no body to recover, nine are hand-written assembly with no C source, and a residual ~21 are genuine code that a manual disassembly pass would close. Hardware-dependent facts are values that static analysis structurally cannot confirm because they only exist on a powered TPU: runtime-populated framework vtable slots, live telemetry counters, flag defaults that resolve against device state. Per-gen data gaps are constants for older codenames that ship in one proto form but not another. Inferred-link items are edges the wiki traced by name-family agreement but did not byte-confirm at the leaf. The named open questions were five specific tasks (#1092, #1096, #1171, and the P-3-478..482 SparseCore/DMA cluster): a recovery pass closed the #1092/#1096/#1171 trio directly from the decompile (two of them surfacing CORRECTIONs), and a later targeted-disassembly batch closed the P-3-478..482 cluster (three items byte-traced, one a confirmed non-edge), so all five are now resolved.

A register is only credible if it is also a graveyard for closed items. Several once-open claims were not just left open — they were overturned by later analysis, and those CORRECTIONs are recorded here as CLOSED-by-correction to prove the register reflects current state, not the initial scratch hypotheses. The trailing-zstd blob, the "walrus" pass driver, and the naive demangle-rate count are the worked examples.

For an auditor, the contract of this page is:

  • The exact failure floors — the 516 decompilation refusals and the 7,915 analysis problems, broken to the structural cause, so a reader knows which are knowledge gaps and which are noise.
  • The closeability grade per item — a Confidence column that, uniquely on this page, grades how confidently the gap can be closed and by what evidence, not how confident the current claim is.
  • The owning page per item — every open question routes to the page that would absorb its resolution, so closing the frontier is a navigable task list, not a wish.
Decompilation failures | 516 (no cfunc); owned by methodology-deep
(of which) import/data stubs | 486 (0x22860108–0x228611xx thunk band; not a knowledge gap)
(of which) hand-written assembly | 9 (BoringSSL bignum/MD5, dnnl JIT kernels; no C source exists)
(of which) template/codegen giants | ~21 (the genuine residual wall)
Analysis problems | 7,915 (6 types; final, at 4,188, dominates)
Named open tasks | All five resolved: the #1092/#1096/#1171 trio by a recovery pass, the P-3-478..482 SC/DMA cluster by a targeted-disassembly batch (now CLOSED)
CLOSED-by-correction | trailing-zstd blob · "walrus" · demangle-rate · per-gen geometry source
Confidence column semantics | closeability of the gap, not certainty of a present claim

The Frontier Register

The table below is the master index. Each row is one frontier item; the Confidence column grades how confidently the gap can be closed given the evidence named in the blocking-evidence column. HIGH means a bounded manual pass over identified addresses closes it; LOW means closing it needs evidence the binary does not contain (a powered device, a newer build). Rows that read CLOSED are kept to show the register is live: closed-by-correction rows are detailed in §CLOSED-by-Correction, and closed-by-recovery rows on their owning pages.

Open item | Category | Blocking evidence to close it | Owning page
~21 template/codegen functions with no cfunc | Decompile wall | Manual disasm of named addresses; raise lift budget | methodology-deep
dnnl JIT + BoringSSL asm stubs unrecoverable as C | Decompile wall | None: assembly with no C source; read the disasm | embedded-library-atlas
PJRT vtable slots populated by framework at Create | HW-dependent | A live PJRT_Client_Create trace on a TPU | client-and-device
FLAGS_enable_runtime_uptime_telemetry live values | HW-dependent | On-device runtime; telemetry is read, not stored | stream-executor-pjrt-adapter
Flag defaults that resolve against device state | HW-dependent | Device-resident config; static default may be a sentinel | flag-catalog-full
chip_config (driver-side) vs chip_parts (geometry) split | CLOSED (recovered) | Recovered: kChipConfigAliases (0x2200b8b0) is a static 4-entry flat_map keyed by TpuVersion {2,3,4,5} (variant "default"); v4/v5 share one alias sub-map. The per-TpuVersion consumer split is fully static, not device-probed | per-gen-comparison-matrix
issue_latency_cycle_count absent in every embedded blob | Per-gen gap | A build whose chip_parts populates field 4 | per-gen-comparison-matrix
834 stream-op per-leaf (pattern, verb, dtype, space) opcode | CLOSED (negative result) | No per-leaf static table exists: the slot command is runtime-assembled in SparseCoreStreamEncoder::Encode (0x1eb9b4c0) from 4 orthogonal proto bitfields: form (bits 53–58: linear 0x3b / strided 0x3a / indirect 0x39), verb (dword[+0x18]>>9)&7, dtype (>>0xc)&1, memspace (qword[+0x10]>>0x2f)&7. Was open as INFERRED; now proven composed, not per-leaf | llvmtpu-intrinsic-table
890 default-builder ops' exact arity + result predicate | RECOVERED (arity) / NEGATIVE (predicate) | Arity byte-read from each op's mangled Op<…OneResult/ZeroResults…NOperands<Lj N>…> trait pack: (#res, #operands) shape for 1060 of 1356, full census tabulated. Result predicate: no per-op Vreg/Mask/Scalar/Ptr at the MLIR layer; all are generic OneTypedResult<mlir::Type> + one shared isCompatibleOuterType | llvmtpu-intrinsic-table
Per-intrinsic LLVM IntrProperties bits | RECOVERED | Decoded: set = IntrinsicsToAttributesMap[ID−1] >> 9 (@0x416fb30); all 12 fn-attr sets byte-decoded, per-set census exact (sums to 1356), 16-leaf sample byte-verified; per-leaf for the 1340 others is a deterministic one-halfword lookup | llvmtpu-intrinsic-table
#1092 structured-sparsity slot encoding | CLOSED (recovered) | Recovered: there is no dedicated sparsity bundle slot; sparsity rides in the packed MXU operand layout (SparsityConfig, 1:N restriction, SME outer-product gate) | slot-sparsity-v5plus
#1096 per-gen NOP canonical templates | CLOSED (recovered) | Recovered: two orthogonal no-ops, an empty-slot predicate kNeverExecute=31 fill and an opcode-space all-ones NOP (CORRECTION NOP-1: the default bundle halts) | nop-canonical
#1171 TpuVersion-aware flag-prefix dispatch | CLOSED (recovered) | Recovered: codename-prefixed flags are registered unconditionally and applied gen-blind (CORRECTION DISPATCH-1); the active gen selects only data/codec, not flag gating | flag-prefix-dispatch
P-3-478 InitializeOnScs lookup-callback edge | CLOSED (recovered) | Recovered: InitializeOnScs (0x1337aa60) folds the core index, then call *0x98(%rbx) dispatches into ExplicitRingRecord (0x133a9a40) / ExplicitAllToAllRingRecord (0x133a94a0), writing (ordinal, next_chip, reorder) to strategy +0x58/+0x60/+0x78 | tensorcore-barrier
P-3-480 LatencyTable::Create(TpuVersion) factory tail | CLOSED (recovered) | Recovered: not a cmp-switch but a flat ordinal-indexed inlined_vector<AnyInvocable> registry (0x225799f8) populated by five static-init TU initializers; registry[version](entry) via call *0x18, per-gen invoker→ctor byte-traced | cycletable-family
P-3-481 SetLatchIndices per-gen overrun handshake | CLOSED (recovered) | Recovered: the gate is idx==0 && !GainLatchModeHasOverrunChecks(latch_mode) (vtable +0x358); only ViperfishTarget::HasMsrOverrunChecks (0x1d49aac0 = mov $1) returns TRUE, so only v5 indexes the first latch | latch-assignment-overrun
P-3-482 cmem-load / sparsity DMA edge | CLOSED (non-edge) | Non-edge: EmitVectorCmemLoad (0x14120a40) calls only SlotMap binders, with no DMA/sparsity callee. Sparsity is v5+ SparseCore (gxc::*); v4 pxc::isa has no sparsity codec; the two never co-occur | slot-cmem-load-pf
trailing-zstd blob → per-codename constants | CLOSED | Overturned: no blob exists | trailing-zstd-blob
"walrus" pass-pipeline driver | CLOSED | Overturned: zero occurrences in binary | glossary
naive _Z-prefix demangle rate (98%) | CLOSED | Overturned: field-backed rate is 93.0% | methodology-deep
7 illegal_addr analysis problems | CLOSED (triaged) | Triaged: 6 are an intentional call *0x10 near-null trap in gRPC filter epilogues, 1 is a .rodata jump table IDA misaligned into; no reloc, no out-of-segment data ref | §The 7 illegal_addr Anomalies

NOTE — the Confidence column on this page is deliberately not the four-level behavioral scale that evidence-conventions defines for the rest of the book. Here it grades the tractability of closing the gap: HIGH/MEDIUM items are bounded static work this corpus already contains the inputs for; LOW items need a powered device or a different build; CERTAIN (won't improve) items are at their permanent floor.


Decompilation Walls

Purpose

The headline floor is 516 functions for which Hex-Rays returned no cfunc — 0.058% of the 884,832 recovered functions. Read naively, "516 functions did not decompile" sounds like a 516-function knowledge gap. It is not. The methodology-deep gap audit owns the full taxonomy; this section's job is to state which of the 516 are frontier (worth re-attacking) and which are permanently at the floor.

The Three Structural Causes

A breakdown of the 516 by address band and demangled name puts every refusal into one of three buckets:

516 decompilation refusals (no cfunc)
├─ 486  import / data stubs        0x22860108–0x228611xx (one shard)
│        strlen · free · getenv · abort · __cxa_finalize · __tls_get_addr
│        MallocExtension_Internal_* · sched_getcpu · eventfd
│        → PLT/GOT-style thunks; no local body exists to lift. NOT a gap.
├─   9  hand-written assembly      0x206ee040–0x2071e720 · 0x1b012c00 …
│        bn_sqr8x_mont · bn_power5_nohw · bn_mulx4x_mont_gather5 · md5_sha1_final
│        dnnl jit_avx512/jit_uni convolution kernels
│        → assembly with no C source; read the disasm, never the cfunc. Floor.
└─ ~21  template / codegen giants  the genuine residual wall
         mlir::Dialect::addOperations<…1000+ TF ops…>  @ 0xfedc180
         xla::jellyfish::ReduceEmitter::EmitReduction  @ 0x13e16240
         llvm::LiveIntervals::computeVirtRegInterval   @ 0x18e601e0
         llvm::X86II::getMemoryOperandNo · AtomicExpandImpl::run · RegAllocEvictModel
         xla::{viperfish,ghostlite}::*Performance ctors · DummyAlias*Printer
         → real code Hex-Rays declined on lift budget; a manual pass closes these.

GOTCHA — the dominant 486 are import thunks in a single contiguous shard at 0x2286xxxx. A reader who treats "516 failures" as 516 missing functions over-states the gap by ~24×. The actual frontier is the ~21 template/codegen functions; the 486 thunks resolve trivially by their symbol name and the 9 asm stubs are at their permanent floor. Cite the ~21, not the 516, as the recoverable wall.
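The arithmetic behind this GOTCHA can be checked from the counts on this page. A minimal sketch: the dictionary restates the three buckets above, and ~21 is taken as exactly 21 for the ratio (an assumption, since the source gives it only approximately).

```python
# Bucket counts for the 516 no-cfunc refusals, as quoted on this page.
# "~21" is approximate in the source; 21 is assumed here for the arithmetic.
REFUSALS = {
    "import/data stubs": 486,
    "hand-written assembly": 9,
    "template/codegen giants": 21,
}
RECOVERED_FUNCTIONS = 884_832


def refusal_summary(buckets, total_functions):
    """Return (total refusals, share of all functions in %, overstatement factor)."""
    total = sum(buckets.values())
    share_pct = 100.0 * total / total_functions
    # Overstatement: reading every refusal as a gap vs. only the genuine residual.
    overstatement = total / buckets["template/codegen giants"]
    return total, share_pct, overstatement


total, share_pct, overstatement = refusal_summary(REFUSALS, RECOVERED_FUNCTIONS)
print(f"{total} refusals = {share_pct:.3f}% of recovered functions")
print(f"counting all {total} as gaps overstates the residual wall {overstatement:.1f}x")
```

With these inputs the buckets sum to 516, the share is 0.058%, and the factor comes out near 24.6, in line with the ~24× quoted above.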

What Closes Each Bucket

Bucket | Count | Closeable by
Import / data stubs | 486 | Already closed by symbol name; no decompilation owed
Hand-written assembly | 9 | Reading the disassembly; the cfunc will never exist
Template / codegen giants | ~21 | Manual disasm pass over the named addresses; some lift with a raised budget

QUIRK — the largest single refusal, mlir::Dialect::addOperations<…> at 0xfedc180, is one C++ call that registers the entire TensorFlow MLIR op set: over a thousand op classes as template arguments in a single statement. It is not algorithmically interesting; recovering it yields a flat registration list, not logic. It sits on the frontier only because the decompiler refused it, not because a reimplementer needs its body. The jellyfish::ReduceEmitter::EmitReduction refusal (0x13e16240) is the inverse: genuinely interesting reduction-emission logic behind a deeply nested btree_map signature, and the one template-explosion refusal worth a targeted manual pass. That pass is done. A windowed disassembly of 0x13e16240–0x13e16720 recovered it as a thin axis dispatcher (validate → two axis bits → tail-call one of five specialized emitters), written up in fusion-patterns § ReduceEmitter::EmitReduction. The TPU-IP half of the residual wall is therefore closed; what remains is registration boilerplate (addOperations) and upstream-LLVM/XLA giants whose logic is recoverable but not TPU-specific.

The Zero-Output Floor — 5 Named Addresses (CLOSED)

The 516 "no cfunc" count is the set where Hex-Rays produced some output but no clean C tree. A strictly harder subset is the artifact-level floor: addresses for which IDA wrote no decompiled file at all. For libtpu.so that set is exactly five named addresses, and naming them resolves the question of whether the floor hides any TPU IP. It does not — every one is third-party:

Address | nm symbol | Kind | What it is
0x1ffb8020 | riegeli::TransposeDecoder::Parse | code, 7,424 B | riegeli columnar-transpose record decoder (the trace-container codec, riegeli-trace-container); too large for the lift budget, vendored upstream
0x21055260 | proto2::internal::UntypedMapBase::SpaceUsedExcludingSelfLong | code, 1,984 B | protobuf map-field memory accounting; vendored upstream
0x206a3600 | mlkem::(anon)::fips::ensure_decap_self_test()::$_0::__invoke | code, trivial | ML-KEM FIPS decap self-test lambda (call decap_self_test; test eax; ret); BoringSSL post-quantum crypto
0x206a8000 | (none; 0x1000 into ecp_nistz256_precomputed, 0x206a7000–0x206cc000, 148 KB) | data | NIST P-256 precomputed EC-point table; BoringSSL crypto constant, not code
0x206d2690 | boringssl_self_test_fast()::kTLS10Secret | data | TLS-1.0 KDF known-answer-test secret (the bytes contain the literal TLS10-KDF KAT); BoringSSL self-test constant, not code

GOTCHA — two of the five (0x206a8000, 0x206d2690) are data IDA tried to lift as functions (a NIST P-256 point table and a TLS KDF test vector), so "decompilation failed" is the correct outcome, not a gap. The other three are vendored library code (riegeli decoder, protobuf accountant, a one-line ML-KEM self-test). None is TPU IP. The artifact-level decompile floor of the entire 745 MB binary is two crypto blobs, two upstream library functions, and a crypto self-test stub. Every TPU function therefore produced a decompiled artifact of some kind; the remaining TPU-side lift gaps are the no-cfunc residual above, not this floor.

The 7,915 Analysis Problems

The second published floor is the problems sidecar: 7,915 IDA analysis problems, distinct from decompilation refusals and clustering by type, not by subsystem.

Problem type | Count | What it is | Frontier?
final | 4,188 | Address finalized without full flow resolution | No: analysis bookkeeping
rolled | 1,659 | Instruction rolled into a prior analysis unit | No: bookkeeping
disasm_problem | 942 | A byte span IDA could not cleanly disassemble | Marginal: data-in-code edges
bad_stack | 574 | Stack-pointer delta unresolved at a point | Marginal: affects frame recovery
head_problem | 545 | Instruction-head boundary ambiguity | No: bookkeeping
illegal_addr | 7 | Reference to an address outside any segment | Yes: 7 anomalies worth a look

NOTE — the 7,915 problems are overwhelmingly analysis bookkeeping (final + rolled + head_problem = 6,392, 81%), not knowledge gaps. The only rows a frontier auditor should chase are the 7 illegal_addr anomalies — references that point outside every defined segment, which usually mean either a relocation the loader resolves at runtime or a genuine analysis miss. They are the single most tractable problems-floor item, and they are triaged in full below.
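The same check applies to the problems sidecar: the six type counts should sum to the published 7,915, and the three bookkeeping types should come to roughly 81% of it. A small sketch restating the table above:

```python
# Problem-type counts from the problems sidecar table above.
PROBLEMS = {
    "final": 4_188,
    "rolled": 1_659,
    "disasm_problem": 942,
    "bad_stack": 574,
    "head_problem": 545,
    "illegal_addr": 7,
}
# Types this page classifies as analysis bookkeeping rather than knowledge gaps.
BOOKKEEPING = ("final", "rolled", "head_problem")

total = sum(PROBLEMS.values())
bookkeeping = sum(PROBLEMS[name] for name in BOOKKEEPING)
bookkeeping_pct = 100.0 * bookkeeping / total
print(f"{total} problems, {bookkeeping} bookkeeping ({bookkeeping_pct:.0f}%)")
```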

The 7 illegal_addr Anomalies — Triage (CLOSED)

This meta page owns the triage, since the seven sites are a problems-floor item with no subsystem deep page of their own. Each of the seven instruction addresses lies inside the executable segment (none is out-of-bounds as a code address); what IDA flags is that the operand it decoded at that site references an address outside every segment IDA has defined. The two structural causes, a near-null absolute indirect call and a jump-table-base load IDA misaligned into, are confirmed below, with no remaining knowledge gap.

The segment frame they are measured against (readelf -lW):

LOAD  VirtAddr 0x00000000  FileSiz 0x213f25d0  Flg R E   ← .text lives here (sec [21] 0x0e63c000..0x21217484)
LOAD  VirtAddr 0x215f25e0  FileSiz 0x00a62bc0  Flg RW
LOAD  VirtAddr 0x222551c0  FileSiz 0x0026e6a0  Flg RW
LOAD  VirtAddr 0x22798c30  FileSiz 0x00021c00  Flg RW
Site (instr addr) | Decoded operand | Falls in | Relocation? | Classification
0x20062a67 | call *0x10 (ff 14 25 10 00 00 00) | abs 0x10 → ELF-header bytes at the head of LOAD1's link-time range; outside every IDA segment, and the unmapped null page at runtime | none (readelf -rW empty at site; 0x10 unpatched) | Near-null absolute indirect call; intentional unreachable/abort tail of a fused gRPC filter epilogue
0x2006a8a4 | call *0x10 (identical 7 bytes) | abs 0x10 → no IDA segment | none | same idiom (different filter fusion)
0x20070a95 | call *0x10 (identical) | abs 0x10 → no IDA segment | none | same idiom
0x20074db2 | call *0x10 (identical) | abs 0x10 → no IDA segment | none | same idiom
0x20079a52 | call *0x10 (identical) | abs 0x10 → no IDA segment | none | same idiom
0x2007e7b5 | call *0x10 (identical) | abs 0x10 → no IDA segment | none | same idiom
0xee6c96b | mid-lea of lea -0x4af7817(%rip),%rsi # 0xa375158 | table base 0xa375158 → .rodata (sec [11], +0x1ed5158) | none needed (RIP-relative, no dynamic reloc) | Jump-table dispatch; the flagged byte is inside the lea of a movslq (%rsi,%rdx,4); add %rsi,%rdx; jmp *%rdx switch. IDA failed to follow the computed table; the base is valid

Evidence quoted:

  • The six gRPC sites are byte-identical and unique. A whole-.text disassembly scan finds the ff 14 25 10 00 00 00 / call *0x10 encoding exactly six times: these six sites and nowhere else. Raw bytes at each (file offset = VMA for .text): … 77 | ff 14 25 10 00 00 00 | 48 89 …. The ModRM byte 14 selects call r/m64 with a SIB byte, and the SIB byte 25 (no base, no index) encodes a disp32-absolute memory operand, so the call dereferences fixed linear address 0x10. At link time that address falls in the ELF-header bytes at the head of LOAD1's VirtAddr 0x0 range, where IDA defines no segment, hence illegal_addr; at runtime, because libtpu.so is a shared object mapped at a nonzero base, it is the unmapped null page. readelf -rW lists no relocation at any of the six instruction offsets, so the loader never rewrites the 0x10; it is a compile-time-fixed near-null pointer, the unreachable/crash tail of the heavily templated grpc_core::promise_filter_detail::MapResult<…> filter-fusion combinator. (The R_X86_64_DTPMOD64 rows that surface when grepping 0000000000000010 are TLS-module relocs whose addend column is 0x10; their target VMAs are 0x22048d50… in .got, unrelated to any code reference to absolute 0x10.)
  • The absl site is a jump table, not a bad reference. At 0xee6c968 the instruction is lea -0x4af7817(%rip),%rsi # 0xa375158, immediately followed by movslq (%rsi,%rdx,4),%rdx ; add %rsi,%rdx ; jmp *%rdx — the canonical relative-offset jump-table dispatch in absl::container_internal::raw_hash_set<…tsl::tstring…>::find_large. The flagged address 0xee6c96b is three bytes into that lea (the RIP displacement bytes e9 87 50 fb), so IDA's linear sweep misaligned and then could not resolve the computed jmp *%rdx. The table base 0xa375158 resolves cleanly into .rodata (section [11], 0x84a0000..0xbe8af28, offset +0x1ed5158); nothing points out of segment. This is a recoverable analysis miss, not a knowledge gap.
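Both decodes can be re-derived by hand. A minimal Python sketch; the two helpers are illustrative and handle only the exact forms quoted above. The lea's leading bytes 48 8d 35 are the standard encoding of lea disp32(%rip),%rsi, reconstructed here rather than quoted from the dump; only the displacement bytes e9 87 50 fb come from the evidence.

```python
import struct

MASK64 = (1 << 64) - 1


def decode_call_abs(code: bytes) -> int:
    """Return the absolute address dereferenced by `call *disp32` (FF /2).

    Handles only the form quoted above: no prefixes, ModRM selecting a
    SIB byte, and a SIB byte with no base and no index (pure disp32)."""
    if code[0] != 0xFF:
        raise ValueError("not an FF-group opcode")
    modrm, sib = code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (mod, reg, rm) != (0, 2, 4):  # reg=2 is CALL r/m64; rm=4 means SIB follows
        raise ValueError("not call r/m64 with SIB addressing")
    index, base = (sib >> 3) & 7, sib & 7
    if (index, base) != (4, 5):  # index=4: none; base=5 with mod=0: disp32 only
        raise ValueError("SIB form uses a base or index register")
    (disp32,) = struct.unpack_from("<i", code, 3)
    return disp32 & MASK64  # disp32 is sign-extended to 64 bits


def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """RIP-relative operands are relative to the *next* instruction's address."""
    return (insn_addr + insn_len + disp32) & MASK64


CALL_BYTES = bytes.fromhex("ff 14 25 10 00 00 00")
LEA_ADDR = 0xEE6C968
LEA_BYTES = bytes.fromhex("48 8d 35 e9 87 50 fb")  # lea disp32(%rip),%rsi

call_target = decode_call_abs(CALL_BYTES)
(lea_disp,) = struct.unpack_from("<i", LEA_BYTES, 3)  # disp32 starts 3 bytes in
table_base = rip_relative_target(LEA_ADDR, len(LEA_BYTES), lea_disp)
flagged_byte = LEA_ADDR + 3  # where IDA's linear sweep landed

print(hex(call_target), hex(lea_disp), hex(table_base), hex(flagged_byte))
```

This prints 0x10 for the call, -0x4af7817 for the displacement, 0xa375158 for the table base, and 0xee6c96b for the flagged byte, matching both bullets.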

GOTCHA — none of the seven is a relocation the loader resolves at runtime (the first of the two explanations the NOTE above proposes): readelf -rW is empty at every site, and the only candidate target needing a fixup, absolute 0x10, is deliberately left unpatched. Six are an intentional near-null indirect-call trap inside a gRPC template epilogue; one is IDA misreading a .rodata jump table. All seven close with zero residual gap: the illegal_addr floor is fully accounted for and needs no on-device evidence.


Hardware-Dependent Facts

Purpose

A second class of frontier item cannot be closed by any amount of static work, because the fact does not exist in the binary — it only exists on a powered TPU. These items are flagged here precisely so no future page treats a static placeholder as the real value. The closeability Confidence for every item in this category is LOW: closing them needs a live device, which is outside the static-RE method this book is built on.

Runtime-Populated Framework Vtable Slots

libtpu.so is a PJRT plugin exporting one C symbol, GetPjrtApi, returning a 140-slot PJRT_Api vtable (API & vtable Reconstruction). Of those slots, five are injected by CreatePjrtApi's caller and four of those carry genuine TPU-specific code (slots 8, 15, 87, 103, all tpu_plugin::-namespaced); all are statically resolvable — slot 15 points at tpu_plugin::PJRT_Client_Create (0xe6a8840), confirmed in the binary. But the contents of the PJRT_Client object that Create builds — the per-device handles, the tpu::System* shared pointer, the wired device/memory/topology slots — are populated at runtime by probing hardware. The static binary shows the construction code (client-and-device); it cannot show the resulting object graph, because that depends on how many cores the device enumerates and what tpu::System::Initialize (0x1d0ae420) discovers.

NOTE — the wiki traces PJRT_Client_Create → GetTpuPjRtClient (0xf8008c0) → xla::TpuClient construction fully as code. What it cannot trace is the post-construction state: how many xla::TpuDevice objects exist, what each TpuCoreLocation resolves to, what the throttle Semaphore permits. Those are runtime facts. A reimplementer rebuilding the client from the code path will get the shape right; the population is a live-trace question.

Live Telemetry Values

The construction path reads FLAGS_enable_runtime_uptime_telemetry and, when set, merges uptime telemetry into the client config. The flag's existence and its gate are static; the telemetry values it streams are runtime counters with no static representation. Any page describing telemetry content is describing a wire schema, not observed values.

Flag Defaults That Resolve On-Device

The flag catalog recovers each flag's static default from the binary. A subset of flags carry a static sentinel default that the runtime overrides against device state at boot (e.g. a -1/0 placeholder that means "ask the hardware"). For those, the static default is not the effective default. The frontier item is: which flags carry sentinels, and what device-state rule resolves each. This is closeable only with a device, hence LOW.

GOTCHA — a reimplementer who reads a flag's static default as its effective default will mis-configure any flag whose real value is device-derived. The static catalog is correct about what the binary stores; it is silent about what the runtime substitutes. Treat any default that looks like a sentinel as unresolved until confirmed on hardware.
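The conservative rule in this GOTCHA can be sketched as a triage pass over a flag catalog. Everything here is hypothetical: the flag names, their defaults, and the set of values treated as "looks like a sentinel" are assumptions of the sketch, not data from flag-catalog-full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagDefault:
    name: str
    static_default: object  # the default as stored in the binary


# Assumption: which stored values "look like" a sentinel is a triage
# heuristic chosen for this sketch; the binary does not label sentinels.
SENTINEL_VALUES = (-1, 0, "")


def looks_like_sentinel(value) -> bool:
    # bool is excluded because False == 0 in Python and would otherwise match.
    return not isinstance(value, bool) and value in SENTINEL_VALUES


def triage(flags):
    """Split flags into (static default usable as-is, unresolved until on-device)."""
    trusted, unresolved = [], []
    for flag in flags:
        bucket = unresolved if looks_like_sentinel(flag.static_default) else trusted
        bucket.append(flag.name)
    return trusted, unresolved


# Hypothetical flag names and defaults, for illustration only.
catalog = [
    FlagDefault("hypothetical_num_cores", -1),
    FlagDefault("hypothetical_timeout_ms", 5000),
    FlagDefault("hypothetical_enable_feature", False),
    FlagDefault("hypothetical_device_path", ""),
]
trusted, unresolved = triage(catalog)
print("trusted:", trusted)
print("unresolved:", unresolved)
```

The heuristic over-flags on purpose (a genuine default of 0 lands in the unresolved bucket), which matches the advice to treat sentinel-looking defaults as unconfirmed until checked on hardware.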


Per-Gen Data Gaps

Purpose

Older-codename constants are a frequently-assumed gap that turns out to be mostly closed — and the register's job is to say so precisely rather than leave a vague "older gens may be incomplete." The genuine residual is narrow.

What Is Already Closed

The hinted gap — "older codenames ship only as chip_configs, not chip_parts" — does not hold for this build. The per-gen comparison matrix confirms that all nine <codename>_chip_parts.binarypb blobs (v2 through v7) are embedded contiguously in .rodata at 0xbdf29a0.., parsed by TpuChipParts::DefaultsForVersion (0x20b1b040). The older-gen lane/sublane/MXU/memory constants are therefore proto-sourced and materializable, not inferred. The HBM/VMEM/SMEM/SFLAG bytes, the MXU VectorIsa, and per-gen frequencies all decode straight from the wire bytes.

CORRECTION (PGM-1, summarized) — the per-gen geometry is not written by tpu::TpuChipConfig::Create (0x20ae98e0), as an early framing assumed. TpuChipConfig::Create builds the driver-side memory/queue layout via the kChipConfigAliases flat-map (0x2200b8b0); the lane/sublane geometry is a TpuChipParts/TpuTopology property. This correction is owned in full by per-gen-comparison-matrix; it is logged here as a CLOSED per-gen item so this register reflects the resolved state.

The Genuine Residual

Residual gap | Why it is open | Closeable by
chip_config (driver) vs chip_parts (geometry) consumer split | RECOVERED: kChipConfigAliases is a static 4-entry flat_map<TpuVersionAndVariant, MapView> keyed by TpuVersion {2,3,4,5} (all variant "default"); v4/v5 share one sub-map; alias vocab default/legacy/megacore/megachip/megachip_tccontrol | The per-TpuVersion split is static, not device-probed; only the type-erased MapView key→value direction is residual
issue_latency_cycle_count (VectorIsa field 4) | Absent (proto default 0) in every embedded blob; real per-gen issue latency lives in the cost-model Performance grids, not chip_parts | A build whose chip_parts populates field 4, which may never exist

GOTCHA — issue_latency_cycle_count is the trap. The field exists in the chip_parts schema but is never populated in this build (it reads as proto default 0 for all gens). A reimplementer must not read MXU/VPU issue latency from chip_parts; the live value is queried through CycleTable::GetCyclesForThroughput (per-gen vtable slot +0x10). The chip_parts zero is not the answer — it is the absence of an answer.
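The trap is that an absent scalar and a stored zero read the same through a generated getter, which returns the field's default (0) when nothing is on the wire. A minimal wire-level sketch that keeps the two apart; the toy messages are made up, and only the field number 4 comes from this page.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def scalar_fields(buf: bytes) -> dict:
    """Map field number -> value for the varint (wire type 0) fields of one
    flat message; other wire types are skipped. A sketch, not a full decoder."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 7
        if wire_type == 0:
            fields[field_no], pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit fixed
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:  # 32-bit fixed
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


ISSUE_LATENCY_FIELD = 4  # "VectorIsa field 4" per the table above

# Toy messages, not bytes from any chip_parts blob.
unset = bytes([0x08, 0x08])                  # field 1 = 8; field 4 absent
populated = bytes([0x08, 0x08, 0x20, 0x03])  # field 1 = 8; field 4 = 3

naive = scalar_fields(unset).get(ISSUE_LATENCY_FIELD, 0)  # what a getter reports
honest = scalar_fields(unset).get(ISSUE_LATENCY_FIELD)    # None: no answer
present = scalar_fields(populated).get(ISSUE_LATENCY_FIELD)
print(naive, honest, present)
```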


Inferred-Link Items

Purpose

The fourth category is edges the wiki traced by name-family agreement — confirming the class→engine mapping and locating the lowering pass — but stopping short of byte-dumping the leaf encoding. These are graded INFERRED or I on their owning pages; they are genuine frontier because the conclusion is trustworthy while the exact leaf is not yet re-verified.

The Intrinsic→ISel Leaf Gaps

The LLVMTPU intrinsic table recovers all 1356 distinct llvm.tpu.* intrinsics two independent ways. Three leaf-level facts were originally left inferred; later passes have since resolved all three, partly as negative results:

Inferred link | What is known | What is not yet byte-confirmed
834 stream ops | RESOLVED (negative result): class→engine + lowering located; encoder Encode (0x1eb9b4c0) 11-arm proto-oneof + 4 byte-verified field accessors | The numeric command is runtime-composed from (form, verb, dtype, memspace) bitfields; there is no static per-leaf command table to dump
890 default-builder ops | RECOVERED: (#res, #operands) byte-read from the Op<…> trait pack for 1060/1356 (full census tabulated); stream split 6/8/9, DMA 11/12-iova corrections surfaced | Result TypeConstraint is a negative result: no per-op register-class predicate at the MLIR layer (all OneTypedResult<mlir::Type> + isCompatibleOuterType); refinement lives only in the downstream LLVM intrinsic signature
Per-intrinsic IntrProperties | RECOVERED: the IntrNoMem/IntrArgMemOnly/IntrWillReturn/… bits via set = IntrinsicsToAttributesMap[ID−1] >> 9 (@0x416fb30); all 12 fn-attr sets byte-decoded, census exact (1356), 16-leaf sample byte-verified | Per-leaf set for the 1340 non-sampled IDs not individually transcribed (deterministic one-halfword lookup)
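The per-leaf lookup that the third row calls deterministic is small enough to sketch. This assumes the map is an array of little-endian uint16 entries (the "one-halfword lookup" above) and uses a made-up four-entry table in place of the real one.

```python
import struct
from collections import Counter


def fn_attr_set(table: bytes, intrinsic_id: int) -> int:
    """Function-attribute set for one intrinsic, per the recovered formula
    set = IntrinsicsToAttributesMap[ID - 1] >> 9, reading the map as
    little-endian uint16 halfwords. IDs are 1-based because LLVM reserves
    ID 0 for not_intrinsic."""
    if intrinsic_id < 1:
        raise ValueError("intrinsic IDs start at 1")
    (entry,) = struct.unpack_from("<H", table, 2 * (intrinsic_id - 1))
    return entry >> 9


def census(table: bytes) -> Counter:
    """Count how many intrinsics fall into each fn-attr set."""
    count = len(table) // 2
    return Counter(fn_attr_set(table, i) for i in range(1, count + 1))


# Hypothetical 4-entry map (the real one sits at 0x416fb30 in this build).
# The low 9 bits of each halfword are not used by this lookup.
TABLE = struct.pack("<4H", (3 << 9) | 0x05, (0 << 9) | 0x12, (11 << 9) | 0x1FF, 3 << 9)

print([fn_attr_set(TABLE, i) for i in range(1, 5)], census(TABLE))
```

Over the real table, the census over all 1356 IDs is the per-set count this row reports as summing exactly to 1356.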

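QUIRK — the 834-way stream-op explosion is the encoding: there is no single parameterized llvm.tpu.stream op. The frontier was originally framed not as "find the parameterization" (there is none at the intrinsic level) but as "byte-dump the matcher arm for each of the 834 leaves": bounded work, but 834 leaves of it, which is why its closeability was graded MEDIUM rather than HIGH despite the path being known. The Encode decode closed it as a negative result instead: below the intrinsic layer the numeric command is composed at runtime from four orthogonal bitfields, so there is no per-leaf table to dump.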
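What closed this item is that the command is computed, not looked up. A sketch of the field extraction, under loud assumptions: the record is a made-up byte buffer standing in for the proto object Encode reads, dtype is assumed to come from the same dword as verb, and only the offsets, shifts, and masks come from the register table.

```python
import struct

# Form codes quoted for bits 53-58 of the composed command.
FORM_CODE = {"linear": 0x3B, "strided": 0x3A, "indirect": 0x39}
FORM_SHIFT, FORM_BITS = 53, 6


def stream_fields(record: bytes):
    """Pull (verb, dtype, memspace) out of an encoder input record using the
    quoted accessor offsets: a qword at +0x10 and a dword at +0x18.
    Assumption: dtype is read from the same dword as verb."""
    (qword_10,) = struct.unpack_from("<Q", record, 0x10)
    (dword_18,) = struct.unpack_from("<I", record, 0x18)
    verb = (dword_18 >> 9) & 0x7
    dtype = (dword_18 >> 0xC) & 0x1
    memspace = (qword_10 >> 0x2F) & 0x7
    return verb, dtype, memspace


def form_bits(form: str) -> int:
    """Place the 6-bit form code at bits 53..58 of a 64-bit command word.
    Where verb/dtype/memspace land in the command is not quoted on this page,
    so this sketch places only the form."""
    return (FORM_CODE[form] & ((1 << FORM_BITS) - 1)) << FORM_SHIFT


# A made-up 0x20-byte record with verb=3, dtype=1, memspace=5.
record = bytearray(0x20)
struct.pack_into("<Q", record, 0x10, 5 << 0x2F)
struct.pack_into("<I", record, 0x18, (3 << 9) | (1 << 0xC))

print(stream_fields(record), hex(form_bits("indirect")))
```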

A Byte-Confirmed Counter-Example

Not every intrinsic→ISel link stayed inferred. The SparseCore addrspacecast family (16 ops) was traced end-to-end and corrected a common assumption:

CORRECTION (INTR-2, summarized) — the SparseCore addrspacecast intrinsics do not lower to ISD::ADDRSPACECAST (0xf4) nodes; they survive as LLVM-IR intrinsic calls absorbed by the consuming SC load-store ISel. A whole-.text xref placed every 0xf4 constructor caller in generic LLVM, none in the TPU/SC bands. This is owned by addrspacecast-isel; it is cited here as the worked example of an inferred link that closure upgraded into a correction — the frontier is not only "fill gaps," it is "re-verify and overturn."


Named Open Questions

Purpose

This band was originally five O-graded tasks whose deep pages existed as stubs with no completed raw-findings file. A later recovery pass closed the #1092/#1096/#1171 trio directly from the decompile — their deep pages are now full, and two of them surfaced CORRECTIONs in the process. A subsequent targeted-disassembly batch then closed the P-3-478..482 SparseCore/DMA cluster as well (three recovered byte-exact, one a confirmed non-edge), so all five named questions are now resolved. The table below records all five for the audit trail.

The Five Tasks (all now resolved)

Task | Topic | Owning page (current state) | What closed it
#1092 | Structured-sparsity slot encoding (v5+) | slot-sparsity-v5plus (RECOVERED) | Closed: no dedicated slot; sparsity lives in the packed MXU operand layout
#1096 | Per-gen NOP canonical templates | nop-canonical (RECOVERED) | Closed: predicate kNeverExecute=31 fill + all-ones opcode NOP (CORRECTION NOP-1)
#1171 | TpuVersion-aware flag-prefix dispatch | flag-prefix-dispatch (RECOVERED) | Closed: flags registered gen-blind (CORRECTION DISPATCH-1)
P-3-478..482 | SparseCore / DMA edges (cluster) | barrier / cost / sched / isa pages (RECOVERED) | Closed: callbacks/factory/overrun byte-traced; the cmem-load/sparsity "edge" is a confirmed non-edge

The P-3-478..482 cluster was a band of related SparseCore and DMA edges feeding completed pages; it is now fully resolved. P-3-478, the InitializeOnScs lookup-callback target, is byte-traced to call *0x98 → ExplicitRingRecord/ExplicitAllToAllRingRecord writing (ordinal, next_chip, reorder) (tensorcore-barrier). P-3-480, the tail of LatencyTable::Create(TpuVersion), is recovered as a flat ordinal-indexed AnyInvocable registry populated by five static-init TU initializers (cycletable-family). P-3-481, the SetLatchIndices per-gen overrun handshake, gates on ViperfishTarget::HasMsrOverrunChecks (the only gen returning TRUE), so only v5 indexes the first latch (latch-assignment-overrun). P-3-482, the supposed cmem-load → sparsity DMA edge, is a confirmed non-edge: the v4 cmem-load emit chain touches no DMA or sparsity path, and sparsity is a disjoint v5+ SparseCore ISA (slot-cmem-load-pf).
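Two of the recovered mechanisms, the ordinal-indexed factory registry (P-3-480) and the first-latch gate (P-3-481), reduce to a few lines. A Python sketch of their shape only: the function names, the registry size, and the registered ordinals are hypothetical, and the real registry holds C++ AnyInvocable objects invoked via call *0x18.

```python
from typing import Any, Callable, List, Optional

Factory = Callable[[Any], str]

# Flat registry indexed by TpuVersion ordinal, standing in for the recovered
# inlined_vector<AnyInvocable>; the size is arbitrary for this sketch.
_REGISTRY: List[Optional[Factory]] = [None] * 8


def register(version: int, factory: Factory) -> None:
    """What each static-init translation unit does when the library loads."""
    _REGISTRY[version] = factory


def create_latency_table(version: int, entry: Any) -> str:
    """Factory tail: index by ordinal and invoke, with no compare-and-branch chain."""
    factory = _REGISTRY[version]
    if factory is None:
        raise KeyError(f"no latency table registered for version {version}")
    return factory(entry)


def uses_latch_index(idx: int, has_overrun_checks: bool) -> bool:
    """Inverse of the recovered gate `idx == 0 && !HasOverrunChecks(mode)`:
    latch index 0 is used only when the target reports overrun checks;
    other indices are not gated by this check."""
    return not (idx == 0 and not has_overrun_checks)


# Hypothetical registrations: the ordinals and return values are placeholders.
register(4, lambda entry: f"gen4 table for {entry}")
register(5, lambda entry: f"gen5 table for {entry}")

print(create_latency_table(5, "demo"))
print(uses_latch_index(0, True), uses_latch_index(0, False), uses_latch_index(1, False))
```

The design point the recovery surfaced is visible here: adding a generation means adding one registration in its own translation unit, with no central switch to edit.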

NOTE — #1171 is now CLOSED — flag-prefix-dispatch was recovered and its OPEN banner replaced with a full decode (CORRECTION DISPATCH-1: codename-prefixed flags are registered gen-blind; the active gen drives only data/codec selection). The row is retained here as a closed-by-recovery audit entry, the same way the trailing-zstd and walrus items are kept in the graveyard below.


CLOSED-by-Correction (the Graveyard)

A register that only ever grows is a wish-list, not an audit. These items were open in early analysis and are now closed by correction — later evidence overturned the original claim. They are retained, not deleted, so a reader can see the register reflects current truth rather than initial guesses.

Closed item | Original claim | What overturned it | Owning correction
Trailing zstd blob | ~4.1 MB zstd-dictionary blob appended past EOF at 0x20f99bef, decoding to per-codename HW constants | The offset lies inside .text, ~2.6 MB before its end and far short of EOF; the bytes are an x86-64 mov immediate; no dictionary, no frame, no payload | CORRECTION (ZSTD-01), trailing-zstd-blob
"walrus" pass driver | An IR/compiler pass-pipeline driver named "walrus" | Case-insensitive search of name and string tables returns zero occurrences; the real driver is the HLO pass registry, ungated by any "walrus" symbol | CORRECTION (GLOSS-1), glossary
Demangle rate (98%) | 98% of functions carry a demangled name (from the _Z-prefix count) | The _Z-prefix count overshoots: ~48,500 of the 871,370 _Z names fail to demangle or demangle to themselves; the field-backed rate is 822,847 / 884,832 = 93.0% | CORRECTION (METH-D1), methodology-deep
Per-gen geometry source | Lane/sublane geometry written by tpu::TpuChipConfig::Create | TpuChipConfig::Create builds only the driver-side layout; geometry is a TpuTopology property from the MXU VectorIsa | CORRECTION (PGM-1), per-gen-comparison-matrix
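The two numeric corrections in the first and third rows reduce to arithmetic on figures already on this page. A quick sketch; it assumes file offset equals VMA for .text, as the illegal_addr triage states, and treats the failure count as the difference of the two name counts (approximate, since it assumes every field-backed demangled name is _Z-prefixed).

```python
# Figures for this build, as quoted on this page.
TEXT_START, TEXT_END = 0x0E63C000, 0x21217484  # .text, section [21]
FILE_SIZE = 781_691_048                        # bytes
CLAIMED_BLOB_OFFSET = 0x20F99BEF

# Assumes file offset == VMA for .text.
inside_text = TEXT_START <= CLAIMED_BLOB_OFFSET < TEXT_END
past_eof = CLAIMED_BLOB_OFFSET >= FILE_SIZE
bytes_before_text_end = TEXT_END - CLAIMED_BLOB_OFFSET

Z_PREFIXED, DEMANGLED, FUNCTIONS = 871_370, 822_847, 884_832
naive_rate = Z_PREFIXED / FUNCTIONS        # the overturned ~98% figure
field_backed_rate = DEMANGLED / FUNCTIONS  # the corrected 93.0% figure
failed_demangles = Z_PREFIXED - DEMANGLED  # roughly the "~48,500" in the table

print(inside_text, past_eof, f"{bytes_before_text_end:,} bytes before .text end")
print(f"{naive_rate:.1%} naive vs {field_backed_rate:.1%} field-backed; {failed_demangles:,} failures")
```

The claimed blob offset lands about 2.61 million bytes short of the end of .text and roughly 470 MB short of EOF, which is the cross-check the QUIRK below says was missing.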

QUIRK — the trailing-zstd closure is the strongest argument for keeping a register at all. The original claim had a file offset, a size, a compression format, and a decode target — it was specific enough to sound recovered. It was wrong on every material fact. The defense was a single cross-check: resolve the anchor offset against the symbol map before carving. Every CLOSED row here is a reminder that a precise-sounding claim with one missing cross-check is exactly where a reconstruction goes wrong.


Cross-References