Methodology

Every fact in this book is recovered by static reverse engineering of libtpu.so from the freely-redistributed libtpu-0.0.40-cp314 PyPI wheel (manylinux_2_31_x86_64): a 781,691,048-byte ELF64 shared object, build-id 89edbbe81c5b328a958fe628a9f2207d. The build-id is the unambiguous pin; the wheel's package metadata reports version 0.0.40. Another wheel will differ in every address.

Abstract

This page describes how the entire reconstruction was performed: where the binary came from, what tool read it, what the analysis emitted, how each claim in the rest of the book was cross-checked before it was written down, what could not be recovered and why, how a reader could repeat the work, and the legal basis for doing it. It is the process counterpart to Evidence & Citation Conventions, which defines the callout vocabulary and citation grammar every other page applies.

The whole book derives from one act repeated across nearly 900,000 functions: load a single shared object into a disassembler, let it recover code and data, decompile each function to C, and serialize everything (disassembly, decompiled bodies, control-flow graphs, type information, cross-references) into machine-readable sidecars. No source tree, no debugger, no running TPU, and no internal artifact entered the process. The libtpu.so shipped in this wheel is not stripped: its .symtab survives, so the C++-looking identifiers throughout the book are demangled symbols read out of the binary's own symbol table, not reconstructions. That single fact is what makes a 745 MiB object tractable: most functions arrive pre-named, and the analyst's job is to recover behavior, not names.

The method is adversarial by design. A symbol name is a hypothesis, not a fact; a function called ValidateLength is treated as un-validated until its decompiled body shows the length compare. Every headline claim is re-checked against the decompiled C or the raw bytes, and a conclusion is trusted only when several independent indicators — body, callers, referenced strings, dispatch-table position — agree. The pipeline below is the machinery; the cross-validation discipline is what turns its output into a reference rather than a transcript.

For anyone reproducing the methodology, the contract has five parts:

  • The acquisition path — the exact wheel, what it contains, and why redistribution makes it lawful to analyze.
  • The tool and its passes — IDA Pro 9.x auto-analysis, the Hex-Rays decompiler, and FLIRT library-signature matching, and which of these actually fired on this binary.
  • The sidecar family — the JSON artifacts that hold the recovered facts, with verified counts, so a reimplementer knows what evidence backs the book.
  • The cross-validation rule — multiple independent indicators before a claim is trusted.
  • The published floor — the decompilation failures, analysis problems, and firmware walls that bound what any page can assert.

| Field | Value |
| --- | --- |
| Source artifact | libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64.whl (PyPI, freely redistributable) |
| Analyzed binary | libtpu/libtpu.so: 781,691,048 bytes, ELF64 DYN, x86-64 |
| Build-id | 89edbbe81c5b328a958fe628a9f2207d (NT_GNU_BUILD_ID, md5/uuid form) |
| Stripped? | No; .symtab present, most functions carry a demangled C++ name |
| Tool | IDA Pro 9.x: auto-analysis + Hex-Rays decompiler + FLIRT |
| Recovered functions | 884,832 (per binary metadata) |
| Recovered strings | 1,249,324 |
| Switch tables | 33,016 |
| Per-function artifact files | 884,843 each in context/, decompiled/, disasm/, graphs/ |
| Published limits | 516 decompilation failures · 7,915 analysis problems |

The Pipeline at a Glance

Five stages take the wheel to a wiki page. Each stage's output is the next stage's input; the discipline lives in the fourth.

| Stage | Input | Action | Output |
| --- | --- | --- | --- |
| Acquire | PyPI wheel name | Download, unzip, locate libtpu.so, record build-id + hashes | A pinned 781,691,048-byte ELF |
| Analyze | The ELF | IDA 9.x auto-analysis: code/data recovery, function boundaries, xrefs, type propagation | An IDB with 884,832 functions |
| Extract | The IDB | Hex-Rays decompile + serialize every function and table to JSON sidecars and per-function files | The sidecar family + 884,843×4 per-function artifacts |
| Cross-validate | Sidecars + bodies | For each claim, require the decompiled body or raw bytes to support it; require multiple agreeing indicators | A claim with an address anchor |
| Write | Validated claims | Synthesize into a reimplementation-grade page; mark gaps | A wiki page |

NOTE — the first three stages are mechanical and reproduce byte-for-byte from the same wheel and the same IDA version. The last two are analytical judgment.


Acquisition

The analyzed object is not a leaked or extracted internal build. It is the libtpu.so inside the published Python wheel libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64.whl, the same artifact pip install libtpu retrieves to provide the TPU PJRT plugin. A wheel is an ordinary ZIP archive; unzipping it yields a libtpu/ package directory whose payload is the shared object this book describes.

The wheel directory contains exactly six payload files, all of which are part of the analysis surface:

| File | Size | Role |
| --- | --- | --- |
| libtpu.so | 781,691,048 B | The PJRT plugin: the primary analysis target and the subject of this book |
| sdk.so | 22,541,240 B | A companion shared object, analyzed in parallel (sibling extraction) |
| __init__.py | 1,131 B | Python entry shim that locates and loads the plugin |
| LICENSE | 229 B | The wheel's license file |
| THIRD_PARTY_NOTICES.txt | 731,537 B | Bundled third-party attributions; confirms the open-source components statically linked in |
| SDK_THIRD_PARTY_NOTICES.txt | 103,306 B | The same for sdk.so |
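
Because a wheel is an ordinary ZIP archive, the payload table can be re-derived without unpacking anything. The following is a minimal sketch using only the Python standard library; the function name `list_payload` and the default `libtpu/` prefix are illustrative choices, not part of any tool the book describes.

```python
import zipfile


def list_payload(wheel_path, package_dir="libtpu/"):
    """Return {member name: uncompressed size in bytes} for files under package_dir."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return {
            info.filename: info.file_size
            for info in wheel.infolist()
            # Skip directory entries and anything outside the package directory
            # (for example the wheel's dist-info metadata).
            if info.filename.startswith(package_dir) and not info.is_dir()
        }
```

Run against the pinned wheel, the size reported for `libtpu/libtpu.so` should equal the 781,691,048 bytes listed above; any other value means a different artifact is in hand.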

The identity of the analyzed file is pinned three ways so any reader can confirm they hold the same bytes: the size (781,691,048), the GNU build-id 89edbbe81c5b328a958fe628a9f2207d read from the NT_GNU_BUILD_ID note, and the file's SHA-256. readelf -h reports an ELF64 DYN (shared object) for Advanced Micro Devices X86-64, and readelf -S shows a live .symtab — the basis for the un-stripped claim.
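
The size and SHA-256 pins can be checked in one streaming pass over the file. This is a sketch under stated assumptions: `fingerprint` and `same_bytes` are hypothetical helper names, and the expected SHA-256 must be supplied by the reader, since this page states the size and build-id but does not print the hash value.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a multi-hundred-megabyte object is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_bytes(path, expected_size, expected_sha256):
    """True only when both the size and the SHA-256 match the expected pin."""
    size, sha256_hex = fingerprint(path)
    return size == expected_size and sha256_hex == expected_sha256.lower()
```

For the binary described here, `expected_size` would be `781_691_048`.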

GOTCHA — the build-id is in the linker's md5/uuid form, not the more common SHA-1 form; it is a 16-byte digest, and file reports it as BuildID[md5/uuid]. Do not expect a 40-hex-character SHA-1 build-id here. The 32-hex-character value above is correct.
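
The 16-byte versus 20-byte distinction is visible directly in the note record. The sketch below parses raw ELF note bytes (for example, the contents of the `.note.gnu.build-id` section, however the reader extracts them) using the standard note layout: three 32-bit words (`namesz`, `descsz`, `type`) followed by the name and descriptor, each padded to 4 bytes. The function names are hypothetical, and little-endian byte order is assumed because the target is x86-64.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type that carries the GNU build-id


def parse_gnu_build_id(note, little_endian=True):
    """Return the build-id as a hex string from raw ELF note bytes, or None if absent."""
    header = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(note):
        namesz, descsz, ntype = struct.unpack_from(header, note, offset)
        offset += 12
        name = note[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = note[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # descriptor is padded the same way
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def build_id_form(hex_id):
    """Classify a build-id by its length in hex characters."""
    return {32: "md5/uuid", 40: "sha1"}.get(len(hex_id), "other")
```

A 32-character result classified as `md5/uuid` is the expected outcome for this binary.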

NOTE — libtpu.so retains its full .symtab and is reported by file as not stripped. Every named function in the book is therefore a demangled symbol read directly from the binary, not a reconstruction — which is why recovering names was never the bottleneck; recovering behavior was.


Tooling & Extraction

The analysis tool

A single tool produced every primitive fact: IDA Pro 9.x, run in three capacities.

  • Auto-analysis walks the ELF, recovers the section/segment layout, identifies function boundaries, follows the control flow, builds the global cross-reference database, and propagates types through the recovered code. On this binary it resolved 884,832 functions and 33,016 switch/jump tables.
  • The Hex-Rays decompiler lifts each function from x86-64 to a C-like pseudocode body. These bodies — not the raw disassembly — are the primary evidence for behavioral claims, because a switch or a guard compare is legible in C in a way it is not in a screen of mov/cmp/jne.
  • FLIRT (Fast Library Identification and Recognition Technology) matches byte-pattern signatures of known library routines, so a statically-linked memcpy or libstdc++ helper is labeled as such instead of re-analyzed from scratch.

NOTE — FLIRT contributed no recognized matches on this binary's primary extraction pass (the metadata records flirt_matches: 0). That is expected, not a defect: the binary is already richly symbolized by its surviving .symtab, so library routines arrive pre-named and FLIRT has nothing left to add.

The sidecar family

Auto-analysis and decompilation are not the deliverable — their serialized output is. Every function and structured table is exported to a family of JSON sidecars, plus four per-function artifact files (decompiled C, raw disassembly, a context bundle, and a control-flow graph). The book is written against these sidecars, never against a live IDA session, so that every page is reproducible from files on disk.

The sidecar family, with verified presence and the count or size each carries:

| Sidecar | Holds | Verified scope |
| --- | --- | --- |
| functions | One record per recovered function (address, size, name) | 884,832 records |
| names | The name/symbol table surface | ~847 MB |
| strings | Every recovered string literal | 1,249,324 strings |
| segments | ELF segment / section layout | 55 segments |
| data_tables | Recovered static data tables | ~114 MB |
| switches | Jump/switch dispatch tables | 33,016 tables |
| rtti | C++ RTTI: type-info, vtables, class hierarchy | ~65 MB |
| fixups | Relocations / address fixups | ~120 MB |
| xrefs | The global cross-reference graph (code + data edges) | ~41 GB |
| enums | Recovered enumeration types | present (small) |
| structures | Recovered struct/class layouts | present (~290 KB) |
| frames | Per-function stack-frame layouts | ~745 MB |
| entries | Exported / entry-point symbols | present |
| imports | Imported / external symbols | present |
| prototypes | Externally-supplied prototypes (empty here) | empty [] |
| metadata | The extraction manifest (counts, hashes, mode) | present |
| problems | IDA-flagged analysis problems | 7,915 records |
| per-function | decompiled/*.c, disasm/*, context/*, graphs/* | 884,843 files each |

NOTE — the per-function artifact count (884,843) is slightly higher than the function-record count in the metadata (884,832). The directory count includes a small number of thunk/alias/data-stub entries that receive an artifact file without being counted as a full function record. When a page cites a function count, it cites 884,832; when it cites artifact coverage, the per-function directories hold one file per analyzed entry.

NOTE — the extraction manifest for the primary pass records decompiled: 0 and ctree: 0. This is an artifact of a two-phase extraction, not a claim that nothing was decompiled. The first pass ran in a fast mode (boundaries, names, tables, xrefs) and deliberately deferred Hex-Rays; the decompiled bodies and control-flow trees were produced by subsequent split passes, yielding the 884,843 per-function decompiled/*.c files and control-flow-tree coverage over 884,332 of them (a 511-function gap). A reader who only inspects the manifest's decompiled counter will draw the wrong conclusion — the bodies exist on disk.
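
One way to avoid the manifest trap is to count the artifacts on disk and compare that count with the manifest's counter. The sketch below assumes the manifest has been loaded into a dict with a `decompiled` key and that decompiled bodies live under `decompiled/*.c`, as described above; the function name and the returned field names are hypothetical.

```python
from pathlib import Path


def decompiled_coverage(manifest, root):
    """Compare the manifest's `decompiled` counter with the files actually on disk."""
    on_disk = sum(1 for _ in Path(root, "decompiled").glob("*.c"))
    claimed = manifest.get("decompiled", 0)
    return {
        "manifest_decompiled": claimed,
        "files_on_disk": on_disk,
        # True when the manifest under-reports, as it does after a deferred
        # (two-phase) extraction where Hex-Rays ran in later passes.
        "manifest_is_stale": on_disk > claimed,
    }
```

A result with `manifest_is_stale` set to `True` indicates that the on-disk files, not the counter, are the evidence to trust.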


Cross-Validation Discipline

A symbolized binary is seductive: a name like xla::tpu::sparse_core::lowering_util::GetPadValue reads like documentation. It is not. The name records what the original author called the function, which is a strong lead but not a verified behavior, and demangled C++ names routinely outlive the code that justified them. Every claim in the book is therefore re-checked against evidence one level more direct than the name.

The strongest evidence is the decompiled body (or a byte-exact table in the binary) literally containing the claimed construct — the switch, the guard compare, the constant, the vtable slot. Where no single line states a role, the conclusion rests on several independent indicators that agree: the function's callers, the strings it references, its position in a dispatch table, its frame shape. A single uncorroborated indicator — one suggestive string, one xref, a name implying unseen behavior — is recorded as a lead, not a foundation.

The cross-checks draw on different sidecars on purpose, so that the indicators are genuinely independent:

  • Body vs. name — the decompiled/*.c file is read; the name is confirmed or set aside.
  • Callers vs. role — the xrefs graph shows who calls the function and with what; a "validator" with no length-shaped caller is suspect.
  • Strings vs. behavior — a function that references "request too large" corroborates a rejection path; the strings and data_tables sidecars supply these.
  • Dispatch position vs. purpose — the switches and rtti sidecars place a function in a table or a vtable, which constrains what it can be.
  • Raw bytes vs. decompiler — where Hex-Rays is uncertain (it flags this with its own warnings), the disasm/* file and the raw bytes are the tiebreaker.
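
The rule can be pictured as a small grading function. This is an illustrative sketch, not tooling the book ships: the `Claim` type, the grade labels, and the threshold of two agreeing indicators are all assumptions chosen to mirror the prose (direct body evidence is strongest, several independent indicators corroborate, a lone indicator or a bare name is only a lead).

```python
from dataclasses import dataclass, field

# Indirect indicators treated as independent of one another (labels are hypothetical).
INDEPENDENT = frozenset({"callers", "strings", "dispatch_position", "frame_shape"})


@dataclass
class Claim:
    text: str
    # True when the decompiled C or the raw bytes literally contain the construct.
    body_shows_construct: bool = False
    # Indirect indicators that agree with the claim; may include "name".
    indicators: set = field(default_factory=set)


def grade(claim, min_indicators=2):
    """Grade a claim by how directly the evidence backs it."""
    if claim.body_shows_construct:
        return "verified"
    if len(claim.indicators & INDEPENDENT) >= min_indicators:
        return "corroborated"
    if claim.indicators:
        return "lead"  # a single indicator, or a name alone, is not a foundation
    return "unsupported"
```

Under this sketch, a function whose only support is its demangled name grades as a lead, which matches the stance that a symbol name is a hypothesis.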

Where a value is inferred from structure rather than read directly off the binary, a NOTE or GOTCHA callout flags exactly which step is inference. This is the book's core honesty mechanism: a reader can always tell how directly a stated value is backed.

QUIRK — the heaviest cross-validation burden falls on the most symbolized functions, not the least. A 600-character demangled C++ template name (the pxc::mnemonics::ProtoToEnvMiscGenerated... family, for instance) is so specific that it feels authoritative, yet these template-heavy functions are precisely the ones where IDA's analysis stumbles (see below). The richest name and the weakest analysis often coincide; the discipline exists to catch exactly that trap.


Limits — What Could Not Be Recovered

The credibility of every page rests on these limits being published, not hidden. Static analysis of one binary has a hard floor, and the floor has three distinct causes.

Decompilation failures

Hex-Rays returned no cfunc for 516 functions: the decompiler ran, refused, and the function exists only as disassembly. These are not random. They cluster in:

  • Template-explosion code — deeply nested C++ templates (variant-of-strong-int dispatch in the mnemonics layer, addOperations<...> registering hundreds of MLIR ops in one call) whose recovered control flow exceeds what the decompiler will lift.
  • Imported PLT/GOT stubs — entries like strlen, getenv, __tls_get_addr, MallocExtension_Internal_* that are import thunks, not local code, and have no body to decompile.
  • Hand-written assembly — cryptographic and math routines from statically-linked libraries (bn_sqr8x_mont, bn_power5_nohw, md5_sha1_final) that are assembly with no C to recover.

For these 516, the book relies on the disassembly, the surrounding xrefs, and the name, and a page touching one says so.

Analysis problems

IDA's auto-analysis flagged 7,915 problems during the disassembly pass, recorded in the problems sidecar. They fall into a few types: disasm_problem (a byte sequence the disassembler could not confidently decode), bad_stack (an unbalanced or unrecoverable stack frame), head_problem (an instruction-boundary ambiguity), and final (a problem persisting to the end of analysis). Like the decompilation failures, they concentrate in template-heavy compiler code (primitive_util::*TypeSwitch, AlgebraicSimplifierVisitor) and in statically-linked third-party assemblers (X86AsmParser, AArch64AsmParser, PPCAsmParser::matchAndEmitInstruction). A function carrying a bad_stack flag has un-trustworthy local-variable recovery, and any page touching it says so.
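
A page author checking whether a function is affected might tally the problems sidecar along these lines. The record schema is an assumption: the sketch supposes each record is a dict with a `type` field (one of the four type names above) and an integer `address` field, which the real sidecar may name or shape differently.

```python
from collections import Counter


def tally_problems(records):
    """Count problem records by their type."""
    return Counter(record["type"] for record in records)


def has_bad_stack(records, address):
    """True when the given address carries a bad_stack flag."""
    return any(
        record["type"] == "bad_stack" and record["address"] == address
        for record in records
    )
```

A `True` from `has_bad_stack` is the signal that local-variable recovery for that function should not be trusted without checking the disassembly.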

What static analysis structurally cannot see

Beyond the per-function failures, three categories are invisible in principle to static analysis of this one file:

  • Firmware blobs — payloads destined for the TPU hardware are embedded data, not x86 code. The disassembler sees bytes, not instructions; these are walls, documented as opaque regions rather than guessed at.
  • Runtime-only values — anything computed at load or run time (resolved relocations after dlopen, environment-driven configuration, values arriving over ICI/network) has no static representation. The book documents the code path that consumes such a value, never the value.
  • The companion sdk.so — analyzed in a sibling pass (94,732 functions, of which 94,292 decompiled), it is a smaller and largely independent object; this book's primary subject is libtpu.so, and sdk.so evidence is cited only where the two demonstrably interact.

NOTE — the limits are bounded and small relative to the whole: 516 decompilation failures out of 884,832 functions is well under 0.1%, and 7,915 analysis problems span a binary with millions of instructions. The floor is published not because it is large but because a reverse-engineering reference that hid it would be untrustworthy on every page above it.
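
The proportions quoted on this page follow from the published counts by plain arithmetic, which a reader can confirm in a few lines. The constant names are illustrative; the numbers are the ones stated above.

```python
FUNCTIONS = 884_832          # function records in the metadata
DECOMPILE_FAILURES = 516     # functions for which Hex-Rays returned no cfunc
ARTIFACT_FILES = 884_843     # per-function files in each artifact directory
CTREE_COVERED = 884_332      # artifacts with control-flow-tree coverage

failure_rate = DECOMPILE_FAILURES / FUNCTIONS
extra_artifacts = ARTIFACT_FILES - FUNCTIONS
ctree_gap = ARTIFACT_FILES - CTREE_COVERED

print(f"decompilation failure rate: {failure_rate:.4%}")
print(f"artifact files beyond function records: {extra_artifacts}")
print(f"artifacts without a control-flow tree: {ctree_gap}")
```

The failure rate comes out at roughly 0.058%, consistent with "well under 0.1%", and the two differences are 11 and 511 respectively.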


Reproduction

The mechanical stages reproduce exactly; the analytical stages reproduce to the same evidence, with judgment driving the conclusions. A reader who wants to re-derive any claim follows this path:

# 1. Acquire the identical artifact from PyPI.
pip download libtpu==0.0.40 --no-deps --python-version 3.14 \
    --only-binary=:all: --platform manylinux_2_31_x86_64

# 2. A wheel is a ZIP. Unpack and locate the plugin.
unzip libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64.whl -d libtpu_wheel
ls libtpu_wheel/libtpu/libtpu.so

# 3. Confirm you hold the same bytes (size + build-id must match the pin).
stat -c%s libtpu_wheel/libtpu/libtpu.so      # -> 781691048
readelf -n libtpu_wheel/libtpu/libtpu.so | grep -i 'build id'
                                             # -> 89edbbe81c5b328a958fe628a9f2207d

From there, the analysis is: load libtpu.so into IDA Pro 9.x, run full auto-analysis, and decompile. The per-function evidence the book cites is the decompiled C body at a given virtual address — open the address in Hex-Rays (or read the corresponding decompiled/<addr>.c artifact) and the claim is verifiable line-for-line. The address anchor in every citation is the entry point for that reconstruction.

GOTCHA — reproduction fidelity depends on the IDA version. Auto-analysis heuristics, function-boundary recovery, and Hex-Rays output all shift between major IDA releases, so a different version may recover a slightly different function count or decompile a body this book lists as a failure (or vice-versa). Pin IDA 9.x to match the function and switch-table counts cited here. The binary's bytes are invariant; the analysis of them is not.


Legal Basis

This work is reverse engineering of lawfully-obtained, publicly-distributed software for the purposes of interoperability and research. The binary was downloaded from PyPI under the terms that make it freely redistributable; no access control was circumvented, and no copyrighted source code, internal documentation, or other restricted material was used or consulted at any point. Every finding derives solely from analysis of the compiled binary that anyone may download.

The activity rests on well-established law on both sides of the Atlantic:

| Authority | Jurisdiction | What it establishes |
| --- | --- | --- |
| DMCA §1201(f) | US | Reverse engineering for interoperability is a statutory exemption to the anti-circumvention rules. |
| Sega v. Accolade (9th Cir. 1992) | US | Disassembly to access unprotectable functional elements is fair use when it is the only means to that end. |
| Sony v. Connectix (9th Cir. 2000) | US | Intermediate copying during reverse engineering to build an interoperable product is fair use. |
| EU Directive 2009/24/EC, Art. 6 | EU | Decompilation is permitted to achieve interoperability, without rightsholder authorization, within stated bounds. |

The book documents what the binary does — its functional behavior, data layouts, and decision logic — which are the unprotectable ideas and methods of operation, not the protected expression of any source text. The provenance discipline stated throughout (static analysis of the binary only) is precisely what keeps the work inside these boundaries: no protected source was copied, because none was ever in hand.


Cross-References

  • Evidence & Citation Conventions — the callout vocabulary and citation grammar this page's process produces; read it before any other page.
  • Forensics Overview — the structural starting point for the binary itself: sections, sizes, and headline counts confirmed directly against the bytes.
  • ELF Anatomy — the segment/section layout the segments sidecar records, read at the readelf level.
  • Dispatch-Table Taxonomy — how the 33,016 switch tables and the RTTI/vtable graph are read, the structured-evidence backbone of the behavioral pages.