Binary Layout Reference

All addresses on this page apply to libtpu.so build-id 89edbbe81c5b328a958fe628a9f2207d (build tag libtpu_lts_20260413_b_RC00), from the libtpu-0.0.40-cp314 wheel (781,691,048 bytes, 884,832 functions). Other builds will shift every boundary.

Abstract

libtpu.so is a 745 MiB statically-linked monolith: one shared object that absorbs the PJRT plugin entry layer, the full XLA compiler, the MLIR framework with three dozen dialects, the TPU instruction-set codecs, the oneDNN CPU fallback kernels, the profiling and collective subsystems, and the abseil/protobuf/LLVM/gRPC runtime — all link-time-merged into a single .text. A reverse-engineer who single-steps into this binary lands somewhere inside a flat 299.9 MiB instruction region with no module boundaries and no per-library segments. (This build does retain its full .symtab, so a symbol name is usually available — but a name alone does not tell you which of the link-merged subsystems you are in.) This page is the map that answers the only question that matters at that moment: given an address, which subsystem am I in?

The organizing observation is that the link order is not random. Functions from the same translation unit cluster together, and translation units from the same subsystem cluster into broad, contiguous address bands inside .text. The PJRT/runtime entry code sits at the very bottom of .text (the linker placed the exported-symbol translation units first); the MLIR dialect machinery fills the broad middle; the oneDNN CPU kernels form a tight island; and the TPU ISA codecs (asic_sw::deepsea::{gxc,vxc,pxc}) pile up densest in the top quarter. The bands overlap at their edges — a single namespace can have stragglers 200 MiB away from its centroid — so this map describes density centroids and dominant occupants, not hard partitions. Treat a band assignment as "most code here is X," verified by sampling, not as a guarantee that every byte is X.

This is the address-band navigation appendix. It complements but does not duplicate the ELF section table: forensics/elf-anatomy.md owns the authoritative section header dump, and subsystem-map.md owns the subsystem-to-entry-point catalog. This page is the bridge between them — the VA-to-subsystem lookup that turns a raw address into a starting hypothesis.

For navigation, the contract is:

  • The section skeleton: which sections are allocatable code, which are large-model read-only data, and where each begins. The address you hold is meaningless until you know which section it falls in.
  • The .text band partition: the dominant namespace per ~15–30 MiB band, anchored by named exports and by the per-band namespace census.
  • The boundary anchors: named symbols (GetPjrtApi, GetLibtpuSdkApi, TpuExecutor_Init, RTTI typeinfo strings) whose addresses pin band edges so the map can be re-verified against any future build.

| Property | Value |
|---|---|
| Binary | libtpu.so, build-id 89edbbe81c5b328a958fe628a9f2207d, build tag libtpu_lts_20260413_b_RC00 |
| File size | 781,691,048 B (745.5 MiB), 51 named sections (48 allocatable) |
| Function count | 884,832 total · 877,976 inside .text |
| .text range | 0x0e63c000 .. 0x21217484 (299.9 MiB) |
| .lrodata range | 0x01884a00 .. 0x084931d0 (108.1 MiB, large read-only data) |
| .rodata range | 0x084a0000 .. 0x0be8af28 (57.9 MiB) |
| Lowest code addr | 0x0e635524 (.init) |
| Highest code addr | 0x21217480 (end of .text) |
| PJRT entry anchor | GetPjrtApi @ 0x0e6a83a0 (low .text) |
| SDK entry anchor | GetLibtpuSdkApi @ 0x109028c0 (mid .text) |

Section Skeleton

Before any band lookup, place the address in its section. The linker laid the file out in a strict ascending order — read-only data first, then a small ancillary-code cluster, then the giant .text, then the writable trailer. The full 51-row header dump lives in forensics/elf-anatomy.md; the rows below are the ones that bound the address bands.

| Section | VA Start | VA End | Size | Type | Holds |
|---|---|---|---|---|---|
| .rela.dyn | 0x00009170 | 0x01881da0 | 24.5 MiB | RELA | Dynamic relocations (RELRO) |
| .lrodata | 0x01884a00 | 0x084931d0 | 108.1 MiB | PROGBITS (l) | Large read-only data: vtables, typeinfo, jump tables, constant pools |
| .rodata | 0x084a0000 | 0x0be8af28 | 57.9 MiB | PROGBITS | Strings, small constants, format tables |
| protodesc_cold | 0x0be8af30 | 0x0c1bf0b0 | 3.2 MiB | PROGBITS | Cold protobuf descriptor tables |
| .gcc_except_table | 0x0c1bf0b0 | 0x0c2cc634 | 1.0 MiB | PROGBITS | C++ EH landing-pad tables |
| .eh_frame_hdr / .eh_frame | 0x0c2cc634 | 0x0e635524 | ~35 MiB | PROGBITS | Unwind tables (C++ EH; .eh_frame alone is ~28.7 MiB) |
| .init / .text.hot | 0x0e635524 | 0x0e63738e | ~8 KiB | PROGBITS (X) | Init stub + 6 hot-promoted functions |
| google_malloc | 0x0e6373c0 | 0x0e63bab2 | 17.7 KiB | PROGBITS (X) | TCMalloc fast-path (72 functions) |
| .text | 0x0e63c000 | 0x21217484 | 299.9 MiB | PROGBITS (X) | All primary code; the band map below |
| .text.startup | 0x21217490 | 0x213818e4 | 1.4 MiB | PROGBITS (X) | Static-initializer constructors (2,886 funcs) |
| .text.unlikely | 0x21381900 | 0x213e9d69 | 0.4 MiB | PROGBITS (X) | Cold/error-path code (2,798 funcs) |
| google_init_cold | 0x213e9d80 | 0x213efe71 | 24.2 KiB | PROGBITS (X) | Cold init (125 funcs) |
| .plt | 0x213f0830 | 0x213f25d0 | 7.4 KiB | PROGBITS (X) | PLT stubs for imported libc/libm/libdl symbols |
| .data.rel.ro | 0x215f81a0 | 0x22048b30 | 10.3 MiB | PROGBITS (W) | Relocated read-only: vtable pointers, RTTI graph |
| .data | 0x222551c0 | 0x224bf798 | 2.4 MiB | PROGBITS (W) | Initialized globals |
| .bss | 0x224c3880 | 0x22598c30 | 0.8 MiB | NOBITS | Zero-init globals |
| .ldata / .lbss | 0x22798c30 | 0x2285a180 | ~0.7 MiB | PROGBITS / NOBITS (l) | Large writable / large zero-init data |
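
The section-placement step can be sketched in Python as a sorted-range lookup. This is a minimal illustration, not part of the original tooling: the helper name `section_of` is hypothetical, the VA End column is assumed to be exclusive, and the combined rows (.eh_frame_hdr / .eh_frame, .init / .text.hot, .ldata / .lbss) are treated as single ranges. The addresses are copied from the table above and are valid only for this build.

```python
from bisect import bisect_right

# (start, end, name) rows from the section skeleton; end is assumed exclusive.
SECTIONS = [
    (0x00009170, 0x01881da0, ".rela.dyn"),
    (0x01884a00, 0x084931d0, ".lrodata"),
    (0x084a0000, 0x0be8af28, ".rodata"),
    (0x0be8af30, 0x0c1bf0b0, "protodesc_cold"),
    (0x0c1bf0b0, 0x0c2cc634, ".gcc_except_table"),
    (0x0c2cc634, 0x0e635524, ".eh_frame_hdr/.eh_frame"),
    (0x0e635524, 0x0e63738e, ".init/.text.hot"),
    (0x0e6373c0, 0x0e63bab2, "google_malloc"),
    (0x0e63c000, 0x21217484, ".text"),
    (0x21217490, 0x213818e4, ".text.startup"),
    (0x21381900, 0x213e9d69, ".text.unlikely"),
    (0x213e9d80, 0x213efe71, "google_init_cold"),
    (0x213f0830, 0x213f25d0, ".plt"),
    (0x215f81a0, 0x22048b30, ".data.rel.ro"),
    (0x222551c0, 0x224bf798, ".data"),
    (0x224c3880, 0x22598c30, ".bss"),
    (0x22798c30, 0x2285a180, ".ldata/.lbss"),
]
_STARTS = [start for start, _, _ in SECTIONS]


def section_of(va):
    """Return the section name containing va, or None if va falls in a gap."""
    i = bisect_right(_STARTS, va) - 1  # last section starting at or below va
    if i < 0:
        return None
    _, end, name = SECTIONS[i]
    return name if va < end else None
```

Calling `section_of(0x0e6a83a0)` (the GetPjrtApi anchor) lands in .text, while an address in the alignment gap between google_malloc and .text returns None, which is a useful signal that the address is not code at all.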

NOTE — the l (large) section flag on .lrodata, .ldata, and .lbss is the x86-64 large-section marker (SHF_X86_64_LARGE) used by the medium and large code models. Code may not assume these sections fall inside the ±2 GiB signed-displacement window of .text, so the compiler addresses them with full 64-bit movabs rather than RIP-relative lea, even though every address in this build happens to lie well inside that window. When you see a movabs loading a constant from 0x018xxxxx .. 0x084xxxxx, it is reaching into .lrodata: almost always a vtable, a typeinfo record, or a large dispatch/jump table.

GOTCHA — .text.startup (0x2121...), .text.unlikely, and google_init_cold all sit above the main .text end (0x21217484), not interleaved with it. A static constructor that builds an MLIR dialect registry lives at 0x2122xxxx, ~290 MiB away from the dialect's runtime methods in the 0x010x .. 0x012x band. Do not assume an address near a function is in the same translation-unit cluster if it crosses into the startup/cold sections.


The .text Address Bands

.text is partitioned below into ten bands (B0 a ~15 MiB entry band, B1–B9 ~30 MiB each). Each row gives the band's VA range, the section (always .text here), the dominant namespace/subsystem by function count, and the approximate live-function count in the band. Confidence is not tabulated; the band detail flags where it is weaker (B7 is rated MEDIUM). The dominant-occupant column is the per-band namespace census: the leading C++ namespace among the functions whose address falls in that band, sampled from the symbol table. "~Funcs" is the count of recovered functions whose entry address lands in the band.

| Band | VA Range | Section | Dominant Subsystem / Namespace | ~Funcs |
|---|---|---|---|---|
| B0 Runtime / PJRT entry | 0x0e63c000 .. 0x0f53a2a0 | .text | TPU runtime driver (asic_sw::driver::deepsea) + PJRT/Tpu* exports + MLIR op registration | ~40,000 |
| B1 MLIR core + SPIR-V | 0x0f53a2a0 .. 0x11336000 | .text | MLIR framework (mlir::RegisteredOperationName, mlir::spirv), LLVM support | ~115,000 |
| B2 XLA / SparseCore + MLIR | 0x11336000 .. 0x14030fc0 | .text | MLIR op machinery + xla::tpu::sparse_core, mlir::linalg | ~120,000 |
| B3 MLIR dialects (dense) | 0x14030fc0 .. 0x15e2d500 | .text | mlir::RegisteredOperationName peak, mlir::stablehlo, Eigen tensor evaluators | ~92,000 |
| B4 MLIR linalg/bufferization | 0x15e2d500 .. 0x17c29a40 | .text | mlir::linalg, mlir::bufferization, mlir::stablehlo, Eigen | ~99,000 |
| B5 LLVM backend + MLIR | 0x17c29a40 .. 0x19a25f80 | .text | LLVM (llvm, llvm::cl, SelectionDAG), residual MLIR interfaces | ~65,000 |
| B6 oneDNN CPU kernels | 0x19a25f80 .. 0x1b8224c0 | .text | dnnl::impl::cpu (CPU fallback JIT/reference kernels) | ~39,000 |
| B7 STL/variant + driver | 0x1b8224c0 .. 0x1d61ea00 | .text | std::__u instantiations, asic_sw::driver::deepsea, residual oneDNN/MLIR | ~58,000 |
| B8 TPU codecs (vxc/gxc) | 0x1d61ea00 .. 0x1f41af40 | .text | TPU ISA codecs asic_sw::deepsea::{vxc,gxc,pxc}, xla literals | ~128,000 |
| B9 TPU codecs (gxc peak) | 0x1f41af40 .. 0x21217484 | .text | asic_sw::deepsea::gxc density peak, pxc/vxc, protobuf arena | ~146,000 |
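
The band table reduces to a binary search over the eleven band edges. The sketch below is an illustration under stated assumptions: `band_of` is a hypothetical helper, each band is treated as a half-open interval, and the edges are the build-specific values from the table above. The result is a starting hypothesis ("most code here is X"), not a guarantee.

```python
from bisect import bisect_right

# Band edges copied from the table above (this build only).
BAND_EDGES = [
    0x0e63c000, 0x0f53a2a0, 0x11336000, 0x14030fc0, 0x15e2d500,
    0x17c29a40, 0x19a25f80, 0x1b8224c0, 0x1d61ea00, 0x1f41af40,
    0x21217484,
]
BAND_NAMES = [
    "Runtime / PJRT entry",
    "MLIR core + SPIR-V",
    "XLA / SparseCore + MLIR",
    "MLIR dialects (dense)",
    "MLIR linalg/bufferization",
    "LLVM backend + MLIR",
    "oneDNN CPU kernels",
    "STL/variant + driver",
    "TPU codecs (vxc/gxc)",
    "TPU codecs (gxc peak)",
]


def band_of(va):
    """Map a .text address to (band id, band label); None outside .text."""
    if not BAND_EDGES[0] <= va < BAND_EDGES[-1]:
        return None
    i = bisect_right(BAND_EDGES, va) - 1  # index of the band's lower edge
    return f"B{i}", BAND_NAMES[i]
```

For example, `band_of(0x109028c0)` (the GetLibtpuSdkApi anchor) resolves to B1, and any address at or above the .text end returns None so that startup and cold sections are not misattributed to B9.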

QUIRK — the namespace asic_sw::deepsea::gxc (the TPU "core-X" ISA codec family, ~60,700 functions in .text) appears across a ~197 MiB code span (0x1391cd40 .. 0x1fe6f7a0), but its density climbs toward the top of .text: ~13.7k functions land in B8 and ~46k in B9. A reimplementer who keys "codec land starts at X" off the first gxc function will be ~150 MiB too low. Use the density centroid (high .text, B8–B9), not the first occurrence.

NOTE — the codec namespaces (gxc/vxc/pxc) cluster ascending with a density peak in the top quarter of .text (~0x1d6xxxxx .. 0x21217484); B8–B9 are the codec bands. The lower codec stragglers begin around 0x14xxxxxx, far below where the cluster actually sits — key off the per-band census, not the first occurrence.
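
The difference between first occurrence and density centroid can be shown with a small Python sketch. The page does not say how a centroid is computed, so the median entry address is used here as an assumed proxy; `span_and_centroid` is a hypothetical helper, and the sample addresses are synthetic (two low stragglers plus a dense high cluster), not taken from the binary.

```python
from statistics import median


def span_and_centroid(addresses):
    """Return (first, last, median) for a namespace's function addresses."""
    addrs = sorted(addresses)
    return addrs[0], addrs[-1], int(median(addrs))


# Synthetic illustration only: two stragglers low in .text and
# one hundred functions packed into a high cluster.
sample = [0x1391cd40, 0x14800000]
sample += [0x1f500000 + i * 0x1000 for i in range(100)]

first, last, centroid = span_and_centroid(sample)
gap_mib = (centroid - first) / 2**20  # how misleading the first occurrence is
```

With this shape of data the median sits inside the dense cluster while the first occurrence is more than 150 MiB lower, which is the same trap the gxc namespace sets.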


Band Detail

Each band below names the anchor symbols that pin its edges and the most reliable landmarks for orientation. Anchors are real exported or RTTI symbols whose addresses survive in the .symtab; they are the re-verification points for checking this map against a future build.

B0 — Runtime / PJRT Entry (0x0e63c000 .. 0x0f53a2a0)

The bottom of .text. The linker front-loads the translation units that carry the library's exported Tpu* C API and the PJRT plugin surface, so this is where every externally-callable entry point lives. It is also where the asic_sw::driver::deepsea runtime/driver namespace is densest (~18,800 functions in the band) and where MLIR operation-registration constructors begin.

| Anchor Symbol | Address | Role |
|---|---|---|
| GetPjrtApi | 0x0e6a83a0 | PJRT plugin vtable accessor — the canonical entry point |
| ConfigureDistributedTpuOp_DoWork | 0x0e8cd400 | Distributed-TPU configuration op |
| TpuExecutor_Init | 0x0eab90c0 | TPU executor lifecycle |
| TpuCompiler_New | 0x0eabc4a0 | TPU compiler factory |

If you land between 0x0e6a0000 and 0x0eb00000, you are almost certainly in the PJRT/executor lifecycle layer documented in subsystem-map.md. The Tpu* exports are the only undecorated (non-mangled) names in this region, which makes them the fastest visual anchors in a disassembly listing.

B1–B5 — The MLIR / XLA / LLVM Middle (0x0f53a2a0 .. 0x19a25f80)

The broad central mass of .text, ~60% of all code. It is dominated by the MLIR framework: mlir::RegisteredOperationName alone spans 0x0ea2c120 .. 0x1d8c90c0 and accounts for ~130,000 functions — the single largest namespace in the binary. These are the per-operation registration, verification, folding, and interface-dispatch methods generated by MLIR's ODS (Operation Definition Specification) for each of the ~36 dialects (spirv, linalg, stablehlo, tosa, vector, bufferization, affine, and the TPU-specific tpu, llo, sparse_core dialects — see the namespace census in forensics/llvm-mlir-manifest.md).

Interleaved with MLIR are the XLA compiler proper (xla::jellyfish, xla::tpu::sparse_core, xla::HloEvaluator), the LLVM backend used for host/CPU codegen (llvm, llvm::SelectionDAG, llvm::cl command-line registration), and Eigen tensor-evaluator template instantiations. The sub-band tilt:

| Sub-band | Lean | Anchor namespace span |
|---|---|---|
| B1 0x0f53... .. 0x1133... | MLIR core + SPIR-V dialect | mlir::spirv 0x1126faa0+ |
| B2 0x1133... .. 0x1403... | XLA SparseCore + MLIR | xla::tpu::sparse_core 0x0f858f20+ |
| B3 0x1403... .. 0x15e2... | MLIR op-name peak + stableHLO | mlir::stablehlo 0x0eba7a60+ |
| B4 0x15e2... .. 0x17c2... | MLIR linalg + bufferization | mlir::linalg 0x10a7f2e0+ |
| B5 0x17c2... .. 0x19a2... | LLVM backend + residual MLIR | llvm cluster |

NOTE — the xla::megascale (collectives), xla::HloEvaluator, and tensorflow namespaces are scattered across the entire middle rather than banded — their function spans run the full width of .text (e.g. tensorflow spans 0x0e63d5c0 .. 0x20cccd80). Do not expect a "collectives band"; collective and profiling code is sprinkled by translation-unit adjacency, not gathered. Use the subsystem-map.md entry-point table for these, not an address range.

B6 — oneDNN CPU Kernels (0x19a25f80 .. 0x1b8224c0)

A genuinely tight island. The dnnl::impl::cpu namespace clusters at 0x1a3cb2a0 .. 0x1bf83d40 and dominates this band (~16,000 functions in B6, ~28,000 across its span). This is the Intel oneDNN (DNNL) CPU backend — the reference and JIT-generated convolution/matmul/reorder kernels XLA falls back to for host-side execution. Because oneDNN is a self-contained third-party library with little cross-call into the rest of libtpu, its translation units linked contiguously, producing the cleanest band boundary in the binary.

QUIRK — the dense dnnl::impl::cpu cluster makes B6 the easiest band to recognize by content alone: heavy use of AVX-512 register-blocked inner loops, std::__u::__function trampolines (~6,200 in-band, the oneDNN kernel-dispatch closures), and almost no mlir/xla symbols. If a disassembly window is wall-to-wall vectorized GEMM micro-kernels, you are in B6.

B7 — STL / Variant Glue + Driver (0x1b8224c0 .. 0x1d61ea00)

A transitional band with no single dominant subsystem. It is led by std::__u STL template instantiations (std::variant visitors, std::function targets) and a second concentration of asic_sw::driver::deepsea driver code, with residual oneDNN and MLIR spillover. This band is where the binary transitions from "framework/library" code to "TPU-specific codec" code, and its mixed census reflects that. Confidence is MEDIUM: the leading occupant (std::__u, ~5,600) is glue, not a subsystem, so address-to-subsystem inference here is weaker than in the clean bands.

B8–B9 — TPU ISA Codecs (0x1d61ea00 .. 0x21217484)

The top quarter of .text and the heart of the TPU-specific machinery: the instruction-set codecs that encode/decode TPU bundles for the three core families — vxc (vector core), gxc (general/grid core), and pxc (processing core). These are the asic_sw::deepsea::{vxc,gxc,pxc} namespaces, and they are densest here: B8 holds ~20k vxc + ~14k gxc + ~1.5k pxc; B9 holds ~46k gxc (its peak) + ~4.9k pxc + ~4.4k vxc. The codec class hierarchy is template-heavy — SparseCoreTecCodecBase, TensorCoreCodecBase, and the platforms_deepsea::jellyfish::isa::{Encoder,Decoder}Base templates — which is why function counts explode here: every codec instantiation generates a full set of encode/decode/validate methods.

| Anchor (RTTI typeinfo string, .lrodata / .rodata) | Address | Pins |
|---|---|---|
| asic_sw::deepsea::vxc::isa::TensorCoreCodecBase<…> | 0x0406f1d8 | vxc TensorCore codec family |
| asic_sw::deepsea::gxc::gfc::isa::TensorCoreCodecBase<…> | 0x0406fd20 | gxc TensorCore codec family |
| platforms_deepsea::jellyfish::isa::EncoderBase<…> | 0x044fdc6e | ISA encoder base template |
| platforms_deepsea::jellyfish::isa::DecoderBase<…> | 0x044fe6b9 | ISA decoder base template |
| tpu::TpuCompactionIsaEmitterCodegen | 0x0abe3e38 | ISA-emitter codegen RTTI |

NOTE — the RTTI typeinfo strings above live in .rodata/.lrodata (low addresses, 0x04xxxxxx .. 0x0axxxxxx), not in .text. They are the names of classes whose methods execute in B8–B9. The .data.rel.ro vtable pointer arrays (0x215f81a0+) wire the two together; see forensics/rtti-vtable-census.md for the full RTTI graph. Use the typeinfo strings to confirm a band's identity, but expect the executing code 200+ MiB higher.


Cross-Section Bands

A few subsystems span sections rather than living inside .text. They are listed here so an address outside .text still resolves.

| Region | VA Range | Section(s) | Subsystem |
|---|---|---|---|
| Vtable / typeinfo pool | 0x01884a00 .. 0x084931d0 | .lrodata | RTTI typeinfo strings, vtable bodies, jump/dispatch tables (large code model) |
| String / constant pool | 0x084a0000 .. 0x0be8af28 | .rodata | Log messages, op-name strings, flag tables, format strings |
| Protobuf descriptors | 0x0be8af30 .. 0x0c1bf0b0 | protodesc_cold | Cold protobuf reflection metadata |
| Unwind tables | 0x0c2cc634 .. 0x0e635524 | .eh_frame* | C++ exception unwind (~35 MiB) |
| Static constructors | 0x21217490 .. 0x213818e4 | .text.startup | 2,886 init functions (dialect/flag/proto registration) |
| Cold / error paths | 0x21381900 .. 0x213efe71 | .text.unlikely, google_init_cold | 2,923 cold-promoted functions |
| Relocated RO data | 0x215f81a0 .. 0x22048b30 | .data.rel.ro | Vtable pointer arrays, RTTI base-class lists, the RTTI graph backbone |

GOTCHA — the second ELF in the wheel, sdk.so (21.5 MiB, the Python sdk module), is a separate shared object with its own independent VA space. Do not confuse sdk.so's addresses with libtpu.so's — they overlap numerically but mean nothing across the boundary. Note the naming trap: the GetLibtpuSdkApi C-ABI export lives inside libtpu.so (at 0x109028c0), not in sdk.so; sdk.so is a Python module that neither exports nor imports it. The two-binary split is documented in forensics/two-binary-split.md. All addresses on this page are libtpu.so only.


Re-Verification Recipe

To re-derive this map against a different libtpu.so build, the procedure that produced it:

1. readelf -SW libtpu.so
     -> .text VA start/end, .lrodata/.rodata bounds, large-section flags
2. Histogram recovered function entry addresses into N equal .text sub-bands
     -> raw function-density profile (the ~Funcs column)
3. For each sub-band, census the leading demangled C++ namespace token
     -> dominant-subsystem column
4. Pin band edges with named anchors:
     readelf -sW libtpu.so | grep -E 'GetPjrtApi|GetLibtpuSdkApi|TpuExecutor_Init'
     readelf -sW libtpu.so | grep -iE 'CodecBase|EncoderBase|DecoderBase|IsaEmitter'
5. Cross-check density centroids vs first/last occurrence of each namespace
     -> distinguishes the cluster (centroid) from stragglers (span edges)
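
Steps 2 and 3 of the recipe can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool that produced the map: `census` is a hypothetical helper, the input is assumed to be `readelf -sW` output whose names have already been demangled (for example by piping through c++filt), and the "leading namespace token" is approximated as the text before the first `::`, which will misclassify names that begin with a return type.

```python
from collections import Counter, defaultdict


def census(lines, text_start, text_end, n_bands):
    """Histogram FUNC symbols into n_bands equal slices of .text.

    Returns (function count per band, leading namespace per band).
    """
    width = -(-(text_end - text_start) // n_bands)  # ceiling division
    counts = [0] * n_bands
    by_band = defaultdict(Counter)
    for line in lines:
        # Columns: Num, Value, Size, Type, Bind, Vis, Ndx, Name.
        # maxsplit=7 keeps demangled names containing spaces intact.
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[3] != "FUNC":
            continue
        try:
            va = int(fields[1], 16)
        except ValueError:
            continue
        if not text_start <= va < text_end:
            continue
        band = (va - text_start) // width
        counts[band] += 1
        name = fields[7].strip()
        token = name.split("::", 1)[0] if "::" in name else "(global)"
        by_band[band][token] += 1
    leaders = [by_band[b].most_common(1)[0][0] if by_band[b] else None
               for b in range(n_bands)]
    return counts, leaders


# Synthetic symbol-table lines for illustration only.
demo = [
    "   Num:    Value          Size Type    Bind   Vis      Ndx Name",
    "     1: 0000000000001000    16 FUNC    GLOBAL DEFAULT   15 GetPjrtApi",
    "     2: 0000000000001100    16 FUNC    LOCAL  DEFAULT   15 mlir::spirv::foo()",
    "     3: 0000000000001200    16 FUNC    LOCAL  DEFAULT   15 mlir::linalg::bar(int, int)",
    "     4: 0000000000001900    16 OBJECT  LOCAL  DEFAULT   15 some_table",
    "     5: 0000000000001a00    16 FUNC    LOCAL  DEFAULT   15 dnnl::impl::cpu::a()",
    "     6: 0000000000001f00    16 FUNC    LOCAL  DEFAULT   15 dnnl::impl::cpu::b()",
    "     7: 0000000000001b00    16 FUNC    LOCAL  DEFAULT   15 mlir::x()",
    "     8: 0000000000003000    16 FUNC    LOCAL  DEFAULT   15 out::of_range()",
]
demo_counts, demo_leaders = census(demo, 0x1000, 0x2000, 2)
```

On a real build, the same function would take the .text bounds from step 1 and the line stream from the symbol dump; equal-width slices give the raw density profile, which can then be regrouped into the uneven bands used on this page.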

The invariants that should survive across builds (even as absolute addresses shift): PJRT/runtime is lowest, MLIR fills the middle, oneDNN is a tight island just above the MLIR/LLVM middle (B6, roughly 60–70% of the way up .text in this build), and the TPU codecs cluster highest. The absolute boundaries on this page are build-specific; the ordering is structural and follows the link order of the subsystems.
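
The ordering invariant lends itself to a quick automated check once centroids have been measured for a new build. The Python sketch below is a minimal illustration: `ordering_holds` and the dictionary keys are hypothetical names, and the sample values are representative addresses picked from inside this build's bands, not measured centroids.

```python
# Expected low-to-high order of subsystem centroids inside .text.
EXPECTED_ORDER = ["pjrt_runtime", "mlir", "onednn", "tpu_codecs"]


def ordering_holds(centroids):
    """True if the centroids ascend in the expected structural order."""
    values = [centroids[name] for name in EXPECTED_ORDER]
    return all(low < high for low, high in zip(values, values[1:]))


# Representative addresses from this build's bands (illustrative only).
this_build = {
    "pjrt_runtime": 0x0e6a83a0,  # GetPjrtApi anchor, B0
    "mlir": 0x14030fc0,          # B3 lower edge
    "onednn": 0x1a3cb2a0,        # start of the dnnl::impl::cpu cluster, B6
    "tpu_codecs": 0x1f41af40,    # B9 lower edge
}
ok = ordering_holds(this_build)
```

If this check fails on a future build, the band map should be rebuilt from the census rather than shifted by a constant offset.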


Cross-References

  • ELF Anatomy — owns the authoritative 51-row section header table; this page's section skeleton is the navigation subset
  • Forensics Overview — the binary-shape orientation: size, symbol counts, stripped state
  • Custom Sections: protodesc_cold, google_malloc, google_init_cold, linkarr_upb_AllExts and the other non-standard sections
  • Subsystem Map — owns the subsystem-to-entry-point catalog; use it for collectives/profiling code that is scattered rather than banded
  • LLVM/MLIR Manifest — the dialect census that fills bands B1–B5
  • Embedded Library Atlas — the third-party libraries (oneDNN, abseil, protobuf, gRPC) statically linked into .text
  • RTTI / Vtable Census — the .data.rel.ro vtable graph that wires .lrodata typeinfo to B8–B9 codec code
  • Two-Binary Split — why sdk.so is a separate VA space from libtpu.so
  • Symbol Namespace Index — the full namespace-to-count table underlying the per-band census