ELF Anatomy

All offsets, virtual addresses, and counts on this page apply to libtpu.so from the libtpu-0.0.40 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d, build tag libtpu_lts_20260413_b_RC00). The wheel, its dist-info/METADATA, and libtpu/__init__.py all report wheel version 0.0.40; the build-id is the only unambiguous anchor and is the field every page pins to. Other builds will differ in every address.

Abstract

libtpu.so is a 745 MiB (781,691,048-byte) position-independent ELF64 shared object — ET_DYN, EM_X86_64, little-endian, System V ABI. It is the entire Google TPU PJRT plugin statically linked into one file: the XLA compiler, the MLIR/LLVM backend, the TPU runtime, gRPC, protobuf, Abseil, Eigen, and the device driver are all here. It carries no DT_SONAME, no RPATH/RUNPATH, no separate debug objects, and depends on only six bare system libraries. Despite its size it is a thoroughly ordinary clang/LLVM-built .so, just scaled up by three orders of magnitude.

The defining structural fact is that this object was built with the x86-64 large code model. That model splits read-only and BSS data into two tiers: the ordinary small-model .rodata/.bss reachable with 32-bit %rip-relative displacements, and the large-model .lrodata/.lbss/.ldata (the l flag, SHF_X86_64_LARGE) reached through 64-bit absolute addressing because the code model does not assume they lie within ±2 GiB of .text. .lrodata alone is 108 MiB. This is why the singleton tables that the runtime keys off (the PJRT API vtable among them) live in .lbss near vaddr 0x227ba840 rather than in .bss. A reimplementer who assumes a single .rodata/.bss pair will mis-map most of the constant data: .lrodata (108 MiB) is nearly twice the size of .rodata (58 MiB).

This page is the authoritative section/segment/dynamic map of the binary. Every address cited anywhere else in this wiki sits inside one of the regions tabulated below; this page establishes which section owns which vaddr range, which PT_LOAD carries which permissions, and what the dynamic linker is told at load time. The deep dives — who the .init_array constructors are, and what the unusual google_*/__rseq_cs/linkarr_* sections hold — belong to the sibling pages noted at each table.

For reimplementation, the contract is:

  • The four-segment load image — one R E text/rodata segment followed by three RW data segments, and the page-shift rule that breaks file-offset ≡ vaddr after the first segment.
  • The large-code-model section split: .rodata/.bss/.data (small) versus .lrodata/.lbss/.ldata (large, l-flagged), and why the runtime's singletons land in the large tier.
  • The dynamic contract — six DT_NEEDED, GNU-hash-only lookup, .rela.dyn/.rela.plt split with a million-plus R_X86_64_RELATIVE relocations, and the init/fini/preinit array anchors.

| Field | Value |
| --- | --- |
| File size | 781,691,048 bytes (~745 MiB) |
| Class / data / type | ELF64 · LSB (2's complement, little-endian) · ET_DYN |
| Machine / OS-ABI | EM_X86_64 · UNIX System V (ABI v0) |
| Entry point | 0x0 (none; pure shared library) |
| Program headers | 11 entries, 56 bytes each, at file offset 64 |
| Section headers | 52 entries, 64 bytes each, at file offset 0x2e979ba8 |
| .shstrtab index | section [50] |
| Build-id (NT_GNU_BUILD_ID) | 89edbbe81c5b328a958fe628a9f2207d (16 bytes) |
| DT_SONAME | none |
| Code model | x86-64 large (.lrodata/.lbss/.ldata present) |
| .text extent | 0xe63c000 to 0x21217484, 0x12bdb484 bytes (~299 MiB) |

ELF Header

The header below is readelf -h output, condensed (paired fields share a line and the hex offset is annotated). Every downstream address on this page and elsewhere in the wiki is a virtual address in the address space this header describes.

Class:                             ELF64
Data:                              2's complement, little endian
Type:                              DYN (Shared object file)
Machine:                           Advanced Micro Devices X86-64
Entry point address:               0x0
Start of program headers:          64 (bytes into file)
Start of section headers:          781687720 (bytes into file)   # 0x2e979ba8
Size of program headers:           56  ·  Number of program headers:   11
Size of section headers:           64  ·  Number of section headers:   52
Section header string table index: 50

Three header facts shape everything below.

  • Entry point = 0x0. This is a library, not an executable; there is no _start. Execution enters through the DT_INIT/DT_INIT_ARRAY machinery (the loader-driven constructor chain) and through exported entry symbols like GetPjrtApi. The entry-into-init mechanics are owned by lifecycle/elf-entry-and-init-proc.md.
  • The section header table sits at the very end of the file (offset 0x2e979ba8, ~745 MiB in), after the colossal .symtab/.strtab. The section headers, the symbol table, and the string tables are not part of any PT_LOAD; they exist on disk but are never mapped at runtime.
  • 52 sections including the NULL section [0]. Tools that enumerate "real" sections report 51; both counts describe the same table.
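
The header facts above can be checked mechanically. The following is a minimal sketch, assuming only the standard Elf64_Ehdr layout; the `sample` bytes are synthesized from the values quoted on this page rather than read from the real file, and the function names are illustrative.

```python
import struct

# Elf64_Ehdr fields that follow the 16-byte e_ident (little-endian):
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR_FMT = "<HHIQQQIHHHHHH"
EHDR_NAMES = ("type", "machine", "version", "entry", "phoff", "shoff", "flags",
              "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx")

def parse_ehdr(header: bytes) -> dict:
    """Decode the first 64 bytes of an ELF64 little-endian file."""
    ident = header[:16]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not an ELF64 little-endian file")
    return dict(zip(EHDR_NAMES, struct.unpack_from(EHDR_FMT, header, 16)))

def shdr_table_end(h: dict) -> int:
    """File offset one past the last section header."""
    return h["shoff"] + h["shnum"] * h["shentsize"]

# Synthetic header rebuilt from this page's values (ET_DYN = 3, EM_X86_64 = 62).
sample = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
          + struct.pack(EHDR_FMT, 3, 62, 1, 0, 64, 0x2e979ba8, 0, 64, 56, 11, 64, 52, 50))
header = parse_ehdr(sample)
```

With these inputs `shdr_table_end(header)` lands exactly on the file size, which is consistent with the section header table being the last thing in the file.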

NOTE — the binary is not stripped in the .symtab sense: a full .symtab of 1,233,710 entries (size 0x1c3cc50 ÷ 24) plus an 0xab824de-byte (~172 MiB) .strtab is present on disk. These give every internal function a name for static analysis, but they are non-SHF_ALLOC and contribute nothing to the runtime image. The runtime sees only the 741-entry .dynsym.

| Field | Value |
| --- | --- |
| ELF class / endianness / type | ELF64 · LSB · ET_DYN |
| Machine | EM_X86_64 (Advanced Micro Devices X86-64) |
| Entry point | 0x0 (no _start) |
| PHDR count / size | 11 × 56 bytes at offset 0x40 |
| SHDR count / size | 52 × 64 bytes at offset 0x2e979ba8 |
| .shstrtab index | [50] |

Segment Table (Program Headers)

The runtime image is four PT_LOAD segments: one read-execute, three read-write. The remaining seven program headers are metadata views into the same bytes (PHDR, TLS, DYNAMIC, GNU_RELRO, GNU_EH_FRAME, GNU_STACK, NOTE). All four PT_LOAD alignments are 2 MiB (0x200000), because the build targets huge pages; the other headers keep their small natural alignments.

Type           Offset     VirtAddr     FileSiz    MemSiz     Flg  Align
PHDR           0x000040   0x00000040   0x000268   0x000268   R    0x8
LOAD  [1]      0x000000   0x00000000   0x213f25d0 0x213f25d0 R E  0x200000
LOAD  [2]      0x213f25e0 0x215f25e0   0xa62bc0   0xa63a20   RW   0x200000
LOAD  [3]      0x21e551c0 0x222551c0   0x26e6a0   0x343a70   RW   0x200000
LOAD  [4]      0x22198c30 0x22798c30   0x021c00   0x0c6650   RW   0x200000
TLS            0x213f25e0 0x215f25e0   0x000110   0x000e78   R    0x20
DYNAMIC        0x21e48b40 0x22048b40   0x000210   0x000210   RW   0x8
GNU_RELRO      0x213f25e0 0x215f25e0   0xa62bc0   0xa63a20   R    0x1
GNU_EH_FRAME   0x0c2cc634 0x0c2cc634   0x6bd684   0x6bd684   R    0x4
GNU_STACK      0x000000   0x00000000   0x000000   0x000000   RW   0x0
NOTE           0x0002a8   0x000002a8   0x000020   0x000020   R    0x4

The four load segments

| Seg | Perms | File offset | Vaddr | FileSiz | MemSiz | Holds |
| --- | --- | --- | --- | --- | --- | --- |
| LOAD 1 | R E | 0x0 | 0x0 | 0x213f25d0 (~532 MiB) | same | metadata, .lrodata, .rodata, EH tables, all .text*, .plt |
| LOAD 2 | RW | 0x213f25e0 | 0x215f25e0 | 0xa62bc0 (~10.4 MiB) | 0xa63a20 | TLS template, init/fini/preinit arrays, .data.rel.ro, .dynamic, .got |
| LOAD 3 | RW | 0x21e551c0 | 0x222551c0 | 0x26e6a0 (~2.4 MiB) | 0x343a70 | .data, the google_*/__rseq_cs/linkarr_* sections, .got.plt, .bss |
| LOAD 4 | RW | 0x22198c30 | 0x22798c30 | 0x021c00 | 0x0c6650 (~794 KiB) | .ldata, .lbss, google_malloc_bss (large-model writable) |

GOTCHA — the file-offset ≡ vaddr identity holds only for LOAD 1, where both start at 0. From LOAD 2 onward the linker page-shifts the vaddr ahead of the file offset by a growing 2 MiB step: vaddr − offset is 0x200000 for LOAD 2, 0x400000 for LOAD 3, 0x600000 for LOAD 4. A tool that computes file_offset = vaddr (correct inside .text) will read the wrong bytes for anything in the writable segments, including .dynamic, .got, and .ldata. The NOBITS sections (such as .bss and .lbss) have no file bytes at all; they exist only as zero-filled memory past each segment's FileSiz. Always translate through the owning PT_LOAD, never with a single global delta.
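
The translation rule can be sketched as a lookup over the four PT_LOAD rows. This is an illustrative helper, not code from the binary or any tool; `LOADS` and `vaddr_to_offset` are names chosen here, and the numbers are copied from the segment table above.

```python
# (file_offset, vaddr, filesz, memsz) for the four PT_LOAD rows above.
LOADS = [
    (0x00000000, 0x00000000, 0x213f25d0, 0x213f25d0),
    (0x213f25e0, 0x215f25e0, 0x00a62bc0, 0x00a63a20),
    (0x21e551c0, 0x222551c0, 0x0026e6a0, 0x00343a70),
    (0x22198c30, 0x22798c30, 0x00021c00, 0x000c6650),
]

def vaddr_to_offset(vaddr: int):
    """Return the file offset backing vaddr, or None if it has no file bytes."""
    for offset, base, filesz, memsz in LOADS:
        if base <= vaddr < base + memsz:
            delta = vaddr - base
            # Past filesz the loader zero-fills memory (NOBITS sections).
            return offset + delta if delta < filesz else None
    raise ValueError(f"{vaddr:#x} is not inside any PT_LOAD")
```

Returning None for the zero-filled tail keeps a caller from reading whatever unrelated bytes happen to follow the segment in the file.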

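The three writable segments are split rather than merged for two reasons. LOAD 2 is kept apart because it is exactly the GNU_RELRO span, which becomes read-only once relocations are applied. LOAD 4 is kept apart because the large-code-model writable data (.ldata/.lbss) must stay in its own region: it carries exactly the l-flagged writable sections, separate from the small-model .data/.bss of LOAD 3.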

The non-LOAD program headers

  • GNU_RELRO covers the entire LOAD-2 byte range (0x215f25e0, 0xa62bc0 bytes). After the dynamic linker applies relocations, this whole 10.4 MiB span — init arrays, .data.rel.ro, .dynamic, and .got — is re-mapped read-only. The .got.plt deliberately sits in LOAD 3, outside RELRO, so lazy PLT binding can still write resolved addresses.
  • TLS points at the LOAD-2 head: filesz 0x110 is the .tdata initialization image, memsz 0xe78 adds .tbss. The thread-local block is tiny (3704 bytes) for a binary this size.
  • DYNAMIC is the 528-byte (0x210) .dynamic array, also addressable as section [34].
  • GNU_EH_FRAME is the .eh_frame_hdr binary-search index at vaddr 0xc2cc634, fronting the ~28.7 MiB .eh_frame.
  • GNU_STACK has zero size and RW (no X): non-executable stack, the modern default.
  • NOTE is the 32-byte build-id note at 0x2a8.
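
Since every page pins to the build-id, a short decoder for the 32-byte note may be useful. This is a sketch assuming the conventional ELF note layout (three 32-bit words, then the name and descriptor each padded to 4 bytes); the `sample` blob is rebuilt from the build-id quoted on this page, not extracted from the file.

```python
import struct

NT_GNU_BUILD_ID = 3

def parse_build_id(note: bytes) -> str:
    """Return the build-id as hex from a raw .note.gnu.build-id blob."""
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name_end = 12 + namesz
    name = note[12:name_end].rstrip(b"\0")
    if name != b"GNU" or ntype != NT_GNU_BUILD_ID:
        raise ValueError("not a GNU build-id note")
    desc_start = (name_end + 3) & ~3  # the name is padded to 4-byte alignment
    return note[desc_start:desc_start + descsz].hex()

# Synthetic note: 12-byte header + "GNU\0" + 16-byte descriptor = 32 bytes.
sample = (struct.pack("<III", 4, 16, NT_GNU_BUILD_ID) + b"GNU\0"
          + bytes.fromhex("89edbbe81c5b328a958fe628a9f2207d"))
```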

QUIRK — the read-execute LOAD 1 spans 0x0 to 0x213f25d0 (~532 MiB) and holds roughly 230 MiB of constant data and metadata in front of the code: .lrodata (108 MiB), .rodata (58 MiB), and the EH tables (~36 MiB of unwind data) all live inside the executable segment because they are read-only, not because they are executable. .text is the largest tenant (~299 MiB) but the segment is not "the code segment" in the small-binary sense; it is "everything immutable".


Section Table

52 section headers. The tables below group them by role and give the address, file offset, size, and flags readelf -S reports. Flag key: A alloc, X execute, W write, M mergeable, S strings, T TLS, I info-link, o OS-specific, l SHF_X86_64_LARGE.
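
The flag letters map onto sh_flags bits. A minimal sketch of that mapping follows, using the standard SHF_* values; treating any bit inside SHF_MASKOS as the single letter o is a simplification made here, and the helper name is illustrative.

```python
SHF_MASKOS = 0x0ff00000        # OS-specific bits, shown here as 'o'
SHF_X86_64_LARGE = 0x10000000  # the 'l' flag

# (mask, letter) in the order the letters appear in the tables on this page.
FLAG_LETTERS = [
    (0x1, "W"), (0x2, "A"), (0x4, "X"), (0x10, "M"), (0x20, "S"),
    (0x40, "I"), (0x400, "T"), (SHF_MASKOS, "o"), (SHF_X86_64_LARGE, "l"),
]

def flag_string(sh_flags: int) -> str:
    """Render an sh_flags bitmask as the compact letter string."""
    return "".join(letter for mask, letter in FLAG_LETTERS if sh_flags & mask)
```

For example, alloc + merge + strings + large (0x2 | 0x10 | 0x20 | 0x10000000) renders as the AMSl shown for .lrodata.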

Dynamic-linking metadata (LOAD 1 head)

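| [Nr] Name | Type | Address | Offset | Size | Flg |
| --- | --- | --- | --- | --- | --- |
| [1] .note.gnu.build-id | NOTE | 0x2a8 | 0x2a8 | 0x20 | A |
| [2] .dynsym | DYNSYM | 0x2c8 | 0x2c8 | 0x4578 | A |
| [3] .gnu.version | VERSYM | 0x4840 | 0x4840 | 0x5ca | A |
| [4] .gnu.version_d | VERDEF | 0x4e0c | 0x4e0c | 0x38 | A |
| [5] .gnu.version_r | VERNEED | 0x4e44 | 0x4e44 | 0x270 | A |
| [6] .gnu.hash | GNU_HASH | 0x50b8 | 0x50b8 | 0x678 | A |
| [7] .dynstr | STRTAB | 0x5730 | 0x5730 | 0x3a3c | A |
| [8] .rela.dyn | RELA | 0x9170 | 0x9170 | 0x1878c30 | A |
| [9] .rela.plt | RELA | 0x1881da0 | 0x1881da0 | 0x2c58 | AI |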

Read-only data (LOAD 1 body)

| [Nr] Name | Type | Address | Size | Flg | Holds |
| --- | --- | --- | --- | --- | --- |
| [10] .lrodata | PROGBITS | 0x1884a00 | 0x6c0e7d0 (~108 MiB) | AMSl | large-model RO constants & merged strings |
| [11] .rodata | PROGBITS | 0x84a0000 | 0x39eaf28 (~58 MiB) | AMSo | small-model RO constants, 64 KiB-aligned |
| [12] protodesc_cold | PROGBITS | 0xbe8af30 | 0x334180 (~3.2 MiB) | A | cold protobuf descriptor tables |
| [13] .gcc_except_table | PROGBITS | 0xc1bf0b0 | 0x10d584 (~1.0 MiB) | A | LSDA exception tables |
| [14] .eh_frame_hdr | PROGBITS | 0xc2cc634 | 0x6bd684 (~6.7 MiB) | A | unwind search index |
| [15] .eh_frame | PROGBITS | 0xc989cb8 | 0x1cab86c (~28.7 MiB) | A | DWARF CFI unwind records |

Executable code (LOAD 1 tail)

| [Nr] Name | Type | Address | Size | Flg | Role |
| --- | --- | --- | --- | --- | --- |
| [16] .init | PROGBITS | 0xe635524 | 0x17 | AX | DT_INIT legacy ctor stub |
| [17] .fini | PROGBITS | 0xe63553c | 0x9 | AX | DT_FINI legacy dtor stub |
| [18] .text.hot | PROGBITS | 0xe635560 | 0x1e2e | AXo | profile-hot functions |
| [19] google_malloc | PROGBITS | 0xe6373c0 | 0x46f2 | AXo | TCMalloc fast-path code |
| [20] .text.split | PROGBITS | 0xe63bab2 | 0x0 | AXo | empty placeholder section |
| [21] .text | PROGBITS | 0xe63c000 | 0x12bdb484 (~299 MiB) | AX | the main code body |
| [22] .text.startup | PROGBITS | 0x21217490 | 0x16a454 (~1.4 MiB) | AX | constructor/startup code |
| [23] .text.unlikely | PROGBITS | 0x21381900 | 0x68469 (~417 KiB) | AX | cold/unlikely paths |
| [24] google_init_cold | PROGBITS | 0x213e9d80 | 0x60f1 | AX | cold init code |
| [25] malloc_hook | PROGBITS | 0x213efe80 | 0x89e | AX | allocator hook trampolines |
| [26] __lcxx_override | PROGBITS | 0x213f0720 | 0x105 | AX | libc++ operator new/delete overrides |
| [27] .plt | PROGBITS | 0x213f0830 | 0x1da0 | AX | procedure linkage table (473 slots) |

Writable: TLS, init arrays, RELRO data (LOAD 2)

| [Nr] Name | Type | Address | Offset | Size | Flg |
| --- | --- | --- | --- | --- | --- |
| [28] .tdata | PROGBITS | 0x215f25e0 | 0x213f25e0 | 0x110 | WAT |
| [29] .tbss | NOBITS | 0x215f26f0 | 0x213f26f0 | 0xd68 | WAT |
| [30] .init_array | INIT_ARRAY | 0x215f26f0 | 0x213f26f0 | 0x5aa0 (23200 B) | WAo |
| [31] .fini_array | FINI_ARRAY | 0x215f8190 | 0x213f8190 | 0x10 | WA |
| [32] .data.rel.ro | PROGBITS | 0x215f81a0 | 0x213f81a0 | 0xa50990 (~10.3 MiB) | WA |
| [33] .preinit_array | PREINIT_ARRAY | 0x22048b30 | 0x21e48b30 | 0x10 | WA |
| [34] .dynamic | DYNAMIC | 0x22048b40 | 0x21e48b40 | 0x210 | WA |
| [35] .got | PROGBITS | 0x22048d50 | 0x21e48d50 | 0xc450 | WA |
| [36] .relro_padding | NOBITS | 0x220551a0 | 0x21e551a0 | 0xe60 | WA |

NOTE — .init_array is 0x5aa0 = 23,200 bytes at vaddr 0x215f26f0, exactly matching DT_INIT_ARRAY/DT_INIT_ARRAYSZ below. At 8 bytes per pointer that is 2,900 constructor function pointers — an enormous static-initialization fan-out. The roster of which constructors these are (and the ordering hazards) is owned by forensics/static-init.md; this page only fixes the array's location and extent.
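
The pointer counts follow directly from size ÷ 8. A small sketch of that arithmetic, using only sizes and addresses from the tables on this page (the helper name is illustrative):

```python
POINTER_SIZE = 8  # ELF64 function pointers

def pointer_count(size_bytes: int) -> int:
    """Number of 8-byte function pointers in an init/fini-style array."""
    count, remainder = divmod(size_bytes, POINTER_SIZE)
    if remainder:
        raise ValueError("array size is not a multiple of the pointer size")
    return count

ARRAY_SIZES = {".preinit_array": 0x10, ".init_array": 0x5aa0, ".fini_array": 0x10}
COUNTS = {name: pointer_count(size) for name, size in ARRAY_SIZES.items()}

# .init_array ends exactly where .fini_array begins in the LOAD 2 table.
INIT_ARRAY_END = 0x215f26f0 + ARRAY_SIZES[".init_array"]
```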

Writable: small-model data and the custom sections (LOAD 3)

| [Nr] Name | Type | Address | Offset | Size | Flg |
| --- | --- | --- | --- | --- | --- |
| [37] .data | PROGBITS | 0x222551c0 | 0x21e551c0 | 0x26a5d8 (~2.4 MiB) | WA |
| [38] filewrapper_toc | PROGBITS | 0x224bf798 | 0x220bf798 | 0x1e8 | WA |
| [39] __rseq_cs | PROGBITS | 0x224bf980 | 0x220bf980 | 0x2260 | WA |
| [40] __rseq_cs_ptr_array | PROGBITS | 0x224c1be0 | 0x220c1be0 | 0x898 | WA |
| [41] linkarr_upb_AllExts | PROGBITS | 0x224c2480 | 0x220c2480 | 0x4a0 | WAo |
| [42] pb_defaults | PROGBITS | 0x224c2920 | 0x220c2920 | 0x18 | WA |
| [43] google_malloc_data | PROGBITS | 0x224c2938 | 0x220c2938 | 0x48 | WA |
| [44] .got.plt | PROGBITS | 0x224c2980 | 0x220c2980 | 0xee0 | WA |
| [45] .bss | NOBITS | 0x224c3880 | 0x220c3860 | 0xd53b0 (~853 KiB) | WAo |

The filewrapper_toc, __rseq_cs/__rseq_cs_ptr_array (restartable-sequences critical-section descriptors for TCMalloc), linkarr_upb_AllExts (upb proto-extension link array), pb_defaults, and google_malloc_data sections are non-standard, linker-script-placed sections. Their internal layout and purpose are the subject of forensics/custom-sections.md; they are listed here only to fix their addresses in the global map.

Writable: large-model data (LOAD 4)

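| [Nr] Name | Type | Address | Offset | Size | Flg |
| --- | --- | --- | --- | --- | --- |
| [46] .ldata | PROGBITS | 0x22798c30 | 0x22198c30 | 0x21c00 | WAl |
| [47] .lbss | NOBITS | 0x227ba840 | 0x221ba830 | 0x9f940 (~638 KiB) | WAl |
| [48] google_malloc_bss | NOBITS | 0x2285a180 | 0x221ba830 | 0x5100 | WAl |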

QUIRK — the runtime's PJRT API singleton lives in .lbss, near vaddr 0x227ba840 (the section base), not in .bss. Because the binary is large-code-model, any global whose address the compiler could not prove reachable with a 32-bit displacement was forced into the l-flagged large tier. For the dispatcher that hands out the PJRT function-pointer table this means the table is reached by 64-bit absolute load, and a reverse engineer tracing GetPjrtApi must resolve the address through LOAD 4 (delta 0x600000). Because .lbss is NOBITS, the table has no bytes in the file; its contents exist only in the zero-filled part of the mapped segment and are written at runtime. See lifecycle/get-pjrt-api-thunk.md.
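
Deciding which section owns a given vaddr is a sorted-range lookup. The sketch below uses a hand-picked subset of the allocated sections from the tables above; `SECTIONS` and `owning_section` are illustrative names, and a complete map would list every SHF_ALLOC section.

```python
from bisect import bisect_right

# (start vaddr, size, name) for a subset of the sections tabulated above.
SECTIONS = sorted([
    (0x01884a00, 0x06c0e7d0, ".lrodata"),
    (0x084a0000, 0x039eaf28, ".rodata"),
    (0x0e63c000, 0x12bdb484, ".text"),
    (0x215f81a0, 0x00a50990, ".data.rel.ro"),
    (0x22048d50, 0x0000c450, ".got"),
    (0x222551c0, 0x0026a5d8, ".data"),
    (0x224c2980, 0x00000ee0, ".got.plt"),
    (0x224c3880, 0x000d53b0, ".bss"),
    (0x22798c30, 0x00021c00, ".ldata"),
    (0x227ba840, 0x0009f940, ".lbss"),
])
STARTS = [start for start, _, _ in SECTIONS]

def owning_section(vaddr: int):
    """Return the name of the listed section containing vaddr, else None."""
    i = bisect_right(STARTS, vaddr) - 1
    if i >= 0:
        start, size, name = SECTIONS[i]
        if vaddr < start + size:
            return name
    return None
```

An address that falls in alignment padding between two sections (for instance the 16 bytes between the end of .ldata and the start of .lbss) resolves to None rather than to a neighbor.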

Non-allocated tail (not in any segment)

| [Nr] Name | Type | Offset | Size |
| --- | --- | --- | --- |
| [49] .symtab | SYMTAB | 0x221ba830 | 0x1c3cc50 (~28.2 MiB, 1,233,710 entries) |
| [50] .shstrtab | STRTAB | 0x23df7480 | 0x243 |
| [51] .strtab | STRTAB | 0x23df76c3 | 0xab824de (~172 MiB) |

These three carry no SHF_ALLOC flag (address 0x0): they exist on disk for static tooling and are never mapped. Together they account for roughly 200 MiB of the file — over a quarter of its on-disk size is symbol names that the loader never touches.
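
The "roughly 200 MiB, over a quarter" figure can be reproduced from the three sizes above. A short arithmetic sketch, using only numbers quoted on this page:

```python
FILE_SIZE = 781_691_048
TAIL_SIZES = {".symtab": 0x1c3cc50, ".shstrtab": 0x243, ".strtab": 0xab824de}

tail_bytes = sum(TAIL_SIZES.values())
tail_mib = tail_bytes / 2**20
tail_fraction = tail_bytes / FILE_SIZE

# sizeof(Elf64_Sym) is 24 bytes, which gives the .symtab entry count.
symtab_entries = TAIL_SIZES[".symtab"] // 24
```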


Dynamic Section

The .dynamic array (section [34], PT_DYNAMIC) is 33 entries / 0x210 bytes at vaddr 0x22048b40. It is the complete instruction set the loader executes to wire the library up. readelf -d:

 Tag           Type            Name/Value
 NEEDED         libm.so.6
 NEEDED         libpthread.so.0
 NEEDED         libdl.so.2
 NEEDED         librt.so.1
 NEEDED         libc.so.6
 NEEDED         ld-linux-x86-64.so.2
 RELA           0x9170          RELASZ  25660464 (bytes)   RELAENT 24
 RELACOUNT      1069006
 JMPREL         0x1881da0       PLTRELSZ 11352 (bytes)      PLTREL  RELA
 PLTGOT         0x224c2980
 SYMTAB         0x2c8           SYMENT  24
 STRTAB         0x5730          STRSZ   14908 (bytes)
 GNU_HASH       0x50b8
 PREINIT_ARRAY  0x22048b30      PREINIT_ARRAYSZ 16
 INIT_ARRAY     0x215f26f0      INIT_ARRAYSZ    23200
 FINI_ARRAY     0x215f8190      FINI_ARRAYSZ    16
 INIT           0xe635524
 FINI           0xe63553c
 VERSYM         0x4840
 VERDEF         0x4e0c          VERDEFNUM  2
 VERNEED        0x4e44          VERNEEDNUM 6
 NULL           0x0

What the loader is told

| Aspect | Value | Implication |
| --- | --- | --- |
| DT_NEEDED × 6 | libm, libpthread, libdl, librt, libc, ld-linux-x86-64 | Self-contained: no external C++ runtime, no CUDA, no driver .so. libstdc++/libgcc_s are not needed because the C++ runtime is statically linked in. |
| DT_SONAME | absent | The object names itself only by path; nothing dlopens it by SONAME. |
| DT_RPATH/DT_RUNPATH | absent | No embedded search path. |
| Symbol hash | DT_GNU_HASH only (0x50b8); no legacy DT_HASH | Loader must support GNU hash; a DT_HASH-only loader cannot resolve symbols here. |
| DT_INIT / DT_FINI | 0xe635524 / 0xe63553c | Legacy single-function ctor/dtor stubs in .init/.fini. |
| DT_INIT_ARRAY / size | 0x215f26f0 / 23,200 B (2,900 ptrs) | The real constructor fan-out; see static-init. |
| DT_FINI_ARRAY / size | 0x215f8190 / 16 B (2 ptrs) | Two destructors only. |
| DT_PREINIT_ARRAY / size | 0x22048b30 / 16 B (2 ptrs) | Pre-initialization array, ordered before .init_array. Rare in libraries: the gABI defines it for executables and says it is ignored in shared objects. |
| DT_VERSYM/VERDEF/VERNEED | 0x4840 / 0x4e0c / 0x4e44 | Symbol versioning active: 2 version definitions, 6 version-need records. |
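
Because lookup is GNU-hash-only, a reimplemented loader needs the GNU symbol hash. The function below is the widely documented one (start at 5381, multiply by 33, add each byte, truncate to 32 bits); it is shown as a sketch and does not cover the bloom filter or bucket/chain walk that the rest of a .gnu.hash lookup requires.

```python
def gnu_hash(name: str) -> int:
    """32-bit GNU symbol hash as used by DT_GNU_HASH lookups."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

# Hash of one of the exported entry points named on this page.
get_pjrt_api_hash = gnu_hash("GetPjrtApi")
```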

QUIRK — DT_NEEDED lists only six bare system libraries for a 745 MiB plugin. Everything else — the LLVM/MLIR compiler, XLA, gRPC, protobuf, Abseil, Eigen, the libstdc++ ABI, and the TPU driver — is statically archived into this one file. The dependency surface is deliberately minimal so the plugin drops into any manylinux_2_31 host without dragging in a transitive .so graph. A reimplementer who expects to find these as shared dependencies will find them as .text instead.

NOTE — the dynamic-string table DT_STRTAB/STRSZ is only 14,908 bytes — it names just the 741 dynamic symbols and 6 libraries. It is unrelated to the 172 MiB .strtab, which names the 1.2 M static symbols and is never consulted at load time.
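
The .dynamic array itself is a flat run of 16-byte Elf64_Dyn records (a signed tag followed by a value). A minimal decoding sketch is below; the `sample` bytes are synthesized from a few of the values listed above, not read from the file, and the DT_* constants are the standard tag numbers.

```python
import struct

DT_NULL, DT_STRTAB, DT_STRSZ, DT_INIT_ARRAY, DT_INIT_ARRAYSZ = 0, 5, 10, 25, 27

def parse_dynamic(blob: bytes) -> list:
    """Decode Elf64_Dyn (d_tag, d_val) pairs up to and including DT_NULL."""
    entries = []
    for tag, value in struct.iter_unpack("<qQ", blob):
        entries.append((tag, value))
        if tag == DT_NULL:
            break
    return entries

# Synthetic fragment built from values on this page.
sample = b"".join(struct.pack("<qQ", tag, value) for tag, value in [
    (DT_STRTAB, 0x5730), (DT_STRSZ, 14908),
    (DT_INIT_ARRAY, 0x215f26f0), (DT_INIT_ARRAYSZ, 23200),
    (DT_NULL, 0),
])
table = dict(parse_dynamic(sample))
```

Stopping at DT_NULL matters because the section may be larger than the live entries; at 16 bytes per record, the 0x210-byte section holds the 33 slots mentioned above.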


Symbol and Relocation Surface

The runtime-visible symbol surface is tiny relative to the binary; the relocation surface is enormous because every absolute pointer stored in this position-independent image, whose read-execute segment alone is ~532 MiB, must be rebased at load.

Dynamic symbols

| Metric | Value |
| --- | --- |
| .dynsym entries | 741 (size 0x4578 / 24) |
| Undefined vs defined | 515 with Ndx == UND (the index-0 NULL + imports); 226 with a defined section index (the exports) |
| Exported symbol families | Tpu* C API (TpuExecutor, TpuCompiler, TpuProgram, …), plus GetPjrtApi, TfTpu_Initialize, Abseil internal entry points |
| Symbol versions | GLIBC_2.x (libc/libm/librt), VERS_1.0 (the object's own definition) |

The export surface is intentionally narrow: a host loads this library and calls a handful of C entry points — chiefly GetPjrtApi and the TfTpu_* bootstrap — and the entire compiler/runtime hides behind them as internal (LOCAL) symbols. The polymorphic dispatch behind those few exports is covered in forensics/polymorphic-entry-points.md and lifecycle/get-pjrt-api-thunk.md.
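
The import/export split in the table comes from checking st_shndx on each 24-byte Elf64_Sym record. A sketch of that classification follows; the three sample symbols are hypothetical stand-ins (their name offsets and sizes are made up), used only to exercise the parser.

```python
import struct

SYM = struct.Struct("<IBBHQQ")  # st_name, st_info, st_other, st_shndx, st_value, st_size
SHN_UNDEF = 0

def split_dynsym(blob: bytes):
    """Return (undefined_indices, defined_indices) for a raw .dynsym blob."""
    undefined, defined = [], []
    for index, (_, _, _, shndx, _, _) in enumerate(SYM.iter_unpack(blob)):
        (undefined if shndx == SHN_UNDEF else defined).append(index)
    return undefined, defined

# Hypothetical three-entry table: null symbol, one import, one export.
sample = (SYM.pack(0, 0, 0, SHN_UNDEF, 0, 0)           # index 0: mandatory null symbol
          + SYM.pack(1, 0x12, 0, SHN_UNDEF, 0, 0)      # GLOBAL FUNC, undefined (import)
          + SYM.pack(9, 0x12, 0, 21, 0xe63c000, 16))   # GLOBAL FUNC in section [21]
undefined, defined = split_dynsym(sample)
```

Note that the index-0 null symbol counts as undefined, which is why the 515 figure above is described as "NULL + imports".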

Relocations

| Table | Address | Entries | Type mix |
| --- | --- | --- | --- |
| .rela.dyn | 0x9170 | 1,069,186 (size 0x1878c30 / 24) | R_X86_64_RELATIVE dominates (DT_RELACOUNT = 1,069,006 ≈ 99.98%) |
| .rela.plt | 0x1881da0 | 473 (size 0x2c58 / 24, PLTRELSZ 11,352) | R_X86_64_JUMP_SLOT (PLT GOT entries) |

GOTCHA — of the ~1.07 M dynamic relocations, all but 180 are R_X86_64_RELATIVE (per DT_RELACOUNT): pure load-bias adds with no symbol reference. This is the cost of a position-independent image whose read-execute segment alone is ~532 MiB: every absolute pointer baked into .data.rel.ro, vtables, and jump tables must be rebased. The relocation table (RELASZ = 25,660,464 bytes ≈ 24.5 MiB) is itself larger than many complete shared libraries. A loader processes over a million relocations before the first constructor runs; this, more than .init_array, dominates dlopen latency for this plugin.
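
What "a pure load-bias add" means can be sketched in a few lines. The function below walks Elf64_Rela records and applies only R_X86_64_RELATIVE; the `image`, relocation bytes, and load bias are toy values chosen here for illustration, and real loaders handle the remaining relocation types as well.

```python
import struct
from collections import Counter

RELA = struct.Struct("<QQq")  # r_offset, r_info, r_addend
R_X86_64_RELATIVE = 8

def apply_relative(image: bytearray, rela_blob: bytes, load_bias: int) -> Counter:
    """Apply R_X86_64_RELATIVE entries to an image indexed by link-time vaddr."""
    seen = Counter()
    for r_offset, r_info, r_addend in RELA.iter_unpack(rela_blob):
        r_type = r_info & 0xFFFFFFFF
        seen[r_type] += 1
        if r_type == R_X86_64_RELATIVE:
            # *(base + r_offset) = base + r_addend; no symbol lookup involved.
            value = (load_bias + r_addend) & 0xFFFFFFFFFFFFFFFF
            struct.pack_into("<Q", image, r_offset, value)
    return seen

# Toy input: one RELATIVE entry and one symbol-based entry (type 1, symbol 5).
image = bytearray(32)
relocs = RELA.pack(8, R_X86_64_RELATIVE, 0x100) + RELA.pack(16, (5 << 32) | 1, 0)
counts = apply_relative(image, relocs, load_bias=0x7f0000000000)

relative_share = 1_069_006 / 1_069_186  # DT_RELACOUNT / .rela.dyn entries
```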

The 473 PLT slots are the lazily-bound calls into the six DT_NEEDED libraries (malloc, pthread_*, dlopen, math, etc.). PLTGOT = 0x224c2980 (section [44] .got.plt) sits in LOAD 3, outside GNU_RELRO, precisely so the lazy resolver can write resolved targets into it at runtime — the rest of the GOT ([35] .got) is RELRO-frozen.
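
The 473 figure can be cross-checked from three independent section sizes. The sketch below assumes the conventional x86-64 lazy-binding layout (16-byte PLT entries with one leading resolver stub, and three reserved words at the start of .got.plt); that layout is an assumption here, not something verified against this binary.

```python
PLT_SIZE, GOT_PLT_SIZE, RELA_PLT_SIZE = 0x1da0, 0xee0, 0x2c58

# Assumed conventional layout: PLT0 stub + N 16-byte entries,
# and 3 reserved GOT words + N 8-byte slots.
plt_slots = PLT_SIZE // 16 - 1
got_plt_slots = GOT_PLT_SIZE // 8 - 3
jump_slot_relocs = RELA_PLT_SIZE // 24  # sizeof(Elf64_Rela) == 24
```

All three expressions agreeing is a cheap sanity check that the section boundaries in the tables above were read correctly.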

Note: section-count figures of 51 and 52 are both consistent. readelf -h reports Number of section headers: 52; the 51 counts the meaningful sections and the 52 includes the mandatory NULL section [0]. This page uses the on-the-wire count of 52. Likewise, DT_SONAME is genuinely absent (readelf -d shows no SONAME entry), but the build-id is present (readelf -n) and equals 89edbbe81c5b328a958fe628a9f2207d; the version-pin at the top of every wiki page relies on this build-id.


Cross-References

  • Forensics Overview — the parallel high-level tour of the binary; this page is its byte-level substrate
  • Two-Binary Split — why the wheel ships libtpu.so plus a smaller sdk.so, and how their ELF shapes differ
  • Static Initialization — owns the .init_array constructor roster and ordering; this page fixes only the array's location/size
  • Custom Sections — deep dive on google_*, __rseq_cs, linkarr_upb_AllExts, filewrapper_toc, pb_defaults listed in the section table above
  • ELF Entry and Init Procedure — how a library with entry 0x0 reaches first execution through DT_PREINIT_ARRAY/DT_INIT/DT_INIT_ARRAY
  • GetPjrtApi Thunk — the exported entry whose backing singleton lives in the .lbss large-model region documented here
  • Lifecycle Overview — the load-time sequence that consumes this page's dynamic section