The libtpu.so / sdk.so Two-Binary Split

All facts on this page apply to the libtpu 0.0.40 wheel, tagged cp314-cp314-manylinux_2_31_x86_64. The build is pinned by build-id, not by any internal version string: libtpu.so build-id 89edbbe81c5b328a958fe628a9f2207d; sdk.so build-id 4e9025466f71009fccb46a803806411c63744a0a. Other wheel builds will differ.

Abstract

The libtpu wheel ships two ELF shared objects side by side in one directory: libtpu.so (745 MiB — the PJRT/XLA-TPU plugin) and sdk.so (21 MiB — its smaller sibling). The natural assumption is that these are two halves of one system, with sdk.so providing a runtime layer that libtpu.so loads at startup. That assumption is wrong, and the page corrects it up front. The two objects are independently linked, share no symbols, and neither names the other in DT_NEEDED. sdk.so is a self-contained CPython extension module — it exports PyInit_sdk and imports the Python C-API, so it is reached only by import libtpu.sdk from a Python interpreter. libtpu.so is a native PJRT plugin reached only by dlopen from a host runtime (JAX, the XLA TPU client); it imports zero Python symbols. They co-reside in the wheel for packaging convenience, not because of any link-time or load-time dependency.

The contrast that matters for a reimplementer is the symbol-population shape, not the dependency edge. libtpu.so is a near-closed object: 918,698 FUNC symbols in its full .symtab but only 226 defined exports in its 741-entry .dynsym (218 versioned C-ABI thunks carrying @@VERS_1.0, plus 8 __start_*/__stop_* linker-set bounds). It statically embeds its entire C++ world — it does not link libstdc++. sdk.so is the mirror image: 78,311 FUNC symbols but 36,787 dynamic symbols, no symbol versioning at all, and it does link libstdc++/libgcc_s like an ordinary extension. One object hides everything behind a 226-entry stable ABI; the other is a normal, openly-linked Python module. The "split" in the wheel is a split between two delivery vehicles for the same vendor's code, not a host/device or compiler/runtime decomposition.

This page establishes the side-by-side ELF facts, proves the no-shared-symbols / no-DT_NEEDED relationship from the dynamic tables, identifies the 226-entry C-ABI surface of libtpu.so (including GetPjrtApi and GetLibtpuSdkApi), explains the sdk.so Python-module nature, and records how the larger object's address space was made navigable for analysis. The two are unrelated link units that happen to share a directory — a packaging fact, not a load-time dependency.

For reimplementation, the contract is:

  • The packaging contract: what the wheel actually contains, which file the Python loader resolves, and which file the C runtime dlopens.
  • The link relationship: that the two objects are independent — no DT_NEEDED edge, no symbol re-export, disjoint dynamic-symbol roles (PJRT C-ABI vs CPython module).
  • The exported-surface model of libtpu.so: a 226-symbol C ABI (218 @@VERS_1.0 thunks plus 8 linker-set bounds) of thin thunks over a statically-linked C++ core, with GetPjrtApi and GetLibtpuSdkApi as the two roots.
  • The analysis-navigation model: why a 745 MiB object with 884,832 functions is audited by address window and segment, not by symbol name.

| | |
|---|---|
| Wheel | libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64 |
| Objects | libtpu/libtpu.so (745 MiB), libtpu/sdk.so (21 MiB) |
| Python loader target | libtpu.so only (__init__.py::get_library_path()) |
| PJRT entry | GetPjrtApi@@VERS_1.0 in libtpu.so at 0x0e6a83a0 (5-byte thunk) |
| SDK C-ABI entry | GetLibtpuSdkApi@@VERS_1.0 in libtpu.so at 0x109028c0 (5-byte thunk) |
| sdk.so entry | PyInit_sdk in sdk.so at 0x00739c02 (16 bytes) |
| Inter-object link | none: no DT_NEEDED, no shared/re-exported symbol |

At-a-Glance: The Two Objects Side by Side

Every row is read directly from the two ELF files; the version pin fixes the build-ids.

| Property | libtpu.so | sdk.so |
|---|---|---|
| On-disk size | 781,691,048 B (745 MiB) | 22,541,240 B (21 MiB) |
| File mode in wheel | 0755 (executable bit set) | 0644 |
| ELF type | DYN (shared object), x86-64 | DYN (shared object), x86-64 |
| file ELF flavor | version 1 (SYSV) | version 1 (GNU/Linux) |
| Build-id format | md5/uuid, 89edbbe8…2207d | sha1, 4e902546…44a0a |
| Gold-version note | absent | present (.note.gnu.gold-version) |
| DT_SONAME | none | none |
| DT_NEEDED count | 6 | 6 |
| DT_NEEDED list | libm libpthread libdl librt libc ld-linux | libpthread libm libstdc++ libgcc_s libc ld-linux |
| Links libstdc++? | no (statically embedded) | yes |
| Symbol versioning | VERS_1.0 (@@VERS_1.0) | none |
| .dynsym entries | 741 | 36,787 |
| Defined global exports | 226 (218 @@VERS_1.0 FUNC + 8 linker-set bounds) | ~36,600 (module symbols) |
| .symtab FUNC count | 918,698 | 78,311 |
| IDA function count | 884,832 (record) / 884,843 (artifact coverage) | 94,732 |
| DT_FLAGS | absent | BIND_NOW (FLAGS_1: NOW) |
| RELACOUNT (RELATIVE relocs) | 1,069,006 | 618 |
| Loaded by | host runtime via dlopen | CPython via import |
| Module entry | GetPjrtApi, GetLibtpuSdkApi, … | PyInit_sdk |

Note: libtpu.so and sdk.so are independent link units, not two halves of one system: neither lists the other in DT_NEEDED, neither defines a symbol the other imports, and their dynamic-symbol roles are categorically different (a 226-entry PJRT C-ABI versus a CPython extension exporting PyInit_sdk). Any combined "function total" (e.g. ~979k) is the sum of two unrelated databases, not the size of one binary. Treat them as two binaries that happen to share a directory.

GOTCHA: The wheel is documented as a "stripped 745 MB plugin," but per file(1) neither object is stripped. The full .symtab survives in both (918,698 FUNC symbols in libtpu.so), which is why IDA recovers ~884k named functions instead of sub_ blanks. The "stripped" intuition fails here; analysis depth is governed by the surviving .symtab, not by the .dynsym.


libtpu.so — The PJRT Plugin

Purpose

libtpu.so is the native Google-TPU PJRT/XLA plugin: the object a host ML runtime dlopens to obtain a C-ABI handle onto the TPU compiler, runtime, and device fleet. It is the file the libtpu Python package resolves and advertises. The package __init__.py does exactly one functional thing — point an environment variable at this file:

get_library_path()  ->  <pkg>/libtpu.so
configure_library_path():
    if not os.environ.get('TPU_LIBRARY_PATH'):
        os.environ['TPU_LIBRARY_PATH'] = get_library_path()

sdk.so is never named in __init__.py. Nothing in the Python loader touches it. A host runtime that reads TPU_LIBRARY_PATH and dlopens the result therefore loads only libtpu.so.
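
The host side of this handshake can be sketched as follows. This is a minimal illustration, assuming a host that honors TPU_LIBRARY_PATH as described above; the helper names resolve_plugin_path and load_plugin are hypothetical and do not reproduce the loading code of JAX or any other real runtime.

```python
import ctypes
import os


def resolve_plugin_path(package_dir, environ):
    """Return the path a host would dlopen.

    Mirrors the package logic described above: a non-empty
    TPU_LIBRARY_PATH wins; otherwise fall back to <pkg>/libtpu.so.
    """
    override = environ.get("TPU_LIBRARY_PATH")
    if override:
        return override
    return os.path.join(package_dir, "libtpu.so")


def load_plugin(path):
    """dlopen the plugin (hypothetical helper); sdk.so is never involved."""
    return ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)


# Path resolution only: load_plugin() would need the real object on disk.
print(resolve_plugin_path("/site-packages/libtpu", {}))
```

Keeping resolution separate from loading makes the first step checkable on a machine that has no TPU and no copy of the plugin.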

Exported C ABI

libtpu.so exposes a deliberately tiny surface: 226 defined global symbols in its 741-entry .dynsym (741 − 515 UND/null = 226 defined). Of these, 218 are FUNC exports — every one carrying the @@VERS_1.0 version tag from the object's single VERS_1.0 version definition (VERDEFNUM 2 — the base plus VERS_1.0) — and the remaining 8 are NOTYPE __start_*/__stop_* linker-set bounds. Against 918,698 FUNC symbols in the full .symtab, this is an export ratio under 0.025% — the object is a sealed C++ monolith presenting a hand-curated C door.
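
The arithmetic behind these figures can be rechecked with a few lines. This is a sketch that uses only the counts quoted above; the function name export_summary is hypothetical.

```python
def export_summary(dynsym_total, undefined_or_null, versioned_funcs, symtab_funcs):
    """Derive the defined-export count, the linker-set remainder,
    and the export ratio (in percent) from raw symbol-table counts."""
    defined = dynsym_total - undefined_or_null
    bounds = defined - versioned_funcs
    ratio_pct = 100.0 * defined / symtab_funcs
    return defined, bounds, ratio_pct


# Counts as quoted on this page for libtpu.so.
defined, bounds, ratio_pct = export_summary(741, 515, 218, 918_698)
print(defined, bounds, f"{ratio_pct:.4f}%")  # 226 8 0.0246%
```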

The exports cluster into named C-ABI families, every member a *_DoWork / Tpu*_* / TfTpu_* style entry:

| ABI family (prefix) | Role |
|---|---|
| GetPjrtApi | PJRT C-API vtable root; the host runtime's primary entry |
| GetLibtpuSdkApi | libtpu "SDK" C-API vtable root (a second, distinct ABI surface) |
| TpuCompiler_* | XLA-for-TPU compiler: New, Compile, RunHloPasses, RunBackend, ShapeSize, Free |
| TpuCompile_* | Compilation-cache + program build: CompileAndBuild, CreateCompilationCacheKey, fingerprints |
| TpuConfigurationApi_* | Host/pod configuration: server address, memory limit, TPUs-per-host |
| TpuComputationPlacer_* | Device assignment: New, AssignDevices, AssignLocalDevices |
| TpuCoreLocation_*, TpuDeviceDescription_* | Topology / device-description accessors |
| TfTpu_*, TfTpuOrdinalSelector_* | Runtime bootstrap + core-ordinal selection |
| HardwareLayout_*, SparseCore_* | Layout math and SparseCore queries |
| TF_InitKernel, TFNPD_InitPlugin | TensorFlow kernel / next-pluggable-device init |
| __start_* / __stop_* | Linker-set bounds (google_malloc, malloc_hook, pb_defaults, linkarr_upb_AllExts) |

QUIRK — GetPjrtApi@@VERS_1.0 is a 5-byte FUNC at 0x0e6a83a0, and GetLibtpuSdkApi@@VERS_1.0 is likewise 5 bytes at 0x109028c0. Five bytes is a single jmp rel32 — these are tail-call thunks, not the implementations. The exported name is a stable trampoline into a statically-linked interior function whose own symbol is internal. A reimplementer must not look for the real PJRT-API constructor at the exported address; follow the jump. The thunk indirection is what lets the 745 MiB interior churn build-to-build while the 226-entry door stays binary-stable. (The GetPjrtApi thunk and the object it returns are detailed on its own page.)
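
"Follow the jump" can be made concrete with a small decoder. This is a sketch, assuming the thunk is the plain near-jump encoding (opcode 0xE9 followed by a signed little-endian 32-bit displacement); the function name thunk_target is hypothetical, and the example displacement is invented for illustration rather than read from libtpu.so.

```python
import struct

JMP_REL32 = 0xE9  # x86-64 opcode for a near jmp with a 32-bit displacement


def thunk_target(code, address):
    """Return the destination of a 5-byte `jmp rel32` thunk.

    `code` is the 5 bytes at the exported symbol and `address` is its
    virtual address. The displacement is signed and is applied relative
    to the address of the next instruction (address + 5).
    """
    if len(code) != 5 or code[0] != JMP_REL32:
        raise ValueError("not a 5-byte jmp rel32 thunk")
    (rel32,) = struct.unpack("<i", code[1:5])
    return (address + 5 + rel32) & 0xFFFFFFFFFFFFFFFF


# Illustrative bytes only: the displacement 0x1000 is made up.
example = bytes([JMP_REL32]) + struct.pack("<i", 0x1000)
print(hex(thunk_target(example, 0x0E6A83A0)))  # 0xe6a93a5
```

Rejecting anything that is not exactly the 5-byte form keeps the helper from silently mis-decoding an export that a future build turns into a real function body.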

NOTE — libtpu.so carries RELACOUNT 1069006 — over a million R_X86_64_RELATIVE relocations — and an INIT_ARRAY of 23,200 bytes (2,900 init pointers) plus a 16-byte PREINIT_ARRAY. The relative-reloc tonnage and the large constructor array are the load-time cost of statically linking the entire C++ runtime and protobuf/Abseil machinery into one PIE object. sdk.so, linking libstdc++ dynamically, needs only 618 relative relocations.

Why no libstdc++ in DT_NEEDED

libtpu.so's DT_NEEDED is libm, libpthread, libdl, librt, libc, ld-linux: pure C runtime, with no libstdc++ and no libgcc_s. The object is unmistakably C++ (the export families are C++ subsystems behind C shims), so the C++ standard library must be statically embedded. This is the standard hermetic-build posture for a redistributable plugin: depend only on the glibc baseline guaranteed by the manylinux_2_31 tag and carry everything else inside. The resulting statically-linked library inventory is catalogued separately (see the Embedded-Library Atlas).
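
The DT_NEEDED check is easy to script against `readelf -d` output. The sketch below parses the textual "(NEEDED) Shared library: [...]" lines that readelf prints; the helper names are hypothetical, and the sample text is abridged and uses the usual glibc sonames as an assumption, since this page quotes only the short library names.

```python
import re

NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_dynamic_output):
    """Extract DT_NEEDED sonames from the text of `readelf -d <file>`."""
    return NEEDED_RE.findall(readelf_dynamic_output)


def links_cxx_runtime(needed):
    """True if the C++ runtime is a dynamic dependency."""
    return any(n.startswith(("libstdc++.so", "libgcc_s.so")) for n in needed)


# Abridged, illustrative dynamic-section text (sonames are assumptions).
PLUGIN_LIKE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""
EXTENSION_LIKE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""

print(links_cxx_runtime(needed_libraries(PLUGIN_LIKE)))
print(links_cxx_runtime(needed_libraries(EXTENSION_LIKE)))
```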


sdk.so — The CPython Extension Module

Purpose

sdk.so is not a runtime layer beneath libtpu.so. Its dynamic table proves it is an ordinary CPython extension module:

  • It exports PyInit_sdk (16-byte FUNC at 0x00739c02) — the CPython module-init entry point. By the import protocol, import libtpu.sdk causes CPython to dlopen this file and call PyInit_sdk.
  • Its undefined imports are dominated by the Python C-API: PyObject_GenericGetAttr, PyType_GenericNew, PyCapsule_*, PyExc_ValueError, PyInterpreterState_Get, PyModule_NewObject, and ~170 more non-glibc UND symbols, essentially all Py*. These are satisfied by the running CPython interpreter, not by any sibling .so.
  • It links libstdc++.so.6 and libgcc_s.so.1 dynamically — the normal posture for an extension built into a Python environment that already provides the C++ runtime.
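
The first bullet relies on CPython's naming rule for extension modules: the init function is PyInit_ followed by the module name. A minimal sketch of that rule follows; the helper name expected_init_symbol is hypothetical, and taking the module name as everything before the first dot in the file name is a simplification.

```python
import os


def expected_init_symbol(extension_path):
    """Name of the init function CPython looks up in an extension file."""
    filename = os.path.basename(extension_path)
    module_name = filename.split(".", 1)[0]  # "sdk.so" -> "sdk"
    if not module_name.isascii():
        # Non-ASCII module names use a different (PyInitU_) scheme.
        raise ValueError("non-ASCII module names are not handled here")
    return "PyInit_" + module_name


print(expected_init_symbol("libtpu/sdk.so"))  # PyInit_sdk
```

An object that exports no symbol matching this rule, as is the case for libtpu.so, cannot be imported as a Python module under that name.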

The independence proof

The two objects have no link-time or load-time relationship, and that is provable three ways from the binaries:

1. DT_NEEDED edges:
     libtpu.so NEEDED: { libm, libpthread, libdl, librt, libc, ld-linux }
     sdk.so    NEEDED: { libpthread, libm, libstdc++, libgcc_s, libc, ld-linux }
   -> "libtpu.so" appears in neither list. "sdk.so" appears in neither list.
      No load-time edge in either direction.

2. Symbol re-export / import:
     sdk.so UND set has ZERO Tpu*/Pjrt*/GetLibtpu*/HardwareLayout*/SparseCore* names.
     -> sdk.so does not consume libtpu.so's 226-entry ABI.
     libtpu.so UND set has ZERO Py* symbols.
     -> libtpu.so does not consume CPython, so it is not loaded as a Python module.

3. Entry-point category:
     libtpu.so module root: GetPjrtApi / GetLibtpuSdkApi  (versioned C-ABI thunks)
     sdk.so    module root: PyInit_sdk                    (CPython module init)
   -> Categorically different load mechanisms (dlopen-by-runtime vs import-by-CPython).
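
The first two checks reduce to set operations once the dynamic tables have been extracted. The sketch below assumes the DT_NEEDED names and the defined/undefined symbol names are already available as sets; the ElfFacts container, the independence_report helper, and the abridged sample data are hypothetical (the libtpu.so import `malloc` in particular is a placeholder, not a value read from the binary).

```python
from dataclasses import dataclass, field


@dataclass
class ElfFacts:
    name: str
    needed: set = field(default_factory=set)
    defined: set = field(default_factory=set)
    undefined: set = field(default_factory=set)


def independence_report(a, b):
    """Report any DT_NEEDED edge or symbol import between two objects."""
    return {
        # A load-time edge exists if either file names the other in DT_NEEDED.
        "needed_edge": b.name in a.needed or a.name in b.needed,
        # A symbol edge exists if one object imports what the other defines.
        "a_imports_from_b": sorted(a.undefined & b.defined),
        "b_imports_from_a": sorted(b.undefined & a.defined),
    }


libtpu = ElfFacts(
    "libtpu.so",
    needed={"libm", "libpthread", "libdl", "librt", "libc", "ld-linux"},
    defined={"GetPjrtApi", "GetLibtpuSdkApi"},
    undefined={"malloc"},  # placeholder import
)
sdk = ElfFacts(
    "sdk.so",
    needed={"libpthread", "libm", "libstdc++", "libgcc_s", "libc", "ld-linux"},
    defined={"PyInit_sdk"},
    undefined={"PyObject_GenericGetAttr", "PyType_GenericNew"},
)
print(independence_report(libtpu, sdk))
```

An all-empty report is the scripted form of the "no edge in either direction" conclusion.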

GOTCHA: The shared directory and the similar 6-entry DT_NEEDED lists (four of six entries in common) make the two objects look like a paired set. They are not paired at the link level. sdk.so is the compiled form of the libtpu.sdk Python submodule; libtpu.so is the C plugin. A reimplementation that wires sdk.so as a dependency of libtpu.so (or vice versa) is modeling an edge that does not exist in the ELF. The only things they share are the vendor, the wheel, and the glibc baseline.

Symbol-population shape

sdk.so's shape is the inverse of libtpu.so's. It has 36,787 dynamic symbols, roughly 50× libtpu.so's 741, because a Python extension exposes its bound C++ classes and protobuf message types broadly across its .dynsym rather than hiding them behind a versioned C door. It carries no symbol versioning (grep VERS_1.0 over its .dynsym returns 0); the @@VERS_1.0 discipline is a libtpu.so-only convention. Its STRSZ is 4,544,547 bytes (~4.3 MiB) of dynamic string table, which is the mangled-name cost of that wide export surface. BIND_NOW / FLAGS_1: NOW requests eager binding, normal for an extension that should fail fast at import if a Py* symbol is missing.

NOTE: sdk.so reports 94,732 functions to IDA but only 78,311 FUNC entries in .symtab; the 16,421-function difference is IDA-recovered functions with no surviving symbol (thunks, outlined cold paths, compiler-generated helpers). The 45,156 strings and 164 switches in its database are one to two orders of magnitude below libtpu.so's (1,249,324 strings, 33,016 switches), consistent with a 21 MiB module versus a 745 MiB monolith.


Wheel Packaging Around the Two Objects

Inventory

The wheel is a single distribution — libtpu 0.0.40, one wheel tag, import-root libtpu — with exactly six payload files under the libtpu/ package directory and a dist-info. The two .so files dominate; everything else is small text.

| File | Size | Kind | Role |
|---|---|---|---|
| libtpu/libtpu.so | 781,691,048 B | ELF DYN | PJRT plugin (the loaded object) |
| libtpu/sdk.so | 22,541,240 B | ELF DYN | CPython extension (libtpu.sdk) |
| libtpu/__init__.py | 1,131 B | Python | Sets TPU_LIBRARY_PATH to libtpu.so |
| libtpu/LICENSE | 229 B | text | Google Cloud Platform terms |
| libtpu/THIRD_PARTY_NOTICES.txt | 731,537 B | text | OSS notices for libtpu.so |
| libtpu/SDK_THIRD_PARTY_NOTICES.txt | 103,306 B | text | OSS notices for sdk.so |

NOTE — The two separate third-party-notices files (THIRD_PARTY_NOTICES.txt for the plugin, SDK_THIRD_PARTY_NOTICES.txt for the SDK) are independent corroboration of the independence finding: the two objects have different OSS dependency closures and are licensed/audited as distinct deliverables. If sdk.so were merely a slice of libtpu.so, one notices file would suffice.

Wheel metadata

| Field | Value |
|---|---|
| Distribution | libtpu 0.0.40 |
| Wheel tag | cp314-cp314-manylinux_2_31_x86_64 |
| Requires-Python | >=3.11 |
| Requires-Dist | (none) |
| Entry points | (none) |
| purelib | false (platform wheel) |
| Summary | "Google Cloud TPU runtime library." |

The wheel declares no Requires-Dist — it pulls in no Python packages. Its only Python file is the 1.1 KiB __init__.py; all functionality lives in the two native objects. The cp314 ABI tag pins it to CPython 3.14, but note that libtpu.so itself imports no Python symbols — the ABI tag is dictated by sdk.so (PyInit_sdk against the CPython 3.14 C-ABI), not by the plugin. This is why the platform wheel is cp314-specific even though its primary payload is interpreter-agnostic native code.

QUIRK — Because libtpu.so carries no Py* dependency and is reached purely through TPU_LIBRARY_PATH + dlopen, the plugin would run identically under any CPython (or under a non-Python host) — its cp314 tagging is an artifact of bundling the interpreter-bound sdk.so in the same wheel. A reimplementer extracting just the PJRT plugin can ignore the CPython version pin entirely; one extracting sdk.so cannot.
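
The tag reasoning above can be illustrated by splitting the wheel name into its components. This is a sketch that follows the standard wheel naming layout (distribution, version, optional build tag, then python, ABI, and platform tags); the helper names are hypothetical, and the check is a simplification of real tag-compatibility logic.

```python
def parse_wheel_stem(stem):
    """Split '<dist>-<version>[-<build>]-<python>-<abi>-<platform>'."""
    parts = stem.split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {stem!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def is_interpreter_bound(tags):
    # "none" and "abi3" are the ABI tags that do not pin a single
    # CPython minor version; anything else (such as cp314) does.
    return tags["abi_tag"] not in ("none", "abi3")


tags = parse_wheel_stem("libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64")
print(tags["abi_tag"], is_interpreter_bound(tags))  # cp314 True
```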


Analysis Navigation of the 745 MiB Object

Why address windows, not symbol names

A 745 MiB object with 884,832 IDA functions (the function-record count; artifact coverage spans 884,843 entries — see Overview) cannot be decompiled in one pass — IDA caps and decompiler limits force a windowed strategy. The object was made navigable two ways:

  • By segment. The loadable map is dominated by one enormous .text (0x0e63c000 to 0x21217484, ~300 MiB of code) plus large read-only constant pools: .lrodata (108 MiB), .rodata (58 MiB), and a 28 MiB .eh_frame. Writable data is comparatively tiny (.data.rel.ro ~10 MiB, .data 2.4 MiB, .bss 853 KiB). The segment map is the first-order coordinate system: a finding at an address is placed by which segment owns it. (The full ELF section map is owned by the ELF-Anatomy page.)
  • By function window. The artifact space (884,843 manifest entries; 884,832 booked function records) was decompiled in offset/limit chunks of ~10,000 functions each (off=… lim=10000), ~97 windows spanning offsets 56,102 → 876,102. Each window records its own decompiled/ctree counts and failure tally; aggregate decompilation failures across the object total 511 (~0.06%), and analysis problems peak at 7,915. This is how a target too large for a single database is audited deterministically — every function lives in a known, reproducible window.

libtpu.so loadable-segment skeleton (selected, by size):
  .text          0x0e63c000 .. 0x21217484   ~300 MiB   code (perm r-x)
  .lrodata       0x01884a00 .. 0x084931d0   ~108 MiB   large const (perm r--)
  .rodata        0x084a0000 .. 0x0be8af28    ~58 MiB   const     (perm r--)
  .eh_frame      0x0c989cb8 .. 0x0e635524    ~28 MiB   unwind    (perm r--)
  .data.rel.ro   0x215f81a0 .. 0x22048b30    ~10 MiB   reloc data(perm rw-)
  .data          0x222551c0 .. 0x224bf798   ~2.4 MiB   data      (perm rw-)
  .bss           0x224c3880 .. 0x22598c30   ~853 KiB   zero-init (perm rw-)
  [init thunks]  .text.startup / .text.unlikely / google_malloc / malloc_hook
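
Placing a finding "by which segment owns it" is a sorted-range lookup. The sketch below hard-codes the ranges from the skeleton above and treats each end address as exclusive, which is an assumption; the helper name owning_section is hypothetical.

```python
from bisect import bisect_right

# (start, end, name) taken from the skeleton above; end treated as exclusive.
SEGMENTS = sorted([
    (0x0E63C000, 0x21217484, ".text"),
    (0x01884A00, 0x084931D0, ".lrodata"),
    (0x084A0000, 0x0BE8AF28, ".rodata"),
    (0x0C989CB8, 0x0E635524, ".eh_frame"),
    (0x215F81A0, 0x22048B30, ".data.rel.ro"),
    (0x222551C0, 0x224BF798, ".data"),
    (0x224C3880, 0x22598C30, ".bss"),
])
STARTS = [start for start, _, _ in SEGMENTS]


def owning_section(address):
    """Return the name of the listed range containing `address`, or None."""
    i = bisect_right(STARTS, address) - 1
    if i >= 0:
        start, end, name = SEGMENTS[i]
        if start <= address < end:
            return name
    return None


print(owning_section(0x0E6A83A0))  # GetPjrtApi thunk -> .text
print(owning_section(0x109028C0))  # GetLibtpuSdkApi thunk -> .text
```

Addresses that fall in a gap between the selected ranges return None rather than being attributed to a neighbor, since the skeleton lists only the largest sections.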

NOTE: sdk.so's database is small enough that no windowing is needed. Its .text is 0x7394a0 to 0xb6b7a2 (~4.2 MiB) inside a single 6.6 MiB LOAD, and IDA decompiled essentially all of its 94,732 functions in one pass. The windowed strategy is a libtpu.so-only necessity, born of a code segment roughly 70× larger (~300 MiB versus ~4.2 MiB).

GOTCHA — Do not read the IDA "total function" figures (libtpu.so 884,832 records / 884,843 artifact entries; the summed cross-object ~979,575) as a measure of distinct source functions. They include thunks, template instantiations, outlined cold blocks, and per-window overlaps in the selection metadata. The numbers are useful as relative scale (one object is ~9× the other) and as navigation indices, not as a source-function census.


| Component | Relationship |
|---|---|
| GetPjrtApi thunk | The 5-byte export at 0x0e6a83a0 that returns the PJRT C-API vtable; primary dlopen entry into libtpu.so |
| GetLibtpuSdkApi thunk | The second 5-byte export at 0x109028c0; the distinct "SDK" C-ABI root inside libtpu.so (named like, but unrelated to, the sdk.so file) |
| PyInit_sdk | The CPython module-init entry of sdk.so; reached by import libtpu.sdk, never by libtpu.so |
| Embedded C++ runtime | The statically-linked libstdc++ + Abseil/protobuf inside libtpu.so that explains its missing DT_NEEDED edge |

NOTE — The naming collision between the export GetLibtpuSdkApi (a function inside libtpu.so) and the file sdk.so is a trap. They are unrelated: GetLibtpuSdkApi is a C-ABI vtable root served by the plugin; sdk.so is a Python module that does not import it. "SDK" denotes two different things in the two contexts.

Cross-References