TpuPlatform & TpuNodeContext
All addresses on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build libtpu_lts_20260413_b_RC00, build-id md5 89edbbe81c5b328a958fe628a9f2207d, 781,691,048 bytes, ELF x86-64 DYN, not stripped; IDA-recovered C names and demangled C++ symbols quoted verbatim). .text VMA equals file offset. Other versions will differ.
Abstract
TpuPlatform_* and TpuNodeContext_* are the two C-ABI lifecycle rosters that bring a TPU process up. The first is the platform-singleton face (one process-wide TPU platform, its device count, its topology pointer, its host location); the second is the per-process node-context face (acquire a reference to the TPU node, close the TPU host, tear the reference down). They are the extern "C" free functions that the open-source StreamExecutor TPU backend calls, through the ExecutorApiFn() function-pointer table, to construct and own the runtime, as distinct from the topology accessors that merely read an already-built tpu::TpuTopology. The two clusters sit near each other in the C-ABI block (TpuPlatform_* at 0xEAB8B80–0xEAB8FC0, TpuNodeContext_* at 0xEACA260–0xEACA440); the node-context source file was recovered from learning/45eac/tfrc/executor/stream_executor/tpu_node_context_c_api.cc references in the binary.
The single most important structural fact, and the one a reimplementer will get wrong by default, is that the C-ABI TpuPlatform_* functions do not touch tensorflow::tpu::TpuPlatform — the 0x98-byte SE Platform object that RegisterTpuPlatform installs in PlatformManager. They dispatch into a different, lower object: deepsea::executor::DeepseaPlatform, reached through a Meyers-singleton (GetUnderlyingDeepseaPlatform::platform, a __cxa_guard-built function-local static populated by deepsea::executor::GetRegisteredDeepseaPlatform @ 0x1d4935e0). "Deepsea" is the internal codename for the TPU executor layer; the TpuPlatform_* roster is the thin C wrapper over the Deepsea platform's vtable, while the SE TpuPlatform C++ class is a separate StreamExecutor Platform that forwards its own virtual methods through the ExecutorApiFn() slots that point at these very functions. Two "platform" objects, one underlying Deepsea singleton.
TpuNodeContext_* is backed by yet another object — tensorflow::TPUNodeInterfaces and its RAII ScopedRef — that owns the per-process driver attachment. Create mints a heap ScopedRef and runs InitScopedRef; Initialize resolves the live TPUNodeInterfaces*; CloseTpuHost shuts the host down; Free tears the ref down with two fatal CHECKs. This page owns the per-function impl-symbol map for both rosters and the vtable-offset map for the Deepsea dispatch; the topology accessors that read the resulting tpu::TpuTopology are on TpuTopology & TpuCoreLocation.
For reimplementation, the contract is:
- The two rosters: 11 TpuPlatform_* and 5 TpuNodeContext_* extern "C" functions, their addresses, and the C++ object/method each backs.
- The Deepsea-singleton dispatch: every dispatching TpuPlatform_* call first ensures the GetUnderlyingDeepseaPlatform::platform Meyers-singleton is built, then calls a fixed vtable offset on it (+16 Id, +32 VisibleDeviceCount, +48 Initialize, +72 GetExecutor, +96 ShouldRegister…, +104 GetTopologyPtr).
- The two status conventions: POD return (Initialized → 1, VisibleDeviceCount → int) versus the absl::Status-out idiom (Initialize, GetExecutor, and the fallible TpuNodeContext_* calls write an out-param StatusRep** and Unref the prior value).
- The node-context RAII shape: Create/Free are a paired heap-ScopedRef allocator/destructor with fatal null-CHECKs; CompactionSupported re-acquires a scoped ref internally rather than taking one.
| Roster prefixes | TpuPlatform_* (11), TpuNodeContext_* (5) |
| C-ABI blocks | TpuPlatform_* @ 0xEAB8B80–0xEAB8FC0; TpuNodeContext_* @ 0xEACA260–0xEACA440 |
| Platform backing object | deepsea::executor::DeepseaPlatform via GetUnderlyingDeepseaPlatform::platform (Meyers-singleton) |
| Singleton builder | deepsea::executor::GetRegisteredDeepseaPlatform @ 0x1d4935e0 |
| Node backing object | tensorflow::TPUNodeInterfaces + tensorflow::TPUNodeInterfaces::ScopedRef |
| Source file (recovered) | …/stream_executor/tpu_node_context_c_api.cc (from a CHECK string in TpuNodeContext_Free) |
| Reached via | ExecutorApiFn() slots (the accessor pattern is on the shim overview) |
| Status convention | absl::Status-out (StatusRep** out-param + Unref) for fallible calls; POD return for queries |
| Evidence grade | Reimplementation-grade / byte-confirmed against IDA decompile |
Scope: the *ApiFn() accessor pattern, the opaque-handle convention, and the full roster map are on the shim overview (linked, not re-derived). The topology accessor roster (TpuTopology_*/TpuCoreLocation_*) that reads the tpu::TpuTopology this page's GetTopologyPtr returns is on TpuTopology & TpuCoreLocation; the contrast is that that page reads geometry while this page owns the platform/node-context lifecycle. The SE tensorflow::tpu::TpuPlatform class registration (RegisterTpuPlatform @ 0xe99a3a0, the 0x98-byte object, PlatformManager) is on StreamExecutor Platform & Executor Model. The one-time population of the ExecutorApiFn() slots that point at these functions is on TfTpu_Initialize Bootstrap.
Note: three names that look like they belong on this roster are not C-ABI TpuPlatform_*/TpuNodeContext_* functions in this build. There is no TpuPlatform_GetTpuHostLocation, no TpuPlatform_TpuMemoryLimit, and no StopChipHeartbeats symbol at all. GetTpuHostLocation and TpuMemoryLimit exist only as C++ methods on the SE tensorflow::tpu::TpuPlatform class (…TpuPlatform::GetTpuHostLocation() const @ 0xe999f60, …TpuPlatform::TpuMemoryLimit(long*) @ 0xe99a2c0); the memory limit is exposed over the C ABI only under a different roster, TpuConfigurationApi_TpuMemoryLimit @ 0xe8cdc40 (see TpuConfigurationApi). The TpuPlatform_* roster's nearest equivalent to a host location is GetHostLocation (§2); there is no TpuPlatform_* memory-limit entry, because that is a configuration-API concern. The verified roster is the 11 + 5 functions documented below.
1. The Deepsea-Singleton Dispatch
Purpose
Every TpuPlatform_* function is a stateless C entry that operates on one process-global object: the registered Deepsea platform. There is no platform handle passed in — the functions that take no platform argument (Id, VisibleDeviceCount, GetTopologyPtr, …) reach the singleton directly. This is the defining difference from the topology roster, whose accessors all receive a topo handle as a1. The platform is a singleton; the topology is an object you hold.
Entry Point
The dispatch shape is identical across the roster. Worked path for VisibleDeviceCount:
xla / SE TPU backend
  └─ tensorflow::tpu::TpuPlatform::VisibleDeviceCount()     // host-side SE Platform method
       └─ tbl = stream_executor::tpu::ExecutorApiFn()       // 0x20819360, singleton fn-ptr struct
            └─ tbl[slot]( )                                 // call through the slot
                 │  (slot populated at init to point at:)
                 ▼
TpuPlatform_VisibleDeviceCount                              // 0xEAB8E40, plugin-side C-ABI impl
  └─ EnsureDeepseaSingleton()                               // __cxa_guard + GetRegisteredDeepseaPlatform
  └─ (*platform->vtable[+32])(platform)                     // 0x1d4935e0 built it; +32 = device count
Algorithm — the singleton guard
Every TpuPlatform_* body opens with the same Meyers-singleton guard. IDA renders two textually-different but semantically-identical forms (the compiler reorders the early-return); both reduce to "build once, then call the vtable":
function EnsureDeepseaSingleton():                 // inlined prologue of every TpuPlatform_*
    if !guard_byte(GetUnderlyingDeepseaPlatform::platform):
        if __cxa_guard_acquire(&guard):
            platform = deepsea::executor::GetRegisteredDeepseaPlatform()   // 0x1d4935e0
            __cxa_guard_release(&guard)
    return GetUnderlyingDeepseaPlatform::platform  // may be NULL if no TPU registered

function TpuPlatform_VisibleDeviceCount():         // 0xEAB8E40
    p = EnsureDeepseaSingleton()
    return (*(int(**)(void*))(*(void**)p + 32))(p) // vtable +32

function TpuPlatform_New():                        // 0xEAB8B80
    p = EnsureDeepseaSingleton()
    if !p: return NULL                             // only New / GetExecutor null-check p
    h = operator new(8); *h = p; return h          // 8-byte handle box wrapping the singleton ptr
QUIRK: TpuPlatform_New does not construct anything. It returns an 8-byte heap box holding a copy of the shared Deepsea singleton pointer, so every New hands back a fresh box pointing at the same underlying platform. TpuPlatform_Free (0xEAB8C00) is correspondingly just if (h) free(h): it frees the 8-byte box, never the singleton. A reimplementer who makes New allocate a real platform and Free destroy it will double-free the process-global on the second New/Free pair. The box is a handle, not an owner.
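The handle-versus-owner distinction can be sketched in a few lines. This is a minimal illustration, not the recovered code: FakePlatform, GetRegisteredFakePlatform, UnderlyingPlatform, PlatformHandle and the Sketch_* functions are hypothetical names, and the sketch uses new/delete where the binary pairs operator new(8) with free.

```cpp
#include <cassert>

// Hypothetical stand-in for deepsea::executor::DeepseaPlatform.
struct FakePlatform {
  int visible_devices = 4;
};

// Hypothetical stand-in for GetRegisteredDeepseaPlatform(); the real one may
// return nullptr when no TPU platform is registered.
FakePlatform* GetRegisteredFakePlatform() {
  static FakePlatform registered;
  return &registered;
}

// Function-local static: the compiler guards the initializer, so the registry
// lookup runs at most once per process.
FakePlatform* UnderlyingPlatform() {
  static FakePlatform* platform = GetRegisteredFakePlatform();
  return platform;
}

// The handle is a pointer-sized box holding a copy of the singleton pointer.
struct PlatformHandle {
  FakePlatform* platform;
};
static_assert(sizeof(PlatformHandle) == sizeof(void*), "handle is one pointer");

PlatformHandle* Sketch_Platform_New() {
  FakePlatform* p = UnderlyingPlatform();
  if (p == nullptr) return nullptr;  // nothing registered: no handle
  return new PlatformHandle{p};
}

void Sketch_Platform_Free(PlatformHandle* h) {
  // Sketch-only sanity check: every live box aliases the one singleton.
  assert(h == nullptr || h->platform == UnderlyingPlatform());
  delete h;  // frees the box only; deleting nullptr is a no-op
}
```

Two calls to Sketch_Platform_New yield two distinct boxes whose platform fields compare equal, and freeing one box leaves the other fully usable, which is the property a reimplementation has to preserve.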
GOTCHA: the singleton can be NULL. GetRegisteredDeepseaPlatform returns null when no TPU platform was registered (no device, or registration disabled). Only TpuPlatform_New (returns NULL) and TpuPlatform_GetExecutor (returns a failed StatusOr) defend against it. The query functions (Id, VisibleDeviceCount, GetTopologyPtr, GetHostLocation, ShouldRegisterTpuDeviceToDeviceCopy) dereference the singleton's vtable unconditionally, so NULL->vtable[+N] crashes. The contract is: the host must observe a successful New/Initialize before calling any query.
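One way a host could honor that contract is to gate every query behind the result of New. The sketch below is an assumption about host-side structure, not something recovered from the binary: PlatformApi and GuardedPlatform are hypothetical, and the two function pointers only model the shape of the corresponding roster slots.

```cpp
#include <cassert>

// Hypothetical view of two roster slots as plain function pointers.
struct PlatformApi {
  void* (*New)();
  int (*VisibleDeviceCount)();
};

// Host-side gate: record whether New() produced a handle and refuse to
// forward queries until it has.
class GuardedPlatform {
 public:
  explicit GuardedPlatform(const PlatformApi& api)
      : api_(api), handle_(api.New()) {}

  bool available() const { return handle_ != nullptr; }

  // Returns false and leaves *out untouched when no platform is available.
  bool VisibleDeviceCount(int* out) const {
    if (!available()) return false;  // never reach the unguarded query
    int n = api_.VisibleDeviceCount();
    assert(n >= 0);  // sketch-only sanity check
    *out = n;
    return true;
  }

 private:
  PlatformApi api_;
  void* handle_;  // ownership and Free are omitted from this sketch
};
```

The gate lives entirely on the host side; the plugin-side query stays unguarded, matching the behaviour described above.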
The vtable-offset map
The Deepsea platform vtable offsets the roster dispatches through, confirmed by the call expressions in each body:
| Offset | Deepsea method (inferred) | C-ABI caller |
|---|---|---|
| +16 | platform id | TpuPlatform_Id (0xEAB8DE0) |
| +32 | visible device count | TpuPlatform_VisibleDeviceCount (0xEAB8E40) |
| +48 | initialize (returns absl::Status) | TpuPlatform_Initialize (0xEAB8C20) |
| +72 | get executor for ordinal (returns StatusOr<exec*>) | TpuPlatform_GetExecutor (0xEAB8CC0) |
| +96 | should register D2D copy | TpuPlatform_ShouldRegisterTpuDeviceToDeviceCopy (0xEAB8EA0) |
| +104 | get topology pointer | TpuPlatform_GetTopologyPtr (0xEAB8F00) |
GetHostLocation does not go through the vtable — it calls the non-virtual deepsea::executor::DeepseaPlatform::GetHostLocation @ 0x1d0e79a0 directly on the singleton. Initialized and GetRuntimeVersion do not touch the singleton's vtable at all (see §2).
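The byte-offset dispatch in the table can be mimicked in portable C++ with a hand-built slot array. This is a sketch under stated assumptions: FakeObject, FakeVtable, SlotAt and CallVisibleDeviceCount are hypothetical, the slot width is fixed at 8 bytes for x86-64, and only the +32 slot is populated.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSlotBytes = 8;                  // one pointer per slot on x86-64
constexpr std::size_t kVisibleDeviceCountOffset = 32;  // from the table above
constexpr std::size_t kLastOffset = 104;               // GetTopologyPtr

using Slot = void (*)();  // generic slot; cast to the real signature to call

struct FakeObject {
  const Slot* vtable;  // first word of the object points at the slot array
  int devices;
};

int FakeVisibleDeviceCount(FakeObject* self) { return self->devices; }

// Builds a table large enough for offsets 0..104 and fills only the +32 slot.
const Slot* FakeVtable() {
  static Slot table[kLastOffset / kSlotBytes + 1] = {};
  table[kVisibleDeviceCountOffset / kSlotBytes] =
      reinterpret_cast<Slot>(&FakeVisibleDeviceCount);
  return table;
}

// Turns a byte offset into a typed function pointer.
template <typename Fn>
Fn SlotAt(const FakeObject* obj, std::size_t byte_offset) {
  assert(byte_offset % kSlotBytes == 0 && byte_offset <= kLastOffset);
  Slot raw = obj->vtable[byte_offset / kSlotBytes];
  assert(raw != nullptr);  // slot not populated in this sketch
  return reinterpret_cast<Fn>(raw);
}

int CallVisibleDeviceCount(FakeObject* obj) {
  return SlotAt<int (*)(FakeObject*)>(obj, kVisibleDeviceCountOffset)(obj);
}
```

Offset +32 divided by the 8-byte slot width gives slot index 4, which is why the offsets in the table advance in multiples of 8.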
2. The TpuPlatform_* Roster
Function Map
Eleven extern "C" functions, three lifecycle and eight query/accessor. "Backs" names the Deepsea vtable offset or helper each dispatches to.
| Function | Addr | Backs | Output / Convention |
|---|---|---|---|
| TpuPlatform_New | 0xEAB8B80 | EnsureDeepseaSingleton; box the ptr | void* handle (8-byte box) or NULL |
| TpuPlatform_Free | 0xEAB8C00 | free(handle); frees the box only | void |
| TpuPlatform_Initialize | 0xEAB8C20 | singleton vtable +48 | absl::Status-out (StatusRep**) |
| TpuPlatform_Initialized | 0xEAB8CA0 | nothing (constant) | char → always 1 |
| TpuPlatform_GetExecutor | 0xEAB8CC0 | singleton vtable +72 | StatusOr<executor*> (boxed) |
| TpuPlatform_Id | 0xEAB8DE0 | singleton vtable +16 | scalar id |
| TpuPlatform_VisibleDeviceCount | 0xEAB8E40 | singleton vtable +32 | int |
| TpuPlatform_ShouldRegisterTpuDeviceToDeviceCopy | 0xEAB8EA0 | singleton vtable +96 | bool |
| TpuPlatform_GetTopologyPtr | 0xEAB8F00 | singleton vtable +104 | tpu::TpuTopology* (opaque) |
| TpuPlatform_GetHostLocation | 0xEAB8F60 | DeepseaPlatform::GetHostLocation @ 0x1d0e79a0 (non-virtual) | host-location handle |
| TpuPlatform_GetRuntimeVersion | 0xEAB8FC0 | BuildData::Timestamp + Changelist; static kMetadata | fills caller struct (version + string) |
NOTE: GetTopologyPtr is the seam between this page and TpuTopology & TpuCoreLocation. It returns the opaque tpu::TpuTopology* (vtable +104); the host then hands that pointer to every TpuTopology_* accessor as their a1. The platform produces the topology pointer; the topology roster consumes it. Neither side passes the C++ object by value; only the void* crosses.
Algorithm — the two non-dispatching outliers
Two functions break the singleton-dispatch mould and a reimplementer must reproduce them exactly:
function TpuPlatform_Initialized():                // 0xEAB8CA0
    return 1                                       // hardcoded TRUE; no state inspected

function TpuPlatform_GetRuntimeVersion(out):       // 0xEAB8FC0
    // Meyers-singleton over a static `kMetadata` (NOT the platform singleton):
    if !guard(TpuPlatform_GetRuntimeVersion::kMetadata):
        if __cxa_guard_acquire(&guard):
            kMetadata = StrCat(BuildData::Timestamp(), " cl/", BuildData::Changelist())
            __cxa_guard_release(&guard)
    out[0]  = 0                                    // version major/minor packed
    out[8]  = 1
    out[16] = ptr-to(kMetadata string)             // SSO-aware: inline vs heap by sign of len byte
    out[24] = kMetadata length
    return out
QUIRK: TpuPlatform_Initialized always returns 1. It inspects no state: not the singleton, not a flag. The "is the platform initialized" question is answered statically true at the C-ABI boundary; the real initialization fallibility lives in TpuPlatform_Initialize (which can return a failed absl::Status from vtable +48). A reimplementer must not wire Initialized to a real readiness flag: the SE backend treats it as an always-true capability probe, and the actual gate is Initialize's status.
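The GetRuntimeVersion out-struct can be pictured as four 8-byte slots. The sketch below is an assumption-laden illustration: only the byte offsets 0, 8, 16 and 24 come from the pseudocode, while the struct name, the field names, the 64-bit width of the first two fields, and the placeholder metadata text are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical layout: names and the width of the first two fields are guesses.
struct RuntimeVersionOut {
  std::int64_t field0;       // written as 0
  std::int64_t field8;       // written as 1
  const char* metadata;      // points at the static string's bytes
  std::size_t metadata_len;  // length of that string
};
static_assert(offsetof(RuntimeVersionOut, metadata) == 16, "x86-64 layout");
static_assert(offsetof(RuntimeVersionOut, metadata_len) == 24, "x86-64 layout");

// Function-local static built once; the text is a placeholder for
// "<BuildData::Timestamp()> cl/<BuildData::Changelist()>".
const std::string& Metadata() {
  static const std::string kMetadata =
      std::string("<timestamp>") + " cl/" + "<changelist>";
  return kMetadata;
}

RuntimeVersionOut* Sketch_GetRuntimeVersion(RuntimeVersionOut* out) {
  assert(out != nullptr);  // sketch-only precondition
  const std::string& m = Metadata();
  out->field0 = 0;
  out->field8 = 1;
  out->metadata = m.data();  // borrowed pointer; the caller must not free it
  out->metadata_len = m.size();
  return out;
}
```

Because the string is a process-lifetime static, every call hands back the same pointer, so the caller receives a borrowed view rather than an owned buffer.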
The absl::Status-out idiom
Initialize and GetExecutor use the C-ABI status convention shared across the whole shim: the caller passes a StatusRep** out-param holding the previous status object; the callee computes a new StatusRep* and, if it differs, stores it into the out-param and Unrefs the old one. An absl::Status is a tagged pointer: a set low bit (& 1) marks an inlined status such as OK, which owns no heap object, while a clear low bit marks a heap StatusRep* that needs refcounting.
function TpuPlatform_Initialize(self, status_out): // 0xEAB8C20
    p   = EnsureDeepseaSingleton()
    new = (*(p->vtable[+48]))(p)                   // Deepsea initialize → StatusRep* (tagged)
    old = *status_out
    if new == old:
        if (new & 1) == 0: StatusRep::Unref(new)   // both heap, same ptr → drop one ref
    else:
        *status_out = new
        if (old & 1) == 0: StatusRep::Unref(old)   // replace, release the previous status
GetExecutor (0xEAB8CC0) is the StatusOr<T> form of the same idiom: the Deepsea call (vtable +72) writes a 2-slot {status, value} into a stack temp. If the status slot is the OK sentinel (the tagged value 1, which IDA renders as &dword_0 + 1), it boxes value into a fresh operator new(8) handle and returns it; otherwise it Unrefs and returns NULL, with a ThrowBadStatusOrAccess path if the host misuses the result.
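The tagged-pointer merge can be exercised in isolation with a mock refcount. This is a minimal sketch, assuming the low-bit convention described above; FakeRep, Tagged, kOk, IsHeap, FromRep, Unref and MergeStatus are hypothetical names, and the mock only decrements a counter where a real rep would delete itself at zero.

```cpp
#include <cassert>
#include <cstdint>

struct FakeRep { int refs; };  // hypothetical stand-in for the heap StatusRep

using Tagged = std::uintptr_t;  // low bit set: inlined status, no heap object
constexpr Tagged kOk = 1;       // the OK sentinel

inline bool IsHeap(Tagged t) { return (t & 1) == 0; }
inline Tagged FromRep(FakeRep* r) { return reinterpret_cast<Tagged>(r); }

inline void Unref(Tagged t) {
  assert(IsHeap(t));  // only heap reps are refcounted
  --reinterpret_cast<FakeRep*>(t)->refs;
}

// Takes ownership of one reference on `fresh` and releases one reference on
// whatever value it displaces (or on `fresh` itself if nothing changes).
void MergeStatus(Tagged* status_out, Tagged fresh) {
  Tagged old = *status_out;
  if (fresh == old) {
    if (IsHeap(fresh)) Unref(fresh);  // same rep twice: keep a single ref
  } else {
    *status_out = fresh;
    if (IsHeap(old)) Unref(old);      // release the status being replaced
  }
}
```

The equal-pointer branch matters because the callee returns an owned reference even when the value is unchanged; without the Unref there, each repeated failure would leak one reference.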
3. The TpuNodeContext_* Roster
Purpose
A tensorflow::TPUNodeInterfaces is the per-process attachment to the TPU node: the object that owns the live driver session, the node's topology, and its core location. The C-ABI TpuNodeContext_* roster manages a ScopedRef into it, an RAII reference that keeps the node alive while held. The host calls Create once to obtain the ref, Initialize to resolve the underlying TPUNodeInterfaces*, CloseTpuHost to shut the host down, and Free to drop the ref. The three fallible calls (Create, Initialize, CloseTpuHost) use the same absl::Status-out idiom as TpuPlatform_Initialize; Free returns void and aborts on misuse instead. Source file recovered from a CHECK string: …/stream_executor/tpu_node_context_c_api.cc.
Function Map
| Function | Addr | Backs (C++ method) | Output / Convention |
|---|---|---|---|
| TpuNodeContext_Create | 0xEACA260 | TPUNodeInterfaces::InitScopedRef | ScopedRef** handle + absl::Status-out |
| TpuNodeContext_Free | 0xEACA2E0 | ~ScopedRef + free; two fatal CHECKs | void |
| TpuNodeContext_CloseTpuHost | 0xEACA3C0 | TPUNodeInterfaces::CloseTPUHost | absl::Status-out |
| TpuNodeContext_Initialize | 0xEACA400 | TPUNodeInterfaces::Get | absl::Status-out (+ resolves TPUNodeInterfaces**) |
| TpuNodeContext_CompactionSupported | 0xEACA440 | InitScopedRef → TpuChipConfig::Megacore | bool |
Algorithm — the RAII pair
Create and Free are a heap-ScopedRef allocator/destructor. The asymmetry — Create allocates an 8-byte box holding the ScopedRef*, Free validates and tears it down — is the reimplementation contract:
function TpuNodeContext_Create(node_index, status_out):   // 0xEACA260
    ref = operator new(8); *ref = NULL                     // 8-byte box for the ScopedRef
    st  = TPUNodeInterfaces::InitScopedRef(node_index, ref) // fills *ref, returns StatusRep*
    merge_status(status_out, st)                           // same Status-out idiom as §2
    handle = operator new(8); *handle = ref                // box the box
    return handle

function TpuNodeContext_Free(handle):                      // 0xEACA2E0
    CHECK(handle != NULL, "node_context != nullptr")       // tpu_node_context_c_api.cc:33 (FATAL)
    inner = *handle
    CHECK(inner != NULL, "node_context->node_ref != nullptr") // :34 (FATAL)
    *handle = NULL
    ~ScopedRef(inner); free(inner)                         // destroy + free the ScopedRef
    // (second NULL-guarded teardown of *handle handles a re-entrant edge)
    free(handle)
GOTCHA: TpuNodeContext_Free is fatal on a null handle or a null inner ref. It does not return an error; it LogMessageFatals and aborts (the two CHECKs at tpu_node_context_c_api.cc:33 and :34). This is the opposite of TpuPlatform_Free, which is a silent if (h) free(h). A reimplementer must keep the node-context teardown fatal-on-misuse: the SE backend relies on the abort to surface a double-free / use-after-free of the node ref rather than corrupting the driver session. Do not "helpfully" make it tolerate null.
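The fatal-on-misuse teardown can be sketched as follows. This is a simplified illustration rather than the recovered code: FakeScopedRef, NodeContext, Fatal, g_live_refs and the Sketch_* functions are hypothetical, the status out-param is omitted, and the double boxing is collapsed into one outer box that owns one inner ref.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

int g_live_refs = 0;  // sketch-only counter of live inner refs

// Hypothetical stand-in for TPUNodeInterfaces::ScopedRef.
struct FakeScopedRef {
  FakeScopedRef() { ++g_live_refs; }
  ~FakeScopedRef() { --g_live_refs; }
};

// Outer handle box: one pointer to the inner ref.
struct NodeContext {
  FakeScopedRef* node_ref;
};

// Stand-in for a fatal CHECK: print the failed condition and abort.
[[noreturn]] inline void Fatal(const char* what) {
  std::fprintf(stderr, "Check failed: %s\n", what);
  std::abort();
}

NodeContext* Sketch_NodeContext_Create() {
  return new NodeContext{new FakeScopedRef()};
}

void Sketch_NodeContext_Free(NodeContext* ctx) {
  if (ctx == nullptr) Fatal("node_context != nullptr");
  if (ctx->node_ref == nullptr) Fatal("node_context->node_ref != nullptr");
  FakeScopedRef* inner = ctx->node_ref;
  ctx->node_ref = nullptr;   // clear the slot before tearing down
  delete inner;              // runs the destructor, then frees the ref
  assert(g_live_refs >= 0);  // sketch-only bookkeeping check
  delete ctx;                // finally free the outer box
}
```

Passing nullptr to Sketch_NodeContext_Free terminates the process by design, so a test harness should only exercise the matched Create/Free path in-process.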
QUIRK: TpuNodeContext_CompactionSupported does not take a node-context handle. It re-acquires its own throwaway ScopedRef via InitScopedRef, then, only if that ref's status is OK, walks TPUNodeInterfaces::tpu_topology() → tensor_core_location() → tpu::TpuChipConfig::GetUserStack(...), and if the user-stack byte at +40 is 1, returns tpu::TpuChipConfig::Megacore(chip_config). It defaults to 1 (supported) on any failure or non-megacore path, and always destroys its scoped ref before returning. So "compaction supported" is really "is this a megacore chip config", computed fresh each call against a transient node reference.
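The default-to-supported decision, together with the guaranteed release of the transient ref, reduces to a small function. The sketch assumes a flattened view of the node: TransientRef, FakeNode, g_live_refs and Sketch_CompactionSupported are hypothetical, and the three FakeNode fields stand in for the InitScopedRef status, the byte at +40, and the Megacore() result.

```cpp
#include <cassert>

int g_live_refs = 0;  // sketch-only counter of outstanding transient refs

// Hypothetical stand-in for the throwaway ScopedRef.
struct TransientRef {
  bool ok;
  explicit TransientRef(bool attach_ok) : ok(attach_ok) { ++g_live_refs; }
  ~TransientRef() {
    assert(g_live_refs > 0);
    --g_live_refs;
  }
};

// Hypothetical flattened view of what the topology/chip-config walk yields.
struct FakeNode {
  bool attach_ok;                // did acquiring the ref succeed?
  unsigned char user_stack_tag;  // the byte read at +40
  bool megacore;                 // result of the megacore probe
};

bool Sketch_CompactionSupported(const FakeNode& node) {
  TransientRef ref(node.attach_ok);           // released on every return path
  if (!ref.ok) return true;                   // failure: default to supported
  if (node.user_stack_tag != 1) return true;  // other path: default to supported
  return node.megacore;                       // only here can the answer be false
}
```

The only input combination that yields false is a successful attach, a tag of 1, and a negative megacore probe; every other path reports supported.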
The Initialize / CloseTpuHost status forms
TpuNodeContext_Initialize (0xEACA400) calls TPUNodeInterfaces::Get(node_index, &out, node_out) — it resolves the live TPUNodeInterfaces* into a caller out-param and threads the absl::Status back through the standard out-param idiom. TpuNodeContext_CloseTpuHost (0xEACA3C0) calls TPUNodeInterfaces::CloseTPUHost(node) and merges the returned status into the node's leading StatusRep* slot. Both are pure status-plumbing wrappers over a single TPUNodeInterfaces method; the only logic is the tagged-pointer Unref of the prior status.
4. Validity Gating
The two rosters gate very differently, and the contrast is the reimplementation hazard:
| Function(s) | Guard | Failure behaviour |
|---|---|---|
| TpuPlatform_New | singleton non-null after build | returns NULL |
| TpuPlatform_GetExecutor | StatusOr OK sentinel | returns NULL; ThrowBadStatusOrAccess on misuse |
| TpuPlatform_Initialize / _GetExecutor | absl::Status-out | propagates failed status, Unrefs prior |
| TpuPlatform_Id / _VisibleDeviceCount / _GetTopologyPtr / _GetHostLocation / _ShouldRegister… | none | dereferences the singleton unconditionally (vtable slot, or the direct call for _GetHostLocation); crashes if singleton is NULL |
| TpuPlatform_Initialized | none | always 1 |
| TpuNodeContext_Create / _Initialize / _CloseTpuHost | absl::Status-out | propagates failed status |
| TpuNodeContext_Free | two CHECKs (:33, :34) | FATAL abort on null handle/ref |
| TpuNodeContext_CompactionSupported | scoped-ref status OK | defaults to 1 (supported) on any failure |
NOTE: the split mirrors the topology roster's split (pure readers vs. defended accessors). Here, the platform query functions assume a successfully-built singleton exactly as the topology pure-readers assume a live tpu::TpuTopology; the node-context functions, dealing with a per-process resource that can legitimately be torn down, instead carry status out-params (for the fallible operations) and a fatal CHECK (for the destructor, where a null is always a caller bug). A reimplementer who applies one discipline uniformly will either over-defend the cheap platform queries or under-defend the node-context teardown.
Related Components
| Name | Relationship |
|---|---|
| deepsea::executor::DeepseaPlatform | the underlying TPU platform object every TpuPlatform_* dispatches into (vtable +16…+104) |
| deepsea::executor::GetRegisteredDeepseaPlatform @ 0x1d4935e0 | builds the GetUnderlyingDeepseaPlatform::platform Meyers-singleton |
| tensorflow::tpu::TpuPlatform (0x98 bytes) | the separate SE Platform class registered by RegisterTpuPlatform; forwards its methods through the ExecutorApiFn() slots that point at this roster |
| tensorflow::TPUNodeInterfaces / …::ScopedRef | the per-process node attachment + RAII ref the TpuNodeContext_* roster manages |
| tpu::TpuChipConfig::Megacore | backs TpuNodeContext_CompactionSupported's megacore probe |
| TpuConfigurationApi_TpuMemoryLimit @ 0xe8cdc40 | where the C ABI actually exposes the memory limit (not on TpuPlatform_*) |
Cross-References
- The TfTpu C-API Shim: the *ApiFn() accessor pattern, opaque-handle convention, and the roster map this page is one entry of
- TpuTopology & TpuCoreLocation: the TpuTopology_*/TpuCoreLocation_* geometry accessors that read the topology TpuPlatform_GetTopologyPtr returns; contrast: that page reads geometry, this owns the platform/node lifecycle
- TpuExecutor Roster: the TpuExecutor_* per-device runtime cluster minted off the platform, reached through the same ExecutorApiFn() table
- TpuCompiler Roster: the TpuCompiler_*/TpuCompile_* compilation C surface that runs against executors from this platform
- StreamExecutor Platform & Executor Model: the SE tensorflow::tpu::TpuPlatform class, RegisterTpuPlatform @ 0xe99a3a0, and PlatformManager; the host-side consumer of this C-ABI roster
- TfTpu_Initialize Bootstrap: the one-time population of the ExecutorApiFn() slots that point at these functions