PJRT_Api Function-Pointer Table Reconstruction

All addresses on this page apply to libtpu 0.0.40 (cp314, manylinux_2_31_x86_64), build-id 89edbbe81c5b328a958fe628a9f2207d. The PJRT C-API surface is v0.103. Other libtpu releases pin different minors, so their addresses differ and their late-addition cluster may be longer or shorter.

Abstract

libtpu.so is a PJRT plugin: it exports one symbol, GetPjrtApi, and everything a host framework (JAX, PyTorch/XLA) ever calls reaches the TPU through the single PJRT_Api struct that symbol returns. That struct is a C ABI vtable — a flat array of 140 qword slots: five scalar header fields followed by 135 function pointers, one per PJRT_* C-API entry point. The host reads api->PJRT_LoadedExecutable_Execute(args) and the call lands in libtpu's text section. There is no C++ inheritance, no dispatch object, no per-version trampoline table — just this one struct, populated once, immutable for process lifetime.

This page is the backbone reference for the PJRT section: the complete, ordered, slot-by-slot reconstruction of that table. It owns the struct header decode (struct_size, extension_start, the embedded PJRT_Api_Version), the full 140-slot map (every slot's field name, the libtpu implementation symbol it points at, and that symbol's virtual address), and the populated-vs-injected map (which five function-pointer slots are injected by the caller, four of them TPU-specific, and which 130 are stock XLA wrappers fixed at link time). The slot ordering matches public xla/pjrt/c/pjrt_c_api.h v0.103 exactly, including the late-addition cluster appended in feature-addition order beyond the original v0.40 surface.

The vtable is assembled by one function, pjrt::CreatePjrtApi @ 0xF874160, which does nothing but perform 140 stores into a caller-supplied buffer. That makes reconstruction unusually clean: the slot index is the array subscript in the initializer, and the field name is the assigned pjrt::PJRT_* symbol. Everything below is anchored to those stores. Per-area behavior (how each implementation actually works) lives on the area pages linked at the bottom; the extension linked list dangling off extension_start lives on extension-chain.md. This page is the index they all hang from.

For reimplementation, the contract is:

  • The struct layout: 1120 bytes = 140 × 8; five scalar header slots (0..4), then 135 function pointers (5..139), in the exact order below.
  • The version pin: PJRT_Api_Version = {struct_size=24, priv=NULL, major=0, minor=103}, with {major, minor} encoded as the qword 0x6700000000 at offset +0x20.
  • The slot-to-symbol map: every function-pointer slot resolves to a text-section symbol; no slot in this build is left null.
  • The injection points: the only values not compile-time-fixed are the six parameters passed to CreatePjrtApi by its caller (five function-pointer slots plus extension_start).
  • The per-slot backward-compat guard: every wrapper's first act is ActualStructSizeIsGreaterOrEqual, the mechanism that lets an older-minor host call this v0.103 plugin.

| Field | Value |
|---|---|
| Exported entry | GetPjrtApi @ 0xE6A83A0 (thunk → GetTpuPjrtApi) |
| Builder | pjrt::tpu_plugin::GetTpuPjrtApi @ 0xE6AA440 |
| Initializer | pjrt::CreatePjrtApi @ 0xF874160 (140 stores, no logic) |
| Storage | _ZZN4pjrt10tpu_plugin13GetTpuPjrtApiEvE8pjrt_api @ 0x227BA840 |
| Section | [47] .lbss (NOBITS, zero-filled at load, large-model) |
| Size | 1120 bytes = 140 qword slots |
| C-API version | v0.103 (major=0, minor=103) |
| Populated slots | 135 / 135 function pointers; 0 null |
| Injected function-pointer slots | 5 (slots 8, 9, 15, 87, 103); 4 of them TPU-specific (8, 15, 87, 103) |
| Backward-compat guard | pjrt::ActualStructSizeIsGreaterOrEqual @ 0xF8A4EC0 |

Slot count per area

| Area | Slots | Slot range |
|---|---|---|
| Header (scalars) | 5 | 0..4 |
| Error | 3 (+1 late) | 5..7, 137 |
| Plugin | 2 | 8..9 |
| Event | 5 (+2 late) | 10..14, 131..132 |
| Client | 13 (+11 late) | 15..27, 98, 100, 108, 115..118, 120..121, 123, 134 |
| DeviceDescription | 6 | 28..33 |
| Device | 6 (+3 late) | 34..39, 126..127, 133 |
| Memory | 5 (+1 late) | 40..44, 102 |
| Executable | 10 (+6 late) | 45..54, 95..96, 99, 101, 129, 139 |
| LoadedExecutable | 8 (+2 late) | 55..62, 122, 135 |
| Buffer | 19 (+5 late) | 63..81, 97, 105, 125, 130, 136 |
| CopyToDeviceStream | 5 | 82..86 |
| TopologyDescription | 7 (+2 late) | 87..93, 119, 138 |
| Compile | 1 | 94 |
| ExecuteContext | 2 (late) | 103..104 |
| AsyncHostToDeviceTransferManager | 9 (late) | 106..107, 109..114, 124 |
| AsyncTrackingEvent | 1 (late) | 128 |
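
The per-area ranges can be checked mechanically. The following is a minimal sketch, assuming only the slot ranges listed in the table above; the names AREAS, expand, area_counts, and check_partition are illustrative and do not come from libtpu or XLA. It confirms that the ranges cover slots 0..139 exactly once.

```python
# Slot ranges copied from the "Slot count per area" table (inclusive bounds).
AREAS = {
    "Header": [(0, 4)],
    "Error": [(5, 7), (137, 137)],
    "Plugin": [(8, 9)],
    "Event": [(10, 14), (131, 132)],
    "Client": [(15, 27), (98, 98), (100, 100), (108, 108), (115, 118),
               (120, 121), (123, 123), (134, 134)],
    "DeviceDescription": [(28, 33)],
    "Device": [(34, 39), (126, 127), (133, 133)],
    "Memory": [(40, 44), (102, 102)],
    "Executable": [(45, 54), (95, 96), (99, 99), (101, 101),
                   (129, 129), (139, 139)],
    "LoadedExecutable": [(55, 62), (122, 122), (135, 135)],
    "Buffer": [(63, 81), (97, 97), (105, 105), (125, 125),
               (130, 130), (136, 136)],
    "CopyToDeviceStream": [(82, 86)],
    "TopologyDescription": [(87, 93), (119, 119), (138, 138)],
    "Compile": [(94, 94)],
    "ExecuteContext": [(103, 104)],
    "AsyncHostToDeviceTransferManager": [(106, 107), (109, 114), (124, 124)],
    "AsyncTrackingEvent": [(128, 128)],
}


def expand(ranges):
    """Turn a list of inclusive (lo, hi) pairs into a flat list of slots."""
    return [slot for lo, hi in ranges for slot in range(lo, hi + 1)]


def area_counts():
    """Number of slots owned by each area."""
    return {area: len(expand(ranges)) for area, ranges in AREAS.items()}


def check_partition(total_slots=140):
    """Return {slot: area}; raise if any slot is claimed twice or missing."""
    owner = {}
    for area, ranges in AREAS.items():
        for slot in expand(ranges):
            if slot in owner:
                raise ValueError(f"slot {slot} in both {owner[slot]} and {area}")
            owner[slot] = area
    missing = [s for s in range(total_slots) if s not in owner]
    if missing:
        raise ValueError(f"unassigned slots: {missing}")
    return owner
```

For the ranges as listed, the Client area expands to 24 slots in total and the per-area counts sum to 140.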

Population Path and Storage

Purpose

The table is not a static initializer in a PROGBITS section — it is built lazily, once, on the first GetPjrtApi call, into zero-filled .lbss. Understanding the population path is the difference between a reimplementation that returns a correct table and one that races or returns garbage.

Entry Point

dlsym("GetPjrtApi")
  GetPjrtApi @ 0xE6A83A0                       ── thunk; tail-calls the builder
    └─ pjrt::tpu_plugin::GetTpuPjrtApi @ 0xE6AA440
         ── 16 __cxa_guard blocks build the .bss extensions (newest→oldest)
         ── final __cxa_guard wraps:
         └─ pjrt::CreatePjrtApi @ 0xF874160     ── 140 stores into &pjrt_api
              returns &pjrt_api = 0x227BA840

The exported symbol is a pure thunk. Its entire decompiled body is return pjrt::tpu_plugin::GetTpuPjrtApi(a1); (confirmed at 0xE6A83A0, marked thunk by IDA). GetTpuPjrtApi is the orchestrator: it runs one Itanium-ABI __cxa_guard-protected constructor per extension to build the extension chain in .bss, then a final guard around CreatePjrtApi, which writes every slot. The function returns &pjrt_api, the address of a function-local static.

Algorithm

function GetTpuPjrtApi():                       // 0xE6AA440
    // one-shot init of the extension chain (covered on extension-chain.md);
    // each extension is a __cxa_guard-protected function-local static in .bss
    build_extensions_if_needed();               // 16 guards, newest→oldest
    if (guard_for(pjrt_api) not yet set):        // _ZGV... byte == 0
        CreatePjrtApi(&pjrt_api,                  // 0xF874160 — writes all 140 slots
                      Client_Create_impl,         // -> slot 15  (a2)
                      ExecuteContext_Create_impl,  // -> slot 103 (a3)
                      TopologyDescription_Create_impl, // -> slot 87 (a4)
                      Plugin_Initialize_impl,      // -> slot 8   (a5)
                      extension_start,             // -> slot 1   (a6)
                      Plugin_Attributes_Xla_impl); // -> slot 9   (a7)
        mark_guard_set(pjrt_api);
    return &pjrt_api;                             // 0x227BA840

function CreatePjrtApi(a1, a2, a3, a4, a5, a6, a7):  // 0xF874160
    *a1     = 1120;                 // slot 0  struct_size
    a1[1]   = a6;                   // slot 1  extension_start  (injected: chain head)
    a1[2]   = 24;                   // slot 2  pjrt_api_version.struct_size
    a1[3]   = 0;                    // slot 3  pjrt_api_version.priv
    a1[4]   = 0x6700000000LL;       // slot 4  {major=0, minor=103}
    a1[5]   = pjrt::PJRT_Error_Destroy;   // slot 5  ... 130 fixed wrappers ...
    a1[8]   = a5;                   // slot 8  Plugin_Initialize    (injected)
    a1[9]   = a7;                   // slot 9  Plugin_Attributes    (injected)
    a1[15]  = a2;                   // slot 15 Client_Create        (injected)
    a1[87]  = a4;                   // slot 87 TopologyDescription_Create (injected)
    a1[103] = a3;                   // slot 103 ExecuteContext_Create (injected)
    // ... all other slots assigned a compile-time-fixed pjrt::PJRT_* symbol ...
    a1[139] = pjrt::PJRT_Executable_ParameterMemoryKinds;  // slot 139
    return a1;

NOTE: CreatePjrtApi contains no branches and no loop; it is 140 straight-line stores. The slot index is the C-array subscript, which is why the reconstruction below can be exact rather than inferred. The six injected values (parameters a2..a7: five function pointers plus the extension_start chain head) are the only values the function does not hard-code; everything else is a pjrt::PJRT_* relocation baked at link time. This was confirmed by reading the initializer in full at 0xF874160.

Considerations

The struct lives in .lbss (section [47], NOBITS, large-model BSS), at the function-local static _ZZN4pjrt10tpu_plugin13GetTpuPjrtApiEvE8pjrt_api @ 0x227BA840. Two consequences for a reimplementer and for any tool that inspects the binary statically:

  • Static disassembly cannot show the populated table. The 1120 bytes occupy no file space (NOBITS) and are zero-filled at load; the function pointers only exist after the first GetPjrtApi call runs CreatePjrtApi. The slot map below is reconstructed from the stores in the initializer, not from a data dump, because there is nothing to dump until runtime. Runtime confirmation would require reading /proc/self/mem at the library's load base plus 0x227BA840 inside a live JAX worker after the first call.
  • Concurrency is __cxa_guard, not a lock the reader holds. First callers serialize through the Itanium ABI guard (one thread runs the constructor, the rest block on its mutex). After the first call the struct is immutable, so steady-state readers take no lock. A reimplementation that rebuilds the table per call, or that omits the one-shot guard, diverges from this contract.
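
The one-shot contract in the second bullet can be illustrated with a small analogue. This is a sketch only: libtpu uses the Itanium-ABI __cxa_guard, whereas the code below substitutes a Python lock with a double check, and OnceTable, build_table, and demo are hypothetical names. It shows the two properties a reimplementation needs: the builder runs exactly once, and every caller receives the same immutable object.

```python
import threading


class OnceTable:
    """Build a table on first use; later callers get the same object."""

    def __init__(self, builder):
        self._builder = builder
        self._lock = threading.Lock()
        self._table = None
        self.build_count = 0  # instrumentation for the demo only

    def get(self):
        table = self._table
        if table is not None:          # steady state: no lock taken
            return table
        with self._lock:               # first callers serialize here
            if self._table is None:    # re-check under the lock
                self.build_count += 1
                self._table = self._builder()
            return self._table


def build_table():
    # Stand-in for the 140 stores; a tuple keeps the result immutable.
    return tuple(range(140))


def demo(num_threads=8):
    """Race several threads on the first call and collect what they saw."""
    once = OnceTable(build_table)
    results = []
    results_lock = threading.Lock()

    def worker():
        table = once.get()
        with results_lock:
            results.append(table)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return once, results
```

A variant that rebuilt the table inside get() on every call would hand different objects to different callers, which is the divergence the bullet warns about.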

Note — the struct lives in .lbss at the concrete VA 0x227BA840, not in .bss/.data.rel.ro; any &stru_… + _GLOBAL_OFFSET_TABLE_ form in a disassembler is a PIC-relocation artifact, not a real address. Slot 9 is not a TPU-plugin override — it is fed from CreatePjrtApi's a7 parameter = pjrt::PJRT_Plugin_Attributes_Xla, a stock XLA implementation.
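
For the runtime confirmation mentioned above, a dump of the populated struct could be decoded along these lines. This is a hedged sketch, not a tested procedure: parse_table works on any 1120-byte buffer, while read_live_table assumes a loadable libtpu.so at a caller-supplied path and calls the exported GetPjrtApi through ctypes, which may have side effects in the calling process. Both function names are illustrative.

```python
import ctypes
import struct

SLOT_COUNT = 140
TABLE_BYTES = SLOT_COUNT * 8  # 1120


def parse_table(raw):
    """Split a raw dump of the struct into (header dict, function pointers)."""
    if len(raw) < TABLE_BYTES:
        raise ValueError(f"need {TABLE_BYTES} bytes, got {len(raw)}")
    slots = struct.unpack("<140Q", raw[:TABLE_BYTES])  # little-endian qwords
    header = {
        "struct_size": slots[0],
        "extension_start": slots[1],
        "version_struct_size": slots[2],
        "version_priv": slots[3],
        "major": slots[4] & 0xFFFFFFFF,  # low 32 bits of slot 4
        "minor": slots[4] >> 32,         # high 32 bits of slot 4
    }
    return header, list(slots[5:])


def read_live_table(libtpu_path):
    """ASSUMPTION: libtpu_path names a loadable libtpu.so in this process."""
    lib = ctypes.CDLL(libtpu_path)
    lib.GetPjrtApi.restype = ctypes.c_void_p
    lib.GetPjrtApi.argtypes = []
    api_address = lib.GetPjrtApi()  # runtime address = load base + link-time VA
    return parse_table(ctypes.string_at(api_address, TABLE_BYTES))
```

On a live table, the expected result under this page's findings would be struct_size 1120, minor 103, and 135 non-zero pointers; the pointer values themselves are relocated by the load base.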


Struct Header (Slots 0..4)

Purpose

The first five qwords are not function pointers. They are the ABI handshake: total size (for forward/backward compat), the extension-chain head, and an embedded PJRT_Api_Version substruct. A host reads these before touching any function pointer.

Layout

| Slot | Offset | Field | Type | Value | Source |
|---|---|---|---|---|---|
| 0 | +0x000 | struct_size | size_t | 1120 | *a1 = 1120 |
| 1 | +0x008 | extension_start | PJRT_Extension_Base* | 0x224C3F68 (chain head) | a1[1] = a6 (injected) |
| 2 | +0x010 | pjrt_api_version.struct_size | size_t | 24 | a1[2] = 24 |
| 3 | +0x018 | pjrt_api_version.priv | void* | NULL | a1[3] = 0 |
| 4 | +0x020 | pjrt_api_version.{major,minor} | int,int | {0, 103} | a1[4] = 0x6700000000 |

struct PJRT_Api_Version {       // embedded at PJRT_Api +0x010, 24 bytes
    size_t struct_size;          // = 24
    void*  priv;                 // = NULL
    int    major_version;        // = 0
    int    minor_version;        // = 103 (0x67)
};

The version qword decodes by little-endian byte layout: bytes at +0x20 are 00 00 00 00 67 00 00 00, i.e. major = low 32 bits = 0, minor = high 32 bits = 0x67 = 103. struct_size = 1120 lets a host compiled against a different minor know exactly how many trailing slots it may safely read.
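
The decode above can be reproduced with a few lines. This is a minimal sketch using only Python's struct module; encode_version, decode_version, and version_bytes are illustrative helper names. It packs two little-endian 32-bit ints into one qword and back.

```python
import struct


def encode_version(major, minor):
    """Pack {major, minor} as two little-endian int32s into one qword."""
    return struct.unpack("<Q", struct.pack("<ii", major, minor))[0]


def decode_version(qword):
    """Split the qword at +0x20 back into (major, minor)."""
    return struct.unpack("<ii", struct.pack("<Q", qword))


def version_bytes(qword):
    """Byte layout of the qword as it sits in memory on x86-64."""
    return struct.pack("<Q", qword).hex(" ")
```

With major in the low half and minor in the high half, encode_version(0, 103) yields 0x6700000000, matching the immediate stored into slot 4.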

QUIRK — extension_start (slot 1) points at the newest extension, not the oldest. The chain is built in __cxa_guard order and linked head-insert, so walking .next goes newest→oldest and terminates at the profiler extension in .data. A host that assumes "type 0 first" or any type ordering is wrong; it must iterate to NULL. The chain itself is documented on extension-chain.md.
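
The iterate-to-NULL rule can be sketched as follows. The Extension class is a simplified stand-in for a chain node, and the type IDs in the usage note are made up for illustration; the real layouts are on extension-chain.md.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extension:
    """Simplified chain node: a type ID and a link to the next (older) node."""
    type_id: int
    next: Optional["Extension"] = None


def find_extension(head, wanted_type):
    """Walk from the head until NULL; never assume any type ordering."""
    node = head
    while node is not None:
        if node.type_id == wanted_type:
            return node
        node = node.next
    return None


def chain_types(head):
    """Type IDs in walk order (newest first for a head-inserted chain)."""
    types = []
    node = head
    while node is not None:
        types.append(node.type_id)
        node = node.next
    return types
```

For a hypothetical chain built as Extension(7, Extension(3, Extension(0))), chain_types returns [7, 3, 0], and a lookup that only inspected the head would miss types 3 and 0.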


The 140-Slot Map

Purpose

The complete ordered table. Slots 0..4 are the scalar header above. Slots 5..139 are function pointers; each cell below names the field (per pjrt_c_api.h v0.103) and the libtpu symbol the slot points at, with that symbol's virtual address. All 135 function-pointer slots are populated in this build — there is no null/unimplemented slot. Confidence is CERTAIN for every slot whose store and target symbol were both read in the decompiled initializer and confirmed against the symbol table; the 12 spot-confirmed addresses are flagged.

Slot ordering note

The order is grouped by area (Error/Plugin/Event/Client/DeviceDescription/Device/Memory/Executable/LoadedExecutable/Buffer/CopyStream/Topology/Compile) through slot 94, then a flat late-addition cluster (slots 95..139) appended in feature-addition order across minors 0.40→0.103. The late cluster is not re-grouped by area — Buffer_*, Client_*, Executable_*, and Event_* entries are interleaved in the order XLA added them. The "Area" column tags each late slot back to its conceptual home; the area deep-dive pages own the behavior.
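
Because every slot is one qword, the Off column in the tables below is plain arithmetic on the slot index. A minimal sketch, with illustrative helper names:

```python
SLOT_SIZE = 8        # one qword per slot
SLOT_COUNT = 140     # total slots in this build
HEADER_SLOTS = 5     # slots 0..4 are scalars, not function pointers


def slot_offset(slot):
    """Byte offset of a slot from the start of the struct."""
    if not 0 <= slot < SLOT_COUNT:
        raise ValueError(f"slot {slot} out of range")
    return slot * SLOT_SIZE


def offset_slot(offset):
    """Slot index for a byte offset; the offset must be qword-aligned."""
    if offset % SLOT_SIZE or not 0 <= offset < SLOT_COUNT * SLOT_SIZE:
        raise ValueError(f"offset {offset:#x} is not a valid slot offset")
    return offset // SLOT_SIZE


def is_function_pointer(slot):
    """True for slots 5..139, False for the five header scalars."""
    return HEADER_SLOTS <= slot < SLOT_COUNT
```

For example, slot 60 (PJRT_LoadedExecutable_Execute) sits at +0x1E0 and the last slot, 139, at +0x458, so the struct ends at 0x458 + 8 = 1120 bytes.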

Error / Plugin / Event (slots 5..14)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 5 | +0x028 | PJRT_Error_Destroy | PJRT_Error_Destroy | 0x0F85ECE0 |
| 6 | +0x030 | PJRT_Error_Message | PJRT_Error_Message | 0x0F85EDE0 |
| 7 | +0x038 | PJRT_Error_GetCode | PJRT_Error_GetCode | 0x0F85EF40 |
| 8 | +0x040 | PJRT_Plugin_Initialize | tpu_plugin::PJRT_Plugin_Initialize | 0x0E6A9D00 |
| 9 | +0x048 | PJRT_Plugin_Attributes | PJRT_Plugin_Attributes_Xla | 0x0F85F080 |
| 10 | +0x050 | PJRT_Event_Destroy | PJRT_Event_Destroy | 0x0F86F920 |
| 11 | +0x058 | PJRT_Event_IsReady | PJRT_Event_IsReady | 0x0F86F9E0 |
| 12 | +0x060 | PJRT_Event_Error | PJRT_Event_Error | 0x0F86FBA0 |
| 13 | +0x068 | PJRT_Event_Await | PJRT_Event_Await | 0x0F86FA80 |
| 14 | +0x070 | PJRT_Event_OnReady | PJRT_Event_OnReady | 0x0F86FC60 |

Client (slots 15..27)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 15 | +0x078 | PJRT_Client_Create | tpu_plugin::PJRT_Client_Create | 0x0E6A8840 |
| 16 | +0x080 | PJRT_Client_Destroy | PJRT_Client_Destroy | 0x0F85F0E0 |
| 17 | +0x088 | PJRT_Client_PlatformName | PJRT_Client_PlatformName | 0x0F85F4A0 |
| 18 | +0x090 | PJRT_Client_ProcessIndex | PJRT_Client_ProcessIndex | 0x0F85F440 |
| 19 | +0x098 | PJRT_Client_PlatformVersion | PJRT_Client_PlatformVersion | 0x0F85F500 |
| 20 | +0x0A0 | PJRT_Client_Devices | PJRT_Client_Devices | 0x0F85F600 |
| 21 | +0x0A8 | PJRT_Client_AddressableDevices | PJRT_Client_AddressableDevices | 0x0F85F660 |
| 22 | +0x0B0 | PJRT_Client_LookupDevice | PJRT_Client_LookupDevice | 0x0F85F6C0 |
| 23 | +0x0B8 | PJRT_Client_LookupAddressableDevice | PJRT_Client_LookupAddressableDevice | 0x0F85F880 |
| 24 | +0x0C0 | PJRT_Client_AddressableMemories | PJRT_Client_AddressableMemories | 0x0F85FC60 |
| 25 | +0x0C8 | PJRT_Client_Compile | PJRT_Client_Compile | 0x0F861820 |
| 26 | +0x0D0 | PJRT_Client_DefaultDeviceAssignment | PJRT_Client_DefaultDeviceAssignment | 0x0F8630C0 |
| 27 | +0x0D8 | PJRT_Client_BufferFromHostBuffer | PJRT_Client_BufferFromHostBuffer | 0x0F8644C0 |

DeviceDescription / Device (slots 28..39)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 28 | +0x0E0 | PJRT_DeviceDescription_Id | PJRT_DeviceDescription_Id | 0x0F865360 |
| 29 | +0x0E8 | PJRT_DeviceDescription_ProcessIndex | PJRT_DeviceDescription_ProcessIndex | 0x0F8653C0 |
| 30 | +0x0F0 | PJRT_DeviceDescription_Attributes | PJRT_DeviceDescription_Attributes | 0x0F865420 |
| 31 | +0x0F8 | PJRT_DeviceDescription_Kind | PJRT_DeviceDescription_Kind | 0x0F865480 |
| 32 | +0x100 | PJRT_DeviceDescription_DebugString | PJRT_DeviceDescription_DebugString | 0x0F865500 |
| 33 | +0x108 | PJRT_DeviceDescription_ToString | PJRT_DeviceDescription_ToString | 0x0F8658A0 |
| 34 | +0x110 | PJRT_Device_GetDescription | PJRT_Device_GetDescription | 0x0F8659A0 |
| 35 | +0x118 | PJRT_Device_IsAddressable | PJRT_Device_IsAddressable | 0x0F865A00 |
| 36 | +0x120 | PJRT_Device_LocalHardwareId | PJRT_Device_LocalHardwareId | 0x0F865A60 |
| 37 | +0x128 | PJRT_Device_AddressableMemories | PJRT_Device_AddressableMemories | 0x0F865AC0 |
| 38 | +0x130 | PJRT_Device_DefaultMemory | PJRT_Device_DefaultMemory | 0x0F865B20 |
| 39 | +0x138 | PJRT_Device_MemoryStats | PJRT_Device_MemoryStats | 0x0F865CE0 |

Memory (slots 40..44)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 40 | +0x140 | PJRT_Memory_Id | PJRT_Memory_Id | 0x0F865E80 |
| 41 | +0x148 | PJRT_Memory_Kind | PJRT_Memory_Kind | 0x0F865EE0 |
| 42 | +0x150 | PJRT_Memory_DebugString | PJRT_Memory_DebugString | 0x0F865FC0 |
| 43 | +0x158 | PJRT_Memory_ToString | PJRT_Memory_ToString | 0x0F866040 |
| 44 | +0x160 | PJRT_Memory_AddressableByDevices | PJRT_Memory_AddressableByDevices | 0x0F8660C0 |

Executable (slots 45..54)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 45 | +0x168 | PJRT_Executable_Destroy | PJRT_Executable_Destroy | 0x0F8661C0 |
| 46 | +0x170 | PJRT_Executable_Name | PJRT_Executable_Name | 0x0F866860 |
| 47 | +0x178 | PJRT_Executable_NumReplicas | PJRT_Executable_NumReplicas | 0x0F8668C0 |
| 48 | +0x180 | PJRT_Executable_NumPartitions | PJRT_Executable_NumPartitions | 0x0F866920 |
| 49 | +0x188 | PJRT_Executable_NumOutputs | PJRT_Executable_NumOutputs | 0x0F866A40 |
| 50 | +0x190 | PJRT_Executable_SizeOfGeneratedCodeInBytes | PJRT_Executable_SizeOfGeneratedCodeInBytes | 0x0F867240 |
| 51 | +0x198 | PJRT_Executable_GetCostAnalysis | PJRT_Executable_GetCostAnalysis | 0x0F867B80 |
| 52 | +0x1A0 | PJRT_Executable_OutputMemoryKinds | PJRT_Executable_OutputMemoryKinds | 0x0F869520 |
| 53 | +0x1A8 | PJRT_Executable_OptimizedProgram | PJRT_Executable_OptimizedProgram | 0x0F8672A0 |
| 54 | +0x1B0 | PJRT_Executable_Serialize | PJRT_Executable_Serialize | 0x0F86C5A0 |

LoadedExecutable (slots 55..62)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 55 | +0x1B8 | PJRT_LoadedExecutable_Destroy | PJRT_LoadedExecutable_Destroy | 0x0F866780 |
| 56 | +0x1C0 | PJRT_LoadedExecutable_GetExecutable | PJRT_LoadedExecutable_GetExecutable | 0x0F86CFA0 |
| 57 | +0x1C8 | PJRT_LoadedExecutable_AddressableDevices | PJRT_LoadedExecutable_AddressableDevices | 0x0F866980 |
| 58 | +0x1D0 | PJRT_LoadedExecutable_Delete | PJRT_LoadedExecutable_Delete | 0x0F869A80 |
| 59 | +0x1D8 | PJRT_LoadedExecutable_IsDeleted | PJRT_LoadedExecutable_IsDeleted | 0x0F869AE0 |
| 60 | +0x1E0 | PJRT_LoadedExecutable_Execute | PJRT_LoadedExecutable_Execute | 0x0F869B40 |
| 61 | +0x1E8 | PJRT_Executable_DeserializeAndLoad | PJRT_Executable_DeserializeAndLoad | 0x0F86CC40 |
| 62 | +0x1F0 | PJRT_LoadedExecutable_Fingerprint | PJRT_LoadedExecutable_Fingerprint | 0x0F85FBE0 |

NOTE — slot 62 PJRT_LoadedExecutable_Fingerprint is the deprecated fingerprint entry; v0.103 hosts use slot 99 PJRT_Executable_Fingerprint instead. Both are populated; the deprecated one is kept for older-minor callers.

Buffer (slots 63..81)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 63 | +0x1F8 | PJRT_Buffer_Destroy | PJRT_Buffer_Destroy | 0x0F86D020 |
| 64 | +0x200 | PJRT_Buffer_ElementType | PJRT_Buffer_ElementType | 0x0F86D220 |
| 65 | +0x208 | PJRT_Buffer_Dimensions | PJRT_Buffer_Dimensions | 0x0F86D280 |
| 66 | +0x210 | PJRT_Buffer_UnpaddedDimensions | PJRT_Buffer_UnpaddedDimensions | 0x0F86D300 |
| 67 | +0x218 | PJRT_Buffer_DynamicDimensionIndices | PJRT_Buffer_DynamicDimensionIndices | 0x0F86D4C0 |
| 68 | +0x220 | PJRT_Buffer_GetMemoryLayout | PJRT_Buffer_GetMemoryLayout | 0x0F86D5E0 |
| 69 | +0x228 | PJRT_Buffer_OnDeviceSizeInBytes | PJRT_Buffer_OnDeviceSizeInBytes | 0x0F86DA80 |
| 70 | +0x230 | PJRT_Buffer_Device | PJRT_Buffer_Device | 0x0F86DB40 |
| 71 | +0x238 | PJRT_Buffer_Memory | PJRT_Buffer_Memory | 0x0F86DC60 |
| 72 | +0x240 | PJRT_Buffer_Delete | PJRT_Buffer_Delete | 0x0F86DD80 |
| 73 | +0x248 | PJRT_Buffer_IsDeleted | PJRT_Buffer_IsDeleted | 0x0F86DDE0 |
| 74 | +0x250 | PJRT_Buffer_CopyToDevice | PJRT_Buffer_CopyToDevice | 0x0F86E360 |
| 75 | +0x258 | PJRT_Buffer_ToHostBuffer | PJRT_Buffer_ToHostBuffer | 0x0F86E640 |
| 76 | +0x260 | PJRT_Buffer_IsOnCpu | PJRT_Buffer_IsOnCpu | 0x0F86ECC0 |
| 77 | +0x268 | PJRT_Buffer_ReadyEvent | PJRT_Buffer_ReadyEvent | 0x0F86ED20 |
| 78 | +0x270 | PJRT_Buffer_UnsafePointer | PJRT_Buffer_UnsafePointer | 0x0F86EE60 |
| 79 | +0x278 | PJRT_Buffer_IncreaseExternalReferenceCount | PJRT_Buffer_IncreaseExternalReferenceCount | 0x0F86EF20 |
| 80 | +0x280 | PJRT_Buffer_DecreaseExternalReferenceCount | PJRT_Buffer_DecreaseExternalReferenceCount | 0x0F86F100 |
| 81 | +0x288 | PJRT_Buffer_OpaqueDeviceMemoryDataPointer | PJRT_Buffer_OpaqueDeviceMemoryDataPointer | 0x0F86F200 |

CopyToDeviceStream (slots 82..86)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 82 | +0x290 | PJRT_CopyToDeviceStream_Destroy | PJRT_CopyToDeviceStream_Destroy | 0x0F86F5E0 |
| 83 | +0x298 | PJRT_CopyToDeviceStream_AddChunk | PJRT_CopyToDeviceStream_AddChunk | 0x0F86F660 |
| 84 | +0x2A0 | PJRT_CopyToDeviceStream_TotalBytes | PJRT_CopyToDeviceStream_TotalBytes | 0x0F86F7E0 |
| 85 | +0x2A8 | PJRT_CopyToDeviceStream_GranuleSize | PJRT_CopyToDeviceStream_GranuleSize | 0x0F86F840 |
| 86 | +0x2B0 | PJRT_CopyToDeviceStream_CurrentBytes | PJRT_CopyToDeviceStream_CurrentBytes | 0x0F86F8A0 |

TopologyDescription / Compile (slots 87..94)

| Slot | Off | Field | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|
| 87 | +0x2B8 | PJRT_TopologyDescription_Create | tpu_plugin::PJRT_TopologyDescription_Create | 0x0E6A9B20 |
| 88 | +0x2C0 | PJRT_TopologyDescription_Destroy | PJRT_TopologyDescription_Destroy | 0x0F870040 |
| 89 | +0x2C8 | PJRT_TopologyDescription_PlatformName | PJRT_TopologyDescription_PlatformName | 0x0F870200 |
| 90 | +0x2D0 | PJRT_TopologyDescription_PlatformVersion | PJRT_TopologyDescription_PlatformVersion | 0x0F870260 |
| 91 | +0x2D8 | PJRT_TopologyDescription_GetDeviceDescriptions | PJRT_TopologyDescription_GetDeviceDescriptions | 0x0F8702C0 |
| 92 | +0x2E0 | PJRT_TopologyDescription_Serialize | PJRT_TopologyDescription_Serialize | 0x0F870320 |
| 93 | +0x2E8 | PJRT_TopologyDescription_Attributes | PJRT_TopologyDescription_Attributes | 0x0F8705E0 |
| 94 | +0x2F0 | PJRT_Compile | PJRT_Compile | 0x0F870640 |

Late-addition cluster (slots 95..139, addition order)

These slots were appended across minors 0.40→0.103, in the order XLA added them — not regrouped by area. The Area column tags each back to its conceptual home.

| Slot | Off | Field | Area | Impl symbol (pjrt::…) | Addr |
|---|---|---|---|---|---|
| 95 | +0x2F8 | PJRT_Executable_OutputElementTypes | Executable | PJRT_Executable_OutputElementTypes | 0x0F868560 |
| 96 | +0x300 | PJRT_Executable_OutputDimensions | Executable | PJRT_Executable_OutputDimensions | 0x0F8689E0 |
| 97 | +0x308 | PJRT_Buffer_CopyToMemory | Buffer | PJRT_Buffer_CopyToMemory | 0x0F86E500 |
| 98 | +0x310 | PJRT_Client_CreateViewOfDeviceBuffer | Client | PJRT_Client_CreateViewOfDeviceBuffer | 0x0F865040 |
| 99 | +0x318 | PJRT_Executable_Fingerprint | Executable | PJRT_Executable_Fingerprint | 0x0F867AC0 |
| 100 | +0x320 | PJRT_Client_TopologyDescription | Client | PJRT_Client_TopologyDescription | 0x0F85F560 |
| 101 | +0x328 | PJRT_Executable_GetCompiledMemoryStats | Executable | PJRT_Executable_GetCompiledMemoryStats | 0x0F86CAC0 |
| 102 | +0x330 | PJRT_Memory_Kind_Id | Memory | PJRT_Memory_Kind_Id | 0x0F865F60 |
| 103 | +0x338 | PJRT_ExecuteContext_Create | ExecuteContext | tpu_plugin::PJRT_ExecuteContext_Create | 0x0E6A9A80 |
| 104 | +0x340 | PJRT_ExecuteContext_Destroy | ExecuteContext | PJRT_ExecuteContext_Destroy | 0x0F866120 |
| 105 | +0x348 | PJRT_Buffer_CopyRawToHost | Buffer | PJRT_Buffer_CopyRawToHost | 0x0F86DE40 |
| 106 | +0x350 | PJRT_AsyncHostToDeviceTransferManager_Destroy | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_Destroy | 0x0F860620 |
| 107 | +0x358 | PJRT_AsyncHostToDeviceTransferManager_TransferData | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_TransferData | 0x0F8606A0 |
| 108 | +0x360 | PJRT_Client_CreateBuffersForAsyncHostToDevice | Client | PJRT_Client_CreateBuffersForAsyncHostToDevice | 0x0F85FCC0 |
| 109 | +0x368 | PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer | 0x0F8611A0 |
| 110 | +0x370 | PJRT_AsyncHostToDeviceTransferManager_Device | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_Device | 0x0F861260 |
| 111 | +0x378 | PJRT_AsyncHostToDeviceTransferManager_BufferCount | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_BufferCount | 0x0F861380 |
| 112 | +0x380 | PJRT_AsyncHostToDeviceTransferManager_BufferSize | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_BufferSize | 0x0F8613E0 |
| 113 | +0x388 | PJRT_AsyncHostToDeviceTransferManager_SetBufferError | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_SetBufferError | 0x0F861440 |
| 114 | +0x390 | PJRT_AsyncHostToDeviceTransferManager_AddMetadata | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_AddMetadata | 0x0F861500 |
| 115 | +0x398 | PJRT_Client_DmaMap | Client | PJRT_Client_DmaMap | 0x0F860500 |
| 116 | +0x3A0 | PJRT_Client_DmaUnmap | Client | PJRT_Client_DmaUnmap | 0x0F860580 |
| 117 | +0x3A8 | PJRT_Client_CreateUninitializedBuffer | Client | PJRT_Client_CreateUninitializedBuffer | 0x0F863660 |
| 118 | +0x3B0 | PJRT_Client_UpdateGlobalProcessInfo | Client | PJRT_Client_UpdateGlobalProcessInfo | 0x0F85F940 |
| 119 | +0x3B8 | PJRT_TopologyDescription_Deserialize | Topology | PJRT_TopologyDescription_Deserialize | 0x0F870B80 |
| 120 | +0x3C0 | PJRT_Client_CreateAliasBuffer | Client | PJRT_Client_CreateAliasBuffer | 0x0F863D60 |
| 121 | +0x3C8 | PJRT_Client_FulfillAliasBuffer | Client | PJRT_Client_FulfillAliasBuffer | 0x0F8641A0 |
| 122 | +0x3D0 | PJRT_LoadedExecutable_GetDeviceAssignment | LoadedExec | PJRT_LoadedExecutable_GetDeviceAssignment | 0x0F870EA0 |
| 123 | +0x3D8 | PJRT_Client_CreateErrorBuffer | Client | PJRT_Client_CreateErrorBuffer | 0x0F8638A0 |
| 124 | +0x3E0 | PJRT_AsyncHostToDeviceTransferManager_TransferLiteral | AsyncH2D | PJRT_AsyncHostToDeviceTransferManager_TransferLiteral | 0x0F860960 |
| 125 | +0x3E8 | PJRT_Buffer_CopyRawToHostFuture | Buffer | PJRT_Buffer_CopyRawToHostFuture | 0x0F86DFE0 |
| 126 | +0x3F0 | PJRT_Device_PoisonExecution | Device | PJRT_Device_PoisonExecution | 0x0F860D00 |
| 127 | +0x3F8 | PJRT_Device_CreateAsyncTrackingEvent | Device | PJRT_Device_CreateAsyncTrackingEvent | 0x0F861080 |
| 128 | +0x400 | PJRT_AsyncTrackingEvent_Destroy | AsyncTracking | PJRT_AsyncTrackingEvent_Destroy | 0x0F861120 |
| 129 | +0x408 | PJRT_Executable_GetCompileOptions | Executable | PJRT_Executable_GetCompileOptions | 0x0F86C6E0 |
| 130 | +0x410 | PJRT_Buffer_DonateWithControlDependency | Buffer | PJRT_Buffer_DonateWithControlDependency | 0x0F86F2E0 |
| 131 | +0x418 | PJRT_Event_Create | Event | PJRT_Event_Create | 0x0F86FE00 |
| 132 | +0x420 | PJRT_Event_Set | Event | PJRT_Event_Set | 0x0F86FFA0 |
| 133 | +0x428 | PJRT_Device_GetAttributes | Device | PJRT_Device_GetAttributes | 0x0F873AC0 |
| 134 | +0x430 | PJRT_Client_Load | Client | PJRT_Client_Load | 0x0F8627E0 |
| 135 | +0x438 | PJRT_LoadedExecutable_AddressableDeviceLogicalIds | LoadedExec | PJRT_LoadedExecutable_AddressableDeviceLogicalIds | 0x0F8669E0 |
| 136 | +0x440 | PJRT_Buffer_Bitcast | Buffer | PJRT_Buffer_Bitcast | 0x0F862D00 |
| 137 | +0x448 | PJRT_Error_ForEachPayload | Error | PJRT_Error_ForEachPayload | 0x0F85EFC0 |
| 138 | +0x450 | PJRT_TopologyDescription_Fingerprint | Topology | PJRT_TopologyDescription_Fingerprint | 0x0F870520 |
| 139 | +0x458 | PJRT_Executable_ParameterMemoryKinds | Executable | PJRT_Executable_ParameterMemoryKinds | 0x0F868FC0 |

Populated vs Injected Map

Purpose

Every one of the 135 function-pointer slots is populated; none is null in this build. But the slots split into two provenance classes, and the split is the heart of the plugin pattern: 130 slots are generic XLA C-API wrappers baked at link time, and 5 are injected at runtime by CreatePjrtApi's caller.

The five injection points

The only values CreatePjrtApi does not hard-code are its six parameters a2..a7: five function-pointer slots plus the extension_start header slot. GetTpuPjrtApi passes the plugin's implementations into exactly these positions; everything else is a fixed relocation.

| Slot | Field | Param | Injected impl | Addr | Why TPU-specialized |
|---|---|---|---|---|---|
| 15 | PJRT_Client_Create | a2 | tpu_plugin::PJRT_Client_Create | 0x0E6A8840 | Constructs the TPU PjRtClient (device discovery, runtime init) |
| 103 | PJRT_ExecuteContext_Create | a3 | tpu_plugin::PJRT_ExecuteContext_Create | 0x0E6A9A80 | TPU-specific execute context |
| 87 | PJRT_TopologyDescription_Create | a4 | tpu_plugin::PJRT_TopologyDescription_Create | 0x0E6A9B20 | TPU pod/slice topology |
| 8 | PJRT_Plugin_Initialize | a5 | tpu_plugin::PJRT_Plugin_Initialize | 0x0E6A9D00 | TPU runtime bring-up |
| 1 | extension_start | a6 | (extension chain head) | 0x224C3F68 | TPU extension set |
| 9 | PJRT_Plugin_Attributes | a7 | PJRT_Plugin_Attributes_Xla | 0x0F85F080 | Plugin attribute table (not in tpu_plugin::, but injected) |

The four tpu_plugin::-namespaced impls (slots 8, 15, 87, 103) are the genuine TPU specializations. Slot 9 is injected but resolves to the stock pjrt::PJRT_Plugin_Attributes_Xla — the caller passes the XLA implementation, not a TPU override. Slot 1 is the extension-chain head, also injected (see extension-chain.md). The remaining 130 function pointers are compile-time-fixed pjrt::PJRT_* symbols from XLA's pjrt_c_api_wrapper_impl.cc.

QUIRK — "TPU-specialized" is a much smaller set than a reimplementer might expect. Only four slots carry TPU-specific code; the other 131 function pointers are byte-for-byte the same generic XLA wrappers any PJRT plugin would ship. The TPU-ness lives almost entirely behind Client_Create (which builds the PjRtClient the generic wrappers then dispatch through) and in the extension chain — not in the per-call wrappers.
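
The provenance split can be written down as a small classifier. This is a sketch built only from the slot numbers and parameter names in the injection table above; the category labels and the names INJECTED_PARAM, classify, and tally are illustrative.

```python
from collections import Counter

SLOT_COUNT = 140
HEADER_SLOTS = 5

# Slot -> CreatePjrtApi parameter, from the injection table above.
INJECTED_PARAM = {15: "a2", 103: "a3", 87: "a4", 8: "a5", 1: "a6", 9: "a7"}
# Injected slots whose implementation lives in the tpu_plugin:: namespace.
TPU_SPECIFIC = {8, 15, 87, 103}


def classify(slot):
    """Provenance class of a slot in this build."""
    if not 0 <= slot < SLOT_COUNT:
        raise ValueError(f"slot {slot} out of range")
    if slot in TPU_SPECIFIC:
        return "injected-tpu"       # tpu_plugin:: implementation
    if slot == 9:
        return "injected-stock"     # passed in, but stock XLA code
    if slot == 1:
        return "injected-header"    # extension_start chain head
    if slot < HEADER_SLOTS:
        return "header"             # fixed scalar
    return "fixed"                  # link-time pjrt::PJRT_* wrapper


def tally():
    """Count slots per provenance class across the whole table."""
    return Counter(classify(slot) for slot in range(SLOT_COUNT))
```

The tally gives 130 fixed wrappers, 4 TPU-specific slots, and 1 injected-but-stock slot among the 135 function pointers, which is where both the "130" and "131" figures on this page come from.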


Backward-Compatibility Guard

Purpose

A host compiled against an older PJRT minor passes a smaller _Args struct than this v0.103 plugin expects. The plugin must accept it and read only the fields the caller actually provided. The mechanism is a per-slot size check at the top of every wrapper.

Algorithm

function PJRT_Error_Destroy(args):                  // 0xF85ECE0
    rc = pjrt::ActualStructSizeIsGreaterOrEqual(     // 0xF8A4EC0
            "PJRT_Error_Destroy_Args",                // struct name (for the log line)
            23,                                       // most plausibly the length of that name (string_view ptr+len)
            24,                                       // expected (v0.103) struct_size
            args->struct_size);                       // what the caller passed
    if (rc != ok):
        LogMessage(".../pjrt_c_api_wrapper_impl.cc", 545) << status;  // diagnostic
    if (args->struct_size < 0x18): return;            // 0x18 = 24; too small: bail
    // ... read only fields within args->struct_size ...

Every wrapper's first act is this ActualStructSizeIsGreaterOrEqual call. Of the two constants, 24 is the struct size this v0.103 build expects for PJRT_Error_Destroy_Args. The 23 is most plausibly not a minimum size but the length of the 23-character name string, passed as the second half of a (pointer, length) string_view argument. That reading matches the public XLA helper, which takes a struct name, an expected size, and the actual size, and it is consistent with the wrapper bailing at 0x18 = 24 rather than at 23. Fields beyond the caller's struct_size are never read. This is what lets one v0.103 plugin serve a range of host minors without per-version trampolines: the size prefix on each args struct, not a version branch, carries the compatibility logic.
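
The size-prefix idea can be sketched independently of any real PJRT args struct. In the example below the layout in FIELDS is hypothetical (three original fields plus one later addition, new_flag), and read_args and make_args are illustrative helpers; the point is only that a field is read when, and only when, the caller's declared struct_size covers it.

```python
import struct

# HYPOTHETICAL layout: name -> (offset, format). Not a real PJRT args struct.
FIELDS = {
    "struct_size": (0, "<Q"),
    "extension_start": (8, "<Q"),
    "handle": (16, "<Q"),
    "new_flag": (24, "<Q"),   # imagined as added in a later minor
}


def read_args(buf):
    """Read each field only if the caller's struct_size covers it."""
    (claimed,) = struct.unpack_from("<Q", buf, 0)
    limit = min(claimed, len(buf))  # never trust a size beyond the buffer
    out = {}
    for name, (offset, fmt) in FIELDS.items():
        end = offset + struct.calcsize(fmt)
        if end <= limit:
            out[name] = struct.unpack_from(fmt, buf, offset)[0]
        else:
            out[name] = None        # field absent for this caller
    return out


def make_args(handle, new_flag=None):
    """Build an args buffer as an older (24-byte) or newer (32-byte) caller would."""
    if new_flag is None:
        return struct.pack("<QQQ", 24, 0, handle)
    return struct.pack("<QQQQ", 32, 0, handle, new_flag)
```

An older caller's 24-byte buffer yields new_flag = None, while a newer caller's 32-byte buffer yields the value it set; the reader needs no version number to tell them apart.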

NOTE — the source-path string third_party/tensorflow/compiler/xla/pjrt/c/pjrt_c_api_wrapper_impl.cc is baked into the diagnostic at 0xF85ECE0. It confirms the 130 fixed wrappers are XLA's generic pjrt_c_api_wrapper_impl.cc code, identical across PJRT plugins — the binary's own provenance string, not an external claim.


Hot-Path Slots

The slots a JAX/PyTorch-XLA step touches most. This ranking is a semantic estimate — no call-count instrumentation was run, so confidence is LOW on the ordering, though the slot identities are CERTAIN.

| Rank | Slot | Field | Per-step role |
|---|---|---|---|
| 1 | 60 | PJRT_LoadedExecutable_Execute | Program launch onto TPU; the primary throughput slot |
| 2 | 11 | PJRT_Event_IsReady | Polled in tight loops for async completion |
| 3 | 13 | PJRT_Event_Await | Synchronous wait on an async result |
| 4 | 14 | PJRT_Event_OnReady | Completion callback registration, per buffer |
| 5 | 10 | PJRT_Event_Destroy | Mass destroy after batch await |
| 6 | 63 | PJRT_Buffer_Destroy | Per-buffer release; bursts on graph-output cleanup |
| 7 | 27 | PJRT_Client_BufferFromHostBuffer | Host→TPU input upload, per parameter per step |
| 8 | 75 | PJRT_Buffer_ToHostBuffer | TPU→host downloads (metrics, probes) |
| 9 | 81 | PJRT_Buffer_OpaqueDeviceMemoryDataPointer | Zero-copy DMA pointer fetch for foreign-lib interop |
| 10 | 79/80 | PJRT_Buffer_{In,De}creaseExternalReferenceCount | DLPack-style refcount bump/drop |

GOTCHA — the Top-20 ranking in the source findings is an ordering heuristic, not measured data. A reimplementer optimizing the dispatch path should treat slot 60 (Execute) and the Event slots (10..14) as certainly hot, but should not trust the precise rank order of the cooler slots without instrumentation.


Cross-References

  • PJRT Overview: where this table sits in the plugin lifecycle; the GetPjrtApi → GetTpuPjrtApi → CreatePjrtApi chain in context
  • Extension Chain: the linked list dangling off extension_start (slot 1); 17 extensions, type-IDs, per-extension layouts
  • Client and Device: slots 15..44: Client_Create (the key TPU injection) plus device/memory queries
  • Buffer and Memory: slots 63..81, 97, 105, 125, 130, 136: buffer lifecycle, transfers, DMA
  • Executable Execution: slots 45..62, 122, 135, including slot 60 LoadedExecutable_Execute, the hot path
  • Events and Async: slots 10..14, 131..132: the PJRT_Event model
  • Callbacks: OnReady / async-tracking-event slots and the callback extension
  • Collectives Communicator: the collectives extension surface reached via the chain, not via these slots