TpuCompiler Roster
All addresses on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build libtpu_lts_20260413_b_RC00, build-id md5 89edbbe81c5b328a958fe628a9f2207d, ELF x86-64 DYN, not stripped; demangled C++ names and IDA-recovered C names quoted verbatim). .text VMA equals file offset. Other versions will differ.
Abstract
TpuCompiler_* is the C-ABI face of the StreamExecutor TPU compiler — the flat extern "C" surface through which the open-source xla::TpuCompiler shim (built into TensorFlow/XLA, not into this binary) drives the closed libtpu.so compilation pipeline without sharing a single C++ type across the .so boundary. Where the host-side shim has methods named RunHloPasses, RunBackend, Compile, ShapeSize, and DefaultDeviceShapeRepresentation, each forwards through a function-pointer slot of TfTpu_ExecutorApiFn into one of the seven TpuCompiler_* free functions inventoried here. Every one of them takes only opaque handles plus a serialized-proto blob and a status-out object, reconstructs the rich xla:: objects inside libtpu's own statically-linked XLA, calls the real compiler vtable, and re-serializes the result. The compiler object behind the handle is a xla::jellyfish::DeepseaCompiler — the TPU ("Deepsea"/Jellyfish) subclass of xla::Compiler.
A second, lexically adjacent cluster, TpuCompile_* (six functions), is not the same surface and is a frequent source of confusion. TpuCompiler_* is the SE class-method C-API reached through ExecutorApiFn; TpuCompile_* is the standalone support layer for the TensorFlow TPUCompileOp kernel: a one-shot CompileAndBuild that runs tensorflow::tpu::TpuCompileOpKernelCommon end-to-end and emits XLA_TpuProgram handles, plus compilation-cache-key construction, guaranteed-const fingerprinting, and two policy predicates. The two clusters share the tpu_executor_c_api.cc / tpu_util_c_api.cc translation-unit family and the Compile verb, but they sit at different layers: each TpuCompiler_* function wraps one method of the compiler class, whereas TpuCompile_CompileAndBuild is the whole TF compile op.
This page owns the per-function roster and impl-symbol map for both clusters. The ABI seam itself — why a flat C interface exists, the *ApiFn() accessor pattern, and the opaque-handle / ApiConverter::ToC/FromC convention — is established once on the shim overview and not restated here. The HLO pass schedule these functions invoke is owned by the HLO Pass Registry; the XLA_TpuProgram handle they produce is owned by the TpuProgram Roster.
For reimplementation, the contract is:
- The seven TpuCompiler_* signatures and dispatch: which proto crosses each (HloModuleProto, HloModuleGroupProto), which compiler vtable slot each indexes (+24 RunHloPasses, +32 RunBackend, +96 ShapeSize, +104 DefaultDeviceShapeRepresentation), and the handle's double-pointer shape.
- The proto-in / proto-out marshalling discipline: DeserializeProto in, CreateFromProto to build the xla::HloModule, run, ToProto + SerializePartialToArray out into a caller-freed byte buffer.
- The DeepseaCompiler object model: the handle is Compiler**; the real object is a xla::jellyfish::DeepseaCompiler; RegisterAllPhases is the one-time phase-registration hook that the open-source TpuCompiler constructor calls (the "Initialize" step).
- The TpuCompile_* support cluster: what CompileAndBuild runs, the cache-key/fingerprint helpers, and the abort-suppression policy predicates.
| Property | Value |
|---|---|
| Handle type | xla::jellyfish::DeepseaCompiler** (double-pointer; outer = 8-byte box, inner = the compiler) |
| Constructor | TpuCompiler_New @ 0xeabc4a0 → operator new(8) ×2, DeepseaCompiler::DeepseaCompiler |
| Destructor | TpuCompiler_Free @ 0xeabc4e0 → virtual dtor slot +8, then free of the box |
| Compile dispatch | xla::Compiler vtable: RunHloPasses +24, RunBackend +32, Compile via xla::Compiler::Compile, ShapeSize +96, DefaultDeviceShapeRepresentation +104 |
| Wire format in | TpuSerializedProto (ptr+len) → stream_executor::tpu::DeserializeProto<…> |
| Status channel | trailing absl::status_internal::StatusRep** out-param (the SE "ok?-code-msg" idiom) |
| C-ABI source TU | learning/45eac/tfrc/executor/stream_executor/tpu_executor_c_api.cc (compiler), tpu_util_c_api.cc (TpuCompile helpers) |
| Phase-registration hook | xla::TpuCompiler::RegisterAllPhases() @ 0xf849ec0 |
| Roster size | 7 TpuCompiler_* + 6 TpuCompile_* = 13 free functions |
| Evidence grade | Reimplementation-grade / byte-confirmed against IDA decompile |
NOTE — there is no
TpuCompiler_Initialize symbol in this build. The "initialize" step a reimplementer expects is split in two: object construction is TpuCompiler_New, and one-time phase-table registration is xla::TpuCompiler::RegisterAllPhases (0xf849ec0), called from the host-side xla::TpuCompiler constructor, not from the C-ABI. Likewise there is no TpuCompiler_*ToTpuProgram; producing a serialized XLA_TpuProgram is the job of TpuCompile_CompileAndBuild in the second cluster, while TpuCompiler_RunBackend returns an in-process xla::Executable* handle rather than a serialized program.
1. Lifecycle — New / Free
Purpose
Construct and destroy the opaque compiler handle. The host-side xla::TpuCompiler shim holds the handle returned by New for its lifetime and releases it through Free. The compiler is stateless across calls — RunHloPasses / RunBackend / Compile each take the handle and a fresh module — so a single handle services the whole process.
Algorithm
// TpuCompiler_New 0xeabc4a0
DeepseaCompiler** TpuCompiler_New():
    box = operator new(8)                     // the handle the host receives
    inner = operator new(8)                   // the actual compiler object (8-byte: just a vptr)
    DeepseaCompiler::DeepseaCompiler(inner)   // base xla::Compiler ctor; sets vtable
    *box = inner
    return box                                // host holds a Compiler**

// TpuCompiler_Free 0xeabc4e0
void TpuCompiler_Free(Compiler** box):
    if *box:
        (*box)->vtable[+8](*box)              // virtual destructor: DeepseaCompiler::~DeepseaCompiler
    free(box)                                 // release the outer box
QUIRK — the handle is a double pointer.
New allocates an 8-byte box, allocates the 8-byte compiler, and stores the compiler pointer inside the box; every other TpuCompiler_* function dereferences twice (**a1) to reach the vtable. A reimplementer who hands back the compiler pointer directly will mis-align every downstream (*(_QWORD *)*a1 + off) vtable index. The compiler object itself is only 8 bytes (a single vptr): its behavior is dispatched entirely through the vtable, with no instance state, so DeepseaCompiler carries its configuration through the CompileOptions passed per call, not through fields.
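The double-pointer shape can be illustrated with a minimal C++ sketch. Everything named here (MockCompiler, MockCompilerHandle, MockCompiler_New, MockCompiler_Answer, MockCompiler_Free, g_destroyed) is a hypothetical stand-in, not a libtpu symbol; the sketch assumes only what the pseudocode shows, namely an 8-byte box that holds a pointer to a vptr-only object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter so a caller can observe that the virtual destructor ran.
int g_destroyed = 0;

// Hypothetical stand-in for the compiler: a vptr and no data members.
struct MockCompiler {
  virtual ~MockCompiler() { ++g_destroyed; }
  virtual std::int64_t Answer() const { return 42; }
};
static_assert(sizeof(MockCompiler) == sizeof(void*),
              "a vptr-only object is pointer-sized");

// The value stored in the box; the host holds a pointer to the box.
using MockCompilerHandle = MockCompiler*;

MockCompilerHandle* MockCompiler_New() {
  MockCompilerHandle* box = new MockCompilerHandle(nullptr);  // outer box
  *box = new MockCompiler();                                  // inner object
  return box;
}

// Every accessor dereferences twice: box -> compiler -> vtable.
std::int64_t MockCompiler_Answer(MockCompilerHandle* box) {
  return (*box)->Answer();
}

void MockCompiler_Free(MockCompilerHandle* box) {
  if (*box) delete *box;  // virtual destructor; a null inner pointer is tolerated
  delete box;             // release the outer box; box itself must be non-null
}
```

A caller that passed the inner MockCompiler* where the box is expected would make MockCompiler_Answer read the vptr as if it were the object pointer, which is the mis-alignment the QUIRK warns about.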
Function Map
| Function | Address | Role |
|---|---|---|
| TpuCompiler_New | 0xeabc4a0 | Allocate box + DeepseaCompiler, return Compiler** |
| TpuCompiler_Free | 0xeabc4e0 | Virtual-destruct the compiler, free the box |
Considerations
New takes no arguments and cannot report failure in the observed body: there is no status-out, and raw operator new does not return null on OOM (it throws std::bad_alloc, which ends in process termination when nothing catches it). Free tolerates a null inner pointer (if (*ptr)) but not a null box. Registration of the phase table, the part a reimplementer might expect inside New, happens elsewhere: see §5.
2. HLO Passes & Backend — RunHloPasses / RunBackend
Purpose
The two halves of XLA's two-phase compile. RunHloPasses runs the target-independent-then-TPU HLO optimization pipeline over a module and returns the optimized module (re-serialized). RunBackend consumes an already-optimized module and lowers it to a backend xla::Executable, returned as an in-process handle. The host shim calls them back-to-back when it wants to interpose between the two phases (e.g. to cache the optimized HLO); Compile (§3) is the fused form.
Entry Point
xla::TpuCompiler::RunHloPasses(...)   (host-side shim, not in this binary)
  └─ ExecutorApiFn()->slot[RunHloPasses]   0x20819360 = ExecutorApiFn() accessor
       (returns &executor_api_fn @ 0x2258c818; the RunHloPasses fn-ptr lives in that struct)
       └─ TpuCompiler_RunHloPasses   0xeabcd80 = C-ABI impl
            └─ DeepseaCompiler vtable[+24]   xla::Compiler::RunHloPasses override
RunBackend is identical with slot/offset +32.
Algorithm
// TpuCompiler_RunHloPasses 0xeabcd80
// args: a1=Compiler** handle, a2=TpuSerializedProto* (HloModuleProto in),
//       a3=XLA_HloModuleConfig*, a4=SE_StreamExecutor* (may be null),
//       a5=TpuSerializedProto* (out), a6=StatusRep** (status out)
int TpuCompiler_RunHloPasses(a1, a2, cfg, exec, out, status):
    proto = DeserializeProto<HloModuleProto, TpuSerializedProto>(a2)  // ptr+len → proto
    config = ApiConverter::FromC(cfg)  // XLA_HloModuleConfig → xla::HloModuleConfig
    module = HloModule::CreateFromProto(proto, config, /*flags*/1,0,1,0)
    if module is error: write status; return  // StatusOr unwrap

    // build a CompileOptions whose device allocator wraps the SE executor, if given
    opts = {}
    if exec && exec->allocator:
        opts.device_allocator = new WrapperDeviceMemoryAllocator{  // vtable off_21616EF8
            platform = GetUnderlyingDeepseaPlatform(),  // Meyers singleton
            executor = exec }
    opts.layout_canonicalization_callback = empty  // default-constructed std::function

    optimized = (**a1).vtable[+24](compiler, module, exec, opts)  // xla::Compiler::RunHloPasses
    if optimized is error: write status; goto cleanup

    // re-serialize the optimized module back across the seam
    p = HloModuleProto(); optimized->ToProto(&p)
    n = p.ByteSizeLong()
    buf = operator new(n)
    if !p.SerializePartialToArray(buf, n): LOG(FATAL) "proto_helper.h:45"
    out->ptr = buf; out->len = n  // caller frees buf
cleanup:
    destroy module/optimized/opts; return
RunBackend (0xeabd100) is structurally the same up to the dispatch, then diverges at the result:
// TpuCompiler_RunBackend 0xeabd100 (vtable +32)
executable = (**a1).vtable[+32](compiler, module, exec, opts)  // xla::Compiler::RunBackend → xla::Executable*
if ok:
    h = operator new(8); *h = executable  // box the Executable*
    *out = h                              // out is an SE_Executable** handle, NOT a proto
QUIRK —
RunHloPasses returns a re-serialized HloModuleProto (a byte buffer the caller frees), but RunBackend returns a live xla::Executable* boxed in an 8-byte allocation; the backend result never crosses the seam as a proto. The asymmetry is deliberate: optimized HLO is portable and cacheable, so it is serialized; an Executable holds device-resident state and is consumed in-process, so only its pointer (an opaque TpuExecutable handle) is handed back.
GOTCHA — both functions take an optional
SE_StreamExecutor* (a4). When non-null, the impl lazily constructs a WrapperDeviceMemoryAllocator (vtable off_21616EF8, Allocate/Deallocate/GetStream over a stream_executor::DeviceMemoryAllocator) bound to the process-wide DeepseaPlatform singleton (GetUnderlyingDeepseaPlatform, guarded by a __cxa_guard) and threads it into CompileOptions.device_allocator. A reimplementer that ignores a4 will compile without a device allocator and silently disable allocation-aware passes (e.g. memory-space assignment that needs real HBM sizing).
Function Map
| Function | Address | Vtable slot | Result form |
|---|---|---|---|
| TpuCompiler_RunHloPasses | 0xeabcd80 | +24 | re-serialized HloModuleProto (byte buf) |
| TpuCompiler_RunBackend | 0xeabd100 | +32 | boxed xla::Executable* handle |
Considerations
The CreateFromProto flags (1,0,1,0) appear to request prohibit-ill-formed / no-verifier-on-the-cheap-path behavior (confidence in the exact flag semantics is LOW: they are derived from argument positions, not from a verifier trace). The LOG(FATAL) on a failed SerializePartialToArray (proto_helper.h:45) means a reimplementation must size the buffer with ByteSizeLong() before serializing: there is no error return for an undersized buffer, only a process abort.
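The size-then-serialize discipline can be sketched without a protobuf dependency. FakeProto, SerializedBlob, SerializeToBlob and FreeBlob are hypothetical names introduced for illustration; FakeProto only mimics the two protobuf calls named in the text (ByteSizeLong and SerializePartialToArray), and the use of operator delete as the matching deallocator is an assumption of this sketch, not a documented contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <string>

// Hypothetical stand-in for a protobuf message; only the two calls used in
// the text are mimicked.
struct FakeProto {
  std::string payload;
  std::size_t ByteSizeLong() const { return payload.size(); }
  bool SerializePartialToArray(void* data, int size) const {
    if (size < 0 || static_cast<std::size_t>(size) < payload.size()) return false;
    std::memcpy(data, payload.data(), payload.size());
    return true;
  }
};

// Hypothetical ptr+len blob, shaped like the serialized-proto struct in the text.
struct SerializedBlob {
  char* ptr;
  std::size_t len;
};

// Size first, allocate exactly that much, then serialize. A failed serialize
// aborts, because this step has no error return.
SerializedBlob SerializeToBlob(const FakeProto& proto) {
  const std::size_t n = proto.ByteSizeLong();
  char* buf = static_cast<char*>(::operator new(n));
  if (!proto.SerializePartialToArray(buf, static_cast<int>(n))) std::abort();
  return SerializedBlob{buf, n};
}

// The caller owns the buffer and releases it (assumed deallocator: operator delete).
void FreeBlob(SerializedBlob* blob) {
  ::operator delete(blob->ptr);
  blob->ptr = nullptr;
  blob->len = 0;
}
```

Sizing the buffer from ByteSizeLong() immediately before the serialize call is what keeps the abort path unreachable in normal operation.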
3. Compile — the Fused Entry
Purpose
TpuCompiler_Compile is the single-call form: it takes an HloModuleGroupProto, builds the module, and runs the full optimize-then-lower pipeline via xla::Compiler::Compile, returning one boxed Executable* per module into a caller-provided array. It is the path the PJRT adapter and most callers use when they do not need to interpose between HLO passes and backend.
Algorithm
// TpuCompiler_Compile 0xeabc520
// a1=Compiler** handle, group proto in, a5/a3 = module count guard,
// per-executor list in, a6=array-of-StreamExecutor-lists, a7=out Executable* array, a8=StatusRep**
void TpuCompiler_Compile(a1, group, ..., out_array, status):
    grp = DeserializeProto<HloModuleGroupProto, TpuSerializedProto>(...)
    if module_count > 1:  // a5<=1 && v54<2 guard
        *status = MakeError("Can not compile multiple HLO modules at once.")  // tpu_executor_c_api.cc:1040
        return
    config = ApiConverter::FromC(...)
    if grp.modules_size() <= 0: LogIndexOutOfBoundsAndAbort()  // proto bounds check
    module = HloModule::CreateFromProto(grp.modules(0), config, 1,0,1,0)
    if error: write status; cleanup

    // install the layout-canonicalization callback (TpuCompiler_Compile::$_0) on the module config
    module.config.set_layout_canonicalization_callback(&$_0)  // module+3864/+3872 fn-ptr pair

    // flatten the per-module vector<StreamExecutor*> (loop-unrolled by 8)
    execs = copy_stream_executors(...)
    opts = CompileOptions{ device_allocator = WrapperDeviceMemoryAllocator(platform, execs) if present }

    result = xla::Compiler::Compile(compiler, module, execs, opts)  // StatusOr<vector<unique_ptr<Executable>>>
    if ok:
        for i in result:  // box each Executable*, transfer ownership
            h = operator new(8); *h = result[i].release()
            out_array[i] = h
    else:
        write status
    free temporaries; cleanup
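The ownership-transfer loop at the end of the pseudocode above can be sketched in C++ as follows. MockExecutable, MockExecutableHandle, BoxExecutables, FreeBoxedExecutable and g_executables_destroyed are hypothetical names; in particular, how the real handle is released is owned by the TpuExecutable Roster and is only assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical counter used to observe destruction.
int g_executables_destroyed = 0;

// Hypothetical stand-in for xla::Executable.
struct MockExecutable {
  virtual ~MockExecutable() { ++g_executables_destroyed; }
};

// The value stored in each 8-byte box.
using MockExecutableHandle = MockExecutable*;

// Releases each unique_ptr into a freshly allocated box and stores the box in
// the caller-provided array, which must have room for results.size() entries.
void BoxExecutables(std::vector<std::unique_ptr<MockExecutable>>& results,
                    MockExecutableHandle** out_array) {
  for (std::size_t i = 0; i < results.size(); ++i) {
    out_array[i] = new MockExecutableHandle(results[i].release());
  }
}

// Assumed release path: destroy the executable, then the box.
void FreeBoxedExecutable(MockExecutableHandle* box) {
  delete *box;
  delete box;
}
```

After BoxExecutables returns, the vector's elements are null and the boxes are the only owners, so dropping the vector does not destroy the executables.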
Function Map
| Function | Address | Proto in | Result form |
|---|---|---|---|
| TpuCompiler_Compile | 0xeabc520 | HloModuleGroupProto | array of boxed Executable* |
Considerations
GOTCHA — the single-call entry takes an
HloModuleGroupProto (a group) but rejects any group with more than one module: "Can not compile multiple HLO modules at once." at tpu_executor_c_api.cc:1040. The group-shaped input is a vestige of the generic xla::Compiler::Compile(HloModuleGroup, ...) signature; the TPU C-API supports exactly one module per call. A reimplementer must keep the group container but enforce the size-1 invariant, and must still bounds-check modules(0) (the impl calls LogIndexOutOfBoundsAndAbort on an empty group).
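A minimal sketch of the size-1 invariant, assuming a hypothetical MockModuleGroup that stores serialized modules as strings and a hypothetical MockStatus in place of the real status object. Only the error text is taken from the source; SelectSingleModule is an illustrative helper, not a libtpu function.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical status object standing in for the status out-param.
struct MockStatus {
  bool ok;
  std::string message;
};

// Hypothetical group container; each string stands in for one module.
struct MockModuleGroup {
  std::vector<std::string> modules;
};

// Returns the single module to compile, or nullptr with an error status when
// the group holds more than one. An empty group aborts, mirroring the bounds
// check on modules(0) described above.
const std::string* SelectSingleModule(const MockModuleGroup& group,
                                      MockStatus* status) {
  if (group.modules.size() > 1) {
    *status = MockStatus{false, "Can not compile multiple HLO modules at once."};
    return nullptr;
  }
  if (group.modules.empty()) std::abort();
  *status = MockStatus{true, ""};
  return &group.modules[0];
}
```

The multi-module case is a recoverable error reported through the status, while the empty case is treated as a programming error and never returns.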
QUIRK —
Compile installs a layout-canonicalization callback (TpuCompiler_Compile::$_0) directly onto the module config (module+3864/+3872) before dispatch, whereas RunHloPasses / RunBackend install an empty callback. This is how the fused path lets the backend re-canonicalize entry-computation layouts that the split path leaves to the host. Reproducing Compile as "RunHloPasses then RunBackend" without this callback will diverge on layout-sensitive modules.
The per-module executor list is copied with an 8-way unrolled loop and the source pointers are double-dereferenced (**(_QWORD**)(base + 8*i)) — the input is a vector<vector<StreamExecutor*>> flattened to one inner list, since only one module is allowed.
4. Shape Queries — ShapeSize / DefaultDeviceShapeRepresentation
Purpose
Two pure functions of the compiler that the host needs for buffer sizing and layout, independent of any compile. ShapeSize returns the byte size a shape occupies on device; DefaultDeviceShapeRepresentation maps a host (logical) shape to the device (physical, tiled/padded) shape the TPU actually stores.
Algorithm
// TpuCompiler_ShapeSize 0xeabd400
int64 TpuCompiler_ShapeSize(Compiler** a1, XLA_Shape* a2):
    shape = ApiConverter::FromC(a2)   // XLA_Shape → xla::Shape
    fn = (**a1).vtable[+96]()         // returns a ShapeSizeFunction object (closure)
    size = fn.call(shape)             // invoke the size functor
    fn.dtor()                         // release the functor
    return size

// TpuCompiler_DefaultDeviceShapeRepresentation 0xeabd480
void TpuCompiler_DefaultDeviceShapeRepresentation(Compiler** a1, XLA_Shape* in, XLA_Shape* out):
    host_shape = ApiConverter::FromC(in)
    dev_shape = (**a1).vtable[+104](compiler, host_shape)  // xla::Compiler::DefaultDeviceShapeRepresentation
    ApiConverter::ToC(dev_shape, out)                      // xla::Shape → XLA_Shape (caller owns 'out')
NOTE —
ShapeSize does not obtain the size from a vtable method directly; slot +96 returns a functor (a std::function-like ShapeSizeFunction, the 6-qword v6 block: vtable, state, invoke-ptr, deleter), which the C-ABI then invokes and destroys. This indirection lets the compiler expose its size policy as a first-class callable to the host. A reimplementer must invoke the returned functor and run its deleter (slot +8 of the functor's vtable), not assume +96 is the sizer itself.
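The fetch-invoke-destroy sequence can be sketched with std::function standing in for the functor. MockShape, MockShapeSizeFunction, MockCompiler and MockCompiler_ShapeSize are hypothetical, and the dense byte count computed here is an assumption for illustration; it does not model the tiling or padding a real TPU size function would apply.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical shape: dimensions plus bytes per element.
struct MockShape {
  std::vector<std::int64_t> dims;
  std::int64_t element_bytes;
};

using MockShapeSizeFunction = std::function<std::int64_t(const MockShape&)>;

// Hypothetical compiler whose virtual method is a factory for the sizer.
struct MockCompiler {
  virtual ~MockCompiler() = default;
  virtual MockShapeSizeFunction ShapeSizeFunction() const {
    return [](const MockShape& s) {
      std::int64_t bytes = s.element_bytes;
      for (std::int64_t d : s.dims) bytes *= d;
      return bytes;  // dense size only; no tiling or padding is modeled
    };
  }
};

std::int64_t MockCompiler_ShapeSize(MockCompiler** handle, const MockShape& shape) {
  MockShapeSizeFunction fn = (*handle)->ShapeSizeFunction();  // 1. fetch the functor
  return fn(shape);  // 2. invoke it; 3. fn is destroyed when the scope ends
}
```

Here the destructor of std::function plays the role of the explicit deleter call in the decompiled body.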
Function Map
| Function | Address | Vtable slot | Returns |
|---|---|---|---|
| TpuCompiler_ShapeSize | 0xeabd400 | +96 (functor factory) | int64 byte size |
| TpuCompiler_DefaultDeviceShapeRepresentation | 0xeabd480 | +104 | device XLA_Shape (out-param) |
Considerations
DefaultDeviceShapeRepresentation fills a caller-provided XLA_Shape via ApiConverter::ToC, so the caller owns the result and must pair it with ApiConverter::Destroy(XLA_Shape*) (the interior-free overload). The device shape it returns is the tiled/padded representation the TPU uses for the input host shape — the mechanism by which XLA learns the TPU's (8, 128)-style tiling for a buffer without libtpu exposing the layout assignment internals.
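The source says only that the device shape is a tiled/padded form in an (8, 128) style; the actual layout rules are not documented on this page. As a purely illustrative sketch of what such padding means, assuming a rank-2 shape whose two dimensions are rounded up to multiples of a hypothetical 8 x 128 tile (Shape2D, RoundUp and PadToTileSketch are invented names):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical rank-2 logical shape.
struct Shape2D {
  std::int64_t rows;
  std::int64_t cols;
};

// Rounds value up to the next multiple of `multiple` (both must be positive).
constexpr std::int64_t RoundUp(std::int64_t value, std::int64_t multiple) {
  return (value + multiple - 1) / multiple * multiple;
}

// ASSUMPTION: an 8 x 128 tile applied to (rows, cols). The real device
// representation is decided inside the compiler and may differ.
Shape2D PadToTileSketch(Shape2D host) {
  return Shape2D{RoundUp(host.rows, 8), RoundUp(host.cols, 128)};
}
```

Under that assumption a 10 x 200 host shape pads to 16 x 256, and a shape already aligned to the tile is returned unchanged.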
5. Phase Registration — the "Initialize" Step
Purpose
The open-source xla::TpuCompiler constructor performs one-time registration of every TPU compilation phase into the global phase registry. In this binary that hook is xla::TpuCompiler::RegisterAllPhases() @ 0xf849ec0. It is not part of the TpuCompiler_* C-ABI — it is a C++ method compiled into libtpu because XLA is statically linked here — but a reimplementer building the host side must call it exactly once before any compile, which is why it is the de-facto "Initialize."
Considerations
RegisterAllPhases is also the bridge to the PJRT PhaseCompile extension: the same phase set that RegisterAllPhases populates is what the PJRT PhaseCompile extension enumerates and drives phase-by-phase. The extension's wrapping of this registration is owned by PJRT PhaseCompile Extension; the concrete pass schedule the phases run is owned by the HLO Pass Registry. This page records only that the registration entry point exists and is the missing "Initialize" half of TpuCompiler_New.
| Function | Address | Role |
|---|---|---|
| xla::TpuCompiler::RegisterAllPhases | 0xf849ec0 | one-time phase-registry population (host-side ctor hook) |
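One way a host-side reimplementation could guarantee the exactly-once call is a function-local static, sketched below. PhaseRegistry, RegisterAllPhasesSketch, EnsurePhasesRegistered, HostCompilerSketch, g_register_calls and the two phase names are all hypothetical; the source states only that registration must happen once, from the host-side constructor, before any compile.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical global phase registry.
std::vector<std::string>& PhaseRegistry() {
  static std::vector<std::string> registry;
  return registry;
}

// Hypothetical counter used to observe how often registration ran.
int g_register_calls = 0;

// Hypothetical stand-in for the registration hook; phase names are invented.
void RegisterAllPhasesSketch() {
  ++g_register_calls;
  PhaseRegistry().push_back("hlo_passes");
  PhaseRegistry().push_back("backend");
}

// A function-local static is initialized exactly once, even with concurrent
// callers (guaranteed since C++11).
void EnsurePhasesRegistered() {
  static const bool registered = (RegisterAllPhasesSketch(), true);
  (void)registered;
}

// Hypothetical host-side shim: every constructor funnels through the guard.
struct HostCompilerSketch {
  HostCompilerSketch() { EnsurePhasesRegistered(); }
};
```

Constructing several shim objects then populates the registry only once, which matches the one-time behavior described above.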
6. The TpuCompile_* Support Cluster
Purpose
A separate six-function cluster that backs the TensorFlow TPUCompileOp kernel rather than the SE compiler class. Its centerpiece, CompileAndBuild, runs an entire TF compile op — metadata proto in, tensorflow::tpu::TpuCompileOpKernelCommon driven to completion, XLA_TpuProgram handles out — while the other five are small utilities for the TF compilation cache and abort policy. These live in tpu_util_c_api.cc / the compilation-cache-key TU, not tpu_executor_c_api.cc.
Algorithm — the standalone compile
// TpuCompile_CompileAndBuild 0xe8bc1e0
// a1=TPUCompileMetadataProto (serialized), a2=mlir/computation input,
// a4/a5 = out program list, a6=StatusRep**
int TpuCompile_CompileAndBuild(meta, input, ..., out_programs, status):
    metadata = parse TPUCompileMetadataProto(meta)
    kernel = TpuCompileOpKernelCommon(metadata, ...)   // the TF compile-op core
    platform = GetRegisteredDeepseaPlatform()          // deepsea::executor singleton
    topology = platform->GetTopology()
    result = kernel.Compile(... topology ...)          // -> CompiledProgramsAndMetadatas
    if ok:
        for each compiled program in result:
            out_programs[i] = (XLA_TpuProgram*) program  // hand back program handles
    else:
        *status = result.status()
CompileAndBuild is the one TpuCompile_* function that produces a TpuProgram — it bridges the TF op layer to the serialized XLA_TpuProgram handle, which is where the *ToTpuProgram behavior a reimplementer expects actually lives.
The cache / fingerprint / policy helpers
// TpuCompile_CreateGuaranteedConstFingerprint 0xf6a2040
uint64 CreateGuaranteedConstFingerprint(uint64 seed, const char* data, int64 len):
    if len < 0: BUG()  // size sanity → trap
    return FingerprintCat2011(seed, Fingerprint2011(data, len))  // 64-bit non-crypto fp

// TpuCompile_DestroyCompilationCacheKey 0xf6a2e60
void DestroyCompilationCacheKey(void* key, void* prefix):
    if key: free(key)  // two heap strings owned by the key
    if prefix: free(prefix)

// TpuCompile_IsTpuCompilationEnabled 0xf6a1b40
bool IsTpuCompilationEnabled(): return true  // constant in this build

// TpuCompile_ShouldTpuCompileOpIgnoreCancellation 0xf6a1b60
bool ShouldTpuCompileOpIgnoreCancellation():
    if !TpuCompilationCancellationTerminatesProcess():
        LOG(WARNING) "...process abort is suppressed... only meant for tests b/79359718"
        return true
    if GetCommandLineOption("xla_jf_exit_process_on_compilation_success") == "true":
        LOG(WARNING) "...abort suppressed when --XLA_jf_exit_process_on_compilation_success... b/72471718"
        return true
    return false
QUIRK —
TpuCompile_IsTpuCompilationEnabled is a hard-coded return 1 in this build. The function exists because the open-source TF op calls it as a runtime gate, but the shipped libtpu always reports compilation enabled; there is no flag that turns it off. A reimplementer can treat the gate as a no-op for this version, but must keep the symbol because the host op binds to it by name through the API table.
Function Map
| Function | Address | Role | Source TU |
|---|---|---|---|
| TpuCompile_CompileAndBuild | 0xe8bc1e0 | Run the whole TpuCompileOpKernelCommon; emit XLA_TpuProgram handles | tpu_util_c_api.cc (HIGH) |
| TpuCompile_CreateCompilationCacheKey | 0xf6a2080 | Build the TF compilation-cache key (config + fingerprints) | cache-key TU |
| TpuCompile_CreateGuaranteedConstFingerprint | 0xf6a2040 | FingerprintCat2011(seed, Fingerprint2011(data,len)) | cache-key TU |
| TpuCompile_DestroyCompilationCacheKey | 0xf6a2e60 | free the two heap strings inside a cache key | cache-key TU |
| TpuCompile_IsTpuCompilationEnabled | 0xf6a1b40 | Constant true runtime gate | tpu_util_c_api.cc |
| TpuCompile_ShouldTpuCompileOpIgnoreCancellation | 0xf6a1b60 | Abort-suppression policy for cancelled compiles (test-only) | tpu_util_c_api.cc |
Considerations
The cache-key cluster is the data path for the TF TPU compilation cache: CreateCompilationCacheKey assembles a key string from the metadata and the guaranteed-const fingerprint, CreateGuaranteedConstFingerprint produces the 64-bit fingerprint over const inputs, and DestroyCompilationCacheKey frees the two heap buffers the key owns (the key proper and a prefix). Fingerprint2011 is the non-cryptographic 64-bit hash; a reimplementer must reproduce both Fingerprint2011 and the FingerprintCat2011 combiner bit-exactly, or keys computed on one side of the host/plugin split will not match keys computed on the other and cache lookups will miss.
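The structure of the guaranteed-const fingerprint (hash the data, then combine with the seed) can be sketched as below. This is emphatically not Fingerprint2011 or FingerprintCat2011: StandInFingerprint is 64-bit FNV-1a and StandInFingerprintCat is an invented combiner, both swapped in so the sketch is self-contained. It shows the call shape and the length guard only, and its outputs will not match libtpu's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// STAND-IN hash: 64-bit FNV-1a, NOT Fingerprint2011.
std::uint64_t StandInFingerprint(const char* data, std::size_t len) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
  for (std::size_t i = 0; i < len; ++i) {
    h ^= static_cast<unsigned char>(data[i]);
    h *= 1099511628211ULL;  // FNV-1a prime
  }
  return h;
}

// STAND-IN combiner, NOT FingerprintCat2011.
std::uint64_t StandInFingerprintCat(std::uint64_t a, std::uint64_t b) {
  return (a * 1099511628211ULL) ^ b;
}

// Same shape as the pseudocode: guard the length, hash the data, then fold
// the seed in with the combiner.
std::uint64_t GuaranteedConstFingerprintSketch(std::uint64_t seed,
                                               const char* data,
                                               std::int64_t len) {
  if (len < 0) std::abort();  // the original traps on a negative size
  return StandInFingerprintCat(
      seed, StandInFingerprint(data, static_cast<std::size_t>(len)));
}
```

Because the result is a pure function of (seed, data, len), two parties agree on a key only if they run bit-identical hash and combiner implementations, which is the property the paragraph above requires.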
GOTCHA — the two abort-suppression branches in
ShouldTpuCompileOpIgnoreCancellation are explicitly test-only (cited bug refs b/79359718, b/72471718) and both log a warning before returning true. In production neither branch fires (TpuCompilationCancellationTerminatesProcess() is true and the xla_jf_exit_process_on_compilation_success option is not "true"), so the function returns false and a cancelled compile aborts the process. A reimplementation that returns true by default to "be safe" inverts the intended fail-fast behavior of the TPU compile op.
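The policy reduces to a two-input predicate, sketched here with the process-level queries replaced by plain booleans. CancellationPolicyInputs and ShouldIgnoreCancellationSketch are hypothetical names, and the warning strings are paraphrases written for this sketch, not the strings found in the binary.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical inputs standing in for the two runtime queries.
struct CancellationPolicyInputs {
  bool cancellation_terminates_process;  // production value per the text: true
  bool exit_on_compilation_success;      // production value per the text: false
};

// Returns true only on the two test-only branches, recording a warning each
// time; otherwise returns false so a cancelled compile fails fast.
bool ShouldIgnoreCancellationSketch(const CancellationPolicyInputs& in,
                                    std::vector<std::string>* warnings) {
  if (!in.cancellation_terminates_process) {
    warnings->push_back("abort suppressed: cancellation does not terminate the process (tests only)");
    return true;
  }
  if (in.exit_on_compilation_success) {
    warnings->push_back("abort suppressed: exit-on-compilation-success option set (tests only)");
    return true;
  }
  return false;
}
```

With the production inputs the predicate is false and silent; any true result is accompanied by a warning, which makes an accidental test configuration visible in the logs.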
Related Components
| Name | Relationship |
|---|---|
| xla::jellyfish::DeepseaCompiler | the real xla::Compiler subclass behind every TpuCompiler_* handle |
| xla::TpuCompiler (host-side shim) | the open-source class whose methods forward into this roster; ctor calls RegisterAllPhases |
| xla::Compiler::Compile / RunHloPasses / RunBackend | the compiler entry points the C-ABI dispatches into (RunHloPasses at vtable +24, RunBackend at +32, Compile via xla::Compiler::Compile) |
| ApiConverter::ToC / FromC | marshals XLA_Shape / XLA_HloModuleConfig across the seam (see overview) |
| stream_executor::tpu::DeserializeProto<…> | unpacks the TpuSerializedProto (ptr+len) blobs the compile entries take |
| deepsea::executor::DeepseaPlatform | the process-wide platform singleton bound into CompileOptions.device_allocator |
| WrapperDeviceMemoryAllocator | the DeviceMemoryAllocator subclass (vtable off_21616EF8) the compile entries wrap around an SE_StreamExecutor* |
| tensorflow::tpu::TpuCompileOpKernelCommon | the TF op core that TpuCompile_CompileAndBuild drives |
| XLA_TpuProgram | the serialized-program handle CompileAndBuild emits and RunBackend does not |
Cross-References
- The TfTpu C-API Shim: the ABI seam, the *ApiFn() accessor pattern, and the opaque-handle / ApiConverter convention this roster relies on
- TpuProgram Roster: the XLA_TpuProgram serialized-program handle that TpuCompile_CompileAndBuild produces
- TpuExecutable Roster: the boxed xla::Executable* handle that TpuCompiler_RunBackend / Compile return
- HLO Pass Registry: the concrete pass schedule that RunHloPasses executes inside the DeepseaCompiler vtable
- PJRT PhaseCompile Extension: the PJRT extension that wraps xla::TpuCompiler::RegisterAllPhases and drives compilation phase-by-phase