HAL Factory Override Matrix

Addresses apply to libtpu.so from the libtpu-0.0.40-cp314 wheel. Other versions differ.

Abstract

The TPU HAL factory framework dispatches in two stages, both keyed on TpuVersion. The first stage selects a factory object from the registry; the second stage runs that factory's CreateImpl to build the per-family impl object, which then carries its own virtual-method specialization. This page tabulates the override matrix at both layers: a five-slot factory vtable (uniform shape, two family-specific slots) and a twenty-three-slot impl vtable (mostly inherited, seven or eight overrides per family once the two shared intermediate-class slots are counted). Both matrices are small enough to show in full; the interesting axes are factory class (Jxc / Pxc / Vxc) and vtable slot.

The framework is a textbook abstract factory with a template-method twist. TpuHalFactory is the pure-virtual interface; TpuHalHardwareFactoryBase is the concrete intermediate that implements Create and CanCreate once and leaves CreateImpl as the single per-family hook. Because the base does all the orchestration, the leaf factories override only their deleting destructor and CreateImpl, so three of their five vtable slots point at inherited base code. The real per-generation logic lives one level down, in the impl object the factory allocates, and even there the override set is dominated by destructors, the two slots that are pure virtual in TpuHal (Type, filled once by the intermediate TpuHalHardwareImpl, and CreateAndInitializeChips, filled per family), and a small tail of teardown/configuration slots.

The dispatch dimension that matters for a reimplementer is therefore not "which method does family X override" in isolation, but the chain: TpuVersion → registry → factory instance → CreateImpl → impl class → impl vtable. The factory selection is data (a registry lookup); the impl specialization is C++ virtual dispatch. There is no third internal switch on TpuVersion inside any impl method — VXC serves three codenames through one identical code path.

For reimplementation, the contract is:

  • The factory vtable shape (5 slots) and which two are family-specific.
  • The TpuHal base vtable (23 slots), including the two __cxa_pure_virtual slots that force every concrete impl to override them.
  • The per-family impl override matrix (factory × slot), with override addresses and override counts.
  • The two-stage dispatch mechanism and the absence of any per-codename internal switch.

| Property | Value |
|---|---|
| Factory vtable slots | 5 (D2, D0, Create, CanCreate, CreateImpl) |
| Family-specific factory slots | 2: slot 1 (~Factory D0), slot 4 (CreateImpl) |
| Impl vtable slots | 23 (TpuHal base; pure at slots 2 and 20) |
| Override counts | JXC = 7, PXC = 7, VXC = 8 |
| Stage-1 dispatch | registry lookup TpuHalFactory::Get(v) @ 0x1fbb19c0 |
| Stage-2 dispatch | C++ vtable on the allocated TpuHal*HardwareImpl |
| Per-codename internal switch | none (VXC = single class for v3/v4/v5) |

Dispatch Mechanism

Dispatch is two stages, each keyed on TpuVersion, but realized by two different mechanisms.

Stage 1 — Factory Selection (Data-Driven)

TpuHalFactory::Get(version) (0x1fbb19c0) reads the process-wide registry under a mutex and returns the factory instance registered for (kHardware, version). The registry was populated at load time by the five init modules (see HAL Families). This stage is not a switch and not a vtable cast — it is a table lookup. Six version keys map onto three factory classes:

| Axis | Values | Source |
|---|---|---|
| TpuVersion key | 0, 1, 2, 3, 4, 5 | Register immediate in each init module |
| Factory class | Jxc (0,1), Pxc (2), Vxc (3,4,5) | make_unique<TpuHal*HardwareFactory> in each Register |
| Factory vtable | 0x215fe530, 0x216085c8, 0x21cabf70 | vtable planted at object+0 in each init module |
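The stage-1 lookup can be pictured with a minimal C++ sketch. It assumes a mutex-guarded map keyed on a (kind, version) pair; the names RegistrySketch, FactorySketch, FactoryKind, and PopulateRegistry are hypothetical illustrations and are not symbols from the binary, and the container the real registry uses is not known from this page.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these names come from libtpu.so.
enum class FactoryKind { kHardware = 0 };

struct FactorySketch {
    std::string family;  // "Jxc", "Pxc", or "Vxc"
    int version;         // models the dword stamped at factory+8
};

class RegistrySketch {
public:
    // Models one Register call made by an init module at load time.
    void Register(FactoryKind kind, int version, const std::string& family) {
        std::lock_guard<std::mutex> lock(mu_);
        table_[std::make_pair(kind, version)] =
            std::make_unique<FactorySketch>(FactorySketch{family, version});
    }

    // Models stage 1: a table lookup under a mutex, with no switch involved.
    const FactorySketch* Get(FactoryKind kind, int version) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = table_.find(std::make_pair(kind, version));
        return it == table_.end() ? nullptr : it->second.get();
    }

private:
    mutable std::mutex mu_;
    std::map<std::pair<FactoryKind, int>, std::unique_ptr<FactorySketch>> table_;
};

// Fills the six version keys with the three factory classes from the table above.
inline void PopulateRegistry(RegistrySketch& r) {
    r.Register(FactoryKind::kHardware, 0, "Jxc");
    r.Register(FactoryKind::kHardware, 1, "Jxc");
    r.Register(FactoryKind::kHardware, 2, "Pxc");
    for (int v = 3; v <= 5; ++v) r.Register(FactoryKind::kHardware, v, "Vxc");
}
```

Looking up an unregistered key returns a null pointer in this sketch; how the real Get reports a miss is not covered here.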

Stage 2 — Impl Specialization (Virtual Dispatch)

The selected factory's Create (inherited from TpuHalHardwareFactoryBase @ 0x1e80f560) calls the factory's own CanCreate (slot 3) to probe device availability, then — on success — calls the factory's own CreateImpl (slot 4). CreateImpl allocates the impl object, runs TpuHal::TpuHal() with the TpuVersion taken from the factory at +8 (the dword stamped at Register time), and plants the per-family impl vtable into the new object's slot 0. From that point on, all per-generation behavior is ordinary C++ virtual dispatch on the impl vtable.

GOTCHA — in the decompile of Create (0x1e80f560) both indirect calls read their vtable from the second argument, which the decompiler types TpuHostWorkQueue*. The disassembly resolves the aliasing: the caller TpuHal::Create (0x1e814180) invokes factory_vtable[2](ret, factory, wq), so inside Create that second argument is the factory pointer, not the work-queue, and the work-queue is the third argument. Both probe (call *0x18(rax) = slot 3) and build (call *0x20(rax) = slot 4) therefore dispatch on the factory's own vtable. A reimplementation that routes CanCreate/CreateImpl through a work-queue method will not match the binary.

function HardwareFactoryBase::Create(ret, factory, wq):   // 0x1e80f560; ret is the StatusOr result slot
    if factory->vtable[3](factory):                       // CanCreate, slot 3 (0x1e80f520)
        return factory->vtable[4](ret, factory, wq)       // CreateImpl, per-family slot 4
    else:
        return NotFound("No " + device_name + " device found.")   // tpu_hal_hardware_factory_base.cc:22

function JxcFactory::CreateImpl(ret, factory, wq):      // 0x0e723ac0
    v   = factory[2]                                     // TpuVersion at factory+8 (stamped at Register)
    obj = operator new(0xD0)                             // 208 B JxcImpl
    TpuHal::TpuHal(obj, v, wq)                           // base ctor — wq is the genuine work-queue arg
    obj[0]  = &JxcImpl_vtable[+0x10]                     // off_215FE590 — plant impl vtable
    obj[25] = 0                                          // helper @ +200 not yet attached
    ret[1] = obj; ret[0] = OK                            // write StatusOr<unique_ptr> result
    return ret

GOTCHA — the TpuVersion the impl ctor receives is read from factory+8 (*((_DWORD*)factory + 2) in the JXC/VXC stubs), not from the work-queue — the decompiler again mistypes the factory pointer as TpuHostWorkQueue*. factory+8 is the dword each init module stamps at Register time (see HAL Families). The work-queue is a separate argument and is forwarded only to the base ctor. PXC does not even read its factory pointer for the version: it hardcodes the literal 2, because the Pxc factory services Pufferfish alone.

The PXC and VXC CreateImpl stubs have the same shape as the JXC stub above; only the allocation size, the planted vtable, and (for PXC) a hardcoded TpuVersion literal differ. PXC allocates 0xD0 (208 B) and plants off_21608628 with version 2; VXC allocates 0xD8 (216 B) and plants off_21CABFD0, additionally zeroing an extra flag byte at +208. (See TpuHal Class Hierarchy for the object layout.)
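The stage-2 flow described above can be sketched in ordinary C++ as a template method. This is a simplified model under stated assumptions: every type name ending in Sketch is hypothetical, the device probe is reduced to a constructor flag, and a null pointer stands in for the NotFound status the binary returns.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct WorkQueueSketch {};  // hypothetical stand-in for TpuHostWorkQueue

struct HalSketch {
    HalSketch(int v, WorkQueueSketch* q) : version(v), wq(q) {}
    virtual ~HalSketch() = default;
    virtual std::string Family() const = 0;
    int version;            // TpuVersion handed to the base ctor
    WorkQueueSketch* wq;    // work queue, forwarded to the base ctor only
};
struct JxcHalSketch : HalSketch { using HalSketch::HalSketch; std::string Family() const override { return "Jxc"; } };
struct PxcHalSketch : HalSketch { using HalSketch::HalSketch; std::string Family() const override { return "Pxc"; } };
struct VxcHalSketch : HalSketch { using HalSketch::HalSketch; std::string Family() const override { return "Vxc"; } };

class FactoryBaseSketch {
public:
    FactoryBaseSketch(int version, bool device_present)
        : version_(version), device_present_(device_present) {}
    virtual ~FactoryBaseSketch() = default;

    // Implemented once in the base: probe first, then call the per-family hook.
    std::unique_ptr<HalSketch> Create(WorkQueueSketch* wq) const {
        if (!CanCreate()) return nullptr;  // the binary returns a NotFound status here
        return CreateImpl(wq);
    }
    virtual bool CanCreate() const { return device_present_; }  // assumed probe

protected:
    virtual std::unique_ptr<HalSketch> CreateImpl(WorkQueueSketch* wq) const = 0;
    int version_;  // models the dword stamped at factory+8

private:
    bool device_present_;
};

struct JxcFactorySketch : FactoryBaseSketch {
    using FactoryBaseSketch::FactoryBaseSketch;
    std::unique_ptr<HalSketch> CreateImpl(WorkQueueSketch* wq) const override {
        return std::make_unique<JxcHalSketch>(version_, wq);  // version from the factory
    }
};
struct PxcFactorySketch : FactoryBaseSketch {
    using FactoryBaseSketch::FactoryBaseSketch;
    std::unique_ptr<HalSketch> CreateImpl(WorkQueueSketch* wq) const override {
        return std::make_unique<PxcHalSketch>(2, wq);  // hardcoded literal, as in the PXC stub
    }
};
struct VxcFactorySketch : FactoryBaseSketch {
    using FactoryBaseSketch::FactoryBaseSketch;
    std::unique_ptr<HalSketch> CreateImpl(WorkQueueSketch* wq) const override {
        return std::make_unique<VxcHalSketch>(version_, wq);  // one class for versions 3, 4, 5
    }
};
```

The point of the sketch is the division of labor: Create never changes per family, and the only per-family code is the allocation inside CreateImpl.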

QUIRK — there is no third dispatch. VXC serves Viperfish, Ghostlite, and 6acc60406 (versions 3/4/5) through one identical TpuHalVxcHardwareImpl and one identical CreateImpl. No VXC method contains a switch (TpuVersion). Per-codename differentiation is pushed entirely out of the HAL into the TpuChipParts proto loader and the TpuCodec / CycleTable factories, all of which key on the same TpuVersion.


Factory Vtable Override Matrix

All three leaf factories share one vtable shape: five function-pointer slots. Slots 0, 2, and 3 point at inherited base code (the complete-object destructor, Create, CanCreate); only slot 1 (the deleting destructor) and slot 4 (CreateImpl) carry family-specific code. The deleting destructors are trivially free(this) in every family, because the 16-byte factory (vtable pointer plus the TpuVersion dword at +8) owns nothing that needs releasing.

| Slot | Method | JXC | PXC | VXC |
|---|---|---|---|---|
| 0 | ~Factory() D2 (complete-obj dtor) | inherited 0x0e723a80 | inherited 0x0e723a80 | inherited 0x0e723a80 |
| 1 | ~Factory() D0 (deleting dtor) | override 0x0e723aa0 | override 0x0e7f8260 | override 0x1d110e80 |
| 2 | Create(TpuHostWorkQueue*) const | inherited 0x1e80f560 | inherited 0x1e80f560 | inherited 0x1e80f560 |
| 3 | CanCreate() const | inherited 0x1e80f520 | inherited 0x1e80f520 | inherited 0x1e80f520 |
| 4 | CreateImpl(TpuHostWorkQueue*) const | override 0x0e723ac0 | override 0x0e7f8280 | override 0x1d110e00 |

Each leaf factory's D0 destructor (slot 1) decompiles to free(this) — verified for all three at the addresses above. The override addresses for slot 4 are the CreateImpl stubs whose bodies are shown in the dispatch section.
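As a cross-check on the factory matrix, the slot targets can be written down as data and compared mechanically. The sketch below copies the addresses from the table; the helper names SlotOffset and FamilySpecificSlots are hypothetical, and the 8-byte slot stride is an assumption consistent with the call *0x18(rax) and call *0x20(rax) offsets quoted for slots 3 and 4.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kFactorySlots = 5;
using FactoryVtable = std::array<std::uint64_t, kFactorySlots>;

// Slot targets copied from the factory vtable matrix (slots 0..4).
constexpr FactoryVtable kJxc = {0x0e723a80, 0x0e723aa0, 0x1e80f560, 0x1e80f520, 0x0e723ac0};
constexpr FactoryVtable kPxc = {0x0e723a80, 0x0e7f8260, 0x1e80f560, 0x1e80f520, 0x0e7f8280};
constexpr FactoryVtable kVxc = {0x0e723a80, 0x1d110e80, 0x1e80f560, 0x1e80f520, 0x1d110e00};

// Byte offset of a slot from the vtable address point, assuming 8-byte pointers.
constexpr std::uint64_t SlotOffset(int slot) {
    return static_cast<std::uint64_t>(slot) * 8;
}

// Returns the slots whose target differs between at least two families.
inline std::vector<int> FamilySpecificSlots() {
    std::vector<int> out;
    for (int s = 0; s < kFactorySlots; ++s) {
        if (kJxc[s] != kPxc[s] || kPxc[s] != kVxc[s]) out.push_back(s);
    }
    return out;
}
```

With the table's addresses, the comparison isolates slots 1 and 4, matching the claim that only the deleting destructor and CreateImpl are family-specific.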


Impl Vtable Override Matrix

The TpuHal abstract base declares 23 virtual slots. Two are __cxa_pure_virtual and must be overridden in any instantiable subclass: slot 2 (Type) and slot 20 (CreateAndInitializeChips). The intermediate TpuHalHardwareImpl supplies two slots for all families: it fills the pure Type at slot 2 and replaces the base ValidateTopology at slot 19 with a stricter version. The three per-family impls inherit those and add their destructors, the mandatory CreateAndInitializeChips, and a small set of teardown/configuration slots.

The matrix below shows every slot. Cells reading "inherited" point at the TpuHal base implementation in the Base (TpuHal) column; family cells that carry an address are overrides, and "mid-base" marks the two slots supplied by the shared intermediate class.

| Slot | Method | Base (TpuHal) | JXC | PXC | VXC |
|---|---|---|---|---|---|
| 0 | ~Impl() D2 | base dtor | 0x0e724de0 | 0x0e7f8a40 | 0x1d111740 |
| 1 | ~Impl() D0 | base dtor | 0x0e724e40 | 0x0e7f8ac0 | 0x1d1117a0 |
| 2 | Type() const | __cxa_pure_virtual | mid-base 0x1d3b5480 | mid-base 0x1d3b5480 | mid-base 0x1d3b5480 |
| 3 | Initialize(TpuHalOptions const&) | 0x1e8132a0 | inherited | inherited | inherited |
| 4 | TearDown() | 0x1e813440 | inherited | inherited | inherited |
| 5 | topology() const | 0x1e8140a0 | inherited | inherited | inherited |
| 6 | host_location() const | 0x1e814100 | inherited | inherited | inherited |
| 7 | hal_location() const | 0x1e814160 | inherited | inherited | inherited |
| 8 | GetConfiguredProperties() const | 0x0e724ea0 | inherited | 0x0e7f82e0 | 0x1d110ea0 |
| 9 | GetChip(int) | 0x1e811e80 | inherited | inherited | inherited |
| 10 | GetChip(TpuChipLocation const&) | 0x1e811e40 | inherited | inherited | inherited |
| 11 | AllocatePremapped(unsigned long) | 0x1e8143a0 | inherited | inherited | inherited |
| 12 | DeallocatePremapped(void*) | 0x1e8143c0 | inherited | inherited | inherited |
| 13 | PremappedAllocatorStats() const | 0x1e8143e0 | inherited | inherited | inherited |
| 14 | GetPremappedAlignment() const | 0x1e814420 | inherited | inherited | inherited |
| 15 | Throttle(TpuChipLocation const&) | 0x1e814440 | inherited | inherited | inherited |
| 16 | Unthrottle(TpuChipLocation const&) | 0x1e814460 | inherited | inherited | inherited |
| 17 | GetThrottleState(TpuChipLocation const&) | 0x1e814480 | inherited | inherited | inherited |
| 18 | WaitForCoreDumpComplete() | 0x213d7760 | inherited | inherited | 0x1d110f00 |
| 19 | ValidateTopology() | 0x1e8139c0 | mid-base 0x1d3b54a0 | mid-base 0x1d3b54a0 | mid-base 0x1d3b54a0 |
| 20 | CreateAndInitializeChips(TpuHalOptions const&) | __cxa_pure_virtual | 0x0e723c20 | 0x0e7f8300 | 0x1d110f20 |
| 21 | PreTearDownChips() | 0x1d3b5a20 (no-op) | 0x0e724da0 | 0x0e7f8a20 | 0x1d111720 |
| 22 | PostTearDownChips() | 0x0e7f8b40 (no-op) | 0x0e724dc0 | inherited | inherited |

Override counts, measured against the TpuHal base and therefore including the shared mid-base slots 2 and 19: JXC = 7 (slots 0,1,2,19,20,21,22), PXC = 7 (slots 0,1,2,8,19,20,21), VXC = 8 (slots 0,1,2,8,18,19,20,21).

Slots 2 and 19 are shown as "mid-base" because all three families share the same implementation, inherited from the intermediate TpuHalHardwareImpl rather than from a per-family body: TpuHalHardwareImpl::Type (0x1d3b5480) and TpuHalHardwareImpl::ValidateTopology (0x1d3b54a0). Type returns the constant 0 (the kHardware product-type tag); ValidateTopology is the stricter version that scans hardware devices, compares the detected TpuVersion and TpuChipParts::variant_name against the topology, and emits a "Detected hardware version ... does not match" diagnostic. These two are the entire raison d'être of the intermediate class.
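The three-level arrangement (abstract base, shared intermediate, per-family leaf) can be illustrated with a small C++ hierarchy. This is a sketch only: the class names ending in Sketch, the return types, and the two version fields are assumptions, and the stricter ValidateTopology is reduced to a single version comparison rather than the device scan and variant_name check described above.

```cpp
#include <cassert>
#include <type_traits>

struct HalBaseSketch {
    virtual ~HalBaseSketch() = default;
    virtual int Type() const = 0;                     // pure, like slot 2
    virtual bool ValidateTopology() { return true; }  // base default, like slot 19
    virtual bool CreateAndInitializeChips() = 0;      // pure, like slot 20
};

// Intermediate class: fills Type and tightens ValidateTopology for all families.
struct HardwareImplSketch : HalBaseSketch {
    int Type() const override { return 0; }  // constant 0, the kHardware tag
    bool ValidateTopology() override {
        // Simplified stand-in for the detected-versus-topology comparison.
        return detected_version == topology_version;
    }
    int detected_version = 0;   // hypothetical field
    int topology_version = 0;   // hypothetical field
};

// Leaf class: supplies the one remaining pure slot.
struct JxcImplSketch : HardwareImplSketch {
    bool CreateAndInitializeChips() override { return true; }
};

// The intermediate still has a pure slot left, so only the leaf is instantiable.
static_assert(std::is_abstract<HalBaseSketch>::value, "base is abstract");
static_assert(std::is_abstract<HardwareImplSketch>::value, "intermediate is abstract");
static_assert(!std::is_abstract<JxcImplSketch>::value, "leaf is concrete");
```

The static assertions make the structural claim checkable at compile time: filling Type in the intermediate is not enough to make it instantiable, because CreateAndInitializeChips remains pure until a family class provides it.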

Where the Three Families Actually Differ

Reading the matrix by override delta isolates each family's specialization:

  • All three override destructors (slots 0,1), the mandatory CreateAndInitializeChips (slot 20), and PreTearDownChips (slot 21 → {Jxc,Pxc,Vxc}CommonHelper::TearDownMesh). These are the per-family core: the chip-creation constraint checks and the mesh-teardown path.
  • PXC and VXC override GetConfiguredProperties (slot 8) — both delegate to their family CommonHelper. JXC inherits the base default (GetDefaultConfiguredProperties(topology)).
  • VXC alone overrides WaitForCoreDumpComplete (slot 18 → TpuHalVxcCommonHelper::WaitForCoreDumpComplete). JXC and PXC inherit the generic per-chip wait loop. This matches VXC's broader fabric-attached core-dump machinery (visible also in its TpuChip* override set).
  • JXC alone overrides PostTearDownChips (slot 22), even though its body returns 1 identically to the base no-op — the source file places the override explicitly.
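The override deltas listed above can be reproduced with plain set arithmetic. The sketch below encodes each family's override set as a 23-bit bitset, using the slot lists from the override counts; the helper names Slots, CommonToAll, and OnlyIn are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

using SlotSet = std::bitset<23>;  // one bit per impl vtable slot

// Builds a slot set from a list of slot indices.
inline SlotSet Slots(std::initializer_list<int> slots) {
    SlotSet s;
    for (int i : slots) s.set(static_cast<std::size_t>(i));
    return s;
}

// Override sets as listed in the override counts for the impl matrix.
inline SlotSet JxcOverrides() { return Slots({0, 1, 2, 19, 20, 21, 22}); }
inline SlotSet PxcOverrides() { return Slots({0, 1, 2, 8, 19, 20, 21}); }
inline SlotSet VxcOverrides() { return Slots({0, 1, 2, 8, 18, 19, 20, 21}); }

// Slots overridden by every family.
inline SlotSet CommonToAll() {
    return JxcOverrides() & PxcOverrides() & VxcOverrides();
}

// Slots overridden by family a but by neither b nor c.
inline SlotSet OnlyIn(const SlotSet& a, const SlotSet& b, const SlotSet& c) {
    return a & ~b & ~c;
}
```

Under these inputs the common core is slots 0, 1, 2, 19, 20, 21; slot 22 is unique to JXC, slot 18 is unique to VXC, and PXC has no slot of its own because slot 8 is shared with VXC.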

NOTE — the override bodies for slots 20/21 carry the family's hardcoded core-count and HBM constraints and its driver wiring. JxcImpl::CreateAndInitializeChips (0x0e723c20) caps the core counts with three run-time-assembled diagnostics — the TensorCore limit (prefix "Jellyfish Hardware only supports at most " + count + " TensorCore."), the BarnaCore limit (same prefix + count + " Barnacore."), and the HBM limit (the standalone literal "TPU platform only supports up to two HBMs.") — then drives the jxc deepsea DriverFactory. PxcImpl and VxcImpl carry the analogous Pufferfish / Viperfish messages and their CommonHelper::CreateChips paths. The constraint detail belongs to the per-family pages; the slot ownership is what this matrix fixes.

VxcImpl::PreTearDownChips (0x1d111720) is also where the 216-byte VXC object's extra +208 flag byte is read: if set it returns 1 (mesh already torn down), otherwise it calls TpuHalVxcCommonHelper::TearDownMesh(this[25]). JXC and PXC have no such guard byte.
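The guard can be sketched as follows, assuming a 64-bit target. VxcObjectSketch models only the two fields this page names (the helper pointer at +200 and the flag byte at +208) and treats everything before +200 as opaque bytes; TearDownMeshFn and its int return type are hypothetical, since the real TearDownMesh signature is not given here.

```cpp
#include <cassert>
#include <cstddef>

// Layout model of the 216-byte VXC object; only +200 and +208 are named.
struct VxcObjectSketch {
    unsigned char base[200];  // opaque: vtable pointer and TpuHal base state
    void* helper;             // +200, the this[25] helper pointer
    bool mesh_torn_down;      // +208, the extra VXC flag byte
};

// These hold on typical 64-bit ABIs (8-byte pointers, 8-byte alignment).
static_assert(offsetof(VxcObjectSketch, helper) == 200, "helper at +200");
static_assert(offsetof(VxcObjectSketch, mesh_torn_down) == 208, "flag at +208");
static_assert(sizeof(VxcObjectSketch) == 0xD8, "216-byte object");

// Hypothetical signature standing in for TpuHalVxcCommonHelper::TearDownMesh.
using TearDownMeshFn = int (*)(void* helper);

// Returns 1 without touching the helper when the flag is already set;
// otherwise forwards to the teardown routine and returns its result.
inline int PreTearDownChipsSketch(const VxcObjectSketch& obj,
                                  TearDownMeshFn tear_down_mesh) {
    if (obj.mesh_torn_down) return 1;
    return tear_down_mesh(obj.helper);
}
```

Whether the real method sets the flag itself after a teardown is not stated on this page, so the sketch leaves the flag untouched.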


Cross-References

  • HAL Families — the registry that drives stage-1 factory selection and the five init modules that populate it
  • TpuHal Class Hierarchy — the vtable layouts, slot counts, and object sizes this matrix indexes into
  • 6-Codename Authoritative Reconciliation — the TpuVersion keys both dispatch stages switch on
  • JXC Family — the slot-20/21/22 override bodies for Jellyfish and Dragonfish
  • PXC Family — the slot-8/20/21 override bodies for Pufferfish
  • VXC Family — the slot-8/18/20/21 override bodies for Viperfish
  • GXC Family — Ghostlite and 6acc60406, dispatched onto the same VXC impl vtable