JXC Family (Jellyfish, Dragonfish)
Addresses apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id
89edbbe81c5b328a958fe628a9f2207d). Other versions differ.
Abstract
JXC is the oldest of the four TPU HAL families libtpu still carries. One C++ class — tpu::(anonymous namespace)::TpuHalJxcHardwareFactory — serves two silicon generations: Jellyfish (TpuVersion::kJellyfish = 0) and Dragonfish (TpuVersion::kDragonfish = 1). The factory is a thin 16-byte object whose only family-specific behaviour is allocating the right HAL implementation; everything that distinguishes Jellyfish from Dragonfish is data-driven through the embedded TpuChipParts proto, not the C++ type.
The defining architectural trait of JXC, and the reason it anchors the sub-core taxonomy, is that it has no fetch/load-core split. Where every later family (PXC, VXC, GXC) divides each core's instruction stream into a fetch-core and a load-core sub-namespace, JXC's driver layer is a single fused dataflow. Its driver sub-namespaces under asic_sw::driver::deepsea::jxc:: are organized by engine block (dfc, jfc, registers, snap, the *_trace_entry set) rather than by fetch/load role. The compiler-side ISA for both generations lives in platforms_deepsea::jellyfish::isa, not in any jxc::isa namespace.
This page follows the same grammar as the PXC, VXC, and GXC family pages: factory binding, the construction path, the sub-namespace roster, and the per-codename differentiation. For the abstract base chain (TpuHalFactory → TpuHalHardwareFactoryBase → leaf) shared by all four families, see HAL Families.
For reimplementation, the contract is:
- The single-class / two-version registration model: one vtable, two 16-byte instances keyed by TpuVersion at object offset +8.
- The 5-slot factory vtable and the CreateImpl allocator that yields a 208-byte TpuHalJxcHardwareImpl.
- The construction chain HardwareImpl → CommonHelper → Chip → Core, and the JfDmaIssuer DMA engine threaded into each core.
- The driver sub-namespace roster, and the explicit fact that there is no fetch/load split and no jxc::isa.
| Property | Value |
|---|---|
| Factory class | tpu::(anonymous namespace)::TpuHalJxcHardwareFactory (anon-ns) |
| TpuVersions served | 2 — kJellyfish (0), kDragonfish (1) |
| Factory vtable / vptr | _ZTV 0x215fe530 / installed vptr 0x215fe540 |
| Factory typeinfo | _ZTI 0x215fe568 (__si base → TpuHalHardwareFactoryBase 0x21d343f8) |
| HAL impl class / size | TpuHalJxcHardwareImpl, 208 B (0xD0), vtable 0x215fe580 |
| Init module | google_init_module_tpu_hal_jxc_hardware_impl @ 0x213e9d80 (2× Register) |
| Fetch/load split | None — fused dataflow (defining JXC trait) |
| DMA engine | JfDmaIssuer (separate per-core object) |
Factory Binding and Registration
Purpose
The factory selects, at dlopen time, which HAL implementation a given TpuVersion will instantiate. JXC is the only family whose factory class is registered for more than one version, because Jellyfish and Dragonfish are architecturally close enough to share one HAL implementation. (The xla_* flag prefixes confirm the kinship: there is no xla_df_ prefix at all — Dragonfish reuses Jellyfish's xla_jf_ flags entirely.)
Entry Point
google_init_module_tpu_hal_jxc_hardware_impl (0x213e9d80)
├─ operator new(0x10) ── factory instance #1 (16 B)
│    f0[+8] = 0 (kJellyfish) ; f0[+0] = vptr 0x215fe540
│    TpuHalFactory::Register(kHardware, 0, f0) (0x1fbb16a0)
└─ operator new(0x10) ── factory instance #2 (16 B)
     f1[+8] = 1 (kDragonfish) ; f1[+0] = vptr 0x215fe540 (SAME vtable)
     TpuHalFactory::Register(kHardware, 1, f1)
Algorithm
function google_init_module_tpu_hal_jxc_hardware_impl(): // 0x213e9d80
    // Two registry entries, one shared vtable. The instances differ
    // ONLY in the TpuVersion stored at +8.
    for version in {0 /*kJellyfish*/, 1 /*kDragonfish*/}:
        f = operator_new(0x10) // 16-byte factory object
        f[+8] = version // u32 TpuVersion key
        f[+0] = &JxcFactory_vtable + 0x10 // installed vptr 0x215fe540
        s = TpuHalFactory::Register(kHardware /*0*/, version, unique_ptr(f))
        CHECK(s == OK) // fail strings 0x94a3e44 (kJellyfish), 0x94a4057 (kDragonfish)
QUIRK: there is no class-level Jellyfish-vs-Dragonfish dispatch. Both 16-byte instances point at the identical vtable 0x215fe540 and differ only by the TpuVersion u32 at +8. The two CHECK strings byte-confirm std::make_unique<TpuHalJxcHardwareFactory>(...kJellyfish) and (...kDragonfish); that is, the JXC factory constructor does take a TpuVersion argument, unlike PXC's argument-less ctor. Per-codename behaviour (core counts, MXU shape, HBM cap) is resolved later, from TpuChipParts.
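The registration model above can be sketched in a few lines. This is a minimal Python model under stated assumptions, not the libtpu code: the names JxcFactory, _REGISTRY, register, get, and init_module are illustrative stand-ins, and the duplicate-key refusal is an assumption (the page only establishes that Register returns a status that is CHECKed). Only the platform key 0 (kHardware), the version keys 0 and 1, and the "one class, two instances" shape come from the disassembly.

```python
import threading
from dataclasses import dataclass

K_HARDWARE = 0                      # platform key passed to Register
K_JELLYFISH, K_DRAGONFISH = 0, 1    # TpuVersion values from this page


@dataclass
class JxcFactory:
    """Stand-in for the 16-byte factory: a vptr plus the u32 version at +8."""
    version: int


_REGISTRY = {}                      # hypothetical: (platform, version) -> factory
_REGISTRY_LOCK = threading.Lock()   # Get is described as a lookup under a mutex


def register(platform, version, factory):
    """Insert a factory; refusing duplicates is an assumption of this sketch."""
    with _REGISTRY_LOCK:
        key = (platform, version)
        if key in _REGISTRY:
            return False
        _REGISTRY[key] = factory
        return True


def get(platform, version):
    """Look up the factory for a (platform, version) pair, or None."""
    with _REGISTRY_LOCK:
        return _REGISTRY.get((platform, version))


def init_module():
    """Mirror of the init module: two registrations, one class."""
    for version in (K_JELLYFISH, K_DRAGONFISH):
        ok = register(K_HARDWARE, version, JxcFactory(version))
        assert ok, f"registration failed for version {version}"


init_module()
```

After init_module() runs, get(K_HARDWARE, 0) and get(K_HARDWARE, 1) return two distinct objects of the same class, which is the property the QUIRK describes.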
Function Map
| Function | Address | Role |
|---|---|---|
| google_init_module_tpu_hal_jxc_hardware_impl | 0x213e9d80 | 2× Register (v0, v1) |
| TpuHalFactory::Register | 0x1fbb16a0 | registry insert [platform][version] |
| TpuHalFactory::Get | 0x1fbb19c0 | runtime registry lookup under mutex |
| TpuHal::Create | 0x1e814180 | public entry: Get → Create → bind profiler |
The Factory vtable
Purpose
The factory exposes the abstract 5-slot TpuHalFactory interface. JXC overrides only two slots; the other three are literally the same function addresses PXC and VXC point at — the base Create/CanCreate are shared, not copied per family.
Vtable Layout
| vaddr | slot | resolves to | base/override |
|---|---|---|---|
| 0x215fe540 | 0 — ~TpuHalFactory() D2 | 0x0e723a80 (ret) | INHERITED |
| 0x215fe548 | 1 — ~TpuHalJxcHardwareFactory() D0 | 0x0e723aa0 | OVERRIDE |
| 0x215fe550 | 2 — HardwareFactoryBase::Create(wq) | 0x1e80f560 | INHERITED |
| 0x215fe558 | 3 — HardwareFactoryBase::CanCreate() | 0x1e80f520 | INHERITED |
| 0x215fe560 | 4 — TpuHalJxcHardwareFactory::CreateImpl(wq) | 0x0e723ac0 | OVERRIDE |
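The vaddr column can be recomputed from the _ZTV address alone. Under the Itanium C++ ABI a vtable symbol begins with two header words (offset-to-top and the RTTI pointer) ahead of the first virtual slot, which is why the installed vptr is the symbol address plus 0x10. The sketch below is only a consistency check on the addresses quoted on this page; the helper name slot_addresses is illustrative, and the 8-byte slot size assumes a 64-bit target.

```python
ZTV_FACTORY = 0x215FE530   # _ZTV symbol address quoted on this page
VTABLE_HEADER = 0x10       # offset-to-top + RTTI pointer, 8 bytes each
SLOT_SIZE = 8              # one function pointer per slot (64-bit assumption)
SLOT_COUNT = 5


def slot_addresses(ztv, count):
    """Return (vptr, [address of each slot]) for a vtable symbol."""
    vptr = ztv + VTABLE_HEADER
    return vptr, [vptr + SLOT_SIZE * i for i in range(count)]


vptr, slots = slot_addresses(ZTV_FACTORY, SLOT_COUNT)
for index, address in enumerate(slots):
    print(f"slot {index}: {address:#010x}")
```

The first address past slot 4 is 0x215fe568, which coincides with the _ZTI address given in the summary table, so the five-slot count and the typeinfo placement agree with each other.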
Slot 1 (the deleting destructor) is tpu::(anonymous namespace)::TpuHalJxcHardwareFactory::~TpuHalJxcHardwareFactory (0x0e723aa0), a bare free(this) after operator delete(this, 0x10) is inlined — proof the factory object is 16 bytes. Slot 2 is the GoF template method: Create calls slot 3 (CanCreate) then slot 4 (CreateImpl), else builds a NotFoundError. Slot 3 reads the factory's stored TpuVersion at +8 and matches it against ScanHardwareDevices (0x1fba53c0).
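The slot 2 / slot 3 / slot 4 relationship is an ordinary template method, which a short Python sketch makes concrete. Everything here is a model rather than recovered code: the class and method names are illustrative, and the detected_versions parameter stands in for whatever ScanHardwareDevices reports (in the binary CanCreate takes no such argument and performs the scan itself).

```python
from dataclasses import dataclass


class NotFoundError(Exception):
    """Stand-in for the NotFoundError status the base Create builds."""


@dataclass
class FactoryBase:
    version: int                       # the key stored at +8

    def create(self, work_queue, detected_versions):
        """Template method: gate on can_create, then defer to create_impl."""
        if not self.can_create(detected_versions):
            raise NotFoundError(f"no hardware for version {self.version}")
        return self.create_impl(work_queue)

    def can_create(self, detected_versions):
        """Shared check: is this factory's version among the scanned devices?"""
        return self.version in detected_versions

    def create_impl(self, work_queue):
        raise NotImplementedError      # the only family-specific hook


class JxcFactory(FactoryBase):
    def create_impl(self, work_queue):
        # Placeholder result; the real slot allocates a TpuHalJxcHardwareImpl.
        return {"impl": "TpuHalJxcHardwareImpl", "wq": work_queue}
```

Because create and can_create live in the base, a family only has to supply create_impl, which matches the table: two inherited slots do the gating and one overridden slot does the allocation.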
Algorithm — CreateImpl
function TpuHalJxcHardwareFactory::CreateImpl(out, this, wq): // 0x0e723ac0
    version = *(u32*)(wq + 8) // TpuVersion read from the work-queue arg, *((_DWORD*)wq+2)
    obj = operator_new(0xD0) // 208 B = TpuHalJxcHardwareImpl
    TpuHal::TpuHal(obj, version, wq) // base ctor stores wq + version into the TpuHal sub-object
    *(void**)(obj + 0) = &JxcImpl_vtable + 0x10 // plant 0x215fe580 → vptr 0x215fe590
    *(void**)(obj + 0xC8) = nullptr // CommonHelper slot (qword index 25), null until CreateAndInitializeChips
    out.value = obj ; out.status = OK
    return out
NOTE: the JXC CreateImpl reads the TpuVersion it passes to the TpuHal base ctor from the work-queue argument at +8 (*((_DWORD*)wq + 2)), not from the factory's own stored key at this+8. This is the same mechanism VXC uses (*((unsigned int*)wq + 2)): the two version-bearing families both source the version from the work queue. Only PXC, which services a single generation, hardcodes the literal 2 into the ctor call. (The factory's own TpuVersion at this+8, stamped by the init module, is what CanCreate matches against ScanHardwareDevices; it is not what CreateImpl forwards.) The 208-byte impl and the helper slot at qword index 25 (+0xC8) are common to JXC and PXC; VXC's impl is 216 bytes (operator new(0xD8), an extra flag byte at +0xD0). See the HAL Factory Override Matrix for the full impl override table.
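The offsets in the CreateImpl pseudocode can be exercised against a byte buffer. The following is a rough sketch, assuming a little-endian 64-bit layout; the work-queue contents other than the u32 at +8 are made up for illustration, and the stores performed by the TpuHal base constructor are not modeled.

```python
import struct

IMPL_SIZE = 0xD0            # 208-byte TpuHalJxcHardwareImpl
IMPL_VPTR = 0x215FE590      # impl vtable 0x215fe580 + 0x10
HELPER_OFFSET = 0xC8        # qword index 25


def read_version(work_queue):
    """Read the u32 at byte offset 8, i.e. *((_DWORD*)wq + 2)."""
    (version,) = struct.unpack_from("<I", work_queue, 8)
    return version


def create_impl(work_queue):
    """Build a zeroed 208-byte object, plant the vptr, null the helper slot."""
    version = read_version(work_queue)
    obj = bytearray(IMPL_SIZE)
    struct.pack_into("<Q", obj, 0, IMPL_VPTR)
    struct.pack_into("<Q", obj, HELPER_OFFSET, 0)   # explicit, mirrors the store
    return obj, version


# Hypothetical work queue: 8 opaque bytes, then the version, then padding.
fake_wq = struct.pack("<QI", 0, 1) + bytes(4)
impl, forwarded_version = create_impl(fake_wq)
```

Note that the version comes out of the work-queue buffer, not out of any factory object, which is the point the NOTE makes about JXC and VXC.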
Construction Chain Below the Factory
Purpose
The factory returns only the HAL object. The chip/core graph is built lazily in TpuHalJxcHardwareImpl::CreateAndInitializeChips (impl vtable slot 20, @ 0x0e723c20), invoked during Initialize.
Entry Point
TpuHalJxcHardwareImpl::CreateAndInitializeChips (0x0e723c20)
├─ TpuChipParts::CoreCount / SharedMemoryCount ── data-driven constraints
├─ tpu::CreateFishTopology (0x1fc57c60) ── Jellyfish-class BarnaCore mesh
├─ jxc::DriverFactory::Create (0x0e778a40) ── per-device jxc::DriverInterface
├─ TpuHalJxcCommonHelper (24 B, 0x0e725820) ── stored at impl+0xC8
└─ helper->CreateChips (0x0e726a40)
     └─ TpuChipJxcDriverImpl (432 B, 0x0e727f80)
          └─ core-factory lambda
               └─ TpuCoreJxcDriverImpl (824 B, 0x0e733760) ── takes a JfDmaIssuer*
Considerations
JXC's construction path is the heaviest of the four families (a large CreateAndInitializeChips body — the function spans the bulk of tpu_hal_jxc_hardware_impl.cc, source lines 69–142 visible in its fatal-log site annotations). It is the only family that builds its driver set through a standalone jxc::DriverFactory::Create per device and a CreateFishTopology mesh builder before handing the result to TpuHalJxcCommonHelper::InitializeDrivers; later families fold the per-device driver construction directly into their helper without a separate factory or fish-topology builder. JXC enforces three hardcoded ceilings during this path, each as a distinct fatal/error string: at most 2 TensorCores ("Jellyfish Hardware only supports at most 2 TensorCore.", CoreCount(...,0) >= 3 guard, line 71), at most 2 BarnaCores ("… 2 Barnacore.", CoreCount(...,1) >= 3, line 78), and an HBM cap of 2 ("TPU platform only supports up to two HBMs.", SharedMemoryCount(...,0) >= 3, line 83). All three "two" limits are JXC-specific literals baked into the message text, whereas PXC/VXC runtime-format their caps.
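The three ceilings reduce to the same "count >= 3" guard applied to three counts. A minimal sketch, assuming the counts have already been read from TpuChipParts: the function name check_limits and the dictionary keys are illustrative, the BarnaCore message is kept in its elided form as quoted above, and the sketch collects every violated limit whereas the binary presumably stops at the first failing guard.

```python
MAX_PER_KIND = 2    # each guard fires at a count of 3 or more

MESSAGES = {
    "tensor_core": "Jellyfish Hardware only supports at most 2 TensorCore.",
    "barna_core": "… 2 Barnacore.",   # left elided, as quoted in the text
    "hbm": "TPU platform only supports up to two HBMs.",
}


def check_limits(tensor_cores, barna_cores, hbms):
    """Return the message for every ceiling that the given counts exceed."""
    counts = {"tensor_core": tensor_cores, "barna_core": barna_cores, "hbm": hbms}
    return [MESSAGES[name] for name, n in counts.items() if n >= MAX_PER_KIND + 1]
```

For example, check_limits(2, 2, 2) returns an empty list, while a third TensorCore alone yields only the TensorCore message.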
GOTCHA: the per-core constructor takes a JfDmaIssuer* (the Jellyfish-family DMA engine, ctor @ 0x0e73aea0), a separate object created per core. This is unique to JXC: PXC, VXC, and GXC have no standalone DMA-issuer class; they fold DMA into the per-family driver (TpuPxcDriver/TpuVxcDriver). A reimplementation that assumes a DMA-issuer object for the newer families will find no such symbol.
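The ownership shape of the chain (HardwareImpl → CommonHelper → Chip → Core, with one DMA issuer per core) can be modeled with plain dataclasses. This is an illustrative sketch only: the Python class names are shorthand for the C++ types named in the comments, the chip_count and cores_per_chip parameters are hypothetical stand-ins for values that the real code derives from TpuChipParts, and the topology builder and jxc::DriverFactory steps are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DmaIssuer:            # models JfDmaIssuer: one instance per core
    core_index: int


@dataclass
class Core:                 # models TpuCoreJxcDriverImpl (824 B in the binary)
    index: int
    dma: DmaIssuer


@dataclass
class Chip:                 # models TpuChipJxcDriverImpl (432 B)
    cores: list = field(default_factory=list)


@dataclass
class CommonHelper:         # models TpuHalJxcCommonHelper (24 B)
    chips: list = field(default_factory=list)

    def create_chips(self, chip_count, cores_per_chip):
        for _ in range(chip_count):
            chip = Chip()
            for i in range(cores_per_chip):
                chip.cores.append(Core(index=i, dma=DmaIssuer(core_index=i)))
            self.chips.append(chip)


@dataclass
class HardwareImpl:         # models TpuHalJxcHardwareImpl
    helper: CommonHelper = None     # the +0xC8 slot, null until chips are built

    def create_and_initialize_chips(self, chip_count, cores_per_chip):
        self.helper = CommonHelper()
        self.helper.create_chips(chip_count, cores_per_chip)
```

The helper attribute starts as None and is filled lazily, mirroring the null store at +0xC8 in CreateImpl followed by the later CreateAndInitializeChips call.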
Driver Sub-Namespace Roster
The asic_sw::driver::deepsea::jxc:: namespace is the strongest evidence for the no-split nature of JXC. Its direct sub-namespaces, confirmed in the symbol table, are organized by engine block and trace-entry type, not by fetch/load role:
| Sub-namespace | Role |
|---|---|
| jxc::dfc | dataflow controller engine (1991 symbols) |
| jxc::jfc | Jellyfish core engine (988 symbols) |
| jxc::registers | register-block definitions (330 symbols) |
| jxc::snap | snapshot / checkpoint support (241 symbols) |
| jxc::jellyfish_performance_counters | gen-0 perf counters |
| jxc::dragonfish_performance_counters | gen-1 perf counters |
| jxc::*_trace_entry | profiler trace-entry types (see below) |
The *_trace_entry family includes bcs_internal, brn_fabric_sync, brn_sync_wait, cs_internal, cs_external_sync_flag_update, hbm_mux_switch, hib_request, hib_interrupt, hib_hbm_write, hib_sync_update, ici_packet, and the nf_* set — engine-block event records, not standalone namespaces.
NOTE: there is no standalone jxc::bcs/brn/hbm/hib/ici/isa namespace. Those tokens are prefixes inside trace-entry type names (e.g. bcs_internal_trace_entry, ici_packet_trace_entry). The compiler-side ISA for both generations lives in platforms_deepsea::jellyfish::isa (mangled platforms_deepsea9jellyfish3isa; e.g. BarnaCoreAddressHandlerProgram, ProgramFacade, BundleFacade), reflecting the deepsea umbrella where jellyfish:: is the shared compiler-base namespace for all generations.
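The mangled fragment quoted in the note follows the Itanium C++ ABI rule that each identifier in a nested name is written as its decimal length followed by the identifier itself. A short sketch of that encoding, useful when grepping a symbol table for a namespace path; the helper names are illustrative, and only plain runs of identifiers are handled (no templates, substitutions, or the surrounding _ZN...E wrapper).

```python
def mangle_components(components):
    """Encode each identifier as <decimal length><identifier>, concatenated."""
    return "".join(f"{len(name)}{name}" for name in components)


def demangle_components(encoded):
    """Inverse of mangle_components for a plain run of length-prefixed names."""
    names, pos = [], 0
    while pos < len(encoded):
        end = pos
        while end < len(encoded) and encoded[end].isdigit():
            end += 1
        length = int(encoded[pos:end])       # raises ValueError on bad input
        names.append(encoded[end:end + length])
        pos = end + length
    return names


isa_path = mangle_components(["platforms_deepsea", "jellyfish", "isa"])
```

Here isa_path comes out as 17platforms_deepsea9jellyfish3isa; the fragment as quoted on this page is the same string without the leading length 17, so either form works as a search key.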
QUIRK: JXC has no jxc::profiler::TraceEntry class. The named profiler::TraceEntry event class that the sub-core taxonomy groups exists only for the fetch/load-split families. JXC's profiler support is realized through the per-engine *_trace_entry types instead. JXC is therefore not one of the trace-entry sub-cores despite being a HAL family.
Per-Codename Differentiation
Jellyfish and Dragonfish differ only in data, never in C++ type. Both produce the same TpuHalJxcHardwareImpl, the same TpuChipJxcDriverImpl/TpuCoreJxcDriverImpl, and share one DMA descriptor model (the V1 jxc::DmaDescriptor, 8×32-bit = 32 bytes). Each generation has its own named codec class — TpuCodecJellyfish and TpuCodecDragonfish (both fully named, RTTI-symbol-bearing) — selected by TpuCodec::Create case 0/1, but these are codec objects, not HAL types.
| Axis | Jellyfish (v0) | Dragonfish (v1) | Source |
|---|---|---|---|
| TpuVersion enum | kJellyfish = 0 | kDragonfish = 1 | TpuVersionToString 0x20b3a480 |
| ToString | "jellyfish" | "dragonfish" | rodata pointer table 0x22011bf0 |
| Codec class | TpuCodecJellyfish (named) | TpuCodecDragonfish (named) | symtab; TpuCodec::Create case 0/1 |
| TensorCore / BarnaCore | yes / yes | yes / yes | TpuChipParts (Core.type BARNA_CORE) |
| SparseCore | no | no | TpuChipParts (no SPARSE_CORE Core) |
| Flag prefix | xla_jf_ (417 flags) | xla_jf_ (no xla_df_) | flag scan |
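The table amounts to a lookup keyed by TpuVersion in which only the name and codec columns vary. A minimal sketch of that data-driven split, assuming nothing beyond the values listed above: the describe helper and the dictionary names are illustrative, only two of the six TpuVersion values are modeled, and the DMA descriptor is treated as eight opaque 32-bit words because its field layout is not given here.

```python
import struct
from enum import IntEnum


class TpuVersion(IntEnum):
    kJellyfish = 0
    kDragonfish = 1


TO_STRING = {TpuVersion.kJellyfish: "jellyfish",
             TpuVersion.kDragonfish: "dragonfish"}
CODEC_NAME = {TpuVersion.kJellyfish: "TpuCodecJellyfish",
              TpuVersion.kDragonfish: "TpuCodecDragonfish"}
HAL_IMPL_NAME = "TpuHalJxcHardwareImpl"     # shared by both versions
FLAG_PREFIX = "xla_jf_"                     # shared by both versions

DMA_DESCRIPTOR = struct.Struct("<8I")       # 8 x 32-bit words, fields opaque


def describe(version):
    """Collect the per-codename facts for one TpuVersion value."""
    v = TpuVersion(version)
    return {"name": TO_STRING[v], "codec": CODEC_NAME[v],
            "hal": HAL_IMPL_NAME, "flag_prefix": FLAG_PREFIX}
```

describe(0) and describe(1) differ only in the name and codec entries, and DMA_DESCRIPTOR.size evaluates to 32, matching the 8×32-bit figure in the text.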
Cross-References
- PXC Family: the next generation; first to split fetch/load cores, drops the JfDmaIssuer
- VXC Family: first SparseCore-bearing family; one factory serves three codenames
- GXC Family: Ghostlite + 6acc60406, registered into the shared VXC factory
- Sub-Core Taxonomy: where JXC's fused dataflow sits in the fetch/load-split evolution
- HAL Families: the shared TpuHalFactory base chain and template-method Create
- Codename Matrix: the 6-value TpuVersion enum and HAL routing
- HAL Factory Override Matrix: the per-impl 23-slot override tables