Dragonfish Bundle
Addresses apply to libtpu.so from the libtpu-0.0.40-cp314 wheel. Other versions differ.
Abstract
Dragonfish (TpuVersion::kDragonfish = 1) does not have a bundle layout of its own. Its TensorCore VLIW bundle is the 41-byte (328-bit) Jellyfish bundle, unchanged: the same internal-struct/12-byte-strip mechanism, the same slot-mask dispatch, the same per-slot bit positions, and the same kNeverExecute prefill. The shared layout is not inferred from the matching width; it is structural in the binary. EncoderDf is a C++ subclass of EncoderJf that overrides only three slot writers and defines no EncodeBundleInternal and no BundleSizeBytes, and no DragonfishCodecMetadata class exists, so the bundle assembler, the width constant, and the codec-metadata class all come from Jellyfish. Dragonfish and Jellyfish are paired everywhere the codec is selected: the encoder factory routes versions 0 and 1 to one shared CreateEncoderJfDf, and Dragonfish reuses JellyfishBundleRestrictions rather than carrying its own.
The only Dragonfish-specific behaviour is a set of MXU-validity checks layered on top of the inherited Jellyfish encoders. EncoderDf overrides exactly the three slot writers that name an MXU (EncodeVectorExtendedInstruction, EncodeVectorResultInstruction, and EncodeMiscInstruction). Each override calls the Jellyfish encoder unchanged, then runs CheckMxuValid<T> to reject any op whose mxu_num >= 2. The overrides write no bundle bits: they add a pure legality gate plus an update to an internal encoder-state word at this+12, reflecting Dragonfish's different physical MXU configuration. The wire format is byte-for-byte the Jellyfish 41-byte bundle.
This page is therefore short by design: it documents the shared-codec evidence and the precise Dragonfish delta, and defers the full slot map to the canonical Jellyfish page. For reimplementation, the contract is:
- Dragonfish encodes the same 41-byte bundle as Jellyfish: reuse the Jellyfish 41B Bundle slot map verbatim.
- EncoderDf inherits EncodeBundleInternal, BundleSizeBytes, and JellyfishCodecMetadata; do not implement a separate Dragonfish codec.
- The only Dragonfish-specific code is CheckMxuValid<T> rejecting mxu_num >= 2 in the three MXU-bearing slot encoders: a validity delta, not a layout delta.
| Property | Value |
|---|---|
| Encoder class | EncoderDf (subclass of EncoderJf); ctor 0x1e85e340 calls EncoderJf::EncoderJf(this, 1) |
| Bundle assembler | inherited EncoderJf::EncodeBundleInternal @ 0x1e86c7c0 (no EncoderDf override) |
| Wire width | 41 bytes / 328 bits; inherited JellyfishCodecMetadata::BundleSizeBytes @ 0x1ecf7460 |
| Codec metadata | shares JellyfishCodecMetadata; no DragonfishCodecMetadata symbol exists |
| Encoder factory | CreateEncoderJfDf services versions 0 (Jellyfish) + 1 (Dragonfish) |
| Bundle restrictions | shares JellyfishBundleRestrictions (no Dragonfish-specific class) |
| Dragonfish delta | CheckMxuValid<T> (mxu_num < 2) in EncodeVectorExtended/Result/MiscInstruction |
The Shared-Codec Evidence
Four independent pieces of binary evidence establish that Dragonfish reuses the Jellyfish 41-byte bundle codec rather than carrying its own.
1 — EncoderDf is a subclass of EncoderJf. The Dragonfish encoder constructor base-constructs an EncoderJf and then installs its own vtable:
```cpp
// EncoderDf::EncoderDf() @ 0x1e85e340 (decompiled)
EncoderJf::EncoderJf(this, 1);   // construct the Jellyfish base (config arg = 1)
*(void**)this = off_21D36BF0;    // install the EncoderDf vtable
// vmovaps/vmovups: store a 16-byte config blob at this+0x40
*((uint32*)this + 20) = 4;       // set the per-gen config word (byte offset 20 * 4 = 0x50)
```
The EncoderJf::EncoderJf(this, 1) call is the literal IS-A: a Dragonfish encoder is a Jellyfish encoder with a different vtable and config word. Every Jellyfish method EncoderDf does not override is dispatched to the Jellyfish implementation.
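A toy C++ model of that relationship shows which calls land where. It is a sketch under stated assumptions: the method bodies only return the name of the class whose code ran, EncodeScalarInstruction is a hypothetical name for a non-MXU slot writer, and making EncodeBundleInternal non-virtual is a modeling choice that mirrors the absence of an EncoderDf override, not a claim about the binary.

```cpp
#include <cassert>
#include <string>

// Toy model: each method returns the name of the class whose code handled it.
class EncoderJf {
 public:
  virtual ~EncoderJf() = default;
  // EncoderDf has no override of the assembler, so the base version always runs.
  std::string EncodeBundleInternal() const { return "EncoderJf"; }
  virtual std::string EncodeVectorExtendedInstruction() const { return "EncoderJf"; }
  virtual std::string EncodeVectorResultInstruction() const { return "EncoderJf"; }
  virtual std::string EncodeMiscInstruction() const { return "EncoderJf"; }
  // Hypothetical name for a non-MXU slot writer (e.g. the scalar slot).
  virtual std::string EncodeScalarInstruction() const { return "EncoderJf"; }
};

class EncoderDf : public EncoderJf {
 public:
  // Only the three MXU-bearing writers are overridden; each delegates to the
  // Jellyfish writer first, as the decompiled overrides do.
  std::string EncodeVectorExtendedInstruction() const override {
    return "EncoderDf+" + EncoderJf::EncodeVectorExtendedInstruction();
  }
  std::string EncodeVectorResultInstruction() const override {
    return "EncoderDf+" + EncoderJf::EncodeVectorResultInstruction();
  }
  std::string EncodeMiscInstruction() const override {
    return "EncoderDf+" + EncoderJf::EncodeMiscInstruction();
  }
};
```

Calling through an EncoderJf& that refers to an EncoderDf runs the Jellyfish assembler and the scalar writer directly, while the three MXU writers go through the Dragonfish override, which delegates back to the base.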
2 — EncoderDf has no EncodeBundleInternal and no BundleSizeBytes. A symbol-table sweep finds an EncoderJf::EncodeBundleInternal (0x1e86c7c0) and an EncoderBcsDf::EncodeBundleInternal (0x1e85cd20, the BarnaCore-sequencer encoder, a different sequencer type) but no EncoderDf::EncodeBundleInternal and no EncoderDf::BundleSizeBytes. The bundle assembler and width constant are inherited unchanged from Jellyfish.
3 — there is no DragonfishCodecMetadata. The codec-metadata classes are JellyfishCodecMetadata, PufferfishCodecMetadata, ViperfishCodecMetadata, GhostliteCodecMetadata — there is no Dragonfish entry. Dragonfish's width comes from the same JellyfishCodecMetadata::BundleSizeBytes (0x1ecf7460) that returns 41 for the TensorCore (component 0) and 16 for BarnaCore (component 1). The Bundle Model lists Dragonfish as sharing the Jellyfish codec metadata for exactly this reason.
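The width lookup therefore reduces to one table shared by both versions. The sketch below is a free-function stand-in for JellyfishCodecMetadata::BundleSizeBytes; both function names are hypothetical, and returning std::nullopt for components other than 0 and 1 is an assumption, since only those two widths are given here.

```cpp
#include <cassert>
#include <optional>

enum class TpuVersion { kJellyfish = 0, kDragonfish = 1 };

// Hypothetical stand-in for JellyfishCodecMetadata::BundleSizeBytes.
std::optional<int> JellyfishBundleSizeBytes(int component) {
  switch (component) {
    case 0: return 41;  // TensorCore VLIW bundle
    case 1: return 16;  // BarnaCore bundle
    default: return std::nullopt;  // not covered by this page (assumption)
  }
}

// There is no Dragonfish metadata class, so the version never changes the answer.
std::optional<int> BundleSizeBytes(TpuVersion /*version*/, int component) {
  return JellyfishBundleSizeBytes(component);
}

static_assert(41 * 8 == 328, "a 41-byte bundle is 328 bits");
```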
4 — the factory and bundle restrictions pair 0+1. The encoder factory routes versions 0 and 1 to one shared CreateEncoderJfDf, and Dragonfish reuses JellyfishBundleRestrictions rather than a Dragonfish-specific class. The pairing is direct binary evidence of the shared codec — see the Codename Matrix and the JXC Family page, where one TpuHalJxcHardwareFactory and one shared compiler-side xla::jellyfish::isa namespace serve both generations.
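The 0+1 pairing can be sketched as a factory that sends both versions through one function. The assumption, labeled in the code, is that CreateEncoderJfDf returns an EncoderJf for version 0 and an EncoderDf for version 1; the binary evidence above only establishes that both versions are routed there. CreateEncoder and Name() are hypothetical helpers.

```cpp
#include <cassert>
#include <memory>
#include <string>

enum class TpuVersion { kJellyfish = 0, kDragonfish = 1 };

struct EncoderJf {
  virtual ~EncoderJf() = default;
  virtual std::string Name() const { return "EncoderJf"; }  // hypothetical helper
};
struct EncoderDf : EncoderJf {
  std::string Name() const override { return "EncoderDf"; }
};

// Assumption: the shared factory picks the subclass by version.
std::unique_ptr<EncoderJf> CreateEncoderJfDf(TpuVersion v) {
  if (v == TpuVersion::kDragonfish) return std::make_unique<EncoderDf>();
  return std::make_unique<EncoderJf>();
}

// Hypothetical top-level factory: versions 0 and 1 share one entry point.
std::unique_ptr<EncoderJf> CreateEncoder(int version) {
  switch (version) {
    case 0:
    case 1:
      return CreateEncoderJfDf(static_cast<TpuVersion>(version));
    default:
      return nullptr;  // later generations use other factories (not modeled)
  }
}
```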
NOTE: TpuCodecDragonfish exists, but it is not a separate bundle codec. TpuCodec::Create has a case 1 that builds a named TpuCodecDragonfish object (CreateTpuCodecDragonfish @ 0x1e8360e0), and TpuCodecDragonfish::EncodeBundle (0x1e8369a0) exists. But that wrapper does no bit-packing itself: it switches on TpuSequencerType and forwards to EncodeSequencerBundle<EncoderDf> for the TensorCore sequencer (type 0), EncodeSequencerBundle<EncoderBcsDf> for the BarnaCore sequencer (type 1), and EncodeBarnaCoreAddressHandlerBundle<EncoderJf> for the address handler (type 2); sequencer types 3–5 return Unimplemented. The TensorCore path therefore lands in EncoderDf, whose EncodeBundleInternal is the inherited EncoderJf one. The named codec is a dispatch/RTTI wrapper, not a distinct 41-byte layout. A reimplementation should model one 41-byte bundle codec for both generations, dispatched by a TpuVersion argument, not two.
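The forwarding described in the note can be sketched as a plain switch. RouteDragonfishBundle and Route are hypothetical names, TpuSequencerType is modeled as a plain int, and the function returns the name of the encoder the real wrapper forwards to instead of producing bytes; the handling of values outside 0–5 is an assumption.

```cpp
#include <cassert>
#include <string>

// Routing decision: which templated encoder would run, or which error.
struct Route {
  bool ok;
  std::string path;
};

Route RouteDragonfishBundle(int sequencer_type) {
  switch (sequencer_type) {
    case 0:  // TensorCore sequencer: EncoderDf, whose assembler is EncoderJf's
      return {true, "EncodeSequencerBundle<EncoderDf>"};
    case 1:  // BarnaCore sequencer
      return {true, "EncodeSequencerBundle<EncoderBcsDf>"};
    case 2:  // BarnaCore address handler
      return {true, "EncodeBarnaCoreAddressHandlerBundle<EncoderJf>"};
    case 3:
    case 4:
    case 5:
      return {false, "Unimplemented"};
    default:  // not described here; treated as an error in this sketch
      return {false, "InvalidArgument"};
  }
}
```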
The Dragonfish Delta: MXU-Validity Checks
EncoderDf overrides exactly three slot writers — the three that reference an MXU — and each follows the same shape: call the Jellyfish encoder verbatim (which writes all the bundle bits), then run CheckMxuValid<T> on the slot's mxu_num, then update an internal encoder-state flag. No override touches a bundle byte position.
```cpp
// EncoderDf::EncodeVectorExtendedInstruction @ 0x1e85e520 (decompiled, representative)
status = EncoderJf::EncodeVectorExtendedInstruction(this, inst, bundle);  // identical bit layout
if (!status.ok()) return status;                                           // encoder_df.cc:65
status = CheckMxuValid<VectorExtendedInstruction>(inst, inst->mxu_num);   // mxu_num < 2 gate
if (!status.ok()) return status;                                           // encoder_df.cc:66
if (inst->mxu_num == 1 && (this->state[+12] & 0x600000000) == 0)
  this->state[+12] |= 2;                                                   // internal MXU-1 tracking flag
return ok;
```
CheckMxuValid<T> is a pure legality gate — it never writes the bundle:
```
// EncoderDf::CheckMxuValid<T>(inst, mxu_num)
//   <VectorExtendedInstruction> @ 0x1e85e5c0
//   <VectorResultInstruction>   @ 0x1e85e420
//   <MiscInstruction>           @ 0x1e85e7a0
function CheckMxuValid(inst, mxu_num):
    if mxu_num >= 2:
        return InvalidArgument("invalid mxu_num for " + inst.ShortDebugString())  // encoder_df.cc:36
    return ok
```
CheckMxuValid is a function template, instantiated once per slot type; all three instantiations share the identical body: mxu_num >= 2 → MakeErrorImpl<3> at encoder_df.cc:36 (status code 3 is InvalidArgument), with the message "invalid mxu_num for " + ShortDebugString.
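A self-contained version of the template, with a simplified Status type and empty stand-ins for the three slot messages, shows one body serving all three instantiations. The Status struct and the ShortDebugString placeholders are assumptions; only the threshold, the status code, and the message prefix come from the binary.

```cpp
#include <cassert>
#include <string>

// Simplified status: code 0 is OK, code 3 is InvalidArgument (absl numbering).
struct Status {
  int code = 0;
  std::string message;
  bool ok() const { return code == 0; }
};

// Placeholder slot types; the real ones are protos with richer debug strings.
struct VectorExtendedInstruction {
  std::string ShortDebugString() const { return "VectorExtendedInstruction{...}"; }
};
struct VectorResultInstruction {
  std::string ShortDebugString() const { return "VectorResultInstruction{...}"; }
};
struct MiscInstruction {
  std::string ShortDebugString() const { return "MiscInstruction{...}"; }
};

// One body, instantiated per slot type. The caller extracts mxu_num (for
// MiscInstruction, from its ClearResultFifoOperands sub-message).
template <typename T>
Status CheckMxuValid(const T& inst, int mxu_num) {
  if (mxu_num >= 2) {
    return {3, "invalid mxu_num for " + inst.ShortDebugString()};
  }
  return {};
}
```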
The three overrides and the slot whose mxu_num they validate:
| Slot encoder (EncoderDf) | Address | Validates mxu_num of | Delegates to (JF base) | CheckMxuValid<T> instantiation |
|---|---|---|---|---|
| EncodeVectorExtendedInstruction | 0x1e85e520 | the vector-extended / matmul / latch op (inst+0x70) | EncoderJf::EncodeVectorExtendedInstruction @ 0x1e869f00 | 0x1e85e5c0 |
| EncodeVectorResultInstruction | 0x1e85e380 | the matres / result-FIFO op (inst+0x4C) | EncoderJf::EncodeVectorResultInstruction @ 0x1e865ae0 | 0x1e85e420 |
| EncodeMiscInstruction | 0x1e85e6c0 | the misc op's MXU operand: the ClearResultFifoOperands sub-message (or its _globals_ default), with mxu_num at sub-message +0x1C | EncoderJf::EncodeMiscInstruction @ 0x1e86be80 | 0x1e85e7a0 |
These are precisely the three slots that name an MXU. The vector-ALU, scalar, vector-load, and vector-store slots have no EncoderDf override at all — they are encoded by the inherited Jellyfish writers without any Dragonfish-specific check, because they do not address the matrix unit.
QUIRK: the Dragonfish delta is a validity delta, not a layout delta. A reimplementer must encode Dragonfish bundle bytes with the Jellyfish slot map (same LSB-first bit numbering, same per-slot shl/or shift constants; see Jellyfish 41B Bundle and Bundle Model), because the bit positions are byte-identical, and must additionally reject any matmul / matres / misc op that names mxu_num >= 2. The check changes which programs encode, not where bits land. Skipping the check produces a bundle that is bit-identical to the Jellyfish encoding but illegal on Dragonfish for an out-of-range MXU; adding a layout difference where there is none corrupts every Dragonfish bundle.
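The validity-versus-layout distinction can be checked with a toy encoder. This sketch packs a single hypothetical 2-bit mxu_num field at an arbitrary bit offset (200) using LSB-first numbering; the offset, width, zero prefill, and function names are illustrative assumptions, not the real Jellyfish slot positions. What it demonstrates is that the Dragonfish path returns exactly the Jellyfish bytes for legal ops and rejects the rest.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

using Bundle = std::array<uint8_t, 41>;  // 41 bytes = 328 bits

// LSB-first packing: bundle bit i lives in byte i / 8, at bit i % 8.
void PackBits(Bundle& b, unsigned pos, unsigned width, uint64_t value) {
  assert(width <= 64 && pos + width <= 8 * b.size());
  for (unsigned i = 0; i < width; ++i) {
    const unsigned bit = pos + i;
    const uint8_t mask = static_cast<uint8_t>(1u << (bit % 8));
    if ((value >> i) & 1u) {
      b[bit / 8] |= mask;
    } else {
      b[bit / 8] &= static_cast<uint8_t>(~mask);
    }
  }
}

// HYPOTHETICAL field placement, for illustration only.
constexpr unsigned kToyMxuFieldPos = 200;
constexpr unsigned kToyMxuFieldWidth = 2;

// Jellyfish-style writer: packs the field with no legality check.
Bundle EncodeJfToy(unsigned mxu_num) {
  Bundle b{};  // the real encoder prefills kNeverExecute; zeros keep this short
  PackBits(b, kToyMxuFieldPos, kToyMxuFieldWidth, mxu_num);
  return b;
}

// Dragonfish-style writer: identical bytes, plus the mxu_num < 2 gate.
std::optional<Bundle> EncodeDfToy(unsigned mxu_num) {
  Bundle b = EncodeJfToy(mxu_num);        // layout is not touched
  if (mxu_num >= 2) return std::nullopt;  // CheckMxuValid<T> would reject
  return b;
}
```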
The internal this+12 state update is encoder bookkeeping: it tracks which MXU a bundle has already committed to, so that a later slot in the same bundle cannot claim a conflicting MXU. Each of the three overrides applies its own mask to the same this+12 qword. The vector-extended override tests (state & 0x600000000) == 0 and sets | 2 (the decompile shows this on the mxu_num == 1 branch); EncodeVectorResultInstruction tests (state & 0x300000) == 0x100000 and sets | 0x300000; and EncodeMiscInstruction clears the low-5-bit field and conditionally sets | 0x60 for its ClearResultFifoOperands form. This state lives in the encoder object, not the bundle word, and is invisible on the wire.
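The three mask updates can be transcribed into a small state object. This is a sketch under assumptions: DfEncoderState and its method names are hypothetical, the vector-extended trigger (mxu_num == 1) is taken from the decompile, the vector-result update runs on its mask test alone because no further trigger is documented here, and the misc "ClearResultFifoOperands form" condition is modeled as a hypothetical boolean.

```cpp
#include <cassert>
#include <cstdint>

// Models the encoder-state qword at this+12; it never touches bundle bytes.
struct DfEncoderState {
  uint64_t word = 0;

  void AfterVectorExtended(int mxu_num) {
    if (mxu_num == 1 && (word & 0x600000000ull) == 0) word |= 2;
  }
  void AfterVectorResult() {
    // Any trigger beyond this mask test is not documented on this page.
    if ((word & 0x300000ull) == 0x100000ull) word |= 0x300000ull;
  }
  void AfterMisc(bool clear_result_fifo_form /* hypothetical flag */) {
    word &= ~uint64_t{0x1F};  // clear the low-5-bit field
    if (clear_result_fifo_form) word |= 0x60;
  }
};
```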
Cross-References
- Jellyfish 41B Bundle: the canonical 41-byte slot map, prefill, and packing model Dragonfish reuses verbatim.
- Codename Matrix: kDragonfish = 1, its codename string, and the 0+1 shared-encoder pairing.
- JXC Family: the single HAL family and xla::jellyfish::isa namespace that serve both Jellyfish and Dragonfish.