Memory-Space Master Table
Every enum ordinal, address-space ID, allocator symbol, and per-gen geometry on this page was decoded byte-exactly from libtpu.so in the libtpu-0.0.40-cp314 wheel (build libtpu_lts_20260413_b_RC00, BuildID md5 89edbbe81c5b328a958fe628a9f2207d, the unambiguous version anchor). Other builds will differ.
Abstract
This appendix is the single consolidated reference for every memory space libtpu.so addresses — the off-chip HBM main store, the on-chip TensorCore tiers (VMEM, SMEM, CMEM, SFLAG, IMEM), the BarnaCore sub-core tiers, the SparseCore sequencer pools, the host-interface and host-RAM pools, and — orthogonally — the SparseCore LLVM address-space ID space (AS0 plus the 0xC9..0xE1 / 0x1F5/0x1F6 band) the SparseCore backend uses to tag pointers. It aggregates and cross-checks the facts that live on the per-tier memory deep pages and the two ISA/target enum pages into one master table, re-verified against the binary.
There are three distinct, separately-numbered integer spaces here, and the central job of this page is to keep them apart, because the same word (smem, sflag, hbm) appears in all three with different integers:
- The LLO xla::jellyfish::MemorySpace enum: 17 values (0..16), the operand-space tag on every TensorCore LLO load/store, decoded by MemorySpaceToString @ 0x1d6ffae0 reading off_21CE6B08[ms]. This is what the allocator, the DMA emitter, and the bundle packer dispatch on. smem = 5, sflag = 6, sparse_core_sequencer_smem = 14.
- The wire MemorySpaceProto field numbers: the same 17 spaces, remapped integers (hib/vmem/cmem differ); a (de)serializer must remap at the boundary. Owned by memory-space-enum.md.
- The SparseCore mlir::sparse_core::MemorySpace enum (22 values, 1-based, value-8 gap) and its LLVM address-space IDs (AS0, AS201..225, AS501/502). Here smem = MS 1 (AS0). Owned by address-space-ids.md and fat-pointers-as789.md.
The three are not convertible by arithmetic: only named pools correspond, and only by physical identity. This page gives the master table keyed on the LLO enum, then a focused section for the LLO ordinals, a section for the SparseCore AS space (including the dead AS7/8/9 fat-pointer reserve), and a section for the alignment/geometry rules. One deep-page label still disagrees with the binary (a sequencer-SMEM name attached to ordinal 12, which is the sequencer SFLAG); it is corrected in place below.
This is a pure reference catalog: there is no algorithm to reimplement, only data to reproduce exactly. Confidence is CONFIRMED (byte-anchored) throughout unless a cell or callout says otherwise.
| Item | Value |
|---|---|
| LLO enum | xla::jellyfish::MemorySpace, 17 values 0..16 |
| LLO decoder | MemorySpaceToString(MemorySpace) @ 0x1d6ffae0 → off_21CE6B08[ms] (no bounds check) |
| LLO color remap | ColorToMemorySpace(color) @ 0x1d6ffb00 → byte_B5435CA[color], color < 0xA (10) |
| Wire enum | xla.jellyfish.MemorySpaceProto descriptor @ VA 0xbf8cc80 (17 values, remapped) |
| SparseCore enum | mlir::sparse_core::MemorySpace — 22 values, 1-based, value-8 gap |
| SparseCore AS↔MS | AddressSpaceToMemorySpace @ 0x14b78800 / MemorySpaceToAddressSpace @ 0x14b78780 (table dword_AF36CE8, mask 0x3FFF7F) |
| Universal allocator | tpu::BestFitAllocator (208-B instance, ctor 0x1e817500); one per tier, per-tier Config |
| Compile-time placer | ProgramMemoryAllocator::AllocateBytes(MemorySpace, …) @ 0x1c629e40 |
| Confidence | CONFIRMED (byte-anchored) unless a cell or callout says otherwise |
The Master Memory-Space Table
The rows are keyed on the LLO MemorySpace enum (the operand-space tag, the one number that flows end-to-end through the TensorCore compiler and runtime). AS-id is the SparseCore LLVM address-space integer for the physically corresponding SparseCore pool, where one exists — the two enums meet only at the SparseCore/SC-sequencer pools, and the AS-id column is blank for pure-TensorCore tiers that the SparseCore backend never names. Allocator/owner names the manager that owns the tier's bytes; Owning page is the deep page that documents it. Per-gen size/geometry literals live in chip_parts.binarypb (boot-filled) and are not in .text; the formulas and field offsets are exact, the literal byte counts are not asserted here (see chip-parts-binarypb.md).
| LLO# | Space | AS-id | Purpose | Per-gen size / geometry | Allocator / owner | Owning page |
|---|---|---|---|---|---|---|
| 0 | <no memory space> | — | unset / invalid sentinel (default-constructed) | — | — | memory-space-enum.md |
| 1 | hbm | 203 (0xCB) | off-chip DRAM main store: program I/O, spill, embeddings | tens of GiB; Config{base=0, end=chip_parts HBM −reserved, align=1024 B DMA / 16 KiB compile, granule=chip_parts} | BestFitAllocator (runtime) / ProgramMemoryAllocator (compile) | hbm-allocator.md, hbm-dma-alignment.md |
| 2 | hib | — | Host-Interface Buffer: HBM↔host staging tier the HIB DMA engine drives | chip_parts | HIB DMA engine | memory-space-enum.md |
| 3 | vmem | 205 (0xCD) | vector memory: MXU/VPU operand staging, the MSA kAlternate fast tier | ~16–64 MiB/TensorCore; align=VmemAlignmentBoundaryInBytes(), granule=VmemWordSizeBytes() (Target+0x50C); banks JF8/PF16/VF32/GL32 | BestFitAllocator / MSA + ProgramMemoryAllocator | vmem-allocator.md |
| 4 | cmem | — | constant memory: Pufferfish-only read-mostly operand pool (dedicated co-issue load slot) | CmemSizeBytes() (Target+0x460); word=granule=CmemWordSizeBytes() (Target+0x510, ~16 B PF); banks PF=32 only | BestFitAllocator / MSA (xla_tpu_cmem_*) | cmem-pool.md |
| 5 | smem | 0 (0x0) | scalar memory: SPU spill/parameter store, loop counters, completion descriptors | SmemSizeBytes() (Target+0x470); word=SmemWordSizeBytes()=4 B (Target+0x508); banks JF2/PF8/VF8/GL8 | BestFitAllocator / ProgramMemoryAllocator (opcode-driven, not MSA) | smem-scalar-memory.md, smem-register-window.md |
| 6 | sflag | 204 (0xCC) | sync-flag register file: DMA-completion/barrier handshake words, atomic counter/done-bit | word-granular S32; SflagWordSizeBytes() (Target+0x504), log2 cached Target+0x4c8; byte_off = 4·n | BestFitAllocator (size) + fixed number-space partition | sflag-protocol.md |
| 7 | imem | 214 (0xD6) † | instruction memory: bundles the sequencer fetches | chip_parts | sequencer | memory-space-enum.md |
| 8 | barna_core_bmem | — | BarnaCore (embedding-engine) bulk scratchpad | chip_parts; PXC family only | BarnaCore | memory-space-enum.md |
| 9 | barna_core_smem | — | BarnaCore scalar scratchpad | Target+0x47C size, +0x480 base, +0x51C word | BarnaCore (BarnaCoreSflagImmPtr, scoped trampoline) | smem-scalar-memory.md |
| 10 | barna_core_sflag | — | BarnaCore sync-flag tier (distinct from TC SFLAG) | Target+0x478 (BarnaCoreSflagSizeBytes) | BarnaCore | sflag-protocol.md |
| 11 | barna_core_imem | — | BarnaCore instruction memory | chip_parts | BarnaCore | memory-space-enum.md |
| 12 | sparse_core_sequencer_sflag | 223 (0xDF) ‡ | SC-sequencer sync-flag bank | chip_parts; SC sequencer | SparseCore sequencer | sflag-protocol.md |
| 13 | host | — | host-resident buffer (transfer source/sink; MSA offload spill target) | host DRAM | PremappedMemoryManager / tsl::BFCAllocator over posix_memalign | overview.md |
| 14 | sparse_core_sequencer_smem | 224 (0xE0) ‡ | SC-sequencer scalar scratchpad (well-known constants: chip_id, replica_id, …) | SCS SMEM 64 KiB hard immediate on VF/GL/GF | SparseCore sequencer | smem-scalar-memory.md |
| 15 | sparse_core_private_stack_hbm | 203 (0xCB) | per-SC private stack carved from HBM | HBM-backed | HBM-backed | memory-space-enum.md |
| 16 | pinned_hbm | — | page-pinned HBM for host-visible DMA (repacker may not relocate) | 1024 B DMA floor + host pin | HBM (pinned) | hbm-dma-alignment.md |
NOTE: † The AS-id column maps each TensorCore tier to the physically corresponding SparseCore pool, not to an identity. imem (LLO 7) ↔ SC timem (AS 214) and sparse_core_sequencer_sflag/_smem (LLO 12/14) ↔ SC sflag_scs/smem_scs (AS 223/224) are physical-identity correspondences, not arithmetic conversions. ‡ The SparseCore sequencer banks are the per-SCS variants; the AS-id band also carries per-tile (AS217 sflag_tile, AS219 smem_tile) and chip-shared (AS202 spmem) SC pools that have no LLO MemorySpace equivalent (they live only inside the SparseCore LLVM lowering). See the SparseCore AS section below.
GOTCHA: the MemorySpaceToString table at 0x21ce6b08 does not stop at index 16. Indices 17/18/19 resolve to absolute, heap_relative (0x8678cad), and stack_relative (0x8678cbb): pointer-relativity tags appended to the same string array, not memory pools. They belong to the LloAddress relocation model. A reimplementation that sizes the enum by the string-table length, or treats absolute/heap_relative/stack_relative as tiers, is wrong: the canonical region enum is exactly 17 values.
The LLO MemorySpace Enum (TensorCore Operand Tag)
Decoder and ground truth
MemorySpaceToString is the ground truth for the integer→region mapping: it is a single indexed load with no bounds check, so the enum value is a direct array index and the over-long table is shared with the relativity tags.
// xla::jellyfish::MemorySpaceToString(MemorySpace ms)   sub_1D6FFAE0, 14 bytes
const char *MemorySpaceToString(int ms) {
    return off_21CE6B08[ms];   // direct index into the string table, no bounds check
}
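A reimplementation will probably want the bounds check that the decoder omits. The following is a minimal Python sketch, not a rendering of the binary: the ordinals come from the master table, while the member names, `RELATIVITY_TAGS`, and `memory_space_to_string` are illustrative names chosen here.

```python
from enum import IntEnum


class MemorySpace(IntEnum):
    """The 17 LLO ordinals from the master table (member names are illustrative)."""
    NONE = 0
    HBM = 1
    HIB = 2
    VMEM = 3
    CMEM = 4
    SMEM = 5
    SFLAG = 6
    IMEM = 7
    BARNA_CORE_BMEM = 8
    BARNA_CORE_SMEM = 9
    BARNA_CORE_SFLAG = 10
    BARNA_CORE_IMEM = 11
    SPARSE_CORE_SEQUENCER_SFLAG = 12
    HOST = 13
    SPARSE_CORE_SEQUENCER_SMEM = 14
    SPARSE_CORE_PRIVATE_STACK_HBM = 15
    PINNED_HBM = 16


# String-table indices 17..19 are relativity tags, deliberately kept out of the enum.
RELATIVITY_TAGS = ("absolute", "heap_relative", "stack_relative")


def memory_space_to_string(ms: int) -> str:
    """Bounds-checked lookup; raises ValueError for anything outside 0..16."""
    space = MemorySpace(ms)  # IntEnum rejects 17, 18, 19 and negative values
    if space is MemorySpace.NONE:
        return "<no memory space>"
    return space.name.lower()
```

Keeping the relativity tags in a separate tuple means the enum length stays at 17 regardless of how long the underlying string table is.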
The ordinal assignment is re-verified four independent ways, all byte-exact and mutually consistent:
| Probe | Function | What it pins |
|---|---|---|
| String-table index | MemorySpaceToString @ 0x1d6ffae0 | off_21CE6B08[ms] flat lookup |
| DMA-render switch | MemorySpaceToDriverResource @ 0x1d6223e0 | input ordinals: 1=hbm,2=hib,3=vmem,4=cmem,5=smem,6=sflag,7=imem,8..11=barna_core_*,12/14/15=sparse_core_*,13=host,16=pinned_hbm (FATAL on cmem and on 12..16) |
| CMEM constant ctor | LloAddress::MakeCmemConstant @ 0x1d60ba20 | LloAddress(MemorySpace=4, off) → cmem = 4 |
| SC-seq SMEM ctor | LloAddress::MakeSparseCoreSequencerSmemConstant @ 0x1d60bc60 | LloAddress(MemorySpace=14, off) → sparse_core_sequencer_smem = 14 |
The MemBanks(MemorySpace) overrides independently confirm the mid-range ordinals: GhostliteTarget::MemBanks (0x1d4969c0) returns 32 for ms==3 and 8 for ms==5, FATAL otherwise — i.e. kVmem=3, kSmem=5. PufferfishTarget::MemBanks (0x1d493900) indexes qword_B5305C8[ms-3] = {16,32,8} over ms ∈ {3,4,5} — i.e. vmem/cmem/smem = banks 16/32/8.
The DMA-render numbering is a separate integer space
MemorySpaceToDriverResource (0x1d6223e0) maps the LLO enum to a hardware driver-resource id stamped into a DMA descriptor's address word. It is not the enum value, and it traps on cmem and on every ordinal from 12 to 16 (the SparseCore spaces, host, and pinned_hbm):
// xla::jellyfish::MemorySpaceToDriverResource(MemorySpace ms)   sub_1D6223E0
function MemorySpaceToDriverResource(ms):
    switch ms:                             // ms = the 17-value LLO MemorySpace enum
        case 0  (<no space>):        return 10
        case 1  (hbm):               return 2
        case 2  (hib):               return 3
        case 3  (vmem):              return 4
        case 4  (cmem):              FATAL("Unsupported memory space")  // memory_space.cc:31, not DMA-addressable here
        case 5  (smem):              return 6
        case 6  (sflag):             return 0
        case 7  (imem):              return 5
        case 8  (barna_core_bmem):   return 7
        case 9  (barna_core_smem):   return 9
        case 10 (barna_core_sflag):  return 1
        case 11 (barna_core_imem):   return 8
        case 12..16 (sparse_core_sequencer_sflag, host, sparse_core_sequencer_smem,
                     sparse_core_private_stack_hbm, pinned_hbm):
                                     FATAL("Unsupported memory space")  // memory_space.cc:49
The sflag render id this switch returns (0, distinct from the enum value 6) is the one SflagImmPtr (0x1d5185a0) bakes into its pointer: it passes that render-space id to ImmPtr while the resulting operand still carries the kSflag(6) tag. A reimplementer must carry the MemorySpace enum end-to-end and convert to a driver-resource id only at the descriptor boundary via this explicit switch.
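The switch can be carried as a plain lookup table. This is a hedged sketch that copies the return values listed in the switch above; `DRIVER_RESOURCE`, `UnsupportedMemorySpace`, and the function name are illustrative, and the exception only stands in for the FATAL path.

```python
# LLO MemorySpace ordinal -> driver-resource id, copied from the switch above.
# Ordinal 4 (cmem) and ordinals 12..16 are absent on purpose: they are fatal.
DRIVER_RESOURCE = {
    0: 10,  # <no space>
    1: 2,   # hbm
    2: 3,   # hib
    3: 4,   # vmem
    5: 6,   # smem
    6: 0,   # sflag
    7: 5,   # imem
    8: 7,   # barna_core_bmem
    9: 9,   # barna_core_smem
    10: 1,  # barna_core_sflag
    11: 8,  # barna_core_imem
}


class UnsupportedMemorySpace(Exception):
    """Stands in for the FATAL("Unsupported memory space") path."""


def memory_space_to_driver_resource(ms: int) -> int:
    """Convert an LLO ordinal to a driver-resource id at the descriptor boundary."""
    try:
        return DRIVER_RESOURCE[ms]
    except KeyError:
        raise UnsupportedMemorySpace(f"Unsupported memory space: {ms}") from None
```

The eleven returned ids form a permutation of 0..10, which is a cheap self-check when transcribing the table.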
Wire-format remap
LLO serializes through MemorySpaceProto (descriptor @ VA 0xbf8cc80). The proto and the C++ enum name the same 17 spaces with different integers across 2..11 (hib is C++ 2 / proto 10; vmem is C++ 3 / proto 2; cmem is C++ 4 / proto 11); they agree at 0, 1, and 12..16. The full remap table and the masked DMA-validity gates live on memory-space-enum.md; a (de)serializer that conflates proto field numbers with the runtime enum silently relabels every vmem/cmem/hib buffer.
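As a partial illustration of remapping at the boundary, the sketch below encodes only what this page states: the three remapped pairs (hib, vmem, cmem) and the ordinals where the two numberings agree. The remaining proto numbers in 2..11 are not given here, so the sketch refuses them rather than guessing. All names are illustrative.

```python
# Remapped pairs stated on this page: C++ enum ordinal -> proto field number.
KNOWN_ENUM_TO_PROTO = {2: 10, 3: 2, 4: 11}  # hib, vmem, cmem
# Ordinals where the enum and the proto agree: 0, 1, and 12..16.
IDENTITY_ORDINALS = {0, 1} | set(range(12, 17))


def enum_to_proto(ms: int) -> int:
    """Map a runtime enum ordinal to its proto number, for the known subset only."""
    if ms in IDENTITY_ORDINALS:
        return ms
    if ms in KNOWN_ENUM_TO_PROTO:
        return KNOWN_ENUM_TO_PROTO[ms]
    raise KeyError(f"proto number for enum ordinal {ms} is not given on this page")


def proto_to_enum(field: int) -> int:
    """Inverse of enum_to_proto over the same known subset."""
    if field in IDENTITY_ORDINALS:
        return field
    for ms, proto in KNOWN_ENUM_TO_PROTO.items():
        if proto == field:
            return ms
    raise KeyError(f"enum ordinal for proto number {field} is not given on this page")
```

Note that proto number 2 decodes to vmem (enum 3), not hib (enum 2): exactly the relabelling a naive identity mapping would get wrong.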
The canonical assignment, four ways anchored
NOTE: the byte-exact ordinal assignment is the 17-value table at the top of this page, anchored four independent ways above (MemorySpaceToString, MemorySpaceToDriverResource, MakeCmemConstant, MakeSparseCoreSequencerSmemConstant). The boundary cases a reimplementer most often gets wrong: sflag = 6 (not 7) with imem = 7; sparse_core_sequencer_sflag = 12, host = 13, and sparse_core_sequencer_smem = 14 (the sequencer SFLAG/SMEM ordinals are not adjacent; host sits between them); hib = 2 and pinned_hbm = 16 (there is no kPinnedHbm at slot 2). overview.md §2 carries the same kNone=0 … kPinnedHbm=16 assignment.
SparseCore Address Spaces
Two number spaces, disjoint by construction
The SparseCore LLVM backend tags every pointer with a numeric address-space ID (the N in !llvm.ptr<N>) drawn from a sparse, banded range: 0 (inherited scalar memory), 201..225 (0xC9..0xE1, the SC-specific pools and *Any alias supersets), and 501/502 (0x1F5/0x1F6, the two CBREG circular-buffer windows). Each pool ID maps onto one 1-based mlir::sparse_core::MemorySpace enum value (22 values, value-8 gap). The two conversion functions are byte-exact and mutually inverse on those pool IDs; the one many-to-one case is MS 22 (sflag_tc), which shares AS 204 with MS 5:
// AddressSpaceToMemorySpace(uint id) sub_14B78800 (low 32 bits of 0x1_0000000N = MS)
// MemorySpaceToAddressSpace(MemorySpace ms) sub_14B78780
// guard: (ms-1) > 0x15 || ((0x3FFF7F >> (ms-1)) & 1) == 0 -> FATAL("Unsupported memory space")
// return dword_AF36CE8[ms-1]
The validity mask 0x3FFF7F has 21 bits set: one for each valid MemorySpace value in 1..22, with bit 7 (value 8, the gap) clear. The (ms-1) > 0x15 test bounds the table at 22 entries.
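The guard is small enough to check by enumeration. A minimal sketch of the same test in Python follows; the constant and function names are illustrative, and the signed `idx < 0` branch stands in for the unsigned comparison in the binary.

```python
VALID_MASK = 0x3FFF7F  # bit (ms - 1) is set when ms is a valid MemorySpace
TABLE_LEN = 0x16       # 22 table slots, indexed by ms - 1


def is_valid_sc_memory_space(ms: int) -> bool:
    """Mirror of the guard: reject out-of-range indices and the value-8 gap."""
    idx = ms - 1
    if idx < 0 or idx >= TABLE_LEN:  # the binary does this as unsigned (ms-1) > 0x15
        return False
    return ((VALID_MASK >> idx) & 1) == 1


valid = [ms for ms in range(0, 32) if is_valid_sc_memory_space(ms)]
print(valid)
```

The enumeration yields 1..7 and 9..22: the only hole inside the table bounds is the value 8.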
The AS-ID master table
MS# is the 1-based mlir::sparse_core::MemorySpace; tile? is IsOffTileMemory == false, true only for MS 2 and MS 18. A blank MS# means the ID is an alias-analysis grouping or a reserved gap with no physical pool.
| AS# | hex | Pool (stringifyMemorySpace) | MS# | tile? | Notes |
|---|---|---|---|---|---|
| 0 | 0x00 | smem | 1 | off | inherited base TPU scalar memory |
| 201 | 0xC9 | tile_spmem | 2 | ON | per-tile SC SRAM (KB) |
| 202 | 0xCA | spmem | 3 | off | chip-shared SC SRAM (MB) |
| 203 | 0xCB | hbm | 4 | off | global (GB) embedding tables |
| 204 | 0xCC | sflag | 5 | off | sync-flag memory (MS 22 sflag_tc also maps here) |
| 205 | 0xCD | vmem | 6 | off | TC vector memory (TC↔SC handoff) |
| 206/207 | 0xCE/0xCF | — | — | — | reserved gap |
| 208 | 0xD0 | dreg | 7 | off | data-register window |
| 209/210 | 0xD1/0xD2 | — | — | — | reserved gap |
| 211 | 0xD3 | — (SflagAny) | — | off | sflag may-alias superset (no pool) |
| 212 | 0xD4 | smem_any | 9 | off | smem may-alias superset |
| 213 | 0xD5 | hbm_any | 10 | off | hbm may-alias superset |
| 214 | 0xD6 | timem | 11 | off | per-tile instruction memory |
| 215 | 0xD7 | simem | 12 | off | SC instruction memory (empty desc) |
| 216 | 0xD8 | iova | 13 | off | I/O virtual address (GB) |
| 217 | 0xD9 | sflag_tile | 14 | off | per-tile sflag bank |
| 218 | 0xDA | spmem_any | 15 | off | spmem may-alias superset |
| 219 | 0xDB | smem_tile (TileSmem) | 16 | off | per-tile SMEM (KB) |
| 220 | 0xDC | mar | 17 | off | memory-access-region (empty desc) |
| 221/222 | 0xDD/0xDE | — | — | — | reserved gap |
| 223 | 0xDF | sflag_scs | 20 | off | per-SCS sflag bank |
| 224 | 0xE0 | smem_scs | 21 | off | per-SCS SMEM (KB) |
| 225 | 0xE1 | — (SflagAnySynctile) | — | off | sflag-any-synctile (no pool) |
| 501 | 0x1F5 | tile_spmem_cb | 18 | ON | CBREG-windowed TILE_SPMEM |
| 502 | 0x1F6 | smem_cb | 19 | off | CBREG-windowed SMEM |
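The table can be transcribed into two dictionaries and cross-checked against the 0x3FFF7F validity mask. This is a sketch of the data only; the dictionary names are illustrative, and the MS 22 entry follows the note on the AS 204 row.

```python
# AS id -> 1-based MemorySpace, for rows of the table that carry an MS#.
AS_TO_MS = {
    0: 1, 201: 2, 202: 3, 203: 4, 204: 5, 205: 6, 208: 7,
    212: 9, 213: 10, 214: 11, 215: 12, 216: 13, 217: 14, 218: 15,
    219: 16, 220: 17, 501: 18, 502: 19, 223: 20, 224: 21,
}

# Reverse direction, plus the one many-to-one entry noted on the AS 204 row.
MS_TO_AS = {ms: as_id for as_id, ms in AS_TO_MS.items()}
MS_TO_AS[22] = 204  # sflag_tc shares AS 204 with sflag (MS 5)

# Every MemorySpace value with a table entry should have its bit set in the mask.
VALID_MASK = 0x3FFF7F
mask_values = {bit + 1 for bit in range(22) if (VALID_MASK >> bit) & 1}
print(sorted(MS_TO_AS) == sorted(mask_values))
```

AS 211 and AS 225 are left out of `AS_TO_MS` because their MS# cell is blank in the table.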
NOTE: the on-tile gate is a single masked compare: IsOffTileMemory(ms) = (ms & ~0x10) != 2 (0x13d7ac00). Clearing bit 4 folds MS 2 (tile_spmem) and MS 18 = 0x12 (tile_spmem_cb) together, so only those two are on-tile; every other pool requires a DMA/stream/sync to reach. This is the predicate the DMA and stream lowerings consult before selecting a data-movement intrinsic, and it is why a TEC needs the tile-id cast to turn an on-tile TileSpmem (201) pointer into an off-tile-addressable Spmem (202) pointer.
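The masked compare is easy to confirm by enumerating the enum range. A one-function Python sketch of the predicate (the function name is a snake_case rendering chosen here):

```python
def is_off_tile_memory(ms: int) -> bool:
    """(ms & ~0x10) != 2: clearing bit 4 folds MS 18 (0x12) onto MS 2."""
    return (ms & ~0x10) != 2


on_tile = [ms for ms in range(1, 23) if not is_off_tile_memory(ms)]
print(on_tile)  # [2, 18]
```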
The *Any may-alias canonicalisation
Four IDs (211 SflagAny, 212 SmemAny, 213 HBMAny, 218 SpmemAny), plus the synthetic 225 SflagAnySynctile, carry a description but name no physical pool: they are alias-analysis groupings the backend widens a pointer to when its exact tile or core is statically unknown. GetAnyTypeFromAddressSpace(int) (0x1357b400) canonicalises a concrete ID to its wildcard; calling it on a leaf or already-wildcard space hits LOG(FATAL), so it is defined only over the concrete spaces below:
| concrete ID (pool) | → canonical ID (superset) |
|---|---|
| 201 tile_spmem, 202 spmem | 218 SpmemAny |
| 203 hbm | 213 HBMAny |
| 204 sflag | 211 SflagAny |
| 205 vmem | 205 vmem (self, no separate wildcard) |
| 219 smem_tile, 0 smem | 212 SmemAny |
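The canonicalisation table translates directly into a partial function. In this sketch the mapping is copied from the table above, the function name is a snake_case rendering chosen here, and `ValueError` stands in for the LOG(FATAL) path.

```python
# Concrete AS id -> canonical *Any superset, copied from the table above.
ANY_OF = {
    201: 218,  # tile_spmem -> SpmemAny
    202: 218,  # spmem      -> SpmemAny
    203: 213,  # hbm        -> HBMAny
    204: 211,  # sflag      -> SflagAny
    205: 205,  # vmem       -> itself (no separate wildcard)
    219: 212,  # smem_tile  -> SmemAny
    0: 212,    # smem       -> SmemAny
}


def get_any_type_from_address_space(as_id: int) -> int:
    """Widen a concrete address space to its may-alias superset."""
    if as_id not in ANY_OF:
        # Leaf or already-wildcard spaces are fatal in the binary.
        raise ValueError(f"address space {as_id} has no *Any canonical form")
    return ANY_OF[as_id]
```

Passing a wildcard such as 218 back in raises, matching the statement that the function is defined only over the concrete spaces.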
This is the SparseCore answer to the fat-pointer problem: a pointer into HBM/SPMEM whose owning tile is a runtime value cannot be proven disjoint from another, so the backend assigns both the *Any superset and lets alias analysis treat them as may-alias. The concrete-vs-Any distinction is what keeps statically-resolved tile-local accesses from being pessimised.
The AS7/8/9 fat-pointer reserve is dead
The TPU DataLayout (@ 0x973de15) carries a p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32 … ni:7:8:9 fragment — the 160/128/192-bit AMDGPU buffer-fat-pointer family — inherited verbatim because the TPU TargetMachine shares LLVM's AMDGPU ABI fragment. No TPU or SparseCore op ever constructs an AS7/8/9 pointer. A SparseCore pointer is at most a 64-bit LLVM ptr (default p:64:64) carrying a 32-bit word offset; the routing a fat pointer would pack into bits rides as separate SSA operands instead (tpu_tileid for on-tile TEC casts, destination-id for remote). The full negative result, the operand-arity split, and the value-preserving addrspacecast lowering are owned by fat-pointers-as789.md.
GOTCHA: do not allocate SparseCore address-space numbers from {7,8,9}. Allocate from {0, 201..225, 501, 502}. The two ranges are disjoint, and a reimplementation that drives off the p7/p8/p9 DataLayout entries will look for a constructor that does not exist.
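A membership check makes the disjointness explicit. This sketch tests band membership only, so it also accepts the reserved gaps inside 201..225 (206/207, 209/210, 221/222); the names are illustrative.

```python
FAT_POINTER_RESERVE = frozenset({7, 8, 9})  # inherited DataLayout entries, never constructed


def is_sparsecore_address_space(n: int) -> bool:
    """True for ids in the SparseCore band: 0, 201..225, 501, 502."""
    return n == 0 or 201 <= n <= 225 or n in (501, 502)


overlap = [n for n in FAT_POINTER_RESERVE if is_sparsecore_address_space(n)]
print(overlap)  # []
```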
Alignment, Geometry, and the Allocator Model
One allocator class, per-tier Config
Every runtime tier (HBM, VMEM, CMEM, SMEM, SFLAG) is managed by its own tpu::BestFitAllocator instance (208 B, ctor 0x1e817500), distinguished only by a 32-byte MemoryAllocator::Config{base_offset, allocatable_range_end, alignment, granule}. There is no HbmAllocator/VmemAllocator/SmemAllocator class and no per-TpuVersion branch inside the allocator: every per-codename divergence is data carried in chip_parts.binarypb and surfaced through the four Config fields. The allocate/deallocate algorithm (boundary-tag SwissTable + size-ordered free RB-tree, best-fit lower_bound, eager bidirectional coalescing, no min-split-remainder) is documented once on hbm-allocator.md.
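To make the "one class, per-tier Config" idea concrete, here is a toy best-fit allocator in Python. It is a sketch under stated assumptions, not the libtpu implementation: it uses dictionaries and linear scans instead of the SwissTable and RB-tree, and it assumes sizes are rounded to lcm(alignment, granule), which this page does not specify. `BestFitSketch` and its methods are hypothetical names; only the four Config field names come from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    base_offset: int
    allocatable_range_end: int
    alignment: int
    granule: int


class BestFitSketch:
    """Toy best-fit allocator: dict-backed, linear scans, eager coalescing."""

    def __init__(self, cfg: Config):
        # Assumption: one rounding quantum satisfies both alignment and granule.
        self.quantum = math.lcm(cfg.alignment, cfg.granule)
        start = self._round_up(cfg.base_offset)
        self.free = {start: cfg.allocatable_range_end - start}  # offset -> size
        self.live = {}                                          # offset -> size

    def _round_up(self, n: int) -> int:
        return -(-n // self.quantum) * self.quantum

    def allocate(self, nbytes: int) -> int:
        size = self._round_up(max(nbytes, 1))
        fits = [(sz, off) for off, sz in self.free.items() if sz >= size]
        if not fits:
            raise MemoryError(f"no free block of {size} bytes")
        block_size, off = min(fits)       # smallest block that is large enough
        del self.free[off]
        if block_size > size:             # split with no minimum remainder
            self.free[off + size] = block_size - size
        self.live[off] = size
        return off

    def deallocate(self, off: int) -> None:
        size = self.live.pop(off)
        nxt = off + size
        if nxt in self.free:              # merge with the following free block
            size += self.free.pop(nxt)
        for prev_off, prev_size in list(self.free.items()):
            if prev_off + prev_size == off:   # merge with the preceding free block
                del self.free[prev_off]
                off, size = prev_off, size + prev_size
                break
        self.free[off] = size
```

Two tiers would then differ only in the Config passed in, for example `BestFitSketch(Config(0, 4096, 16, 16))` for a small word-granular tier; the class itself has no per-tier branch.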
Per-tier alignment / geometry
| Tier | base_offset | alignment | granule | Geometry source |
|---|---|---|---|---|
| HBM | 0 | 1024 B DMA floor (kHbmMinimumDmaAlignment); 16 KiB compile-time (xla_jf_program_hbm_alignment_in_kib) | chip_parts HBM granule | dual-quantum; DMA floor enforced at issue (WritePremappedHbm) + descriptor (SetHbmAddress, fatal) |
| VMEM | 0 | VmemAlignmentBoundaryInBytes() — ChunkBytes (JF) / max(GranuleBytes, VmemWordSizeBytes) (PF/VF/GL) | VmemWordSizeBytes() (Target+0x50C) | ChunkBytes = 4·topology.word_count (0x1d619f40) |
| CMEM | 0 | CmemWordSizeBytes() (Target+0x510, ~16 B PF) | CmemWordSizeBytes() | alignment == granule; Pufferfish only |
| SMEM | 0 | SmemWordSizeBytes() (4 B; Target+0x508) | SmemWordSizeBytes() | word-flat; SmemWordImmPtr asserts word == 4 B |
| SFLAG | 0 | SflagWordSizeBytes() (Target+0x504) | SflagWordSizeBytes() | byte_off = 4·n per flag; log2 cached Target+0x4c8 |
| Host (premapped) | per-partition partition_size·i | 4 KiB if ≤ 2 MiB, else 2 MiB (PickPageAlignment) | = alignment | PremappedMemoryManager over posix_memalign |
| Host (BFC offload) | 0 | ≥ 16 B (posix_memalign) | 2 MiB region growth | tsl::BFCAllocator (256 GiB cap) |
GOTCHA: HBM has two alignment numbers and confusing them silently corrupts a DMA. kHbmMinimumDmaAlignment = 1024 B is the hardware floor: every DMA site masks with & 0x3FF and rejects a non-zero remainder (recoverable RetCheck at issue, fatal CHECK at descriptor). The 16 KiB compile-time figure rounds every program-level HBM tensor up before MSA places it. The 1024-B floor is the wire contract; the 16-KiB rule is the placement contract. See hbm-dma-alignment.md.
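The two quanta can be kept apart with two separate helpers. A minimal sketch using the 1024 B and 16 KiB figures from this page; the helper names are illustrative, and the 16 KiB value is taken as given here rather than read from the flag.

```python
HBM_MIN_DMA_ALIGNMENT = 1024        # kHbmMinimumDmaAlignment, the wire contract
PROGRAM_HBM_ALIGNMENT = 16 * 1024   # the 16 KiB compile-time figure, the placement contract


def is_dma_aligned(addr: int) -> bool:
    """The check every DMA site applies: low 10 bits must be zero."""
    return (addr & 0x3FF) == 0


def round_up_program_size(nbytes: int) -> int:
    """Round a program-level HBM tensor size up to the compile-time quantum."""
    return -(-nbytes // PROGRAM_HBM_ALIGNMENT) * PROGRAM_HBM_ALIGNMENT
```

Any offset that satisfies the 16 KiB placement rule also passes the 1024 B DMA check, but not the reverse, which is why the two must not be swapped.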
NOTE: CMEM, SMEM, and SFLAG set alignment == granule == <tier>WordSizeBytes(), and every on-chip tier (VMEM included) sets base_offset == 0. VMEM takes its granule from VmemWordSizeBytes() but its alignment from VmemAlignmentBoundaryInBytes(), and HBM likewise separates alignment from granule; only the host premapped manager uses a non-zero base_offset. The numeric per-codename word/byte sizes live in chip_parts.binarypb and are not in .text; the formulas and field offsets above are exact, the literals are not asserted. The (sublane, lane) on-chip tile geometry that buffers pad to, (8, 128) on Trillium/v5+ and (16, 128) on v4, is owned by tpu-buffer-layout.md.
Per-generation on-chip bank counts
The one piece of on-chip geometry that is baked into .text (not chip_parts) is the bank count, returned by the per-Target MemBanks(MemorySpace) virtual. The bank index for a byte offset B is (B / <tier>WordSizeBytes) mod MemBanks(tier). Banking is an access-scheduling property, not an allocation property — the allocator hands out byte offsets and the LLO bundle packer derives the (bank, sub-bank) coordinate at issue time. Decompile-confirmed:
| Target (gen) | VMEM (MS 3) | CMEM (MS 4) | SMEM (MS 5) | MemBanks accessor |
|---|---|---|---|---|
| JellyfishTarget (v2) | 8 | — (LogFatal) | 2 | 0x1d48fc80 |
| PufferfishTarget (v4) | 16 | 32 | 8 | 0x1d493900 (qword_B5305C8[ms-3]={16,32,8}) |
| ViperfishTarget (v5p) | 32 | — (LogFatal) | 8 | 0x1d4999c0 |
| GhostliteTarget (v6e) | 32 | — (LogFatal) | 8 | 0x1d4969c0 |
Pufferfish is the only generation where MemBanks(kCmem) returns a value rather than LogFatal — the structural marker that CMEM is a real tier only on Pufferfish (PXC, TPU v4). Viperfish (0x1d4999c0) is graded HIGH (symbol-table body, source-identical to the confirmed Ghostlite shape, not separately re-read).
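The bank formula and the per-target counts combine into a short lookup. In this sketch the bank counts are copied from the table above and the word size is a caller-supplied parameter, since only the SMEM word size (4 B) is stated on this page; the lower-case target keys and the function name are illustrative.

```python
# Bank counts per target and tier, copied from the table above.
MEM_BANKS = {
    "jellyfish":  {"vmem": 8,  "smem": 2},
    "pufferfish": {"vmem": 16, "cmem": 32, "smem": 8},
    "viperfish":  {"vmem": 32, "smem": 8},
    "ghostlite":  {"vmem": 32, "smem": 8},
}


def bank_index(target: str, tier: str, byte_offset: int, word_size_bytes: int) -> int:
    """(B / word_size) mod MemBanks(tier); raises where the binary would LogFatal."""
    banks = MEM_BANKS[target].get(tier)
    if banks is None:
        raise ValueError(f"{tier} has no banks on {target}")
    return (byte_offset // word_size_bytes) % banks
```

For example, SMEM byte offset 36 with 4-byte words is word 9, which falls in bank 1 of 8 on Pufferfish.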
MSA management is VMEM/CMEM-only
Only VMEM (and CMEM on Pufferfish) is MSA-managed — the kAlternate/kDefault tug-of-war that colors HloValues. SMEM is placed by scalar load/store opcode semantics (the operand declares MemorySpace=kSmem); SFLAG is placed out of a fixed number-space partition, never the byte heap. All tiers nonetheless flow through the same ProgramMemoryAllocator → ProgramMemoryMetadata_Allocation proto → CreateFromProto → BestFitAllocator hand-off. A reimplementer who routes SMEM/SFLAG through the MSA cost model will mis-place them.
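The routing rule can be written as a small dispatch. This sketch only restates the paragraph above; the function name, the string labels it returns, and the lower-case tier and target names are illustrative.

```python
def placement_mechanism(space: str, target: str) -> str:
    """Which mechanism places a buffer in the given tier, per the rule above."""
    if space == "vmem" or (space == "cmem" and target == "pufferfish"):
        return "msa"                     # colored by the kAlternate/kDefault decision
    if space == "smem":
        return "opcode"                  # the scalar load/store operand declares kSmem
    if space == "sflag":
        return "number-space partition"  # fixed partition, never the byte heap
    raise ValueError(f"no placement rule stated for {space} on {target}")
```

CMEM on any target other than Pufferfish falls through to the error branch, consistent with CMEM being a real tier only there.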
The buffer-layout sequencer-SMEM label
NOTE: the ShapeSizeBytesRaw (0x1d6add40) untiled-dense branch tests ColorToMemorySpace(layout.memory_space) == 12. The constant 12 is sparse_core_sequencer_sflag in the canonical LLO enum, not sparse_core_sequencer_smem (which is 14, byte-confirmed by MakeSparseCoreSequencerSmemConstant @ 0x1d60bc60); the two names differ by one word, sit two ordinals apart with host = 13 between them, and are easy to mislabel. The branch routes a sparse_core_sequencer_sflag-colored buffer to the dense, untiled byte-size path. ColorToMemorySpace (0x1d6ffb00) is a byte_B5435CA[color] remap with color < 0xA, so its output is the canonical MemorySpace enum: the 12 is an enum value, not a raw layout color.
Cross-References
- memory-space-enum.md: the 17-value LLO MemorySpace enum, the MemorySpaceToString decoder, the proto↔enum remap, and the masked DMA-validity gates; the authority for the ordinals on this page
- overview.md: the six-region taxonomy, the universal BestFitAllocator, and the compile-time→runtime hand-off; its §2 carries the same canonical kNone=0 … kPinnedHbm=16 ordinal assignment used here
- hbm-allocator.md: the universal best-fit allocate/deallocate algorithm shared by every tier
- hbm-dma-alignment.md: the 1024-B DMA floor vs. the 16-KiB compile-time program alignment
- vmem-allocator.md: the kAlternate fast tier; per-gen VMEM size/word/bank/alignment formulas
- cmem-pool.md: the Pufferfish-only constant-memory operand pool; MemBanks(kCmem)=32, the xla_tpu_cmem_* family
- smem-scalar-memory.md: the SPU scalar tier (kSmem=5); SmemWordImmPtr, opcode-driven placement, the BarnaCore SMEM sibling
- smem-register-window.md: why no SMEM register window exists; the flat 32-entry SREG file and CBREG/OperandWindow disambiguation
- sflag-protocol.md: the sync-flag atomic tier (kSflag=6); the 4·n stride, counter/done-bit semantics, the Vsync*/Vwait* primitives
- tpu-buffer-layout.md: how a logical XLA buffer maps to padded, tiled physical offsets in these tiers (its §4 sequencer-SMEM 12 label is the sparse_core_sequencer_sflag enum value, not SMEM; see the note above)
- address-space-ids.md: the full SparseCore AS-ID table, the *Any may-alias canonicalisation, and CheckAddressSpaces
- fat-pointers-as789.md: the dead AS7/8/9 fat-pointer reserve and the actual 64-bit/32-bit-word SparseCore pointer representation
- chip-parts-binarypb.md: the boot-time resource that supplies the per-codename size/word/granule literals absent from .text
- per-gen-comparison-matrix.md: the per-generation feature/geometry comparison this table feeds