getSequencerType — SCS / TAC / TEC Engine Selection
Every function address, enum value, attribute-string byte pattern, and routing constant on this page was read from libtpu.so in the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d; build libtpu_lts_20260413_b_RC00): from the decompiled C++ of the named functions, the demangled symbol table, the embedded proto-descriptor strings, and the .rodata jump table at off_22010DE0. Other versions differ.
Abstract
The SparseCore back-end places every lowered op onto exactly one of the three SparseCore sub-engines — SCS (scalar control), TAC (tile-access / DMA), or TEC (vector compute) — and the binary records this placement as a single per-function string attribute named sc.sequencer. There is no monolithic "selector" that reasons about an op and returns an engine; instead the decision is split across three layers that this page documents end to end:
- A policy classifier, GetTransferKind (@0x1351b140), that decides whether an off-tile data movement is a Stream (indirect gather/scatter) or a DMA (bulk contiguous) transfer, keyed on the source/destination memory-space pair plus one SparseCore capability bit.
- A region outliner (LowerSequencerFunctionsPass / OutlineSequencerFunction) that stamps each per-engine outlined func.func with sc.sequencer = "scs" / "access" / "execute".
- A trivial attribute accessor, LowerMemrefToMlo::getSequencerType (@0x13507760), that reads that string back so later passes pick the matching per-engine bundle codec (SparseCoreScsCodecBase / …TacCodecBase / …TecCodecBase).
Sitting underneath all of this is the tpu::TpuSequencerType C++ enum and its TpuSequencerTypeToString jump table — the runtime numbering used to size per-engine resource arrays. That C++ enum numbers SparseCore engines {SCS=3, TAC=4, TEC=5}, identical to the codec template-parameter numbering. The off-by-one peer is the protobuf enum TpuSequencerTypeProto, which inserts an INVALID=0 slot and so numbers {SCS=4, TAC=5, TEC=6}; a TpuSequencerTypeFromProto switch subtracts one to cross from proto into the C++ enum. This page reconciles all three.
The reimplementation contract:
- The op-level engine tag is a string, not a number. Every op's engine membership is the sc.sequencer StringAttr on its enclosing outlined function, one of exactly three byte-confirmed values "scs" / "access" / "execute". A reimplementer must route on the string; the two numeric TpuSequencerType enums never appear at the op.
- getSequencerType is an accessor, not a decision. LowerMemrefToMlo::getSequencerType(Operation&) returns optional<StringRef>: it reads sc.sequencer (via inherent-attr then dictionary-attr fallback) and returns its value, or nullopt. The decision was made upstream by GetTransferKind + outlining.
- Stream-vs-DMA is decided by memory spaces + one capability bit. GetTransferKind normalizes each mlir::sparse_core::MemorySpace, jump-tables on the source space, and sets kStream only when both endpoints are local and the destination space passes the per-source test: the HBM/SPMEM/TILE_SPMEM gather set (bitmasks 0x210018 / 0x210004) for the HBM sources, with a few routes additionally gated on the SparseCore-variable capability (vtable[+0xa0], =0 in this wheel); otherwise kDma, with an InvalidArgument diagnostic for an illegal pair.
- Two numberings, off by one, but the split is proto-vs-C++, not runtime-vs-codec. The C++ tpu::TpuSequencerType enum (the one TpuSequencerTypeToString renders and the one the codec EncoderBase non-type template parameter uses) numbers {TC=0, BARNA=1, BARNA_ADDR=2, SCS=3, TAC=4, TEC=5}: there is no INVALID slot at index 0, and SCS is 3, the same as the codec template. The off-by-one peer is the protobuf enum TpuSequencerTypeProto, which does begin with INVALID=0 and so numbers {TC=1, BARNA=2, BARNA_ADDR=3, SCS=4, TAC=5, TEC=6, SCv0=7/8}. TpuSequencerTypeFromProto (@0x20b36300) is the one-subtracting bridge (proto 4 → C++ 3, etc.). The gfc (6acc60406) family carries codec params 3 and 5 only; its SparseCoreTacCodecBase is entirely absent.
| What it is | The SparseCore op→engine assignment mechanism (SCS / TAC / TEC) |
| Op-level tag | sc.sequencer StringAttr (12-char name); values "scs" / "access" / "execute" |
| Accessor | LowerMemrefToMlo::getSequencerType @0x13507760 → optional<StringRef> |
| Policy classifier | xla::tpu::sparse_core::GetTransferKind @0x1351b140 (kStream vs kDma) |
| Outliner | LowerSequencerFunctionsPass::runOnOperation @0x13532120; OutlineSequencerFunction |
| Enum | tpu::TpuSequencerType; TpuSequencerTypeToString @0x20b362e0 over off_22010DE0 |
| C++ enum (ToString + codec) | TC=0 · BARNA=1 · BARNA_ADDR=2 · SCS=3 · TAC=4 · TEC=5 (no INVALID slot) |
| Codec template enum | SCS=3 · TAC=4 · TEC=5 (EncoderBase<…, TpuSequencerType=N>) — same as the C++ enum |
| Proto enum (off-by-one peer) | TpuSequencerTypeProto: INVALID=0 · TC=1 · BARNA=2 · BARNA_ADDR=3 · SCS=4 · TAC=5 · TEC=6 · SCv0=7/8 |
| 6acc60406 (gfc) | No TAC: gfc::…SparseCoreTacCodecBase = 0 files; only codec params 3 & 5 |
| Confidence | CONFIRMED (function-byte / symbol / string-table anchored) unless a row says otherwise |
For the engine roles themselves see SparseCore Overview and Architecture; for what the outliner produces see Region → Sequencer Outliner.
The Three Layers at a Glance
A single off-tile memory op (tpu.enqueue_dma, tpu.enqueue_indirect_dma) traverses three decision layers before it lands in a sequencer-specific bundle:
tpu.enqueue_dma / tpu.enqueue_indirect_dma (TC-framework op, has memref operands)
│
▼ LowerMemrefToMlo::lowerEnqueueDma @0x135105a0
LowerMemrefToMlo::lowerEnqueueIndirectDma @0x13511da0
│
▼ getTransferKind<EnqueueDMAOp> @0x135114a0 → GetTransferKind @0x1351b140
│ (srcMemSpace, dstMemSpace, local/remote bits, capability)
│
┌───┴────────────────────────┐
kStream ([result+8]=1) kDma ([result+8]=0 / InvalidArgument)
gather/scatter slot SparseCoreDma bulk slot
│ │
▼ ▼
emitted into a TileTask region (Access vs Execute)
│
▼ LowerSequencerFunctionsPass / OutlineSequencerFunction
stamps the outlined func.func with:
sc.sequencer = "scs" (control sequencer → SCS)
sc.sequencer = "access" (tile-fetch / gather → TAC) [VF/GL only]
sc.sequencer = "execute" (vector compute → TEC)
│
▼ later passes:
LowerMemrefToMlo::getSequencerType(op) @0x13507760 → reads sc.sequencer
→ select per-engine codec: SparseCoreScsCodecBase / …Tac… / …Tec…
Layer 1 (GetTransferKind) answers "is this a gather/scatter or a bulk copy?". Layer 2 (the outliner) answers "which engine's program does this op belong to?" and writes the answer as a string. Layer 3 (getSequencerType) is the read-back that drives codec selection. The three numeric TpuSequencerType values exist only at the resource-sizing layer (per-engine bundle-limit tables), never at the op.
NOTE: there is no getSequencerType that returns SCS/TAC/TEC from an op's opcode. A reimplementer expecting a switch(op.kind) selector will not find one. The mapping op→engine is fully materialized as the sc.sequencer string on the outlined function, and ops inherit their engine from the function they were outlined into (enforced by the ParentFuncHasCoreSequencerTypeAttribute trait, below). getSequencerType only re-reads that string.
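As a minimal sketch of routing on the string rather than on a number, the helper below maps the three sc.sequencer values to an engine tag. The Engine enum and the EngineFromSequencerAttr name are hypothetical, introduced only for illustration; neither is a symbol from the binary.

```cpp
#include <optional>
#include <string_view>

// Hypothetical engine tag for illustration; not a type from the binary.
enum class Engine { kScs, kTac, kTec };

// Maps an sc.sequencer value to its engine. Any other spelling (including
// a would-be SCv0 tag) yields nullopt, mirroring the three-value contract.
constexpr std::optional<Engine> EngineFromSequencerAttr(std::string_view value) {
  if (value == "scs") return Engine::kScs;          // scalar control
  if (value == "access") return Engine::kTac;       // tile access
  if (value == "execute") return Engine::kTec;      // vector compute
  return std::nullopt;
}
```

A caller would pass the value read back by the accessor and pick the per-engine codec from the returned tag; an empty result means the function carries no usable engine tag.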
Layer 3: The getSequencerType Accessor
The named function is the simplest of the three layers and the one this page is titled after. Decompiled (@0x13507760), it is a string-attribute getter returning a 17-byte optional<StringRef> (8-byte data ptr, 8-byte length, 1-byte present flag):
// mlir::tpu::LowerMemrefToMlo::getSequencerType(this=result, op) @0x13507760
// result layout: [0]=StringRef.data, [8]=StringRef.size, [16]=present
optional<StringRef> getSequencerType(Operation& op) {
// 1. fast path: inherent attr if the op's registered-info bit is set
// (*((u32*)op + 11) >= 0x1000000) AND getInherentAttr succeeds
Attribute a = op.getInherentAttr("sc.sequencer", /*len=*/12);
// 2. fallback: dictionary attr lookup on the op's attr dict (op + 56)
if (!a) a = op.getDiscardableAttrDictionary().get("sc.sequencer", 12);
// 3. must be a StringAttr (TypeID check against StringAttr::id)
if (a && typeid(a) == StringAttr::id) {
result = { StringAttr::getValue(a), /*present=*/1 }; // "scs"/"access"/"execute"
} else {
result = { /*present=*/0 }; // nullopt
}
return result;
}
Two structural facts to preserve:
- The attribute name is the 12-character string "sc.sequencer" (the literal and length 12 are baked into the getInherentAttr(a2, "sc.sequencer", 12) call in the decompiled body). The identical name+length pair appears in HasCoreSequencerTypeAttribute and HasExecuteSequencerTypeAttribute (below), confirming all three read the same attribute.
- The two-step inherent→dictionary lookup mirrors MLIR's split between inherent attributes (declared on the op definition) and discardable dictionary attributes. The accessor accepts either, so the outliner may attach sc.sequencer through whichever path is convenient for the op kind.
| Property | Value |
|---|---|
| Function VA | 0x13507760 |
| Attribute name / length | "sc.sequencer" / 12 |
| Return type | optional<StringRef> (data, size, present-byte at +16) |
| Lookup order | inherent attr, then discardable dictionary attr |
| Type guard | StringAttr TypeID (TypeIDResolver<StringAttr>::id) |
| Returned values | "scs" / "access" / "execute" |
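The lookup order and type guard in the table above can be modeled without MLIR. The sketch below is a stand-in under stated assumptions: ToyOp, ToyAttr, and ToyAttrMap are hypothetical types invented for this example, with a std::variant playing the role of the attribute and its string alternative playing the role of StringAttr.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <variant>

// Toy attribute: a string stands in for StringAttr, a long for any other kind.
using ToyAttr = std::variant<std::string, long>;
// std::less<> enables lookup by string_view without building a std::string.
using ToyAttrMap = std::map<std::string, ToyAttr, std::less<>>;

// Toy operation holding the two attribute stores the accessor consults.
struct ToyOp {
  ToyAttrMap inherent;
  ToyAttrMap discardable;
};

// Returns the sc.sequencer value, or nullopt if absent or not a string.
// The returned view points into `op`, so `op` must outlive it.
std::optional<std::string_view> GetSequencerType(const ToyOp& op) {
  constexpr std::string_view kName = "sc.sequencer";
  static_assert(kName.size() == 12, "name length baked into the lookup");

  const ToyAttr* attr = nullptr;
  if (auto it = op.inherent.find(kName); it != op.inherent.end()) {
    attr = &it->second;  // 1. inherent attribute wins when present
  } else if (auto jt = op.discardable.find(kName); jt != op.discardable.end()) {
    attr = &jt->second;  // 2. dictionary fallback
  }
  if (attr == nullptr) return std::nullopt;
  if (const auto* s = std::get_if<std::string>(attr)) {  // 3. type guard
    return std::string_view(*s);
  }
  return std::nullopt;
}
```

As in the decompiled body, an inherent attribute of the wrong kind does not trigger the dictionary fallback; the fallback runs only when the inherent lookup finds nothing.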
The Attribute Values — Byte-Confirmed
The three legal sc.sequencer values are not just strings in a table — two of them are matched by dedicated predicate functions whose decompiled byte-comparisons pin the exact spelling and length. ScDialect::HasCoreSequencerTypeAttribute (@0x14599ec0) and ScDialect::HasExecuteSequencerTypeAttribute (@0x1459a020) both reuse the same sc.sequencer (len-12) accessor, then compare the StringAttr value against a length and a packed byte literal:
// HasCoreSequencerTypeAttribute @0x14599ec0 — value == "scs"
if (len == 3)
return ( (*(u16*)v ^ 0x6373) | (*(u8*)(v+2) ^ 0x73) ) == 0; // 's','c' | 's'
// 0x6373 LE = bytes {0x73='s', 0x63='c'}; v[2]=0x73='s' → "scs"
// HasExecuteSequencerTypeAttribute @0x1459a020 — value == "execute"
if (len == 7)
return ( (*(u32*)v ^ 0x63657865) | (*(u32*)(v+3) ^ 0x65747563) ) == 0;
// 0x63657865 LE = {'e','x','e','c'}; (v+3) 0x65747563 LE = {'c','u','t','e'}
// overlapping at offset 3 → "exec"+"cute" = "execute"
Decoding the little-endian masks:
| sc.sequencer value | Engine | Length | Byte-literal evidence | Predicate |
|---|---|---|---|---|
| "scs" | SCS (scalar control) | 3 | 0x6373="sc", 0x73="s" | HasCoreSequencerTypeAttribute @0x14599ec0 |
| "execute" | TEC (vector compute) | 7 | 0x63657865="exec", 0x65747563="cute" | HasExecuteSequencerTypeAttribute @0x1459a020 |
| "access" | TAC (tile-access / DMA) | 6 | no dedicated Has* predicate | none |
GOTCHA: "access" has no dedicated predicate. SCS and TEC each get a Has…SequencerTypeAttribute test because the SC-MLO pipeline operates on the SCS↔TEC boundary; the third value "access" (TAC) carries no Has* function in this binary. This is consistent with the 6acc60406 (gfc) family having dropped TAC altogether: on the newest gen the work that would land in an "access" function is folded into the "execute" function (see TAC Engine and Region → Sequencer Outliner). A reimplementer that only models the SCS/TEC pair will produce correct 6acc60406 code; the "access" value is needed only for Viperfish/Ghostlite.
The parent-function trait
The sc.sequencer attribute lives on the outlined function, not on individual ops. Ops that require it to exist carry the OpTrait::ParentFuncHasCoreSequencerTypeAttribute trait (verified for TileTaskWaitOp at @0x14689880; the same trait is attached to the TileTask family). The shared check is ParentHasSequencerTypeAttribute (@0x1353e980):
// ParentHasSequencerTypeAttribute @0x1353e980
// walk parent ops until the enclosing LLVMFuncOp, then test BOTH predicates
for (op = start; ; op = op->getBlock()->getParentOp()) {
if (!op) return false;
if (typeid(*op) == LLVMFuncOp::id) break; // reached the outlined func
}
h_core = HasCoreSequencerTypeAttribute(func); // "scs"?
h_exec = HasExecuteSequencerTypeAttribute(func); // "execute"?
// require BOTH predicate calls to have evaluated (present bit 0x100 set on each),
// then return their OR of the low (match) bits
return (h_core & 0x100) && (h_exec & 0x100) ? (h_core | h_exec) & 1 : false;
So the trait climbs the op tree to the enclosing LLVM::LLVMFuncOp and asserts that function is tagged either "scs" or "execute". This is the binary's enforcement that every TileTask op runs inside a function whose engine is known at verify time — engine membership is a function-scoped property, not a per-op field.
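The two value predicates and the parent walk can be sketched together in portable C++. This is an illustration under assumptions, not the binary's code: ToyNode and the IsScs / IsExecute / Load16LE / Load32LE helpers are hypothetical names, and the loads are assembled byte by byte so the packed-literal compares hold on any host rather than relying on unaligned little-endian reads.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Byte-wise little-endian loads, independent of host endianness and alignment.
constexpr std::uint32_t Byte(std::string_view v, std::size_t i) {
  return static_cast<unsigned char>(v[i]);
}
constexpr std::uint32_t Load16LE(std::string_view v, std::size_t i) {
  return Byte(v, i) | (Byte(v, i + 1) << 8);
}
constexpr std::uint32_t Load32LE(std::string_view v, std::size_t i) {
  return Load16LE(v, i) | (Load16LE(v, i + 2) << 16);
}

// value == "scs": 16-bit compare on bytes 0..1, 8-bit compare on byte 2.
constexpr bool IsScs(std::string_view v) {
  return v.size() == 3 &&
         ((Load16LE(v, 0) ^ 0x6373u) | (Byte(v, 2) ^ 0x73u)) == 0;
}

// value == "execute": two 32-bit compares that overlap at offset 3.
constexpr bool IsExecute(std::string_view v) {
  return v.size() == 7 &&
         ((Load32LE(v, 0) ^ 0x63657865u) | (Load32LE(v, 3) ^ 0x65747563u)) == 0;
}

static_assert(IsScs("scs") && IsExecute("execute") && !IsExecute("access"));

// Toy op-tree node; `is_func` marks the enclosing outlined function.
struct ToyNode {
  const ToyNode* parent = nullptr;
  bool is_func = false;
  std::optional<std::string> sequencer;  // sc.sequencer value, if present
};

// Walks up to the enclosing function and accepts only "scs" or "execute".
bool ParentHasSequencerTypeAttribute(const ToyNode* op) {
  for (; op != nullptr; op = op->parent) {
    if (op->is_func) {
      return op->sequencer.has_value() &&
             (IsScs(*op->sequencer) || IsExecute(*op->sequencer));
    }
  }
  return false;  // no enclosing function found
}
```

Note that a function tagged "access" fails this check in the sketch, which matches the absence of a third predicate in the decompiled trait.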
Layer 1: GetTransferKind — Stream vs DMA
Before an op can be outlined into an engine, the lowering must decide whether it is a Stream (indirect gather/scatter, the embedding datapath) or a DMA (contiguous bulk move). That is xla::tpu::sparse_core::GetTransferKind (@0x1351b140), reached from LowerMemrefToMlo::lowerEnqueueDma (@0x135105a0) and lowerEnqueueIndirectDma (@0x13511da0) via the typed wrappers getTransferKind<EnqueueDMAOp> (@0x135114a0) and getTransferKind<WaitDMA2Op> (@0x135145e0).
Its signature (demangled): GetTransferKind(const jellyfish::Target&, mlir::sparse_core::MemorySpace src, MemorySpace dst, bool, bool, bool, bool) returning FailureOr<TransferKind>. The decompiled body:
// GetTransferKind @0x1351b140 (args: target a2; src a3; dst a4;
// a5=src-local, a6=dst-local, a7=capability-allowed-flag, a8=strict-ordering)
// 1. normalize spmem: a space encoded as 1 maps to (16 if a7 else 21)
if (src == 1) src = 5*(a7 ^ 1) + 16; // → 16 (cap) or 21 (no cap)
if (dst == 1) dst = 5*(a7 ^ 1) + 16;
// 2. kStream only when BOTH endpoints are local (a6 & a5 == 1)
if ((a6 & a5) == 1) {
switch (src) { // jump table on the source memory space
case 2: /* HBM */ ... if (dst<=0x15 && bittest(0x210018, dst)) ok; // 2162712
else if (dst==6) ok only if target.vtable[+0xa0]() // SupportsScVar
case 3: /* HBM_4B*/ ... if (dst<=0x15 && bittest(0x210004, dst)) ok; // 2162692
case 4: case 21: ... if (dst==2) ok; // → HBM
case 6: ... ok only if a7 && dst==2 && target.vtable[+0xa0]()
case 16: /* SPMEM */ ... if (a7 && ((dst-2)&~2)==0) ok; // dst in {2,4}
default: goto kDma;
}
result.kind = kStream; result.present = 1; // [this+8]=1; [this]=1
return;
}
// 3. otherwise kDma — but only for a recognized legal contiguous pair;
// an unrecognized pair builds an InvalidArgument status
kDma:
if (legal_dma_pair(src, dst, a6)) { result.kind = kDma; [this]=1; return; }
// diagnostic (transfer_emitter.cc:196):
return InvalidArgument(
"SparseCore does not support transfers with %s ordering from %s %v to %s %v "
"issued %sfrom TEC.", ordering, srcLocality, src, dstLocality, dst, fromTec);
The routing constants matter for reimplementation:
| Mechanism | Value | Meaning |
|---|---|---|
| spmem normalization | src/dst==1 → 5*(¬cap)+16 | encoded 1 → 16 (cap) or 21 (no cap) |
| both-local gate | (dst_local & src_local) == 1 | kStream requires both endpoints local |
| HBM dst-set bitmask | 0x210018 (2162712) over dst | gatherable destinations from HBM source |
| HBM_4B dst-set bitmask | 0x210004 (2162692) over dst | gatherable destinations from HBM_4B |
| capability slot | target vtable +0xa0 | the SupportsScVar predicate |
| kStream result | [result+0]=1, [result+8]=1 | FailureOr success + kind=kStream |
| kDma result | [result+0]=1, [result+8]=0 | FailureOr success + kind=kDma |
| illegal-pair diag | transfer_emitter.cc:196 InvalidArgument | "SparseCore does not support transfers…" |
NOTE: the capability bit is the SupportsScVar predicate, and it is 0 in this wheel. The cases that gate on target.vtable[+0xa0]() (src==6 and the dst==6 sub-branch of HBM) call a virtual method on the SparseCoreTarget; the cross-references resolve this slot to SupportsScVar (Ghostlite 0x1d499340, Viperfish 0x1d49c7e0), which returns 0 for every generation shipped here. So those capability-gated Stream routes are compiled out in this build: they fall through to the kDma path. A reimplementer targeting these chips must treat SupportsScVar as false.
GOTCHA: the diagnostic says "from TEC", confirming the kDma/Stream split is TEC-centric. The transfer_emitter.cc:196 message ("…issued %sfrom TEC") and the MemorySpace operands show the classifier is reasoning about transfers issued from the TEC vector engine. This ties directly to IndirectVregStream being a TEC-only Stream form: the gather/scatter datapath is anchored on TEC, and GetTransferKind is the gate that decides whether an off-tile move uses that datapath (kStream) or a plain bulk descriptor (kDma).
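The Stream half of the classifier can be sketched from the routing constants alone. The code below covers only the kStream gate; the legal-pair test on the kDma side and the InvalidArgument path are left out because the excerpt does not show them. Function and parameter names are hypothetical, memory spaces are plain integers in the encoding used above, and the sketch assumes that a failed destination test inside a case leaves the Stream route, which the elided case bodies do not confirm.

```cpp
#include <cstdint>

// Normalizes the space encoded as 1: 16 when the capability flag is set,
// 21 otherwise. Every other encoding passes through unchanged.
constexpr unsigned NormalizeSpace(unsigned space, bool cap) {
  return space == 1 ? 5u * (cap ? 0u : 1u) + 16u : space;
}

// True when bit `dst` is set in `mask`; destinations above 0x15 never match.
constexpr bool InDstSet(std::uint32_t mask, unsigned dst) {
  return dst <= 0x15 && ((mask >> dst) & 1u) != 0;
}

// Stream gate only: true if (src, dst) takes the kStream route.
// `supports_sc_var` models the vtable[+0xa0] call (false in this wheel).
constexpr bool IsStreamRoute(unsigned src, unsigned dst, bool src_local,
                             bool dst_local, bool cap, bool supports_sc_var) {
  src = NormalizeSpace(src, cap);
  dst = NormalizeSpace(dst, cap);
  if (!(src_local && dst_local)) return false;  // both-local gate
  switch (src) {
    case 2:   // HBM source
      return InDstSet(0x210018u, dst) || (dst == 6 && supports_sc_var);
    case 3:   // HBM_4B source
      return InDstSet(0x210004u, dst);
    case 4:
    case 21:
      return dst == 2;
    case 6:
      return cap && dst == 2 && supports_sc_var;
    case 16:  // SPMEM source: dst in {2, 4}
      return cap && ((dst - 2u) & ~2u) == 0;
    default:
      return false;
  }
}
```

With supports_sc_var held at false, the src==6 case and the dst==6 branch of the HBM case never select Stream, which is the behavior the NOTE above describes for this wheel.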
The full memory-space enum and the per-(core,mem) DMA-destination resolution are out of scope here; see Stream Gather/Scatter for the Stream-slot descriptor and SC Backend Pipeline for where these lowerings run in the pass order.
Layer 2: Region Outlining — Where sc.sequencer Is Written
GetTransferKind's kStream/kDma result, together with the op's data dependencies, determines which TileTask region an op is emitted into. LowerSequencerFunctionsPass::runOnOperation (@0x13532120) then outlines each region into a standalone LLVM::LLVMFuncOp and stamps it with the sc.sequencer string. The decompiled pass body is large (it also builds per-engine parameter tables via GetParameterTable @0x13534ec0 and loads HBM pointers via LoadPointersFromHbm @0x13536c40); the engine-tagging step is the OutlineSequencerFunction callback that attaches the StringAttr.
The mapping the outliner produces:
| TileTask region | sc.sequencer | Engine | Carries | Present on |
|---|---|---|---|---|
| Control / sequencer | "scs" | SCS | program counter, addressing, sync-flag/atomic issue, SCS Stream/DMA slots | vfc · glc · gfc |
| Access | "access" | TAC | tile-fetch DMA issue, gather-stream issue, address staging | vfc · glc only |
| Execute | "execute" | TEC | vector reductions, pack/unpack, the TEC Stream slot (incl. IndirectVreg) | vfc · glc · gfc |
NOTE: the MLIR tile-task pipeline emits only "scs" and "execute"; "access" is a codec/proto-path engine. The "access" row above is the engine the TpuSequencerType=4 TAC codec serves, and it is real on Viperfish/Ghostlite. But decompilation of the outlining callback (0x136066e0) shows it references only the "execute" value string (@0x8681624, 7 chars) and sc.sequencer: it stamps tile bodies "execute" unconditionally on every gen, and never writes "access" (there is no HasAccessSequencerTypeAttribute predicate, and no length-6 "access" compare anywhere in the lowering chain). The TAC engine is reached through the legacy ProgramWrapper.tac proto field and the standalone SparseCoreTacCodecBase, not through the MLIR tile-task outliner. So on VF/GL the MLIR path also folds gather/scatter into the "execute" (TEC) function via the TEC Stream slot, the same shape as 6acc60406, which additionally has no TAC codec at all. See the TEC engine for the full byte-level account, and IndirectVregStream for the TEC-exclusive Stream form that anchors this.
The exact per-op rule that chooses the Access region versus the Execute region for a given lowered op was not bit-traced in this analysis (the GetTransferKind result plus the op's tile-data dependencies feed it). It is flagged LOW here and owned by Region → Sequencer Outliner.
The TpuSequencerType Enum and Its Jump Table
Underneath the string mechanism is the numeric tpu::TpuSequencerType enum, used to size and index per-engine resource tables (e.g. bundle limits). It is rendered to text by TpuSequencerTypeToString (@0x20b362e0), which is a pure jump table:
// tpu::TpuSequencerTypeToString(unsigned a1) @0x20b362e0
__int64 TpuSequencerTypeToString(unsigned a1) {
return (__int64) *(&off_22010DE0 + a1); // indexed array of C-string pointers
}
The off_22010DE0 table indexes directly into the string-table literals confirmed in .rodata (resolved through the array's R_X86_64_RELATIVE relocations). Their order fixes the C++ runtime numbering — and the first entry is the TensorCore sequencer, not an INVALID placeholder:
| C++ enum value (table index) | off_22010DE0[i] string literal | Short | Bundle |
|---|---|---|---|
| 0 | "TensorCoreSequencer" | TC | — |
| 1 | "BarnaCoreSequencer" | Barna | — |
| 2 | "BarnaCoreAddressHandler" | Barna-AH | — |
| 3 | "SparseCoreSequencer" | SCS | 32 B |
| 4 | "SparseCoreTileAccessCoreSequencer" | TAC | 64 B |
| 5 | "SparseCoreTileExecuteCoreSequencer" | TEC | 64 B |
The off_22010DE0 array has exactly these six SparseCore/TensorCore/BarnaCore entries; index 6 already belongs to an adjacent unrelated string array ("IMEM"). So the C++ enum carries no INVALID slot and no SCv0 entries — SCv0 lives only in the protobuf enum (below). Resolving the relocations: index 0 → 0x85b767c "TensorCoreSequencer", index 3 → 0x85b76b3 "SparseCoreSequencer", index 5 → 0x85b7690 "SparseCoreTileExecuteCoreSequencer".
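A reimplementation of the name table might look like the sketch below. The six strings and their order are the ones listed above; the bounds check is an addition of this sketch (the decompiled function indexes the array directly with no range test), and the kSequencerNames / SequencerName identifiers are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

// Index == C++ TpuSequencerType value; there is no INVALID slot at index 0.
inline constexpr std::array<std::string_view, 6> kSequencerNames = {
    "TensorCoreSequencer",                 // 0  TC
    "BarnaCoreSequencer",                  // 1  Barna
    "BarnaCoreAddressHandler",             // 2  Barna-AH
    "SparseCoreSequencer",                 // 3  SCS
    "SparseCoreTileAccessCoreSequencer",   // 4  TAC
    "SparseCoreTileExecuteCoreSequencer",  // 5  TEC
};

// Bounds-checked lookup; values past TEC (5) return nullopt instead of
// reading into whatever table happens to follow in memory.
constexpr std::optional<std::string_view> SequencerName(std::size_t value) {
  if (value >= kSequencerNames.size()) return std::nullopt;
  return kSequencerNames[value];
}
```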
The off-by-one: C++/codec enum vs the protobuf enum
The codec layer and the TpuSequencerTypeToString layer share the same C++ enum. The per-engine codecs are template-parameterized on TpuSequencerType as a non-type template argument — EncoderBase<…SparseCore{Scs,Tac,Tec}CodecBase…, TpuSequencerType=N> — and there the values are {SCS=3, TAC=4, TEC=5}, exactly matching the off_22010DE0 indices above (nm shows the demangled template literals (TpuSequencerType)3 ×32 and (TpuSequencerType)4 ×16; codec-metadata BundleSizeBytes(TpuVersion, TpuSequencerType) @0x1ecf7180 consumes the same C++ enum). There is no off-by-one between codec and runtime — both are {SCS=3, TAC=4, TEC=5}.
The off-by-one is against a third numbering: the protobuf enum TpuSequencerTypeProto (descriptor TpuSequencerTypeProto in .rodata, full TPU_SEQUENCER_TYPE_* literals), which does begin with INVALID=0 and numbers {INVALID=0, TC=1, BARNA=2, BARNA_ADDR=3, SCS=4, TAC=5, TEC=6, SCv0=7, SCv0-AH=8}. The proto→C++ bridge is tpu::TpuSequencerTypeFromProto (@0x20b36300), whose switch maps proto 1→0, 2→1, 3→2, 4→3, 5→4, 6→5 (a literal subtract-one over the SparseCore block) and rejects proto 7/8 (SCv0) as Invalid sequencer type — confirming SCv0 has no C++ enum value at all:
| Engine | sc.sequencer string | C++ enum (ToString + codec) | Proto enum |
|---|---|---|---|
| SCS | "scs" | 3 | 4 |
| TAC | "access" | 4 (vfc/glc only) | 5 |
| TEC | "execute" | 5 | 6 |
GOTCHA: the +1 is at the proto boundary, not the codec boundary. A protobuf TpuSequencerTypeProto value is one greater than the C++ TpuSequencerType (and codec template) value for the same engine, because only the proto enum reserves INVALID=0. TpuSequencerTypeFromProto @0x20b36300 is the sanctioned conversion site (subtract one). A reimplementer that feeds a proto ordinal directly into a codec template selector will pick the wrong engine; one that feeds the C++ enum straight through is correct. The op-level assignment uses neither number; it uses the sc.sequencer string.
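The proto-to-C++ bridge reduces to a range check and a subtraction. The sketch below uses hypothetical enum and function names (ProtoSequencer, CppSequencer, FromProto) with the numberings given above. It assumes that INVALID=0 is rejected the same way as the SCv0 values 7 and 8; the source only states the rejection for 7/8.

```cpp
#include <optional>

// Proto-side numbering: INVALID occupies slot 0.
enum class ProtoSequencer : int {
  kInvalid = 0, kTc = 1, kBarna = 2, kBarnaAddr = 3,
  kScs = 4, kTac = 5, kTec = 6,
  kScV0Sequencer = 7, kScV0AddressHandler = 8,
};

// C++-side numbering: shared by the name table and the codec template.
enum class CppSequencer : int {
  kTc = 0, kBarna = 1, kBarnaAddr = 2, kScs = 3, kTac = 4, kTec = 5,
};

// Subtract-one conversion for proto values 1..6; everything else is
// rejected (nullopt stands in for the "Invalid sequencer type" error).
constexpr std::optional<CppSequencer> FromProto(ProtoSequencer proto) {
  const int v = static_cast<int>(proto);
  if (v >= 1 && v <= 6) return static_cast<CppSequencer>(v - 1);
  return std::nullopt;
}
```

Keeping the two enums as distinct types means the compiler rejects any attempt to index a C++-keyed table with a raw proto ordinal, which is the mistake the GOTCHA warns about.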
Codec-base presence confirms the missing TAC on gfc
Counting decompiled SparseCore{Scs,Tac,Tec}CodecBase instantiation files per family namespace directly confirms the TAC-removal that the off-by-one table implies (gfc carries codec params 3 and 5 only, never 4):
| Family ns | Gen | SparseCoreScsCodecBase files | SparseCoreTacCodecBase files | SparseCoreTecCodecBase files |
|---|---|---|---|---|
| vfc | Viperfish | 13 | 13 | 13 |
| glc | Ghostlite | 30 | 30 | 30 |
| gfc | 6acc60406 | 31 | 0 | 32 |
The 6acc60406 (gfc) namespace has zero SparseCoreTacCodecBase files against 13/30 for Viperfish/Ghostlite, while SCS and TEC codec bases are present. There is no codec template parameterized on TpuSequencerType=4 in gfc, so the runtime can never select a TAC engine on 6acc60406, and the "access" sequencer value is unreachable there — exactly the folding documented in Layer 2.
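One way to encode the presence table in a reimplementation is a per-family availability predicate. The sketch below is hypothetical (the Family enum and HasSparseCoreCodec are invented names) and covers only the three SparseCore codec parameters; TensorCore and BarnaCore parameters are outside its scope and simply report false.

```cpp
// Family namespaces from the table above.
enum class Family { kVfc, kGlc, kGfc };

// True when the family ships a SparseCore codec for the given
// TpuSequencerType template parameter (3=SCS, 4=TAC, 5=TEC).
constexpr bool HasSparseCoreCodec(Family family, unsigned param) {
  switch (param) {
    case 3:  // SCS: present on every family
    case 5:  // TEC: present on every family
      return true;
    case 4:  // TAC: zero codec-base files in gfc
      return family != Family::kGfc;
    default:
      return false;  // not a SparseCore codec parameter
  }
}
```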
SCv0 — Enum-Only
The two trailing proto values (TPU_SEQUENCER_TYPE_SPARSE_CORE_V0_SEQUENCER = 7, …_V0_ADDRESS_HANDLER = 8) name the legacy monolithic SparseCore predecessor. They survive in this build only as proto-descriptor literals on TpuSequencerTypeProto — they are not in the off_22010DE0 C++ TpuSequencerTypeToString table (which stops at TEC, index 5), and TpuSequencerTypeFromProto @0x20b36300 rejects them as Invalid sequencer type, so they have no C++ TpuSequencerType value at all. No SCv0 codec, encoder, decoder, or sc.sequencer value ("scs0" etc.) ships. The engine-selection machinery never produces an SCv0 tag — getSequencerType returns only the three live values. See SparseCore Overview for the full SCv0-deprecation account.
Reimplementation Checklist
To reproduce SparseCore engine selection:
- Model sc.sequencer as a function-scoped StringAttr with exactly three legal values "scs" / "access" / "execute". Attach it during outlining; never attach it per-op. Enforce its presence on TileTask ops via a parent-function trait.
- Implement getSequencerType as a pure accessor: inherent-attr lookup with dictionary-attr fallback, StringAttr type guard, returning optional<StringRef>. It makes no decisions.
- Implement GetTransferKind as the kStream/kDma gate with the exact memory-space normalization (1 → 5*(¬cap)+16), the both-local gate, the 0x210018 / 0x210004 destination bitmasks per source space, the SupportsScVar capability call (false on these chips), and the InvalidArgument fallback for illegal pairs.
- Outline by region, fold "access" into "execute" when TAC is absent (the 6acc60406 / gfc family). Gate the existence of an "access" function on whether the target ships a SparseCoreTacCodecBase.
- Keep the proto and C++ numberings separate: the C++ TpuSequencerType {SCS=3,TAC=4,TEC=5} is used by both the resource-sizing tables (TpuSequencerTypeToString, codec-metadata) and codec selection (the template parameter), so no conversion is needed between those two. The only +1 is at the protobuf boundary: TpuSequencerTypeProto {SCS=4,TAC=5,TEC=6} must pass through TpuSequencerTypeFromProto (subtract one) before it can index any C++-enum-keyed table.
Confidence Summary
| Claim | Evidence |
|---|---|
| getSequencerType is an attribute accessor returning optional<StringRef> | decompiled @0x13507760: inherent→dictionary sc.sequencer lookup, StringAttr guard |
| Attribute name is "sc.sequencer" (12 chars) | getInherentAttr(…, 12) literal in all three reader functions |
| "scs" → SCS, "execute" → TEC, "access" → TAC | byte-literal compares in HasCore… @0x14599ec0 / HasExecute… @0x1459a020; "access" is the third value (no predicate) |
| Engine tag is function-scoped; ops inherit via parent-func trait | ParentHasSequencerTypeAttribute @0x1353e980 walks to LLVMFuncOp; trait verified on TileTaskWaitOp @0x14689880 |
| GetTransferKind selects kStream vs kDma on memory-space pair + capability | decompiled @0x1351b140: both-local gate, bitmasks 0x210018/0x210004, vtable[+0xa0], transfer_emitter.cc:196 diag |
| SupportsScVar capability is 0 on these gens (capability-gated Stream routes compiled out) | vtable[+0xa0] resolves to SupportsScVar (GL 0x1d499340 / VF 0x1d49c7e0), =0 |
| Feeders: lowerEnqueueDma @0x135105a0, lowerEnqueueIndirectDma @0x13511da0, getTransferKind<…> @0x135114a0/@0x135145e0 | demangled decompiled symbols present |
| Outliner stamps sc.sequencer per region | LowerSequencerFunctionsPass::runOnOperation @0x13532120 + OutlineSequencerFunction |
| 6acc60406 (gfc) folds "access" into "execute" (no TAC) | gfc::…SparseCoreTacCodecBase = 0 files vs 13/30; no HasAccessSequencerTypeAttribute |
| TpuSequencerTypeToString is a jump table over off_22010DE0 | decompiled @0x20b362e0: *(&off_22010DE0 + a1) |
| C++ enum order via off_22010DE0 = {TC=0, BARNA=1, BARNA_ADDR=2, SCS=3, TAC=4, TEC=5}; no INVALID slot | R_X86_64_RELATIVE relocs resolve idx0→"TensorCoreSequencer", idx3→"SparseCoreSequencer", idx5→"SparseCoreTileExecuteCoreSequencer"; idx6→"IMEM" (adjacent table) |
| Codec template enum {SCS=3,TAC=4,TEC=5} == the C++ enum (no off-by-one) | nm (TpuSequencerType)3×32 / )4×16; codec-metadata BundleSizeBytes @0x1ecf7180 keys on the same C++ enum |
| Proto enum {INVALID=0…SCS=4,TAC=5,TEC=6,SCv0=7/8}; +1 vs C++ enum, bridged by TpuSequencerTypeFromProto | TpuSequencerTypeProto descriptor literals; FromProto @0x20b36300 switch maps proto 4→3,5→4,6→5, rejects 7/8 |
| Per-op Access-vs-Execute region rule; runtime→codec conversion site | not bit-traced in this analysis |
Cross-References
- SparseCore Overview: the three engine classes, per-gen presence, and the SCv0 enum-only story.
- Architecture: engine roles and the embedding datapath that the engine split serves.
- SCS (Scalar) Engine: the "scs" control sequencer.
- TAC Engine: the "access" tile-fetch engine and its removal on the 6acc60406 (gfc) family.
- TEC (Vector) Engine: the "execute" vector compute engine.
- Region → Sequencer Outliner: the pass that partitions a computation into per-engine functions and writes sc.sequencer.
- IndirectVregStream: the TEC-only Stream form whose existence anchors the kStream datapath on TEC.
- Stream Gather/Scatter: the indirect-DMA descriptor reached on the kStream path.
- SC Backend Pipeline: where GetTransferKind, outlining, and getSequencerType sit in the SparseCore pass order.
- Binary: extracted/libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64/libtpu/libtpu.so (build-id 89edbbe81c5b328a958fe628a9f2207d)
- Index entry: Part IX, SparseCore & BarnaCore / SparseCore engines