getSequencerType — SCS / TAC / TEC Engine Selection

Every function address, enum value, attribute-string byte pattern, and routing constant on this page was read from libtpu.so in the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d; build libtpu_lts_20260413_b_RC00) — from the decompiled C++ of the named functions, the demangled symbol table, the embedded proto-descriptor strings, and the .rodata jump table at off_22010DE0. Other versions differ.

Abstract

The SparseCore back-end places every lowered op onto exactly one of the three SparseCore sub-engines — SCS (scalar control), TAC (tile-access / DMA), or TEC (vector compute) — and the binary records this placement as a single per-function string attribute named sc.sequencer. There is no monolithic "selector" that reasons about an op and returns an engine; instead the decision is split across three layers that this page documents end to end:

  1. A policy classifier, GetTransferKind (@0x1351b140), that decides whether an off-tile data movement is a Stream (indirect gather/scatter) or a DMA (bulk contiguous) transfer, keyed on the source/destination memory-space pair plus one SparseCore capability bit.
  2. A region outliner (LowerSequencerFunctionsPass / OutlineSequencerFunction) that stamps each per-engine outlined func.func with sc.sequencer = "scs" / "access" / "execute".
  3. A trivial attribute accessor, LowerMemrefToMlo::getSequencerType (@0x13507760), that reads that string back so later passes pick the matching per-engine bundle codec (SparseCoreScsCodecBase / …TacCodecBase / …TecCodecBase).

Sitting underneath all of this is the tpu::TpuSequencerType C++ enum and its TpuSequencerTypeToString jump table — the runtime numbering used to size per-engine resource arrays. That C++ enum numbers SparseCore engines {SCS=3, TAC=4, TEC=5}, identical to the codec template-parameter numbering. The off-by-one peer is the protobuf enum TpuSequencerTypeProto, which inserts an INVALID=0 slot and so numbers {SCS=4, TAC=5, TEC=6}; a TpuSequencerTypeFromProto switch subtracts one to cross from proto into the C++ enum. This page reconciles all three.

The reimplementation contract:

  • The op-level engine tag is a string, not a number. Every op's engine membership is the sc.sequencer StringAttr on its enclosing outlined function — one of exactly three byte-confirmed values "scs" / "access" / "execute". A reimplementer must route on the string; the two numeric TpuSequencerType enums never appear at the op.
  • getSequencerType is an accessor, not a decision. LowerMemrefToMlo::getSequencerType(Operation&) returns optional<StringRef> — it reads sc.sequencer (via inherent-attr then dictionary-attr fallback) and returns its value, or nullopt. The decision was made upstream by GetTransferKind + outlining.
  • Stream-vs-DMA is decided by memory spaces + one capability bit. GetTransferKind normalizes each mlir::sparse_core::MemorySpace, requires both endpoints to be local, and jump-tables on the source space. It sets kStream when the destination is in that source's gather set (the HBM/SPMEM/TILE_SPMEM bitmasks 0x210018 for an HBM source and 0x210004 for an HBM_4B source, plus a few fixed source/destination pairs). Only the routes that touch memory space 6 additionally require the target to report the SparseCore-variable capability (vtable[+0xa0], =0 in this wheel). Everything else is kDma, with an InvalidArgument diagnostic for an illegal pair.
  • Two numberings, off by one — but the split is proto-vs-C++, not runtime-vs-codec. The C++ tpu::TpuSequencerType enum (the one TpuSequencerTypeToString renders and the one the codec EncoderBase non-type template parameter uses) numbers {TC=0, BARNA=1, BARNA_ADDR=2, SCS=3, TAC=4, TEC=5} — there is no INVALID slot at index 0, and SCS is 3, the same as the codec template. The off-by-one peer is the protobuf enum TpuSequencerTypeProto, which does begin with INVALID=0 and so numbers {TC=1, BARNA=2, BARNA_ADDR=3, SCS=4, TAC=5, TEC=6, SCv0=7/8}. TpuSequencerTypeFromProto (@0x20b36300) is the one-subtracting bridge (proto 4 → C++ 3, etc.). The gfc (6acc60406) family carries codec params 3 and 5 only — its SparseCoreTacCodecBase is entirely absent.

What it is | The SparseCore op→engine assignment mechanism (SCS / TAC / TEC)
Op-level tag | sc.sequencer StringAttr (12-char name); values "scs" / "access" / "execute"
Accessor | LowerMemrefToMlo::getSequencerType @0x13507760, returning optional<StringRef>
Policy classifier | xla::tpu::sparse_core::GetTransferKind @0x1351b140 (kStream vs kDma)
Outliner | LowerSequencerFunctionsPass::runOnOperation @0x13532120; OutlineSequencerFunction
Enum | tpu::TpuSequencerType; TpuSequencerTypeToString @0x20b362e0 over off_22010DE0
C++ enum (ToString + codec) | TC=0 · BARNA=1 · BARNA_ADDR=2 · SCS=3 · TAC=4 · TEC=5 (no INVALID slot)
Codec template enum | SCS=3 · TAC=4 · TEC=5 (EncoderBase<…, TpuSequencerType=N>); same as the C++ enum
Proto enum (off-by-one peer) | TpuSequencerTypeProto: INVALID=0 · TC=1 · BARNA=2 · BARNA_ADDR=3 · SCS=4 · TAC=5 · TEC=6 · SCv0=7/8
6acc60406 (gfc) | No TAC: gfc::…SparseCoreTacCodecBase = 0 files; only codec params 3 & 5
Confidence | CONFIRMED (function-byte / symbol / string-table anchored) unless a row says otherwise

For the engine roles themselves see SparseCore Overview and Architecture; for what the outliner produces see Region → Sequencer Outliner.


The Three Layers at a Glance

A single off-tile memory op (tpu.enqueue_dma, tpu.enqueue_indirect_dma) traverses three decision layers before it lands in a sequencer-specific bundle:

  tpu.enqueue_dma / tpu.enqueue_indirect_dma        (TC-framework op, has memref operands)
            │
            ▼  LowerMemrefToMlo::lowerEnqueueDma         @0x135105a0
               LowerMemrefToMlo::lowerEnqueueIndirectDma @0x13511da0
            │
            ▼  getTransferKind<EnqueueDMAOp> @0x135114a0  →  GetTransferKind @0x1351b140
            │      (srcMemSpace, dstMemSpace, local/remote bits, capability)
            │
        ┌───┴────────────────────────┐
   kStream  ([result+8]=1)        kDma  ([result+8]=0 / InvalidArgument)
   gather/scatter slot            SparseCoreDma bulk slot
            │                              │
            ▼                              ▼
        emitted into a TileTask region (Access vs Execute)
            │
            ▼  LowerSequencerFunctionsPass / OutlineSequencerFunction
               stamps the outlined func.func with:
                 sc.sequencer = "scs"     (control sequencer  → SCS)
                 sc.sequencer = "access"  (tile-fetch / gather → TAC)   [VF/GL only]
                 sc.sequencer = "execute" (vector compute      → TEC)
            │
            ▼  later passes:
               LowerMemrefToMlo::getSequencerType(op) @0x13507760  → reads sc.sequencer
               → select per-engine codec: SparseCoreScsCodecBase / …Tac… / …Tec…

Layer 1 (GetTransferKind) answers "is this a gather/scatter or a bulk copy?". Layer 2 (the outliner) answers "which engine's program does this op belong to?" and writes the answer as a string. Layer 3 (getSequencerType) is the read-back that drives codec selection. The three numeric TpuSequencerType values exist only at the resource-sizing layer (per-engine bundle-limit tables), never at the op.

NOTE — there is no getSequencerType that returns SCS/TAC/TEC from an op's opcode. A reimplementer expecting a switch(op.kind) selector will not find one. The mapping op→engine is fully materialized as the sc.sequencer string on the outlined function, and ops inherit their engine from the function they were outlined into (enforced by the ParentFuncHasCoreSequencerTypeAttribute trait, below). getSequencerType only re-reads that string.
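
The string-based routing described above can be sketched as a plain lookup. This is a minimal illustration, not code recovered from the binary: the Engine enum and the EngineFromSequencerString helper are hypothetical names, and only the three string values come from this page.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical engine tag used only in this sketch; not a libtpu type.
enum class Engine { kScs, kTac, kTec };

// Route on the sc.sequencer string value. Any other spelling yields nullopt.
inline std::optional<Engine> EngineFromSequencerString(std::string_view value) {
  if (value == "scs") return Engine::kScs;        // control sequencer
  if (value == "access") return Engine::kTac;     // tile-access engine
  if (value == "execute") return Engine::kTec;    // vector compute engine
  return std::nullopt;
}

// Usage: the match is exact and case-sensitive.
inline void DemoEngineRouting() {
  assert(EngineFromSequencerString("execute") == Engine::kTec);
  assert(!EngineFromSequencerString("SCS").has_value());
}
```

A caller would feed it the value returned by the accessor and pick the per-engine codec from the result; no numeric enum is involved at this step.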


Layer 3: The getSequencerType Accessor

The named function is the simplest of the three layers and the one this page is titled after. Decompiled (@0x13507760), it is a string-attribute getter returning a 17-byte optional<StringRef> (8-byte data ptr, 8-byte length, 1-byte present flag):

// mlir::tpu::LowerMemrefToMlo::getSequencerType(this=result, op)  @0x13507760
//   result layout: [0]=StringRef.data, [8]=StringRef.size, [16]=present
optional<StringRef> getSequencerType(Operation& op) {
  // 1. fast path: inherent attr if the op's registered-info bit is set
  //    (*((u32*)op + 11) >= 0x1000000) AND getInherentAttr succeeds
  Attribute a = op.getInherentAttr("sc.sequencer", /*len=*/12);
  // 2. fallback: dictionary attr lookup on the op's attr dict (op + 56)
  if (!a) a = op.getDiscardableAttrDictionary().get("sc.sequencer", 12);
  // 3. must be a StringAttr (TypeID check against StringAttr::id)
  if (a && typeid(a) == StringAttr::id) {
    result = { StringAttr::getValue(a), /*present=*/1 };   // "scs"/"access"/"execute"
  } else {
    result = { /*present=*/0 };                            // nullopt
  }
  return result;
}

Two structural facts to preserve:

  • The attribute name is the 12-character string "sc.sequencer" (the literal and length 12 are baked into the getInherentAttr(a2, "sc.sequencer", 12) call in the decompiled body). The identical name+length pair appears in HasCoreSequencerTypeAttribute and HasExecuteSequencerTypeAttribute (below), confirming all three read the same attribute.
  • The two-step inherent→dictionary lookup mirrors MLIR's split between inherent attributes (declared on the op definition) and discardable dictionary attributes. The accessor accepts either, so the outliner may attach sc.sequencer through whichever path is convenient for the op kind.

Property | Value
Function VA | 0x13507760
Attribute name / length | "sc.sequencer" / 12
Return type | optional<StringRef> (data, size, present-byte at +16)
Lookup order | inherent attr, then discardable dictionary attr
Type guard | StringAttr TypeID (TypeIDResolver<StringAttr>::id)
Returned values | "scs" / "access" / "execute"
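
The lookup order and type guard can be modelled without MLIR. The sketch below assumes two hypothetical stand-in types (MockAttr, MockOperation) in place of mlir::Attribute and mlir::Operation; it reproduces only the control flow described above, not the real API.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical stand-ins for MLIR types, for illustration only.
using MockAttr = std::variant<std::string, int>;   // string attr or integer attr

struct MockOperation {
  std::map<std::string, MockAttr> inherent;      // attrs declared on the op definition
  std::map<std::string, MockAttr> discardable;   // the op's attribute dictionary
};

// Inherent lookup first, dictionary fallback second, then a string-type guard.
// The returned view points into `op`, so it must not outlive it.
inline std::optional<std::string_view> GetSequencerType(const MockOperation& op) {
  static const std::string kName = "sc.sequencer";
  const MockAttr* attr = nullptr;
  if (auto it = op.inherent.find(kName); it != op.inherent.end()) {
    attr = &it->second;
  } else if (auto jt = op.discardable.find(kName); jt != op.discardable.end()) {
    attr = &jt->second;
  }
  if (attr == nullptr) return std::nullopt;   // tag absent
  if (const auto* s = std::get_if<std::string>(attr)) return std::string_view(*s);
  return std::nullopt;                        // present, but not a string attribute
}

inline void DemoGetSequencerType() {
  MockOperation op;
  op.discardable["sc.sequencer"] = std::string("execute");
  assert(GetSequencerType(op) == std::string_view("execute"));
}
```

As in the decompiled body, a non-string attribute found on the inherent path does not trigger the dictionary fallback; it simply yields nullopt.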

The Attribute Values — Byte-Confirmed

The three legal sc.sequencer values are not just strings in a table — two of them are matched by dedicated predicate functions whose decompiled byte-comparisons pin the exact spelling and length. ScDialect::HasCoreSequencerTypeAttribute (@0x14599ec0) and ScDialect::HasExecuteSequencerTypeAttribute (@0x1459a020) both reuse the same sc.sequencer (len-12) accessor, then compare the StringAttr value against a length and a packed byte literal:

// HasCoreSequencerTypeAttribute @0x14599ec0  — value == "scs"
if (len == 3)
  return ( (*(u16*)v ^ 0x6373) | (*(u8*)(v+2) ^ 0x73) ) == 0;   // 's','c' | 's'
//   0x6373 LE = bytes {0x73='s', 0x63='c'};  v[2]=0x73='s'  →  "scs"

// HasExecuteSequencerTypeAttribute @0x1459a020 — value == "execute"
if (len == 7)
  return ( (*(u32*)v ^ 0x63657865) | (*(u32*)(v+3) ^ 0x65747563) ) == 0;
//   0x63657865 LE = {'e','x','e','c'};  (v+3) 0x65747563 LE = {'c','u','t','e'}
//   overlapping at offset 3  →  "exec"+"cute" = "execute"

Decoding the little-endian masks:

sc.sequencer value | Engine | Length | Byte-literal evidence | Predicate
"scs" | SCS (scalar control) | 3 | 0x6373="sc", 0x73="s" | HasCoreSequencerTypeAttribute @0x14599ec0
"execute" | TEC (vector compute) | 7 | 0x63657865="exec", 0x65747563="cute" | HasExecuteSequencerTypeAttribute @0x1459a020
"access" | TAC (tile-access / DMA) | 6 | none | none (no dedicated Has* predicate)

GOTCHA — "access" has no dedicated predicate. SCS and TEC each get a Has…SequencerTypeAttribute test because the SC-MLO pipeline operates on the SCS↔TEC boundary; the third value "access" (TAC) carries no Has* function in this binary. This is consistent with the 6acc60406 (gfc) family having dropped TAC altogether — on the newest gen the work that would land in an "access" function is folded into the "execute" function (see TAC Engine and Region → Sequencer Outliner). A reimplementer that only models the SCS/TEC pair will produce correct 6acc60406 code; the "access" value is needed only for Viperfish/Ghostlite.
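
The two packed compares shown earlier in this section can be checked in isolation. The following sketch assumes a little-endian host (as on x86-64) and uses memcpy loads instead of the raw pointer casts in the decompiled listing; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// memcpy loads avoid the alignment and aliasing problems of a raw pointer cast.
// The constants below are only correct on a little-endian host.
inline std::uint32_t LoadLE(const char* p, std::size_t n) {
  std::uint32_t v = 0;
  std::memcpy(&v, p, n);   // n is 2 or 4 in this sketch
  return v;
}

// value == "scs": one 16-bit compare plus one byte compare.
inline bool IsScsValue(std::string_view v) {
  if (v.size() != 3) return false;
  const std::uint32_t third = static_cast<unsigned char>(v[2]);
  return ((LoadLE(v.data(), 2) ^ 0x6373u) | (third ^ 0x73u)) == 0u;
}

// value == "execute": two overlapping 4-byte windows,
// bytes 0..3 ("exec") and bytes 3..6 ("cute").
inline bool IsExecuteValue(std::string_view v) {
  if (v.size() != 7) return false;
  return ((LoadLE(v.data(), 4) ^ 0x63657865u) |
          (LoadLE(v.data() + 3, 4) ^ 0x65747563u)) == 0u;
}

inline void DemoPackedCompare() {
  assert(IsScsValue("scs"));
  assert(IsExecuteValue("execute"));
}
```

The overlapping second window is what lets a 7-byte string be verified with two 4-byte loads and no tail loop.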

The parent-function trait

The sc.sequencer attribute lives on the outlined function, not on individual ops. Ops that require it to exist carry the OpTrait::ParentFuncHasCoreSequencerTypeAttribute trait (verified for TileTaskWaitOp at @0x14689880; the same trait is attached to the TileTask family). The shared check is ParentHasSequencerTypeAttribute (@0x1353e980):

// ParentHasSequencerTypeAttribute @0x1353e980
//   walk parent ops until the enclosing LLVMFuncOp, then test BOTH predicates
Operation* op = start;
for (; ; op = op->getBlock()->getParentOp()) {
  if (!op) return false;                             // ran off the top: no enclosing func
  if (typeid(*op) == LLVMFuncOp::id) break;          // reached the outlined func
}
Operation* func = op;                                // the enclosing LLVMFuncOp
h_core = HasCoreSequencerTypeAttribute(func);        // "scs"?
h_exec = HasExecuteSequencerTypeAttribute(func);     // "execute"?
// require BOTH predicate calls to have evaluated (present bit 0x100 set on each),
// then return their OR of the low (match) bits
return (h_core & 0x100) && (h_exec & 0x100) ? (h_core | h_exec) & 1 : false;

So the trait climbs the op tree to the enclosing LLVM::LLVMFuncOp and asserts that function is tagged either "scs" or "execute". This is the binary's enforcement that every TileTask op runs inside a function whose engine is known at verify time — engine membership is a function-scoped property, not a per-op field.
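
A simplified model of that parent walk is sketched below. MockNode is a hypothetical stand-in for an MLIR operation, and the present-bit bookkeeping of the two predicates is collapsed into a direct string test, so this shows the shape of the check rather than its exact return encoding.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for an operation in the op tree.
struct MockNode {
  const MockNode* parent = nullptr;        // enclosing op, or nullptr at the top
  bool is_func = false;                    // stands in for the LLVMFuncOp TypeID test
  std::optional<std::string> sequencer;    // sc.sequencer value, if attached
};

// Climb to the first enclosing function and accept only "scs" or "execute".
inline bool ParentHasSequencerTag(const MockNode* op) {
  for (; op != nullptr; op = op->parent) {
    if (!op->is_func) continue;
    return op->sequencer == "scs" || op->sequencer == "execute";
  }
  return false;   // no enclosing function found
}

inline void DemoParentWalk() {
  MockNode func;
  func.is_func = true;
  func.sequencer = "execute";
  MockNode region;
  region.parent = &func;
  MockNode leaf;
  leaf.parent = &region;
  assert(ParentHasSequencerTag(&leaf));
}
```

Under this reading, a function tagged "access" (or not tagged at all) fails the check, since only the two predicate-backed values are tested.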


Layer 1: GetTransferKind — Stream vs DMA

Before an op can be outlined into an engine, the lowering must decide whether it is a Stream (indirect gather/scatter, the embedding datapath) or a DMA (contiguous bulk move). That is xla::tpu::sparse_core::GetTransferKind (@0x1351b140), reached from LowerMemrefToMlo::lowerEnqueueDma (@0x135105a0) and lowerEnqueueIndirectDma (@0x13511da0) via the typed wrappers getTransferKind<EnqueueDMAOp> (@0x135114a0) and getTransferKind<WaitDMA2Op> (@0x135145e0).

Its signature (demangled): GetTransferKind(const jellyfish::Target&, mlir::sparse_core::MemorySpace src, MemorySpace dst, bool, bool, bool, bool) returning FailureOr<TransferKind>. The decompiled body:

// GetTransferKind @0x1351b140  (args: target a2; src a3; dst a4;
//   a5=src-local, a6=dst-local, a7=capability-allowed-flag, a8=strict-ordering)
// 1. normalize spmem: a space encoded as 1 maps to (16 if a7 else 21)
if (src == 1) src = 5*(a7 ^ 1) + 16;       // → 16 (cap) or 21 (no cap)
if (dst == 1) dst = 5*(a7 ^ 1) + 16;
// 2. kStream only when BOTH endpoints are local ((a6 & a5) == 1)
if ((a6 & a5) == 1) {
  switch (src) {                            // jump table on the source memory space
    case 2:  /* HBM   */  ... if (dst<=0x15 && bittest(0x210018, dst)) ok;   // 2162712
                          else if (dst==6) ok only if target.vtable[+0xa0]()  // SupportsScVar
    case 3:  /* HBM_4B*/  ... if (dst<=0x15 && bittest(0x210004, dst)) ok;    // 2162692
    case 4:  case 21:     ... if (dst==2) ok;                                 // → HBM
    case 6:               ... ok only if a7 && dst==2 && target.vtable[+0xa0]()
    case 16: /* SPMEM */  ... if (a7 && ((dst-2)&~2)==0) ok;                  // dst in {2,4}
    default:              goto kDma;
  }
  result.kind = kStream; result.present = 1;   // [this+8]=1; [this]=1
  return;
}
// 3. otherwise kDma — but only for a recognized legal contiguous pair;
//    an unrecognized pair builds an InvalidArgument status
kDma:
  if (legal_dma_pair(src, dst, a6)) { result.kind = kDma; [this]=1; return; }
  // diagnostic (transfer_emitter.cc:196):
  return InvalidArgument(
    "SparseCore does not support transfers with %s ordering from %s %v to %s %v "
    "issued %sfrom TEC.", ordering, srcLocality, src, dstLocality, dst, fromTec);

The routing constants matter for reimplementation:

Mechanism | Value | Meaning
spmem normalization | src/dst==1 → 5*(¬cap)+16 | encoded 1 → 16 (cap) or 21 (no cap)
both-local gate | (dst_local & src_local) == 1 | kStream requires both endpoints local
HBM dst-set bitmask | 0x210018 (2162712) over dst | gatherable destinations from HBM source
HBM_4B dst-set bitmask | 0x210004 (2162692) over dst | gatherable destinations from HBM_4B
capability slot | target vtable +0xa0 | the SupportsScVar predicate
kStream result | [result+0]=1, [result+8]=1 | FailureOr success + kind=kStream
kDma result | [result+0]=1, [result+8]=0 | FailureOr success + kind=kDma
illegal-pair diag | transfer_emitter.cc:196 InvalidArgument | "SparseCore does not support transfers…"
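
The kStream gate can be expressed as a small pure function. The sketch below is an illustration under stated assumptions: memory spaces are passed as the raw numeric codes from the decompiled switch, IsStreamRoute and InDstSet are hypothetical names, and the kDma legality check (and its InvalidArgument path) is not modelled because its contents are elided in the listing.

```cpp
#include <cassert>
#include <cstdint>

// True when `dst` is a set bit of `mask`; the listing bounds dst at 0x15 first.
constexpr bool InDstSet(std::uint32_t mask, unsigned dst) {
  return dst <= 0x15u && ((mask >> dst) & 1u) != 0u;
}

// Hypothetical helper: true when the pair would take the kStream route.
constexpr bool IsStreamRoute(unsigned src, unsigned dst, bool src_local,
                             bool dst_local, bool cap_flag, bool supports_sc_var) {
  if (src == 1) src = cap_flag ? 16u : 21u;      // spmem normalization: 5*(!cap)+16
  if (dst == 1) dst = cap_flag ? 16u : 21u;
  if (!(src_local && dst_local)) return false;   // both-local gate
  switch (src) {
    case 2:  return InDstSet(0x210018u, dst) || (dst == 6 && supports_sc_var);
    case 3:  return InDstSet(0x210004u, dst);
    case 4:
    case 21: return dst == 2;
    case 6:  return cap_flag && dst == 2 && supports_sc_var;
    case 16: return cap_flag && (dst == 2 || dst == 4);
    default: return false;                       // falls to the kDma path
  }
}

// Compile-time spot checks of the constants quoted in the table.
static_assert(0x210018 == 2162712 && 0x210004 == 2162692, "mask decimals");
static_assert(IsStreamRoute(2, 3, true, true, false, false), "bit 3 of 0x210018");
static_assert(!IsStreamRoute(2, 6, true, true, false, false), "needs SupportsScVar");

inline void DemoTransferKind() {
  assert(IsStreamRoute(1, 2, true, true, true, false));     // 1 -> 16, then dst == 2
  assert(!IsStreamRoute(2, 3, true, false, false, false));  // one endpoint not local
}
```

Passing supports_sc_var as false everywhere matches the behaviour reported for this wheel; the parameter is kept so the two capability-gated routes stay visible.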

NOTE — the capability bit is the SupportsScVar predicate, and it is 0 in this wheel. The cases that gate on target.vtable[+0xa0]() (src==6 and the dst==6 sub-branch of HBM) call a virtual method on the SparseCoreTarget; the cross-references resolve this slot to SupportsScVar (Ghostlite 0x1d499340, Viperfish 0x1d49c7e0), which returns 0 for every generation shipped here. So those capability-gated Stream routes are compiled out in this build — they fall through to the kDma path. A reimplementer targeting these chips must treat SupportsScVar as false.

GOTCHA — the diagnostic says "from TEC", confirming the kDma/Stream split is TEC-centric. The transfer_emitter.cc:196 message ("…issued %sfrom TEC") and the MemorySpace operands show the classifier is reasoning about transfers issued from the TEC vector engine. This ties directly to IndirectVregStream being a TEC-only Stream form: the gather/scatter datapath is anchored on TEC, and GetTransferKind is the gate that decides whether an off-tile move uses that datapath (kStream) or a plain bulk descriptor (kDma).

The full memory-space enum and the per-(core,mem) DMA-destination resolution are out of scope here; see Stream Gather/Scatter for the Stream-slot descriptor and SC Backend Pipeline for where these lowerings run in the pass order.


Layer 2: Region Outlining — Where sc.sequencer Is Written

GetTransferKind's kStream/kDma result, together with the op's data dependencies, determines which TileTask region an op is emitted into. LowerSequencerFunctionsPass::runOnOperation (@0x13532120) then outlines each region into a standalone LLVM::LLVMFuncOp and stamps it with the sc.sequencer string. The decompiled pass body is large (it also builds per-engine parameter tables via GetParameterTable @0x13534ec0 and loads HBM pointers via LoadPointersFromHbm @0x13536c40); the engine-tagging step is the OutlineSequencerFunction callback that attaches the StringAttr.

The mapping the outliner produces:

TileTask region | sc.sequencer | Engine | Carries | Present on
Control / sequencer | "scs" | SCS | program counter, addressing, sync-flag/atomic issue, SCS Stream/DMA slots | vfc · glc · gfc
Access | "access" | TAC | tile-fetch DMA issue, gather-stream issue, address staging | vfc · glc only
Execute | "execute" | TEC | vector reductions, pack/unpack, the TEC Stream slot (incl. IndirectVreg) | vfc · glc · gfc

NOTE — the MLIR tile-task pipeline emits only "scs" and "execute"; "access" is a codec/proto-path engine. The "access" row above is the engine the TpuSequencerType=4 TAC codec serves, and it is real on Viperfish/Ghostlite. But decompilation of the outlining callback (0x136066e0) shows it references only the "execute" value string (@0x8681624, 7 chars) and sc.sequencer — it stamps tile bodies "execute" unconditionally on every gen, and never writes "access" (there is no HasAccessSequencerTypeAttribute predicate, and no length-6 "access" compare anywhere in the lowering chain). The TAC engine is reached through the legacy ProgramWrapper.tac proto field and the standalone SparseCoreTacCodecBase, not through the MLIR tile-task outliner. So on VF/GL the MLIR path also folds gather/scatter into the "execute" (TEC) function via the TEC Stream slot — the same shape as 6acc60406, which additionally has no TAC codec at all. See the TEC engine for the full byte-level account, and IndirectVregStream for the TEC-exclusive Stream form that anchors this.

The exact per-op rule that chooses the Access region versus the Execute region for a given lowered op was not bit-traced in this analysis (the GetTransferKind result plus the op's tile-data dependencies feed it). It is flagged LOW here and owned by Region → Sequencer Outliner.


The TpuSequencerType Enum and Its Jump Table

Underneath the string mechanism is the numeric tpu::TpuSequencerType enum, used to size and index per-engine resource tables (e.g. bundle limits). It is rendered to text by TpuSequencerTypeToString (@0x20b362e0), which is a pure jump table:

// tpu::TpuSequencerTypeToString(unsigned a1)  @0x20b362e0
__int64 TpuSequencerTypeToString(unsigned a1) {
  return (__int64) *(&off_22010DE0 + a1);   // indexed array of C-string pointers
}

The off_22010DE0 table indexes directly into the string-table literals confirmed in .rodata (resolved through the array's R_X86_64_RELATIVE relocations). Their order fixes the C++ runtime numbering — and the first entry is the TensorCore sequencer, not an INVALID placeholder:

C++ enum value (table index) | off_22010DE0[i] string literal | Short | Bundle
0 | "TensorCoreSequencer" | TC |
1 | "BarnaCoreSequencer" | Barna |
2 | "BarnaCoreAddressHandler" | Barna-AH |
3 | "SparseCoreSequencer" | SCS | 32 B
4 | "SparseCoreTileAccessCoreSequencer" | TAC | 64 B
5 | "SparseCoreTileExecuteCoreSequencer" | TEC | 64 B

The off_22010DE0 array has exactly these six SparseCore/TensorCore/BarnaCore entries; index 6 already belongs to an adjacent unrelated string array ("IMEM"). So the C++ enum carries no INVALID slot and no SCv0 entries — SCv0 lives only in the protobuf enum (below). Resolving the relocations: index 0 → 0x85b767c "TensorCoreSequencer", index 3 → 0x85b76b3 "SparseCoreSequencer", index 5 → 0x85b7690 "SparseCoreTileExecuteCoreSequencer".
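
A reimplementation of the table lookup might look like the sketch below. The six strings are the literals listed above; the bounds check and the "<out of range>" sentinel are additions for safety and are not present in the decompiled function, which indexes the array directly.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// The six entries of the C++ enum, in table order (index == enum value).
constexpr std::array<std::string_view, 6> kSequencerNames = {
    "TensorCoreSequencer",
    "BarnaCoreSequencer",
    "BarnaCoreAddressHandler",
    "SparseCoreSequencer",
    "SparseCoreTileAccessCoreSequencer",
    "SparseCoreTileExecuteCoreSequencer"};

// Unlike the original, this guards the index: in the binary, index 6 would
// read into an adjacent, unrelated string array.
constexpr std::string_view SequencerTypeToString(unsigned value) {
  return value < kSequencerNames.size() ? kSequencerNames[value]
                                        : std::string_view("<out of range>");
}

inline void DemoToString() {
  assert(SequencerTypeToString(0) == "TensorCoreSequencer");
  assert(SequencerTypeToString(3) == "SparseCoreSequencer");
}
```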

The off-by-one: C++/codec enum vs the protobuf enum

The codec layer and the TpuSequencerTypeToString layer share the same C++ enum. The per-engine codecs are template-parameterized on TpuSequencerType as a non-type template argument — EncoderBase<…SparseCore{Scs,Tac,Tec}CodecBase…, TpuSequencerType=N> — and there the values are {SCS=3, TAC=4, TEC=5}, exactly matching the off_22010DE0 indices above (nm shows the demangled template literals (TpuSequencerType)3 ×32 and (TpuSequencerType)4 ×16; codec-metadata BundleSizeBytes(TpuVersion, TpuSequencerType) @0x1ecf7180 consumes the same C++ enum). There is no off-by-one between codec and runtime — both are {SCS=3, TAC=4, TEC=5}.

The off-by-one is against a third numbering: the protobuf enum TpuSequencerTypeProto (descriptor TpuSequencerTypeProto in .rodata, full TPU_SEQUENCER_TYPE_* literals), which does begin with INVALID=0 and numbers {INVALID=0, TC=1, BARNA=2, BARNA_ADDR=3, SCS=4, TAC=5, TEC=6, SCv0=7, SCv0-AH=8}. The proto→C++ bridge is tpu::TpuSequencerTypeFromProto (@0x20b36300), whose switch maps proto 1→0, 2→1, 3→2, 4→3, 5→4, 6→5 (a literal subtract-one over the SparseCore block) and rejects proto 7/8 (SCv0) as Invalid sequencer type — confirming SCv0 has no C++ enum value at all:

Engine | sc.sequencer string | C++ enum (ToString + codec) | Proto enum
SCS | "scs" | 3 | 4
TAC | "access" | 4 (vfc/glc only) | 5
TEC | "execute" | 5 | 6

GOTCHA — the +1 is at the proto boundary, not the codec boundary. A protobuf TpuSequencerTypeProto value is one greater than the C++ TpuSequencerType (and codec template) value for the same engine, because only the proto enum reserves INVALID=0. TpuSequencerTypeFromProto @0x20b36300 is the sanctioned conversion site (subtract one). A reimplementer that feeds a proto ordinal directly into a codec template selector will pick the wrong engine; one that feeds the C++ enum straight through is correct. The op-level assignment uses neither number — it uses the sc.sequencer string.
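
The proto-to-C++ bridge can be sketched as follows. The enumerator names are hypothetical shorthand for the values listed on this page. Rejecting proto 7/8 follows the description of TpuSequencerTypeFromProto above; rejecting INVALID=0 as well is an assumption of this sketch, since the page does not say what the real switch does with it.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mirrors of the two numberings described above.
enum class SequencerType : int {
  kTc = 0, kBarna = 1, kBarnaAddr = 2, kScs = 3, kTac = 4, kTec = 5
};
enum class SequencerTypeProto : int {
  kInvalid = 0, kTc = 1, kBarna = 2, kBarnaAddr = 3,
  kScs = 4, kTac = 5, kTec = 6, kScv0 = 7, kScv0AddrHandler = 8
};

// Subtract one across the live range; everything else has no C++ value.
// Assumption: kInvalid is rejected the same way as the SCv0 values.
constexpr std::optional<SequencerType> FromProto(SequencerTypeProto p) {
  const int v = static_cast<int>(p);
  if (v >= 1 && v <= 6) return static_cast<SequencerType>(v - 1);
  return std::nullopt;
}

inline void DemoFromProto() {
  assert(FromProto(SequencerTypeProto::kScs) == SequencerType::kScs);   // 4 -> 3
  assert(!FromProto(SequencerTypeProto::kScv0).has_value());            // 7 rejected
}
```

Keeping the two enums as distinct scoped types means a proto ordinal cannot be passed to a C++-enum-keyed table without going through the conversion.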

Codec-base presence confirms the missing TAC on gfc

Counting decompiled SparseCore{Scs,Tac,Tec}CodecBase instantiation files per family namespace directly confirms the TAC-removal that the off-by-one table implies (gfc carries codec params 3 and 5 only, never 4):

Family ns | Gen | SparseCoreScsCodecBase files | SparseCoreTacCodecBase files | SparseCoreTecCodecBase files
vfc | Viperfish | 13 | 13 | 13
glc | Ghostlite | 30 | 30 | 30
gfc | 6acc60406 | 31 | 0 | 32

The 6acc60406 (gfc) namespace has zero SparseCoreTacCodecBase files against 13/30 for Viperfish/Ghostlite, while SCS and TEC codec bases are present. There is no codec template parameterized on TpuSequencerType=4 in gfc, so the runtime can never select a TAC engine on 6acc60406, and the "access" sequencer value is unreachable there — exactly the folding documented in Layer 2.


SCv0 — Enum-Only

The two trailing proto values (TPU_SEQUENCER_TYPE_SPARSE_CORE_V0_SEQUENCER = 7, …_V0_ADDRESS_HANDLER = 8) name the legacy monolithic SparseCore predecessor. They survive in this build only as proto-descriptor literals on TpuSequencerTypeProto — they are not in the off_22010DE0 C++ TpuSequencerTypeToString table (which stops at TEC, index 5), and TpuSequencerTypeFromProto @0x20b36300 rejects them as Invalid sequencer type, so they have no C++ TpuSequencerType value at all. No SCv0 codec, encoder, decoder, or sc.sequencer value ("scs0" etc.) ships. The engine-selection machinery never produces an SCv0 tag — getSequencerType returns only the three live values. See SparseCore Overview for the full SCv0-deprecation account.


Reimplementation Checklist

To reproduce SparseCore engine selection:

  1. Model sc.sequencer as a function-scoped StringAttr with exactly three legal values "scs" / "access" / "execute". Attach it during outlining; never attach it per-op. Enforce its presence on TileTask ops via a parent-function trait.
  2. Implement getSequencerType as a pure accessor — inherent-attr lookup with dictionary-attr fallback, StringAttr type guard, returning optional<StringRef>. It makes no decisions.
  3. Implement GetTransferKind as the kStream/kDma gate with the exact memory-space normalization (1 → 5*(¬cap)+16), the both-local gate, the 0x210018/0x210004 destination bitmasks per source space, the SupportsScVar capability call (false on these chips), and the InvalidArgument fallback for illegal pairs.
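  4. Outline by region. The tile-task outlining callback in this binary was observed to write only "scs" and "execute", so fold gather/scatter work into "execute" by default, and always do so when TAC is absent (the 6acc60406 / gfc family). Gate the existence of any "access" function on whether the target ships a SparseCoreTacCodecBase.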
  5. Keep the proto and C++ numberings separate — the C++ TpuSequencerType {SCS=3,TAC=4,TEC=5} is used by both the resource-sizing tables (TpuSequencerTypeToString, codec-metadata) and codec selection (the template parameter), so no conversion is needed between those two. The only +1 is at the protobuf boundary: TpuSequencerTypeProto {SCS=4,TAC=5,TEC=6} must pass through TpuSequencerTypeFromProto (subtract one) before it can index any C++-enum-keyed table.

Confidence Summary

Claim | Evidence
getSequencerType is an attribute accessor returning optional<StringRef> | decompiled @0x13507760: inherent→dictionary sc.sequencer lookup, StringAttr guard
Attribute name is "sc.sequencer" (12 chars) | getInherentAttr(…, 12) literal in all three reader functions
"scs" → SCS, "execute" → TEC, "access" → TAC | byte-literal compares in HasCore… @0x14599ec0 / HasExecute… @0x1459a020; "access" is the third value (no predicate)
Engine tag is function-scoped; ops inherit via parent-func trait | ParentHasSequencerTypeAttribute @0x1353e980 walks to LLVMFuncOp; trait verified on TileTaskWaitOp @0x14689880
GetTransferKind selects kStream vs kDma on memory-space pair + capability | decompiled @0x1351b140: both-local gate, bitmasks 0x210018/0x210004, vtable[+0xa0], transfer_emitter.cc:196 diag
SupportsScVar capability is 0 on these gens (capability-gated Stream routes compiled out) | vtable[+0xa0] resolves to SupportsScVar (GL 0x1d499340 / VF 0x1d49c7e0), =0
Feeders: lowerEnqueueDma @0x135105a0, lowerEnqueueIndirectDma @0x13511da0, getTransferKind<…> @0x135114a0/@0x135145e0 | demangled decompiled symbols present
Outliner stamps sc.sequencer per region | LowerSequencerFunctionsPass::runOnOperation @0x13532120 + OutlineSequencerFunction
6acc60406 (gfc) folds "access" into "execute" (no TAC) | gfc::…SparseCoreTacCodecBase = 0 files vs 13/30; no HasAccessSequencerTypeAttribute
TpuSequencerTypeToString is a jump table over off_22010DE0 | decompiled @0x20b362e0: *(&off_22010DE0 + a1)
C++ enum order via off_22010DE0 = {TC=0, BARNA=1, BARNA_ADDR=2, SCS=3, TAC=4, TEC=5}; no INVALID slot | R_X86_64_RELATIVE relocs resolve idx0→"TensorCoreSequencer", idx3→"SparseCoreSequencer", idx5→"SparseCoreTileExecuteCoreSequencer"; idx6→"IMEM" (adjacent table)
Codec template enum {SCS=3,TAC=4,TEC=5} == the C++ enum (no off-by-one) | nm (TpuSequencerType)3×32 / )4×16; codec-metadata BundleSizeBytes @0x1ecf7180 keys on the same C++ enum
Proto enum {INVALID=0…SCS=4,TAC=5,TEC=6,SCv0=7/8}; +1 vs C++ enum, bridged by TpuSequencerTypeFromProto | TpuSequencerTypeProto descriptor literals; FromProto @0x20b36300 switch maps proto 4→3,5→4,6→5, rejects 7/8
Per-op Access-vs-Execute region rule; runtime→codec conversion site | not bit-traced in this analysis

Cross-References

  • SparseCore Overview — the three engine classes, per-gen presence, and the SCv0 enum-only story.
  • Architecture — engine roles and the embedding datapath that the engine split serves.
  • SCS (Scalar) Engine — the "scs" control sequencer.
  • TAC Engine — the "access" tile-fetch engine and its removal on the 6acc60406 (gfc) family.
  • TEC (Vector) Engine — the "execute" vector compute engine.
  • Region → Sequencer Outliner — the pass that partitions a computation into per-engine functions and writes sc.sequencer.
  • IndirectVregStream — the TEC-only Stream form whose existence anchors the kStream datapath on TEC.
  • Stream Gather/Scatter — the indirect-DMA descriptor reached on the kStream path.
  • SC Backend Pipeline — where GetTransferKind, outlining, and getSequencerType sit in the SparseCore pass order.
  • Binary: extracted/libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64/libtpu/libtpu.so (build-id 89edbbe81c5b328a958fe628a9f2207d)
  • Index entry: Part IX — SparseCore & BarnaCore / SparseCore engines — back to index