SCTypeConverter

All addresses, symbol names, and table values on this page were read byte-exactly from libtpu.so in the libtpu-0.0.40-cp314 wheel (build libtpu_lts_20260413_b_RC00, BuildID md5 89edbbe81c5b328a958fe628a9f2207d). The .symtab is not stripped; every claim is anchored to a demangled symbol, a relocation addend, or a decompiled body. Other versions will differ.

Abstract

When the SparseCore (SC) lowering descends a mlir::sparse_core (ScDialect) function into LLVM dialect, it needs to answer one structural question for every memref it touches: what LLVM address-space integer goes on the !llvm.ptr inside the lowered memref descriptor struct? The SCTypeConverter answers it. It is not a free-standing class but the LLVMTypeConverter that LowerToSparseCoreLlvmPass::lowerFunc (0x13568280) builds, augmented by exactly one registerTypeAttributeConversion lambda (0x135763c0) that converts a memref's MemorySpaceAttr into a 64-bit IntegerAttr carrying the raw SparseCore address-space ID. The headline result is that the rule is one line: the LLVM address space is MemorySpaceToAddressSpace(MS) — the same forward map the rest of the backend uses — so a memref<…, #sc.memory_space<spmem>> lowers to a descriptor whose pointers are !llvm.ptr<202>. There is no renumbering; the SC address-space ID is the LLVM addrspace.

This page owns three things, all of which the LowerToSparseCoreLlvm rewrite bodies and the DMA bridge-cast sit on top of:

  • The address-space → !llvm.ptr type map — the MemorySpaceAttr → IntegerAttr(addrspace) conversion lambda, the byte-dumped MemorySpaceToAddressSpace reverse table for all 21 named IDs, and the sequencer-context flatten override that collapses the per-tile and per-SCS spaces onto their generic base when the function is not in an execute-/core-sequencer context. That flatten is the elide-vs-emit decision the MemorySpaceCast lowering consults.
  • The CheckAddressSpaces legality matrix — the single legality gate of the SC simple-DMA tier (0x135b8e00): the "simple DMA touching generic SMEM is only legal on the Scalar Core Sequencer" contract, with its SupportsTileSmemDma target-capability hatch and its sc.sequencer=="scs" escape decoded top-to-bottom.
  • The EUP instantiation roster — the 42 type-converted elementwise lowerings in two families: 12 UnaryFloatVectorOpLowering (the 1:1 EUP push+pop macro) and 30 AluEpOpLowering (the 1:N unpack → compute → repack), each mapped from its source op to its EUP intrinsic or re-emitted compute op, with the IsDynamicallyLegal predicate that picks between the two paths.

The AS-id table itself — ID ↔ MemorySpace enum ↔ memory pool ↔ on/off-tile — is not re-derived here; it lives on Fat Pointers (AS7/8/9), and the per-cast addrspacecast ISel lives on addrspacecast ISel. This page documents how the type converter consumes that table to stamp pointer types, and how the legality and EUP layers ride on the converted types.

| Item | Value |
|---|---|
| Type-converter site | LowerToSparseCoreLlvmPass::lowerFunc @ 0x13568280 (LLVMTypeConverter ctor @ 0x13568369) |
| AS-attr conversion install | TypeConverter::registerTypeAttributeConversion @ 0x135685e8 |
| The conversion rule | !llvm.ptr<MemorySpaceToAddressSpace(MemorySpaceAttr::getValue())> |
| Conversion lambda (lowerFunc) | 0x135763c0 (with the sequencer-context flatten override) |
| Conversion lambda (lowerAsserts) | 0x135b6f80 (same map, no flatten override) |
| Forward map | mlir::sparse_core::MemorySpaceToAddressSpace(MemorySpace) @ 0x14b78780 (table 0xaf36ce8, mask 0x3fff7f) |
| Flatten gate booleans | ScDialect::HasExecuteSequencerTypeAttribute @ 0x1459a020 · HasCoreSequencerTypeAttribute @ 0x14599ec0 |
| Pointer constructor | LLVM::LLVMPointerType::get(ctx, ID) @ 0x1746eb40 (ID = raw SC address-space) |
| Legality gate | CheckAddressSpaces(SparseCoreTarget&, Operation*, int, int) @ 0x135b8e00 |
| Legality caller | DmaSimpleStartOpLowering::matchAndRewrite @ 0x135a9100 (call @ 0x135a977a) |
| EUP roster | 12 UnaryFloatVectorOpLowering (1:1 macro) + 30 AluEpOpLowering (1:N unpack/compute/pack) |
| 1:1-vs-1:N gate | IsDynamicallyLegal @ 0x135ddd20 |
| Confidence | CONFIRMED (decompile-anchored) unless a row or callout says otherwise |

The Address-Space → !llvm.ptr Type Map

Where the converter is built

The SparseCore type converter is not a distinct C++ class — there is no SCTypeConverter symbol. It is an ordinary mlir::LLVMTypeConverter constructed inside LowerToSparseCoreLlvmPass::lowerFunc (0x13568280): the ctor fires at 0x13568369, and the only SC-specific behaviour added on top is a single TypeConverter::registerTypeAttributeConversion call at 0x135685e8 that installs the BaseMemRefType × MemorySpaceAttr lambda at 0x135763c0. Everything else — the memref → {alloc-ptr, align-ptr, offset, [sizes], [strides]} descriptor struct, the function-signature conversion, the scalar/vector type passthrough — is stock upstream LLVMTypeConverter.

That separation is the reimplementation contract: you do not subclass the type converter; you register one attribute-conversion lambda. The lambda's job is narrow — turn a memref's source-dialect MemorySpaceAttr into the integer the base converter will bake into the descriptor's pointers — and the base converter does the rest.

The conversion rule is one line

The lambda at 0x135763c0 is, stripped of MLIR bookkeeping:

AttributeConversionResult convertSCMemorySpace(BaseMemRefType, MemorySpaceAttr msAttr):
    MemorySpace ms = msAttr.getValue()                    // 0x145929e0  (1-based enum)
    int id        = MemorySpaceToAddressSpace(ms)         // 0x14b78780  (the AS-id table)
    id            = applySequencerFlatten(id, ms, fn)     // override band — see below
    IntegerType i64 = IntegerType::get(ctx, 64, Signless) // 0x1d8c60c0
    return IntegerAttr::get(i64, id)                       // 0x1d859f00 — the new memory-space attr

The result is a 64-bit signless IntegerAttr carrying the raw address-space ID. The base LLVMTypeConverter then reads that integer attribute when it lowers the MemRefType and stamps it onto every !llvm.ptr field of the descriptor struct via LLVM::LLVMPointerType::get(ctx, ID) (0x1746eb40). So a memref<…, #sc.memory_space<spmem>> becomes !llvm.struct<(ptr<202>, ptr<202>, i64, …)>. The path is byte-confirmed end-to-end through CircularBufferDescriptor::GetMemRefType (0x135c6020): MemRefType::get with a MemorySpaceAttr → convertType (0x1c956740) → mlir::StructBuilder (0x171c1640, extractPtr/setPtr).
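To make the descriptor shape concrete, the following is a minimal sketch that spells the lowered descriptor type for a given address-space ID and rank. The helper name descriptorTypeString is hypothetical and is not a symbol in the binary. The spelling assumes the usual upstream LLVM-dialect printing of the memref descriptor (two pointers, an i64 offset, then size and stride arrays for ranked memrefs), in which address space 0 prints as a bare ptr; the page writes ptr<0> informally for the same type.

```cpp
#include <string>

// Hypothetical helper (not from libtpu): spells the LLVM-dialect descriptor
// type for a ranked memref whose pointers live in address space `addrSpace`.
std::string descriptorTypeString(unsigned addrSpace, unsigned rank) {
  // Assumption: upstream printing shows address space 0 as plain "ptr"
  // and any non-zero space as "ptr<N>".
  const std::string ptr = addrSpace == 0
                              ? std::string("ptr")
                              : "ptr<" + std::to_string(addrSpace) + ">";
  // allocated pointer, aligned pointer, offset
  std::string out = "!llvm.struct<(" + ptr + ", " + ptr + ", i64";
  if (rank > 0) {
    const std::string dims = "array<" + std::to_string(rank) + " x i64>";
    out += ", " + dims + ", " + dims;  // sizes, then strides
  }
  return out + ")>";
}
```

Under these assumptions, a rank-1 spmem memref (ID 202) would be spelled with ptr<202> in both pointer slots, which is the only place the SC-specific lambda's result becomes visible in the converted type.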

MemorySpaceToAddressSpace (0x14b78780) is the same forward map the fat-pointers page documents; its decompiled body is:

__int64 MemorySpaceToAddressSpace(unsigned int ms) {
  if (ms - 1 > 0x15 || ((0x3FFF7Fu >> (ms - 1)) & 1) == 0)
    LOG(FATAL) << "Unsupported memory space: " << ms;   // sc_enums.cc:110
  return dword_AF36CE8[ms - 1];                          // 22-entry reverse table
}

The range check ms - 1 > 0x15 admits MemorySpace ∈ 1..22 (the subtraction is unsigned, so ms = 0 wraps around and is rejected). The bitmask 0x3FFF7F has bits 0..21 set except bit 7, so within that range it rejects exactly the MemorySpace 8 gap. Either failure ends in LOG(FATAL). The table 0xAF36CE8 is reproduced below; it is the exact inverse of the AS-id table on the fat-pointers page.

GOTCHA — the result attribute is a plain 64-bit IntegerAttr, not a re-emitted MemorySpaceAttr. The SC MemorySpaceAttr does not survive into LLVM dialect; only its numeric address-space ID does, carried as an i64 integer attribute that the base converter consumes when building ptr<ID>. A reimplementation that tries to thread the source-dialect attribute through unchanged will not match the descriptor the base converter actually builds.

Table A — MemorySpaceToAddressSpace reverse table (0xaf36ce8)

The SCTypeConverter result for !llvm.ptr<N> is N = MemorySpaceToAddressSpace(MS). Index = MS − 1; MemorySpace 8 is a gap (invalid → LOG(FATAL)). The flatten column is the lowerFunc override (next section); the lowerAsserts lambda omits it.

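| MS | pool | → addrspace (= ptr<N>) | sequencer flatten |
|---|---|---|---|
| 1 | smem | 0 (0x00) | |
| 2 | tile_spmem | 201 (0xC9) | |
| 3 | spmem | 202 (0xCA) | |
| 4 | hbm | 203 (0xCB) | |
| 5 | sflag | 204 (0xCC) | |
| 6 | vmem | 205 (0xCD) | |
| 7 | dreg | 208 (0xD0) | |
| 8 | (gap) | invalid → LOG(FATAL) | |
| 9 | smem_any | 212 (0xD4) | |
| 10 | hbm_any | 213 (0xD5) | |
| 11 | timem | 214 (0xD6) | |
| 12 | simem | 215 (0xD7) | |
| 13 | iova | 216 (0xD8) | |
| 14 | sflag_tile | 217 (0xD9) | 204 if !execute-seq |
| 15 | spmem_any | 218 (0xDA) | |
| 16 | smem_tile | 219 (0xDB) | 0 if !execute-seq |
| 17 | mar | 220 (0xDC) | |
| 18 | tile_spmem_cb | 501 (0x1F5) | |
| 19 | smem_cb | 502 (0x1F6) | |
| 20 | sflag_scs | 223 (0xDF) | 204 if !core-seq |
| 21 | smem_scs | 224 (0xE0) | 0 if !core-seq |
| 22 | sflag_tc | 204 (0xCC) | always 204 |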

The address-space ID is used directly as the LLVM addrspace; there is no remap. Sweeping LLVMPointerType::get immediates across the SC lowering band 0x13530000..0x135c0000 recovers IDs from this same numbering as literals: 0xCA (Spmem), 0xCB (HBM), 0xCC (Sflag, ×5), 0xD0 (Dreg), 0xD3 (SflagAny, ×4), 0xD4 (SmemAny), 0xD5 (HBMAny), 0xDB (TileSmem), 0xE1 (SflagAnySynctile). Two of these, 0xD3 and 0xE1, are not among the MemorySpaceToAddressSpace outputs and so do not appear in Table A. The pool/MemorySpace/on-tile semantics of each ID belong to the fat-pointers page.
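The lookup and its validity mask are small enough to model directly. The following is a minimal sketch using the Table A values; the names kAddrSpaceTable, kValidMask, and memorySpaceToAddressSpace are hypothetical, the gap slot holds a placeholder because its real contents are never read, and std::nullopt stands in for the LOG(FATAL) path.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Table A values, index = MS - 1. Slot 7 (MS 8) is the gap: the -1 is a
// placeholder chosen for this sketch and is never returned.
constexpr std::array<int, 22> kAddrSpaceTable = {
    0,   201, 202, 203, 204, 205, 208, -1,   // MS 1..8
    212, 213, 214, 215, 216, 217, 218, 219,  // MS 9..16
    220, 501, 502, 223, 224, 204};           // MS 17..22

// Bits 0..21 set, except bit 7 (the MS 8 gap).
constexpr std::uint32_t kValidMask = 0x3FFF7F;

// Returns the address-space ID, or nullopt where the binary would LOG(FATAL).
constexpr std::optional<int> memorySpaceToAddressSpace(std::uint32_t ms) {
  const std::uint32_t idx = ms - 1;  // unsigned wrap: ms == 0 fails the range check
  if (idx > 0x15 || ((kValidMask >> idx) & 1u) == 0) return std::nullopt;
  return kAddrSpaceTable[idx];
}
```

Returning an optional instead of aborting is a choice made for the sketch so that the rejected inputs (0, 8, and anything above 22) can be exercised without terminating the process.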

The sequencer-context flatten — the elide-vs-emit collapse set

After MemorySpaceToAddressSpace produces the base ID, the lowerFunc lambda applies an override jump table (0xae4633c, indexed by MS − 14, 9 entries) gated by two closure booleans captured from the function being lowered:

b0 = !ScDialect::HasExecuteSequencerTypeAttribute(fn)   // 0x1459a020  (TEC execute-lane context?)
b1 = !ScDialect::HasCoreSequencerTypeAttribute(fn)      // 0x14599ec0  (core-sequencer context?)

The per-MemorySpace override, byte-decoded from the jump-table targets:

| MS | space | base ID | override |
|---|---|---|---|
| 14 | sflag_tile | 217 | → 204 (Sflag) if b0 (not execute-seq) |
| 16 | smem_tile | 219 | → 0 (Smem) if b0 (not execute-seq) |
| 20 | sflag_scs | 223 | → 204 (Sflag) if b1 (not core-seq) |
| 21 | smem_scs | 224 | → 0 (Smem) if b1 (not core-seq) |
| 22 | sflag_tc | 204 | always 204 (TC sflag is generic sflag) |
| 15/17/18/19 | spmem_any/mar/tile_spmem_cb/smem_cb | | no override (keep base) |

The intent is a context-sensitive flatten: outside an execute-sequencer or core-sequencer function, the per-tile (sflag_tile, smem_tile) and per-SCS (sflag_scs, smem_scs) spaces collapse onto the generic Sflag (204) / Smem (0) pointer types — there is no per-tile or per-SCS bank to address when the code is not running on that sequencer. The tc-sflag (MS 22) is unconditionally generic Sflag.

This flatten is the MemorySpaceCast elide-vs-emit decision. The cast lowering elides an addrspacecast exactly when its source and destination MemorySpaces map, post-flatten, to the same ID. With neither sequencer attribute present that covers {sflag_tile, sflag_scs, sflag_tc, sflag} → 204 and {smem_tile, smem_scs, smem} → 0. Inside an execute-sequencer function the per-tile members drop out of those sets, and inside a core-sequencer function the per-SCS members do; sflag_tc and sflag match in every context because MS 22 is unconditionally 204. The companion lambda in lowerAsserts (0x135b6f80) is the identical map without the override band: assertion lowering always uses the un-flattened (per-tile/per-SCS) IDs, because the assert text wants to name the exact bank.

QUIRK — the converter is stateful in the booleans b0/b1 — the same MemorySpace lowers to a different ptr<N> depending on which sequencer's function it appears in. smem_tile is ptr<219> inside a TEC execute-lane function and ptr<0> everywhere else; smem_scs is ptr<224> inside an SCS function and ptr<0> everywhere else. A reimplementation must capture the enclosing function's sequencer-type attributes before converting any memref. The two booleans are stored at -0xf0(rbp) in lowerFunc (set @ 0x135685d3). The sequencer-type attributes themselves are documented on GetSequencerType.
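The override band reduces to a five-case switch over the MemorySpace. The following is a minimal sketch of that logic as described by the override table above; the SequencerContext struct is a hypothetical stand-in for the two attribute queries on the enclosing function, and the function mirrors the applySequencerFlatten step of the pseudocode rather than any recovered symbol.

```cpp
#include <cstdint>

// Hypothetical model of the enclosing function's sequencer context.
struct SequencerContext {
  bool hasExecuteSequencer;  // models HasExecuteSequencerTypeAttribute(fn)
  bool hasCoreSequencer;     // models HasCoreSequencerTypeAttribute(fn)
};

constexpr int kSflag = 204;  // generic sflag
constexpr int kSmem = 0;     // generic smem

// Applies the per-MemorySpace override to the base ID taken from Table A.
constexpr int applySequencerFlatten(int baseId, std::uint32_t ms,
                                    SequencerContext ctx) {
  const bool b0 = !ctx.hasExecuteSequencer;  // not in an execute-sequencer fn
  const bool b1 = !ctx.hasCoreSequencer;     // not in a core-sequencer fn
  switch (ms) {
    case 14: return b0 ? kSflag : baseId;  // sflag_tile
    case 16: return b0 ? kSmem : baseId;   // smem_tile
    case 20: return b1 ? kSflag : baseId;  // sflag_scs
    case 21: return b1 ? kSmem : baseId;   // smem_scs
    case 22: return kSflag;                // sflag_tc: always generic sflag
    default: return baseId;                // MS 15/17/18/19 and all MS below 14
  }
}
```

A reimplementation of the lowerAsserts lambda would simply skip this call and return the base ID unchanged.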


The CheckAddressSpaces Legality Matrix

The one simple-DMA gate

CheckAddressSpaces(SparseCoreTarget& tgt, Operation* op, int srcAS, int dstAS) (0x135b8e00) is the single address-space legality gate the SC lowering runs, and it has exactly one caller. It enforces the SparseCore simple-DMA data-movement contract: a simple-tier DMA whose source OR destination is the generic SMEM space (address-space 0) is only legal when issued from the Scalar Core Sequencer (SCS) — unless the hardware advertises native tile/SMEM DMA. The decompiled body resolves to a short-circuit OR of three conditions; if all three fail it emits an error and returns failure:

__int64 CheckAddressSpaces(SparseCoreTarget *tgt, Operation *op, int srcAS, int dstAS) {
  result = 1;                                          // assume legal
  if (!tgt->vtable[+0xd8]()) {                         // (1) SupportsTileSmemDma() ?
    fn = walkParentsTo<LLVM::LLVMFuncOp>(op);          //     enclosing llvm.func
    attr = fn.getInherentAttr("sc.sequencer", 12);     // (2) sc.sequencer attribute
    s    = StringAttr::getValue(attr);
    bool isScs = (s.size == 3) &&                       //     "scs": 0x6373='sc', 0x73='s'
                 ((s[0..1] ^ 0x6373) | (s[2] ^ 0x73)) == 0;
    if (!isScs && (srcAS == 0 || dstAS == 0)) {        // (3) fails: an endpoint is generic SMEM
      op->emitError("Simple DMAs on SMEM only supported on SCS");  // 41 chars
      result = failure;
    }
  }
  return result;
}

Table D — the legality conditions (short-circuit OR, top to bottom)

| # | condition (any one ⇒ legal) | source |
|---|---|---|
| 1 | tgt.SupportsTileSmemDma() | vtable +0xd8; VF = false, GL = false |
| 2 | enclosing llvm.func's sc.sequencer inherent attr == "scs" | getInherentAttr("sc.sequencer", 12) → 3-char cmp |
| 3 | srcAS != 0 AND dstAS != 0 (neither endpoint is generic SMEM) | the two int params (post-cast Table-A IDs) |
| else | → emitError("Simple DMAs on SMEM only supported on SCS") → failure | error string @ 0x91b1ca3 |

The decompile pins each arm: vtable slot +216 (= 0xd8) is SupportsTileSmemDma (line gate at *(this+216)(this)); the parent walk stops at mlir::detail::TypeIDResolver<mlir::LLVM::LLVMFuncOp> (the enclosing llvm.func); the "scs" compare is the literal (s[0..1] ^ 0x6373) | (s[2] ^ 0x73) with s.size == 3; and the failure arm builds the 41-byte string "Simple DMAs on SMEM only supported on SCS".
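The XOR/OR form of the "scs" compare is a common compiler idiom for a short fixed-length string test. The following is a minimal sketch showing that it is equivalent to an ordinary three-character comparison; the function name isScsByXor is hypothetical, and the two-byte head is assembled explicitly in little-endian order so the sketch does not depend on host endianness (the binary performs a direct 16-bit load on x86-64).

```cpp
#include <string_view>

// Hypothetical re-expression of the decompiled compare:
//   size == 3 && ((s[0..1] ^ 0x6373) | (s[2] ^ 0x73)) == 0
constexpr bool isScsByXor(std::string_view s) {
  if (s.size() != 3) return false;
  // Little-endian 16-bit view of the first two bytes: 's' (0x73) is the low
  // byte and 'c' (0x63) is the high byte, giving 0x6373 for "sc".
  const unsigned head = static_cast<unsigned char>(s[0]) |
                        (static_cast<unsigned>(static_cast<unsigned char>(s[1])) << 8);
  const unsigned tail = static_cast<unsigned char>(s[2]);
  // Each XOR is zero only on an exact match; OR-ing them folds the two
  // equality tests into a single compare against zero.
  return ((head ^ 0x6373u) | (tail ^ 0x73u)) == 0;
}
```

The result agrees with a plain s == "scs" for every input; the folded form only saves a branch.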

Both current gens lack native tile/SMEM DMA

SupportsTileSmemDma returns false on both shipping SparseCore targets — ViperfishSparseCoreTarget (0x1d49c8e0, xor eax,eax) and GhostLiteSparseCoreTarget (0x1d499460, xor eax,eax), reached through their vtables' R_X86_64_RELATIVE addends at 0x21cc9078 and 0x21cc86f8. So on every current generation, condition (1) is dead, and the effective rule is condition (2) ∨ (3): simple DMA touching SMEM is legal only from the SCS. This is the IR-side twin of the SCS being the SparseCore's scalar control engine — see SCS Engine.

The single call site and the address-space normalisation

The only caller is DmaSimpleStartOpLowering::matchAndRewrite (0x135a9100), with the call at 0x135a977a. Critically, the srcAS/dstAS the gate sees are the post-cast normalised IDs: immediately before the call, CastTileSmemPointerToSmem (0x135b86e0) casts any tile-SMEM pointer down to generic SMEM (via tpu_addrspacecast_smem taking a tpu_tileid window operand; see Tile-ID Cast) and writes the resulting address-space integers into the out-params the gate consumes. So a tile-SMEM-resident DMA endpoint is first flattened to SMEM, then the "SMEM ⇒ SCS only" gate applies. The two-stage DMA lowering this gate sits inside is documented on the LowerToMlo DMA bridge-cast.

GOTCHA — the gate cares about the generic SMEM space (address-space 0), not about SMEM-family spaces in general. A DMA between two non-zero IDs (spmem/hbm/tile_spmem/…) is unconditionally legal regardless of sequencer — condition (3) passes the moment both endpoints are non-zero. The "SCS only" restriction is specifically the generic-SMEM (ptr<0>) case, which is why CastTileSmemPointerToSmem running first matters: it is what creates the ptr<0> endpoints the gate then guards.
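The whole gate reduces to a three-term OR over values the caller already has. The following is a minimal sketch of that predicate, following Table D; the DmaGateInputs struct and the function name isSimpleDmaLegal are hypothetical, and the inputs stand in for the target vtable query, the sc.sequencer attribute string, and the two post-cast address-space integers.

```cpp
#include <string_view>

// Hypothetical inputs standing in for what the real gate reads.
struct DmaGateInputs {
  bool supportsTileSmemDma;    // condition (1): target capability hatch
  std::string_view sequencer;  // value of the enclosing func's sc.sequencer
  int srcAS;                   // post-cast source address space
  int dstAS;                   // post-cast destination address space
};

// True where CheckAddressSpaces would return success, false where it would
// emit "Simple DMAs on SMEM only supported on SCS".
constexpr bool isSimpleDmaLegal(const DmaGateInputs& in) {
  if (in.supportsTileSmemDma) return true;  // (1) native tile/SMEM DMA
  if (in.sequencer == "scs") return true;   // (2) issued from the SCS
  return in.srcAS != 0 && in.dstAS != 0;    // (3) no generic-SMEM endpoint
}
```

With condition (1) false, as on both current targets, only the combination of a non-"scs" sequencer and an address-space-0 endpoint is rejected. Any sequencer string other than "scs" used when exercising the sketch is a placeholder, not a value recovered from the binary.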


The EUP Instantiation Roster

Two families, one source-op fan-in

The SparseCore EUP (Extended Unary Processor) lowering surface is 42 type-converted instantiations in two templated pattern families, both registered by the same LowerToSparseCoreLlvm pass and both operating on the converted (LLVM-dialect) types:

  • UnaryFloatVectorOpLowering<Src, tpu_*_macro>: 12 instantiations. The 1:1 path: the operand already fits a single EUP lane width, so the body emits one EUP push+pop macro intrinsic. Algebra: get operand + result type → FilterLLVMAttributes (drop access_groups, 0x135b7a20) → tpu_X_macro::create(b, loc, {resT}, {operand}, attrs) → replaceOp. The _macro intrinsic is the EUP VALU3 push + result pop pair.
  • AluEpOpLowering<Src, Compute, UnpackF, PackF>: 30 instantiations. The 1:N path: the operand is a packed sub-element vector (e.g. two bf16 in a 32-bit lane), so the body unpacks the wide vreg into a deque of narrow sub-element values (UnpackOperand<UnpackOp>, 0x1360fac0), re-emits the ComputeOp per piece, then repacks (PackResults<PackOp>, 0x13610940).

The same source op can appear in both families: sparse_core::RsqrtOp has a UnaryFloatVector instantiation (0x1357e540) and an AluEp instantiation (0x135e1c80). The IsDynamicallyLegal predicate (below) decides which one fires for a given operand type. The AluEp family is therefore the general sub-element staging wrapper for any elementwise math/arith op — not only transcendentals — with the transcendentals being the subset that also has the fast 1:1 macro form.

Table B — UnaryFloatVectorOpLowering roster (12; 1:1 EUP push+pop macro)

Each row's source op fans into the macro intrinsic at the body tail. All 12 matchAndRewrite symbols and their template parameters are confirmed in the decompile.

| matchAndRewrite @VA | source op | → EUP macro intrinsic | EUP selector |
|---|---|---|---|
| 0x1357e2c0 | math::TanhOp | tpu_tanh_macro (0x14988180) | Tanh 0x13/0x1b |
| 0x1357e540 | sparse_core::RsqrtOp | tpu_rsqrt_macro (0x14735840) | ReciprocalSqrt 0x10/0x0c |
| 0x1357e880 | math::Log2Op | tpu_log2_macro (0x14730640) | LogTwo 0x12/0x1a |
| 0x1357eb00 | sparse_core::ReciprocalOp | tpu_rcp_macro (0x147346c0) | Reciprocal 0x15/0x1d |
| 0x1357ef40 | sparse_core::Log2Op | tpu_log2_macro (0x14730640) | LogTwo 0x12/0x1a |
| 0x1357f380 | sparse_core::Pow2Op | tpu_pow2_macro (0x147339c0) | PowTwo 0x11/0x19 |
| 0x1357f6c0 | math::SinOp | tpu_sin_macro (0x14736880) | Sinq 0x17/0x1e |
| 0x1357f840 | math::CosOp | tpu_cos_macro (0x146d8540) | Cosq 0x18/0x1f |
| 0x1357fac0 | sparse_core::VsinqOp | tpu_sin_macro (0x14736880) | Sinq 0x17/0x1e |
| 0x1357ff00 | sparse_core::VcosqOp | tpu_cos_macro (0x146d8540) | Cosq 0x18/0x1f |
| 0x13580240 | math::ErfOp | tpu_erf_macro (0x1472efa0) | Erf 0x0e/0x0f |
| 0x135804c0 | sparse_core::VsigshftOp | tpu_sigshft (0x147365e0) | ShiftedSigmoid 0x14/0x1c |

Eight distinct macros (rsqrt, rcp, tanh, sin, cos, erf, log2, pow2) plus the bare tpu_sigshft. A math:: op and its sparse_core:: counterpart fan into the same macro: math::Log2Op and sparse_core::Log2Op both reach tpu_log2_macro, and math::SinOp and sparse_core::VsinqOp both reach tpu_sin_macro. The sc::Vsinq/Vcosq/Vsigshft ops are the EUP-native dialect forms.

Table C — AluEpOpLowering roster (30; 1:N unpack → compute → pack)

Each AluEpOpLowering<Src, Compute, Unpack, Pack> unpacks the operand, re-emits the Compute op per sub-element piece, and repacks. Column 3 is the re-emitted Compute op; the pack family (F=float, SI=signed-int, UI=unsigned-int) is the Unpack/Pack template pair. All 30 template signatures are confirmed in the decompile.

| matchAndRewrite @VA | source op | re-emitted Compute | pack | class |
|---|---|---|---|---|
| 0x135de780 | math::RsqrtOp | math::RsqrtOp | F | transcendental |
| 0x135df200 | math::ExpOp | math::ExpOp | F | transcendental |
| 0x135dfca0 | math::Log2Op | math::Log2Op | F | transcendental |
| 0x135e0740 | math::TanhOp | math::TanhOp | F | transcendental |
| 0x135e11e0 | math::FloorOp | math::FloorOp | F | rounding |
| 0x135e1c80 | sparse_core::RsqrtOp | sparse_core::RsqrtOp | F | transcendental |
| 0x135e2720 | sparse_core::ReciprocalOp | sparse_core::ReciprocalOp | F | transcendental |
| 0x135e31c0 | math::AbsFOp | math::AbsFOp | F | abs |
| 0x135e3c60 | arith::FPToSIOp | arith::FPToSIOp | F→SI | convert |
| 0x135e4700 | math::CeilOp | math::CeilOp | F | rounding |
| 0x135e51a0 | sparse_core::Pow2Op | sparse_core::Pow2Op | F | transcendental |
| 0x135e5c40 | sparse_core::Log2Op | sparse_core::Log2Op | F | transcendental |
| 0x135e66e0 | arith::AddFOp | arith::AddFOp | F | binary float |
| 0x135e7160 | arith::DivFOp | arith::DivFOp | F | binary float |
| 0x135e7c00 | math::CopySignOp | math::CopySignOp | F | binary float |
| 0x135e86a0 | arith::MaximumFOp | arith::MaximumFOp | F | binary float |
| 0x135e9140 | arith::MinimumFOp | arith::MinimumFOp | F | binary float |
| 0x135e9be0 | arith::MulFOp | arith::MulFOp | F | binary float |
| 0x135ea680 | arith::NegFOp | arith::NegFOp | F | unary float |
| 0x135eb120 | arith::SubFOp | arith::SubFOp | F | binary float |
| 0x135ebbc0 | sparse_core::ClampFOp | sparse_core::ClampFOp | F | clamp |
| 0x135ec660 | arith::MaxSIOp | arith::MaxSIOp | SI | binary int |
| 0x135ed100 | arith::MinSIOp | arith::MinSIOp | SI | binary int |
| 0x135edba0 | arith::MaxUIOp | arith::MaxUIOp | UI | binary uint |
| 0x135ee640 | arith::MinUIOp | arith::MinUIOp | UI | binary uint |
| 0x135ef0e0 | arith::MulIOp | arith::MulIOp | SI | binary int |
| 0x135efb80 | arith::AddIOp | arith::AddIOp | SI | binary int |
| 0x135f0620 | arith::SubIOp | arith::SubIOp | SI | binary int |
| 0x135f10c0 | sparse_core::AddSIOp | arith::AddIOp | SI | int alias |
| 0x135f1b20 | sparse_core::AddUIOp | arith::AddIOp | UI | uint alias |

Two structural facts the template signatures pin down:

  • The two sc:: int aliases re-emit arith::AddIOp. sparse_core::AddSIOp (UnpackSIOp/PackSIOp) and sparse_core::AddUIOp (UnpackUIOp/PackUIOp) both lower their Compute op to arith::AddIOp — they are SI/UI-flavoured aliases of the integer add, not distinct compute kernels.
  • Exp has no EUP macro. math::ExpOp appears only in AluEp (0x135df200, re-emitting math::ExpOp) — there is no tpu_exp_macro. Exp is always the polynomial/pow2-built form, never a single EUP push, consistent with Exp being absent from the EUP-native function set.

The shared unpack/pack layer:

UnpackOperand<UnpackFOp>  0x1360fac0      PackResults<PackFOp>   0x13610940
UnpackOperand<UnpackSIOp> 0x13610080      PackResults<PackSIOp>  0x13611280
UnpackOperand<UnpackUIOp> 0x136104e0      PackResults<PackUIOp>  0x13610de0
GetUnpackResultElementType                0x1360ff20

The 1:N body was decoded on AluEp<math::ExpOp> (0x135df200): math::ExpOp::create (0x1782be40) is invoked twice — once as a type probe and once as the per-piece loop body — bracketed by UnpackOperand and PackResults.
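The unpack → compute → pack shape can be illustrated on a single 32-bit lane holding two bf16 values. The following is a sketch under loudly stated assumptions, not a model of the real UnpackFOp/PackFOp semantics: the element order within the lane (low half first) and the truncating narrow on repack are choices made for this example, and all function names are hypothetical. It relies only on the fact that a bf16 value is the upper 16 bits of an IEEE-754 binary32 value.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Widen bf16 bits to float: bf16 is the upper half of a binary32 pattern.
inline float bf16BitsToFloat(std::uint16_t bits) {
  const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);
  return f;
}

// ASSUMPTION: narrow by truncation. The rounding the real pack op applies is
// not established on this page.
inline std::uint16_t floatToBf16Bits(float f) {
  std::uint32_t wide;
  std::memcpy(&wide, &f, sizeof wide);
  return static_cast<std::uint16_t>(wide >> 16);
}

// One lane of the 1:N path: unpack, run the compute callback per piece, repack.
// ASSUMPTION: the low 16 bits are element 0 and the high 16 bits are element 1.
template <typename Compute>
std::uint32_t aluEpLane(std::uint32_t lane, Compute compute) {
  std::array<float, 2> pieces = {
      bf16BitsToFloat(static_cast<std::uint16_t>(lane & 0xFFFFu)),
      bf16BitsToFloat(static_cast<std::uint16_t>(lane >> 16))};
  for (float& piece : pieces) piece = compute(piece);  // re-emitted Compute op
  return static_cast<std::uint32_t>(floatToBf16Bits(pieces[0])) |
         (static_cast<std::uint32_t>(floatToBf16Bits(pieces[1])) << 16);
}
```

For instance, a lane of 0x40003F80 holds 1.0 in the low half and 2.0 in the high half under these assumptions; passing a negating callback yields 0xC000BF80.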

The 1:1-vs-1:N decision — IsDynamicallyLegal

The selector between the two families is the addDynamicallyLegalOp predicate IsDynamicallyLegal(Operation*, SparseCoreTarget&, int) (0x135ddd20), installed per-op by PackedOperandsLowering::AddDynamicallyLegalAluEpOps<Op, UnpackF, PackF>. It reports the op as legal as-is when the operand is not a packed sub-element vector, so AluEp does not fire and the 1:1 UnaryFloatVector macro path runs. It reports the op as illegal when the operand needs sub-element staging, so AluEp does fire. The predicate calls:

ForceBF16ALUOperationsToUnpack(op, type)   0x135dd6e0   (force-unpack BF16 ALU ops)
IsPackedVectorType(type, target, bool)     0x13611720   (packed sub-element layout, e.g. 2×bf16 / 32-bit lane?)
lowering_util::GetVpackFormat(type)        0x13dad800   (VPACK format enum)
element bit-width == 0x10                    @0x135dddc9  (16-bit → bf16/fp16 packed-pair detect)
target vtable *0x780                         @0x135ddda5  (lane-width / pack capability)
target vtable *0x260                         @0x135dde75  (EUP / format support)

NOTE (HIGH) — the structure of the predicate — "fire AluEp iff the operand is a packed sub-element vector" — is decompile-confirmed via the IsPackedVectorType / ForceBF16ALUOperationsToUnpack / GetVpackFormat calls and the element-width == 16 test. What is not name-resolved is the precise per-gen return of the two target vtable accessors at slots *0x780 (lane width) and *0x260 (EUP-format support); they are confirmed called but their per-generation values — which would make the unpack-iteration count numeric — are not byte-decoded. This is the one place the page drops below CONFIRMED.

NOTE (INFERRED) — because a transcendental source op is registered in both families (e.g. sc::RsqrtOp at 0x1357e540 UnaryFloatVector and 0x135e1c80 AluEp), some arbitration must pick one when both could match. The dynamic-legality predicate decides applicability (packed operand → AluEp; lane-fitting operand → the 1:1 macro), but the explicit PatternBenefit integers and the conversion-target legality-marking order were traced structurally, not byte-decoded; the page treats the dynamic-legality predicate as the effective selector.

EUP availability is universal on current gens

SupportsScEupOps returns true on both ViperfishSparseCoreTarget (0x1d49c8c0, mov al, 1) and GhostLiteSparseCoreTarget (0x1d499420, mov al, 1). Both ship the EUP transcendental unit, so the UnaryFloatVector macro lowerings are universally available at the IR level. (Whether a cost-model-cheap single-µop sinq/cosq path exists is a separate per-generation question charged downstream of this lowering, not by these patterns.)


| Name | Relationship |
|---|---|
| LowerToSparseCoreLlvmPass::lowerFunc (0x13568280) | builds the converter and installs the AS-attr lambda |
| MemorySpaceToAddressSpace (0x14b78780) | the forward map the conversion lambda calls (Table A) |
| LLVM::LLVMPointerType::get (0x1746eb40) | stamps the raw ID onto the descriptor's !llvm.ptr<ID> |
| CheckAddressSpaces (0x135b8e00) | the simple-DMA SMEM-on-SCS legality gate (Table D) |
| CastTileSmemPointerToSmem (0x135b86e0) | normalises endpoints to SMEM before the gate sees them |
| IsDynamicallyLegal (0x135ddd20) | the 1:1-vs-1:N selector between the two EUP families |

Cross-References

  • LowerToSparseCoreLlvm — the per-class rewrite bodies that sit on top of the converted types this page produces.
  • LowerToMlo DMA Bridge-Cast — the two-stage DMA lowering inside which CheckAddressSpaces runs.
  • The tpu MLIR Dialect — the op-registration ABI for the tpu/sparse_core ops these patterns rewrite into.
  • Compiler Overview — Part V orientation; where SC lowering sits in the five-phase descent.
  • Fat Pointers (AS7/8/9) — the AS-id ↔ MemorySpace ↔ pool table this converter consumes (do not duplicate).
  • addrspacecast ISel — the MemorySpaceCast → llvm.addrspacecast conversion the flatten collapse-set drives.
  • Tile-ID Cast — the 2-operand {base, tileId} cast CastTileSmemPointerToSmem emits to normalise tile pointers.
  • SCS Engine — the Scalar Core Sequencer the sc.sequencer=="scs" legality escape names.
  • TEC Engine — the per-tile execute lane whose sequencer context unlocks the per-tile flatten.
  • GetSequencerType — the sequencer-type attributes the flatten booleans (b0/b1) read.
  • Binary: extracted/libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64/libtpu/libtpu.so (build-id 89edbbe81c5b328a958fe628a9f2207d)
  • Index entry: Part V — Compiler: Lowering & Optimization Passes / MLIR lowering chain