nvvm.tcgen05.* covers the Blackwell (sm_100+) tensor-memory family. Tensor memory (TMEM) is a per-SM scratchpad allocated and freed through the dialect's alloc / dealloc ops, accessed through ld / st and the long-K MMA path, and torn down before the kernel exits. The roster below is the only path to TMEM from MLIR; Hopper's WGMMA family (nvvm.wgmma.*) does not reach Blackwell tensor cores. See tcgen05 Tensor Memory Model for the TMEM allocation discipline and the variant taxonomy, and tcgen05 Machine Validation for the codegen-side verifier rules.
tcgen05.mma carries a control-word modifier table that selects element-type interpretation, sparsity, block-scaling, and collector behaviour. Block-scaled UMMA exposes scale-vector size and scale-format enums; the cross-product produces several thousand legal PTX forms from a single dialect op.
The "Properties slots used" column tracks where each op stores its attribute payload in the inline Properties record; see Properties Blob — Per-op-family slot maps for the exact byte offsets.
The collector modifier controls how the MMA pipeline reuses register-file data across iterations: discard evicts on commit, fill accumulates without evicting, use consumes a previously-filled buffer, last_use consumes and then evicts.
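The four collector states can be read as a small state machine over one buffer. The sketch below is a toy model of that reading, not the dialect's verifier: the `Collector` enum, the function name, and the rule that `use` / `last_use` on an unfilled buffer counts as an error are all assumptions made for illustration.

```python
from enum import Enum


class Collector(Enum):
    # Names mirror the four modifiers described above.
    DISCARD = "discard"
    FILL = "fill"
    USE = "use"
    LAST_USE = "last_use"


def first_unfilled_read(mods):
    """Return the index of the first op that consumes an unfilled buffer.

    Returns None when every `use` / `last_use` follows a `fill` that has
    not been evicted yet. Treating this as an error is an assumption.
    """
    filled = False
    for i, mod in enumerate(mods):
        # use and last_use both consume a previously-filled buffer.
        if mod in (Collector.USE, Collector.LAST_USE) and not filled:
            return i
        if mod is Collector.FILL:
            filled = True          # fill keeps the data resident
        elif mod in (Collector.DISCARD, Collector.LAST_USE):
            filled = False         # discard and last_use both evict
    return None
```

Under this model, a `fill, use, last_use` sequence is clean, while any `use` that follows a `discard` or a `last_use` is flagged.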
The descriptor operands %desc_a and %desc_b are 64-bit SMEM descriptors when the operand is SMEM-resident, or TMEM column indices when the operand is TMEM-resident.
nvvm.tcgen05.cp reaches PTX through llvm.inline_asm when the multicast / src_fmt combination has no matching LLVM intrinsic at the snapshot revision Tileiras tracks:
The two r slots are the destination and source TMEM column indices. The shape, multicast, and src_fmt tokens are baked into the template literal at lowering time; the constraint string never changes.
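The split between a variable template and a fixed constraint string can be sketched as follows. This is a hypothetical helper, not the lowering code: the function name, the token order, and the `[$0], $1` operand syntax are assumptions, and the token spellings passed in by the caller are illustrative only.

```python
# Fixed constraint string: two register inputs, per the description above.
CONSTRAINTS = "r,r"


def build_cp_inline_asm(shape, multicast=None, src_fmt=None):
    """Build a (template, constraints) pair for an inline-asm fallback.

    The modifier tokens are baked into the template text; the constraint
    string is the same for every combination. Token order and operand
    syntax are assumed for illustration.
    """
    tokens = ["tcgen05", "cp", shape]
    if multicast is not None:
        tokens.append(multicast)
    if src_fmt is not None:
        tokens.append(src_fmt)
    template = ".".join(tokens) + " [$0], $1;"
    return template, CONSTRAINTS
```

Calling the helper with different modifier combinations changes only the first element of the returned pair, which is the property the text describes.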
sm_100a is the architecture-qualified Blackwell target; the family is also legal on sm_100f for the few f-suffixed copy variants. Datacenter Blackwell (sm_100) is the only sub-arch the dialect exposes directly; Blackwell Ultra (sm_103) and Jetson Thor (sm_110) reuse the same op surface rather than adding ops of their own. See Per-SM Emission Templates — SM100 / SM103 for the codegen-side templates and NVPTX Subtarget Feature Matrix for the feature gating.
cta_group agrees between matched alloc / dealloc and between the in-flight MMA and its commit / wait.
scale is a compile-time immediate.
Block-scaled (atom_K, vecSize) matches one of (32, 32), (64, 16), (64, 32); other combinations are rejected by the per-combo expectation diagnostics listed under nv_tileas Verifiers — Block-Scaled MMA Verification (e.g. "expects A/B element types to be Float4E2M1FNType and sfa/sfb element types to be Float8E8M0FNUType when (atom_K=64 && vecSize=32)").
Sparse metadata column must be a valid TMEM column with a non-zero stride.
Accumulator element type is f32 for every block-scaled variant.
kindA and kindB agree (no mixed scale-factor formats).
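Several of the invariants above are simple enough to express as a standalone check. The sketch below is a minimal model under stated assumptions: the `BlockScaledMma` record, its field names, the string type labels, and the error messages are hypothetical and are not the verifier's actual data structures or diagnostics.

```python
from dataclasses import dataclass

# Legal (atom_K, vecSize) pairs for block-scaled MMA, as listed above.
LEGAL_BLOCK_SCALED = {(32, 32), (64, 16), (64, 32)}


@dataclass
class BlockScaledMma:
    # Hypothetical flattened view of one block-scaled MMA and its commit.
    atom_k: int
    vec_size: int
    acc_type: str
    kind_a: str
    kind_b: str
    mma_cta_group: int
    commit_cta_group: int


def verify(op):
    """Return a list of violated invariants (empty when the op is legal)."""
    errors = []
    if (op.atom_k, op.vec_size) not in LEGAL_BLOCK_SCALED:
        errors.append("illegal (atom_K, vecSize) combination")
    if op.acc_type != "f32":
        errors.append("block-scaled accumulator must be f32")
    if op.kind_a != op.kind_b:
        errors.append("kindA and kindB must agree")
    if op.mma_cta_group != op.commit_cta_group:
        errors.append("cta_group differs between MMA and commit")
    return errors
```

Collecting every violation instead of stopping at the first one is a design choice of this sketch; the real verifier may report only one diagnostic per op.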