Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

TypeID Sentinels and Anchors

Abstract

Tileiras materialises an MLIR TypeID in exactly two ways. Idiom 1 is a 1-byte sentinel in .bss whose address is the identity — the byte's value is never read, only its pointer is. Idiom 2 is a Meyers singleton that lazily interns a __PRETTY_FUNCTION__-derived RTTI string the first time the accessor runs, then caches the resulting TypeID* in a qword next to a one-shot guard byte. The two never mix: every TypeID in the binary is either a static sentinel pointer or a Meyers-cached qword.

Neither idiom touches the Itanium C++ ABI's typeinfo/vtable machinery. The binary's .data.rel.ro typeinfo block (0x4FA5242..0x5A2C360) holds only libstdc++ classes — exceptions, streams, locale facets — and no MLIR class appears there. This is the architectural reason both idioms exist: MLIR needs cross-DSO identity for types that the C++ standard's &typeid(T) cannot give it (anonymous namespaces, hidden visibility, statically linked dialects), so it builds its own discriminators on top of address-taking and string-interning instead. A reimplementation that swaps in std::type_info* will be unable to keep pointer-equality stable across the registered-dialect set.

The distinction is significant. Idiom 1 carries the ABI-frozen identities — dialects, the registered Type and Attribute subclasses, the upstream MLIR built-ins — whose addresses are link-time constants and whose registration happens before main. Idiom 2 carries identities that come into existence at runtime as part of an addInterfaces<> call, an analysis registration, or a pattern RTTI tag.

Idiom 1 — Static Pointer-Identity Sentinel

Each dialect, each concrete Type, each concrete Attribute that ships with the binary owns a 1-byte sentinel in .bss. The byte's value is irrelevant; the linker assigns it an address and that address is the TypeID. Hot dispatch compares op->kindPtr (or a Type's vtable slot) against a sentinel by pointer-identity — one MOV+CMP, no string compare, no hash lookup.

typedef uint8_t TypeIDSentinel;

extern TypeIDSentinel kCuteLayoutTypeID;          /* &unk_5B49AE0 */
extern TypeIDSentinel kCuteNvgpuSm90MmaTypeID;    /* &unk_5B48E28 */

bool is_cute_layout(Type *t) {
    return t->kind_ptr == &kCuteLayoutTypeID;
}

Sentinels do not scatter across the binary — they cluster into a small number of address bands, one band per owning dialect or category. Three bands carry the weight of Tileiras dispatch.

BandOwnerExamples
&unk_5B38B[B0..C8]cuda_tile dialect Type TypeIDscuda_tile.tile, cuda_tile.ptr, cuda_tile.tensor_view
&unk_5B48D[88..F8] / 5B48E[00..58]cute_nvgpu concrete Type TypeIDs (27 slots, 8-byte pitch)cute_nvgpu.sm90.mma, cute_nvgpu.smem_desc, cute_nvgpu.atom.tma_load
&unk_5B49A[98..B18]cute dialect concrete Type TypeIDs (17 slots, 8-byte pitch)cute.layout, cute.swizzle, cute.tile
&unk_5B44E[B8..F8] / 5B44F[08..FD8]nv_tileas per-op opInfo sentinels (21 ops, 8-byte pitch)nv_tileas.tiled_load @ 5B44ED0, nv_tileas.gather_load @ 5B44FA8, nv_tileas.convert_layout @ 5B44FD8. Paired kindPtr forms live in 0x5BE3F* / 0x5BE4* / 0x5BE5* — see Sentinel Sharing And Aliasing.
&unk_5B46[D28..F68]nv_tileaa per-op FoldRecord sentinels (33 ops)nv_tileaa.make_memref, nv_tileaa.block_tile
&unk_5BE5xxx / &unk_5BE6xxxUpstream MLIR Type and Attribute TypeIDs (built-in dialect)f32 at &unk_5BE6030, f8E4M3FN at &unk_5BE60A0
&unk_5BAADxxOpaque / erased-storage TypeIDsthe i32-blocked-layout-id-1 variant at &unk_5BAADB8

Dialect TypeIDs get their own one-byte slots too: &unk_5B496B8 for the cute dialect, &unk_5B482C8 for cute_nvgpu, &unk_5BA8F60 for LLVM, &unk_5BE5908 for arith. The nv_tile_ir::as::schedule_utils::ScheduleAnalysis analysis registration at qword_5B38E78 is the canonical Idiom-2 example for analyses; the dialect-level Idiom-1 sentinels and the analysis-level Idiom-2 sentinels coexist in the same MLIRContext without colliding because their bands never overlap.

Dispatch By Pointer-Identity

Every walker, canonicalizer, and verifier in the binary distinguishes ops the same way: load op->kindPtr from *(qword*)(op + 48) + 16 and compare it against a list of sentinel addresses. The classifier sub_7ACC40 — the mode-classifier for the TileAS layout-assignment pass — is a representative example.

int classify_load_store_op(Operation *op, ModuleSpec *spec, LayoutCandVec *cands) {
    void *kind = *(void **)(*(qword *)(op + 48) + 16);

    if (kind == &unk_5BE6138) {                       /* null-opinfo (mid-rewrite) */
        if (kind_ptr_is_tiled_atomic_rmw(op))         /* &unk_5B44ED8 via leaf */
            return tail_call_primary_resolver(op, spec, cands);
        if (kind_ptr_is_scatter_store(op))            /* &unk_5B44EF0 via leaf */
            return tail_call_fallback_resolver(op, spec, cands);
        return FAILURE;
    }

    if (kind == &unk_5B44ED0 /* tiled_load   */ ||
        kind == &unk_5B44EC8 /* tiled_store  */)
        return tail_call_primary_resolver(op, spec, cands);

    if (kind == &unk_5B44F90 /* nv_tileas.load     */) return classify_load_inline(op, spec, cands);
    if (kind == &unk_5B44EE0 /* nv_tileas.store    */) return CANONICAL_MODE;
    if (kind == &unk_5B44FA8 /* gather_load        */) return tail_call_gather_resolver(op, spec, cands);

    return FAILURE;
}

The entire switch is a sequence of pointer comparisons. No string lives in this function — six CMP instructions on a hot path that runs once per op. Reimplementations must preserve this property; the address-as-identity model is what makes generic walkers cheap.

The &unk_5BE6138 Null-Opinfo Guard

One sentinel earns its own paragraph. &unk_5BE6138 is the "no properties" guard that ops without an inline Properties payload carry as their kindPtr discriminator during construction and mid-rewrite. Dispatchers test it first to short-circuit the properties-decode path. It is also the address an in-flight RewritePattern leaves in op->kindPtr after wiping the original singleton — which is why every resolver in the load-store cluster (sub_7ACC40, sub_788BE0, sub_7E3440) tests for it before falling through to the leaf-predicate helpers sub_7A9D30 (tiled_atomic_rmw) and sub_79DA80 (scatter_store), which read one indirection deeper to recover the original identity.

Treat the null-opinfo sentinel as a transient state. A walker that observes it on a fully constructed op outside a rewrite frame should report failure rather than guess the kind.

Idiom 2 — Meyers-Cached TypeID

When a TypeID is not a link-time constant — primarily Op and Type interfaces attached via addInterfaces<>, analysis types registered through mlir::AnalysisManager, and pattern RTTI tags — the binary falls back to a {guard:u8, qword:u64} pair plus a one-shot init function. The factory sub_44A6CA0 takes a string ending in ] (the closing bracket of __PRETTY_FUNCTION__ captured by MLIR's TypeID::get<T>() trick) and returns the uniqued TypeID* for that string. These strings sit in ordinary .rodata literal pools, not in the Itanium typeinfo block, so a binary triage that scans for typeinfo for'mlir::... will find nothing — every MLIR identity string in this binary is addressable only through the corresponding install-site call.

TypeID get_function_op_interface_typeid(void) {
    static uint8_t  guard = 0;        /* byte_5B37668  */
    static uint64_t cached = 0;       /* qword_5B37670 */

    if (__builtin_expect(guard == 0, 0)) {
        if (__cxa_guard_acquire(&guard) != 0) {
            cached = (uint64_t)sub_44A6CA0("mlir::FunctionOpInterface]", 22);
            __cxa_guard_release(&guard);
        }
    }
    return (TypeID)cached;
}

void install_interface(InterfaceMap *map, TypeID id, void *concept) {
    sub_4492D60(map, id, concept);
}

The guard byte sits immediately before the qword in .bss, with 8-byte alignment so the qword stays naturally aligned. The Itanium ABI's __cxa_guard_acquire / __cxa_guard_release pair makes initialisation thread-safe; __builtin_expect(guard == 0, 0) keeps the steady-state load on the fast path. After the first call the qword is the TypeID, and the slot behaves exactly like an Idiom-1 sentinel — except its address is the address of a 64-bit pointer, not a 1-byte tag.

Concrete examples observed in the binary, all matching this exact template:

Qword slotRTTI string (verbatim)Used by
qword_5B37670mlir::FunctionOpInterface]ConvertTileFuncToLLVM
qword_5B37798mlir::SymbolTable]symbol-table analysis lookup
qword_5B38E18mlir::LoopLikeOpInterface]loop pipeliner and licm passes
qword_5B38E78mlir::nv_tile_ir::as::schedule_utils::ScheduleAnalysis]TileAS scheduler analysis manager
qword_5B44600mlir::cutlass_ir::cute::LayoutTypeInterface]every CuTe layout-bearing type
qword_5B44618mlir::cutlass_ir::cute::ViewTypeInterface]every CuTe view-bearing type
qword_5B46FF8mlir::cutlass_ir::cute::MmaAtomTypeInterface]9 SM-specific MMA atom installs (SM70..SM120)
qword_5B47028mlir::cutlass_ir::cute::PrintableTypeInterface]16+ concrete cute / cute_nvgpu type installs
qword_5B47088mlir::cutlass_ir::cute::DescriptorIteratorTypeInterface]TMA / shared-memory descriptor iteration

The pairs cluster tightly in .bss by design: seven cute-interface slots in the band 0x5B47000..0x5B470D0, three more in 0x5B44600..0x5B44890. Each band holds the interface-id table for a single owning dialect, registered in one initialiser.

Choosing Between the Two

Idiom 1 covers objects that exist before main — dialect TypeIDs, registered Type and Attribute subclasses, the per-op kindPtr singletons in &unk_5B44Exx / 5B44Fxx. Their addresses are link-time constants, and the linker packs them into dense 8-byte-pitched bands. Idiom 2 covers objects whose existence depends on a runtime registration step — interfaces attached after a dialect loads, analyses keyed by C++ type, pattern RTTI tags. Their identity has to derive from the C++ type alone, even across translation units, so the binary spells the type name out and uniques the string.

A TypeID never moves between idioms. Once an interface owns a Meyers slot, every install site uses that slot; once a concrete Type owns a .bss sentinel, no caller ever asks the factory for its name.

Sentinel Sharing And Aliasing

Two cross-dialect aliases deserve a flag. &unk_5B49B18 serves as the TypeID for both cute.ConstrainedInt and cute_nvgpu.AtomIType — the two share an identical inline i<N>(<divby M>)? printer surface, and the binary treats them as one identity. qword_5B47028 (PrintableTypeInterface) attaches to every concrete cute Type and most cute_nvgpu Types — 27+ installs of the same interface against different concrete types. Both patterns are legitimate; both rely on Idiom-1 and Idiom-2 sentinels being stable for the lifetime of the MLIRContext.

The "OperationName ↔ AbstractOperation" split is the other place dispatch sentinels alias. A single op mnemonic owns two singletons: &unk_5B44FD8 is the OperationName.opInfo (the descriptor passed to sub_4461CA0 during dialect registration) for nv_tileas.convert_layout, while &unk_5BE4008 is the AbstractOperation kindPtr that ends up at *(qword*)(op+48)+16 after the op is uniqued. Same op identity, two sentinels at different indirection levels — a verifier that wants to recognise the op needs to pick the right level.

How to Recognize in a Binary

Idiom 1 is identified by the address band rather than the content. Sentinels cluster densely in 8-byte-pitched runs inside the .bss ranges listed above; the only operation performed on a sentinel is to take its address. A 1-byte object at an 8-byte-aligned offset, never written, whose address appears as the right-hand side of a CMP against an op-header field at +0x40 or against *(qword*)(op + 48) + 16 is an Idiom-1 sentinel.

Idiom 2 is identified by the {guard:u8, qword:u64} pair plus a guarded one-shot init body. The characteristic sequence is __cxa_guard_acquire(guard) → factory(rtti_string, length) → store result into qword → __cxa_guard_release(guard). The factory sub_44A6CA0 takes a string ending in ] (the closing bracket from __PRETTY_FUNCTION__); any call to this factory with a string literal argument is an Idiom-2 install site.

The null-opinfo sentinel &unk_5BE6138 is the single most useful cross-cutting fingerprint for in-flight rewrites. Any walker that loads *(qword*)(op + 48) + 16 and immediately compares it against &unk_5BE6138 is auditing for the mid-rewrite state documented above.

Consumers

The kindPtr at *(qword*)(op + 48) + 16 is read by every walker, verifier, canonicaliser, and pattern matcher in the binary. The walker driver sub_447FBB0 from Operation Layout — Walker Contract dispatches against these sentinels; the pattern fingerprint map built by FrozenRewritePatternSet in Pattern Vtables and Shapes keys on OperationName.opInfo addresses; the InterfaceMap in Interface Vtables — InterfaceMap Layout keys on the same TypeID addresses (the Meyers Idiom-2 ones for interfaces, the Idiom-1 ones for concrete classes).

Cross-References

The companion page TypeID Construction Idioms covers the two idioms in the abstract — why &typeid(T) is unusable under hidden visibility, how Idiom 1 builds identity from a per-class static sentinel address, how Idiom 2 builds it from a __PRETTY_FUNCTION__-derived string interned through the process-wide pool — without the address-band specifics that occupy this page. Read that page first for the architectural justification; read this page for the tileiras layout.

Operation Layout describes the op header where the kindPtr lives. Interface VTables covers the concept tables that the Meyers-cached interface TypeIDs key into. Storage Uniquer and ContextImpl documents the registration machinery that installs both idioms during dialect load. The companion address-sorted reference TypeID Sentinel Address Table enumerates every individual sentinel referenced anywhere in the binary, including the full 213-slot NVVM op slab at 0x5B8D610..0x5B8DCB8 and the 33-slot nv_tileaa FoldRecord band at 0x5B46D28..0x5B46F68, neither of which is unpacked here.