Binary Vtable Banks + Static Ctors

Abstract

The tileiras ELF uses conventional C++ static registration heavily. MLIR pass models, rewrite patterns, dialect descriptors, op interfaces, NVPTX register classes, and target-lowering hooks all publish through vtables or static constructors before normal compilation starts. This page documents the runtime shape of that registration layer without treating raw table addresses as part of the public API.

PassConcept vtable shape

Every MLIR new-PM PassModel<PassT> instantiation uses the same PassConcept<PassT> shape: Itanium ABI prefix, destructor pair, run wrapper, name printer, isRequired hook, and tail destructor. The uniform shape is what lets MLIR store arbitrary pass models behind one concept pointer while still dispatching to the typed pass implementation.

| Slot | Role |
|------|------|
| 0 | typeinfo pointer, null in this no-RTTI build |
| 1 | typeinfo extension word, null in this no-RTTI build |
| 2 | deleting destructor |
| 3 | non-virtual destructor body |
| 4 | run(IRUnitT&, AM&) wrapper |
| 5 | name() printer trampoline |
| 6 | isRequired() |
| 7 | tail ~PassModel() |

The run wrapper adjusts from the erased PassConcept base to the typed PassT object, calls the real pass body, and returns the model pointer. For a reimplementation only the ownership-and-dispatch contract matters: a pass instance must retain enough typed state for its run, name, and isRequired hooks to agree.
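The ownership-and-dispatch contract can be illustrated with a small type-erasure sketch. This is not the binary's code or the upstream class definitions: Module, AnalysisManager, CountingPass, and Pipeline are hypothetical stand-ins, and the hook set is reduced to run, name, and isRequired.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the IR unit and analysis manager types.
struct Module { int rewrites = 0; };
struct AnalysisManager {};

// Type-erased interface: one virtual table shape for every pass.
struct PassConcept {
  virtual ~PassConcept() = default;
  virtual void run(Module &m, AnalysisManager &am) = 0;
  virtual std::string name() const = 0;
  virtual bool isRequired() const = 0;
};

// Typed model: owns the pass and forwards each hook to it.
template <typename PassT>
struct PassModel final : PassConcept {
  explicit PassModel(PassT p) : pass(std::move(p)) {}
  void run(Module &m, AnalysisManager &am) override { pass.run(m, am); }
  std::string name() const override { return PassT::name(); }
  bool isRequired() const override { return PassT::isRequired(); }
  PassT pass;
};

// Example pass (hypothetical): counts how often it ran on the module.
struct CountingPass {
  void run(Module &m, AnalysisManager &) { ++m.rewrites; }
  static std::string name() { return "CountingPass"; }
  static bool isRequired() { return true; }
};

// The pipeline stores every model behind the same concept pointer.
struct Pipeline {
  template <typename PassT> void add(PassT pass) {
    passes.push_back(std::make_unique<PassModel<PassT>>(std::move(pass)));
  }
  void run(Module &m, AnalysisManager &am) {
    for (auto &p : passes) {
      assert(p && "null pass model");
      p->run(m, am);
    }
  }
  std::vector<std::unique_ptr<PassConcept>> passes;
};
```

Because all three hooks forward to the same PassT, they cannot disagree about which pass the model represents, which is the property a reimplementation needs to preserve.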

RewritePattern tables

Rewrite patterns split into two shapes: conversion patterns and plain rewrite patterns. Conversion patterns share a generic rewrite driver that delegates to the typed matchAndRewrite; plain patterns have a smaller table and often use the default no-op match hook. Pattern identity is carried by the op name, benefit, context pointer, and typed rewrite hook — not by a binary table address.
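As a rough sketch of address-independent pattern identity, the following keys each pattern by op name, benefit, and rewrite hook. Op, Pattern, and PatternSet are hypothetical names, the context pointer is left out for brevity, and trying higher-benefit patterns first is an assumption about the driver rather than something read from the binary.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal op: just a name and a "rewritten" flag.
struct Op {
  std::string name;
  bool rewritten = false;
};

// Identity is (op name, benefit, hook); no table address is involved.
struct Pattern {
  std::string opName;
  unsigned benefit;
  std::function<bool(Op &)> matchAndRewrite; // returns true on success
};

struct PatternSet {
  void add(Pattern p) {
    assert(p.matchAndRewrite && "pattern needs a rewrite hook");
    patterns.push_back(std::move(p));
  }

  // Try the patterns rooted at this op name, highest benefit first.
  bool apply(Op &op) const {
    std::vector<const Pattern *> candidates;
    for (const Pattern &p : patterns)
      if (p.opName == op.name) candidates.push_back(&p);
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const Pattern *a, const Pattern *b) {
                       return a->benefit > b->benefit;
                     });
    for (const Pattern *p : candidates)
      if (p->matchAndRewrite(op)) return true;
    return false;
  }

  std::vector<Pattern> patterns;
};
```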

Dialect vtables

Every MLIR Dialect subclass uses the same dialect ABI: destructor pair, canonicalization-pattern hook, constant materializer, attribute parser/printer, type parser/printer, and region/op attribute verifiers.

| Dialect | Distinctive behavior |
|---------|----------------------|
| nv_tileaa | Installs the inliner interface used by the alias-analysis layer. |
| cutlass | Disables textual attribute/type parsing for unsupported forms. |
| cute_nvgpu | Provides the non-trivial textual type printer for GPU atom types. |
| cute | Relies heavily on generic ODS assembly behavior. |
| cutlass.seq_bar family | Dense op-model family for sequence-barrier operations. |

Dialect construction is registration-heavy: the constructor installs namespace, TypeID, attribute/type parsers, op interfaces, and dialect interfaces as one coherent unit. A reimplementation should reproduce the observable dialect behavior and parser/printer hooks, not the original table layout.

NVPTX register-class descriptors

The NVPTX backend ships declared-pool TargetRegisterClass descriptors for PTX register pools: %p, %rs, special registers, %r, %f, %rd, and %rq. These descriptors are pure data: MC register class pointer, subclass masks, allocatable bit, superclass list, and related metadata. The asm printer reads them to choose declarations such as .reg .b<width> %<prefix><N>;.
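A minimal sketch of the "pure data in, declaration out" flow follows. RegClassDesc, kPools, and declareRegs are hypothetical; the descriptor is cut down to two fields, the special-register pool is omitted, and the type strings in the table are illustrative assumptions rather than values recovered from the binary.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified descriptor: pure data, no virtual functions.
struct RegClassDesc {
  const char *ptxType; // e.g. ".pred" or ".b32" (illustrative values)
  const char *prefix;  // e.g. "%p" or "%r"
};

// Illustrative pool table; the type strings are assumptions.
static const RegClassDesc kPools[] = {
    {".pred", "%p"}, {".b16", "%rs"}, {".b32", "%r"},
    {".b32", "%f"},  {".b64", "%rd"}, {".b128", "%rq"},
};

// Build ".reg <type> <prefix><N>;" for a pool that uses `count` registers.
std::string declareRegs(const RegClassDesc &rc, unsigned count) {
  assert(count > 0 && "only declare pools that are actually used");
  return std::string(".reg ") + rc.ptxType + " " + rc.prefix + "<" +
         std::to_string(count) + ">;";
}
```

For example, declareRegs(kPools[2], 5) yields `.reg .b32 %r<5>;`.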

NVPTXTargetLowering

tileiras carries the normal LLVM SelectionDAG target-lowering surface for NVPTX. Key hooks: LowerFormalArguments, LowerCall, LowerOperation, and the load/vector helper path. Slot-by-slot behavior is covered in NVPTX Target Lowering, Call and Args; this page only records that the target-lowering surface is a conventional LLVM virtual interface.

Static Constructors

The binary has hundreds of static constructors. Each body falls into one of three useful categories:

  • cl::opt<> registrars that publish command-line options, help text, defaults, and flags.
  • TypeID::get<T>() static-local initializers guarded by the Itanium ABI guard protocol.
  • Dispatch-table initializers that install vtables, op interfaces, or dialect interface records.

For reimplementation, constructor order matters only where later code expects a registry to be populated before first use. The durable behavior is the registry side effect, not the original constructor body address.
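The second category, a guarded static-local that yields a per-type identity, can be sketched as below. TypeId here is a hypothetical class written for illustration and is not MLIR's TypeID implementation; the point is only that a function-local static with a dynamic initializer is initialized once, on first use, under a compiler-emitted guard.

```cpp
#include <cassert>

// Opaque identity token: two IDs are equal iff they share storage.
class TypeId {
public:
  template <typename T> static TypeId get() {
    // One static per template instantiation; its address is the identity.
    // The non-constant initializer is what normally makes the compiler
    // emit a one-time initialization guard around this object.
    static const Storage storage = makeStorage();
    return TypeId(&storage);
  }

  bool operator==(TypeId other) const { return ptr == other.ptr; }
  bool operator!=(TypeId other) const { return ptr != other.ptr; }

private:
  struct Storage { char tag; };
  static Storage makeStorage() { return Storage{0}; }
  explicit TypeId(const Storage *p) : ptr(p) { assert(p != nullptr); }
  const Storage *ptr;
};
```

Repeated calls to TypeId::get<int>() compare equal, while TypeId::get<int>() and TypeId::get<float>() differ, regardless of where the linker places the storage.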

Per-Dialect Ctor Chain

A .ctors table at 0x591CE78..0x591E2F0 lists all 653 ctor bodies as an array of void (*)() function pointers that the C runtime startup path, reached from _start, walks before main runs. The entries themselves are constructors; those whose objects need ordered teardown register a matching destructor through __cxa_atexit. The order is fixed at link time and follows the order of definitions across the source units; the dependency-ordered listing below matches the actual link order.
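The table walk itself is simple enough to sketch. The three ctor bodies, the log, and runCtors below are hypothetical, and the front-to-back direction is an assumption made for readability; some runtimes walk their constructor array in the opposite direction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using CtorFn = void (*)();

// Records which hypothetical ctor bodies ran, in order.
static std::vector<int> &ctorLog() {
  static std::vector<int> log;
  return log;
}

static void ctorA() { ctorLog().push_back(1); }
static void ctorB() { ctorLog().push_back(2); }
static void ctorC() { ctorLog().push_back(3); }

// Stand-in for the link-time table: the order here is the run order.
static const CtorFn kCtorTable[] = {ctorA, ctorB, ctorC};

// Walk the table front to back, calling every entry exactly once.
void runCtors(const CtorFn *table, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    assert(table[i] != nullptr && "table entries must be callable");
    table[i]();
  }
}
```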

Six of those ctors form the dialect chain: five initialise in-binary dialects and one registers the interfaces they share. The order over those six is not incidental. Each later ctor reads types, attributes, or interface records that an earlier ctor registered, so swapping two entries would expose partially populated registries during construction. The chain is strict, single-threaded, and runs to completion before any user code touches the dialect registry.

| Order | Dialect | Ctor | Notes |
|-------|---------|------|-------|
| 1 | nv_tileaa | sub_1545E80 | Lowest dialect; registers Type / Attribute / OperationName slabs first |
| 2 | nv_tileas | sub_147EC90 | Depends on nv_tileaa for token types |
| 3 | (intermediate) | sub_153EC20 | Registers shared interfaces (FunctionOpInterface, SymbolTable, LoopLikeOpInterface) |
| 4 | CutlassDialect | sub_17640C0 | Depends on the shared interfaces |
| 5 | CuteNvgpuDialect | sub_17D1190 | Depends on CutlassDialect for pipeline ops |
| 6 | CuteDialect | sub_1928370 | Highest dialect; depends on all the above |
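A clean implementation can make this ordering checkable instead of implicit. The sketch below encodes the six steps with the dependencies stated in the Notes column and verifies that each step only needs steps registered before it. Step, kChain, and chainIsWellOrdered are hypothetical names, and the short step names are illustrative labels rather than identifiers from the binary.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Step {
  std::string name;
  std::vector<std::string> needs; // steps that must already be registered
};

// Order and dependencies as listed in the table; "all the above" for the
// last step is spelled out explicitly.
static const std::vector<Step> kChain = {
    {"nv_tileaa", {}},
    {"nv_tileas", {"nv_tileaa"}},
    {"shared_interfaces", {}},
    {"cutlass", {"shared_interfaces"}},
    {"cute_nvgpu", {"cutlass"}},
    {"cute",
     {"nv_tileaa", "nv_tileas", "shared_interfaces", "cutlass", "cute_nvgpu"}},
};

// Returns true if every step only depends on steps registered before it.
bool chainIsWellOrdered(const std::vector<Step> &chain) {
  std::set<std::string> registered;
  for (const Step &s : chain) {
    assert(!s.name.empty() && "every step needs a name");
    for (const std::string &dep : s.needs)
      if (registered.count(dep) == 0) return false;
    registered.insert(s.name);
  }
  return true;
}
```

Swapping steps 4 and 5 makes the check fail, which is the partially populated registry case described above.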

The cuda_tile dialect registers separately — it is the public input dialect and has its own ctor chain via the dialect-target registry, registered through RegisteredDialect at sub_6B3ED0. It does not appear in the six-step chain above because the chain only covers in-binary dialects whose ctors emit registration calls into the global MLIRContext table; cuda_tile is published into the target registry instead.

__cxa_atexit and the XOR-3 Pool Exception

Most of the 653 ctors register a corresponding __cxa_atexit dtor for ordered teardown. The exception is the XOR-3-encrypted .data pools (the mnemonic and register-name pools; see Data Section Decryption for the cipher and decoders, and AsmPrinter Monster and Windows for the AsmWriter consumer), which register no dtors. The pools are zeroed at static-init, decoded at first use via pthread_once, and never re-encoded. The omission is deliberate: the pools sit memory-mapped read-only after first use, so re-encrypting them on shutdown would be pointless, and a no-op dtor would only add wasted entries to the exit chain.
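The decode-once, never-destroyed shape can be sketched as follows. Two substitutions are made and should be read as assumptions: a function-local static stands in for pthread_once, and a single-byte XOR with an arbitrary key stands in for the real cipher, which is documented elsewhere and is not reproduced here. kKey, kEncodedPool, and decodedPool are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Placeholder cipher: single-byte XOR with an arbitrary key. This is NOT
// the binary's scheme; it only stands in for "some reversible encoding".
static const unsigned char kKey = 0x5A;

// Encoded bytes for "mov" under the placeholder cipher.
static const unsigned char kEncodedPool[] = {0x37, 0x35, 0x2C};

// Decode on the first call; later calls return the same decoded string.
// The string is heap-allocated and intentionally never freed, so no
// exit-time destructor is registered for it.
const std::string &decodedPool() {
  static const std::string *pool = [] {
    std::string *s = new std::string;
    for (std::size_t i = 0; i < sizeof(kEncodedPool); ++i)
      s->push_back(static_cast<char>(kEncodedPool[i] ^ kKey));
    return s;
  }();
  assert(pool != nullptr);
  return *pool;
}
```

Holding the pool through a raw pointer keeps the static trivially destructible, which mirrors the design choice of leaving the pools out of the exit chain.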

The init order over the six steps also lines up with which ctors use pthread_once guards versus eager static-init. Only the shared-interfaces step at order 3 is gated by a one-shot guard: the interfaces it publishes are queried lazily on first use, so the ctor stages a pthread_once_t slot rather than running registration immediately. The five dialect ctors run their entire registration at static-init and need no one-shot guard; nv_tileaa has a slot allocated but does not use it.

| Ctor | sub_ADDR | Dialect | pthread_once slot |
|------|----------|---------|-------------------|
| ctor_001 | sub_1545E80 | nv_tileaa | dword_5B6A640 (not used; nv_tileaa has eager init) |
| ctor_002 | sub_147EC90 | nv_tileas | (eager) |
| ctor_003 | sub_153EC20 | shared interfaces | dword_5B37670 (FunctionOpInterface guard) |
| ctor_004 | sub_17640C0 | CutlassDialect | (eager) |
| ctor_005 | sub_17D1190 | CuteNvgpuDialect | (eager) |
| ctor_006 | sub_1928370 | CuteDialect | (eager) |

Referenced Ctor Bodies at 0x46xxxxx

Of the 653 total ctor pointers, only 49 are referenced from elsewhere in the binary; the rest are template instantiations of upstream LLVM/MLIR static-init that fire once and are never named again. The 49 referenced ctors split into seven roles: the six dialect-chain ctors above, twelve cl::opt registrations, eight pass registrations, eleven TypeID-singleton initialisers, four raw_ostream sinks, three fingerprint-singleton initialisers, and five miscellaneous singletons (6 + 12 + 8 + 11 + 4 + 3 + 5 = 49). A reimplementation only needs to reproduce those 49 effects in a clean way; the 604 unreferenced ctors are link-order noise from upstream with no observable post-init behavior in tileiras.

Reimplementation Notes

startup():
    register_command_line_options()
    register_type_ids()
    register_dialects()
    register_passes()
    register_rewrite_patterns()
    register_nvptx_target_lowering()

The registration graph should be explicit in a clean implementation. Avoid making address-contiguous table placement part of the design; it is an artifact of the original linker layout, not a semantic requirement.