Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Intrinsic ID Switch + Name Table

Abstract

tileiras carries the LLVM constant-folder predicate that decides whether a CallBase can be evaluated at compile time. It is the upstream llvm::canConstantFoldCallTo(const CallBase*, const Function*) shape with NVIDIA extensions for NVPTX intrinsics and libdevice naming conventions. A positive result permits the APFloat/APInt folding body to replace the call with a constant.

The dispatcher decomposes into a primary 412-case switch on Function::IntrinsicID, a secondary 161-case switch for the Intrinsic::nvvm_* block, a sparse high-ID range tree, and a name-walking tail for non-intrinsic libdevice and finite-math aliases.

412-case Intrinsic::ID switch

The primary switch is indexed by IntrinsicID ∈ [0, 411]. Five successor buckets are reached:

TargetBucketCasesSemantic
T_FALSEA311return false; intrinsic carries side effects or is not foldable.
T_ATTRB29return !NoFold && !StrictFP; floating-point arithmetic gated by attributes.
T_TRUEC71return true; pure integer/bit-domain APInt-foldable.
T_LIBD1Intrinsic::not_intrinsic; dispatch on Function::getName().
T_DEFdefault arm; range tree for IDs above the primary table.

Bucket A (T_FALSE, 311 cases) collects the IDs that have observable side effects on memory, the debug-info family, EH/GC/sanitizer support, frame/return-address probes, the entire VP-intrinsic block, and the low-numbered NVPTX intrinsics whose lowering happens during NVPTX ISel pattern matching rather than at constant-fold time. The verbatim union of cases is 2..11, 13, 16..19, 22, 23, 27..62, 68..87, 91..96, 98..101, 110..113, 116..127, 129, 130, 134..139, 141..172, 174, 180, 181, 185..187, 189..208, 213..220, 224..230, 232..237, 241..248, 252, 254..287, 290..311, 318..328, 331, 338, 340, 341, 344..349, 351..358, 360..362, 365, 367, 368, 371, 372, 374, 377..380, 382..387, 391..396, 399..404.

Bucket B (T_ATTR, 29 cases) is the floating-point arithmetic family: llvm.{sin,cos,exp,exp2,exp10,log,log2,log10,pow,sqrt,fma,minnum,maxnum,copysign,fabs,floor,ceil,trunc,round,roundeven,nearbyint,rint} and their f16/bf16/f32/f64/fp128/x86_fp80 type-overloaded variants. The folder can evaluate them via the APFloat-emulating tail, but only when the surrounding Function carries neither NoFold nor StrictFP. Cases: 12, 24, 25, 63, 64, 88..90, 176..179, 182, 212, 221..223, 238..240, 249..251, 288, 289, 329, 330, 332, 339.

Bucket C (T_TRUE, 71 cases) is the bit-precise integer arithmetic surface: llvm.abs, umax/umin/smax/smin, the vector_reduce_* family (102..109), the saturating-arith block (209..211), the bswap/ctlz/cttz/ctpop/bitreverse/fshl/fshr bitfield block (312..317), and the matrix / masked-{load,store,gather,scatter} family at the upper end (405..411). Cases: 1, 14, 15, 20, 21, 26, 65..67, 97, 102..109, 114, 115, 128, 131..133, 140, 173, 175, 183, 184, 188, 209..211, 231, 253, 312..317, 333..337, 342, 343, 350, 359, 363, 364, 366, 369, 370, 373, 375, 376, 381, 388..390, 397, 398, 405..411.

Bucket D is the single case 0 (Intrinsic::not_intrinsic) path. Before reaching the name-walking sub-tree it checks that the function only reads memory, re-runs the NoFold and StrictFP gates, loads Function::getName(), and dispatches on the first character. The sum 311 + 29 + 71 + 1 = 412 exhausts every label in the primary table.

161-case secondary switch — 8851..9011 (NVPTX block)

When the default arm sees an ID in the NVPTX intrinsic range, it falls into a 161-case secondary switch. This block covers per-shape variants of cp.async.bulk.tensor.{1..5}d, tcgen05.* alloc/dealloc/commit, wgmma.fence, fence.proxy.*, mbarrier.*, cluster.*, ldmatrix.*, stmatrix.*, and block-scaled MMA dispatcher entries. All 161 IDs are explicitly classified between T_FALSE and T_ATTR; no NVPTX hardware-effect intrinsic is always foldable.

IDBucketClassNotes
8851T_ATTRTMA-tensor metadataFirst case in block; per-shape "no-op" variant
8852T_ATTRTMA prefetchFoldable to no-op if not StrictFP-marked
8853T_FALSETMA storeSide-effecting on shared/global
8854T_ATTRcommit-group headFirst of 5-stride boundary family
8855..8916T_FALSEcp.async.bulk.tensor.* body62-case contiguous block — all SM90+ TMA primitives
8917T_ATTRTMA fence variant+5 stride from 8852
8923T_ATTRtcgen05.alloc head5th in the 5-step pattern
8931, 8936, 8941, 8946, 8951T_ATTRtcgen05.commit / tcgen05.fenceOne per dimension
8956, 8972, 8978T_ATTRwgmma.fence.{sync,async,wait}Hopper warpgroup-MMA fences
8957..8971T_FALSEwgmma.mma_async.*Side-effecting matrix multiply
8997..9010T_FALSEmbarrier.arrive.* / cluster.*Side-effecting sync primitives
9011T_ATTRlast caseFinal ID in block

The 23 T_ATTR IDs {8851, 8852, 8854, 8917, 8919, 8923, 8926, 8931, 8936, 8941, 8946, 8951, 8956, 8972, 8974, 8978, 8981, 8986, 8991, 8996, 9001, 9006, 9011} cluster suspiciously on +5 strides — they correspond to the metadata-only / prefetch / commit-group variants of each TMA-tensor dimension. The remaining 138 IDs go to T_FALSE.

Default-case binary tree for high IDs

When ID > 9011 the default arm executes a hand-coded binary search over the sparse high-ID space [3184, 15923]. Membership for tight ranges is tested with 64-bit bitmasks rather than nested compares — a classic clang sparse-switch pattern. The decision tree splits at 0x2628 (9768), 0x3AA3 (15011), 0x2628, 0x255F (9567), 0x254B (9547), and 0x21FF (8703); each leaf is a goto T_TRUE/T_ATTR/T_FALSE. The bit-mask leaves are:

Range baseSelected IDsTarget
87408740..8755, 8770..8786T_ATTR
95489548, 9553..9567T_ATTR
96959695, 9696, 9697, 9699, 9704, 9708T_ATTR
97239723..9726, 9762, 9764, 9766T_ATTR
98309830, 9832, 9833, 9839..9842T_ATTR
1588915889, 15890, 15921, 15922, 15923T_ATTR

Isolated T_TRUE IDs from the same tree: 1352, 3184, 3260, 3278, 3299, 3422..3424, 3600..3604, 8294 (cvt.packfloat head), 9211, and 14542..14543. Isolated T_ATTR IDs: 2191, 2192..2196, 2315, 2318..2319, 3312, 8625, 8638..8653, 8698..8699, 8703, 9178, 15006..15011, and 15486..15493. Every other ID outside the enumerated leaves falls through to T_FALSE.

LLVM 17/18 fingerprint analysis

Three independent fingerprints converge on the LLVM 17/18 family. The generic Intrinsic::ID space contains exactly 412 entries, which sits between upstream LLVM 17 and 18 counts. The Function::IntrinsicID field position rules out older layouts, and the attribute gate uses the slot occupied by NoFold and StrictFP in the LLVM 17 family. The combined evidence favors an LLVM 17-era generic table with NVIDIA NVPTX additions, though LLVM 18 with selected legacy removals remains close enough that the public documentation should treat this as a 17/18-family implementation detail.

libdevice suffix name table

The case 0 tail walks Function::getName() byte-by-byte and dispatches into nested switches for generic libm names, Itanium-mangled names, and CUDA-C suffix overloads such as *d, *ff, and *dd.

StringClass
remainderflibdevice helper
powff, powddCUDA-C type-suffix helpers
acosd, asind, atand, ceild, coshd, exp2d, fabsddouble-precision suffix helpers
sinhd, sqrtd, tanhd, floord, log10ddouble-precision suffix helpers
__acos_finite, __acosf_finite, __asin_finite, __asinf_finitefinite-math aliases
__atan2_finite, __atan2f_finite, __cosh_finite, __coshf_finitefinite-math aliases
__sinh_finite, __sinhf_finitefinite-math aliases

The suffix names are CUDA-C overload helpers that disambiguate float and double arguments where C++ ABI mangling is unavailable: f means float scalar, d means double scalar, ff means (float, float), and dd means (double, double). These symbols are recognition keys; libdevice itself exposes canonical __nv_* names. When the walker matches a suffix helper, lowering rewrites the call to the canonical symbol pair, for example acosd to __nv_acos and powff to __nv_powf. The __<name>_finite entries are GCC/Clang finite-math call targets and fold identically to their non-finite siblings for constant operands.

A separate mini-table holds the Itanium-mangled binary-argument helpers consumed by the constant-fold rewriter:

StringDemangled
_Z4fmodfffmod(float, float)
_Z4fmodddfmod(double, double)
_Z5atan2ffatan2(float, float)
_Z5atan2ddatan2(double, double)

Together the suffix table, mangled helper table, and finite-math aliases form the NVIDIA extension to LLVM's TargetLibraryInfo recognition set.

Reimplementation Notes

can_constant_fold(call):
    if call.callee.is_intrinsic:
        return classify_intrinsic(call.callee.intrinsic_id, call.function_attrs)

    if not call.callee.only_reads_memory:
        return false
    if call.function_attrs.has("NoFold") or call.function_attrs.has("StrictFP"):
        return false

    return classify_libdevice_name(call.callee.name)

Keep the side-effecting NVPTX intrinsics out of the always-foldable bucket. Metadata-only and prefetch-like intrinsics may be attribute-gated, but barriers, async copies, tensor-memory operations, and cluster synchronization must remain non-foldable.

Cross-references

The libdevice linking and reflect-folding sequence that produces the call sites this table classifies is documented in libdevice Overview — Pipeline. The reflection mechanism behind __CUDA_PREC_* / __CUDA_FTZ is documented in NVVMReflect Mechanism. The lowering side — which MLIR math.* / arith.* ops feed this table through __nv_* calls — is documented in Math Pass Pipeline and Crosswalk — Full math-op crosswalk. The NVPTX intrinsic IDs in the 8851..9011 range correspond to the cluster/TMA/tcgen05/WGMMA families documented in tcgen05, WGMMA, mbarrier, and Cluster Sync, TMA, Tensormap, and cp.async.bulk Emission, and the NVVM dialect overviews (nvvm cluster ops, nvvm mbarrier ops, nvvm tma ops, nvvm tcgen05 ops, nvvm wgmma ops).