Intrinsic ID Switch + Name Table

Abstract

tileiras carries the LLVM constant-folder predicate that decides whether a CallBase can be evaluated at compile time. It is the upstream llvm::canConstantFoldCallTo(const CallBase*, const Function*) shape with NVIDIA extensions for NVPTX intrinsics and libdevice naming conventions. A positive result permits the APFloat/APInt folding body to replace the call with a constant.

The dispatcher decomposes into a primary 412-case switch on Function::IntrinsicID, a secondary 161-case switch for the Intrinsic::nvvm_* block, a sparse high-ID range tree, and a name-walking tail for non-intrinsic libdevice and finite-math aliases.

412-case `Intrinsic::ID` switch

The primary switch is indexed by IntrinsicID ∈ [0, 411]. Five successor buckets are reached:

Target	Bucket	Cases	Semantic
`T_FALSE`	A	311	`return false`; intrinsic carries side effects or is not foldable.
`T_ATTR`	B	29	`return !NoFold && !StrictFP`; floating-point arithmetic gated by attributes.
`T_TRUE`	C	71	`return true`; pure integer/bit-domain APInt-foldable.
`T_LIB`	D	1	`Intrinsic::not_intrinsic`; dispatch on `Function::getName()`.
`T_DEF`	—	—	default arm; range tree for IDs above the primary table.

Bucket A (T_FALSE, 311 cases) collects the IDs that have observable side effects on memory, the debug-info family, EH/GC/sanitizer support, frame/return-address probes, the entire VP-intrinsic block, and the low-numbered NVPTX intrinsics whose lowering happens during NVPTX ISel pattern matching rather than at constant-fold time. The verbatim union of cases is 2..11, 13, 16..19, 22, 23, 27..62, 68..87, 91..96, 98..101, 110..113, 116..127, 129, 130, 134..139, 141..172, 174, 180, 181, 185..187, 189..208, 213..220, 224..230, 232..237, 241..248, 252, 254..287, 290..311, 318..328, 331, 338, 340, 341, 344..349, 351..358, 360..362, 365, 367, 368, 371, 372, 374, 377..380, 382..387, 391..396, 399..404.

Bucket B (T_ATTR, 29 cases) is the floating-point arithmetic family: llvm.{sin,cos,exp,exp2,exp10,log,log2,log10,pow,sqrt,fma,minnum,maxnum,copysign,fabs,floor,ceil,trunc,round,roundeven,nearbyint,rint} and their f16/bf16/f32/f64/fp128/x86_fp80 type-overloaded variants. The folder can evaluate them via the APFloat-emulating tail, but only when the surrounding Function carries neither NoFold nor StrictFP. Cases: 12, 24, 25, 63, 64, 88..90, 176..179, 182, 212, 221..223, 238..240, 249..251, 288, 289, 329, 330, 332, 339.

Bucket C (T_TRUE, 71 cases) is the bit-precise integer arithmetic surface: llvm.abs, umax/umin/smax/smin, the vector_reduce_* family (102..109), the saturating-arith block (209..211), the bswap/ctlz/cttz/ctpop/bitreverse/fshl/fshr bitfield block (312..317), and the matrix / masked-{load,store,gather,scatter} family at the upper end (405..411). Cases: 1, 14, 15, 20, 21, 26, 65..67, 97, 102..109, 114, 115, 128, 131..133, 140, 173, 175, 183, 184, 188, 209..211, 231, 253, 312..317, 333..337, 342, 343, 350, 359, 363, 364, 366, 369, 370, 373, 375, 376, 381, 388..390, 397, 398, 405..411.

Bucket D is the single case 0 (Intrinsic::not_intrinsic) path. Before reaching the name-walking sub-tree it checks that the function only reads memory, re-runs the NoFold and StrictFP gates, loads Function::getName(), and dispatches on the first character. The sum 311 + 29 + 71 + 1 = 412 exhausts every label in the primary table.

161-case secondary switch — 8851..9011 (NVPTX block)

When the default arm sees an ID in the NVPTX intrinsic range, it falls into a 161-case secondary switch. This block covers per-shape variants of cp.async.bulk.tensor.{1..5}d, tcgen05.* alloc/dealloc/commit, wgmma.fence, fence.proxy.*, mbarrier.*, cluster.*, ldmatrix.*, stmatrix.*, and block-scaled MMA dispatcher entries. All 161 IDs are explicitly classified between T_FALSE and T_ATTR; no NVPTX hardware-effect intrinsic is always foldable.

ID	Bucket	Class	Notes
8851	T_ATTR	TMA-tensor metadata	First case in block; per-shape "no-op" variant
8852	T_ATTR	TMA prefetch	Foldable to no-op if not `StrictFP`-marked
8853	T_FALSE	TMA store	Side-effecting on shared/global
8854	T_ATTR	commit-group head	First of 5-stride boundary family
8855..8916	T_FALSE	`cp.async.bulk.tensor.*` body	62-case contiguous block — all SM90+ TMA primitives
8917	T_ATTR	TMA fence variant	`+5` stride from 8852
8923	T_ATTR	`tcgen05.alloc` head	5th in the 5-step pattern
8931, 8936, 8941, 8946, 8951	T_ATTR	`tcgen05.commit` / `tcgen05.fence`	One per dimension
8956, 8972, 8978	T_ATTR	`wgmma.fence.{sync,async,wait}`	Hopper warpgroup-MMA fences
8957..8971	T_FALSE	`wgmma.mma_async.*`	Side-effecting matrix multiply
8997..9010	T_FALSE	`mbarrier.arrive.` / `cluster.`	Side-effecting sync primitives
9011	T_ATTR	last case	Final ID in block

The 23 T_ATTR IDs {8851, 8852, 8854, 8917, 8919, 8923, 8926, 8931, 8936, 8941, 8946, 8951, 8956, 8972, 8974, 8978, 8981, 8986, 8991, 8996, 9001, 9006, 9011} cluster suspiciously on +5 strides — they correspond to the metadata-only / prefetch / commit-group variants of each TMA-tensor dimension. The remaining 138 IDs go to T_FALSE.

Default-case binary tree for high IDs

When ID > 9011 the default arm executes a hand-coded binary search over the sparse high-ID space [3184, 15923]. Membership for tight ranges is tested with 64-bit bitmasks rather than nested compares — a classic clang sparse-switch pattern. The decision tree splits at 0x2628 (9768), 0x3AA3 (15011), 0x2628, 0x255F (9567), 0x254B (9547), and 0x21FF (8703); each leaf is a goto T_TRUE/T_ATTR/T_FALSE. The bit-mask leaves are:

Range base	Selected IDs	Target
8740	8740..8755, 8770..8786	`T_ATTR`
9548	9548, 9553..9567	`T_ATTR`
9695	9695, 9696, 9697, 9699, 9704, 9708	`T_ATTR`
9723	9723..9726, 9762, 9764, 9766	`T_ATTR`
9830	9830, 9832, 9833, 9839..9842	`T_ATTR`
15889	15889, 15890, 15921, 15922, 15923	`T_ATTR`

Isolated T_TRUE IDs from the same tree: 1352, 3184, 3260, 3278, 3299, 3422..3424, 3600..3604, 8294 (cvt.packfloat head), 9211, and 14542..14543. Isolated T_ATTR IDs: 2191, 2192..2196, 2315, 2318..2319, 3312, 8625, 8638..8653, 8698..8699, 8703, 9178, 15006..15011, and 15486..15493. Every other ID outside the enumerated leaves falls through to T_FALSE.

LLVM 17/18 fingerprint analysis

Three independent fingerprints converge on the LLVM 17/18 family. The generic Intrinsic::ID space contains exactly 412 entries, which sits between upstream LLVM 17 and 18 counts. The Function::IntrinsicID field position rules out older layouts, and the attribute gate uses the slot occupied by NoFold and StrictFP in the LLVM 17 family. The combined evidence favors an LLVM 17-era generic table with NVIDIA NVPTX additions, though LLVM 18 with selected legacy removals remains close enough that the public documentation should treat this as a 17/18-family implementation detail.

libdevice suffix name table

The case 0 tail walks Function::getName() byte-by-byte and dispatches into nested switches for generic libm names, Itanium-mangled names, and CUDA-C suffix overloads such as *d, *ff, and *dd.

String	Class
`remainderf`	libdevice helper
`powff`, `powdd`	CUDA-C type-suffix helpers
`acosd`, `asind`, `atand`, `ceild`, `coshd`, `exp2d`, `fabsd`	double-precision suffix helpers
`sinhd`, `sqrtd`, `tanhd`, `floord`, `log10d`	double-precision suffix helpers
`__acos_finite`, `__acosf_finite`, `__asin_finite`, `__asinf_finite`	finite-math aliases
`__atan2_finite`, `__atan2f_finite`, `__cosh_finite`, `__coshf_finite`	finite-math aliases
`__sinh_finite`, `__sinhf_finite`	finite-math aliases

The suffix names are CUDA-C overload helpers that disambiguate float and double arguments where C++ ABI mangling is unavailable: f means float scalar, d means double scalar, ff means (float, float), and dd means (double, double). These symbols are recognition keys; libdevice itself exposes canonical __nv_* names. When the walker matches a suffix helper, lowering rewrites the call to the canonical symbol pair, for example acosd to __nv_acos and powff to __nv_powf. The __<name>_finite entries are GCC/Clang finite-math call targets and fold identically to their non-finite siblings for constant operands.

A separate mini-table holds the Itanium-mangled binary-argument helpers consumed by the constant-fold rewriter:

String	Demangled
`_Z4fmodff`	`fmod(float, float)`
`_Z4fmoddd`	`fmod(double, double)`
`_Z5atan2ff`	`atan2(float, float)`
`_Z5atan2dd`	`atan2(double, double)`

Together the suffix table, mangled helper table, and finite-math aliases form the NVIDIA extension to LLVM's TargetLibraryInfo recognition set.

Reimplementation Notes

can_constant_fold(call):
    if call.callee.is_intrinsic:
        return classify_intrinsic(call.callee.intrinsic_id, call.function_attrs)

    if not call.callee.only_reads_memory:
        return false
    if call.function_attrs.has("NoFold") or call.function_attrs.has("StrictFP"):
        return false

    return classify_libdevice_name(call.callee.name)

Keep the side-effecting NVPTX intrinsics out of the always-foldable bucket. Metadata-only and prefetch-like intrinsics may be attribute-gated, but barriers, async copies, tensor-memory operations, and cluster synchronization must remain non-foldable.

Cross-references

The libdevice linking and reflect-folding sequence that produces the call sites this table classifies is documented in libdevice Overview — Pipeline. The reflection mechanism behind __CUDA_PREC_* / __CUDA_FTZ is documented in NVVMReflect Mechanism. The lowering side — which MLIR math.* / arith.* ops feed this table through __nv_* calls — is documented in Math Pass Pipeline and Crosswalk — Full math-op crosswalk. The NVPTX intrinsic IDs in the 8851..9011 range correspond to the cluster/TMA/tcgen05/WGMMA families documented in tcgen05, WGMMA, mbarrier, and Cluster Sync, TMA, Tensormap, and cp.async.bulk Emission, and the NVVM dialect overviews (nvvm cluster ops, nvvm mbarrier ops, nvvm tma ops, nvvm tcgen05 ops, nvvm wgmma ops).

Keyboard shortcuts

Tileiras Internals