LLVM Patch Catalog
Index page. This is a one-row-per-component summary of every upstream LLVM file that cicc v13.0 carries, classified by patch class. For the underlying analysis, follow the Wiki page column to the per-pass deep dive. For the orchestration that schedules these components, see Pipeline & Ordering.
cicc v13.0 statically links a fork of LLVM 20.0.0. Some files are byte-for-byte upstream; some carry GPU-specific threshold overrides registered as new cl::opt knobs; some are recognizably upstream but have algorithmic additions tied to the GPU execution model; and a handful are fully proprietary with no upstream counterpart. This page enumerates every component covered by the LLVM Passes section of the wiki and assigns each one a patch class.
Patch Classes
| Class | Meaning | Reimplementation strategy |
|---|---|---|
| none | Stock LLVM 20.0.0. No behavioral delta from upstream detected in the binary. Symbol layout may differ from a default LLVM build because of NVIDIA's LTO inlining choices, but the algorithm is unchanged. | Use the upstream .cpp directly. |
| config-only | Upstream algorithm preserved verbatim, but cicc registers extra cl::opt knobs (under the NVVMPassOptions umbrella), overrides upstream defaults, and re-runs the pass at additional pipeline positions. No source-level edits to the core algorithm. | Use upstream and re-apply the knob defaults from the cicc constructor. |
| algorithmic | Upstream algorithm clearly recognizable, but cicc adds GPU-specific code paths: extra DAG combines, intrinsic-aware fast paths, NVPTX-aware safety checks, expanded opcode coverage, structural reorganizations, or extra scoped tables. Reimplementing means stock LLVM plus patches. | Use upstream as the base, port the NVIDIA delta as a fork patch. |
| proprietary fork | Either a complete rewrite of an upstream pass (sharing only the registration name) or a pass that has no upstream equivalent at all. The binary may also ship the stock upstream copy alongside the proprietary one for unrelated linkage reasons. | Implement from scratch from the wiki deep-dive page. The upstream .cpp is not a viable starting point. |
Within the tables below, rows are grouped by upstream LLVM directory (the layout reimplementers will most naturally use when porting). Each row points at the dedicated wiki page where the delta is documented in detail. The "Upstream file" column refers to LLVM 20.0.0 unless otherwise noted.
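For tooling that consumes this catalog, the four classes can be modeled as a small enumeration. This is a minimal sketch: the names `PatchClass`, `STRATEGY`, and `upstream_is_viable_base` are hypothetical, and the strategy strings paraphrase the table above rather than quote any cicc artifact.

```python
from enum import Enum


class PatchClass(Enum):
    """The four patch classes used by this catalog (names are illustrative)."""
    NONE = "none"
    CONFIG_ONLY = "config-only"
    ALGORITHMIC = "algorithmic"
    PROPRIETARY_FORK = "proprietary fork"


# Paraphrase of the "Reimplementation strategy" column.
STRATEGY = {
    PatchClass.NONE: "use the upstream .cpp directly",
    PatchClass.CONFIG_ONLY: "use upstream and re-apply the cicc knob defaults",
    PatchClass.ALGORITHMIC: "use upstream as the base and port the NVIDIA delta",
    PatchClass.PROPRIETARY_FORK: "implement from scratch from the deep-dive page",
}


def upstream_is_viable_base(cls: PatchClass) -> bool:
    """Only a proprietary fork rules out the upstream .cpp as a starting point."""
    return cls is not PatchClass.PROPRIETARY_FORK
```

Looking a class up by its table label, for example `PatchClass("config-only")`, keeps the spelling used in the catalog as the single source of truth.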
Provenance Signals
A row's patch class is assigned by combining four signals visible in the binary:
- Symbol-level fingerprints. Function lengths, hash-table sizes, switch case counts, and constructor patterns line up against the same compilation of LLVM 20.0.0. A function whose disassembly walks the same case labels in the same order as the upstream `.cpp` (read alongside the wiki deep-dive) is `none` or `config-only`. A function that emits extra basic blocks before joining the upstream control flow is `algorithmic`. A function that shares only the entry-point name with upstream is `proprietary fork`.
- Knob inventory. `cl::opt<...>` constructors (`ctor_NNN` symbols at fixed addresses) register the cicc-side tunables. Knobs whose names match upstream verbatim and whose defaults match upstream constants are `config-only` signals. Knobs whose names are NVIDIA-prefixed (`nv-*`, `nvvm-*`) or whose registrar uses NVIDIA-specific calling conventions (`sub_190BE40`'s `int**` default initializer) are `algorithmic` signals.
- Pipeline placement. A pass that is scheduled multiple times, at non-upstream pipeline positions, or under cicc's tier system (Tier 0/1/2/3 in `sub_12E54A0`) carries scheduling-level customization even when the per-pass binary is stock.
- Companion NVIDIA-only symbols. Where the cicc pipeline calls a stock pass and immediately follows it with an NVIDIA-only pass that compensates for what the stock pass cannot do on GPU (the second-pass coalescer after `RegisterCoalescer`, MRPA after `MachineScheduler`, the NVPTX-specific block placement after `MachineBlockPlacement`), the upstream row is labeled `none` but the companion row appears under the Target / NVPTX table as `proprietary fork`.
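The way these signals combine can be sketched as a small decision function. This is an illustrative reading, not the procedure actually used to build the catalog: the `Signals` field names are hypothetical, and the precedence order (strongest divergence wins) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Hypothetical field names; each mirrors one signal described above.
    same_case_order_as_upstream: bool = False   # fingerprint matches upstream
    extra_blocks_before_upstream: bool = False  # extra blocks join upstream CFG
    shares_only_entry_name: bool = False        # only the entry point matches
    nvidia_style_knobs: bool = False            # nv-*/nvvm-* or custom registrar
    upstream_named_knobs: bool = False          # upstream-named knobs registered


def classify(s: Signals) -> str:
    """Assumed precedence: the strongest sign of divergence decides the class."""
    if s.shares_only_entry_name:
        return "proprietary fork"
    if s.extra_blocks_before_upstream or s.nvidia_style_knobs:
        return "algorithmic"
    if s.upstream_named_knobs:
        return "config-only"
    if s.same_case_order_as_upstream:
        return "none"
    return "unclassified"
```

Pipeline placement and companion symbols are left out on purpose: as described above, they affect scheduling and companion rows rather than the label of the upstream row itself.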
Rows where the binary contains both an unused stock upstream copy and an active NVIDIA copy (StructurizeCFG, LSR, the dual RAGreedy instances) are labeled by the active copy. See the quirks at the bottom of the page for why both copies coexist.
Pipeline & Infrastructure
| Upstream file | Patch class | NVIDIA delta summary | Wiki page |
|---|---|---|---|
| `lib/Passes/PassBuilder.cpp` | algorithmic | 35 NVIDIA pass classes (33 StringMap registration rows) injected via the standard extension-callback table at [PassBuilder+2208]. Custom per-tier driver (sub_12E54A0) layered on top of the stock PassBuilder. See canonical count. | Pipeline & Ordering |
| `lib/Passes/PassBuilderPipelines.cpp` | algorithmic | Tier 0/1/2/3 sub-pipelines, seven nvopt<...> shorthand expansions, GPU-aware ordering between SROA, LSR, and structurization. | Pipeline & Ordering |
| `lib/Passes/PassRegistry.def` | config-only | Extended with NVIDIA pass names (nvvm-reflect, process-restrict, lower-struct-args, lower-aggr-copies, nv-memory-space-opt, etc.) registered through the same StringMap<PassInfo>. | Pipeline & Ordering |
Analysis
| Upstream file | Patch class | NVIDIA delta summary | Wiki page |
|---|---|---|---|
| `lib/Analysis/ConstantFolding.cpp` | algorithmic | Eligibility checker (sub_14D90D0) and evaluator (sub_14D1BC0) layered on top of ConstantFoldCall. Recognizes 110+ math name variants (C, glibc __*_finite, mangled C++) and 60+ NVVM intrinsic IDs. Exception-safe host FPU wrapper that aborts folding on domain errors. | Constant Folding |
| `lib/Analysis/ValueTracking.cpp` (KnownBits) | algorithmic | computeKnownBits and SimplifyDemandedBits fused into a single 127 KB function (sub_11A7600). GPU-specific oracle (sub_F0C4B0) returns ranges for %tid, %ntid, %ctaid, %nctaid, %warpsize, %laneid. | KnownBits & DemandedBits |
| `lib/Analysis/ScalarEvolution.cpp` | algorithmic | Non-recursive worklist createSCEV (matches LLVM 16+ refactor). NVIDIA additions: simple_mode complexity gate, GPU-specific SCEV sources (thread-index bounds), CUDA loop-idiom recognition (warp-stride, grid-stride). Core SCEV algebra untouched. | SCEV Overview, SCEV Ranges, SCEV Invalidation |
Transforms / Scalar
| Upstream file | Patch class | NVIDIA delta summary | Wiki page |
|---|---|---|---|
| `lib/Transforms/Scalar/SROA.cpp` | none | LLVM 17+ two-pass analysis path (qword_50055E8); preserve-cfg/modify-cfg parser params unchanged. No detected modification to core splitting. NVIDIA's contribution is purely scheduling: SROA runs twice (early + late) in the cicc pipeline. | SROA Deep Dive |
| `lib/Transforms/Scalar/EarlyCSE.cpp` | algorithmic | Four CUDA extensions: barrier-aware memory versioning (__syncthreads invalidation), AS 7 store-fwd protection, NVVM intrinsic CSE handler (sub_2780450) with thread-invariant special-register fast path, PHI operand limit = 5. Adds a fourth scoped hash table (store-forwarding) absent upstream. | EarlyCSE Deep Dive, Scalar Passes |
| `lib/Transforms/InstCombine/*.cpp` | algorithmic | ~15 upstream files inlined into a single 58 KB visitor (sub_10EE7A0, 12,673 insns). 80 unique opcode cases, three-level switch covering standard LLVM opcodes plus NVIDIA-extended FMA/vector opcodes (238–245) plus three high-opcode NVVM intrinsics (9549, 9553, 9567). Separate 11 KB intrinsic folder (sub_1169C30, 2,541 insns). | InstCombine |
| `lib/Transforms/Scalar/GVN.cpp` | algorithmic | 11 NVIDIA tunables for PRE, store splitting, PHI removal, dominator caching, and recursion depth. Custom registrar sub_190BE40 for int**-style defaults. | GVN |
| `lib/Transforms/Scalar/NewGVN.cpp` | algorithmic | Ships alongside classic GVN at 0x19F99A0 (68 KB). Same knob constructor (ctor_201). Used at NewGVN/GVNHoist pipeline slot. | GVN |
| `lib/Transforms/Scalar/JumpThreading.cpp` | config-only | Core algorithm unmodified. Adds loop-aware duplication threshold overrides (qword_501D628, qword_501D548), extra pipeline positions, an OCG disable flag, and NVVMPassOptions integration. | JumpThreading Deep Dive |
| `lib/Transforms/Scalar/LICM.cpp` | config-only | Stock IR-level LICM. No NVIDIA-specific knobs in the IR pass registration. disable-machine-licm and disable-postra-machine-licm are upstream knobs. | LICM |
| `lib/Transforms/Scalar/DeadStoreElimination.cpp` | algorithmic | 91 KB across three functions. Partial store forwarding with type conversion, cross-store dependency tracking, store-chain decomposition for aggregates, native CUDA vector type (float4, etc.) overwrite recognition. | DSE |
| `lib/CodeGen/CodeGenPrepare.cpp` | none | All 20+ upstream cl::opt knobs preserved unchanged. | CodeGenPrepare |
| `lib/Transforms/Scalar/LoopStrengthReduce.cpp` | proprietary fork | Stock LLVM LSR present at 0x284F650–0x287C150, but a separate 160 KB NVIDIA formula solver (sub_19A87A0, 2,688 lines, wrapped by NVLoopStrengthReduce at sub_19CE990) replaces formula generation and selection. 7-phase pipeline with 11 GPU-aware knobs. SCEV infrastructure and IV rewriting reused. | LSR (NVIDIA) |
| `lib/Transforms/Scalar/StructurizeCFG.cpp` | proprietary fork | Stock AMDGPU copy at sub_1F0EBC0 shipped for linkage, but the pipeline uses the NVPTX-customized copy at sub_35CC920 (95 KB, 2,397 lines). Irreducibility detector, uniform-branch classifier, region structurizer core, edge-reroute, and domtree NCA walk are all separate symbols from the AMDGPU original. | StructurizeCFG |
| `lib/Transforms/Scalar/SCEV-CGP.cpp` (NVIDIA-only) | proprietary fork | Has no upstream counterpart. Disabled by default (nv-disable-scev-cgp = true); when enabled, SCEV-driven GEP/IV restructuring runs before CodeGenPrepare. | CodeGenPrepare |
Transforms / Loop
| Upstream file | Patch class | NVIDIA delta summary | Wiki page |
|---|---|---|---|
| `lib/Transforms/Utils/LoopSimplify.cpp` | none | Stock LLVM canonicalization (single preheader, single backedge, dedicated exits). | Standard Loop Passes, Loop Passes Overview |
| `lib/Transforms/Utils/LCSSA.cpp` | none | Stock LLVM. | Standard Loop Passes |
| `lib/Transforms/Scalar/LoopRotation.cpp` | none | Stock LLVM. Appears in the pipeline multiple times (canonicalization prerequisite for LICM and unroll). | Standard Loop Passes |
| `lib/Transforms/Scalar/IndVarSimplify.cpp` | config-only | Three NVIDIA knobs that materially change LFTR / IV-widening behavior on GPU code. Algorithm itself unchanged. | Standard Loop Passes |
| `lib/Transforms/Scalar/LoopIdiomRecognize.cpp` | none | Stock memcpy/memset/mismatch recognition. | Standard Loop Passes |
| `lib/Transforms/Scalar/LoopInterchange.cpp` | config-only | Stock algorithm with threshold overrides. | Standard Loop Passes |
| `lib/Transforms/Scalar/LoopDistribute.cpp` | config-only | Stock algorithm with threshold overrides. | Standard Loop Passes |
| `lib/Transforms/Scalar/IRCE.cpp` | none | Stock LLVM Inductive Range Check Elimination. | Standard Loop Passes |
| `lib/Transforms/Scalar/LoopDeletion.cpp` | none | Stock LLVM. | Standard Loop Passes |
| `lib/Transforms/Scalar/LoopSink.cpp` | none | Stock LLVM. | Standard Loop Passes |
| `lib/Transforms/Utils/LoopUnroll.cpp` (transformation engine) | config-only | Lightly modified upstream llvm::UnrollLoop. Bulk of the NVIDIA delta lives in the decision engine, not here. | Loop Unrolling |
| `lib/Transforms/Scalar/LoopUnrollPass.cpp` (decision engine) | algorithmic | computeUnrollCount substantially reworked: priority-based decision cascade, local-array threshold multiplier, power-of-two factor enforcement, pragma threshold ~200x stock. Runs twice in the pipeline (sub_197E720 / sub_19C1680, the latter gated by opts[1360]). | Loop Unrolling |
| `lib/Transforms/Vectorize/LoopVectorize.cpp` | algorithmic | NVPTX's TTI::getRegisterBitWidth() returns 32, which would force VF=1 under stock LLVM; NVIDIA overrides the VF computation to produce v2/v4 vectorization that maps onto packed 32-bit register pairs and PTX ld.v2/ld.v4. VPlan kept. | LoopVectorize & VPlan |
| `lib/Transforms/Vectorize/SLPVectorizer.cpp` | algorithmic | Stock SLP algorithm; the divergence is in the proprietary NVPTX TTI implementation that this pass queries for cost. TTI differs significantly from upstream open-source NVPTX. | SLP Vectorizer |
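The VF=1 remark in the LoopVectorize row follows from simple arithmetic. The sketch below is a simplified model, assuming the register-width bound on the vectorization factor is the power-of-two floor of register bits divided by the widest element type. The function names are hypothetical, the 64- and 128-bit widths are illustrative only and are not values recovered from cicc, and the real cost model weighs far more than this bound.

```python
def bit_floor(n: int) -> int:
    """Largest power of two that is <= n, or 0 when n <= 0."""
    return 1 << (n.bit_length() - 1) if n > 0 else 0


def max_vf(register_bits: int, widest_type_bits: int) -> int:
    """Simplified register-width bound on the vectorization factor."""
    return max(1, bit_floor(register_bits // widest_type_bits))


# A 32-bit register width leaves no room to vectorize 32-bit elements.
print(max_vf(32, 32))    # 1
# Illustrative wider widths would admit two- and four-element vectors.
print(max_vf(64, 32))    # 2
print(max_vf(128, 32))   # 4
```

How cicc actually arrives at v2/v4 is documented on the LoopVectorize & VPlan page; this block only shows why a 32-bit width alone yields no vectorization for 32-bit data.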
CodeGen / Machine-Level (Target-Independent)
| Upstream file | Patch class | NVIDIA delta summary | Wiki page |
|---|---|---|---|
| `lib/CodeGen/SelectionDAG/SelectionDAG.cpp` | none | Target-independent infrastructure at 0xF05000–0xF70000 is stock LLVM 20 with no detected modifications. | SelectionDAG |
| `lib/CodeGen/SelectionDAG/DAGCombiner.cpp` | none | Generic combiner stock; NVPTX-specific combines hosted in NVPTXISelLowering (see target section). | SelectionDAG |
| `lib/CodeGen/SelectionDAG/LegalizeDAG.cpp` | config-only | Stock algorithm; legality tables driven by NVPTX target hooks. 137 KB action-dispatch function (sub_1FFB890) has 967 cases due to the NVPTX opcode count, not source modification. | SelectionDAG |
| `lib/CodeGen/SelectionDAG/LegalizeTypes.cpp` (+ Integer/Float/Vector) | algorithmic | Four upstream files collapsed into a single 348 KB monolithic function (sub_20019C0), either an LTO inlining artifact or a deliberate choice for I-cache locality. Functional behavior faithful to upstream DAGTypeLegalizer. | Type Legalization |
| `lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp` | algorithmic | Per-function cost table, priority-queue topological worklist, three-level dispatch (sub_3090F90 → sub_308FEE0 → sub_347A8D0 / sub_348D3E0). Iteration budget guards pathological DAGs. | ISel Pattern Matching |
| `lib/CodeGen/SelectionDAG/InstrEmitter.cpp` | algorithmic | Dedicated CopyToReg handler for the .param ABI, triple-vtable dispatch gating GPU pseudo-expansion, extended MachineInstr flag at bit 36 (0x1000000000) absent from stock LLVM. | InstrEmitter |
| `lib/CodeGen/TwoAddressInstructionPass.cpp` | algorithmic | Structurally stock LLVM (the libNVVM build at sub_F4EA80 is byte-for-byte identical). Four additions: extended EXTRACT_SUBREG handling for multi-register results, deeper LiveVariables maintenance, OptRemark integration, unconditional post-pass verifier. | TwoAddressInstruction |
| `lib/CodeGen/MachineScheduler.cpp` | config-only | Stock ScheduleDAGMILive. NVPTX provides MRPA incremental pressure tracker and Texture Group Merge as separate NVIDIA-only passes (see Targets table). | Instruction Scheduling |
| `lib/CodeGen/MachinePipeliner.cpp` | none | Stock Swing Modulo Scheduler. | Instruction Scheduling |
| `lib/CodeGen/LiveRangeCalc.cpp` | algorithmic | Same updateSSA algorithm as upstream, but a global fast-compile flag (qword_5025F68) bypasses the entire dataflow loop, which has no upstream equivalent. Likely wired to -Ofast-compile / -O0. | LiveRangeCalc |
| `lib/CodeGen/RegisterCoalescer.cpp` | none | Stock worklist-driven coalescer at sub_2F71140. Handles the generic COPY pseudo. The NVPTX-specific coalescer is a separate NVIDIA pass (see Targets). | Register Coalescing |
| `lib/CodeGen/RegAllocGreedy.cpp` (+ SplitKit, LiveRangeEdit) | algorithmic | Two complete copies of RAGreedy (legacy PM at 0x1EC0400, new PM at 0x2F4C2E0). Pressure-driven allocation, -maxreg ceiling, occupancy-aware rematerialization layered via TTI hooks and custom knobs. | Register Allocation |
| `lib/CodeGen/PrologEpilogInserter.cpp` | algorithmic | Ten-phase monolithic implementation. Significantly more sophisticated than upstream's linear scan; tied to PTX .local frame layout and the NVPTX-specific nvptx-prolog-epilog machine pass. | PrologEpilogInserter |
| `lib/CodeGen/BranchFolding.cpp` | algorithmic | Critical divergence: cicc removes the requiresStructuredCFG() gate that upstream uses to disable tail merging on GPU targets, and adds a reserved-register merge safety check absent from any upstream version. | BranchFolding & TailMerge |
| `lib/CodeGen/MachineBlockPlacement.cpp` | algorithmic | Two instances: stock LLVM copy for internal use plus an NVPTX-pipeline copy at sub_3521FF0. The NVPTX instance queries a divergence flag on the MachineFunction for tail-duplication profitability and adds an alternative layout proposal path (sub_34BEDF0 / sub_34C7080) absent upstream. | Block Placement |
| `lib/CodeGen/MachineOutliner.cpp` | none | Byte-for-byte upstream. NVPTX delta is limited to two target hooks (NVPTXInstrInfo::getOutliningType and the calling convention 95 assignment) and an activation policy in NVPTXPassConfig::addMachineOutliner. | MachineOutliner |
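The bit position and mask quoted in the InstrEmitter row can be cross-checked with a two-line calculation. The constant and function names below are hypothetical; only the bit index 36 and the mask value come from the table.

```python
NV_FLAG_BIT = 36                 # bit index from the InstrEmitter row
NV_FLAG_MASK = 1 << NV_FLAG_BIT  # hypothetical name for the extended flag


def has_nv_flag(flags: int) -> bool:
    """Test the extended flag in a flags word."""
    return bool(flags & NV_FLAG_MASK)


print(hex(NV_FLAG_MASK))          # 0x1000000000
print(NV_FLAG_MASK > 0xFFFFFFFF)  # True
```

The second line of output shows that the mask lies above the low 32 bits, so any reimplementation needs a flags word wider than 32 bits to carry it.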
Target / NVPTX
| Upstream file | Patch class | NVIDIA delta summary | Wiki page |
|---|---|---|---|
| `lib/Target/NVPTX/NVPTXISelLowering.cpp` | proprietary fork | 20 KB LowerOperation dispatcher (sub_32E3060, 4,315 insns), 19 KB LowerCall (sub_3040BF0, 3,856 insns) implementing the PTX .param-space calling convention, 60 KB intrinsic-lowering mega-switch (sub_33B0210, 13,121 insns) covering 785 contiguous Intrinsic::ID values 0–0x310. NVPTX-side DAG combine (sub_3425710, 23 KB) and computeKnownBits (sub_33D4EF0, 28 KB). | SelectionDAG |
| `lib/Target/NVPTX/NVPTXISelLowering.h` (NVPTXISD enum) | proprietary fork | 460 distinct NVPTXISD::* opcodes, roughly 15x upstream's ~30. 372/460 are the texture/surface family; the rest cover SM90+ load/store variants, the four call-flavor matrix, funnel-shift-with-clamp, bitfield extract/insert, and the .param calling convention. | NVPTXISD Opcodes |
| `lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp` | algorithmic | Three-level NVPTX-specific select dispatch (sub_3090F90 driver at 12 KB, sub_347A8D0 hand-written switch at 50 KB, sub_348D3E0 TableGen-generated SelectCode at 26 KB), plus six sub-selectors for memory, texture/surface, complex addressing, vector patterns, atomics. Compressed per-SM legality table gates which opcodes exist per architecture. | ISel Pattern Matching |
| `lib/Target/NVPTX/NVPTXInstrInfo.td` (TableGen) | proprietary fork | NVPTX patterns expanded to cover MMA/tensor-core families, surface/texture intrinsics, and SM-version-gated instructions. The TableGen-generated matcher feeds into sub_348D3E0. | ISel Pattern Matching |
| `lib/Target/NVPTX/NVPTXTargetMachine.cpp` | algorithmic | registerPassBuilderCallbacks injects the 35 NVIDIA pass classes (33 StringMap rows) into the New PM extension table. NVPTXPassConfig::addMachineOutliner controls outliner activation. See canonical count. | Pipeline & Ordering |
| MMA codegen (NVIDIA-only) | proprietary fork | No upstream MMA in NVPTXISelLowering.cpp. Full tensor-core pipeline spanning Volta through Blackwell (HMMA/IMMA/BMMA), SM90 WGMMA, SM100 tcgen05. Two parallel lowering paths (sub_955A70 NVVM builtin + sub_33B0210 SelectionDAG intrinsic) converging at a common PTX string builder driven by a packed 64-bit descriptor. | Tensor / MMA Codegen |
| NVPTX-specific Register Coalescer (NVIDIA-only) | proprietary fork | Second coalescer at sub_34AF4A0 runs after the stock RegisterCoalescer. Handles NVPTX pseudo-COPYs (typed register-class boundaries: %r ↔ %rd, %f ↔ %fd) that the generic algorithm cannot reason about. | Register Coalescing |
| MRPA / Texture Group Merge (NVIDIA-only) | proprietary fork | Two scheduler-adjacent passes absent from upstream: MRPA tracks incremental register pressure for the scheduler's reorder decisions; Texture Group Merge clusters texture fetches to maximize TEX unit utilization. | Instruction Scheduling, Machine-Level Passes |
| Machine-pass umbrella | mixed | 51 of 64 registered MF passes are stock LLVM 20.0.0; 13 are NVIDIA-only. The NVIDIA additions cluster around four areas: PTX structurization (nvptx-lower-aggr-copies, nvptx-lower-args), pre-RA pressure shaping (MRPA, IV demotion, rematerialization), texture-group merging, and AsmPrinter glue (nvptx-prolog-epilog, nvptx-proxy-reg-erasure). | Machine-Level Passes |
Summary Counts
| Class | Count |
|---|---|
| none (stock LLVM) | 14 |
| config-only | 9 |
| algorithmic | 24 |
| proprietary fork | 9 |
| mixed | 1 |
| total rows | 57 |
(Counts include both upstream files and the small set of NVIDIA-only entries documented under their nearest-upstream sibling.)
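Per-class counts like these are easy to recompute from the tables themselves. The following is a minimal sketch of such a tally over a four-row excerpt; the function name `tally` is hypothetical, and the `...` cells stand in for delta summaries that were shortened for the example.

```python
from collections import Counter

# Excerpt only; "..." replaces the delta-summary text to keep the sample short.
ROWS = """\
| Upstream file | Patch class | NVIDIA delta summary | Wiki page |
|---|---|---|---|
| lib/Passes/PassBuilder.cpp | algorithmic | ... | Pipeline & Ordering |
| lib/Passes/PassBuilderPipelines.cpp | algorithmic | ... | Pipeline & Ordering |
| lib/Passes/PassRegistry.def | config-only | ... | Pipeline & Ordering |
| lib/Analysis/ConstantFolding.cpp | algorithmic | ... | Constant Folding |
"""


def tally(table_text: str) -> Counter:
    """Count rows per patch class (second column) in a Markdown table."""
    counts = Counter()
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 4:
            continue  # not a four-column catalog row
        patch_class = cells[1]
        if patch_class == "Patch class" or set(patch_class) <= {"-"}:
            continue  # header row or |---| separator
        counts[patch_class] += 1
    return counts


print(tally(ROWS))  # Counter({'algorithmic': 3, 'config-only': 1})
```

Running the same function over the concatenated text of all six tables gives a quick consistency check against the summary table whenever rows are added or reclassified.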
Quirks of the Classification
QUIRK -- "Stock" sometimes means two copies side-by-side. Several passes whose upstream code is present unmodified (notably `StructurizeCFG`, `MachineBlockPlacement`, `RegisterCoalescer`, and `LoopStrengthReduce`) ship as two distinct symbols in the binary: the upstream copy that arrives from linking against `libLLVMCodeGen`/`libLLVMScalarOpts`, and a separate NVIDIA copy that the cicc pipeline actually invokes. Because rows are labeled by the active copy, these passes appear in the tables under classes ranging from `none` to `proprietary fork`. The unused upstream copy is dead code from the pipeline's perspective but cannot be stripped because other components in the binary still link the upstream symbol. Reimplementers should not read the size of the binary as evidence that the upstream copy was heavily modified: the change is in which copy gets scheduled, not in the upstream algorithm.
QUIRK -- LTO inlining inflates apparent patch class. `LegalizeTypes` is labeled `algorithmic` because all four upstream files (`LegalizeTypes.cpp`, `LegalizeIntegerTypes.cpp`, `LegalizeFloatTypes.cpp`, `LegalizeVectorTypes.cpp`) collapse into a single 348 KB function. The functional behavior is a faithful reproduction of upstream `DAGTypeLegalizer`; the structural shape is the only delta. Likewise, `InstCombine`'s ~15 upstream files inline into one monolithic visitor (`sub_10EE7A0`). A faithful reimplementation can keep the upstream file split; the monolithic shape is a build-system artifact, not a behavioral requirement.
QUIRK -- "config-only" hides materially different schedules. A pass classified `none` or `config-only` (e.g. SROA, which is `none`, or LICM, which is `config-only`) may still produce dramatically different output from a stock LLVM 20 invocation because cicc runs it more than once and at different pipeline positions. SROA runs twice (early after NVVMReflect, late after sinking); LICM runs early (hoist) plus late (sink); LoopRotate appears at several canonicalization points. The pass binary is upstream, but the scheduled effect is GPU-tuned. The full ordering lives in Pipeline & Ordering.
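Repeated scheduling of a stock pass is straightforward to surface once a pipeline has been flattened into a list of slot names. The slot list below is hypothetical and heavily abbreviated: it only mirrors the repetitions described in the last quirk and is not the real cicc ordering, which lives in Pipeline & Ordering.

```python
from collections import Counter

# Hypothetical, abbreviated slot list; only the repetition pattern matters.
PIPELINE = [
    "nvvm-reflect",
    "sroa",          # early SROA, after NVVMReflect
    "loop-rotate",
    "licm",          # early LICM (hoist)
    "loop-rotate",
    "loop-unroll",
    "sink",
    "sroa",          # late SROA, after sinking
    "licm",          # late LICM (sink)
]


def repeated_passes(pipeline):
    """Return {pass name: run count} for every pass scheduled more than once."""
    return {name: n for name, n in Counter(pipeline).items() if n > 1}


print(repeated_passes(PIPELINE))
# {'sroa': 2, 'loop-rotate': 2, 'licm': 2}
```

Any pass reported here needs its pipeline slots reproduced during porting even when its row says the algorithm is untouched.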
How to Use This Catalog
For a reimplementer porting cicc to an open LLVM base, the patch class maps directly to the work effort:
- none rows: link the matching upstream `.cpp` unchanged. Re-apply scheduling from Pipeline & Ordering.
- config-only rows: link upstream and re-register the cicc knobs and their defaults. Wire into the same pipeline slots.
- algorithmic rows: start from upstream, then port the NVIDIA-specific extensions documented on the linked wiki page. Expect roughly one fork patch per row.
- proprietary fork rows: build from the wiki deep dive. Upstream provides only the registration interface and surrounding infrastructure; the algorithm itself comes from the binary analysis.
For the inverse view — pipeline slot to pass list — see Pipeline & Ordering. For NVIDIA-only passes that do not appear in this catalog at all (because they have no upstream counterpart even at the file level), see NVIDIA Custom Passes.
Suggested Porting Order
A reimplementation that follows the binary's optimization order rather than the upstream LLVM source order tends to surface fewer surprises:
- Start with the none rows — link upstream, confirm the cicc pipeline assembler will accept them, and lock in the registration names.
- Layer in the config-only rows. The knob defaults live in the cicc constructors documented on each deep-dive page; mismatched defaults are the most common cause of "the pass runs but the output is different" bugs during porting.
- Port the algorithmic rows next, one wiki page at a time. The deep-dive pages enumerate the NVIDIA additions as discrete patches against the upstream `.cpp`; each can be applied and tested independently.
- Tackle the proprietary fork rows last. These have the longest implementation tail (LSR alone is 2,688 decompiled lines; the SelectionDAG intrinsic switch is 9,518) and benefit from the surrounding infrastructure being already in place when their tests run.
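The ordering above amounts to a stable sort of catalog rows by patch class. Here is a minimal sketch, assuming rows are held as (file, class) pairs; the names `PORT_ORDER` and `porting_order` are hypothetical, and the five rows are a sample taken from the tables on this page.

```python
# Lower rank is ported earlier; unknown labels (e.g. "mixed") sort last.
PORT_ORDER = {"none": 0, "config-only": 1, "algorithmic": 2, "proprietary fork": 3}

ROWS = [
    ("LoopStrengthReduce.cpp", "proprietary fork"),
    ("SROA.cpp", "none"),
    ("GVN.cpp", "algorithmic"),
    ("JumpThreading.cpp", "config-only"),
    ("LCSSA.cpp", "none"),
]


def porting_order(rows):
    """Sort rows by patch class; sorted() is stable, so ties keep catalog order."""
    return sorted(rows, key=lambda row: PORT_ORDER.get(row[1], len(PORT_ORDER)))


for name, patch_class in porting_order(ROWS):
    print(f"{patch_class:17} {name}")
```

Because the sort is stable, rows within one class stay in catalog order, which keeps files from the same upstream directory together.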
Where the Catalog is Deliberately Incomplete
This catalog covers the LLVM-derived components only. Three classes of cicc functionality are intentionally absent:
- NVIDIA-only IR passes (Memory Space Opt, NVVM Reflect, Printf Lowering, the process-restrict family, etc.) live under NVIDIA Custom Passes. They have no upstream file to point at.
- The EDG frontend that feeds the LLVM pipeline (NVVM IR generation, builtin lowering at `sub_955A70`) is documented under Pipeline Overview and is not derived from LLVM at all.
- PTX emission (the AsmPrinter at `sub_21E74C0` and the surrounding text-writer infrastructure) is heavily NVIDIA-customized but classifying it row-by-row would require enumerating every PTX directive; the Pipeline / Emission page covers it as one unit.
Together, this catalog and the NVIDIA Custom Passes Overview cover the full optimization-and-codegen surface of cicc v13.0.