Hash Table and Collection Infrastructure
Every associative container in cicc v13.0 is built from the same handful of primitives: a pointer-hash DenseMap/DenseSet with quadratic probing, a wyhash-v4-family string hasher, and a SmallVector with inline buffer optimization. Before this page existed, the same hash table description was duplicated across 30+ wiki pages. This is the single source of truth. If you are reimplementing cicc's data structures, start here.
There are no NVIDIA-specific modifications to the DenseMap hashing or probing logic — cicc links the LLVM 20.0.0 implementation unmodified. The only NVIDIA-original hash infrastructure is the wyhash-v4 string hasher used for the builtin name table.
DenseMap Layout
Two variants exist, distinguished by bucket stride. Both share the same 28-byte inline header, the same hash function, the same probing sequence, the same sentinel values, and the same growth policy. The header is always embedded directly inside a larger structure (context object, analysis result, pass state) — never heap-allocated on its own.
Variant A — DenseSet (8 bytes/bucket)
| Offset | Size | Type | Field |
|---|---|---|---|
| +0 | 8 | uint64_t | NumEntries |
| +8 | 8 | ptr | Buckets (heap-allocated array) |
| +16 | 4 | uint32_t | NumItems (live entries) |
| +20 | 4 | uint32_t | NumTombstones |
| +24 | 4 | uint32_t | NumBuckets (always power of 2) |
Bucket array size: NumBuckets * 8 bytes. Each bucket holds either a valid pointer, an empty sentinel, or a tombstone sentinel.
Variant B — DenseMap (16 bytes/bucket)
Same 28-byte header. Each bucket holds a key-value pair at a 16-byte stride:
v30 = (_QWORD *)(buckets + 16LL * slot); // sub_163D530 line 561
*v30 = key; // +0: key
v30[1] = value; // +8: value
Variant B is used by the SelectionDAG builder (context offsets +120 and +152), the NVVM IR node uniquing tables, and any subsystem that maps pointers to pointers.
Where the Variants Appear
| Subsystem | Variant | Context offset | Purpose |
|---|---|---|---|
NVVM IR uniquing (sub_162D4F0) | B (16B) | context qw[130..178] | Node deduplication per opcode |
SelectionDAG builder (sub_163D530) | B (16B) | +120, +152 | Node mapping |
SelectionDAG builder (sub_163D530) | A (8B) | +184 | Worklist set |
| Per-node analysis structures | A (8B) | +72 inside v381 | Visited set |
CSSA PHI map (sub_3720740) | B (16B) | r15+0x60 | PHI-to-ID mapping |
| Coroutine spill tracking | B (16B) | +0x18 inline | Spill/reload tracking |
| Builtin name table | custom (12B stride) | context+480 | Name-to-ID with hash cache |
Pointer Hash Function
Every DenseMap/DenseSet instance in cicc that uses pointer keys employs the same hash:
hash(ptr) = (ptr >> 9) ^ (ptr >> 4)
This is LLVM's DenseMapInfo<void*>::getHashValue, unchanged. The right-shift by 4 discards the low bits that are always zero due to 8- or 16-byte alignment. The right-shift by 9 mixes in higher-order address bits to break up the stride patterns that arise from slab allocation (where consecutive objects are separated by a fixed power-of-two). The XOR combines these two views of the pointer into a single hash value that distributes well for both heap-allocated and slab-allocated objects.
Representative decompiled evidence (appears identically in dozens of functions):
v9 = (v12 - 1) & (((unsigned int)v11 >> 9) ^ ((unsigned int)v11 >> 4));
Integer-Key Hash Variant
A separate hash function is used for DenseMap<unsigned, T> instances (integer keys rather than pointers):
hash(key) = key * 37
This is LLVM's DenseMapInfo<unsigned>::getHashValue. It appears in the instruction emitter (sub_2E29BA0), the two-address pass (sub_1F4E3A0), the vector legalization tables, and the SelectionDAG instruction selection cost table (sub_3090F90). Integer-key maps use a different sentinel pair: 0xFFFFFFFF (empty) and 0xFFFFFFFE (tombstone).
wyhash v4 String Hasher — sub_CBF760
The NVVM builtin name table uses a separate, NVIDIA-original hash function for string keys. sub_C92610 is a thin wrapper that tail-calls sub_CBF760. The function dispatches on input length into six code paths, each using different constant sets and mixing strategies:
Length Dispatch Table
| Length | Strategy | Constants |
|---|---|---|
| 0 | Return constant | 0x2D06800538D394C2 |
| 1–3 | 3-byte read + XOR + multiply | seed 0x87275A9B, mul 0xC2B2AE3D27D4EB4F, avalanche 0x165667B19E3779F9 |
| 4–8 | 2x uint32 + combine + rotate | XOR 0xC73AB174C5ECD5A2, mul 0x9FB21C651E98DF25 |
| 9–16 | 2x uint64 + 128-bit multiply | XOR 0x6782737BEA4239B9 / 0xAF56BC3B0996523A, avalanche 0x165667919E3779F9 |
| 17–128 | Paired 16B reads from both ends | Per-pair constants, 128-bit multiplies, length mixed with 0x61C8864E7A143579 |
| 129–240 | Extended mixing | Delegates to sub_CBF370 |
| 240+ | Bulk processing | Delegates to sub_CBF100 |
Pseudocode (length 1–3, the most common case for short builtins)
#![allow(unused)] fn main() { fn wyhash_short(data: &[u8], len: usize) -> u32 { let a = data[0] as u64; let b = data[len / 2] as u64; let c = data[len - 1] as u64; let combined = a | (b << 8) | (c << 16) | (len as u64) << 24; let mixed = combined ^ 0x87275A9B; let wide = mixed.wrapping_mul(0xC2B2AE3D27D4EB4F); let folded = wide ^ (wide >> 32); let result = folded.wrapping_mul(0x165667B19E3779F9); (result ^ (result >> 32)) as u32 } }
Pseudocode (length 17–128, covering most __nvvm_* names)
#![allow(unused)] fn main() { fn wyhash_medium(data: &[u8], len: usize) -> u32 { let pairs = [ (0x1CAD21F72C81017C, 0xBE4BA423396CFEB8), // pair 0 (0x1F67B3B7A4A44072, 0xDB979083E96DD4DE), // pair 1 (0x2172FFCC7DD05A82, 0x78E5C0CC4EE679CB), // pair 2 // ... additional pairs for 64/96/128 thresholds ]; let (mut v8, mut v10) = (0u64, 0u64); // read 16 bytes from front, 16 from back, mix with pair constants for i in 0..((len + 15) / 32) { let front = read_u128(&data[i * 16..]); let back = read_u128(&data[len - (i + 1) * 16..]); (v8, v10) = mix_128(v8, v10, front, back, pairs[i]); } let combined = v8 ^ v10 ^ (len as u64 ^ 0x61C8864E7A143579); let result = 0x165667919E3779F9u64.wrapping_mul(combined ^ (combined >> 37)); (result ^ (result >> 32)) as u32 } }
The final return value is always a uint32 — the high dword of the 64-bit result XORed with the low dword. Most NVVM builtin names are 8–35 bytes, hitting the optimal 4–8 and 9–16 and 17–128 paths.
Probing Strategy
All DenseMap instances use quadratic probing with triangular-number increments:
slot = hash & (capacity - 1) // initial probe
step = 1
loop:
if bucket[slot] == key -> found
if bucket[slot] == EMPTY -> not found (insert here)
if bucket[slot] == TOMBSTONE -> record for reuse
slot = (slot + step) & (capacity - 1)
step++
The probe sequence for initial position h visits:
h, h+1, h+3, h+6, h+10, h+15, h+21, ...
h + T(k) where T(k) = k*(k+1)/2 (triangular numbers)
This guarantees that for a power-of-2 table size n, all n slots are visited before any index repeats. The proof relies on the fact that the differences T(k+1) - T(k) = k+1 produce all residues modulo n when n is a power of 2.
Comparison Guard (Builtin Table)
The builtin name hash table (sub_C92740, sub_C92860) adds a triple comparison guard before performing the expensive memcmp:
- Cached hash equality:
hash_cache[slot] == search_hash - Length equality:
entry->length == search_length - Content equality:
memcmp(search_data, entry->string_data, length) == 0
The hash cache is stored in a separate array immediately after the bucket array and the end-of-table sentinel. This layout avoids polluting bucket cache lines with hash values that are only needed on collision.
Probing Label: "Linear" vs "Quadratic"
Some analysis reports describe the probing as "linear" because the step variable increments by 1 each iteration. The actual probe position advances quadratically (by accumulating triangular numbers). Both descriptions refer to the same code. This page uses the technically precise term: quadratic probing with triangular numbers.
Growth Policy
Load Factor Threshold — 75%
After every successful insertion, the map checks whether to grow:
if (4 * (NumItems + 1) >= 3 * NumBuckets)
// load factor > 75% -> double capacity
new_capacity = 2 * NumBuckets
Tombstone Compaction — 12.5%
If the load factor is acceptable but tombstones have accumulated:
elif (NumBuckets - NumTombstones - NumItems <= NumBuckets >> 3)
// fewer than 12.5% of slots are truly empty
// rehash at same capacity to clear tombstones
new_capacity = NumBuckets
Rehash Procedure — sub_C929D0
calloc(new_capacity + 1, bucket_stride)for the new array.- Write the end-of-table sentinel at position
new_capacity. - For each live (non-empty, non-tombstone) entry in the old table, reinsert into the new table using quadratic probing.
- Copy the cached hash (if the table has a hash cache).
- Track the new position of a "current slot" pointer so the caller can continue using the entry it just inserted.
- Free the old array.
- Reset
NumTombstonesto 0. - Update
NumBucketstonew_capacity. - Return the new position of the tracked slot.
Capacity Constraints
- Power of 2: always. Enforced by the bit-smearing pattern:
x |= x>>1; x |= x>>2; x |= x>>4; ...; x += 1. - Minimum: 64 buckets for standard DenseMap instances. The builtin name table starts at 16 and grows through
16 -> 32 -> 64 -> 128 -> 256 -> 512 -> 1024as its 770 entries are inserted. - Allocation:
sub_22077B0(operatornew[]), freed viaj___libc_free_0.
Sentinel Values
Two sentinel families exist, distinguished by magnitude. Both are chosen to be impossible values for aligned pointers.
NVVM-Layer Sentinels (small magnitude)
Used by the NVVM IR uniquing tables, the SelectionDAG builder maps, and the builtin name table:
| Role | Value | Hex | Why safe |
|---|---|---|---|
| Empty | -8 | 0xFFFFFFFFFFFFFFF8 | Low 3 bits = 0b000 after masking, but no 8-byte-aligned pointer is this close to (uint64_t)-1 |
| Tombstone | -16 | 0xFFFFFFFFFFFFFFF0 | Same reasoning, distinct from -8 |
The builtin name table also uses a value of 2 as an end-of-table sentinel placed at bucket_array[capacity].
LLVM-Layer Sentinels (large magnitude)
Used by the majority of LLVM pass infrastructure — SCEV, register coalescing, block placement, SLP vectorizer, StructurizeCFG, machine pipeliner, prolog-epilog, and others:
| Role | Value | Hex | Decimal |
|---|---|---|---|
| Empty | 0xFFFFFFFFFFFFF000 | -4096 | -4096 |
| Tombstone | 0xFFFFFFFFFFFFE000 | -8192 | -8192 |
Integer-Key Sentinels
Used by DenseMap<unsigned, T> instances (instruction emitter, two-address pass):
| Role | Value | Hex |
|---|---|---|
| Empty | 0xFFFFFFFF | 32-bit all-ones |
| Tombstone | 0xFFFFFFFE | 32-bit all-ones minus 1 |
Which Sentinel Set to Expect
| Subsystem | Sentinel pair |
|---|---|
| NVVM IR uniquing, SelectionDAG builder | -8 / -16 |
| Builtin name table | -8 (tombstone), 0 (empty), 2 (end marker) |
| SCEV, block placement, SLP vectorizer | -4096 / -8192 |
| Register coalescing, machine pipeliner | -4096 / -8192 |
| StructurizeCFG, prolog-epilog | -4096 / -8192 |
| Instruction emitter, two-address | 0xFFFFFFFF / 0xFFFFFFFE |
| Coroutine spill tracking | 0xFFFFFFFFF000 / 0xFFFFFFFFE000 |
| CSSA PHI map | 0xFFFFFFFFF000 / 0xFFFFFFFFE000 |
| Debug verify | 0xFFFFFFFFF000 / 0xFFFFFFFFE000 |
| LazyCallGraph | 0xFFFFFFFFF000 / 0xFFFFFFFFE000 |
The -8/-16 pair appears exclusively in NVVM-layer (NVIDIA-original) code. The -4096/-8192 pair is the standard LLVM DenseMapInfo<void*> sentinel set. The difference is cosmetic — both pairs are safe for the same reasons — but it reveals code provenance: if you see -8/-16, the code was written or heavily modified by NVIDIA; if you see -4096/-8192, it is stock LLVM.
SmallVector Pattern
SmallVector is the universal dynamic array throughout cicc, with two growth implementations:
Layout
[BeginPtr, Size:Count:Capacity, InlineData...]
| Offset | Size | Field |
|---|---|---|
| +0 | 8 | data_ptr (points to inline buffer initially, heap after growth) |
| +8 | 4 | size (live element count) |
| +12 | 4 | capacity (allocated slots) |
| +16 | N | Inline buffer (N = InlineCapacity * element_size) |
When size == capacity on insertion, the vector grows.
Growth Functions
| Function | Address | Description |
|---|---|---|
SmallVector::grow | sub_C8D5F0 | Generic growth — copies elements, used for non-POD types |
SmallVectorBase::grow_pod | sub_C8D7D0 | POD-optimized growth — uses realloc when buffer is heap-allocated |
SmallVector::grow (MIR) | sub_16CD150 | Second copy in the MachineIR address range, identical logic |
SmallVector::grow (extended) | sub_C8E1E0 | Larger variant (11KB), handles edge cases |
Growth Policy
The standard LLVM SmallVector growth: double the current capacity, with a minimum of 1. If the current buffer is the inline buffer, malloc a new heap buffer and memcpy the contents. If the buffer is already on the heap, realloc it (for POD types) or malloc + copy + free (for non-POD types).
new_capacity = max(2 * old_capacity, required_capacity)
if (data_ptr == &inline_buffer)
heap_buf = malloc(new_capacity * elem_size)
memcpy(heap_buf, inline_buffer, size * elem_size)
else
// POD: heap_buf = realloc(data_ptr, new_capacity * elem_size)
// non-POD: heap_buf = malloc(...); copy; free(old)
data_ptr = heap_buf
capacity = new_capacity
Common Inline Capacities
Observed across the codebase:
| Inline capacity | Element size | Total inline bytes | Typical use |
|---|---|---|---|
| 2 | 8 | 16 | SCEV delinearization terms |
| 4 | 8 | 32 | LazyCallGraph SCC lists, basic block worklists |
| 8 | 8 | 64 | NVVMReflect call collection, PHI operand lists |
| 16 | 8 | 128 | AA evaluation pointer sets |
| 22 | 8 | 176 | Printf argument arrays (stack-allocated) |
| 8 | 56 | 448 | SROA slice descriptors |
Builtin Name Table — Specialized Hash Table
The builtin name table at context+480 is a specialized variant that does not use the standard DenseMap layout. It stores string entries rather than pointers, includes a parallel hash cache, and uses the wyhash function instead of the pointer hash.
Table Structure (20 bytes)
| Offset | Size | Field |
|---|---|---|
| +0 | 8 | bucket_array_ptr |
| +8 | 4 | capacity (power of 2) |
| +12 | 4 | count (live entries) |
| +16 | 4 | tombstone_count |
Memory Layout
[0 .. 8*cap-1] bucket_array: cap QWORD pointers
[8*cap .. 8*cap+7] sentinel: value 2 (end-of-table)
[8*cap+8 .. 8*cap+8+4*cap-1] hash_cache: uint32 per slot
String Entry (heap-allocated via sub_C7D670)
| Offset | Size | Field |
|---|---|---|
| +0 | 8 | string_length |
| +8 | 4 | builtin_id (set after insertion) |
| +16 | N+1 | Null-terminated string data |
Total allocation: length + 17 bytes, 8-byte aligned. The string data offset (16) is stored at hashtable+20 for use during comparison.
See Builtins for the complete 770-entry builtin ID inventory.
Usage Across the Compiler
Subsystems Using DenseMap (pointer hash, -8/-16 sentinels)
- NVVM IR uniquing (
sub_162D4F0): 8+ DenseMap instances in the NVVM context object, one per opcode range (0x04–0x1F). Tables at fixed qword-indexed offsets, spaced 32 bytes apart. - SelectionDAG builder (
sub_163D530): Three maps at context offsets +120, +152, +184. Map A and B are 16-byte-stride (key-value), Set C is 8-byte-stride (keys only). - Per-node analysis structures: Embedded DenseSet at +72 within analysis objects created during DAG construction.
- Memory space optimization (
sub_1C6A6C0): DenseMap-style tables for address space tracking.
Subsystems Using DenseMap (pointer hash, -4096/-8192 sentinels)
- SCEV (
sub_F03CD0and family): Expression caching, range computation, back-edge taken count. - Register coalescing (
sub_1F2F8F0): Already-coalesced set, equivalence class map. - Block placement (
sub_2E3B720): Chain membership, tail-merge candidates. - SLP vectorizer (
sub_1ACCE50): AllOps and Scalars hash tables (32-byte entries). - StructurizeCFG (
sub_1B66CF0): Flow-block mapping, region membership. - Machine pipeliner (
sub_20C40D0): Schedule stage tracking. - CSSA (
sub_3720740): PHI-to-ID mapping. - Debug/verify (
sub_265D050): Instruction validation tables. - LazyCallGraph (
sub_D1A040): Edge membership, SCC identity.
Subsystems Using DenseMap (integer hash key * 37)
- Instruction emitter (
sub_2E1F350): Opcode-to-constraint mapping. Sentinels:0xFFFFFFFF/0xFFFFFFFE. - Two-address pass (
sub_1F4BFE0): TiedOperandMap (56-byte entries, 4 inline). EqClassMap. - Vector legalization (
sub_3302A00): Type-split record mapping. - SelectionDAG isel (
sub_3090F90): Argument cost table.
Subsystems Using wyhash (string keys)
- Builtin name table (
sub_90AEE0): 770 NVVM/CUDA builtin names. Uses the specialized 20-byte table header with hash cache. - This is the only known use of
sub_CBF760in cicc.
Key Functions
| Function | Address | Size | Role |
|---|---|---|---|
| DenseMap pointer hash | inline | — | (ptr >> 9) ^ (ptr >> 4) — always inlined |
| DenseMap integer hash | inline | — | key * 37 — always inlined |
| wyhash v4 | sub_CBF760 | ~4 KB | String hash, length-dispatched |
| wyhash wrapper | sub_C92610 | tiny | Tail-calls sub_CBF760 |
| Builtin insert-or-find | sub_C92740 | ~2 KB | Quadratic probe with hash cache |
| Builtin find-only | sub_C92860 | ~1 KB | Read-only variant of sub_C92740 |
| Builtin rehash | sub_C929D0 | ~1 KB | 75% load factor, tombstone compaction |
| Builtin table init | sub_C92620 | tiny | Creates 16-bucket initial table |
| SmallVector::grow | sub_C8D5F0 | ~2 KB | Generic element growth |
| SmallVectorBase::grow_pod | sub_C8D7D0 | ~5 KB | POD-optimized realloc growth |
| SmallVector::grow (MIR) | sub_16CD150 | ~2 KB | Duplicate in MachineIR range |
| SmallPtrSet::insertOrFind | sub_C9A3C0 | ~16 KB | Small pointer set with growth |
| DenseMap grow (LLVM passes) | varies per pass | — | Each pass has its own inlined or outlined rehash |
Cross-References
- Builtins — Hash Table and ID Inventory — complete 770-entry builtin table with wyhash usage
- DenseMap and Symbol Table Structures — original page (now a subset of this one, kept for EDG node layout)
- NVVM IR Node — NVVM context object with DenseMap uniquing tables
- CSSA — PHI hash map with -4096/-8192 sentinels
- Register Coalescing — integer-key and pointer-key hash map variants
- SLP Vectorizer — 32-byte-entry DenseMap with -4096/-8192 sentinels
- SCEV — SCEV expression caching with -4096/-8192 sentinels
- Instruction Emitter — integer-key hash with
key * 37