Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Hash Table and Collection Infrastructure

Every associative container in cicc v13.0 is built from the same handful of primitives: a pointer-hash DenseMap/DenseSet with quadratic probing, a wyhash-v4-family string hasher, and a SmallVector with inline buffer optimization. Before this page existed, the same hash table description was duplicated across 30+ wiki pages. This is the single source of truth. If you are reimplementing cicc's data structures, start here.

There are no NVIDIA-specific modifications to the DenseMap hashing or probing logic — cicc links the LLVM 20.0.0 implementation unmodified. The only NVIDIA-original hash infrastructure is the wyhash-v4 string hasher used for the builtin name table.

DenseMap Layout

Two variants exist, distinguished by bucket stride. Both share the same 28-byte inline header, the same hash function, the same probing sequence, the same sentinel values, and the same growth policy. The header is always embedded directly inside a larger structure (context object, analysis result, pass state) — never heap-allocated on its own.

Variant A — DenseSet (8 bytes/bucket)

OffsetSizeTypeField
+08uint64_tNumEntries
+88ptrBuckets (heap-allocated array)
+164uint32_tNumItems (live entries)
+204uint32_tNumTombstones
+244uint32_tNumBuckets (always power of 2)

Bucket array size: NumBuckets * 8 bytes. Each bucket holds either a valid pointer, an empty sentinel, or a tombstone sentinel.

Variant B — DenseMap (16 bytes/bucket)

Same 28-byte header. Each bucket holds a key-value pair at a 16-byte stride:

v30 = (_QWORD *)(buckets + 16LL * slot);   // sub_163D530 line 561
*v30 = key;                                  // +0: key
v30[1] = value;                              // +8: value

Variant B is used by the SelectionDAG builder (context offsets +120 and +152), the NVVM IR node uniquing tables, and any subsystem that maps pointers to pointers.

Where the Variants Appear

SubsystemVariantContext offsetPurpose
NVVM IR uniquing (sub_162D4F0)B (16B)context qw[130..178]Node deduplication per opcode
SelectionDAG builder (sub_163D530)B (16B)+120, +152Node mapping
SelectionDAG builder (sub_163D530)A (8B)+184Worklist set
Per-node analysis structuresA (8B)+72 inside v381Visited set
CSSA PHI map (sub_3720740)B (16B)r15+0x60PHI-to-ID mapping
Coroutine spill trackingB (16B)+0x18 inlineSpill/reload tracking
Builtin name tablecustom (12B stride)context+480Name-to-ID with hash cache

Pointer Hash Function

Every DenseMap/DenseSet instance in cicc that uses pointer keys employs the same hash:

hash(ptr) = (ptr >> 9) ^ (ptr >> 4)

This is LLVM's DenseMapInfo<void*>::getHashValue, unchanged. The right-shift by 4 discards the low bits that are always zero due to 8- or 16-byte alignment. The right-shift by 9 mixes in higher-order address bits to break up the stride patterns that arise from slab allocation (where consecutive objects are separated by a fixed power-of-two). The XOR combines these two views of the pointer into a single hash value that distributes well for both heap-allocated and slab-allocated objects.

Representative decompiled evidence (appears identically in dozens of functions):

v9 = (v12 - 1) & (((unsigned int)v11 >> 9) ^ ((unsigned int)v11 >> 4));

Integer-Key Hash Variant

A separate hash function is used for DenseMap<unsigned, T> instances (integer keys rather than pointers):

hash(key) = key * 37

This is LLVM's DenseMapInfo<unsigned>::getHashValue. It appears in the instruction emitter (sub_2E29BA0), the two-address pass (sub_1F4E3A0), the vector legalization tables, and the SelectionDAG instruction selection cost table (sub_3090F90). Integer-key maps use a different sentinel pair: 0xFFFFFFFF (empty) and 0xFFFFFFFE (tombstone).

wyhash v4 String Hasher — sub_CBF760

The NVVM builtin name table uses a separate, NVIDIA-original hash function for string keys. sub_C92610 is a thin wrapper that tail-calls sub_CBF760. The function dispatches on input length into six code paths, each using different constant sets and mixing strategies:

Length Dispatch Table

LengthStrategyConstants
0Return constant0x2D06800538D394C2
1–33-byte read + XOR + multiplyseed 0x87275A9B, mul 0xC2B2AE3D27D4EB4F, avalanche 0x165667B19E3779F9
4–82x uint32 + combine + rotateXOR 0xC73AB174C5ECD5A2, mul 0x9FB21C651E98DF25
9–162x uint64 + 128-bit multiplyXOR 0x6782737BEA4239B9 / 0xAF56BC3B0996523A, avalanche 0x165667919E3779F9
17–128Paired 16B reads from both endsPer-pair constants, 128-bit multiplies, length mixed with 0x61C8864E7A143579
129–240Extended mixingDelegates to sub_CBF370
240+Bulk processingDelegates to sub_CBF100

Pseudocode (length 1–3, the most common case for short builtins)

#![allow(unused)]
fn main() {
fn wyhash_short(data: &[u8], len: usize) -> u32 {
    let a = data[0] as u64;
    let b = data[len / 2] as u64;
    let c = data[len - 1] as u64;
    let combined = a | (b << 8) | (c << 16) | (len as u64) << 24;
    let mixed = combined ^ 0x87275A9B;
    let wide = mixed.wrapping_mul(0xC2B2AE3D27D4EB4F);
    let folded = wide ^ (wide >> 32);
    let result = folded.wrapping_mul(0x165667B19E3779F9);
    (result ^ (result >> 32)) as u32
}
}

Pseudocode (length 17–128, covering most __nvvm_* names)

#![allow(unused)]
fn main() {
fn wyhash_medium(data: &[u8], len: usize) -> u32 {
    let pairs = [
        (0x1CAD21F72C81017C, 0xBE4BA423396CFEB8),  // pair 0
        (0x1F67B3B7A4A44072, 0xDB979083E96DD4DE),  // pair 1
        (0x2172FFCC7DD05A82, 0x78E5C0CC4EE679CB),  // pair 2
        // ... additional pairs for 64/96/128 thresholds
    ];
    let (mut v8, mut v10) = (0u64, 0u64);
    // read 16 bytes from front, 16 from back, mix with pair constants
    for i in 0..((len + 15) / 32) {
        let front = read_u128(&data[i * 16..]);
        let back  = read_u128(&data[len - (i + 1) * 16..]);
        (v8, v10) = mix_128(v8, v10, front, back, pairs[i]);
    }
    let combined = v8 ^ v10 ^ (len as u64 ^ 0x61C8864E7A143579);
    let result = 0x165667919E3779F9u64.wrapping_mul(combined ^ (combined >> 37));
    (result ^ (result >> 32)) as u32
}
}

The final return value is always a uint32 — the high dword of the 64-bit result XORed with the low dword. Most NVVM builtin names are 8–35 bytes, hitting the optimal 4–8 and 9–16 and 17–128 paths.

Probing Strategy

All DenseMap instances use quadratic probing with triangular-number increments:

slot = hash & (capacity - 1)      // initial probe
step = 1
loop:
    if bucket[slot] == key   -> found
    if bucket[slot] == EMPTY -> not found (insert here)
    if bucket[slot] == TOMBSTONE -> record for reuse
    slot = (slot + step) & (capacity - 1)
    step++

The probe sequence for initial position h visits:

h, h+1, h+3, h+6, h+10, h+15, h+21, ...
h + T(k) where T(k) = k*(k+1)/2   (triangular numbers)

This guarantees that for a power-of-2 table size n, all n slots are visited before any index repeats. The proof relies on the fact that the differences T(k+1) - T(k) = k+1 produce all residues modulo n when n is a power of 2.

Comparison Guard (Builtin Table)

The builtin name hash table (sub_C92740, sub_C92860) adds a triple comparison guard before performing the expensive memcmp:

  1. Cached hash equality: hash_cache[slot] == search_hash
  2. Length equality: entry->length == search_length
  3. Content equality: memcmp(search_data, entry->string_data, length) == 0

The hash cache is stored in a separate array immediately after the bucket array and the end-of-table sentinel. This layout avoids polluting bucket cache lines with hash values that are only needed on collision.

Probing Label: "Linear" vs "Quadratic"

Some analysis reports describe the probing as "linear" because the step variable increments by 1 each iteration. The actual probe position advances quadratically (by accumulating triangular numbers). Both descriptions refer to the same code. This page uses the technically precise term: quadratic probing with triangular numbers.

Growth Policy

Load Factor Threshold — 75%

After every successful insertion, the map checks whether to grow:

if (4 * (NumItems + 1) >= 3 * NumBuckets)
    // load factor > 75% -> double capacity
    new_capacity = 2 * NumBuckets

Tombstone Compaction — 12.5%

If the load factor is acceptable but tombstones have accumulated:

elif (NumBuckets - NumTombstones - NumItems <= NumBuckets >> 3)
    // fewer than 12.5% of slots are truly empty
    // rehash at same capacity to clear tombstones
    new_capacity = NumBuckets

Rehash Procedure — sub_C929D0

  1. calloc(new_capacity + 1, bucket_stride) for the new array.
  2. Write the end-of-table sentinel at position new_capacity.
  3. For each live (non-empty, non-tombstone) entry in the old table, reinsert into the new table using quadratic probing.
  4. Copy the cached hash (if the table has a hash cache).
  5. Track the new position of a "current slot" pointer so the caller can continue using the entry it just inserted.
  6. Free the old array.
  7. Reset NumTombstones to 0.
  8. Update NumBuckets to new_capacity.
  9. Return the new position of the tracked slot.

Capacity Constraints

  • Power of 2: always. Enforced by the bit-smearing pattern: x |= x>>1; x |= x>>2; x |= x>>4; ...; x += 1.
  • Minimum: 64 buckets for standard DenseMap instances. The builtin name table starts at 16 and grows through 16 -> 32 -> 64 -> 128 -> 256 -> 512 -> 1024 as its 770 entries are inserted.
  • Allocation: sub_22077B0 (operator new[]), freed via j___libc_free_0.

Sentinel Values

Two sentinel families exist, distinguished by magnitude. Both are chosen to be impossible values for aligned pointers.

NVVM-Layer Sentinels (small magnitude)

Used by the NVVM IR uniquing tables, the SelectionDAG builder maps, and the builtin name table:

RoleValueHexWhy safe
Empty-80xFFFFFFFFFFFFFFF8Low 3 bits = 0b000 after masking, but no 8-byte-aligned pointer is this close to (uint64_t)-1
Tombstone-160xFFFFFFFFFFFFFFF0Same reasoning, distinct from -8

The builtin name table also uses a value of 2 as an end-of-table sentinel placed at bucket_array[capacity].

LLVM-Layer Sentinels (large magnitude)

Used by the majority of LLVM pass infrastructure — SCEV, register coalescing, block placement, SLP vectorizer, StructurizeCFG, machine pipeliner, prolog-epilog, and others:

RoleValueHexDecimal
Empty0xFFFFFFFFFFFFF000-4096-4096
Tombstone0xFFFFFFFFFFFFE000-8192-8192

Integer-Key Sentinels

Used by DenseMap<unsigned, T> instances (instruction emitter, two-address pass):

RoleValueHex
Empty0xFFFFFFFF32-bit all-ones
Tombstone0xFFFFFFFE32-bit all-ones minus 1

Which Sentinel Set to Expect

SubsystemSentinel pair
NVVM IR uniquing, SelectionDAG builder-8 / -16
Builtin name table-8 (tombstone), 0 (empty), 2 (end marker)
SCEV, block placement, SLP vectorizer-4096 / -8192
Register coalescing, machine pipeliner-4096 / -8192
StructurizeCFG, prolog-epilog-4096 / -8192
Instruction emitter, two-address0xFFFFFFFF / 0xFFFFFFFE
Coroutine spill tracking0xFFFFFFFFF000 / 0xFFFFFFFFE000
CSSA PHI map0xFFFFFFFFF000 / 0xFFFFFFFFE000
Debug verify0xFFFFFFFFF000 / 0xFFFFFFFFE000
LazyCallGraph0xFFFFFFFFF000 / 0xFFFFFFFFE000

The -8/-16 pair appears exclusively in NVVM-layer (NVIDIA-original) code. The -4096/-8192 pair is the standard LLVM DenseMapInfo<void*> sentinel set. The difference is cosmetic — both pairs are safe for the same reasons — but it reveals code provenance: if you see -8/-16, the code was written or heavily modified by NVIDIA; if you see -4096/-8192, it is stock LLVM.

SmallVector Pattern

SmallVector is the universal dynamic array throughout cicc, with two growth implementations:

Layout

[BeginPtr, Size:Count:Capacity, InlineData...]
OffsetSizeField
+08data_ptr (points to inline buffer initially, heap after growth)
+84size (live element count)
+124capacity (allocated slots)
+16NInline buffer (N = InlineCapacity * element_size)

When size == capacity on insertion, the vector grows.

Growth Functions

FunctionAddressDescription
SmallVector::growsub_C8D5F0Generic growth — copies elements, used for non-POD types
SmallVectorBase::grow_podsub_C8D7D0POD-optimized growth — uses realloc when buffer is heap-allocated
SmallVector::grow (MIR)sub_16CD150Second copy in the MachineIR address range, identical logic
SmallVector::grow (extended)sub_C8E1E0Larger variant (11KB), handles edge cases

Growth Policy

The standard LLVM SmallVector growth: double the current capacity, with a minimum of 1. If the current buffer is the inline buffer, malloc a new heap buffer and memcpy the contents. If the buffer is already on the heap, realloc it (for POD types) or malloc + copy + free (for non-POD types).

new_capacity = max(2 * old_capacity, required_capacity)
if (data_ptr == &inline_buffer)
    heap_buf = malloc(new_capacity * elem_size)
    memcpy(heap_buf, inline_buffer, size * elem_size)
else
    // POD: heap_buf = realloc(data_ptr, new_capacity * elem_size)
    // non-POD: heap_buf = malloc(...); copy; free(old)
data_ptr = heap_buf
capacity = new_capacity

Common Inline Capacities

Observed across the codebase:

Inline capacityElement sizeTotal inline bytesTypical use
2816SCEV delinearization terms
4832LazyCallGraph SCC lists, basic block worklists
8864NVVMReflect call collection, PHI operand lists
168128AA evaluation pointer sets
228176Printf argument arrays (stack-allocated)
856448SROA slice descriptors

Builtin Name Table — Specialized Hash Table

The builtin name table at context+480 is a specialized variant that does not use the standard DenseMap layout. It stores string entries rather than pointers, includes a parallel hash cache, and uses the wyhash function instead of the pointer hash.

Table Structure (20 bytes)

OffsetSizeField
+08bucket_array_ptr
+84capacity (power of 2)
+124count (live entries)
+164tombstone_count

Memory Layout

[0 .. 8*cap-1]                    bucket_array: cap QWORD pointers
[8*cap .. 8*cap+7]                sentinel: value 2 (end-of-table)
[8*cap+8 .. 8*cap+8+4*cap-1]     hash_cache: uint32 per slot

String Entry (heap-allocated via sub_C7D670)

OffsetSizeField
+08string_length
+84builtin_id (set after insertion)
+16N+1Null-terminated string data

Total allocation: length + 17 bytes, 8-byte aligned. The string data offset (16) is stored at hashtable+20 for use during comparison.

See Builtins for the complete 770-entry builtin ID inventory.

Usage Across the Compiler

Subsystems Using DenseMap (pointer hash, -8/-16 sentinels)

  • NVVM IR uniquing (sub_162D4F0): 8+ DenseMap instances in the NVVM context object, one per opcode range (0x04–0x1F). Tables at fixed qword-indexed offsets, spaced 32 bytes apart.
  • SelectionDAG builder (sub_163D530): Three maps at context offsets +120, +152, +184. Map A and B are 16-byte-stride (key-value), Set C is 8-byte-stride (keys only).
  • Per-node analysis structures: Embedded DenseSet at +72 within analysis objects created during DAG construction.
  • Memory space optimization (sub_1C6A6C0): DenseMap-style tables for address space tracking.

Subsystems Using DenseMap (pointer hash, -4096/-8192 sentinels)

  • SCEV (sub_F03CD0 and family): Expression caching, range computation, back-edge taken count.
  • Register coalescing (sub_1F2F8F0): Already-coalesced set, equivalence class map.
  • Block placement (sub_2E3B720): Chain membership, tail-merge candidates.
  • SLP vectorizer (sub_1ACCE50): AllOps and Scalars hash tables (32-byte entries).
  • StructurizeCFG (sub_1B66CF0): Flow-block mapping, region membership.
  • Machine pipeliner (sub_20C40D0): Schedule stage tracking.
  • CSSA (sub_3720740): PHI-to-ID mapping.
  • Debug/verify (sub_265D050): Instruction validation tables.
  • LazyCallGraph (sub_D1A040): Edge membership, SCC identity.

Subsystems Using DenseMap (integer hash key * 37)

  • Instruction emitter (sub_2E1F350): Opcode-to-constraint mapping. Sentinels: 0xFFFFFFFF / 0xFFFFFFFE.
  • Two-address pass (sub_1F4BFE0): TiedOperandMap (56-byte entries, 4 inline). EqClassMap.
  • Vector legalization (sub_3302A00): Type-split record mapping.
  • SelectionDAG isel (sub_3090F90): Argument cost table.

Subsystems Using wyhash (string keys)

  • Builtin name table (sub_90AEE0): 770 NVVM/CUDA builtin names. Uses the specialized 20-byte table header with hash cache.
  • This is the only known use of sub_CBF760 in cicc.

Key Functions

FunctionAddressSizeRole
DenseMap pointer hashinline(ptr >> 9) ^ (ptr >> 4) — always inlined
DenseMap integer hashinlinekey * 37 — always inlined
wyhash v4sub_CBF760~4 KBString hash, length-dispatched
wyhash wrappersub_C92610tinyTail-calls sub_CBF760
Builtin insert-or-findsub_C92740~2 KBQuadratic probe with hash cache
Builtin find-onlysub_C92860~1 KBRead-only variant of sub_C92740
Builtin rehashsub_C929D0~1 KB75% load factor, tombstone compaction
Builtin table initsub_C92620tinyCreates 16-bucket initial table
SmallVector::growsub_C8D5F0~2 KBGeneric element growth
SmallVectorBase::grow_podsub_C8D7D0~5 KBPOD-optimized realloc growth
SmallVector::grow (MIR)sub_16CD150~2 KBDuplicate in MachineIR range
SmallPtrSet::insertOrFindsub_C9A3C0~16 KBSmall pointer set with growth
DenseMap grow (LLVM passes)varies per passEach pass has its own inlined or outlined rehash

Cross-References