GlobalValue Flag Bits

Abstract

The tileiras binary inherits LLVM's 16-bit GlobalValue flag word. The word stores linkage, visibility, DLL storage class, thread-local mode, unnamed-address policy, and subclass data. NVIDIA reuses bit 14 as a fast marker for functions whose nvvm.annotations metadata has been transplanted into function attributes. This page covers the bit-level contract and the annotation kinds mirrored by the attribute infrastructure.

GlobalValue 16-bit flag word — LLVM standard low 14 bits

The word lives in LLVM's GlobalValue flag field. Bits 0..13 follow upstream LLVM; bit 14 is NVIDIA-repurposed; bit 15 is reserved.

| Bits | Field | Notes |
|---|---|---|
| 0..3 | LinkageTypes | Values 0..15; InternalLinkage = 7. Cleared on kernels, set to internal on non-kernels. |
| 4..5 | VisibilityTypes | Default=0, Hidden=1, Protected=2. Tested via 0x30 to gate the marker write. |
| 6..7 | DLLStorageClass | Default / Import / Export. Preserved through 0xFCC0. |
| 8..9 | ThreadLocalMode | One of four TLS models. Preserved on kernels; cleared by the 0xFCC0 mask on non-kernels. |
| 10..11 | UnnamedAddr | None / Local / Global. Preserved. |
| 12 | HasLLVMReservedName | LLVM sentinel. Preserved. |
| 13 | subclass-data | GlobalValue subclass slot. Preserved. |
| 14 | NVIDIA-repurposed | See next section. |
| 15 | reserved | Always zero in observed paths. |
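The layout above can be sketched as a small decoder. This is a minimal illustration of the bit positions listed in the table, not code recovered from the binary; the names `GlobalValueFlags` and `decode_flags` are hypothetical.

```python
from typing import NamedTuple


class GlobalValueFlags(NamedTuple):
    """Decoded view of the 16-bit flag word (field names are illustrative)."""
    linkage: int          # bits 0..3
    visibility: int       # bits 4..5
    dll_storage: int      # bits 6..7
    tls_mode: int         # bits 8..9
    unnamed_addr: int     # bits 10..11
    reserved_name: bool   # bit 12
    subclass_data: bool   # bit 13
    transplanted: bool    # bit 14, the NVIDIA-repurposed marker
    reserved: bool        # bit 15


def decode_flags(word: int) -> GlobalValueFlags:
    """Split a flag word into the fields named in the table above."""
    word &= 0xFFFF  # only the low 16 bits are meaningful
    return GlobalValueFlags(
        linkage=word & 0xF,
        visibility=(word >> 4) & 0x3,
        dll_storage=(word >> 6) & 0x3,
        tls_mode=(word >> 8) & 0x3,
        unnamed_addr=(word >> 10) & 0x3,
        reserved_name=bool(word & (1 << 12)),
        subclass_data=bool(word & (1 << 13)),
        transplanted=bool(word & (1 << 14)),
        reserved=bool(word & (1 << 15)),
    )
```

For example, `decode_flags(0x4007)` reports internal linkage (7) with the bit 14 marker set and every other field zero.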

Bit 14 — NVIDIA-repurposed "nvvm.annotations_transplanted" marker

Bit 14 is the NVIDIA-private "nvvm.annotations_transplanted" marker. It is set on defined kernel functions with default visibility. The same fact is also stored as a string-keyed function attribute, giving the backend a dual encoding — a fast bit check plus the structured attribute. isKernelFunction can short-circuit when the marker is set, skipping the fallback to legacy !nvvm.annotations !"kernel" metadata.

Consumers read the marker through a single bit test before falling back to attribute or metadata lookup:

static bool is_kernel_fast(const Function *fn) {
    /* Bit 14 of the 16-bit GlobalValue flag word doubles as the
     * "nvvm.annotations_transplanted" cache. If it is set, the
     * attribute is guaranteed present and the !nvvm.annotations
     * fallback can be skipped. */
    return (fn->global_value_flags & (1u << 14)) != 0
        || function_has_attribute(fn, "nvvm.kernel");
}

The bit is only a cache; the source of truth remains the string attribute, which is what IR-text consumers see. Dropping the bit but keeping the attribute is correct (just slower); setting the bit but omitting the attribute breaks anything that round-trips IR through textual MLIR.
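The three possible states of the dual encoding can be made explicit with a small checker. This is a sketch under stated assumptions: `FunctionModel` is a hypothetical stand-in for a function's flag word and attribute set, and the state labels are illustrative rather than anything the binary emits.

```python
from dataclasses import dataclass, field

MARKER_BIT = 1 << 14
MARKER_ATTR = "nvvm.annotations_transplanted"


@dataclass
class FunctionModel:
    """Hypothetical model: a 16-bit flag word plus a set of attribute names."""
    flags: int = 0
    attrs: set = field(default_factory=set)


def marker_state(fn: FunctionModel) -> str:
    """Classify how the bit 14 cache relates to the string attribute."""
    has_bit = bool(fn.flags & MARKER_BIT)
    has_attr = MARKER_ATTR in fn.attrs
    if has_bit and not has_attr:
        # Cache claims a fact the IR text does not carry: breaks round-trips.
        return "broken"
    if has_attr and not has_bit:
        # Correct, but consumers must take the slower attribute lookup.
        return "slow"
    return "ok"
```

A verifier pass could run such a check over every function after the linkage pass and reject any that report `"broken"`.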

Bit-mask decoded for IPMSP clone-stamping

The linkage pass rewrites the flag word with four hard-coded masks. Each one names exactly one logical operation on the field.

| Mask | Width | Preserved bits | Cleared bits | Operation | Resulting state |
|---|---|---|---|---|---|
| 0xFCC0 | 16 | 6, 7, 10..15 | 0..5, 8, 9 | flags = (flags & 0xFCC0) \| 7 on non-kernels | LinkageTypes = InternalLinkage; DLLStorage, unnamed-addr, marker survive. |
| 0xF0 | 8 lo | 4..7 | 0..3 | flags_lo &= 0xF0 on kernels | Clears the 4-bit linkage nibble; visibility and DLLStorageClass preserved. |
| 0x30 | 8 lo | — | (test) | (flags_lo & 0x30) != 0 on kernels | Tests visibility != Default. If non-zero, marker write is skipped. |
| 0x40 | 8 hi | — | (OR) | flags_hi \|= 0x40 on kernels with Default visibility | Sets bit 14, the "nvvm.annotations_transplanted" marker. |

The asymmetry between non-kernels (0xFCC0 mask plus InternalLinkage) and kernels (0xF0 low-byte mask plus visibility test plus marker bit) is deliberate. Non-kernels exit after one rewrite; kernels preserve externally visible linkage state but add the NVIDIA marker when visibility allows it.
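The four mask operations can be combined into one function to make the two paths concrete. This is a minimal sketch, assuming the kernel path applies its three operations in the order listed and treats the low and high bytes as separate 8-bit operands; `stamp_flags` and the constant names are hypothetical.

```python
INTERNAL_LINKAGE = 7       # LinkageTypes value written on non-kernels
NON_KERNEL_KEEP = 0xFCC0   # 16-bit mask: keeps bits 6, 7, 10..15
KERNEL_LO_KEEP = 0xF0      # low-byte mask: clears the linkage nibble
VISIBILITY_MASK = 0x30     # low-byte test: bits 4..5
MARKER_HI = 0x40           # high-byte OR: bit 6 of the high byte is bit 14


def stamp_flags(flags: int, is_kernel: bool) -> int:
    """Return the rewritten 16-bit flag word for one function."""
    flags &= 0xFFFF
    if not is_kernel:
        # Single rewrite: drop linkage, visibility, and TLS bits, then
        # force internal linkage.
        return (flags & NON_KERNEL_KEEP) | INTERNAL_LINKAGE

    lo = flags & 0xFF
    hi = flags >> 8
    lo &= KERNEL_LO_KEEP                # linkage nibble -> 0
    if (lo & VISIBILITY_MASK) == 0:     # Default visibility only
        hi |= MARKER_HI                 # set the bit 14 marker
    return (hi << 8) | lo
```

Under these assumptions, a kernel with Default visibility and flag word `0x0003` becomes `0x4000`, while a Hidden kernel with `0x0013` becomes `0x0010` and never receives the marker.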

nvvm.annotations 10-kind catalog

Ten distinct kinds are encoded by the legacy !nvvm.annotations named metadata node and the parallel "nvvm.<kind>" function-attribute form. Bit 14 gates short-circuiting between the two via isKernelFunction.

| # | Legacy MDString | Attribute form | Format |
|---|---|---|---|
| 1 | kernel | nvvm.kernel | i32 1 becomes an empty attribute. |
| 2 | maxntid{x,y,z} | nvvm.maxntid | Dim3 tuple serialized as "X,Y,Z". |
| 3 | reqntid{x,y,z} | nvvm.reqntid | Dim3 tuple serialized as "X,Y,Z". |
| 4 | cluster_dim_{x,y,z} | nvvm.cluster_dim | Dim3 tuple serialized as "X,Y,Z". |
| 5 | minctasm | nvvm.minctasm | i32 serialized as a decimal string. |
| 6 | maxnreg | nvvm.maxnreg | i32 serialized as a decimal string. |
| 7 | cluster_max_blocks / maxclusterrank | nvvm.maxclusterrank | i32 stored as an integer-valued attribute. |
| 8 | nvvm.blocksareclusters | nvvm.blocksareclusters | i32 1 becomes an empty attribute. |
| 9 | grid_constant | nvvm.grid_constant | Variadic 1-based indices. |
| 10 | (none) | nvvm.annotations_transplanted | Empty attribute plus bit 14 marker. |

Kinds 2..4 are dim3 triples serialized as comma-joined strings. Kinds 5..6 are string-valued scalars; kind 7 is the lone integer-valued attribute, matching the CUDA cluster-attribute shape. Kind 8's legacy MDString already carries the "nvvm." prefix and passes through unchanged. Kind 9 (grid_constant) is the only kind the transplanter does not rewrite; it remains in the legacy node. Kind 10 has no legacy MDString and exists only as the dual-encoded marker.
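The serialization rules for kinds 1..8 can be sketched as a single mapping function. This is an illustration only: the legacy node is modeled as a flat dict of MDString keys to values, an empty attribute is modeled as `None`, and the choice to default a missing dim3 component to 1 is an assumption that the text above does not confirm. Kind 9 is deliberately left untouched, and kind 10 is omitted because it is written by the linkage pass rather than derived from a legacy key.

```python
# Attribute name -> legacy key prefix for the three dim3 kinds.
DIM3_KINDS = {
    "nvvm.maxntid": "maxntid",
    "nvvm.reqntid": "reqntid",
    "nvvm.cluster_dim": "cluster_dim_",
}
STRING_SCALARS = ("minctasm", "maxnreg")


def transplant(legacy: dict) -> dict:
    """Map legacy annotation keys to the attribute form (kinds 1..8)."""
    attrs = {}
    if legacy.get("kernel") == 1:
        attrs["nvvm.kernel"] = None            # kind 1: empty attribute
    for attr, prefix in DIM3_KINDS.items():    # kinds 2..4: "X,Y,Z"
        keys = [prefix + axis for axis in "xyz"]
        if any(k in legacy for k in keys):
            # ASSUMPTION: a missing component defaults to 1.
            attrs[attr] = ",".join(str(legacy.get(k, 1)) for k in keys)
    for kind in STRING_SCALARS:                # kinds 5..6: decimal string
        if kind in legacy:
            attrs["nvvm." + kind] = str(legacy[kind])
    for key in ("cluster_max_blocks", "maxclusterrank"):
        if key in legacy:                      # kind 7: stays an integer
            attrs["nvvm.maxclusterrank"] = int(legacy[key])
    if legacy.get("nvvm.blocksareclusters") == 1:
        attrs["nvvm.blocksareclusters"] = None  # kind 8: passes through
    return attrs                               # kind 9 is not rewritten
```

Modeling the empty attributes as `None` keeps them distinguishable from the string-valued and integer-valued forms, which matters because kind 7 is the only one that stays an integer.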

Reimplementation Notes

for function in module.functions:
    if is_kernel(function):
        function.flags.linkage = external_kernel_linkage(function)
        if function.visibility == default:
            function.flags.nvvm_annotations_transplanted = true
            function.attrs["nvvm.annotations_transplanted"] = unit
    else:
        function.flags.linkage = internal

Keep the bit and the string attribute synchronized. Consumers may use the bit as a fast path, but tools that inspect IR text should still see the explicit nvvm.annotations_transplanted attribute.