Device ELF Format

CUDA device code is packaged as ELF binaries with an NVIDIA-proprietary OS/ABI and a dedicated machine type. nvlink both consumes and produces these device ELFs. This page documents the complete device ELF format as understood from reverse engineering the elfw constructor (sub_4438F0) and the ELF serialization path (sub_45BF00), cross-referenced with the header validation functions and the main linking loop. The format is a strict superset of standard ELF64 (or ELF32 for legacy targets) -- every field follows the System V ELF specification, but NVIDIA overloads the e_ident, e_type, e_flags, and section type spaces with GPU-specific semantics.

Key Facts

| Property | Value |
| --- | --- |
| e_machine | 0xBE (190) -- EM_CUDA, registered with the ELF standards committee |
| OS/ABI (64-bit GPU) | 0x41 (65) at e_ident[EI_OSABI] -- device ABI for 64-bit CUDA targets |
| OS/ABI (32-bit GPU) | 0x33 (51) -- device ABI for 32-bit CUDA targets (legacy) |
| ELF class | ELFCLASS64 (2) for all modern targets; ELFCLASS32 (1) for 32-bit device code |
| e_type values | ET_REL (1) relocatable, ET_EXEC (2) executable, 0xFF00 Mercury relocatable |
| Constructor | sub_4438F0 (elfw_create) -- 14,821 bytes, allocates 672-byte elfw struct |
| Serializer | sub_45BF00 (write_elf_to_buffer) -- 13,258 bytes |
| File writer | sub_45C920 -- writes elfw to file descriptor |
| Memory writer | sub_45C950 -- writes elfw to arena buffer |
| Initial sections | .shstrtab, .strtab, .symtab, .symtab_shndx (always); .note.nv.tkinfo, .note.nv.cuinfo (device ELF only) |

ELF Identification (e_ident)

The 16-byte ELF identification array carries both standard ELF metadata and NVIDIA-specific ABI tags.

Byte Layout

| Offset | Field | Device ELF (64-bit) | Device ELF (32-bit) | Notes |
| --- | --- | --- | --- | --- |
| 0--3 | EI_MAG | 7F 45 4C 46 | 7F 45 4C 46 | Standard ELF magic |
| 4 | EI_CLASS | 02 (ELFCLASS64) | 01 (ELFCLASS32) | Set from a2 parameter: (a2 != 0) + 1 |
| 5 | EI_DATA | 01 (little-endian) | 01 (little-endian) | Hardcoded via *(_WORD *)(v17 + 5) = 257 (0x0101) |
| 6 | EI_VERSION | 01 (EV_CURRENT) | 01 (EV_CURRENT) | Upper byte of the 0x0101 word |
| 7 | EI_OSABI | 0x41 (65) | 0x33 (51) | Selects the e_flags encoding scheme |
| 8 | EI_ABIVERSION | a3 (ELF ABI version) | a3 | Passed through from caller |

The OS/ABI byte determines far more than ABI compatibility -- it selects which e_flags encoding is used and how the SM architecture is packed into the header. When a9 & 0x8000 is set in the constructor, the ELF is marked as a device ELF (OSABI 0x41); otherwise it is a 32-bit device ELF (OSABI 0x33).

OS/ABI Selection Logic

// sub_4438F0 -- elfw_create, offset into ELF header setup
// a9 is the merge_flags parameter
if (a9 & 0x8000) {
    // Device ELF -- 64-bit GPU ABI
    v17->e_ident[EI_OSABI] = 0x41;   // 65
} else {
    // Legacy / 32-bit GPU ABI
    v17->e_ident[EI_OSABI] = 0x33;   // 51
}
v17->e_ident[EI_ABIVERSION] = a3;
v17->e_machine = 190;                // EM_CUDA, always
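
The identification bytes and machine type described above are enough to recognize a device ELF from its first 20 bytes. The following is a minimal sketch, not code recovered from nvlink: the enum and the function name classify_cuda_elf are hypothetical, and the buffer is assumed to hold the start of a little-endian ELF header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification result; not an nvlink type. */
enum cuda_elf_kind {
    NOT_CUDA_ELF      = 0,
    CUDA_ELF_OSABI_33 = 1,   /* legacy / 32-bit GPU ABI */
    CUDA_ELF_OSABI_41 = 2    /* device ELF, 64-bit GPU ABI */
};

/* Classify a header buffer using only e_ident and e_machine.
 * e_machine sits at byte offset 18 in both Elf32 and Elf64 headers. */
enum cuda_elf_kind classify_cuda_elf(const uint8_t *hdr, size_t len)
{
    static const uint8_t magic[4] = { 0x7F, 'E', 'L', 'F' };

    if (hdr == NULL || len < 20)
        return NOT_CUDA_ELF;
    if (memcmp(hdr, magic, sizeof magic) != 0)
        return NOT_CUDA_ELF;
    if (hdr[4] != 1 && hdr[4] != 2)      /* EI_CLASS: ELFCLASS32 or ELFCLASS64 */
        return NOT_CUDA_ELF;
    if (hdr[5] != 1 || hdr[6] != 1)      /* EI_DATA = LSB, EI_VERSION = current */
        return NOT_CUDA_ELF;

    uint16_t e_machine = (uint16_t)(hdr[18] | (hdr[19] << 8));
    if (e_machine != 190)                /* EM_CUDA */
        return NOT_CUDA_ELF;

    if (hdr[7] == 0x41)                  /* EI_OSABI */
        return CUDA_ELF_OSABI_41;
    if (hdr[7] == 0x33)
        return CUDA_ELF_OSABI_33;
    return NOT_CUDA_ELF;
}
```

A caller would pass the first bytes of a cubin or device object and branch on the result before choosing an e_flags decoding scheme.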

ELF Type (e_type)

Device ELFs use three e_type values, each representing a different linking stage:

| e_type | Value | Meaning | When produced |
| --- | --- | --- | --- |
| ET_REL | 1 | Relocatable object | nvlink -r (relocatable link), or unlinked .o from ptxas |
| ET_EXEC | 2 | Executable cubin | Normal final link output (non-Mercury) |
| 0xFF00 | 65280 | Mercury relocatable | Pre-link Mercury object (sm >= 100) before finalization |

The e_type is stored at elfw+16, the standard e_type offset in both Elf32 and Elf64 headers. It is set directly from the first parameter (a1) of elfw_create:

// e_type assignment in elfw_create (sub_4438F0)
v114 = (__int16)a1;                    // truncate first parameter to 16 bits
*(WORD *)(v17 + 16) = v114;           // e_type = a1

The caller determines the value:

// In sub_1406B40 (link output path):
v39 = 65280;                           // 0xFF00 (Mercury)
if (!is_mercury)
    v39 = (is_relocatable == 0) + 1;   // 1 = ET_REL, 2 = ET_EXEC
sub_4438F0(v39, ...);

// In main_0x409800:
sub_4438F0((byte_2A5F1E8 == 0) + 1, ...);   // 1 or 2
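
The caller-side selection can be restated as a small pure function. This is a sketch with hypothetical names (select_e_type and the enum constants are not nvlink symbols); it only reproduces the three outcomes shown in the decompiled callers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for the three e_type values used by device ELFs. */
enum {
    DEVICE_ET_REL     = 1,       /* relocatable object */
    DEVICE_ET_EXEC    = 2,       /* executable cubin */
    DEVICE_ET_MERCURY = 0xFF00   /* Mercury relocatable */
};

/* Mercury takes priority; otherwise (is_relocatable == 0) + 1. */
uint16_t select_e_type(bool is_mercury, bool is_relocatable)
{
    if (is_mercury)
        return DEVICE_ET_MERCURY;
    return is_relocatable ? DEVICE_ET_REL : DEVICE_ET_EXEC;
}
```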

Important: The constructor also writes the value 1 or 4 to *(DWORD *)(v17 + 48), which decompiled output shows as DWORD index 12 (byte offset 48, the Elf64 e_flags field). These values are the link_state flags seeded into e_flags before the SM architecture is ORed in. They are not e_type writes, even though the index-based offsets in decompiled output are easy to confuse with the e_type write at byte offset 16.

For the legacy (OSABI 0x33) path, the e_type is also set from the first parameter.

ELF Flags (e_flags)

The e_flags field encodes the SM architecture and a link-state tag. The encoding differs between OSABI 0x41 and OSABI 0x33. Debug state and other ABI attributes are stored in the merge_flags field of the elfw struct (offset 76), not in e_flags itself.

OSABI 0x41 (64-bit GPU) -- e_flags Layout

| Bits | Width | Field | Description |
| --- | --- | --- | --- |
| [7:0] | 8 | link_state | Link state flags: 0x01 = relocatable, 0x04 = executable (SASS present). The SM architecture reader (sub_4402A0) ignores these bits entirely. |
| [23:8] | 16 | sm_major | SM major architecture number (e.g., 90 = 0x5A for Hopper). Extracted by the reader as (uint16_t)(e_flags >> 8). Current SM values (all < 256) only occupy bits [15:8]; bits [23:16] are zero but architecturally part of the field. |
| [31:24] | 8 | reserved | Zero in all observed outputs |

The SM minor version is not stored in e_flags for OSABI 0x41. It is stored internally in the elfw struct at offset 134 (*(WORD *)(elfw + 134) = a5).

The constructor packs the architecture as (sm_major << 8) | link_state:

// Device ELF (OSABI 0x41) flags encoding in sub_4438F0
// a4 = SM major version
// Step 1: set link_state at DWORD offset 12 (= byte offset 48 = e_flags)
if (relocatable)
    *(DWORD *)(v17 + 48) = 1;       // link_state = 0x01 (relocatable)
else
    *(DWORD *)(v17 + 48) = 4;       // link_state = 0x04 (executable)

// Step 2: OR in sm_major shifted left 8
v22 = *(DWORD *)(v17 + 48);         // read back 1 or 4
*(DWORD *)(v17 + 48) = (a4 << 8) | v22;   // e_flags = (sm_major << 8) | link_state

The SM architecture reader (sub_4402A0) confirms this encoding:

// sub_4402A0 -- get_sm_arch(elfw)
uint32_t flags = *(DWORD *)(elfw + 48);       // e_flags
if (*(BYTE *)(elfw + 7) == 0x41)              // OSABI == 0x41?
    return (uint16_t)(flags >> 8);             // bits [23:8] = sm_major
return (uint8_t)flags;                         // bits [7:0] for OSABI 0x33

The relocatable flag is checked by sub_443260:

// sub_443260 -- relocatable check
uint32_t mask = 0x80000000;          // OSABI 0x33 default
if (*(BYTE *)(elfw + 7) == 0x41)
    mask = 1;                         // OSABI 0x41: bit 0 = relocatable
if ((mask & *(DWORD *)(elfw + 48)) == 0) {
    // Not relocatable
}

OSABI 0x33 (32-bit GPU) -- e_flags Layout

| Bits | Width | Field | Description |
| --- | --- | --- | --- |
| [7:0] | 8 | sm_major | SM architecture number directly (e.g., 75 for Turing). Extracted by sub_4402A0 as (uint8_t)e_flags. |
| [15:8] | 8 | reserved | Zero / unused gap |
| [23:16] | 8 | sm_minor | SM minor version. Packed via (a5 << 16) in the constructor. |
| [30:24] | 7 | reserved | Zero |
| [31] | 1 | is_relocatable | Set to 1 (making the full DWORD 0x80000000) when relocatable |

The constructor packs all three fields in a single expression:

// 32-bit GPU (OSABI 0x33) flags encoding in sub_4438F0
// a4 = sm_major (v19), a5 = sm_minor, v22 = relocatable_bit
*(DWORD *)(v17 + 48) = v19 | (a5 << 16) | v22;
// v22 = 0x80000000 if relocatable, 0 otherwise
// Result: e_flags = sm_major[7:0] | sm_minor[23:16] | reloc[31]
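
Both encodings can be exercised side by side. The sketch below restates the pack, read, and relocatable-check logic as standalone functions. All function and macro names are hypothetical, and the 8-bit masking in the OSABI 0x33 packer is an assumption of this sketch (the decompiled expression does not mask its operands).

```c
#include <stdbool.h>
#include <stdint.h>

#define OSABI_CUDA_41 0x41u
#define OSABI_CUDA_33 0x33u

/* OSABI 0x41: (sm_major << 8) | link_state, where link_state is 1 or 4. */
uint32_t pack_e_flags_41(uint32_t sm_major, bool relocatable)
{
    uint32_t link_state = relocatable ? 0x01u : 0x04u;
    return (sm_major << 8) | link_state;
}

/* OSABI 0x33: sm_major | (sm_minor << 16) | bit 31 when relocatable.
 * The & 0xFF masks are an assumption added for safety in this sketch. */
uint32_t pack_e_flags_33(uint32_t sm_major, uint32_t sm_minor, bool relocatable)
{
    return (sm_major & 0xFFu) | ((sm_minor & 0xFFu) << 16)
         | (relocatable ? 0x80000000u : 0u);
}

/* Mirrors the reader logic shown for sub_4402A0. */
uint32_t sm_arch_from_e_flags(uint8_t osabi, uint32_t e_flags)
{
    if (osabi == OSABI_CUDA_41)
        return (uint16_t)(e_flags >> 8);
    return (uint8_t)e_flags;
}

/* Mirrors the mask selection shown for sub_443260. */
bool e_flags_is_relocatable(uint8_t osabi, uint32_t e_flags)
{
    uint32_t mask = (osabi == OSABI_CUDA_41) ? 1u : 0x80000000u;
    return (e_flags & mask) != 0;
}
```

Round-tripping a value through the packer and the reader for the same OSABI returns the original SM number, which is a quick way to sanity-check a header dumped from a cubin.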

Flag Bit Decomposition

The constructor extracts individual flag bits from the merge_flags parameter (a9) into boolean fields in the 672-byte elfw struct:

| Bit in a9 | elfw offset | Meaning |
| --- | --- | --- |
| 0x0001 (bit 0) | byte 84 | Debug info present |
| 0x0002 (bit 1) | byte 85 | Extended debug |
| 0x0004 (bit 2) | byte 87 | Preserve relocations |
| 0x0008 (bit 3) | byte 88 | Reserve null pointer |
| 0x0010 (bit 4) | byte 89 | Force RELA (or set if a10 relocatable) |
| 0x0020 (bit 5) | byte 90 | Optimize data layout |
| 0x0040 (bit 6) | byte 92 | Extra warnings |
| 0x0080 (bit 7) | byte 94 | Combined with sm_minor > 0x45 check |
| 0x0100 (bit 8) | byte 93 | Verbose keep |
| 0x0200 (bit 9) | byte 86 | Allow undefined globals |
| 0x0400 (bit 10) | -- | Allocate separate arena for ELF writer |
| 0x0800 (bit 11) | byte 96 | Suppress debug info |
| 0x1000 (bit 12) | byte 99 | Inverted: ((flags >> 12) ^ 1) & 1 (legacy mode) |
| 0x2000 (bit 13) | byte 100 | Stack protector |
| 0x4000 (bit 14) | byte 91 | Enable extended shared memory |
| 0x8000 (bit 15) | byte 101 | Device ELF (vs host/32-bit) |
| 0x70000 (bits 16--18) | dword at offset 68 | Link mode flags (stored as a9 & 0x70000) |
| 0x80000 (bit 19) | dword at offset 76 | Relocatable flag (forced on when a10 is set) |
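
The decomposition amounts to a series of mask tests. The sketch below decodes a handful of the bits into a plain struct. The struct and field names are hypothetical, only the bit positions come from the table, and the forcing of bits by the a10 parameter is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of selected merge_flags (a9) bits. */
struct merge_flag_view {
    bool     debug_info;       /* 0x0001, elfw byte 84 */
    bool     preserve_relocs;  /* 0x0004, elfw byte 87 */
    bool     force_rela;       /* 0x0010, elfw byte 89 */
    bool     separate_arena;   /* 0x0400, no struct byte */
    bool     bit12_inverted;   /* elfw byte 99 holds the inverse of bit 12 */
    bool     device_elf;       /* 0x8000, elfw byte 101 */
    uint32_t link_mode;        /* a9 & 0x70000, kept unshifted as at offset 68 */
    bool     relocatable;      /* 0x80000 */
};

/* Decode the raw a9 value only; a10 overrides are not modeled here. */
struct merge_flag_view decode_merge_flags(uint32_t a9)
{
    struct merge_flag_view v;
    v.debug_info      = (a9 & 0x0001u) != 0;
    v.preserve_relocs = (a9 & 0x0004u) != 0;
    v.force_rela      = (a9 & 0x0010u) != 0;
    v.separate_arena  = (a9 & 0x0400u) != 0;
    v.bit12_inverted  = (((a9 >> 12) ^ 1u) & 1u) != 0;
    v.device_elf      = (a9 & 0x8000u) != 0;
    v.link_mode       = a9 & 0x70000u;
    v.relocatable     = (a9 & 0x80000u) != 0;
    return v;
}
```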

The elfw Struct (672 bytes)

sub_4438F0 allocates a 672-byte structure via arena_alloc that serves as the complete in-memory representation of a device ELF being constructed. This is the central data structure for nvlink's output ELF.

Layout (reconstructed from sub_4438F0)

| Offset | Size | Field | Description |
| --- | --- | --- | --- |
| 0 | 4 | elf_magic | 0x464C457F -- the \x7fELF magic, stored in native byte order |
| 4 | 1 | ei_class | (is_64bit != 0) + 1 -- 1 for Elf32, 2 for Elf64 |
| 5--6 | 2 | ei_data_version | 0x0101 -- ELFDATA2LSB + EV_CURRENT |
| 7 | 1 | ei_osabi | 0x41 (device) or 0x33 (32-bit) |
| 8 | 1 | ei_abiversion | a3 parameter |
| 16 | 2 | e_type | ELF type (1, 2, or 0xFF00) |
| 18 | 2 | e_machine | 190 (0xBE) always |
| 20 | 4 | e_version | API version (a7), or 1 for device ELF |
| 48 | 4 | e_flags | OSABI 0x41: (sm_major << 8) \| link_state; OSABI 0x33: sm_major \| (sm_minor << 16) \| reloc_bit |
| 62 | 2 | shstrtab_idx | Section index of .shstrtab |
| 64 | 1 | verbose_flags | a8 parameter (verbose output level) |
| 68 | 4 | link_mode | a9 & 0x70000 -- link mode control bits |
| 72 | 4 | sm_arch | a4 parameter (SM major version) |
| 76 | 4 | merge_flags | Full a9 parameter (or a9 \| 0x80000 if relocatable) |
| 80 | 1 | debug_flag | a6 parameter |
| 83 | 1 | has_shstrtab | 1 if section[42] (shstrndx stored) != 0 |
| 84--100 | 17 | flag_booleans | Individual boolean flags extracted from merge_flags |
| 101 | 1 | is_device_elf | (a9 & 0x8000) != 0 |
| 108 | 32 | tkinfo_buffer | Initialized via sub_43E490(offset, 1000) |
| 140 | 32 | cuinfo_buffer | Initialized via sub_43E490(offset, 2000) |
| 192 | 8 | string_buffer | Program header name buffer (set in sub_443730) |
| 200--210 | varies | phdr_offsets | Program header string offsets |
| 288 | 8 | section_hash_pos | Hash table for section lookup by name (positive indices) |
| 296 | 8 | section_hash_neg | Hash table for section lookup (negative indices) |
| 304 | 4 | shstrtab_count | Count of section name string table entries |
| 312 | 4 | strtab_count | Count of symbol string table entries |
| 328 | 8 | strtab_entries | Pointer to string table entry array |
| 336 | 8 | shstrtab_entries | Pointer to shstrtab entry array |
| 344 | 8 | pos_sections | Sorted-array of sections (positive index) |
| 352 | 8 | neg_sections | Sorted-array of sections (negative index) |
| 360 | 8 | section_data | Array of section data records |
| 368 | 8 | section_order | Array mapping virtual indices to physical |
| 376--408 | varies | sym_tables | Symbol table management structures |
| 480 | 8 | file_list | Linked list of input file names |
| 488 | 8 | arch_state | Pointer to architecture-specific state (from sub_45AC50/sub_459640) |
| 496 | 8 | entry_hash | Hash table for entry-point symbols |
| 512 | 8 | input_files | Input file tracking for verbose output |
| 520--576 | varies | sorted_arrays | Six sorted-arrays for section/symbol management |
| 608 | 8 | arena_ptr | Owning memory arena (if a9 & 0x400) |
| 616 | 8 | arena_handle | Arena handle from sub_45CAE0 |
| 624 | 4 | option_flags | Option parser result from sub_42F8B0() |
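
The offsets 16, 18, 20, 48, and 62 in this table coincide with the standard Elf64 header layout, with the elfw-private fields starting at offset 64. A minimal sketch that checks this at compile time follows. It assumes a C11 compiler with natural alignment, and the struct name elf64_header_view is hypothetical (it only mirrors the System V Elf64_Ehdr field order, not any recovered nvlink type).

```c
#include <stddef.h>
#include <stdint.h>

/* Field order of the System V Elf64 header; the name is illustrative. */
struct elf64_header_view {
    uint8_t  e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

/* These match the e_type, e_machine, e_version, e_flags, and
 * shstrtab_idx rows of the elfw layout table. */
_Static_assert(offsetof(struct elf64_header_view, e_type) == 16, "e_type");
_Static_assert(offsetof(struct elf64_header_view, e_machine) == 18, "e_machine");
_Static_assert(offsetof(struct elf64_header_view, e_version) == 20, "e_version");
_Static_assert(offsetof(struct elf64_header_view, e_flags) == 48, "e_flags");
_Static_assert(offsetof(struct elf64_header_view, e_shstrndx) == 62, "e_shstrndx");
_Static_assert(sizeof(struct elf64_header_view) == 64, "header size");
```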

Constructor Flow: sub_4438F0 (elfw_create)

The constructor takes 10 parameters and initializes the complete output ELF structure:

elfw_create(
    a1: type_or_arena,    // elfw type code (e.g., 1 = relocatable)
    a2: is_64bit,         // 0 = 32-bit, nonzero = 64-bit
    a3: abi_version,      // value for e_ident[EI_ABIVERSION]
    a4: sm_major,         // SM architecture major (e.g., 90 for Hopper)
    a5: sm_minor,         // SM minor version / variant letter
    a6: debug_flag,       // debug info generation flag
    a7: api_version,      // CUDA API version or e_version value
    a8: verbose_flags,    // verbose output control
    a9: merge_flags,      // bitmask controlling all link behavior
    a10: is_relocatable   // explicit relocatable flag
)

Initialization Sequence

  1. Arena creation (if a9 & 0x400): Allocates a separate "elfw memory space" arena with 4096-byte page size via sub_432020.

  2. Struct allocation: Allocates 672 bytes from the arena, zeroes the entire buffer.

  3. ELF header setup: Writes magic bytes, class, data encoding, OSABI, ABI version, machine type.

  4. Flag decomposition: Extracts individual boolean flags from a9 into the flag bytes at offsets 84--100.

  5. Architecture state: Calls sub_45AC50 (for relocatable) or sub_459640 (for executable) to initialize architecture-specific state. Fatal error "couldn't initialize arch state" if this fails.

  6. Hash tables: Creates two 512-bucket hash tables for section name lookup (sub_4489C0).

  7. Sorted arrays: Creates six 16-element sorted arrays and three 64-element sorted arrays for section and symbol management.

  8. Input file record: Creates a 16-byte <input> record with the SM minor version.

  9. Core sections:

    • .shstrtab -- section header string table (SHT_STRTAB = 3, alignment 1)
    • .strtab -- symbol string table (SHT_STRTAB = 3, alignment 1)
    • .symtab -- symbol table (SHT_SYMTAB = 2, linked to .strtab). Entry size is 24 bytes for Elf64 or 16 bytes for Elf32; alignment is 8 or 4 respectively.
    • .symtab_shndx -- extended section indices (SHT_SYMTAB_SHNDX = 18, 4-byte entries)

  10. Device-only sections (when is_device_elf):

    • .note.nv.tkinfo -- tool kit info note (SHT_NOTE = 7, alignment 0x2000000)
    • .note.nv.cuinfo -- CUDA info note (SHT_NOTE = 7, alignment 0x1000000)

  11. UFT section (when e_type != ET_REL):

    • .nv.uft.entry -- unified function table entry points (section type 0x70000011 = 1879048209, 32-byte entries with 32-byte alignment)

  12. Section name registry: Populates the section name hash from a static table at off_1D3A9C0 containing known NVIDIA section name strings.

  13. Entry hash table: Creates an 8-element hash table for kernel entry point tracking.

  14. Option state: Calls sub_42F8B0() to capture the current option parser state.

  15. Finalization: Calls sub_4504B0(elfw, 0) to complete initialization.

Section Type Encoding

Device ELF sections use both standard ELF section types and NVIDIA vendor types in the SHT_LOPROC--SHT_HIPROC range (0x70000000--0x7FFFFFFF).

| Type Value | Name | Description |
| --- | --- | --- |
| 0x00000001 | SHT_PROGBITS | Code (.text.*), initialized data |
| 0x00000002 | SHT_SYMTAB | Symbol table (.symtab) |
| 0x00000003 | SHT_STRTAB | String tables (.shstrtab, .strtab) |
| 0x00000007 | SHT_NOTE | Note sections (.note.nv.tkinfo, .note.nv.cuinfo) |
| 0x00000008 | SHT_NOBITS | Uninitialized data (.nv.shared.*, .bss) |
| 0x00000012 | SHT_SYMTAB_SHNDX | Extended section indices (.symtab_shndx) |
| 0x70000007 | CUDA_INFO | .nv.info.* -- per-kernel metadata |
| 0x70000009 | CUDA_RELOCINFO | .nv.resolvedrela -- resolved relocations |
| 0x7000000A | CUDA_GLOBAL_INIT | .nv.global.init -- global initialized data |
| 0x70000011 | CUDA_UFT_ENTRY | .nv.uft.entry -- unified function table entries |
| 0x70000015 | CUDA_COMPAT | .nv.compat -- compatibility metadata |

Four of these NVIDIA vendor types are recognized by the validation function (sub_43DD30) through the bitmask 0x400D, applied to (type - 0x70000007):

// Vendor section type recognition in elf_validate
uint32_t offset = sh_type - 0x70000007;   // rebase so CUDA_INFO (0x70000007) maps to bit 0
if (offset <= 14) {
    if ((0x400D >> offset) & 1) {
        // Skip size validation for this section (may be NOBITS-like)
    }
}

The bitmask 0x400D = 0b0100_0000_0000_1101 selects offsets 0 (0x70000007), 2 (0x70000009), 3 (0x7000000A), and 14 (0x70000015). CUDA_UFT_ENTRY (0x70000011, offset 10) is not selected, because bit 10 of the mask is clear.
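
The mask test is easy to verify in isolation. The sketch below wraps it in a predicate; the function name vendor_type_in_mask is hypothetical, and the surrounding validation logic of sub_43DD30 is not reproduced.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when sh_type is one of the vendor types selected by mask 0x400D.
 * Types below 0x70000007 wrap to a large unsigned offset and are rejected
 * by the range check. */
bool vendor_type_in_mask(uint32_t sh_type)
{
    uint32_t offset = sh_type - 0x70000007u;
    return offset <= 14u && ((0x400Du >> offset) & 1u) != 0;
}
```

For example, vendor_type_in_mask(0x70000015) holds for .nv.compat, while 0x70000011 (.nv.uft.entry) and every standard section type fall through to the normal size validation.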

ELF Serialization

The output ELF is serialized by sub_45BF00 (write_elf_to_buffer), which writes the complete binary image in standard ELF order.

Write Order

  1. ELF header (64 bytes for Elf64, 52 bytes for Elf32) -- written from the first 64/52 bytes of the elfw struct, which contain the standard ELF header fields.

  2. Padding byte -- a single NUL byte after the header. Always written, verified with size check.

  3. Section header string table (.shstrtab) -- all registered section names as NUL-terminated strings, concatenated. Written by iterating the shstrtab entry array.

  4. Symbol string table (.strtab) -- all registered symbol names, same format as shstrtab.

  5. Padding to the .symtab section offset -- NUL bytes to reach the declared sh_offset of section index 3.

  6. Symbol table (.symtab) -- symbol entries in Elf64_Sym (24 bytes each) or Elf32_Sym (16 bytes) format.

  7. Section data -- remaining sections in index order, with padding between sections to satisfy alignment requirements. Each section's sh_offset is validated against the current write position; a mismatch triggers the "Negative size encountered" error.

  8. Program headers -- program header table, placed after all section data.

  9. Section headers -- section header table at the file's end.
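
The offset validation in step 7 can be pictured as a gap computation between the current write position and the section's declared sh_offset. The sketch below assumes that the "Negative size encountered" error corresponds to a declared offset that lies behind the write position. That interpretation, the function name, and the signature are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute how many NUL padding bytes are needed to advance write_pos to
 * sh_offset. Returns false when the gap would be negative, which this
 * sketch treats as the serializer's error condition. */
bool padding_to_offset(uint64_t write_pos, uint64_t sh_offset, uint64_t *pad_out)
{
    if (sh_offset < write_pos)
        return false;
    *pad_out = sh_offset - write_pos;
    return true;
}
```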

Program Header Count Selection

The serializer computes the number of program headers based on the presence of LOAD segments:

// Program header count selection in write_elf_to_buffer
if (has_text_segment && has_data_segment)
    phnum = 4;          // PT_LOAD(text) + PT_LOAD(data) + 2 more
else if (has_text_segment)
    phnum = 3;          // PT_LOAD(text) + 2 more
else if (has_data_segment)
    phnum = 2;          // PT_LOAD(data) + 1 more
else
    phnum = 2;          // minimal: PHDR + NOTE or similar

Two Write Paths

| Function | Address | Destination | Mechanism |
| --- | --- | --- | --- |
| sub_45C920 | 0x45C920 | File | sub_45B950 opens output file, sub_45BF00 serializes, sub_45B6A0 closes |
| sub_45C950 | 0x45C950 | Memory buffer | sub_45BA30 allocates arena buffer, sub_45BF00 serializes, sub_45B6A0 finalizes |

Both paths call the same sub_45BF00 serializer with an abstract write interface, differing only in the setup (file open vs buffer alloc) and teardown.

Architecture Encoding Examples

Here is how several SM architectures are encoded in e_flags under OSABI 0x41. The formula is e_flags = (sm_major << 8) | link_state where link_state is 0x01 (relocatable) or 0x04 (executable):

Relocatable objects (link_state = 0x01):

| Architecture | sm_major | e_flags (hex) | Breakdown |
| --- | --- | --- | --- |
| sm_50 (Maxwell) | 50 (0x32) | 0x00003201 | (50 << 8) \| 1 |
| sm_70 (Volta) | 70 (0x46) | 0x00004601 | (70 << 8) \| 1 |
| sm_75 (Turing) | 75 (0x4B) | 0x00004B01 | (75 << 8) \| 1 |
| sm_80 (Ampere) | 80 (0x50) | 0x00005001 | (80 << 8) \| 1 |
| sm_86 (Ampere) | 86 (0x56) | 0x00005601 | (86 << 8) \| 1 |
| sm_89 (Ada) | 89 (0x59) | 0x00005901 | (89 << 8) \| 1 |
| sm_90 (Hopper) | 90 (0x5A) | 0x00005A01 | (90 << 8) \| 1 |
| sm_100 (Blackwell) | 100 (0x64) | 0x00006401 | (100 << 8) \| 1 |
| sm_103 (Blackwell Ultra) | 103 (0x67) | 0x00006701 | (103 << 8) \| 1 |
| sm_120 (RTX 50xx) | 120 (0x78) | 0x00007801 | (120 << 8) \| 1 |
| sm_121 (DGX Spark) | 121 (0x79) | 0x00007901 | (121 << 8) \| 1 |

Executables (link_state = 0x04):

| Architecture | sm_major | e_flags (hex) | Breakdown |
| --- | --- | --- | --- |
| sm_75 (Turing) | 75 (0x4B) | 0x00004B04 | (75 << 8) \| 4 |
| sm_80 (Ampere) | 80 (0x50) | 0x00005004 | (80 << 8) \| 4 |
| sm_89 (Ada) | 89 (0x59) | 0x00005904 | (89 << 8) \| 4 |
| sm_90 (Hopper) | 90 (0x5A) | 0x00005A04 | (90 << 8) \| 4 |
| sm_100 (Blackwell) | 100 (0x64) | 0x00006404 | (100 << 8) \| 4 |
| sm_120 (RTX 50xx) | 120 (0x78) | 0x00007804 | (120 << 8) \| 4 |

OSABI 0x33 Examples

| Architecture | sm_major | sm_minor | Relocatable | e_flags (hex) |
| --- | --- | --- | --- | --- |
| sm_50 (Maxwell) | 50 | 0 | no | 0x00000032 |
| sm_75 (Turing) | 75 | 0 | yes | 0x8000004B |

SM Minor / Variant Handling

The SM minor variant (e.g., the a in sm_90a) is not stored in e_flags for OSABI 0x41 device ELFs. It is tracked internally in the elfw struct at offset 134 (*(WORD *)(elfw + 134) = a5). The e_ident[EI_ABIVERSION] field carries an ABI protocol version number (typically 7 or 8), not the SM minor letter.

Relocatable vs Executable Flag Logic

The constructor has a two-source relocatable detection:

// Relocatable detection in elfw_create (sub_4438F0)
bool is_reloc = a10;                        // explicit flag from caller
if (!is_reloc)
    is_reloc = (a9 & 0x180000) != 0;       // bits 19 or 20 in merge_flags

if (is_reloc) {
    // For OSABI 0x41: seeds e_flags with link_state = 1 (relocatable)
    *(DWORD *)(v17 + 48) = 1;              // e_flags initial = 0x01
    merge_flags |= 0x80000;                // ensure bit 19 is set
    force_rela = true;                      // always use RELA for relocatable output
} else {
    // For OSABI 0x41: seeds e_flags with link_state = 4 (executable)
    *(DWORD *)(v17 + 48) = 4;              // e_flags initial = 0x04
}
// e_type is set separately from the first parameter (a1), NOT here.

When bit 19 (0x80000) is set in merge_flags, the ELF is relocatable. The constructor also forces force_rela = true for all relocatable outputs, ensuring .rela.* sections (with explicit addends) rather than .rel.* sections are used. This simplifies the relocation engine since addends do not need to be read from section data.
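
The two-source detection and its side effects can be collected into one pure function. This is a sketch with hypothetical names (reloc_decision and decide_relocatable are not nvlink symbols). It assumes force_rela starts from bit 4 of a9, as listed in the flag decomposition table, and is then forced on for relocatable output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the values the constructor derives here. */
struct reloc_decision {
    bool     is_reloc;      /* a10, or bit 19 or 20 of a9 */
    bool     force_rela;    /* bit 4 of a9, or forced on when relocatable */
    uint32_t merge_flags;   /* a9 with bit 19 set when relocatable */
    uint32_t link_state;    /* OSABI 0x41 e_flags seed: 1 or 4 */
};

struct reloc_decision decide_relocatable(uint32_t a9, bool a10)
{
    struct reloc_decision d;
    d.is_reloc    = a10 || (a9 & 0x180000u) != 0;
    d.merge_flags = d.is_reloc ? (a9 | 0x80000u) : a9;
    d.link_state  = d.is_reloc ? 1u : 4u;
    d.force_rela  = d.is_reloc || (a9 & 0x0010u) != 0;
    return d;
}
```

Note that bit 20 alone is enough to mark the output relocatable, and that in this case bit 19 is still forced on in the stored merge_flags.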

Device vs Host ELF Distinction

The a9 & 0x8000 test (bit 15) is the master switch between device ELF output (OSABI 0x41) and the legacy 32-bit GPU path (OSABI 0x33):

| Flag | OSABI | Section Setup | Architecture State |
| --- | --- | --- | --- |
| a9 & 0x8000 set | 0x41 | .note.nv.tkinfo, .note.nv.cuinfo, .nv.uft.entry | sub_45AC50 or sub_459640 |
| a9 & 0x8000 clear | 0x33 | Standard sections only (no NVIDIA notes) | None (32-bit GPU path) |

Device ELFs receive the NVIDIA-specific note sections for tool kit information and CUDA kernel metadata. The architecture state initializer is called differently depending on whether the output is relocatable (sub_45AC50) or executable (sub_459640) -- both return a pointer stored at elfw+488 that provides architecture-specific encoding tables, relocation handlers, and instruction format metadata.

Cross-References

| Topic | Page |
| --- | --- |
| Input ELF accessor functions | ELF Parsing |
| Cubin validation and loading | Cubin Loading |
| Section merge operations | Section Merging |
| NVIDIA vendor section types | NVIDIA Section Types |
| .nv.info metadata format | .nv.info Metadata |
| Constant bank sections | Constant Banks |
| Unified function tables | Unified Function Tables |
| Program header layout | Program Headers |
| ELF output serialization | ELF Serialization |
| Mercury ELF extensions | Mercury ELF Sections |

Design Notes

  1. No libelf dependency. nvlink constructs ELF files by writing raw bytes at computed offsets. The 672-byte elfw struct is a custom abstraction that tracks all the state needed to produce a valid ELF, including section ordering, string table construction, and symbol management. There is no libelf, libbfd, or LLVM Object library involved.

  2. Arena-owned memory. When a9 & 0x400 is set, the constructor creates a dedicated "elfw memory space" arena. All allocations for this ELF (sections, symbols, strings) come from this arena, enabling bulk deallocation by destroying the arena rather than tracking individual allocations.

  3. Dual encoding schemes. The OSABI 0x41 vs 0x33 split creates two parallel code paths throughout the constructor. OSABI 0x41 puts the SM major in bits [23:8] of e_flags (extracted as (uint16_t)(e_flags >> 8) by sub_4402A0) with a link-state tag in bits [7:0], and uses device-specific note sections. OSABI 0x33 puts the SM major directly in bits [7:0] and the SM minor in bits [23:16], with a simpler flag layout. Modern CUDA targets always use OSABI 0x41.

  4. Mercury detection via e_type. Mercury objects are identified by e_type = 0xFF00 (ET_LOPROC), set in the caller (sub_1406B40) when the Mercury flag at a1 + 505 is active. The link-mode bits from merge_flags (a9 & 0x70000) are stored separately in the elfw struct at offset 68, not in e_flags. Mercury objects require post-link binary rewriting by the FNLZR (Finalizer).

  5. Eager section creation. The constructor pre-creates .shstrtab, .strtab, .symtab, and .symtab_shndx regardless of whether they will contain data. This simplifies the merge phase, which can unconditionally reference these sections by their fixed internal indices. The .symtab_shndx section exists to support more than 65,279 sections (the ELF limit before extended section numbering is required).

  6. Section name preloading. The constructor iterates a static table of known NVIDIA section names and registers them in a hash table at elfw+496. This enables O(1) lookup of section names like .nv.global, .nv.constant0, .nv.shared., etc. during the merge phase, rather than linear scanning.

ptxas Wiki Cross-References

The device ELF format described here is the same format that ptxas generates as output. The ptxas-side ELF construction, which uses a parallel 672-byte ELFW struct, is covered in the ptxas wiki.

Confidence Assessment

| Claim | Confidence | Evidence |
| --- | --- | --- |
| e_machine = 0xBE (190, EM_CUDA) | DEFINITIVE | Verified in sub_4438F0: *((_WORD *)v17 + 9) = 190 at offset +18 |
| ELF magic = 0x464C457F | DEFINITIVE | Verified in sub_4438F0: *(_DWORD *)v17 = 1179403647 = 0x464C457F |
| OSABI 0x41 for device ELF (a9 & 0x8000) | DEFINITIVE | Verified: *((_BYTE *)v17 + 7) = 65 (0x41) when a9 & 0x8000 |
| OSABI 0x33 for 32-bit GPU path | DEFINITIVE | Verified: *((_BYTE *)v17 + 7) = 51 (0x33) in else branch |
| e_flags [7:0] = link_state (NOT sm_minor) | DEFINITIVE | Constructor seeds 1 or 4 into e_flags; reader sub_4402A0 extracts SM via >> 8; sub_443260 checks bit 0 for relocatable |
| e_flags [23:8] = sm_major (16-bit field) | DEFINITIVE | sub_4402A0 returns (uint16_t)(e_flags >> 8) for OSABI 0x41 |
| OSABI 0x33: e_flags = sm_major \| (sm_minor << 16) \| reloc_bit | DEFINITIVE | sub_4438F0 line 224: v19 \| (a5 << 16) \| v22 |
| e_type = 0xFF00 for Mercury | DEFINITIVE | sub_1406B40 line 202: v39 = 65280 when *(BYTE *)(a1 + 505) set |
| e_type set from first parameter (a1) | DEFINITIVE | sub_4438F0 line 151: *(WORD *)(v17 + 16) = v114 where v114 = (int16)a1 |
| 672-byte elfw struct allocation | DEFINITIVE | Verified in sub_4438F0: sub_4307C0(v14, 672) |
| EI_CLASS = (a2 != 0) + 1 | DEFINITIVE | Verified: *((_BYTE *)v17 + 4) = (a2 != 0) + 1 |
| EI_DATA + EI_VERSION = 0x0101 | DEFINITIVE | Verified: *(_WORD *)((char *)v17 + 5) = 257 (0x0101) |
| EI_ABIVERSION = a3 (protocol version, not sm_minor) | HIGH | Main passes 7 or 8; sub_1406B40 passes 0, 2, 7, or 8 |
| "elfw memory space" arena string | HIGH | String at 0x1D39FA3 confirmed in nvlink_strings.json |
| "couldn't initialize arch state" error | HIGH | String at 0x1D39FE8 confirmed in nvlink_strings.json |
| Flag decomposition bits 0-19 | HIGH | Verified bit-by-bit in sub_4438F0 |
| Relocatable detection: a10 \|\| (a9 & 0x180000) | HIGH | Verified in sub_4438F0 |
| link_state = 0x04 means executable | HIGH | sub_4438F0 line 163; no contradicting reader found |
| Section type bitmask 0x400D validation | MEDIUM | Verified in sub_43DD30, bitmask matches but inner logic complex |
| tkinfo alignment 0x2000000, cuinfo 0x1000000 | HIGH | Verified: sub_441AC0(... ".note.nv.tkinfo", 7, 0x2000000, ...) |
| Architecture hex examples (sm_75 etc) | HIGH | Derived from verified (a4 << 8) \| link_state formula |
| .symtab_shndx (SHT_SYMTAB_SHNDX = 18) | HIGH | String ".symtab_shndx" exists in constructor xrefs |
| SM minor not in e_flags for OSABI 0x41 | HIGH | sm_minor (a5) only stored at elfw offset 134 in device path; not ORed into e_flags |