Virtual Override Execution Space Matrix
When a derived class overrides a base class virtual function in CUDA, the execution spaces of both functions must be compatible. A __device__ virtual cannot be overridden by a __host__ function, a __host__ virtual cannot be overridden by a __device__ function, and so on. cudafe++ enforces these rules inside record_virtual_function_override (sub_432280, 437 lines, class_decl.c), which runs each time the EDG front-end registers a virtual override during class body scanning. The function performs three tasks: (1) propagate the base class's execution space obligations onto the derived function, (2) detect illegal mismatches and emit one of six dedicated error messages (3542–3547), and (3) fall through to standard EDG override recording (covariant returns, [[nodiscard]], override/final, requires-clause checks).
This page documents the override checking logic at reimplementation-grade depth: reconstructed pseudocode from the decompiled binary, a complete compatibility matrix, the six error messages with their diagnostic tags, and the relaxed-mode flag that softens certain checks.
Key Facts
| Property | Value |
|---|---|
| Binary function | sub_432280 (record_virtual_function_override, 437 lines) |
| Source file | class_decl.c |
| Parameters | a1=derivation_info, a2=overriding_sym, a3=overridden_sym, a4=base_class_info, a5=covariant_return_adjustment |
| Entity field read | byte +182 (execution space bitfield) on both overridden and overriding entities |
| Classification mask | byte & 0x30 — two-bit extraction: 0x00=implicit host, 0x10=explicit host, 0x20=device, 0x30=HD |
| Propagation bits | 0x10 (host_explicit), 0x20 (device_annotation) |
| Attribute lookup | sub_5CEE70 with kind 87 (__device__) and 86 (__host__) |
| Error emission | sub_4F4F10 with severity 8 (hard error) |
| Relaxed mode flag | dword_106BFF0 (relaxed_attribute_mode) |
| Implicitly-HD test | byte +177 & 0x10 on entity — constexpr / __forceinline__ bypass |
| Override-involved mark | byte +176 |= 0x02 on overriding entity |
| Assertion guard | nv_is_device_only_routine from nv_transforms.h:367 |
Why Virtual Functions Need Execution Space Checks
Standard C++ imposes no concept of execution space on virtual functions. CUDA introduces three execution spaces (__host__, __device__, __host__ __device__) and one launch-only space (__global__). When a virtual function in a base class is declared with one execution space, every override in every derived class must be callable in the same space. If the base declares a __device__ virtual, calling it through a base pointer on the GPU must dispatch to the derived override — which is only possible if the override is also __device__ (or __host__ __device__).
__global__ functions cannot be virtual at all (error 3505/3506 prevents this at the attribute application stage), so the override matrix only covers three spaces: __host__, __device__, and __host__ __device__. An unannotated function counts as implicit __host__.
Function Entry: Mark and Resolve Entities
The function begins by resolving the actual entity nodes from the symbol table entries:
// sub_432280 entry (lines 60-69 of decompiled output)
//
// a2 = overriding_sym (symbol table entry for the derived-class function)
// a3 = overridden_sym (symbol table entry for the base-class function)
//
// v10 = entity of overridden function: *(overridden_sym + 88)
// v11 = entity of overriding function: *(*(overriding_sym) + 88)
//
// The entity node at offset +88 is the "associated routine entity" --
// the actual function representation containing execution space bits.
int64_t overridden_entity = *(int64_t*)(overridden_sym + 88); // v10
int64_t overriding_entity = *(int64_t*)(*(int64_t*)overriding_sym + 88); // v11
// Mark the overriding entity as "involved in an override"
*(uint8_t*)(overriding_entity + 176) |= 0x02;
The +176 |= 0x02 flag marks the derived function as "override-involved." This flag is consumed downstream by the exception specification resolver and other class completion logic.
Phase 1: Implicitly-HD Fast Path and Execution Space Propagation
The first branch tests byte +177 & 0x10 on the overriding entity. This bit indicates the function is implicitly __host__ __device__ — set for constexpr functions (implicitly HD since CUDA 7.5) and __forceinline__ functions. When this bit is set, the override is exempt from mismatch checking, but execution space propagation still occurs.
// Phase 1: implicitly-HD check and propagation (lines 70-94)
void check_and_propagate(int64_t overriding_entity, int64_t overridden_entity) {
if (overriding_entity->byte_177 & 0x10) {
// Overriding function is implicitly HD (constexpr / __forceinline__)
//
// Skip mismatch errors entirely -- an implicitly-HD function is
// compatible with any base execution space. But we must still
// propagate the base's space obligations onto the derived entity
// so that downstream passes (IL marking, code generation) know
// what to emit.
if (!(overridden_entity->byte_177 & 0x10)) {
// Overridden function is NOT implicitly HD -- it has an explicit
// execution space. We need to propagate that space.
//
// Guard: skip propagation for constexpr lambdas with internal
// linkage but no override flag (a degenerate case).
if ((overridden_entity->qword_184 & 0x800001000000) == 0x800000000000
&& !(overridden_entity->byte_176 & 0x02)) {
// Degenerate case -- skip propagation
goto done_nvidia_checks;
}
uint8_t base_es = overridden_entity->byte_182;
// Propagate __host__ obligation:
// If the base is NOT device-only (i.e., base is host, HD, or
// unannotated), the derived function inherits the host obligation.
if ((base_es & 0x30) != 0x20) {
overriding_entity->byte_182 |= 0x10; // set host_explicit
}
// Propagate __device__ obligation:
// If the base has the device_annotation bit set, the derived
// function inherits the device obligation.
if (base_es & 0x20) {
overriding_entity->byte_182 |= 0x20; // set device_annotation
}
}
goto done_nvidia_checks;
}
// ... Phase 2 continues below
}
Why Propagation Matters
Propagation ensures that a derived class inherits its base class's execution space obligations even when the derived function is implicitly HD. Consider:
struct Base {
__device__ virtual void f(); // byte_182 & 0x30 == 0x20
};
struct Derived : Base {
constexpr void f() override; // byte_177 & 0x10 set (implicitly HD)
};
Without propagation, Derived::f would have byte_182 == 0x00 (no explicit annotation). The device-side IL pass would skip it, and a virtual call base_ptr->f() on the GPU would dispatch to a function never compiled for the device. Propagation sets byte_182 |= 0x20 (device_annotation), ensuring the function is included in device IL.
The propagation follows strict rules:
Base byte_182 & 0x30 | Propagated to overriding entity |
|---|---|
0x00 (implicit host) | |= 0x10 (host_explicit) |
0x10 (explicit host) | |= 0x10 (host_explicit) |
0x20 (device) | |= 0x20 (device_annotation) |
0x30 (HD) | |= 0x10 then |= 0x20 (both) |
Phase 2: Explicit Annotation Mismatch Detection
When the overriding function is NOT implicitly HD (byte_177 & 0x10 == 0), the checker must verify that the derived function's explicit execution space matches the base. It does this by querying the attribute lists on the overriding symbol for __device__ (kind 87) and __host__ (kind 86) attributes using sub_5CEE70.
The overriding symbol has two attribute list pointers: offset +184 (primary attributes) and offset +200 (secondary/redeclaration attributes). Both are checked for each attribute kind.
Reconstructed Pseudocode
// Phase 2: explicit annotation mismatch detection (lines 96-188)
//
// At this point, overriding_entity->byte_177 & 0x10 == 0 (not implicitly HD).
// We must determine what execution space annotations the overriding function
// has, and compare against the overridden function's execution space.
void check_override_mismatch(
int64_t overriding_sym, // a2
int64_t overriding_entity, // v11
int64_t overridden_entity, // v10
int64_t overridden_sym_list, // v6 = a2+48 (location info for diagnostics)
int64_t overridden_sym_arg, // v8 = a3 (for diagnostics)
int64_t base_sym // v9 = *a2 (for diagnostics)
) {
// -- Assertion: overridden entity must exist --
if (!overridden_entity) {
internal_error("nv_transforms.h", 367, "nv_is_device_only_routine");
}
// -- Extract overridden execution space --
uint8_t base_es = overridden_entity->byte_182;
uint8_t mask_30 = base_es & 0x30; // 0x00/0x10/0x20/0x30
bool base_no_device_annotation = (base_es & 0x20) == 0; // v56
bool base_is_hd = (mask_30 == 0x30); // v58
uint8_t base_device_bit = base_es & 0x20; // v55
// -- Check overriding function for __device__ attribute (kind 87) --
bool has_device_attr = find_attribute(87, overriding_sym->attr_list_184)
|| find_attribute(87, overriding_sym->attr_list_200);
if (has_device_attr) {
// Overriding function has __device__.
// Now check if it also has __host__ (kind 86) -- making it HD.
bool has_host_attr = find_attribute(86, overriding_sym->attr_list_184)
|| find_attribute(86, overriding_sym->attr_list_200);
if (has_host_attr) {
// --- Overriding is __host__ __device__ ---
if (base_device_bit) {
// Base has device_annotation (bit 5 set).
// If base is device-only (mask_30 == 0x20), error 3544.
if (mask_30 == 0x20) {
emit_error(8, 3544, location, overridden, base);
}
// If base is HD (mask_30 == 0x30), it's legal -- no error.
// If base has device_bit but mask_30 != 0x20 and != 0x30,
// that can't happen (bit 5 set implies mask_30 is 0x20 or 0x30).
} else {
// Base has no device_annotation -- base is host or implicit host.
emit_error(8, 3543, location, overridden, base);
}
} else {
// --- Overriding is __device__ only ---
// Fall through to LABEL_83 logic.
goto device_only_check;
}
} else {
// Overriding function has NO __device__ attribute.
// It's either explicit __host__ or implicit host (no annotation).
if (dword_106BFF0) {
// Relaxed mode: check if overriding has explicit __host__.
bool has_host_attr = find_attribute(86, overriding_sym->attr_list_184)
|| find_attribute(86, overriding_sym->attr_list_200);
if (!has_host_attr) {
// No explicit __host__ either -- implicit host.
// In relaxed mode, an implicit-host override is treated like
// a device-only override for certain base configurations.
// Jump into the device-only path with modified conditions.
goto device_only_check_relaxed;
}
// Explicit __host__ in relaxed mode: fall through to normal checks.
}
// --- Overriding is __host__ (explicit or implicit) ---
if (mask_30 == 0x20) {
// Base is __device__ only
emit_error(8, 3545, location, overridden, base);
} else if (mask_30 == 0x30) {
// Base is __host__ __device__
emit_error(8, 3546, location, overridden, base);
}
// else: base is host/implicit-host, same space -- no error.
goto done_nvidia_checks;
}
device_only_check:
// Overriding is __device__ only (has __device__ but no __host__).
// v39 = base_no_device_annotation (v56), v40 = 1 (always set entering here).
{
bool should_error = base_no_device_annotation; // v39
bool relaxed_extra = true; // v40
device_only_check_relaxed:
// (relaxed mode entry: v39 = 0, a1 = v56 = base_no_device_annotation)
if (dword_106BFF0) {
// Relaxed mode: the error fires unconditionally when
// base has no device annotation (base is host/implicit-host).
// In strict mode, same condition applies.
should_error = base_no_device_annotation;
relaxed_extra = true; // always true in relaxed
}
if (should_error) {
// Base is host-only (no device_annotation) and override is device-only.
emit_error(8, 3542, location, overridden, base);
} else if (base_is_hd && relaxed_extra) {
// Base is HD, override is device-only.
// v40 (relaxed_extra) is always 1 from Entry A, so this
// fires in both strict and relaxed modes for D-overrides-HD.
emit_error(8, 3547, location, overridden, base);
}
// else: base is device-only too -- compatible, no error.
}
done_nvidia_checks:
// Continue to standard EDG override recording...
}
Decision Tree (Simplified)
overriding byte_177 & 0x10?
YES (implicitly HD) --> propagate, skip mismatch check
NO --> extract base_es = overridden byte_182
has __device__ attr on overriding?
YES --> also has __host__ attr?
YES (override=HD):
base has device_annotation?
YES and mask_30==0x20 --> ERROR 3544
NO --> ERROR 3543
NO (override=D-only):
base has NO device_annotation? --> ERROR 3542
base is HD? --> ERROR 3547
NO (override=H or implicit-H):
base mask_30==0x20 --> ERROR 3545
base mask_30==0x30 --> ERROR 3546
otherwise --> legal (same space)
The Six Error Messages
Each mismatch produces one of six errors. All are emitted at severity 8 (hard error) and are individually suppressible by their diagnostic tag via --diag_suppress or #pragma nv_diag_suppress.
| Internal | Display | Diagnostic Tag | Message Template |
|---|---|---|---|
| 3542 | 20085 | vfunc_incompat_exec_h_d | execution space mismatch: overridden entity (%n1) is a __host__ function, but overriding entity (%n2) is a __device__ function |
| 3543 | 20086 | vfunc_incompat_exec_h_hd | execution space mismatch: overridden entity (%n1) is a __host__ function, but overriding entity (%n2) is a __host__ __device__ function |
| 3544 | 20087 | vfunc_incompat_exec_d_hd | execution space mismatch: overridden entity (%n1) is a __device__ function, but overriding entity (%n2) is a __host__ __device__ function |
| 3545 | 20088 | vfunc_incompat_exec_d_h | execution space mismatch: overridden entity (%n1) is a __device__ function, but overriding entity (%n2) is a __host__ function |
| 3546 | 20089 | vfunc_incompat_exec_hd_h | execution space mismatch: overridden entity (%n1) is a __host__ __device__ function, but overriding entity (%n2) is a __host__ function |
| 3547 | 20090 | vfunc_incompat_exec_hd_d | execution space mismatch: overridden entity (%n1) is a __host__ __device__ function, but overriding entity (%n2) is a __device__ function |
The display number is computed as internal + 16543 (the standard CUDA error renumbering from construct_text_message). The tag naming convention is vfunc_incompat_exec_{overridden}_{overriding}.
The %n1 and %n2 fill-ins resolve to the entity display names of the base and derived functions respectively, including their full qualified names and parameter types.
Suppression Example
# Suppress by tag (preferred)
nvcc --diag_suppress=vfunc_incompat_exec_h_d file.cu
# Suppress by display number
nvcc --diag_suppress=20085 file.cu
# Suppress in source
#pragma nv_diag_suppress vfunc_incompat_exec_h_d
Complete Compatibility Matrix
This table shows every combination of base (overridden) and derived (overriding) execution space. "Implicit H" means the function has no execution space annotation (byte_182 & 0x30 == 0x00). Since implicit host and explicit __host__ are treated identically for override purposes (both lack the device_annotation bit and have mask_30 != 0x20), they share the same row/column behavior.
__global__ is excluded because __global__ functions cannot be virtual — the attribute handler rejects __global__ on virtual functions before override checking ever runs.
The matrix is the same in both strict mode (dword_106BFF0 == 0) and relaxed mode (dword_106BFF0 == 1). The relaxed flag changes the code path used to reach the error decision but produces the same result for all input combinations.
| Derived: H / implicit H | Derived: D | Derived: HD | Derived: implicitly HD | |
|---|---|---|---|---|
| Base: H / implicit H | legal | error 3542 | error 3543 | legal + propagate |= 0x10 |
| Base: D | error 3545 | legal | error 3544 | legal + propagate |= 0x20 |
| Base: HD | error 3546 | error 3547 | legal | legal + propagate |= 0x10, |= 0x20 |
Reading the matrix: each row is the base class virtual function's space; each column is the derived class override's space. "Legal" means no error is emitted and the override is recorded normally. "Legal + propagate" means the override is accepted AND the base's execution space bits are OR'd into the derived entity's byte_182.
The diagonal (same space in base and derived) is always legal. The last column (implicitly HD) is always legal because an implicitly HD function is compatible with every execution space — the mismatch check is skipped entirely and only propagation runs.
Why Both Modes Produce the Same Matrix
Tracing the LABEL_83 code path with the two entry points reveals that dword_106BFF0 does NOT gate error 3547. In the critical device-only-override path (Entry A), v40 is set to 1 before reaching LABEL_83 regardless of the relaxed flag. The flag only changes the assignment to a1 and v40 via conditional moves (cmovz/cmovnz in the disassembly), but the net effect is identical for all input combinations:
LABEL_83 internals (decompiled, annotated):
a2 = 3542; // tentative error
if (!dword_106BFF0) a1 = v39; // strict: a1 = v39
if (dword_106BFF0) v40 = 1; // relaxed: force v40 = 1
// BUT v40 was already 1 from Entry A (line 134)
if (a1) emit_error(3542); // base has no device_annotation
else if (v58 && v40) emit_error(3547); // base is HD
else skip; // base is D-only (compatible)
Entry A sets v39 = v56, v40 = 1, a1 = v56. In strict mode, a1 is overwritten to v39 (same value). In relaxed mode, a1 stays v56 (same value). Either way, a1 = v56 = (base has no device annotation). The v40 = 1 from Entry A is preserved. The result is identical.
The relaxed flag introduces a second entry point (Entry B) for overriding functions with no explicit annotation. In relaxed mode, such functions are routed through LABEL_83 with v39 = 0 and a1 = v56, producing the same device-only check logic. In strict mode, the same functions take the direct H/implicit-H path and produce errors 3545/3546 for device/HD bases. Both paths reach the same conclusions.
Relaxed Mode: The Unannotated Override Path
When dword_106BFF0 == 1 and the overriding function has no __device__ attribute, the checker takes an additional step before falling through to the H/implicit-H path. It queries the overriding symbol for explicit __host__ (kind 86). If __host__ IS found, the function is confirmed as explicit host and errors 3545/3546 apply normally. If __host__ is NOT found (truly unannotated), the function is reclassified through the device-only check path (LABEL_83). This reclassification does not change the error outcome — an unannotated function overriding a host base still sees no error (both are host-space), and an unannotated function overriding a device or HD base still produces the appropriate error.
Propagation Details
When the overriding function is implicitly HD (byte_177 & 0x10), execution space is propagated from the base to the derived entity by OR-ing bits into byte_182:
// Propagation (direct from decompiled sub_432280, lines 77-91)
uint8_t base_es = overridden_entity->byte_182;
// If base is NOT device-only, derived inherits host obligation
if ((base_es & 0x30) != 0x20) {
overriding_entity->byte_182 |= 0x10; // host_explicit bit
base_es = overridden_entity->byte_182; // re-read (compiler artifact)
}
// If base has device_annotation, derived inherits device obligation
if (base_es & 0x20) {
overriding_entity->byte_182 |= 0x20; // device_annotation bit
}
The re-read of overridden_entity->byte_182 after setting 0x10 on the overriding entity is a compiler artifact (the decompiler shows it reading back from v10+182 into v22, but v10 is the overridden entity, so the value hasn't changed). The OR operations are on the overriding entity only.
Propagation Matrix
Base space (byte_182 & 0x30) | Bits OR'd into overriding byte_182 | Net effect on overriding entity |
|---|---|---|
0x00 (implicit H) | |= 0x10 | Becomes explicit host (0x10) |
0x10 (explicit H) | |= 0x10 | Becomes explicit host (0x10) |
0x20 (D only) | |= 0x20 | Becomes device-annotated (0x20) |
0x30 (HD) | |= 0x10, then |= 0x20 | Becomes HD (0x30) |
After propagation, the overriding entity's byte_182 accurately reflects the execution space obligations inherited from its base class. Downstream passes (device/host separation, IL marking, code generation) use this byte to determine whether the function needs device-side compilation, host-side compilation, or both.
Relaxed Mode (dword_106BFF0)
The global flag dword_106BFF0 (relaxed_attribute_mode, default 1 per CLI defaults) controls permissive handling of execution space annotations across the compiler. Its primary effects are on attribute application (allowing __device__ + __global__ coexistence) and cross-space call validation. For virtual override checking, its effect is narrower:
-
Unannotated override reclassification. In relaxed mode, when the overriding function has neither
__device__nor__host__attributes explicitly, the checker additionally queries the overriding symbol for__host__(kind 86). If__host__is NOT found, the checker treats the unannotated function as potentially device-compatible and routes through the device-only check path (LABEL_83). This can produce error 3542 (D overrides H) for an implicit-host function, which would otherwise only see errors 3545/3546. -
No error suppression for overrides. Unlike attribute application where relaxed mode suppresses error 3481, relaxed mode does NOT suppress any of the six override errors. All six fire at severity 8 in both modes. The flag
dword_106BFF0modulates the code path taken to reach the error decision, not the severity or suppression of the error itself.
Additional Override Checks (Non-CUDA)
After the CUDA execution space checks, sub_432280 continues with standard EDG override validation:
| Error | Condition | Meaning |
|---|---|---|
| 1788 | Base has [[nodiscard]], derived does not | Missing [[nodiscard]] on override |
| 1789 | Derived has [[nodiscard]], base does not | Extraneous [[nodiscard]] on override |
| 1850 | Overriding a final virtual function | Override of final function |
| 2935 | Derived has requires-clause, base does not | Requires-clause mismatch |
| 2936 | Base has requires-clause, derived does not | Requires-clause mismatch |
These are standard C++ checks unrelated to CUDA execution spaces.
Example: Override Interactions
// Example 1: Legal same-space override
struct Base {
__device__ virtual void f();
};
struct Derived : Base {
__device__ void f() override; // Legal: D overrides D
};
// Example 2: Error 3542 -- D overrides H
struct Base2 {
virtual void f(); // Implicit __host__
};
struct Derived2 : Base2 {
__device__ void f() override; // ERROR 3542 (20085)
};
// error #20085-D: execution space mismatch: overridden entity (Base2::f)
// is a __host__ function, but overriding entity (Derived2::f)
// is a __device__ function
// Example 3: Error 3546 -- H overrides HD
struct Base3 {
__host__ __device__ virtual void f();
};
struct Derived3 : Base3 {
void f() override; // ERROR 3546 (20089)
};
// error #20089-D: execution space mismatch: overridden entity (Base3::f)
// is a __host__ __device__ function, but overriding entity (Derived3::f)
// is a __host__ function
// Example 4: Legal constexpr override with propagation
struct Base4 {
__device__ virtual int g();
};
struct Derived4 : Base4 {
constexpr int g() override; // Legal: implicitly HD, propagates |= 0x20
};
// Derived4::g now has byte_182 |= 0x20 (device_annotation)
// and is included in device IL compilation.
// Example 5: Error 3547 -- D overrides HD
struct Base5 {
__host__ __device__ virtual void h();
};
struct Derived5 : Base5 {
__device__ void h() override; // ERROR 3547 (20090)
};
Function Map
| Address | Identity | Lines | Source |
|---|---|---|---|
sub_432280 | record_virtual_function_override | 437 | class_decl.c |
sub_5CEE70 | find_attribute (attribute list lookup by kind) | ~30 | attribute.c |
sub_4F4F10 | emit_diag_with_entity_pair (severity, error, loc, base, derived) | ~100 | error.c |
sub_4F2930 | internal_error (assertion failure) | ~20 | error.c |
sub_41A6E0 | dump_override_entry (debug trace helper) | ~40 | class_decl.c |
sub_41D010 | add_to_override_list | ~20 | class_decl.c |
sub_5E20D0 | allocate_override_entry (40-byte node) | ~15 | mem.c |
sub_432130 | resolve_indeterminate_exception_specification | ~60 | class_decl.c |
Override Entry Structure
Each recorded override is stored as a 40-byte linked list node:
Override entry (40 bytes):
+0x00 (0): next pointer
+0x08 (8): base_class_symbol (entity in base class vtable)
+0x10 (16): derived_class_entity (overriding function entity)
+0x18 (24): flags (0 initially, set during processing)
+0x20 (32): covariant_return_adjustment (pointer or NULL)
The override list is managed via:
qword_E7FE98: list head (most recent entry)qword_E7FEA0: free list head (recycled 40-byte entries)qword_E7FE90: allocation counter
When debug tracing is enabled (dword_126EFCC > 3), the function prints "newly created: ", "existing entry: ", "after modification: ", and "removing: " to stderr via fwrite, followed by calls to sub_41A6E0 to dump the entry contents.
Cross-References
- Execution Spaces — bitfield layout at entity
+182, attribute application handlers, conflict matrix - Cross-Space Call Validation — call-graph enforcement, the implicitly-HD bypass
- CUDA Error Catalog — error numbering scheme, diagnostic tag suppression system
- Global Variables —
dword_106BFF0and other flags - Entity Node Layout — full byte map of the entity structure including
+176,+177,+182 - __global__ Function Constraints — why
__global__functions cannot be virtual