Subprocess Harness

Abstract

One POSIX subprocess harness drives every external tool tileiras invokes. The same launcher serves ptxas, nvdisasm, and any helper tool — argument construction is tool-specific but process behavior is shared: argv/envp setup, stdio redirection, optional timeout, child status decoding, stderr capture, and resource-usage collection.

The harness itself is CUDA-agnostic. CUDA-specific arguments like --input-as-string, --knobs-file=<path>, --nv-host=<path>, and temporary cubin paths are assembled by the driver before it calls the generic launcher.

POSIX subprocess launcher

The launcher accepts an executable path, argv vector, envp vector, redirection descriptors for stdin/stdout/stderr, optional timeout information, and output buffers for diagnostics and resource usage.
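
The inputs listed above can be pictured as a single spec record. The layout below is a hypothetical sketch: only the setsid and has_resource_limits names appear in the launcher pseudocode on this page; every other field name, the cpu_limit member, and both helper functions are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical layout; field names are illustrative, not recovered ones. */
typedef struct {
    const char   *path;                /* executable path */
    char *const  *argv;                /* NULL-terminated argument vector */
    char *const  *envp;                /* NULL-terminated environment */
    int           stdin_fd;            /* fd to install as 0, or -1 to inherit */
    int           stdout_fd;           /* fd to install as 1, or -1 to inherit */
    int           stderr_fd;           /* fd to install as 2, or -1 to inherit */
    unsigned      timeout_s;           /* 0 disables the timeout */
    bool          setsid;              /* start the child in a new session */
    bool          has_resource_limits; /* cpu_limit below is meaningful */
    struct rlimit cpu_limit;           /* assumed example of a limit */
} ProcessSpec;

/* The fork + exec path is only needed for setsid or resource limits. */
static bool needs_fork_path(const ProcessSpec *spec) {
    return spec->setsid || spec->has_resource_limits;
}

/* stdout and stderr share a sink when both name the same descriptor. */
static bool merges_stderr(const ProcessSpec *spec) {
    return spec->stderr_fd >= 0 && spec->stderr_fd == spec->stdout_fd;
}
```

A zero-initialized spec with all three descriptors set to -1 takes the posix_spawn path; setting either setsid or has_resource_limits moves it to fork + exec.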

| Path | Used when | Behavior |
| --- | --- | --- |
| posix_spawn | No session or resource-limit hook is needed. | Fast path with file actions and retry on interrupted spawn attempts. |
| fork + exec | setsid or process resource limits are requested. | Child applies redirection, limits, optional setsid, then executes the target. |
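
A minimal sketch of the fast path, assuming the retry simply loops while posix_spawn reports EINTR. The names spawn_fast and wait_exit_code are hypothetical, and only stdout redirection is shown. Note that posix_spawn returns the error number directly instead of setting errno.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Spawn `path` with `argv`. If stdout_fd >= 0 it becomes the child's stdout.
 * Returns 0 on success or an errno value. */
static int spawn_fast(const char *path, char *const argv[], int stdout_fd,
                      pid_t *pid_out) {
    posix_spawn_file_actions_t fa;
    int rc = posix_spawn_file_actions_init(&fa);
    if (rc != 0)
        return rc;
    if (stdout_fd >= 0) {
        rc = posix_spawn_file_actions_adddup2(&fa, stdout_fd, STDOUT_FILENO);
        if (rc != 0) {
            posix_spawn_file_actions_destroy(&fa);
            return rc;
        }
    }
    /* Retry only when the spawn attempt itself was interrupted. */
    do {
        rc = posix_spawn(pid_out, path, &fa, NULL, argv, environ);
    } while (rc == EINTR);
    posix_spawn_file_actions_destroy(&fa);
    return rc;
}

/* Reap the child and return its exit code, or -1 if it did not exit normally. */
static int wait_exit_code(pid_t pid) {
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Spawning /bin/sh with the arguments sh, -c, "exit 7" and then calling wait_exit_code on the returned pid yields 7.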

The fork path uses shell-compatible exit codes for exec failure — 127 means the program was not found, 126 means the program existed but could not be executed. The wait decoder maps those codes back into user-facing diagnostics.

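int launch_child(ProcessSpec *spec, ChildProcess *child, string *error) {
    // Fast path: no new session and no resource limits requested.
    if (!spec->setsid && !spec->has_resource_limits)
        return spawn_with_posix_spawn(spec, child, error);

    // Slow path: fork so the child can apply limits and setsid before exec.
    pid_t pid = fork();
    if (pid < 0)
        return system_error(error, "Couldn't fork");
    if (pid == 0)
        exec_child_or_exit(spec);  // never returns; exits 126 or 127 if exec fails

    child->pid = pid;
    return 0;
}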
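
The child side of the fork path can be sketched as follows. This is a simplified, hypothetical version: child_exec_or_exit and run_via_fork are illustrative names, redirection and resource limits are omitted, and mapping ENOENT to 127 with every other exec error going to 126 is an assumption modeled on the shell convention described above.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the child after fork(); never returns. */
static void child_exec_or_exit(char *const argv[], int want_setsid) {
    if (want_setsid)
        (void)setsid();               /* detach into a new session */
    execvp(argv[0], argv);
    /* Only reached when exec failed: report it the way a shell would. */
    _exit(errno == ENOENT ? 127 : 126);
}

/* Fork, exec, and wait. Returns the exit code, or -1 on error or signal. */
static int run_via_fork(char *const argv[], int want_setsid) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        child_exec_or_exit(argv, want_setsid);

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit rather than exit so that it does not flush stdio buffers or run atexit handlers inherited from the parent.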

stderr-merge optimization

When stderr and stdout target the same sink, the launcher redirects stderr with dup2(stdout, stderr) instead of opening the same destination twice. This is the common ptxas shape — both streams fold into one in-memory diagnostic buffer so the parent can report assembler failure with full context.
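
One way to express the merge with posix_spawn file actions is sketched below. It assumes the shared sink is a pipe drained by the parent; capture_merged is a hypothetical name, and the fixed-size buffer is a simplification. File actions run in the order they were added, so the second dup2 copies the already redirected stdout.

```c
#include <errno.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run argv[0] (looked up on PATH) with stdout and stderr sharing one pipe.
 * Returns the number of bytes captured into buf, or -1 on failure. */
static ssize_t capture_merged(char *const argv[], char *buf, size_t cap) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
    /* The merge: stderr becomes a duplicate of the redirected stdout. */
    posix_spawn_file_actions_adddup2(&fa, STDOUT_FILENO, STDERR_FILENO);
    posix_spawn_file_actions_addclose(&fa, fds[0]);
    posix_spawn_file_actions_addclose(&fa, fds[1]);

    pid_t pid;
    int rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(fds[1]);                    /* parent keeps only the read end */
    if (rc != 0) {
        close(fds[0]);
        return -1;
    }

    size_t len = 0;
    while (len < cap) {
        ssize_t n = read(fds[0], buf + len, cap - len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;                    /* EOF or read error */
        len += (size_t)n;
    }
    close(fds[0]);

    int status;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;
    return (ssize_t)len;
}
```

Because both streams share one open file description, writes from the child appear in the buffer in the order they were issued.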

SIGALRM wait4 timeout

Timeout handling rides on wait4 and SIGALRM. With a timeout enabled, the parent installs a temporary alarm handler, arms alarm(seconds), and waits on the child. If wait4 is interrupted by the alarm, the parent sends SIGKILL, disarms the alarm, restores the old signal handler, and reaps the child.

int wait_child(pid_t pid, unsigned timeout_s, ProcessResult *result) {
    install_alarm_handler();
    if (timeout_s != 0)
        alarm(timeout_s);

    // wait4 reaps the child and fills result->usage in one call.
    int status = 0;
    int rc = wait4(pid, &status, 0, &result->usage);
    if (rc < 0 && errno == EINTR && timeout_s != 0) {
        // The alarm interrupted wait4: kill the child, then reap it.
        kill(pid, SIGKILL);
        alarm(0);
        restore_alarm_handler();
        waitpid(pid, &status, 0);
        return child_timed_out(result);
    }

    // Normal completion: disarm the alarm and decode the exit status.
    alarm(0);
    restore_alarm_handler();
    return decode_child_status(status, result);
}
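
A self-contained variant of the same technique is sketched below. It is hypothetical, not the recovered code: wait_with_timeout, on_alarm, and the alarm_fired flag are illustrative names. Two details are made explicit: the handler is installed with sigaction and without SA_RESTART, since otherwise wait4 would be restarted instead of returning EINTR, and a flag separates the alarm from unrelated signals. Unlike the pseudocode above, this version also reaps with wait4 after the kill so that resource usage is still collected.

```c
#define _DEFAULT_SOURCE 1 /* exposes wait4 on glibc */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig) {
    (void)sig;
    alarm_fired = 1;                  /* the handler only records the event */
}

/* Returns 1 if the child was killed after the timeout, 0 if it finished on
 * its own, -1 on error. timeout_s == 0 waits without a limit. */
static int wait_with_timeout(pid_t pid, unsigned timeout_s, int *status,
                             struct rusage *usage) {
    struct sigaction sa, old;
    if (timeout_s != 0) {
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;              /* no SA_RESTART: wait4 must see EINTR */
        alarm_fired = 0;
        if (sigaction(SIGALRM, &sa, &old) != 0)
            return -1;
        alarm(timeout_s);
    }

    int killed = 0;
    pid_t rc;
    for (;;) {
        rc = wait4(pid, status, 0, usage);
        if (rc >= 0 || errno != EINTR)
            break;
        if (alarm_fired && !killed) { /* timeout: kill, then loop to reap */
            kill(pid, SIGKILL);
            killed = 1;
        }
    }

    if (timeout_s != 0) {
        alarm(0);                     /* disarm any pending alarm */
        sigaction(SIGALRM, &old, NULL);
    }
    return rc < 0 ? -1 : killed;
}
```

As with any alarm-based scheme, a signal delivered between alarm() and the wait4 call is not observed by wait4; with timeouts measured in whole seconds this window is small but not zero.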

Status decoding follows normal POSIX rules. A signaled child reports the signal name and whether a core file was produced. Exit code 126 means the program could not be executed; exit code 127 means command-not-found. Other codes are returned directly to the caller.

ptxas launcher

The ptxas adapter assembles a fixed argv shape around the serialized PTX text:

| Argv token | Origin | Purpose |
| --- | --- | --- |
| -arch sm_NN | --gpu-name | Selects sm_100, sm_103, sm_110, sm_120, or sm_121. |
| --opt-level N | --opt-level | Forwards the driver optimization level, default 3. |
| --input-as-string <PTX> | PTX serializer | Passes PTX text by argv instead of through a temporary PTX file. |
| --knobs-file=<path> | MLIR_ENABLE_EVO and PTX_KNOBS_PATH | Forwards a ptxas internal knob file when both env vars are set. |
| --nv-host=<path> | host-code path | Points ptxas at companion host-side code when present. |

The normal ptxas call takes the posix_spawn fast path and merges stdout and stderr into one capture buffer. The assembled cubin comes back through the child's stdout, not via a temporary output file.
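
The argv shape in the table can be sketched as a small builder. Several details here are assumptions rather than recovered facts: the PtxasArgv type and function names, the token order, the choice to pass -arch and --opt-level as two argv elements each, and treating "set" as "present in the environment with any value".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PTXAS_MAX_ARGS 16

typedef struct {
    const char *argv[PTXAS_MAX_ARGS]; /* NULL-terminated */
    int         argc;
    char        opt[16];              /* backing storage for formatted tokens */
    char        knobs[512];
    char        host[512];
} PtxasArgv;

static void ptxas_push(PtxasArgv *a, const char *token) {
    if (a->argc < PTXAS_MAX_ARGS - 1)
        a->argv[a->argc++] = token;
    a->argv[a->argc] = NULL;
}

/* host_path may be NULL when there is no companion host-side code. */
static void build_ptxas_argv(PtxasArgv *a, const char *arch, int opt_level,
                             const char *ptx, const char *host_path) {
    a->argc = 0;
    ptxas_push(a, "ptxas");
    ptxas_push(a, "-arch");
    ptxas_push(a, arch);
    snprintf(a->opt, sizeof a->opt, "%d", opt_level);
    ptxas_push(a, "--opt-level");
    ptxas_push(a, a->opt);
    ptxas_push(a, "--input-as-string");
    ptxas_push(a, ptx);               /* PTX text travels as one argv element */

    /* The knobs file is forwarded only when both variables are present. */
    const char *evo = getenv("MLIR_ENABLE_EVO");
    const char *knobs = getenv("PTX_KNOBS_PATH");
    if (evo != NULL && knobs != NULL) {
        snprintf(a->knobs, sizeof a->knobs, "--knobs-file=%s", knobs);
        ptxas_push(a, a->knobs);
    }
    if (host_path != NULL) {
        snprintf(a->host, sizeof a->host, "--nv-host=%s", host_path);
        ptxas_push(a, a->host);
    }
}
```

The argv entries point into the struct's own buffers, so a PtxasArgv should be passed by pointer and not copied after it has been built.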

nvdisasm driver

The SASS dump adapter takes a command string, splits it into argv words, and expects the first word to resolve through PATH. The documented default is:

nvdisasm -c
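
The word-splitting step could look like the sketch below. It assumes plain whitespace tokenization with no quoting or escaping, which may differ from the real splitter. The Argv layout and argv_free are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *storage;  /* private copy of the command; words point into it */
    char **words;    /* NULL-terminated, with one spare slot for a path */
    size_t len;
} Argv;

static Argv split_command(const char *command) {
    Argv out = {NULL, NULL, 0};
    size_t n = strlen(command);
    char *copy = malloc(n + 1);
    /* At most (n + 1) / 2 words, plus a spare slot and the NULL terminator. */
    char **words = calloc(n / 2 + 3, sizeof *words);
    if (copy == NULL || words == NULL) {
        free(copy);
        free(words);
        return out;
    }
    memcpy(copy, command, n + 1);

    char *p = copy;
    size_t len = 0;
    for (;;) {
        while (isspace((unsigned char)*p))
            *p++ = '\0';              /* terminate the previous word */
        if (*p == '\0')
            break;
        words[len++] = p;             /* start of the next word */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
    }
    out.storage = copy;
    out.words = words;
    out.len = len;
    return out;
}

static void argv_free(Argv *a) {
    free(a->storage);
    free(a->words);
    a->storage = NULL;
    a->words = NULL;
    a->len = 0;
}
```

Splitting the default command gives two words, nvdisasm and -c, and an empty or all-whitespace command gives zero words, which is the case the adapter rejects.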

The adapter writes the ptxas-produced cubin to a temporary file, appends that file path as the final argv element, launches the command, captures stdout, and removes the temporary cubin if the driver created it. When the caller provided an existing cubin path, lifetime management stays with the caller.

int dump_sass(const DumpOptions *opts, ByteSpan cubin, string *out) {
    Argv argv = split_command(opts->dump_sass_command);
    if (argv.len == 0)
        return error("Please provide a valid dumpSassCommand. For example, `nvdisasm -c`.");

    TempFile temp = write_temp_cubin(opts->input_base_name, cubin);
    argv_push(&argv, temp.path);

    int rc = run_child(argv, CAPTURE_STDOUT_AND_STDERR, out);
    remove_temp_file(&temp);
    return rc;
}
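
The temporary-file lifetime rule can be sketched with mkstemp. This is a hypothetical illustration: the /tmp location, the file-name pattern, the TempFile layout, and the owned flag are all assumptions. The flag models the rule that only driver-created files are removed.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    char path[256];
    int  owned;      /* 1 when created here, so removed here */
} TempFile;

/* Create a uniquely named file and write the cubin bytes into it. */
static int write_temp_cubin(const char *base_name, const void *data,
                            size_t size, TempFile *out) {
    out->owned = 0;
    int n = snprintf(out->path, sizeof out->path, "/tmp/%s-XXXXXX", base_name);
    if (n < 0 || (size_t)n >= sizeof out->path)
        return -1;
    int fd = mkstemp(out->path);      /* replaces XXXXXX, opens exclusively */
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = size;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(out->path);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }
    if (close(fd) != 0) {
        unlink(out->path);
        return -1;
    }
    out->owned = 1;
    return 0;
}

/* Remove the file only if this code created it. */
static void remove_temp_file(TempFile *t) {
    if (t->owned) {
        unlink(t->path);
        t->owned = 0;
    }
}
```

For a caller-provided cubin, the path would be copied into the struct with owned left at 0, which makes remove_temp_file a no-op.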

Dump-command failures surface as compile-path failures because the driver treats SASS dumping as part of the requested output action.

nvdisasm command-line construction

Once tileiras finishes its MLIR pipeline and PTX emission, the compile dispatcher sub_57A8E0 shells out to two external programs: ptxas (PTX to SASS) and nvdisasm (SASS to disassembled text, embedded as an ELF section). Both invocations route through the subprocess helpers sub_44A36C0 and sub_44A3430, with the raw-ostream sink sub_6CF9C0 draining the child's stdout back into the parent.

Every nvdisasm invocation starts with the literal 5-byte flag block "-uumn" stored at rodata 0x57BB97. The literal packs four flags into one argv token: -u (unsigned offsets), -u (literal second occurrence, triggering the canonical re-entry behaviour known from nvdisasm pre-13.1), -m (mnemonic-only emission), and -n (no header). All five bytes are pushed as one argv element rather than four separate flags.

With config.sanitize == 1 — the only currently-defined sanitize mode, memcheck — the helper appends a 41-byte tail starting with a leading space: -sanitize=memcheck -g-tmem-access-check. The trailing flag -g-tmem-access-check instruments tensor-memory (TMEM) loads and stores, a Blackwell-and-newer concern consistent with the sm_100..sm_121 target table. The full sanitize-on argv therefore consists of the 5-byte "-uumn" token followed by the 41-byte tail token.

The dispatcher emits "nvdisasm -c" as the command-line prefix before the flag block, taking the program name from rodata 0x57B6BB and the -c flag from rodata 0x57B6C8. Program path resolution leaves nvdisasm to PATH; the -c flag asks nvdisasm to emit a section-friendly format suitable for embedding inside the final ELF relocatable.

| Rodata address | Literal contents | Role |
| --- | --- | --- |
| 0x57B6BB | "nvdisasm" | Program name, resolved through PATH. |
| 0x57B6C8 | "-c" | Section-friendly output format. |
| 0x57BB97 | "-uumn" | Combined flag block (unsigned / re-entry / mnemonic / no-header). |
| 0x57BB9C | " -sanitize=memcheck -g-tmem-access-check" | Sanitize tail, leading space included, appended only when sanitize == 1. |

The sibling ptxas invocation assembles its argv differently. The prefix is "ptxas", followed by --input-as-string and the PTX text as a sized string, then --knobs-file= with the optional knobs path from the env-var registry, and finally --nv-host= with the host-code path. The boundary-spec page ptxas Handoff Protocol covers the ptxas side in detail; this section sticks to how the parent assembles the nvdisasm argv.

void buildNvdisasmCmd(const TileirasProgram *p, ArgvBuilder *out) {
    argvPush(out, "nvdisasm");                                          // 0x57B6BB
    argvPush(out, "-c");                                                // 0x57B6C8
    argvPush(out, "-uumn");                                             // 0x57BB97 — 5 bytes, single token
    if (p->sanitize == 1) {
        argvPush(out, " -sanitize=memcheck -g-tmem-access-check");      // 0x57BB9C — leading space included
    }
}

With the argv vector built, sub_44A3430 spawns the child via posix_spawn and wires stdout and stderr through pipes back to the parent. The parent drains the disassembly text from the stdout pipe via sub_6CF9C0 (the raw_ostream sink) and appends the captured bytes to the final ELF relocatable as a .nvdisasm section. The temporary cubin path written by the SASS dump adapter is passed as the final argv element, exactly as the previous section described.
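
Draining the child's stdout pipe can be sketched as a read loop into a growable buffer. This stands in for the raw_ostream sink only conceptually: drain_fd is a hypothetical name, and the doubling growth policy and initial capacity are assumptions.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read fd until EOF into a malloc'd buffer. On success returns 0 and hands
 * ownership of *out to the caller; on failure returns -1. */
static int drain_fd(int fd, char **out, size_t *out_len) {
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;

    for (;;) {
        if (len == cap) {             /* buffer full: double it */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return -1;
            }
            buf = grown;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted: retry the read */
            free(buf);
            return -1;
        }
        if (n == 0)
            break;                    /* EOF: every write end is closed */
        len += (size_t)n;
    }
    *out = buf;
    *out_len = len;
    return 0;
}
```

The parent has to close its own copy of the pipe's write end before draining, or the loop never sees EOF. Draining before waiting on the child also avoids a deadlock when the output is larger than the pipe buffer.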

ptxas Handoff Protocol documents the ptxas side of the boundary including the PTX text protocol and cubin return path; Host Launch ABI and ptxas Knobs covers the --knobs-file= grammar consumed by ptxas; Driver Env Vars and Runtime Gates catalogues the env-var registry that produces the optional PTX_KNOBS_PATH forwarded into the ptxas argv; Driver CLI Options documents the --sanitize=memcheck option that adds the nvdisasm sanitize tail.