AI-GENERATED REVERSE-ENGINEERING NOTES — AUTHOR'S PERSONAL REFERENCE ONLY. EVERYTHING HERE IS A BEST-GUESS RECONSTRUCTION, NOT A RELIABLE SOURCE.
crucible-notes
Reverse engineering reference for NVIDIA's CUDA compiler toolchain
Compiler Internals
| Component | Binary | Documentation | Status |
|---|---|---|---|
| cicc | CUDA C→PTX compiler, 60 MB, LLVM 20.0.0 + EDG 6.6 | wiki | Live |
| tileiras | CUDA Tile IR optimizing assembler, 88 MB, MLIR bytecode → TileAS → PTX/SASS, 143 pages | wiki | Live |
| cudafe++ | CUDA C++ frontend, 8.5 MB, EDG 6.6, 6,483 functions, 69 pages | wiki | Live |
| ptxas | PTX→SASS assembler, 37.7 MB, proprietary (no LLVM), 159-phase pipeline | wiki | Live |
| nvcc | CUDA compilation driver | — | Planned |
| nvlink | CUDA device linker, 37 MB (95% embedded ptxas), 40,532 functions | wiki | Live |
| nvptxcompiler | PTX JIT compilation library | — | Planned |
ML Accelerator Compilers
| Component | Binary | Documentation | Status |
|---|---|---|---|
| libtpu | Google TPU PJRT plugin, 745 MB stripped ELF: 6 silicon generations, LLO VLIW ISA, TensorCore/SparseCore cost model | wiki | Live |
Tools
| Tool | Description | Documentation | Status |
|---|---|---|---|
| fatbin | Fat binary manipulation toolkit: dump, unpack, extract PTX, repack with ZSTD (levels 1–22) | readme | Released |
Methodology
All analysis is based on static reverse engineering of stripped x86-64 ELF binaries using IDA Pro 9.x. No source code or any other restricted or copyrighted material was used; all findings derive solely from analysis of compiled binaries distributed as part of the publicly available CUDA Toolkit.