Basic Blocks & Control Flow Graph
All addresses in this page apply to ptxas v13.0.88 (CUDA 13.0). Other versions will differ.
ptxas maintains a custom CFG infrastructure built entirely from scratch -- no LLVM BasicBlock, no LLVM MachineBasicBlock, no LLVM dominator framework. Basic blocks are stored in contiguous arrays, edges are stored in FNV-1a hash maps, and RPO / backedge / loop information is computed by dedicated functions in the 0xBDE000--0xBE2400 address range.
Key Facts
| Property | Value |
|---|---|
| BasicBlock object size | 136 bytes (allocated by sub_62BB00) |
| Block info entry (scheduling) | 40 bytes per entry, contiguous array |
| Block naming | bix%d (block index, 0-based integer) |
| Edge representation | FNV-1a hash map (key = block index, value = successor list) |
| RPO storage | int[] array, indexed by RPO position |
| Backedge storage | Separate FNV-1a hash map |
| CFG construction phase | Phase 3: AnalyzeControlFlow |
| Block layout phase | Phase 112: PlaceBlocksInSourceOrder |
| BB merge suppression | --dont-merge-basicblocks / -no-bb-merge CLI flag |
Two-Level Block Representation
ptxas uses two distinct but linked representations for basic blocks. The first is owned by the Code Object (used by all optimization passes); the second is owned by the scheduling/CFG analysis context (used by scheduling and post-regalloc passes).
Code Object Block Array
The Code Object stores an array of pointers to full BasicBlock objects:
| Code Object Offset | Type | Field | Description |
|---|---|---|---|
| +296 | ptr | bb_array | Array of BasicBlock* pointers (8 bytes each) |
| +304 | i32 | bb_count | Number of basic blocks |
Access pattern (from sub_78B430):
int bb_count = *(int*)(ctx + 304);
for (int i = 0; i <= bb_count; i++) {
BasicBlock* bb = *(BasicBlock**)(*(ctx + 296) + 8 * i);
int rpo = *(int*)(bb + 144);
// ...
}
Scheduling Block Info Array
The scheduling context maintains a parallel 40-byte-per-entry array:
| Scheduling Context Offset | Type | Field | Description |
|---|---|---|---|
| +976 | ptr | block_info | Contiguous array of 40-byte entries |
| +984 | i32 | num_blocks | Max block index (0-based; actual count = num_blocks + 1) |
Block Info Entry Layout (40 bytes)
| Offset | Type | Field | Description |
|---|---|---|---|
| +0 | ptr | bb_ptr | Pointer to the full BasicBlock object |
| +8 | ptr | insn_head | Pointer to the instruction list head (or sentinel) |
| +16 | u64 | reserved | Reserved / padding |
| +24 | u32 | flags | Block flags |
| +28 | i32 | bix | Block index (unique ID used in all CFG operations) |
| +32 | u64 | aux | Auxiliary data (varies by pass) |
The DOT dumper at sub_BE21D0 iterates this array with a 40-byte stride:
for (int i = 0; i <= num_blocks; i++) {
entry = *(sched_ctx + 976) + 40 * i;
int bix = *(int*)(entry + 28);
int label = *(int*)(*(ptr*)(entry + 0) + 152);
printf("bix%d(L%x)", bix, label);
}
BasicBlock Object (136 bytes)
Allocated by sub_62BB00 during the parsing/lowering phase. The parser references the string "bb-controlflow" when constructing these objects. After allocation, the 136-byte block is zeroed via memset, then individual fields are populated.
BasicBlock Field Map
| Offset | Type | Field | Description |
|---|---|---|---|
| +0 | ptr | vtable | Virtual function table pointer (or type tag) |
| +8 | ptr | insn_list | Instruction doubly-linked list head/sentinel |
| +16 | ptr | insn_tail | Instruction list tail (for O(1) append) |
| +24 | u32 | insn_count | Number of instructions in the block |
| +28 | u32 | flags_a | Block attribute flags (see below) |
| +104 | ptr | bb_next | Linked-list link to next BasicBlock in function |
| +108 | u8 | opcode_flags | Terminator opcode classification bits |
| +128 | ptr | succ_list | Linked list of successor block references |
| +136 | ptr | pred_list | Linked list of predecessor block references |
| +144 | i32 | rpo_number | Reverse post-order number (set by RPO computation) |
| +152 | i32 | label_id | Label / source line identifier (displayed as L%x in DOT) |
The insn_list at +8 is the head of a doubly-linked list. Each instruction node has a next pointer at offset +8 of the instruction object. The sentinel/end is detected by comparing the current node pointer against the tail stored in the BasicBlock or against a per-block sentinel address.
Successor/Predecessor Lists
Both succ_list (+128) and pred_list (+136) are singly-linked lists of small nodes. Each node contains:
| Offset | Type | Field |
|---|---|---|
| +0 | ptr | next pointer (NULL = end of list) |
| +8 | i32 | Block index of the referenced block |
Iteration pattern (from sub_78B430 -- LoopStructurePass):
// Walk predecessor list
PredNode* pred = *(PredNode**)(bb + 136);
while (pred) {
BasicBlock* pred_bb = *(BasicBlock**)(*(ctx + 296) + 8 * pred->bix);
int pred_rpo = *(int*)(pred_bb + 144);
// ...
pred = pred->next;
}
// Walk successor list
SuccNode* succ = *(SuccNode**)(bb + 128);
while (succ) {
BasicBlock* succ_bb = *(BasicBlock**)(*(ctx + 296) + 8 * succ->bix);
// ...
succ = succ->next;
}
CFG Edge Hash Maps
In addition to the per-block predecessor/successor linked lists, the scheduling context maintains two global FNV-1a hash maps for fast edge queries. These are the primary edge representation used by RPO computation, backedge detection, and the scheduling pass.
Successor Edge Map (Code Object +648)
Maps block index to a set of successor block indices. Used by CFG::computeRPO (sub_BDE150), CFG::printEdges (sub_BDE8B0), and CFG::buildAndAnalyze (sub_BE0690).
Backedge Map (Code Object +680)
Maps block index to the set of backedge targets. A backedge exists when block bix_src has a successor bix_dst where RPO(bix_dst) <= RPO(bix_src) -- i.e., the successor was visited before the source in the DFS traversal, indicating a loop.
FNV-1a Hash Parameters
All CFG hash lookups use identical parameters, confirmed across 50+ call sites:
| Parameter | Value |
|---|---|
| Initial hash | 0x811C9DC5 |
| FNV prime | 16777619 (0x01000193) |
| Key size | 4 bytes (block index) |
| Hash method | Byte-by-byte XOR-fold |
The hash computation for a 32-bit block index bix:
uint32_t hash = 0x811C9DC5;
hash = 16777619 * (hash ^ (bix & 0xFF));
hash = 16777619 * (hash ^ ((bix >> 8) & 0xFF));
hash = 16777619 * (hash ^ ((bix >> 16) & 0xFF));
hash = 16777619 * (hash ^ ((bix >> 24) & 0xFF));
uint32_t bucket = hash & (num_buckets - 1);
Hash Map Structure
HashMap:
+0 ptr first_free_node // Free list for node recycling
+8 ptr node_arena // Pool allocator for new nodes
+16 ptr bucket_array // Array of 24-byte bucket headers
+24 u64 num_buckets // Power of two, initial = 8
+32 i32 total_elements // Total entries across all buckets
+36 i32 num_unique_keys // Distinct keys inserted
Bucket (24 bytes):
+0 ptr head // First node in collision chain
+8 ptr tail // Last node in collision chain
+16 i32 count // Number of nodes in this bucket
Full Node (64 bytes, for edge maps):
+0 ptr next // Chain link within bucket
+8 i32 key // Block index (bix)
+12 i32 value_info // Edge count or flags
+16 ptr value_array // Pointer to sub-array of successor indices
+24 i32 value_count // Number of successors in sub-array
+32 ptr sub_hash_data // Embedded sub-hash for multi-edge blocks
+40 u64 sub_hash_size // Sub-hash capacity
+56 u32 cached_hash // Cached FNV-1a hash of key
Simple Node (16 bytes, for backedge set membership):
+0 ptr next // Chain link within bucket
+8 i32 key // Block index
+12 u32 cached_hash // Cached hash
Growth policy: rehash when total_elements > num_unique_keys (load factor > 1.0). New capacity = 4 * old_bucket_count. Hash map insert/find is implemented at sub_BDED20 (full nodes, 64 bytes) and sub_BDF480 (simple nodes, 16 bytes).
CFG Construction: Phase 3 (AnalyzeControlFlow)
AnalyzeControlFlow is phase 3 in the 159-phase optimizer pipeline. It runs immediately after the parser builds the initial instruction list and before any optimization. This phase:
- Populates the successor edge hash table at Code Object +648 by scanning the last instruction of each basic block and classifying it by opcode (see pseudocode below).
- Computes the backedge map at Code Object +680 by identifying edges where the target has a lower or equal block index position in the DFS tree.
- Builds the reverse post-order (RPO) array at Code Object +720 via iterative DFS.
- Identifies loop headers and backedges for later loop optimization passes.
The phase is critical because the Bison parser constructs basic blocks and instruction lists incrementally. AnalyzeControlFlow ensures the CFG is fully consistent and annotated before optimization begins.
Successor Edge Population Algorithm
The core of step 1 walks every basic block, finds its last instruction via the linked list at BB +8, reads the opcode at instruction+72, and inserts 0, 1, or 2 successor edges into the FNV-1a hash map at Code Object +648. The opcode determines the successor count:
| Successors | Opcodes | Behavior |
|---|---|---|
| 0 | 77 (EXIT), 72 (RET) | Terminal blocks -- no outgoing edges. EXIT terminates the thread; RET returns to caller. |
| 1 | 67 (BRA, unconditional), 71 (CALL) | Single edge. Unconditional BRA targets a single block (operand at instr+84). CALL falls through to the next block after the callee returns. |
| 2 | 67 (BRA, conditional) | Two edges: taken branch target + fall-through to the next sequential block. Conditional BRA is distinguished by having a predicate operand (bit 12 of the opcode word set). |
| N | 68 (BRX) | Multi-way indirect branch (switch lowering). The target count is read from the edge hash table's sub-hash at the BRX instruction's jump table operand. Each jump table entry produces one successor edge. |
| 1 | all other opcodes | Fall-through to the next sequential block. Non-control-flow terminators (the block was split at this point by an earlier pass or the parser) get a single fall-through edge to bb_array[bix + 1]. |
function populateSuccessorEdges(code_obj):
succ_map = &code_obj[+648] // FNV-1a hash map (64-byte nodes)
bb_array = code_obj[+296] // array of BasicBlock pointers
bb_count = code_obj[+304]
for bix in 0 .. bb_count-1:
bb = bb_array[bix]
last_insn = findLastInstruction(bb[+8]) // walk linked list to tail
opcode = *(u32*)(last_insn + 72) & 0xCFFF // mask modifier bits
if opcode == 77 or opcode == 72: // EXIT, RET
// 0 successors -- insert empty entry
hashMapInsert(succ_map, bix, edge_count=0, targets={})
else if opcode == 67: // BRA
target_bix = resolveTarget(last_insn) // operand -> block index
if isConditional(last_insn): // predicate present
// 2 successors: taken + fall-through
fall_bix = bix + 1
hashMapInsert(succ_map, bix, edge_count=2,
targets={target_bix, fall_bix})
else:
// 1 successor: unconditional branch
hashMapInsert(succ_map, bix, edge_count=1,
targets={target_bix})
else if opcode == 68: // BRX (indirect)
jtab = getJumpTable(last_insn)
for each entry in jtab:
hashMapInsert(succ_map, bix, sub_hash, entry.target_bix)
else if opcode == 71: // CALL
// 1 successor: fall-through after return
hashMapInsert(succ_map, bix, edge_count=1,
targets={bix + 1})
else:
// non-terminator: fall-through to next block
if bix + 1 < bb_count:
hashMapInsert(succ_map, bix, edge_count=1,
targets={bix + 1})
The resolveTarget function reads the branch target operand at instr+84 (first operand slot), which holds a block index assigned during parsing. The isConditional test checks whether the opcode word has modifier bits indicating a predicate guard -- specifically (*(u32*)(instr+72) >> 12) & 3 is non-zero for predicated branches. For BRX (opcode 68), the multi-edge sub-hash within the 64-byte hash node stores the full set of jump table targets (see Hash Table Node Layout above).
Phase 6: SetControlFlowOpLastInBB
Phase 6 enforces a structural invariant: control flow operations must be the last instruction in their basic block. If a branch, jump, return, or exit instruction is followed by other instructions in the same block (which can happen during lowering passes), this phase splits the block at the control-flow instruction. New basic block entries are allocated and the instruction linked list is rewritten.
This invariant is required by all downstream passes -- the scheduler and register allocator assume that only the final instruction in a block can be a control-flow transfer.
Reverse Post-Order (RPO) Computation
RPO is computed by sub_BDE150 (CFG::computeRPO), a 9KB function that implements iterative DFS using an explicit stack.
RPO Storage
| Code Object Offset | Type | Field |
|---|---|---|
| +720 | ptr | rpo_array -- int*, indexed by RPO position |
| +728 | i32 | rpo_size -- number of entries used |
| +732 | i32 | rpo_capacity -- allocated capacity |
The array is resized with the standard ptxas growth policy: new_capacity = old + (old + 1) / 2, with a minimum of num_blocks + 1. Growth is implemented in sub_BDFB10.
Algorithm
The RPO computation uses iterative DFS with post-order numbering. Two bit-arrays
cooperate: need_expand marks blocks whose successors have not yet been pushed;
in_stack prevents the same block from being pushed twice (which would happen at
diamond-shaped joins without the guard).
function computeRPO(cfg, entry_block):
stack = [entry_block] // Explicit stack at offset +88..+100
need_expand = BitArray(num_blocks) // +16 -- 1 = unvisited, cleared on expand
in_stack = BitArray(num_blocks) // +40 -- 1 = currently on stack
counter = num_blocks // Decremented as blocks complete
while stack is not empty:
bix = stack.top()
if need_expand[bix]: // First visit -- push successors
need_expand[bix] = 0
for each successor s of bix (via hash map lookup):
if need_expand[s] and not in_stack[s]:
in_stack[s] = 1
stack.push(s)
continue
stack.pop()
if in_stack[bix]: // Canonical entry -- assign RPO
in_stack[bix] = 0
rpo_number[bix] = counter // *(cfg+64)[bix] = counter
rpo_array[counter] = bix // *(*(cfg+720))[counter] = bix
counter--
// else: duplicate stack entry -- silently discard
return counter // Should be -1 if all blocks reachable
The key assignment line from the decompilation:
*(_DWORD *)(*(_QWORD *)(a1 + 64) + 4 * v16) = *a3; // rpo_number[bix] = counter
*(_DWORD *)(*(_QWORD *)(*(_QWORD *)a1 + 720) + 4 * (*a3)--) = v16; // rpo_array[counter--] = bix
After completion, rpo_array[0] is the entry block, and rpo_array[num_blocks] is the deepest post-dominator (typically the EXIT block).
RPO Debug Dump
sub_BDEA50 (CFG::dumpRPOAndBackedges) prints the RPO state:
Showing RPO state for each basic block:
bix0 -> RPONum: 0
bix1 -> RPONum: 1
bix2 -> RPONum: 3
bix3 -> RPONum: 2
RPO traversal order: [0, 1, 3, 2]
Showing backedge info:
bix2 -> backedge's successor BB: 1
This output is gated by option flag #24 at offset +1728 relative to the options manager.
Backedge Detection and Loop Identification
Backedges are identified during CFG::buildAndAnalyze (sub_BE0690, 54KB). A backedge from block src to block dst exists when dst has already been visited in the DFS traversal (i.e., dst has a smaller or equal RPO number than src). Backedges are stored in the hash map at Code Object +680.
Natural Loop Detection
The LoopStructurePass (sub_78B430) combines RPO numbering with backedge analysis to identify natural loops:
- Calls
sub_781F80(BasicBlockAnalysis) to compute RPO numbers, loop detection, and block flags. - Iterates the
bb_arrayat Code Object +296. - For each block, checks three conditions to identify a loop header:
rpo_number(+144) is non-zero,rpo_numberequals the value at +152 (loop-exit RPO marker), and- the block's first instruction has a terminator-class opcode:
opcode & 0xFFFFFFFD == 0x5D. The mask0xFFFFFFFDclears bit 1, so this matches exactly two Ori opcodes: 93 (OUT_FINAL) and 95 (STS). These are NOTBRA(opcode 67 =0x43). In the Ori IR, opcodes 93 and 95 are repurposed as control-flow terminator markers -- their ROT13 names reflect SASS mnemonics, but in loop analysis they denote conditional-exit and unconditional-exit block terminators respectively.
- Walks the predecessor list (+136) to find the preheader -- the predecessor whose RPO number is the smallest among those less than the header's RPO.
- Walks the successor list (+128) to find the loop exit -- the successor whose RPO number is the largest among those less than the preheader's RPO.
The same masked opcode check (& 0xFFFFFFFD == 0x5D) is applied a second time to the latch block's last instruction (at 0x78b6b4 in the binary). Only if both the header and the latch pass the terminator-class test does the transformation proceed.
The RPO range [header_rpo, exit_rpo] defines the set of blocks belonging to the loop body. A block with header_rpo <= block_rpo <= exit_rpo is inside the loop.
LoopMakeSingleEntry Transformation
If a natural loop has multiple entry points, sub_78B430 transforms it into a single-entry loop. This is gated by:
- The
LoopMakeSingleEntrypass-disable check (viasub_799250) - Knob 487 (queried via the knob vtable at +152)
After the latch passes the terminator check, the raw opcode (without the bit-1 mask) selects the code path:
- Opcode == 93 (exact): Calls
sub_9253C0to rewrite the branch target directly. This handles the unconditional-exit case. - Opcode == 95: Calls
sub_748BF0to insert a new preheader block and redirect edges. Then rewrites the latch's operand encoding with0x40000000(taken target) and0x60000000(fall-through target) flags, usingsm_backend->vtable[79](offset+632) to resolve the new target address.
After transformation, sub_931920 splits blocks and updates the instruction list. A post-split check (cmp dword [rdx+48h], 5Dh at 0x78b7e0) tests whether the newly created block ends with opcode 93; if so, no further splitting is needed. Otherwise, a second sub_931920 + sub_748BF0 pass inserts an additional block and emits opcode 93 via sub_92E1B0 as a convergence barrier.
Dominance
Correction: The earlier attribution of dominance to sub_BE2330 was wrong. Decompilation of sub_BE2330 (851 bytes, 0xbe2330-0xbe2683) shows it is an instruction operand address resolver -- it switches on opcode values (7, 8, 10, 51, 137, 144, 269) and computes branch-target address offsets relative to 4 * ctx->pc_scale + instr->offset. It has no loop structure, no bitvector operations, and no predecessor iteration.
Dominance information is computed as a byproduct of the RPO/backedge analysis in sub_BDFB10 (24KB, CFG::buildRPOAndIdom). The algorithm is the Cooper-Harvey-Kennedy (CHK) immediate-dominator algorithm, not bitvector-based iterative dataflow. Evidence from the decompilation:
-
sub_BDFB10allocates two parallel arrays at CodeObject offsets +720 (rpo_order[]) and +744 (idom[]), both sized tonum_blocks + 1, with grow-by-1.5x reallocation (lines 146--260 of the decompiled output). -
DFS-based RPO is computed first by calling either
sub_BDE6C0(recursive DFS) orsub_BDE150(iterative DFS with explicit stack). The choice is gated by a knob query at CodeObject vtable slot +72 with argument 7. -
Immediate dominator via RPO-number intersection: The core loop (at
0xbdff30in the binary) iterates over backedge successors from the FNV-1a hash map and computes the idom by comparing RPO numbers:
if rpo_num[pred] <= rpo_num[current_block]:
intersect(pred, current_block) // walk up idom chains
This comparison (*(idom_array + pred) <= *(idom_array + block) at decompiled line 459) is the CHK intersect fingerprint -- two fingers walk up the dominator tree until they meet.
- No bitvector operations exist in either function. No
memset(ptr, 0xFF, ...)initialization, no word-level AND loops, noword_count/bit_countfields. The speculative bitvector layout table from the earlier version had no binary evidence.
sub_781F80 (BasicBlockAnalysis) consumes the idom results rather than computing them. During its linear instruction walk, it stores the first predecessor's RPO number at BB+152 for loop-header detection and sets dominance-related flags in the BB+280 bitfield (0x10 = loop header, 0x20 = has predecessor, 0x800000 = in-loop).
Block Layout: Phase 112 (PlaceBlocksInSourceOrder)
Phase 112 (PlaceBlocksInSourceOrder) runs in the post-scheduling stage of the pipeline, after register allocation and before Mercury encoding. It reorders the basic block array to restore source-order layout.
The implementation at sub_A92C50 (3.5KB binary, ~19KB decompiled) manipulates linked list structures and uses hash table lookups to reorder blocks. The goal is to minimize branch distances in the final SASS output -- placing fall-through successors immediately after their predecessors.
Hot/Cold Block Layout
Two companion phases handle hot/cold partitioning:
| Phase | Name | Purpose |
|---|---|---|
| 108 | OptimizeHotColdInLoop | Moves cold blocks out of loop bodies |
| 109 | OptimizeHotColdFlow | Global hot/cold block separation |
Cold blocks (e.g., error handlers, unlikely branches, assert paths) are moved to the end of the function's block sequence. The MarkAdditionalColdBlocks pass marks blocks as cold based on heuristics. This separation improves instruction cache utilization on the GPU's SM instruction fetch unit.
BB Merge Suppression
The --dont-merge-basicblocks (alias -no-bb-merge) CLI flag prevents the optimizer from merging consecutive basic blocks. This is used for debuggable code -- without it, the debugger cannot set breakpoints at the original source line boundaries. The flag is documented in the binary as:
"Normally, ptxas attempts to merge consecutive basic blocks as part of its optization process. However, for debuggable code this is very confusing. This option prevents basic block merging, at a slight perfomance cost."
(Note: "optization" and "perfomance" are typos in the original binary string.)
Entry and Exit Blocks
Block index 0 (bix0) is always the function entry block. It is the first element in the bb_array and the root of the RPO traversal. The entry block has no predecessors (its predecessor list at +136 is NULL).
The exit block is the block containing the EXIT instruction (opcode 77 = EXIT in the ROT13 name table). For functions with multiple exit points, each EXIT-containing block is a CFG sink. The RPO computation assigns these the highest RPO numbers. The SetControlFlowOpLastInBB phase (phase 6) ensures each EXIT is the final instruction in its block.
The CFG::buildAndAnalyze function (sub_BE0690) also builds a parallel edge representation for the scheduling-level block_info array at Code Object +976. This representation uses a compressed block-type field at block_info[i]+28 (read as a 16-bit value from the first instruction's +28 field): type 4 = unconditional branch (1 successor), type 7 = conditional branch (2 successors via sub-hash iteration). Opcodes 93 (OUT_FINAL) and 95 (STS) serve as control-flow boundary markers in loop analysis (see Natural Loop Detection below). The authoritative Ori IR opcode-to-successor mapping is described in the Successor Edge Population Algorithm pseudocode above.
CFG Update Protocol
Passes that modify the CFG (block splitting, merging, edge redirection) must maintain consistency across several data structures:
- Block array -- both the Code Object
bb_array(+296) and the schedulingblock_info(+976) must be updated. - Predecessor/successor linked lists -- the per-block lists at +128 and +136 must reflect the new edges.
- Edge hash maps -- the successor map (+648) and backedge map (+680) must be invalidated or updated.
- RPO array -- the RPO order at +720 must be recomputed after structural changes.
- Block count -- both
bb_count(+304) andnum_blocks(+984) must be incremented.
The general pattern observed in sub_931920 (block splitter called from sub_78B430):
function splitBlock(ctx, bb, split_point):
new_bb = allocateBasicBlock()
// Move instructions after split_point to new_bb
new_bb->insn_list = split_point->next
bb->insn_list_tail = split_point
split_point->next = sentinel
// Transfer successors from bb to new_bb
new_bb->succ_list = bb->succ_list
bb->succ_list = new_node(new_bb->bix)
// Update predecessor lists of old successors
for each succ in new_bb->succ_list:
replace bb in succ->pred_list with new_bb
// new_bb's only predecessor is bb
new_bb->pred_list = new_node(bb->bix)
// Invalidate and recompute RPO
ctx->bb_count++
recomputeRPO(ctx)
The AnalyzeControlFlow phase (phase 3) is explicitly re-run or incrementally updated after phases that modify the CFG structure. The phase pipeline contains multiple OriPerformLiveDead and GeneralOptimize passes that may rebuild portions of the CFG.
Key CFG Functions
| Address | Size | Identity | Confidence |
|---|---|---|---|
sub_62BB00 | 16.5KB | BasicBlock::allocate -- allocates 136-byte block, initializes fields | HIGH |
sub_781F80 | 12KB | BasicBlockAnalysis -- block walk, loop detection, flag computation | HIGH |
sub_78B430 | 1.2KB | LoopStructurePass -- single-entry loop transformation | HIGH |
sub_BDE150 | 9KB | CFG::computeRPO -- iterative DFS with explicit stack | HIGH |
sub_BDE6C0 | 3KB | CFG::computeRPO_recursive -- recursive DFS RPO traversal | HIGH |
sub_BDE8B0 | 2KB | CFG::printEdges -- prints "bix%d -> bix%d\n" | HIGH |
sub_BDEA50 | 4KB | CFG::dumpRPOAndBackedges -- RPO + backedge debug dump | HIGH |
sub_BDED20 | 12KB | HashMap::insertOrFind -- full 64-byte node insert | HIGH |
sub_BDF480 | 10KB | HashMap::insertOrFind_simple -- 16-byte node insert | HIGH |
sub_BDFB10 | 24KB | CFG::buildRPOAndIdom -- RPO computation + CHK idom algorithm | HIGH |
sub_BE0690 | 54KB | CFG::buildAndAnalyze -- master CFG builder | HIGH |
sub_BE21D0 | 1.4KB | CFG::dumpDOT -- Graphviz DOT format output | HIGH |
sub_BE2330 | 851B | Instruction operand address resolver -- opcode switch + offset calc | HIGH |
sub_A92C50 | 3.5KB | PlaceBlocksInSourceOrder -- block reordering (phase 112) | MEDIUM |
CFG Visualization
The CFG::dumpDOT function (sub_BE21D0) generates Graphviz DOT output when option flag #20 is enabled (offset +1440 from the options manager). The output format:
digraph f {
node [fontname="Courier",fontsize=10,shape=Mrecord];
"bix0"
[label="bix0(L0)"]
bix0 -> bix1
bix0 -> bix3
"bix1"
[label="bix1(L10)"]
bix1 -> bix2
"bix2"
[label="bix2(L20)"]
bix2 -> bix1
bix2 -> bix3
"bix3"
[label="bix3(L30)"]
}
Where L%x is the label identifier at BasicBlock +152. This can be converted to a visual graph with dot -Tpng.
If option flag #24 is also enabled (offset +1728), the RPO and backedge dump from sub_BDEA50 is appended.
Related Pages
- Ori IR Overview -- Code Object layout, instruction format, register files
- Instructions -- instruction format and opcode details
- Data Structures -- FNV-1a hash maps, bitvectors, linked lists
- Optimizer Pipeline -- the 159-phase pipeline including CFG phases
- Branch & Switch Optimization -- OriBranchOpt pass
- Loop Optimization -- OriLoopSimplification, LoopUnrolling
- Hot/Cold Partitioning -- OptimizeHotColdFlow, MarkAdditionalColdBlocks