Payload: SparseCore Band

All addresses, widths, and CHECK constants on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d). The binary is not stripped — full C++ symbols are present, and .text VMA equals file offset. Other versions will differ.

Abstract

The SparseCore (SC) trace band is the on-device profiler's only view of SparseCore execution progress: the per-event payload field maps for the 18 hardware trace events emitted by the SparseCore sequencer (SCS), the TensorE/TensorA compute tiles (TEC/TAC), and the SC stream gather/scatter engine. SparseCore is the sparse-compute substrate that replaced BarnaCore on the Viperfish/Ghostlite/6acc60406 line; its instructions, tasks, streams, and inter-tile messages each pack into the same fixed 16-byte profiler packet as every other band, behind the same 61-bit framing+header envelope, and are read out by an anonymous-namespace DecodeSc<Name>(string_view, bool*, TraceEntry*) per event per chip family.

This page owns two things the codec page deliberately does not: (1) the per-event payload bit-decode — the ordered GetBits64 width sequence after bit 61, the total-bit CHECK, the packet count, and the named proto fields with per-field widths — byte-exact for the three SC-bearing families vfc (Viperfish), glc (Ghostlite), gfc (6acc60406); and (2) the on-wire-id → trace-point map — the dense trace_point_id (108..133) the hardware stamps for each SC event, which is distinct from the proto oneof field number the decoder writes. The SC band lives in the high id band reached only through the newer codec's two-level dispatch; tracepoints-master-registry.md owns the full cross-band registry, this page owns the SC sub-range.

A structural fact frames everything: the SC trace band exists only on SparseCore-bearing silicon. pxc (Pufferfish, BarnaCore generation) and vlc (Viperfish-lite) emit zero Sc* proto messages and have zero DecodeSc* functions. The events instrument the SCS sequencer, the TEC/TAC tiles, and the stream gather/scatter datapath; the SyncFlagCoreType/DestCoreType TEC_OR_SCS/TAC selectors are the sequencer-type enum surfaced on the wire.

For reimplementation, the contract is:

  • The 18 SC events and their wire shapes — 11 ScInstruction*, 2 ScTask*, 3 ScStream*, 2 ScMessage* — each as an ordered post-bit-61 width sequence + total-bit CHECK + packet count, per family.
  • The named proto field maps — which width is scs_pc, tile_bitmap, tec_sync_stalls, stream_opcode, smem_address, etc., descriptor-confirmed.
  • The SC selector enum value tables: StreamOpcode, SyncFlagCoreType, TileLocal/OffTile MemoryType/StreamType, IndirectListType, DestCoreType, MsgType, Opcode, with byte-exact named values, including the per-gen StreamOpcode 3→4-bit growth.
  • The on-wire-id → event map — the dense SC sub-range (108..133), proven distinct from the proto oneof field, recovered from the high-id-band secondary jump table.

| | |
|---|---|
| Owns | SC-band per-event payload field-bit maps + the SC on-wire-id table |
| Codec / framing | trace-entries-coder.md: 16-byte packet, 2-bit framing, 61-bit header, GetBits64/SkipBits primitives, CHECK mechanism |
| SC-bearing families | vfc (Viperfish), glc (Ghostlite), gfc (6acc60406): 18 DecodeSc* each |
| No SC band | pxc (Pufferfish/BarnaCore), vlc (Viperfish-lite): 0 DecodeSc*, 0 Sc* proto messages |
| SC event count | 18: 11 ScInstruction* + 2 ScTask* + 3 ScStream* + 2 ScMessage* |
| SC on-wire id range | 108..132 (vfc/glc), 108..133 (gfc; a StatsCounter sample shifts ScMessage up by 1) |
| vfc decoders | DecodeScInstructionCoreInterrupt @ 0xf60a8c0 .. DecodeScMessageInboundInternalMessage @ 0xf60eb00 |
| glc decoders | 0xf63c760 .. 0xf6409a0 |
| gfc decoders | 0xf6725a0 .. 0xf676ae0 |
| Descriptor pool | trace_entries.proto FileDescriptorProto: vfc @ 0xbf06830, glc @ 0xbf41210, gfc @ 0xbf64c80 (named fields + nested *Values enums) |
| Bit primitives | GetBits64NoInline @ 0x21073760, SkipBitsNoInline @ 0x21073580, mask_ @ 0xbe79440 |

Reading a SC Payload — the Common Shape

Every SC decoder follows the generic Decode<Name> contract: gate on length > 15, SkipBits(2) past the framing, run the per-family DecodeTraceHeader (the 59-bit id/block_id/timestamp, so framing+header = 61 bits), optionally read a TraceIdHeader, stamp the dense oneof field at TraceEntry+0x28, then read the typed payload with one GetBits64(n) per field, and finally CHECK(BitsDecoded() == K). The width sequences below are the payload bits — what is read after bit 61 (and after the TraceIdHeader when one is present). The total-bit CHECK is the whole packet: CHECK = 61 + [TraceIdHeader] + Σ payload widths.
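The CHECK formula above is plain arithmetic, so it can be cross-checked mechanically. The following is a minimal sketch, not code from the binary: the helper names `expected_check` and `packet_count` are hypothetical, and the width sequences are the vfc ones listed in the per-event tables on this page.

```python
FRAMING_AND_HEADER_BITS = 61            # 2-bit framing + 59-bit TraceHeader
SC_TRACE_ID_HEADER_BITS = 21 + 3 + 14   # 38 bits, carried by ScMessage* only
PACKET_BITS = 128                       # one 16-byte profiler packet


def expected_check(payload_widths, has_trace_id_header=False):
    """Return the total-bit CHECK constant for one event shape."""
    total = FRAMING_AND_HEADER_BITS + sum(payload_widths)
    if has_trace_id_header:
        total += SC_TRACE_ID_HEADER_BITS
    return total


def packet_count(total_bits):
    """Return how many 16-byte packets are needed (ceiling division)."""
    return -(-total_bits // PACKET_BITS)


# Width sequences copied from the per-event tables on this page (vfc).
SHAPES = {
    "ScInstruction*": ([32, 1, 6, 13, 14], False),
    "ScTaskIssueFromScs": ([13, 8, 14, 14, 16], False),
    "ScTaskCommitOnSct": ([8, 4, 32, 16, 7, 1, 1, 9, 16, 16, 16, 16, 16, 32], False),
    "ScStreamIssueFromCore": ([14, 6, 5, 1, 3, 1, 3, 1, 2, 1, 1, 1, 18], False),
    "ScStreamProgress*": ([6, 5, 1, 32, 1], False),
    "ScMessage*": ([6, 5, 1, 13, 4, 1, 1, 10, 1, 2, 32, 1], True),
}

for name, (widths, has_header) in SHAPES.items():
    bits = expected_check(widths, has_header)
    print(f"{name}: CHECK == {bits}, packets = {packet_count(bits)}")
```

Running the loop reproduces the CHECK and packet-count columns of the per-event tables, which makes it a cheap regression test when transcribing width sequences for another library version.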

 bit window of an SC packet
 ┌──────────┬───────────────────────────────┬──────────────────┬───────────────────────┐
 │ framing  │  TraceHeader (id/block/ts)     │ TraceIdHeader?   │ SC payload (this page)│
 │ 2 bits   │  59 bits  →  bit 61            │ 38 bits (msg)    │ Σ widths              │
 └──────────┴───────────────────────────────┴──────────────────┴───────────────────────┘
   valid,started                                 only ScMessage*       per-event field set

NOTE — the SC TraceIdHeader is 21,3,14 (= 38 bits), not the 21,3,12 (= 36 bits) of the UHI/OCI/ICI bands — the chip_id field is 14 bits here, matching the newer-gen 14-bit chip_id of the vfc/glc/gfc header. Only the two ScMessage* events carry it; the ScInstruction*/ScTask*/ScStream* events carry no TraceIdHeader (the core is implied by trace_point_block_id in the 59-bit header). Byte-confirmed: DecodeScMessageOutboundInternalMessage @ 0xf60e6c0 opens with GetBits64(21), GetBits64(3), GetBits64(14).

Three SC events in two wire shapes (ScTaskCommitOnSct and the two ScMessage* events) exceed one 16-byte packet and are 2-packet events. They carry a mid-stream CHECK == 128 at the packet-1 boundary (movq $0x20,0x8(%rbx) records 32 bytes consumed) in addition to the final total-bit CHECK. The fields whose bits straddle that 128-bit boundary are split across several GetBits64 calls: the 7,1,1,9 group in ScTaskCommitOnSct and the 4,1,1,10 group in ScMessage* (see the per-event notes).

GOTCHA — the two single-bit GetBits64(1) calls inside those straddle groups ({7, 1, 1, 9} and {4, 1, 1, 10}) are not boolean proto fields — they are the two halves of the packet-boundary straddle of a single multi-bit field (tec_sync_stalls 7+9, smem_address 4+10). A reimplementation that maps each GetBits64 one-to-one onto a proto field will invent two phantom bools and split one real counter. The width fragments and the +0xNN destination offset are confirmed; the exact LSB position each fragment occupies in the reassembled value is not tabulated (LOW — same open item as the UHI/OCI straddle fields).
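To make the straddle concrete, the sketch below walks a width sequence and finds the read that ends exactly on bit 128, then slices out the four-read group around it. The function names are hypothetical, and the "one fragment before the boundary, three after" slicing simply follows the 7,1,1,9 and 4,1,1,10 pattern described above; it does not attempt to reassemble a value, since the fragment bit order is not tabulated.

```python
PACKET_BITS = 128


def boundary_index(widths, prefix_bits):
    """Return the number of reads completed when exactly 128 bits are consumed.

    prefix_bits is everything read before the payload: 61 for framing+header,
    plus 38 when a TraceIdHeader is present.
    """
    consumed = prefix_bits
    for i, width in enumerate(widths):
        consumed += width
        if consumed == PACKET_BITS:
            return i + 1
        if consumed > PACKET_BITS:
            raise ValueError(f"read {i} crosses bit {PACKET_BITS} instead of ending on it")
    raise ValueError("shape fits in a single packet")


def straddle_group(widths, prefix_bits):
    """Return the four reads around the boundary (1 before, 3 after)."""
    i = boundary_index(widths, prefix_bits)
    return widths[i - 1 : i + 3]


commit_widths = [8, 4, 32, 16, 7, 1, 1, 9, 16, 16, 16, 16, 16, 32]
message_widths = [6, 5, 1, 13, 4, 1, 1, 10, 1, 2, 32, 1]

print(straddle_group(commit_widths, 61))        # [7, 1, 1, 9]
print(straddle_group(message_widths, 61 + 38))  # [4, 1, 1, 10]
```

A decoder built on a field table can use the returned group to bind all four reads to one proto field name instead of emitting one field per read.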


SC_INSTRUCTION_* Band — the Sequencer Control Events (11 events)

Purpose

The SparseCore sequencer's internal control band — the SC analogue of the TensorCore TcsInternal* band. These 11 events fire when an SC sequencer primitive executes: core interrupt, trace marking, and the fence/sync/barrier/sync-watch start/stop pairs. They carry no TraceIdHeader and all 11 share one wire shape per gen — they differ only by which primitive fired (i.e. only by trace_point_id and oneof field), not by payload layout.

Algorithm

// DecodeScInstructionCoreInterrupt @0xf60a8c0 (vfc) — representative of all 11
function DecodeScInstruction<Prim>(view, started_out, entry):
    if view.length <= 0xf: return error
    BitDecoder dec(view); SkipBits(dec, 2);          // framing
    DecodeTraceHeader(entry, dec);                    // id/block/timestamp → bit 61
    entry[+0x28] = <oneof field>;                     // vfc 75..85 / glc 67..77 / gfc 66..76
    GetBits64(dec, 32, &data);                        // data      (uint32)
    GetBits64(dec,  1, &done);                        // done      (bool)
    GetBits64(dec,  6, &extra_id);                    // extra_id  (uint32, 6-bit field)
    GetBits64(dec, 13, &index);                       // index     (uint32, 13-bit field)
    GetBits64(dec, 14, &pc);                          // pc        (uint32, 14-bit field)
    CHECK(BitsDecoded() == 127);                       // movl $0x4b,0x28 stamps oneof 75
    *bytes_consumed = 0x10;

Field Map

Proto fields (descriptor-confirmed, identical for all 11 events and all 3 gens): data(uint32), done(bool), extra_id(uint32), index(uint32), pc(uint32).

| Field | Width | Meaning | Confidence |
|---|---|---|---|
| data | 32 | sequencer-primitive operand / payload word (e.g. tracemark value, interrupt cause) | HIGH |
| done | 1 | primitive-complete flag | HIGH |
| extra_id | 6 | per-event correlation tag (the SC band's local "sub-id") | HIGH |
| index | 13 | sequencer slot / sync-flag index the primitive targets | MEDIUM |
| pc | 14 | SCS program counter at issue | HIGH |

Per-Gen Width Table

| Gen | payload widths (no TraceIdHeader) | CHECK | pkts | oneof base |
|---|---|---|---|---|
| vfc | 32,1,6,13,14 | 127 | 1 | 75 |
| glc | 32,1,6,13,14 | 127 | 1 | 67 |
| gfc | 32,1,6,13,14 | 127 | 1 | 66 |

66 payload bits + 61 frame+header = 127. Byte-identical across all three SC gens — only the oneof base differs (the wire shape and CHECK are invariant). The 11 primitives, in oneof / on-wire-id order: CoreInterrupt, SetTracemark, TraceInstruction, SfenceStart, SfenceStop, SyncStart, SyncStop, BarrierStart, BarrierStop, SyncWatchStart, SyncWatchStop.

CONFIRMED (SC-1) — DecodeScInstructionCoreInterrupt @ 0xf60a8c0 decompiles to exactly SkipBits(2); DecodeTraceHeader; GetBits64(32); GetBits64(1); GetBits64(6); GetBits64(13); GetBits64(14); CHECK(==127), with movl $0x4b,0x28 (= oneof field 75). All 11 share this shape (0xf60a8c0..0xf60bcc0, oneof 75..85 on vfc).
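The shared ScInstruction* shape lends itself to a table-driven reader. The sketch below is illustrative only: `BitCursor`, `decode_sc_instruction`, and `pack_sc_instruction` are hypothetical names, the header is skipped rather than parsed, and the bit order (stream bit 0 = least-significant bit of the little-endian packet) is an assumption that this page does not establish for GetBits64.

```python
SC_INSTRUCTION_FIELDS = [("data", 32), ("done", 1), ("extra_id", 6),
                         ("index", 13), ("pc", 14)]
PAYLOAD_START = 61   # framing (2) + TraceHeader (59)
EXPECTED_BITS = 127  # total-bit CHECK for this shape


class BitCursor:
    """Sequential bit reader. ASSUMPTION: LSB-first over a little-endian int."""

    def __init__(self, packet):
        self.value = int.from_bytes(packet, "little")
        self.pos = 0

    def skip(self, n):
        self.pos += n

    def get(self, n):
        out = (self.value >> self.pos) & ((1 << n) - 1)
        self.pos += n
        return out


def decode_sc_instruction(packet):
    """Decode the 5-field payload; mirrors the length gate and final CHECK."""
    if len(packet) <= 15:
        raise ValueError("packet shorter than 16 bytes")
    cur = BitCursor(packet)
    cur.skip(PAYLOAD_START)  # header fields are not parsed in this sketch
    fields = {name: cur.get(width) for name, width in SC_INSTRUCTION_FIELDS}
    fields["done"] = bool(fields["done"])
    if cur.pos != EXPECTED_BITS:
        raise AssertionError(f"decoded {cur.pos} bits, expected {EXPECTED_BITS}")
    return fields


def pack_sc_instruction(**fields):
    """Test helper: build a packet with a zeroed header (same bit-order assumption)."""
    value, pos = 0, PAYLOAD_START
    for name, width in SC_INSTRUCTION_FIELDS:
        v = int(fields[name])
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value |= v << pos
        pos += width
    return value.to_bytes(16, "little")


packet = pack_sc_instruction(data=0xDEADBEEF, done=1, extra_id=5, index=100, pc=0x1234)
print(decode_sc_instruction(packet))
```

Because all 11 primitives share this layout, a single field table plus the oneof value is enough to cover the whole ScInstruction* group; only the bit-order assumption would need adjusting against real captures.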


SC_TASK_* Band — Issue and Commit (2 events)

Purpose

The SparseCore task lifecycle: a task is issued from the SCS (the SparseCore scalar sequencer) and later committed on the SCT (the SparseCore tile) with rich per-engine stall accounting. ScTaskCommitOnSct is the dominant SC progress record — gather/scatter/sort/dedup throughput is read out through its stall and word counters. Neither carries a TraceIdHeader.

ScTaskIssueFromScs — the dispatch record

// DecodeScTaskIssueFromScs @0xf60bec0 (vfc)
function DecodeScTaskIssueFromScs(view, started_out, entry):
    ... SkipBits(2); DecodeTraceHeader; entry[+0x28] = 86; ...   // glc 78 / gfc 77
    GetBits64(dec, 13, &scs_pc);       // SCS program counter
    GetBits64(dec,  8, &tag);          // task tag
    GetBits64(dec, 14, &tec_pc);       // TensorE-Compute engine PC
    GetBits64(dec, 14, &tac_pc);       // TensorA-Compute engine PC
    GetBits64(dec, 16, &tile_bitmap);  // SMEM-bitmap: which tiles the task spans
    CHECK(BitsDecoded() == 126);

| Field | Width | Meaning | Confidence |
|---|---|---|---|
| scs_pc | 13 | SCS program counter at task issue | HIGH |
| tag | 8 | task correlation tag (matched against ScTaskCommitOnSct.tag) | HIGH |
| tec_pc | 14 | TEC program counter | HIGH |
| tac_pc | 14 | TAC program counter | HIGH |
| tile_bitmap | 16 | SMEM-bitmap iteration field: the set of tiles the task spans | HIGH |

| Gen | widths | CHECK | pkts | oneof |
|---|---|---|---|---|
| vfc | 13,8,14,14,16 | 126 | 1 | 86 |
| glc | 13,8,14,14,16 | 126 | 1 | 78 |
| gfc | 13,8,14,14,16 | 126 | 1 | 77 |

65 payload bits + 61 = 126. Byte-identical across gens.
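Since the tag field is what ties an issue record to its commit record, a consumer might pair them as in the sketch below. This is an assumption-laden illustration: the dict record format and the `pair_tasks` helper are hypothetical, and pairing a commit with the most recent unmatched issue carrying the same 8-bit tag is a guess at a reasonable policy, not something read from the binary.

```python
def pair_tasks(records):
    """Pair ScTaskIssueFromScs with ScTaskCommitOnSct records by tag.

    records: iterable of dicts with at least "event" and "tag" keys, in
    trace order. Returns (pairs, unmatched_issues).
    """
    pending = {}  # tag -> most recent issue record still waiting for a commit
    pairs = []
    for rec in records:
        if rec["event"] == "ScTaskIssueFromScs":
            # An 8-bit tag can be reused; a later issue replaces an older one.
            pending[rec["tag"]] = rec
        elif rec["event"] == "ScTaskCommitOnSct":
            issue = pending.pop(rec["tag"], None)
            if issue is not None:
                pairs.append((issue, rec))
    return pairs, list(pending.values())


trace = [
    {"event": "ScTaskIssueFromScs", "tag": 7, "scs_pc": 0x40},
    {"event": "ScTaskIssueFromScs", "tag": 8, "scs_pc": 0x44},
    {"event": "ScTaskCommitOnSct", "tag": 7, "total_cycles": 1200},
]
pairs, unmatched = pair_tasks(trace)
print(len(pairs), len(unmatched))  # 1 1
```

Keeping unmatched issues separate makes truncated captures visible instead of silently dropping tasks whose commit record fell outside the trace window.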

ScTaskCommitOnSct — the progress record (RESTRUCTURES on gfc)

A 2-packet event with a mid-stream packet-1 boundary CHECK == 128. The shape restructures on gfc (6acc60406): the separate TEC/TAC stall accounting is collapsed and a load-store-unit hold-stall counter is added.

// DecodeScTaskCommitOnSct @0xf60c0c0 (vfc) — 2-packet, mid CHECK 128, final CHECK 251
function DecodeScTaskCommitOnSct(view, started_out, entry):
    ... SkipBits(2); DecodeTraceHeader; entry[+0x28] = 87; ...   // glc 79 / gfc 78
    GetBits64(dec,  8, &tag);             // task tag
    GetBits64(dec,  4, &extra_id);        // correlation sub-id
    GetBits64(dec, 32, &total_cycles);    // task wall cycles
    GetBits64(dec, 16, &tec_ibuf_stalls);
    GetBits64(dec,  7, &tec_sync_stalls_lo);   // ── packet-boundary straddle ──
    CHECK(BitsDecoded() == 128);          // packet-1 boundary; movq $0x20 (32 B consumed)
    GetBits64(dec,  1, &straddle_a);      // (high half of tec_sync_stalls)
    GetBits64(dec,  1, &straddle_b);      //  └ the 7+9 split spans the 16-byte boundary
    GetBits64(dec,  9, &tec_sync_stalls_hi);
    GetBits64(dec, 16, &tec_hold_stalls);
    GetBits64(dec, 16, &tac_ibuf_stalls);      // vfc/glc only
    GetBits64(dec, 16, &tac_sync_stalls);      // vfc/glc only
    GetBits64(dec, 16, &tac_hold_stalls);      // vfc/glc only
    GetBits64(dec, 16, &num_spmem_words);
    GetBits64(dec, 32, &num_hbm_words);
    CHECK(BitsDecoded() == 251);

| Field (vfc/glc) | Width | Meaning | Confidence |
|---|---|---|---|
| tag | 8 | task tag (matches ScTaskIssueFromScs.tag) | HIGH |
| extra_id | 4 | correlation sub-id | HIGH |
| total_cycles | 32 | task duration in device cycles | HIGH |
| tec_ibuf_stalls | 16 | TEC instruction-buffer stall cycles | HIGH |
| tec_sync_stalls | 7+9 (straddle) | TEC sync-flag wait cycles (spans the packet boundary) | HIGH |
| tec_hold_stalls | 16 | TEC hold/backpressure stall cycles | HIGH |
| tac_ibuf_stalls | 16 | TAC instruction-buffer stall cycles | HIGH |
| tac_sync_stalls | 16 | TAC sync-flag wait cycles | HIGH |
| tac_hold_stalls | 16 | TAC hold/backpressure stall cycles | HIGH |
| num_spmem_words | 16 | SPMEM word throughput of the task | HIGH |
| num_hbm_words | 32 | HBM word throughput of the task | HIGH |

| Gen | widths | mid CHECK | final CHECK | pkts | proto fields |
|---|---|---|---|---|---|
| vfc | 8,4,32,16,7,1,1,9,16,16,16,16,16,32 | 128 | 251 | 2 | 11 (TEC+TAC triples) |
| glc | 8,4,32,16,7,1,1,9,16,16,16,16,16,32 | 128 | 251 | 2 | 11 (identical) |
| gfc | 8,4,32,16,7,1,1,9,16,16,32,16 | 128 | 219 | 2 | 9 (TAC dropped, lsu_hold_stalls added) |

vfc/glc payload = 190 bits (251−61); gfc payload = 158 bits (219−61).

CONFIRMED (SC-2) — the gfc restructure is byte-exact. After the 7,1,1,9 straddle of tec_sync_stalls, gfc reads only 16,16,32,16 = tec_hold_stalls(16), num_spmem_words(16), num_hbm_words(32), lsu_hold_stalls(16) — the three TAC counters (tac_ibuf/sync/hold_stalls) are gone, replaced by a single lsu_hold_stalls(16). Read from DecodeScTaskCommitOnSct @ 0xf673da0 (gfc): mid CHECK == 128, final CHECK == 219. Whether this is a true 6acc60406 (v7x) microarch change (a unified SC tile vs separate TEC/TAC) or a profiler schema rev was not cross-checked against the SC ISA (LOW on the cause; the wire shape is CERTAIN).
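One way to keep the gfc restructure manageable in a reimplementation is to describe the layout as a shared head plus a per-gen tail, with the straddle kept as a fragment tuple under one field name. The sketch below is a hypothetical data layout, not the decoder's own structure; the field names and widths are the ones tabulated above.

```python
HEAD = [
    ("tag", (8,)), ("extra_id", (4,)), ("total_cycles", (32,)),
    ("tec_ibuf_stalls", (16,)),
    ("tec_sync_stalls", (7, 1, 1, 9)),  # straddle group, one proto field
    ("tec_hold_stalls", (16,)),
]
TAC_TAIL = [
    ("tac_ibuf_stalls", (16,)), ("tac_sync_stalls", (16,)),
    ("tac_hold_stalls", (16,)),
    ("num_spmem_words", (16,)), ("num_hbm_words", (32,)),
]
GFC_TAIL = [
    ("num_spmem_words", (16,)), ("num_hbm_words", (32,)),
    ("lsu_hold_stalls", (16,)),
]
TAILS = {"vfc": TAC_TAIL, "glc": TAC_TAIL, "gfc": GFC_TAIL}


def commit_layout(gen):
    """Return the ordered (field, fragment widths) list for one gen."""
    return HEAD + TAILS[gen]


def read_widths(layout):
    """Flatten a layout into the GetBits64 width sequence."""
    return [w for _, fragments in layout for w in fragments]


def final_check(layout):
    """61 framing+header bits plus every payload read."""
    return 61 + sum(read_widths(layout))


for gen in ("vfc", "glc", "gfc"):
    layout = commit_layout(gen)
    print(gen, len(layout), "fields, CHECK ==", final_check(layout))
```

With this shape, the number of proto fields (11 or 9) and the number of reads (14 or 12) stay distinct, which is the distinction the straddle GOTCHA warns about.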


SC_STREAM_* Band — Gather/Scatter and Progress (3 events)

Purpose

The SparseCore stream gather/scatter engine band. The stream issue event carries the StreamOpcode (GATHER/SCATTER family) selector plus the tile-local/off-tile memory and stream-type selectors — it is the on-wire image of an SC stream dispatch. The two progress events (Xbar = crossbar lane, Cmn = memory-network lane) carry the per-progress sync-flag bump. None carry a TraceIdHeader.

ScStreamIssueFromCore — the gather/scatter dispatch (WIDENS on glc/gfc)

13 proto fields, 13 widths (one per field). The stream_opcode width grows 3→4 bits on the newer gens because the StreamOpcode enum gains the half-width-accumulate variants.

// DecodeScStreamIssueFromCore @0xf60c460 (vfc)
function DecodeScStreamIssueFromCore(view, started_out, entry):
    ... SkipBits(2); DecodeTraceHeader; entry[+0x28] = 88; ...   // glc 80 / gfc 79
    GetBits64(dec, 14, &pc);
    GetBits64(dec,  6, &extra_id);
    GetBits64(dec,  5, &sync_flag_id);
    GetBits64(dec,  1, &sync_flag_core_type);    // enum SyncFlagCoreType
    GetBits64(dec,  3, &stream_opcode);          // enum StreamOpcode (vfc 3-bit; glc/gfc 4-bit)
    GetBits64(dec,  1, &tile_local_memory_type); // enum
    GetBits64(dec,  3, &off_tile_memory_type);   // enum
    GetBits64(dec,  1, &tile_local_stream_type); // enum
    GetBits64(dec,  2, &off_tile_stream_type);   // enum
    GetBits64(dec,  1, &set_done_bit);           // bool
    GetBits64(dec,  1, &sync_flag_count_type);   // bool
    GetBits64(dec,  1, &indirect_list_type);     // enum
    GetBits64(dec, 18, &length_in_4B);           // uint32 (glc re-budgets to 17)
    CHECK(BitsDecoded() == 118);

| Field | Width (vfc unless noted) | Meaning | Confidence |
|---|---|---|---|
| pc | 14 | SC core PC at stream issue | HIGH |
| extra_id | 6 | correlation sub-id | HIGH |
| sync_flag_id | 5 | sync-flag the stream signals on completion | HIGH |
| sync_flag_core_type | 1 | target core class, TEC_OR_SCS/TAC enum | HIGH |
| stream_opcode | 3 (vfc) / 4 (glc,gfc) | GATHER/SCATTER family (see enum table) | CERTAIN |
| tile_local_memory_type | 1 | SMEM/TILESPMEM enum | HIGH |
| off_tile_memory_type | 3 | SPMEM/TILESPMEMN/HBM/HBM4B enum | HIGH |
| tile_local_stream_type | 1 | LINEAR/CIRCULARBUFFER enum | HIGH |
| off_tile_stream_type | 2 | LINEAR/STRIDED/INDIRECT/INDIRECTVREG enum | HIGH |
| set_done_bit | 1 | set the sync flag's done bit on completion | HIGH |
| sync_flag_count_type | 1 | count-mode of the sync-flag bump | MEDIUM |
| indirect_list_type | 1 | WORD/ROW indirection granularity enum | HIGH |
| length_in_4B | 18 (vfc,gfc) / 17 (glc) | stream length in 4-byte words | HIGH |

| Gen | widths | CHECK | pkts | stream_opcode | length_in_4B |
|---|---|---|---|---|---|
| vfc | 14,6,5,1,3,1,3,1,2,1,1,1,18 | 118 | 1 | 3-bit (8 values) | 18 |
| glc | 14,6,5,1,4,1,3,1,2,1,1,1,17 | 118 | 1 | 4-bit (11 values) | 17 |
| gfc | 14,6,5,1,4,1,3,1,2,1,1,1,18 | 119 | 1 | 4-bit (11 values) | 18 |

CONFIRMED (SC-3) — the per-gen drift is byte-exact. glc/gfc read GetBits64(4) for stream_opcode where vfc reads GetBits64(3) (verified at DecodeScStreamIssueFromCore @ 0xf63e300 for glc). glc absorbs the extra bit by narrowing length_in_4B 18→17 (keeping CHECK == 118); gfc keeps length_in_4B == 18 and lets the total grow to CHECK == 119. The enum growth that forces the widening is in the enum tables below.
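The per-gen drift touches only two of the thirteen widths, so it can be expressed as two parameters. The sketch below uses hypothetical helper names; the widths are the ones confirmed above, and the byte-length helper only multiplies the largest representable length_in_4B value by 4, following the "4-byte words" reading of the field.

```python
SC_GENS = ("vfc", "glc", "gfc")


def stream_issue_widths(gen):
    """Return the 13 payload read widths of ScStreamIssueFromCore for a gen."""
    if gen not in SC_GENS:
        raise ValueError(f"{gen} has no SC band")
    opcode_bits = 3 if gen == "vfc" else 4   # StreamOpcode grows on glc/gfc
    length_bits = 17 if gen == "glc" else 18  # glc narrows to keep CHECK at 118
    return [14, 6, 5, 1, opcode_bits, 1, 3, 1, 2, 1, 1, 1, length_bits]


def stream_issue_check(gen):
    """Total-bit CHECK: 61 framing+header bits plus the payload."""
    return 61 + sum(stream_issue_widths(gen))


def max_stream_bytes(gen):
    """Largest stream length the length_in_4B field can express, in bytes."""
    length_bits = stream_issue_widths(gen)[-1]
    return ((1 << length_bits) - 1) * 4


for gen in SC_GENS:
    print(gen, stream_issue_check(gen), max_stream_bytes(gen))
```

The printed CHECK values (118, 118, 119) match the per-gen table, and the byte limits show the practical cost of the glc re-budget: its largest expressible stream is roughly half that of vfc and gfc.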

ScStreamProgressXbar / ScStreamProgressCmn — the lane-progress events

Byte-identical wire shape; the only difference is the lane class — Xbar is the crossbar progress lane, Cmn is the memory-network progress lane.

// DecodeScStreamProgressXbar @0xf60c740 (vfc); DecodeScStreamProgressCmn @0xf60c940 identical shape
function DecodeScStreamProgress<Lane>(view, started_out, entry):
    ... SkipBits(2); DecodeTraceHeader; entry[+0x28] = 89/90; ...
    GetBits64(dec, 6, &extra_id);
    GetBits64(dec, 5, &sync_flag_id);
    GetBits64(dec, 1, &sync_flag_core_type);   // enum TEC_OR_SCS/TAC
    GetBits64(dec, 32, &data);
    GetBits64(dec, 1, &done);
    CHECK(BitsDecoded() == 106);
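
| Field | Width | Meaning | Confidence |
|---|---|---|---|
| extra_id | 6 | correlation sub-id | HIGH |
| sync_flag_id | 5 | the sync flag this progress event bumps | HIGH |
| sync_flag_core_type | 1 | TEC_OR_SCS/TAC enum | HIGH |
| data | 32 | progress count / lane payload word | HIGH |
| done | 1 | lane-drained flag | HIGH |

| Gen | widths | CHECK | pkts | oneof (Xbar / Cmn) |
|---|---|---|---|---|
| vfc | 6,5,1,32,1 | 106 | 1 | 89 / 90 |
| glc | 6,5,1,32,1 | 106 | 1 | 81 / 82 |
| gfc | 6,5,1,32,1 | 106 | 1 | 80 / 81 |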

45 payload + 61 = 106. The {sync_flag_id, sync_flag_core_type, done} triple is the stream's per-progress sync-flag bump — the producer/consumer occupancy signal the stream queue uses.


SC_MESSAGE_* Band — Inter-Tile Internal Messages (2 events)

Purpose

The SparseCore inter-tile / SCS↔SCT internal-message band: queue-occupancy, sync-update, and SMEM-update messages between tiles. Both Outbound and Inbound carry the 21,3,14 (38-bit) TraceIdHeader and share one wire shape per gen. These are the only SC events with a TraceIdHeader, and both are 2-packet events.

Algorithm

// DecodeScMessageOutboundInternalMessage @0xf60e6c0 (vfc) — 2-packet, mid CHECK 128, final 176
function DecodeScMessage<Dir>InternalMessage(view, started_out, entry):
    ... SkipBits(2); DecodeTraceHeader; ...                      // bit 61
    GetBits64(dec, 21, &transaction_id);   // TraceIdHeader f1
    GetBits64(dec,  3, &core_id);          // TraceIdHeader f2 (3-bit enum)
    GetBits64(dec, 14, &chip_id);          // TraceIdHeader f3 (14-bit — note: not 12)
    entry[+0x28] = 98;                      // glc 90 / gfc 90  (Inbound: vfc 99 / glc,gfc 91)
    GetBits64(dec,  6, &extra_id);
    GetBits64(dec,  5, &dest_tile_id);
    GetBits64(dec,  1, &dest_core_type);   // enum TEC_OR_SCS/TAC
    GetBits64(dec, 13, &sync_flag_id);
    GetBits64(dec,  4, &smem_address_lo);  // ── packet-boundary straddle (4+10) ──
    CHECK(BitsDecoded() == 128);           // packet-1 boundary; movq $0x20 (32 B)
    GetBits64(dec,  1, &straddle_a);
    GetBits64(dec,  1, &straddle_b);
    GetBits64(dec, 10, &smem_address_hi);
    GetBits64(dec,  1, &msg_type);         // enum SYNCUPDATE/SMEMUPDATE
    GetBits64(dec,  2, &opcode);           // enum WRITE/INC × NO_DONE/WITH_DONE
    GetBits64(dec, 32, &data);
    GetBits64(dec,  1, &done);
    CHECK(BitsDecoded() == 176);

| Field | Width | Meaning | Confidence |
|---|---|---|---|
| trace_id_header | 38 (21,3,14) | per-transaction identity (transaction_id, core_id, chip_id) | CERTAIN |
| extra_id | 6 | correlation sub-id | HIGH |
| dest_tile_id | 5 | target tile of the message | HIGH |
| dest_core_type | 1 | target core class, TEC_OR_SCS/TAC enum | HIGH |
| sync_flag_id | 13 | sync flag the message updates | HIGH |
| smem_address | 4+10 (straddle) | SMEM target address (spans the packet boundary) | HIGH |
| msg_type | 1 | SYNCUPDATE/SMEMUPDATE enum | HIGH |
| opcode | 2 | WRITE/INC × NO_DONE/WITH_DONE enum | HIGH |
| data | 32 | message payload word | HIGH |
| done | 1 | done flag | HIGH |

| Gen | TraceIdHeader | payload widths | mid CHECK | final CHECK | pkts | oneof (Out / In) |
|---|---|---|---|---|---|---|
| vfc | 21,3,14 | 6,5,1,13,4,1,1,10,1,2,32,1 | 128 | 176 | 2 | 98 / 99 |
| glc | 21,3,14 | 6,5,1,13,4,1,1,10,1,2,32,1 | 128 | 176 | 2 | 90 / 91 |
| gfc | 21,3,14 | 6,5,1,13,4,1,1,10,1,2,32,1 | 128 | 176 | 2 | 90 / 91 |

TraceIdHeader (38 bits) + payload incl. straddle (77 bits) = 115 bits; + 61 = 176. Byte-identical across all three SC gens. dest_tile_id + dest_core_type pick the target tile and core; msg_type + opcode carry the message semantics.


The SparseCore Selector Enum Value Tables

Read byte-exact from the *Values nested enums of each SC message in trace_entries.proto (descriptor pool: vfc @ 0xbf06830, glc @ 0xbf41210, gfc @ 0xbf64c80). These are the named values for the enum fields decoded above. vfc values shown; glc/gfc identical except StreamOpcodeValues, which gains 3 values (and a bit) on the newer gens.

| Enum | Width | Values |
|---|---|---|
| StreamOpcodeValues (vfc) | 3-bit | 0=GATHER, 1=GATHERADDS32, 2=GATHERADDF32, 4=SCATTER, 5=SCATTERADDS32, 6=SCATTERADDF32, 7=RESERVED (value 3 is a hole) |
| StreamOpcodeValues (glc/gfc) | 4-bit | adds 9=GATHERADDS16, 10=GATHERADDBF16, 13=SCATTERADDS16, 14=SCATTERADDBF16, 15=RESERVED |
| SyncFlagCoreTypeValues | 1-bit | 0=TEC_OR_SCS, 1=TAC |
| TileLocalMemoryTypeValues | 1-bit | 0=SMEM, 1=TILESPMEM |
| OffTileMemoryTypeValues | 3-bit | 0=SPMEM, 1=TILESPMEMN, 2=HBM, 3=HBM4B |
| TileLocalStreamTypeValues | 1-bit | 0=LINEAR, 1=CIRCULARBUFFER |
| OffTileStreamTypeValues | 2-bit | 0=LINEAR, 1=STRIDED, 2=INDIRECT, 3=INDIRECTVREG |
| IndirectListTypeValues | 1-bit | 0=WORD, 1=ROW |
| DestCoreTypeValues | 1-bit | 0=TEC_OR_SCS, 1=TAC |
| MsgTypeValues | 1-bit | 0=SYNCUPDATE, 1=SMEMUPDATE |
| OpcodeValues | 2-bit | 0=WRITE_NO_DONE, 1=WRITE_WITH_DONE, 2=INC_NO_DONE, 3=INC_WITH_DONE |

QUIRK — StreamOpcode is not densely packed. The GATHER family occupies 0..2, value 3 is a hole, the SCATTER family starts at 4 — so the gather/scatter dichotomy is encoded in the high bit, not a contiguous range. On glc/gfc the half-width-accumulate variants (+S16/+BF16) are NEW on Ghostlite/6acc60406 and slot in at 9,10,13,14, forcing the field to 4 bits. A reimplementation that assumes 0..n contiguity will misdecode SCATTER as a reserved opcode. The SyncFlagCoreType/DestCoreType TEC_OR_SCS/TAC split mirrors the SparseCore sequencer-type enum.
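A sparse lookup avoids the contiguity trap described above. In this sketch the function names are hypothetical, and the glc/gfc table is built by layering the added values on top of the vfc ones, which takes the "adds" wording of the table at face value; whether value 7 stays named on the newer gens is therefore an assumption. The bit-2 test in `is_scatter` is an observation over the listed values, not a documented encoding rule.

```python
STREAM_OPCODE_VFC = {
    0: "GATHER", 1: "GATHERADDS32", 2: "GATHERADDF32",
    4: "SCATTER", 5: "SCATTERADDS32", 6: "SCATTERADDF32",
    7: "RESERVED",
}
STREAM_OPCODE_ADDED = {
    9: "GATHERADDS16", 10: "GATHERADDBF16",
    13: "SCATTERADDS16", 14: "SCATTERADDBF16",
    15: "RESERVED",
}
# ASSUMPTION: glc/gfc keep every vfc value and add the new ones.
STREAM_OPCODE_NEWER = {**STREAM_OPCODE_VFC, **STREAM_OPCODE_ADDED}


def decode_stream_opcode(gen, raw):
    """Return the enum name for a raw field value, or None for a hole."""
    if gen not in ("vfc", "glc", "gfc"):
        raise ValueError(f"{gen} has no SC band")
    width = 3 if gen == "vfc" else 4
    if not 0 <= raw < (1 << width):
        raise ValueError(f"{raw} does not fit the {width}-bit field on {gen}")
    table = STREAM_OPCODE_VFC if gen == "vfc" else STREAM_OPCODE_NEWER
    return table.get(raw)


def is_scatter(raw):
    """Bit 2 separates SCATTER from GATHER in every named value listed above."""
    return bool(raw & 0b100)


print(decode_stream_opcode("vfc", 4))   # SCATTER
print(decode_stream_opcode("vfc", 3))   # None (hole)
print(decode_stream_opcode("glc", 13))  # SCATTERADDS16
```

Returning None for a hole, rather than raising, lets a trace viewer surface an unknown opcode without aborting the rest of the packet.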


The On-Wire-id → Trace-Point Map

The codec's two-level dispatch splits the dense on-wire trace_point_id space into a low band (primary jump table) and a high band (secondary jump table reached on primary overflow). The SC band lives entirely in the high band. The on-wire id is the value the hardware stamps and the decoder indexes; it is distinct from the proto oneof field number the handler writes to TraceEntry+0x28.

QUIRK — ScInstructionCoreInterrupt is on-wire id 108 but proto oneof field 75 (vfc). The dense on-wire id space (0..185 vfc / 0..211 glc / 0..206 gfc) is the decode key; the dense oneof field is the encode key. They count the same events but index differently — a captured device-trace ring stamps the on-wire id, so a decoder that drives off the oneof field cannot map a raw stream.

SC sub-range — on-wire id → event → oneof (per gen)

The SC band occupies a contiguous run in the high band. on-wire ids are identical across vfc/glc (108..132); gfc inserts a StatsCounterSampleIssuedFromScs at id 129, shifting the two ScMessage* events up by one (to 132/133).

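| Event | on-wire id (vfc/glc) | vfc oneof | glc oneof | gfc oneof | gfc on-wire id |
|---|---|---|---|---|---|
| ScInstructionCoreInterrupt | 108 | 75 | 67 | 66 | 108 |
| ScInstructionSetTracemark | 109 | 76 | 68 | 67 | 109 |
| ScInstructionTraceInstruction | 110 | 77 | 69 | 68 | 110 |
| ScInstructionSfenceStart | 111 | 78 | 70 | 69 | 111 |
| ScInstructionSfenceStop | 112 | 79 | 71 | 70 | 112 |
| ScInstructionSyncStart | 113 | 80 | 72 | 71 | 113 |
| ScInstructionSyncStop | 114 | 81 | 73 | 72 | 114 |
| ScInstructionBarrierStart | 115 | 82 | 74 | 73 | 115 |
| ScInstructionBarrierStop | 116 | 83 | 75 | 74 | 116 |
| ScInstructionSyncWatchStart | 117 | 84 | 76 | 75 | 117 |
| ScInstructionSyncWatchStop | 118 | 85 | 77 | 76 | 118 |
| ScTaskIssueFromScs | 119 | 86 | 78 | 77 | 119 |
| ScTaskCommitOnSct | 120 | 87 | 79 | 78 | 120 |
| ScStreamIssueFromCore | 121 | 88 | 80 | 79 | 121 |
| ScStreamProgressXbar | 122 | 89 | 81 | 80 | 122 |
| ScStreamProgressCmn | 123 | 90 | 82 | 81 | 123 |
| ScMessageOutboundInternalMessage | 131 | 98 | 90 | 90 | 132 |
| ScMessageInboundInternalMessage | 132 | 99 | 91 | 91 | 133 |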

The 7-id gap between ScStreamProgressCmn (123) and ScMessageOutbound (131) is filled by the SC-issued OCI descriptor/message events — on-wire ids 124..130 = OciDescriptorCommonIssuedBySc, OciDescriptorStride{Src,Dst,Steps}IssuedBySc, OciDescriptorAddressMiscIssuedFromSc, OciMessage{ReceivedBySc,SentBySc} (the SC's view of its own OCI traffic, owned by payload-uhi-oci-ici-dma.md). gfc additionally exposes a SparseCore PMU surface: StatsCounterSampleIssuedFromScs @ id 129, and StatsCounterSampleIssuedFrom{Sctd,Sctc} @ ids 134/135 (the SC TileData/TileCompute hardware perf-counter samples — gfc-only).
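The SC sub-range table can be generated from a handful of per-gen bases instead of being transcribed row by row. The sketch below is one hypothetical way to do that; `build_sc_map` and the constant names are illustrative, and every base value is taken from the table above.

```python
SC_RUN = [  # 16 events on consecutive on-wire ids starting at 108
    "ScInstructionCoreInterrupt", "ScInstructionSetTracemark",
    "ScInstructionTraceInstruction", "ScInstructionSfenceStart",
    "ScInstructionSfenceStop", "ScInstructionSyncStart",
    "ScInstructionSyncStop", "ScInstructionBarrierStart",
    "ScInstructionBarrierStop", "ScInstructionSyncWatchStart",
    "ScInstructionSyncWatchStop", "ScTaskIssueFromScs",
    "ScTaskCommitOnSct", "ScStreamIssueFromCore",
    "ScStreamProgressXbar", "ScStreamProgressCmn",
]
SC_MESSAGES = ["ScMessageOutboundInternalMessage",
               "ScMessageInboundInternalMessage"]

FIRST_WIRE_ID = 108
MESSAGE_WIRE_BASE = {"vfc": 131, "glc": 131, "gfc": 132}
ONEOF_BASE = {"vfc": 75, "glc": 67, "gfc": 66}
MESSAGE_ONEOF_BASE = {"vfc": 98, "glc": 90, "gfc": 90}


def build_sc_map(gen):
    """Return {on_wire_id: (event_name, oneof_field)} for one SC-bearing gen."""
    table = {}
    for i, name in enumerate(SC_RUN):
        table[FIRST_WIRE_ID + i] = (name, ONEOF_BASE[gen] + i)
    for i, name in enumerate(SC_MESSAGES):
        table[MESSAGE_WIRE_BASE[gen] + i] = (name, MESSAGE_ONEOF_BASE[gen] + i)
    return table


vfc_map = build_sc_map("vfc")
print(vfc_map[108])                # ('ScInstructionCoreInterrupt', 75)
print(build_sc_map("gfc")[133])    # ('ScMessageInboundInternalMessage', 91)
```

Keying the dictionary on the on-wire id, and carrying the oneof number only as a value, reflects the rule stated earlier: raw device streams are decoded by on-wire id, never by oneof field.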

Two-level dispatch parameters (the path to the SC band)

DecodeEntry(view):
  read framing(2) + trace_point_id(8)
  if id <= bound1:  goto *primary_jt [id]                  // low band
  else:             id2 = id - rebase                       // high band
                    if id2 <= bound2: goto *secondary_jt [id2]
                    else:             goto error

| Gen | DecodeEntry | primary jt | bound1 | rebase | secondary jt | bound2 | SC sub-range |
|---|---|---|---|---|---|---|---|
| vfc | 0xf5f7080 | 0xab86ce8 | 0x5f (95) | −0x60 (96) | 0xab86e68 | 0x59 (89) | 108..132 |
| glc | 0xf6295c0 | 0xab875a8 | 0x62 (98) | −0x63 (99) | 0xab87734 | 0x70 (112) | 108..132 |
| gfc | 0xf65ffe0 | 0xab87f20 | 0x64 (100) | −0x6c (108) | 0xab880b4 | 0x62 (98) | 108..133 |
| vlc | 0xf5d6460 | 0xab86520 | 0x8f (143) | −0x90 (144) | 0xab86760 | 0x17 (23) | none (no SC) |

Each secondary arm is a rel32 relative to the secondary table base; the arm is a thunk inside DecodeEntry that stamps the proto oneof tag and tail-calls the matching DecodeSc<Name>. The codec page owns the dispatch mechanism; this page owns the SC sub-range it routes to.
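The dispatch pseudocode and parameter table above can be exercised with a small router. This is a sketch under one stated assumption: the bound comparison on the rebased id is treated as unsigned (modelled here with a 32-bit mask), so ids that fall between bound1 and rebase are routed to the error path. The `route` function and `DISPATCH` table are hypothetical names; the constants are the ones tabulated above.

```python
# gen -> (bound1, rebase, bound2), from the two-level dispatch table
DISPATCH = {
    "vfc": (0x5F, 0x60, 0x59),
    "glc": (0x62, 0x63, 0x70),
    "gfc": (0x64, 0x6C, 0x62),
    "vlc": (0x8F, 0x90, 0x17),
}


def route(gen, trace_point_id):
    """Return ("primary", index), ("secondary", index) or ("error", None)."""
    bound1, rebase, bound2 = DISPATCH[gen]
    if trace_point_id <= bound1:
        return ("primary", trace_point_id)
    # ASSUMPTION: unsigned compare, so a negative difference wraps and fails.
    id2 = (trace_point_id - rebase) & 0xFFFFFFFF
    if id2 <= bound2:
        return ("secondary", id2)
    return ("error", None)


print(route("vfc", 108))  # ('secondary', 12)
print(route("gfc", 108))  # ('secondary', 0)
print(route("vfc", 186))  # ('error', None)
```

The highest id each gen accepts is rebase + bound2, which gives 185, 211, and 206 for vfc, glc, and gfc, matching the dense on-wire id ranges quoted earlier on this page.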

CONFIRMED (SC-4) — in the vfc DecodeEntry @ 0xf5f7080, the decompiler renders the two-level dispatch as nested switches — the outer switch bound at case 0x5f (primary), the inner switch rebased past case 0x60 reaching DecodeScInstructionCoreInterrupt. This is the secondary table the earlier pxc reading (single table only) labeled an "error label": on vfc/vlc/glc/gfc the overflow target is a second id-rebased jump table for the high id band (SC/throttle/MGR/CMNUR). pxc and vlc reach the secondary table but it holds no SC arms — pxc has the BarnaCore Bc_Fsm* band instead, vlc holds only VdqTransaction* and MGR OCI.


What Is Not Decoded Here

  • The exact uint64 reassembly bit order for the straddle fields, ScTaskCommitOnSct.tec_sync_stalls (7+9) and ScMessage*.smem_address (4+10): the width fragments, the +0xNN destination offset, and the packet-1 boundary CHECK == 128 are CERTAIN, but the precise LSB position each fragment occupies in the reassembled value is not tabulated (LOW; same open item as the UHI/OCI straddle fields).
  • The downstream SC scalar → XStat/XEvent mapping — which decoded SC field (num_spmem_words, tec_sync_stalls, tile_bitmap, stream_opcode) becomes which XStat vs is folded into the SparseCore XEvent name happens in the SparseCoreOverlaySubscriber, owned by trace-entry-to-xevent.md.
  • The gfc-only SC PMU sampling events (StatsCounterSampleIssuedFrom{Scs,Sctd,Sctc} @ ids 129/134/135) and the SC-issued OCI descriptor sub-band (ids 124..130) — adjacent to the SC band but not DecodeSc* events; their payloads are not decoded here.
  • The cause of the gfc ScTaskCommitOnSct restructure — the wire shape (9 fields, CHECK == 219) is CERTAIN; whether the dropped TAC counters reflect a unified-SC microarch change or a profiler schema rev was not cross-checked against the SC ISA (LOW).

| Component | Relationship |
|---|---|
| TraceEntriesCoder | the codec these SC payloads sit inside: 16-byte packet, 61-bit envelope, GetBits64/CHECK, two-level dispatch |
| TracePoints Master Registry | the full cross-band wire-id ↔ oneof-field registry; this page owns only the SC sub-range |
| Payload: UHI/OCI/ICI/DMA | the neighboring high-band events (SC-issued OCI ids 124..130) and the 21,3,12 TraceIdHeader the SC band varies to 21,3,14 |
| Payload: vfc/vlc/gfc | the per-family header deltas (6-bit block_id, 45-bit timestamp, 14-bit chip_id) the SC payloads inherit |
| SCS Engine | the sequencer that issues ScTaskIssueFromScs and runs the ScInstruction* primitives |
| Stream Gather/Scatter | the datapath the ScStream* events and StreamOpcode enum instrument |

Cross-References