TracePoints Master Registry

All addresses, ids, and offsets on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d). The binary is not stripped — full C++ symbols are present, and .text VMA equals file offset. Other versions will differ.

Abstract

This page is the master trace-point id→name registry for the libtpu on-device profiler: the table that maps every on-wire trace_point_id (the 8-bit TraceHeader field the codec decodes) to a human event name, the hardware band it belongs to, and the subscriber(s) that consume it. It is the index that the decode → XSpace fan-out keys off: when DecodeTraceBuffers walks a 16-byte packet, the wire trace_point_id it reads is only a number; the meaning of that number — what event it names, which device line it lands on, which begin/end tracker pairs it — lives here.

The registry is not a static table in the binary. It is built at runtime by ConvertTpuTraceToXPlaneV2<TraceEntry> (204 symbol references), which, on first sight of each on-chip core, runs a per-family CoreContext setup lambda that registers a fixed set of TraceEventSubscribers, each carrying an inline TracePoints<TraceEntry> int32 id-set. CoreDispatcher::RegisterSubscriber (one out-of-line symbol per chip family, 6 total) inserts each subscriber into a FlatHashMap<trace_point_id, vector<subscriber>>. The id→name mapping itself is the codec's per-event Decode<EventName>() jump-table — the master registry is the union of that name space with the subscriber routing the lambda installs over it. Reconstructing it byte-exact means reading both: the codec's 111-entry decode table for names, and each family's lambda for the id-set each subscriber selects on.

The id space is banded. The deepsea gens (pxc/vfc/vlc/glc/gfc) pack their trace_point_id into a gappy 0..255 space partitioned by hardware origin: UHI host-bridge, OCI on-chip interconnect, ICI inter-chip interconnect, DMA, the TensorCore sync/fence band at base 80, the SparseCore band at base 109, the throttle band at base 104/200, and the MGR/power band at base 160. The legacy jxc (jellyfish) gen uses a different 16-bit id namespace entirely. This page owns the per-band id→name/subscriber/tracker rollup; the per-band payload field decode is owned by the payload pages, and the id→name codec dispatch by TraceEntriesCoder.

For reimplementation, the contract is:

  • The band partition of the deepsea 0..255 trace_point_id space — which contiguous id range each hardware origin owns, and its band base.
  • The id→name mapping per band — the event name the codec's Decode<Name>() attaches to each id (the wire→human map every decode/translation step keys off).
  • The id→subscriber fan-out — the FlatHashMap routing the CoreContext lambda installs; the structural fact that one id can drive N subscribers (id 85 → 4).
  • The begin/end tracker match keys — which ids feed which stateful tracker (Step/Task/Overlay/Sync/Dma) and the field it pairs spans on.

| Element | Value |
|---|---|
| Registry builder | xprof::tpu::ConvertTpuTraceToXPlaneV2<TraceEntry>: pxc @0xf1d4360 (+5 families) |
| CoreContext setup lambda | pxc @0xf1eb2e0, gfc @0xf229100, glc @0xf214640, vfc @0xf201bc0, vlc @0xf1f5940, jxc @0xf1da7c0 |
| Routing structure | CoreDispatcher FlatHashMap<trace_point_id (u16 @ TraceHeader+0x18), vector<shared_ptr<TraceEventSubscriber>>> |
| Register call | CoreDispatcher::RegisterSubscriber(this, const TracePoints&, shared_ptr<sub>): pxc @0xf1ecee0 (+5) |
| Dispatch (drain) | CoreDispatcher::Dispatch @0xf1ed280; movzwl 0x18(%rax),%edx reads the id, call *0x10(vtable) fans out |
| Id→name source | per-event Decode<Name>() via the 111-entry rel32 jump table @0xab85bc0 (codec) |
| Subscriber count / family | pxc 8 · gfc 19 · glc 19 · vfc 17 · vlc 11 · jxc 10 |

At-a-glance — bands and their id ranges

| Band | Base / range (deepsea) | Origin | Subscribers | Payload page |
|---|---|---|---|---|
| UHI | OCI sub-channel (*UhiBridge) | Host bridge | (raw path) | uhi-oci-ici-dma |
| OCI | OciMessage* (20 events) | On-chip interconnect | (raw path) | uhi-oci-ici-dma |
| ICI | IciPacket* | Inter-chip interconnect | (raw path) | uhi-oci-ici-dma |
| DMA | CMQ/CMN/HDE req+data-end | DMA engines | DmaSubscriber (jxc) | uhi-oci-ici-dma |
| Sync/Fence | base 80 (80..90) | TensorCore | Sync + ScalarFence ×2 | sc-band / vfc-vlc-gfc |
| Throttle | base 104 (vfc/vlc) / 200 (gfc/glc) | Power mgmt | PowerThrottle | vfc-vlc-gfc |
| SparseCore | base 109 (109..120) | SparseCore | 6 SC subscribers | sc-band |
| MGR/Power | base 160 + 168/169 | MGR firmware | PState/Firmware/Spi | vfc-vlc-gfc |
| BarnaCore | pxc (no SC band) | pxc-only | (TC fan-out only) | jxc-legacy |
| jxc legacy | 16-bit jellyfish ns | jellyfish gen | HbmMux + Dma + 8 | jxc-legacy |

NOTE — "raw path" means the band has no dedicated subscriber in the deepsea CoreContext lambda. UHI/OCI/ICI ids fall through to the raw TraceEventSubscriber decimal-string XEvent path (the FastIntToBuffer name fallback) rather than a semantic subscriber. See § Unbound bands. The names below are still the authoritative codec decode names; the routing is what is absent.


How an id Becomes an Event

The registration mechanism

The registry is assembled lazily. ConvertTpuTraceToXPlaneV2<TraceEntry> spawns a worker thread and, the first time it sees a given ChipCoreId, runs the per-family CoreContext setup lambda ({lambda(shared_ptr<TraceEntryWrapper>)}::operator()). That lambda builds the per-core TpuXPlaneBuilder and then performs N RegisterSubscriber calls — N being the per-family subscriber count (8/19/19/17/11/10).

// CoreContext setup lambda — schematic of one register call
// (pxc @0xf1eb2e0; one of 8 such blocks)
function RegisterOne(dispatcher, kind, id_set[], n, line):
    sub = operator_new(...)                  // allocate the subscriber
    sub.vtable = SubscriberVtable[kind]       // or call its out-of-line ctor
    // most subscribers are wrapped in a ThreadedSubscriber (vtable + ClosureThread
    // worker @0xf1ede60 at +0xa0); HLO/SparseCore/Dma/HbmMux are NOT wrapped.
    tp = TracePoints<TraceEntry>()            // std::vector<int32> on the stack
    buf = _Znwm(4 * n)                        // {ptr@+0x0, size@+0x8, cap@+0x10}
    write_ids(buf, id_set, n)                 // movl/movabs/vmovaps id stores
    tp.size = n                               // movq $n, slot+0x8
    CoreDispatcher::RegisterSubscriber(dispatcher, tp, sub)  // @0xf1ecee0
    free(buf)                                 // tp freed after each call

RegisterSubscriber walks every id in the TracePoints set and, for each, appends sub to FlatHashMap[id]. This is what makes the registry a multimap: registering two subscribers with overlapping id-sets means a single wire id resolves to a vector of subscribers, and at drain every one of them runs.
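The multimap behavior described above can be sketched in a few lines of Python. This is a toy model, not the binary's data structure: the class and method names are hypothetical stand-ins, subscribers are represented by plain strings, and the id-sets are copied from the pxc roster on this page. Id 12 is an arbitrary value chosen only to show a miss.

```python
from collections import defaultdict


class CoreDispatcher:
    """Toy stand-in for the id -> [subscriber, ...] multimap (hypothetical API)."""

    def __init__(self):
        self.routes = defaultdict(list)

    def register_subscriber(self, trace_points, subscriber):
        # Append the subscriber under every id in its TracePoints set.
        for tp_id in trace_points:
            self.routes[tp_id].append(subscriber)

    def dispatch(self, tp_id):
        # Return every subscriber registered for this id; empty list on a miss.
        return list(self.routes.get(tp_id, []))


# pxc registrations, in roster order, as listed on this page.
PXC_REGISTRATIONS = [
    ("SyncSubscriber", [80, 81, 82, 86, 87, 88]),
    ("ScalarFenceSubscriber(line 9)", [89, 90]),
    ("TensorCoreStepSubscriber", [84]),
    ("TensorCoreHloSubscriber", [85]),
    ("TensorCoreOverlaySubscriber", [85]),
    ("TensorCoreOnDeviceTraceMeSubscriber", [85]),
    ("LloOpEventSubscriber", [85]),
    ("ScalarFenceSubscriber(line 62)", [89, 90]),
]

dispatcher = CoreDispatcher()
for name, ids in PXC_REGISTRATIONS:
    dispatcher.register_subscriber(ids, name)

fanout_85 = dispatcher.dispatch(85)   # four subscribers share id 85
fanout_89 = dispatcher.dispatch(89)   # both ScalarFence instances
unbound = dispatcher.dispatch(12)     # arbitrary id with no registration
```

Keeping a list per id, rather than a single value, is what lets overlapping id-sets coexist without one registration overwriting another.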

QUIRK — the id-set is not a property of the codec. The codec knows how to decode id 85 into a TraceEntry; it has no idea that on a deepsea gen id 85 must drive four different consumers. The decode name space (111 entries) and the routing (8..19 subscribers over a handful of ids) are two independent structures, joined only here. A reimplementation that builds the routing off the decode table — assuming "one decoder, one consumer" — is wrong for every fan-out id.

The id write encodings

The inline TracePoints int32 buffer is filled with one of three x86 store idioms, depending on set size; sets larger than four ids combine a base store with per-id growth copies. Reading these stores back is how each id-set below was recovered byte-exact:

| Set size | Encoding | Example |
|---|---|---|
| 1 | movl $id,(buf) | movl $0x54,… → {84} (pxc reg3, @0xf1ec331) |
| 2 | movabs $0xHI00000000LO,%rcx; mov %rcx,(buf) | movabs $0x5a00000059 → {89,90} (@0xf1ec0be) |
| 4 | vmovaps rodata,%xmm0; vmovups %xmm0,(buf) | @0xa2d6b40 → {81,82,88,87} (pxc sync) |
| >4 | a 4-id vmovaps base + per-id movl $id,off(newbuf) growth copies | SC-Syncs {111..116}, size 6 (movq $0x6 @0xf22af00) |

The 4-id sync vector lives in .rodata @0xa2d6b40 and decodes to {81,82,88,87}; the remaining two sync ids {86,80} are appended via movabs $0x5000000056. The SpiSampler pair {168,169} is movabs $0xa9000000a8.
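The movabs immediates quoted above can be checked mechanically. The sketch below assumes only what the text states: the buffer holds little-endian int32 ids, so a 64-bit immediate stored at the buffer start carries two ids, low half first. The helper name is hypothetical.

```python
import struct


def ids_from_movabs(imm64):
    """Split a 64-bit immediate, stored little-endian, into two int32 ids."""
    # Packing as an unsigned 64-bit value and unpacking as two signed 32-bit
    # values mirrors a single 8-byte store over an int32 array.
    return struct.unpack("<ii", struct.pack("<Q", imm64))


scalar_fence_ids = ids_from_movabs(0x5A00000059)   # pxc ScalarFence set
sync_tail_ids = ids_from_movabs(0x5000000056)      # ids appended to the sync set
spi_sampler_ids = ids_from_movabs(0xA9000000A8)    # SpiSampler pair
```

The low 32 bits land at the lower address, which is why the pair reads in the order (low, high).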

The dispatch (drain) path

At drain, CoreDispatcher::Dispatch @0xf1ed280 reads the wire id with movzwl 0x18(%rax),%edx @0xf1ed2c4, probes the FlatHashMap, and for each registered subscriber calls *0x10(vtable) (the subscriber's ProcessTraceEntry) from the call sites @0xf1ed422/43c/498/4de. A miss, meaning an id that no subscriber registered, is the path the unbound bands take.
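As a rough model of the drain step, the sketch below reads a zero-extended 16-bit id at byte offset 0x18 (the movzwl load) and fans out to every registered subscriber. Everything else is an assumption made for illustration: the 0x20-byte entry size, the use of a plain dict for the map, and subscribers modeled as Python callables.

```python
import struct

ID_OFFSET = 0x18  # offset of the id, taken from movzwl 0x18(%rax),%edx


def read_trace_point_id(entry):
    """Zero-extended little-endian u16 load at ID_OFFSET."""
    (tp_id,) = struct.unpack_from("<H", entry, ID_OFFSET)
    return tp_id


def dispatch(routes, entry):
    """Return (id, results) where results holds one value per subscriber."""
    tp_id = read_trace_point_id(entry)
    subscribers = routes.get(tp_id, [])
    # A miss leaves the list empty; what the real code does then is unresolved.
    return tp_id, [subscriber(entry) for subscriber in subscribers]


# Hypothetical entry buffer: 0x20 bytes, with id 85 written at +0x18.
entry = bytearray(0x20)
struct.pack_into("<H", entry, ID_OFFSET, 85)

routes = {85: [lambda e: "hlo", lambda e: "overlay"]}
hit = dispatch(routes, bytes(entry))

struct.pack_into("<H", entry, ID_OFFSET, 12)
miss = dispatch(routes, bytes(entry))
```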


The Sync / Fence Band (base 80)

Purpose

The TensorCore sync-flag and scalar-fence band: the 80..90 id range carrying every sync-flag update/read/wait and scalar-fence start/end. This is the only band with a literal in-body selection mask (0x1c7), and the only one whose pairing semantics (wait BEGIN → wait END) are decoded directly here.

Registry

| id | event name (codec) | subscriber(s) | tracker / key |
|---|---|---|---|
| 80 | TCS_EXTERNAL_SYNC_FLAG_UPDATE_DMA_DONE | SyncSubscriber | SyncTracker (END, DMA-done) |
| 81 | TCS_INTERNAL_SET_SYNC_FLAG | SyncSubscriber | (instant) |
| 82 | TCS_INTERNAL_ADD_SYNC_FLAG | SyncSubscriber | (instant) |
| 84 | TCS_INTERNAL_SET_TRACEMARK | TensorCoreStepSubscriber | StepTracker (TraceMark id) |
| 85 | TCS_INTERNAL_TRACE_INSTRUCTION | Hlo + Overlay + OnDeviceTraceMe + LloOp | OverlayTracker (overlay_id) |
| 86 | TCS_INTERNAL_UNSUCCESSFUL_SYNC_ATTEMPT | SyncSubscriber | SyncTracker (wait BEGIN) |
| 87 | TCS_INTERNAL_SUCCESSFUL_SYNC_ATTEMPT | SyncSubscriber | (instant; not tracker-paired in pxc) |
| 88 | TCS_INTERNAL_READ_SYNC_FLAG | SyncSubscriber | (instant) |
| 89 | TCS_INTERNAL_SCALAR_FENCE_START | ScalarFenceSubscriber ×2 | fence span (lines 9 + 62) |
| 90 | TCS_INTERNAL_SCALAR_FENCE_END | ScalarFenceSubscriber ×2 | fence span |

Two further codec names exist in this neighborhood but are not bound by any deepsea lambda subscriber: TCS_INTERNAL_CORE_INTERRUPT and TCS_INTERNAL_HOST_INTERRUPT. Their decode names are confirmed in .rodata, but no routing is installed for them (unbound, HIGH).

The 0x1c7 selection mask

SyncSubscriber::ProcessTraceEntry (pxc @0xf1eeee0) loads mov $0x1c7,%esi @0xf1eef3d and tests (1 << (id - 80)) against it. The band base is 80, so:

0x1c7 = 0b1_1100_0111 = bits {0,1,2,6,7,8} = ids {80,81,82,86,87,88}

This is the canonical band-base encoding: a sync trace_point_id's "selection-mask bit" is (1 << (id - 80)). It excludes 84 (tracemark → StepTracker), 85 (trace-instruction → 4-way fan-out), and 89/90 (scalar fence → ScalarFence), which are routed by their own registered subscribers.
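The mask arithmetic can be reproduced directly. The sketch below expands 0x1c7 against band base 80 and tests individual ids. The function names are hypothetical, and the range guard in is_selected is an assumption added so that out-of-band ids are rejected cleanly; the text only confirms the (1 << (id - 80)) test itself.

```python
SYNC_BAND_BASE = 80
SYNC_MASK = 0x1C7


def selected_ids(mask, base):
    """Expand a band-relative bitmask into the absolute ids it selects."""
    return [base + bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def is_selected(tp_id, mask=SYNC_MASK, base=SYNC_BAND_BASE):
    """True when bit (tp_id - base) is set in the mask."""
    bit = tp_id - base
    # Guard (assumed): ids below the base or past the mask width never match.
    if bit < 0 or bit >= mask.bit_length():
        return False
    return bool((mask >> bit) & 1)


sync_ids = selected_ids(SYNC_MASK, SYNC_BAND_BASE)
excluded = [tp_id for tp_id in range(80, 91) if not is_selected(tp_id)]
```

Here sync_ids matches the registered SyncSubscriber set, and excluded contains the tracemark, trace-instruction, and scalar-fence ids along with the unused id 83.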

GOTCHA — the 0x1c7 mask is in addition to the FlatHashMap routing, not instead of it. SyncSubscriber is registered for {80,81,82,86,87,88} (the mask set) and the dispatch already filters to those ids; the in-body mask is a second, redundant-looking filter. Only Sync was confirmed to carry such an in-body literal mask. Every other subscriber routes purely by its registered set — the FlatHashMap is the authoritative selection structure. Do not assume other bands have a hidden bitmask (LOW that they do).


The SparseCore Band (base 109)

Purpose

The SparseCore instruction/task/sync band: ids 109..120 carrying SC tracemarks, trace-instructions, sfence/sync/barrier start-stop pairs, and the task issue→commit span. Present on gfc/glc/vfc; absent on pxc and vlc.

NOTE — the subscriber-bound SC ids are 109..120, but the full SC codec band starts one lower at id 108 (ScInstructionCoreInterrupt, unbound — no deepsea lambda subscriber routes it) and extends through id 132/133 plus the gfc-only PMU samples at 134/135. "Base 109" here is the lowest routed id, not the band floor; the complete 108..133/135 on-wire SC sub-range (and the SC-issued OCI events at 124..130 that sit inside it) is owned by payload-sc-band.md.

Registry

| id | event name (codec) | subscriber(s) | tracker / key |
|---|---|---|---|
| 109 | ScInstructionSetTracemark | SC-Hlo + SC-Task + SC-Step | StepTracker (SC TraceMark) |
| 110 | ScInstructionTraceInstruction | SC-Hlo + SC-Task + SC-Overlay + SC-OnDeviceTraceMe | OverlayTracker (SC overlay_id) |
| 111 | ScInstructionSfenceStart | SparseCoreSyncsSubscriber | sfence span |
| 112 | ScInstructionSfenceStop | SparseCoreSyncsSubscriber | sfence span |
| 113 | ScInstructionSyncStart | SparseCoreSyncsSubscriber | sync span |
| 114 | ScInstructionSyncStop | SparseCoreSyncsSubscriber | sync span |
| 115 | ScInstructionBarrierStart | SparseCoreSyncsSubscriber | barrier span |
| 116 | ScInstructionBarrierStop | SparseCoreSyncsSubscriber | barrier span |
| 119 | ScTaskIssueFromScs | SC-Hlo + SC-Task | TaskTracker (task tag, ISSUE) |
| 120 | ScTaskCommitOnSct | SC-Hlo + SC-Task | TaskTracker (task tag, COMMIT) |

Adjacent codec names not bound by the SC lambda subscribers: ScInstructionSyncWatchStart / ScInstructionSyncWatchStop and ScInstructionCoreInterrupt (decode names present; (unbound, HIGH)). The SC-Syncs subscriber's id-set is written movabs $0x7200000071 {113,114} then grown +0x73{115}+0x74{116}+0x6f{111}+0x70{112}, size 6 (movq $0x6,-0x110 @0xf22af00).

Subscriber roster (gfc/glc/vfc)

| # | subscriber | TracePoints set | XLine | role |
|---|---|---|---|---|
| 5 | SparseCoreHloSubscriber | {109,110,119,120} | | SC XLA-op name |
| 6 | SparseCoreTaskSubscriber | {109,110,119,120} | | → TaskTracker |
| 7 | SparseCoreOverlaySubscriber | {110} | 142 SC Overlay | → OverlayTracker |
| 8 | SparseCoreOnDeviceTraceMeSubscriber | {110} | 100 SC TraceMe | TraceMe span |
| 9 | SparseCoreStepSubscriber | {109} | 117 SC Steps | → StepTracker |
| 10 | SparseCoreSyncsSubscriber | {111,112,113,114,115,116} | 67 SC Syncs | sfence/sync/barrier |

QUIRK — ids 109 and 110 each fan out to a different set of subscribers. 109 (tracemark) hits Hlo+Task+Step; 110 (trace-instruction) hits Hlo+Task+Overlay+TraceMe. The Task subscriber listens on the full {109,110,119,120} but only the 119/120 pair actually pairs into a span — 109/110 reach it for completeness. This is the SC analog of the TensorCore 85 fan-out.
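The per-id fan-out follows mechanically from the roster by inverting it. The sketch below does that inversion for the six SC subscribers, using the id-sets listed in the roster; the function name and the dict representation are illustrative choices, not structures from the binary.

```python
from collections import defaultdict

# SC subscriber id-sets in roster order (#5..#10), as listed on this page.
SC_ROSTER = {
    "SparseCoreHloSubscriber": [109, 110, 119, 120],
    "SparseCoreTaskSubscriber": [109, 110, 119, 120],
    "SparseCoreOverlaySubscriber": [110],
    "SparseCoreOnDeviceTraceMeSubscriber": [110],
    "SparseCoreStepSubscriber": [109],
    "SparseCoreSyncsSubscriber": [111, 112, 113, 114, 115, 116],
}


def invert_roster(roster):
    """Turn subscriber -> ids into id -> [subscribers], preserving roster order."""
    by_id = defaultdict(list)
    for subscriber, ids in roster.items():
        for tp_id in ids:
            by_id[tp_id].append(subscriber)
    return dict(by_id)


SC_FANOUT = invert_roster(SC_ROSTER)
routed_ids = sorted(SC_FANOUT)
```

Ids 117 and 118 do not appear in routed_ids, which reflects the gap in the routed range between the sync ids and the task pair.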


The OCI / ICI / UHI Bands (interconnect)

Purpose

The on-chip (OCI), inter-chip (ICI), and host-bridge (UHI) interconnect bands. These carry the bulk of the on-wire event volume — message issue/receive across engines, ICI link packet flow, and UHI host-bridge traffic — but have no dedicated semantic subscriber in the deepsea CoreContext lambda.

Registry (representative — OCI message events)

The OCI band's decode names form a 20-event OciMessage* family naming each hop of an on-chip message:

| event name (codec) | meaning |
|---|---|
| OciMessageIssuedFromTcs | message issued from TensorCore sequencer |
| OciMessageIssuedFromMgr | message issued from MGR firmware |
| OciMessageMsgIssuedFromEngine / ...FromQnm | engine / QNM issue |
| OciMessageSentByBc / ...BySc / ...ByCmnde / ...ByHde | egress per source block |
| OciMessageSentByUhiBridge / OciMessageReceivedByUhiBridge | UHI band: host-bridge hop |
| OciMessageReceivedByBc / ...BySc / ...ByMgr | ingress per dest block |
| OciMessageGeneratedInIcrEgressDma / ...IngressDma | ICR DMA-bridge generation |
| OciMessagePacketReceivedInIcr / OciMessagePacketSentToOci | OCI/ICR boundary |
| OciMessageSyncFlagUpdateFromDstEngine | sync-flag update carried over OCI |

ICI link flow uses the parallel IciPacket* decode family: IciPacketControlPacketInjectedByIcrDmaBridge / ...QueuedForLocalIngress / ...ReceivedByIcrDmaBridge, the matching IciPacketDataPacket* trio, and the link-side IciPacketPacketQueuedForLinkTransmission / ...ReceivedOnLinkInput / ...TransmittedOnLinkOutput.

The UHI band is not a separate id range — it is the subset of OCI/ICR events naming the UHI host bridge (*UhiBridge). The UhiBridge symbol is present; the UHI events ride the OCI codec.

NOTE — the per-id numeric values for the OCI/ICI/UHI bands are owned by the codec decode table (the gappy 0..255 jump-table index). This page asserts the names and the band membership, both byte-confirmed from .rodata; the exact id→name numeric pairing for the interconnect bands lives on the codec page as the jump-table indices.

Unbound bands — UHI/OCI/ICI/DMA on deepsea

The CoreContext lambda registers only the ~8..19 semantic subscribers (Sync/Step/Hlo/Overlay/TraceMe/Llo/ScalarFence/SparseCore*/Firmware/Throttle/Spi). The dominant interconnect and DMA bands have no entry in the FlatHashMap, so on the deepsea gens they fall through to the raw TraceEventSubscriber path: the event name becomes a FastIntToBuffer decimal string of the id, emitted as a generic XEvent — or the id is dropped. Which one, and whether a separate DmaSubscriber is registered on the deepsea gens (it is present only for jxc), was not resolved (LOW). This is the single largest completeness gap in the master registry: the bands with the most on-wire traffic are the least semantically bound.


The DMA Band

Purpose

The DMA started/completed band: CMQ/CMN/HDE descriptor request + data-end events, paired by dma_id. On jxc this is a first-class DmaSubscriber; on deepsea it is unbound (above).

Registry

| element | value |
|---|---|
| request decode names | CmnDmaRequestEastSideLane / ...WestSideLane / CmnDmaRequestSet |
| DMA engines | CMQ (queue), CMN/CMNDE (notify), HDE (host) |
| pairing key | dma_id = GetDmaId(); first descriptor = BEGIN, data-end = END |
| jxc subscriber | DmaSubscriber: routes on MemoryCommand()/GetDmaId(), not a static id-set |
| jxc HBM mux | HbmMuxSubscriber: HBM-mux band, routes on MemoryCommand |

GOTCHA — the jxc DmaSubscriber and HbmMuxSubscriber do not register a TracePoints int32 set at all. They route dynamically on the decoded MemoryCommand() accessor, not on the wire trace_point_id. A reimplementation that expects every subscriber to carry an id-set will mis-model these two — they are the exception to the registration mechanism.


The Throttle Band (base 104 / 200)

Purpose

The power-management cycle-skip band: throttle events keyed by a RunLengthTracker<unsigned int> that coalesces consecutive cycle-skip samples into runs. The band base is gen-dependent — 104 on vfc/vlc, 200 on gfc/glc.

Registry

| id | event | subscriber |
|---|---|---|
| 104 | ThrottleCycleSkip band base (vfc/vlc) | PowerThrottleSubscriber |
| 200 | ThrottleCycleSkip band base (gfc/glc) | PowerThrottleSubscriber |

The deepsea throttle decode names form a deep ThrottleCycleSkip* taxonomy by brake source: ...ElectricalBrakeEventCycleSkipLdidtBrake, ...ElectricalDroopEvent...LdidtDroop, ...ExternalBrakeEvent...ExtBrake, ...ExternalThrottleEvent...ExtThrottle, and the PPM family (...PpmBrakeEventDidtAggressive/Nominal, ...OvershootAggressive/Nominal, ...SustainedAggressive/Nominal, plus the rising/falling-edge variants). The RunLengthTracker per-sample accumulation key (which thermal/power counter line each run targets) was not decoded (LOW). The vfc base is written movl $0x68 = 104 @0xf2027ae; gfc movl $0xc8 = 200 @0xf229cee.
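For readers unfamiliar with run-length coalescing, the sketch below shows the general technique only. It is not a reconstruction of RunLengthTracker: the sample shape (timestamp, value), the output tuple, and the choice to group on equal consecutive values are all assumptions, since the real accumulation key was not decoded.

```python
from itertools import groupby


def coalesce_runs(samples):
    """Collapse consecutive samples with equal values into runs.

    samples: iterable of (timestamp, value) in time order (assumed shape).
    Returns a list of (start_ts, end_ts, value, sample_count) tuples.
    """
    runs = []
    for value, group in groupby(samples, key=lambda sample: sample[1]):
        members = list(group)
        runs.append((members[0][0], members[-1][0], value, len(members)))
    return runs


# Hypothetical cycle-skip samples: two equal readings, then a change, then back.
example_runs = coalesce_runs([(0, 1), (1, 1), (2, 0), (3, 1)])
```

Grouping only adjacent equal values, rather than all equal values, is what keeps a run that resumes later as a separate span.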


The MGR / Power Band (base 160)

Purpose

The MGR firmware power/p-state band: the 160 base plus the 168/169 SPI sampler pair. These subscribers wrap RunLengthTracker<double> (firmware) / <...EventBuilder> (SPI) accumulators rather than pairing begin/end spans.

Registry

| id | event | subscriber(s) |
|---|---|---|
| 160 | MgrFwEvent / MGR band base | PStateTrackerSubscriber + FirmwareSubscriber (+ 2nd component FW on gfc/glc) |
| 168 | SPI sampler (gfc/glc) | SpiSamplerSubscriber |
| 169 | SPI sampler (gfc/glc) | SpiSamplerSubscriber |

The 2nd FirmwareSubscriber (gfc/glc only) wraps RunLengthTracker<FirmwareComponentEventBuilder> and emits per-component power lines (kComponents 120..130 / 134..138). The SpiSamplerSubscriber wraps RunLengthTracker<PowerSpiSamplerEventBuilder> (both the SpiSamplerSubscriber and PowerSpiSamplerEventBuilder symbols are confirmed), emits to lines 118/119, and registers the id-set {168,169} (movabs $0xa9000000a8 @0xf22c3aa). vfc/vlc have neither the 2nd Firmware nor the SPI sampler. The MGR base is written movl $0xa0 = 160.


The pxc (BarnaCore) Gen

Purpose

pxc (pufferfish) is the BarnaCore gen — the simplest deepsea family, 8 subscribers, with no SparseCore band and no power/throttle subscribers. Its high-id space holds only the TC sync/fence/instruction bands.

Registry

| # | subscriber | TracePoints | XLine | tracker/key |
|---|---|---|---|---|
| 1 | SyncSubscriber (threaded) | {80,81,82,86,87,88} mask 0x1c7 | 17 | sync_flag_number |
| 2 | ScalarFenceSubscriber (threaded) | {89,90} | 9 Scalar Unit | fence span |
| 3 | TensorCoreStepSubscriber (threaded) | {84} | 1 Steps | TraceMark id |
| 4 | TensorCoreHloSubscriber | {85} | 3 XLA Ops | (HLO name) |
| 5 | TensorCoreOverlaySubscriber (threaded) | {85} | 7 TC Overlay | overlay_id |
| 6 | TensorCoreOnDeviceTraceMeSubscriber | {85} | 6 XLA TraceMe | TraceMe span |
| 7 | LloOpEventSubscriber (threaded) | {85} | 8 Tensor Core | (LLO op) |
| 8 | ScalarFenceSubscriber (threaded) | {89,90} | 62 Barna Core Fence | fence span |

The 4-way fan-out of id 85 (TCS_INTERNAL_TRACE_INSTRUCTION) is most visible here: subscribers 4/5/6/7 all register {85}. Their shared -0xc0 {85} stack buffer is allocated once and freed once (@0xf1ecafb). Subscribers 2 and 8 are both ScalarFenceSubscriber on {89,90} but emit to different device lines (9 Scalar Unit vs 62 Barna Core Fence) — distinguished by the +0x18 line field (movl $0x9,0x18 vs movl $0x3e,0x18 = 62).


The jxc (jellyfish) Legacy Gen

Purpose

jxc (jellyfish) is the legacy PerformanceTraceEntry gen — the only family on the old codec, with a 16-bit id namespace distinct from the deepsea 0..255 band, and the only gen with HbmMux + Dma subscribers. 10 subscribers (lambda @0xf1da7c0).

Registry (jellyfish 16-bit ids)

| # | subscriber | TracePoints (jellyfish ids) |
|---|---|---|
| 1 | HbmMuxSubscriber (threaded) | (HBM-mux band; routes on MemoryCommand) |
| 2 | DmaSubscriber | (routes on MemoryCommand()/GetDmaId()) |
| 3 | SyncSubscriber (threaded) | {0xa3d,0xa3e,0xa42,0xa43,0xa44,0x93c} |
| 4 | ScalarFenceSubscriber (threaded) | {0xa45,0xa46} = {2629,2630} |
| 5 | TensorCoreStepSubscriber (threaded) | (TC SetTracemark, jellyfish id) |
| 6 | TensorCoreHloSubscriber | (TC TraceInstruction, jellyfish id) |
| 7 | TensorCoreOverlaySubscriber (threaded) | (same) |
| 8 | TensorCoreOnDeviceTraceMeSubscriber | (same) |
| 9 | LloOpEventSubscriber (threaded) | (same) |
| 10 | ScalarFenceSubscriber (threaded) | {0xa45,0xa46} |

The sync id-set is packed movabs $0xa430a440a3e0a3d @0xf1db880 (16-bit halves 0xa3d/0xa3e/0xa44/0xa43) + movl $0x93c0a42 ({0xa42,0x93c}). The ScalarFence set is movl $0xa460a45 ({0xa45,0xa46}). The TC single-id sets (subscribers 5..9) and the HbmMux band were not individually enumerated (LOW); the jellyfish-id → event-name cross-walk is owned by the jxc legacy payload page. jellyfish::TraceOperand and PerformanceTraceEntry symbols confirmed present.
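The 16-bit packing quoted above can be unpacked with simple shifts. The sketch below assumes little-endian stores, so the lowest 16 bits of each immediate are the first id in memory; the helper name is hypothetical.

```python
def halves16(imm, count):
    """Split an immediate into `count` 16-bit ids, lowest half first."""
    return [(imm >> (16 * index)) & 0xFFFF for index in range(count)]


# movabs immediate (four ids) followed by the movl immediate (two ids).
jxc_sync_ids = halves16(0xA430A440A3E0A3D, 4) + halves16(0x93C0A42, 2)

# movl immediate for the ScalarFence set.
jxc_fence_ids = halves16(0xA460A45, 2)
```

The resulting lists are in store order, so jxc_sync_ids is not sorted; compare it as a set against the registry row for subscriber 3.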

QUIRK — jxc subscribers 3/4/10 reuse the same C++ subscriber types as the deepsea gens (SyncSubscriber, ScalarFenceSubscriber) but feed them ids from a completely different namespace. The subscriber type is portable; the id-set is not. A cross-gen routing table that keys on the deepsea band bases (80/109/160) will silently mis-route every jxc event — the only common ground is the subscriber taxonomy, not the ids.


The Five Stateful Trackers — Match Keys

The trackers are the begin/end pairing layer above the subscribers. Each consumes a fixed id subset and pairs spans on one field, in the same way that SyncTracker pairs on sync_flag_number and DmaSubscriber pairs on dma_id.

| tracker | feeds (trace_point_ids) | begin event | end event | MATCH KEY |
|---|---|---|---|---|
| SyncTracker | 80,86 (+81,82,87,88 instant) | 86 UNSUCCESSFUL_SYNC (ProcessSyncBlock) | 80 DMA_DONE (ProcessSyncUnblock) | sync_flag_number |
| DmaSubscriber | CMQ/CMN/HDE req + data-end | First()/MemoryCommand | Last()/MemoryDataEnd | dma_id = GetDmaId() |
| StepTracker | 84 (TC) / 109 (SC) | TraceMark 0x7fffffff | TraceMark 0x7ffffffe | TraceMark id (state+0x8) |
| TaskTracker | 119,120 (SC only) | 119 ScTaskIssueFromScs | 120 ScTaskCommitOnSct | task tag (FlatHashMap) |
| OverlayTracker | 85 (TC) / 110 (SC) | operand kind 0xd (open) | operand kind 0x9 (close) | overlay_id (state+0x8) |

StepTracker — TraceMark id, sentinel-discriminated

StepTracker::ProcessTraceEntry<gfc> @0xf231900 reads the id and dispatches by core type: TC (id 0x54=84, marker from TcsInternalSetTracemark_globals_+0x18) or SC (id 0x6d=109, from ScInstructionSetTracemark_globals_+0x18). It builds a TraceMarkEntry{+0x0 step_id, +0x8 mark_type, +0x10 gtc, +0x18 flag} and calls the shared StepTracker::ProcessTraceEntry @0xf2c4480. Mark-type sentinels (cmp @0xf2c44bd/4f1/4513/4580/4589):

0x7fffffff  step-BEGIN marker  → close any open step, open a new one keyed by the marker id
0x7ffffffe  step-END marker     → close the open step, emit StepInfo{step_id, start, end}
0x7ffffff9  intra-step marker   → annotate the open step without closing it

State struct: +0x8 = current step id (the match key), +0x10 = start gtc, +0x28 = flag, +0x30 = present.
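The sentinel-driven state machine can be modeled as below. This is a sketch under stated assumptions rather than the decompiled logic: the class shape and field names are hypothetical, a step that is still open when a new BEGIN arrives is assumed to be emitted with the new marker's gtc as its end, and intra-step markers are assumed to be recorded as a list of timestamps.

```python
from dataclasses import dataclass, field
from typing import Optional

STEP_BEGIN = 0x7FFFFFFF
STEP_END = 0x7FFFFFFE
STEP_INTRA = 0x7FFFFFF9


@dataclass
class StepTracker:
    step_id: Optional[int] = None          # match key (state+0x8 in the text)
    start_gtc: int = 0                     # start timestamp (state+0x10)
    notes: list = field(default_factory=list)
    emitted: list = field(default_factory=list)

    def _close(self, gtc):
        # Emit the open step, if any, then reset to the idle state.
        if self.step_id is not None:
            self.emitted.append(
                {"step_id": self.step_id, "start": self.start_gtc,
                 "end": gtc, "notes": self.notes}
            )
        self.step_id = None
        self.notes = []

    def process(self, step_id, mark_type, gtc):
        if mark_type == STEP_BEGIN:
            self._close(gtc)               # assumed: an unclosed step ends here
            self.step_id, self.start_gtc = step_id, gtc
        elif mark_type == STEP_END:
            self._close(gtc)
        elif mark_type == STEP_INTRA and self.step_id is not None:
            self.notes.append(gtc)         # annotate without closing


tracker = StepTracker()
tracker.process(7, STEP_BEGIN, 100)
tracker.process(7, STEP_INTRA, 150)
tracker.process(7, STEP_END, 200)
```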

TaskTracker — task tag, FlatHashMap-keyed

TaskTracker::ProcessTraceEntry<gfc> @0xf2394e0. id 0x77=119 (ScTaskIssueFromScs, oneof 0x4d) reads tag from ScTaskIssueFromScs_globals_+0x28 and records a pending task in a FlatHashMap<tag, pending> (probe mulq 0x28(%r12), cap cmp 0x38(%r12)). id 0x78=120 (ScTaskCommitOnSct, oneof 0x4e) reads the matching tag from +0x20, looks it up, and on match emits TaskInfo{+0x8 tag, +0x50 present} via ProcessTaskCommit @0xf2c4ce0. Begin = 119, end = 120, key = the task tag. SparseCore-only.
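A minimal model of the tag-keyed pairing follows. The text confirms only the key (the task tag), the begin/end ids, and that a commit emits on a successful lookup; storing the issue timestamp as the pending value, emitting (tag, start, end) tuples, and silently ignoring an unmatched commit are assumptions made for this sketch.

```python
SC_TASK_ISSUE = 119    # ScTaskIssueFromScs
SC_TASK_COMMIT = 120   # ScTaskCommitOnSct


class TaskTracker:
    """Toy tag-keyed pairing of issue/commit events (hypothetical API)."""

    def __init__(self):
        self.pending = {}    # tag -> issue timestamp (assumed payload)
        self.emitted = []    # (tag, issue_gtc, commit_gtc)

    def process(self, tp_id, tag, gtc):
        if tp_id == SC_TASK_ISSUE:
            self.pending[tag] = gtc
        elif tp_id == SC_TASK_COMMIT:
            start = self.pending.pop(tag, None)
            if start is not None:
                self.emitted.append((tag, start, gtc))


tasks = TaskTracker()
tasks.process(SC_TASK_ISSUE, 3, 10)
tasks.process(SC_TASK_ISSUE, 4, 12)
tasks.process(SC_TASK_COMMIT, 4, 20)   # commits may arrive out of issue order
tasks.process(SC_TASK_COMMIT, 9, 25)   # no pending issue for tag 9
tasks.process(SC_TASK_COMMIT, 3, 30)
```

Keying on the tag rather than on arrival order is what allows overlapping tasks to pair correctly.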

OverlayTracker — overlay_id operand

OverlayTracker::ProcessTraceEntry<gfc> @0xf2335c0 dispatches on the core-type selector (state+0x0: 2=TC, 5=SC), filters id + block_id (+0x1c against state+0x4), and extracts the operand: TC id 0x55=85 (operand from TcsInternalTraceInstruction_globals_+0x18) / SC id 0x6e=110 (from ScInstructionTraceInstruction_globals_+0x18). OverlayTracker::ProcessTraceOperand @0xf2c3e40 branches on operand kind:

0xd  overlay OPEN   → record overlay_id at state+0x8 (match key), active flag +0xc, start gtc +0x10
0x9  overlay CLOSE  → emit OverlayInfo{overlay_id, start, end=gtc}, clear +0x8/+0x10

Both ride on id 85 (TC) / 110 (SC); the open/close discriminant is the operand kind, not a separate id.
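The open/close handling can be sketched as a two-state tracker keyed on the operand kind. The operand-kind constants come from the text; the class shape, the tuple output, and the choice to ignore a close with no open overlay are assumptions. Whether the real code compares the closing operand's overlay_id against the stored one is not stated, so this sketch emits the stored id.

```python
OVERLAY_OPEN = 0xD
OVERLAY_CLOSE = 0x9


class OverlayTracker:
    """Toy model of operand-kind-driven overlay spans (hypothetical API)."""

    def __init__(self):
        self.overlay_id = None   # match key (state+0x8 in the text)
        self.start_gtc = None    # start timestamp (state+0x10)
        self.emitted = []        # (overlay_id, start, end)

    def process_operand(self, kind, overlay_id, gtc):
        if kind == OVERLAY_OPEN:
            self.overlay_id, self.start_gtc = overlay_id, gtc
        elif kind == OVERLAY_CLOSE and self.overlay_id is not None:
            self.emitted.append((self.overlay_id, self.start_gtc, gtc))
            self.overlay_id = None
            self.start_gtc = None


overlays = OverlayTracker()
overlays.process_operand(OVERLAY_CLOSE, 5, 50)    # close with nothing open
overlays.process_operand(OVERLAY_OPEN, 5, 100)
overlays.process_operand(0x1, 5, 120)             # unrelated operand kind
overlays.process_operand(OVERLAY_CLOSE, 5, 180)
```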


Relevant Struct / Table Offsets

| Structure | Layout |
|---|---|
| TracePoints<TraceEntry> | std::vector<int32> {ptr@+0x0, size@+0x8, cap@+0x10}; built on stack, freed after each register |
| CoreDispatcher map | FlatHashMap<u16 id (TraceHeader+0x18), vector<shared_ptr<TraceEventSubscriber>>> |
| TraceEventSubscriber base | +0x08/+0x0c core_id/chip_id filter; +0x10 TpuXPlaneBuilder*; +0x18 per-id XEventMetadata* cache / line field |
| ThreadedSubscriber | vtable + ClosureThread worker (ThreadLoop @0xf1ede60 pxc) at +0xa0; wrapped vtable set inline |
| StepTracker state | +0x8 step id (key), +0x10 start gtc, +0x28 flag, +0x30 present |
| TaskTracker state | FlatHashMap<tag, pending>; TaskInfo{+0x8 tag, +0x50 present} |
| OverlayTracker state | +0x8 overlay_id (key), +0xc active flag, +0x10 start gtc |
| sync id rodata | @0xa2d6b40 = {81,82,88,87}; +{86,80} via movabs $0x5000000056 |

Cross-References