SpecialPurposeSyncFlags — the chip-config SFLAG reservation message, its FromProto sink, and the GetSpecialPurposeSyncFlags accessor

Binary: extracted/libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64/libtpu/libtpu.so (build-id 89edbbe81c5b328a958fe628a9f2207d, build libtpu_lts_20260413_b_RC00; .text VMA == file offset 0xe63c000, .rodata VMA == file offset 0x84a0000).

Status: Reimplementation-grade · Evidence grade: Confirmed (byte-anchored). The GetSpecialPurposeSyncFlags accessor, the TpuChipConfig::FromProto element build, the 0x40-byte runtime element layout, the EnumMap::Clear scalar-persistence, and the Target::Init / SparseCoreTarget::Init struct sinks are all byte-exact (+core<<6, & 0x100000000 presence gates, size − 5). The literal per-gen SFLAG integers are an embedded-memfile dependency (LOW; see Per-Codename Compiler-Reserved).

Part XIII — On-Pod Collectives & Barriers / SFLAG & barriers

Abstract

Every barrier on a TPU is a reserved sync-flag (SFLAG) number, and the numbers a barrier may take are not hard-coded — they are carried, one set per core type, in the chip config. The carrier is a proto submessage, SpecialPurposeSyncFlags, embedded as the repeated field 13 of TpuChipConfigProto. Each instance pairs a compiler_reserved repeated-int32 range (the per-id barrier window) with four named scalar SFLAG numbers (sequencer_overlay, tile_overlay, global_barrier_sflag, local_barrier_sflag). The general SFLAG number formulas that consume compiler_reserved live on Barrier-to-SFLAG Binding; the literal per-(codename, deployment) integers live on Per-Codename Compiler-Reserved. This page owns the parsing and the access path: the TpuChipConfig::FromProto runtime sink that lands the proto into an EnumMap element, the GetSpecialPurposeSyncFlags(coreType) accessor that retrieves it (chip+0x2a0 + (core<<6), gated by the chip+0x360 presence bitmask), the overlay semantics of the four named scalars (each a single SFLAG index — not a bitmask — packed with a presence bool, surviving EnumMap::Clear), and the EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3> that keys the set by core type.

For reimplementation, the contract is:

  • The proto carrier: TpuChipConfigProto.special_purpose_sync_flags (field 13, repeated SpecialPurposeSyncFlags), one element per TpuCoreType. Each element has core_type (f1), compiler_reserved (f3, repeated int32), and four scalar SFLAG numbers sequencer_overlay/tile_overlay/global_barrier_sflag/local_barrier_sflag (f4–f7).
  • The FromProto sink: TpuChipConfig::FromProto @0x20aea100 heap-copies compiler_reserved into a vector<int32> and stores the four scalars verbatim (low 32 bits = number, bit 32 = presence) into a 0x40-byte EnumMap element, then sets the EnumMap presence bit for that core. The scalars are not folded into compiler_reserved — they persist in the element padding.
  • The accessor: GetSpecialPurposeSyncFlags(core) @0x20afcf40 returns NULL if the per-core presence bit (chip+0x360) is clear, ud1-traps if core >= 3, else returns chip + 0x2a0 + (core<<6) (element stride 0x40). The TensorCore entry is mandatory: Target::Init dies via DieBecauseNull if it is absent.
  • The overlay semantics: each named scalar is a single SFLAG number, presence-gated, copied into a Target/SparseCoreTarget field on demand. The named-scalar path is SparseCore-only; on the TC only sequencer_overlay (→ Target+0x534) is consumed. The four scalars survive EnumMap::Clear (which frees only the compiler_reserved vector).

| Item | Location / value |
|---|---|
| Proto carrier | TpuChipConfigProto.special_purpose_sync_flags (field 13, repeated SpecialPurposeSyncFlags) |
| Proto element fields | core_type f1 @msg+0x2c; compiler_reserved f3 (count @+0x1c, data @+0x18/+0x20); sequencer_overlay f4 @+0x30; tile_overlay f5 @+0x34; global_barrier_sflag f6 @+0x38; local_barrier_sflag f7 @+0x3c |
| FromProto sink | TpuChipConfig::FromProto @0x20aea100 (special_purpose_sync_flags loop) |
| Runtime container | EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3> @TpuChipConfig+0x2a0, stride 0x40, presence bitmask @+0x360 |
| Accessor | GetSpecialPurposeSyncFlags(core) @0x20afcf40 → chip + 0x2a0 + (core<<6) |
| EnumMap teardown | EnumMap<…SpecialPurposeSyncFlags,3>::Clear @0x20b08200 (frees compiler_reserved vector only) |
| TC struct sink | Target::Init @0x1d60fc20 → Target+0x8c0/+0x8c4 (cr base/count−5), +0x534 (sequencer_overlay) |
| SC struct sink | SparseCoreTarget::Init @0x1d612b20 → +0x1d0/+0x1d4 (cr base/count, no −5), +0x1e8/+0x200/+0x204/+0x208 (the four scalars) |
| Scalar semantics | single SFLAG index, not a bitmask; value@bits[31:0], present@bit[32] |

1. The proto carrier — SpecialPurposeSyncFlags

SpecialPurposeSyncFlags is a submessage of TpuChipConfigProto, carried as the repeated field 13 special_purpose_sync_flags. The chip config carries one instance per TpuCoreType (kTensorCore=0, kSparseCore=1, kBarnaCore=2). Each instance bundles a range of SFLAG numbers reserved for the compiler with four individual named SFLAG numbers used by the overlay / barrier lowering.

The generated-C++ message struct offsets (recovered from the generated _InternalSerialize @0x20b0f040, which is the authoritative layout witness: WriteInt32<ILi4>(+0x30) through WriteInt32<ILi7>(+0x3c)) are:

| Proto field | Name | Type | Message struct offset | Role |
|---|---|---|---|---|
| f1 | core_type | enum TpuCoreTypeProto | @+0x2c | selects which core type this element describes |
| f3 | compiler_reserved | repeated int32 | count @+0x1c, data @+0x18 (inline) / @+0x20 (heap) | the per-id barrier SFLAG window (ascending contiguous) |
| f4 | sequencer_overlay | int32 | @+0x30 | overlay-reserved SFLAG number (single index) |
| f5 | tile_overlay | int32 | @+0x34 | tile-overlay SFLAG number (SC; single index) |
| f6 | global_barrier_sflag | int32 | @+0x38 | SC global-barrier SFLAG number (single index) |
| f7 | local_barrier_sflag | int32 | @+0x3c | SC local-barrier SFLAG number (single index) |

NOTE — the hasbits word sits at @+0x10. The core_type slot is +0x2c in the runtime message struct — distinct from +0x1c, which is the proto field-NUMBER / descriptor list offset, not the runtime struct offset. The compiler_reserved field-3 range, and the general SFLAG number formulas built from its base/count (base+count+4 global, base+count megacore, base+id per-key), are documented on Barrier-to-SFLAG Binding; the per-(codename, deployment) literal integers are on Per-Codename Compiler-Reserved.


2. The FromProto sink — TpuChipConfig::FromProto

TpuChipConfig::FromProto @0x20aea100 carries a dedicated special_purpose_sync_flags loop that, per SpecialPurposeSyncFlags proto element, decodes the core type, heap-copies the compiler_reserved range, captures the four scalars with their presence bits, and writes a 0x40-byte EnumMap element. The loop iterates the repeated field 13 (begin @proto+0x98, count @proto+0xa0). The byte-exact body:

// TpuChipConfig::FromProto @0x20aea100 — special_purpose_sync_flags loop (one element)
for (each SpecialPurposeSyncFlags msg in proto.special_purpose_sync_flags) {  // field 13
    core      = TpuCoreTypeFromProto(*(int*)(msg + 0x2c));        // 0=TC, 1=SC, 2=BC
    cr_count  = *(int*)(msg + 0x1c);                              // compiler_reserved size
    cr_data   = (hasbit on msg+0x18) ? *(int**)(msg + 0x20)       // heap repeated field
                                     : (int*)(msg + 0x18);        // inline
    // heap copy of the compiler_reserved range
    v.data = (int*)_Znwm(cr_count * 4);
    memcpy(v.data, cr_data, cr_count * 4);
    v.size = v.cap = cr_count;

    // the four scalars: raw int32 value plus a presence bool (setg: value > 0);
    // each pair is equivalent to one packed qword (value | present << 32)
    seq_ov_value    = *(int*)(msg + 0x30);  seq_ov_present    = seq_ov_value    > 0;   // f4
    tile_ov_value   = *(int*)(msg + 0x34);  tile_ov_present   = tile_ov_value   > 0;   // f5
    glob_bar_value  = *(int*)(msg + 0x38);  glob_bar_present  = glob_bar_value  > 0;   // f6
    local_bar_value = *(int*)(msg + 0x3c);  local_bar_present = local_bar_value > 0;   // f7

    elem = chip + 0x2a0 + (core << 6);          // EnumMap element, 0x40-byte stride
    *(int*)  (elem + 0x00) = core;              // element key / runtime core marker
    *(int**) (elem + 0x08) = v.data;            // compiler_reserved vector data ptr
    *(long*) (elem + 0x10) = v.size;            // compiler_reserved size
    *(long*) (elem + 0x18) = v.cap;             // compiler_reserved cap (== size)
    *(int*)  (elem + 0x20) = tile_ov_value;     *(bool*)(elem + 0x24) = tile_ov_present;
    *(int*)  (elem + 0x28) = seq_ov_value;      *(bool*)(elem + 0x2c) = seq_ov_present;
    *(int*)  (elem + 0x30) = glob_bar_value;    *(bool*)(elem + 0x34) = glob_bar_present;
    *(int*)  (elem + 0x38) = local_bar_value;   *(bool*)(elem + 0x3c) = local_bar_present;
    set_enummap_presence_bit(chip + 0x360, core);   // bts core, *(chip+0x360)
}

Two properties matter for a reimplementer:

  • The scalars are stored verbatim. The writer's >>8 / <<8 split + low-byte movzbl (visible in the raw disassembly) is a register-allocation artifact: it reconstructs the exact proto value while carrying the presence bool separately in bit 32 of the packed qword. The presence bool is computed setg (value > 0), so a zero or negative value reads as absent.
  • The element ordering is not the proto order. Note the element stores tile_overlay first (+0x20) then sequencer_overlay (+0x28), inverting the proto field order (f4 sequencer_overlay, f5 tile_overlay). The struct sinks (§5) read the element offsets, not the proto offsets — Target::Init reads element +0x28 for sequencer_overlay.

CONFIRMED (decompile) — in the special_purpose_sync_flags loop of FromProto, the element base is v1047 << 6 (= core * 0x40), and per element the four scalars are reconstructed verbatim (>>8 / <<8 | low-byte register split) with the presence bool stored separately: sequencer_overlay presence is carried as v706 = v705 + 0x100000000 (= *(int*)(msg+0x30), bit 32 = value > 0); tile_overlay/global_barrier/local_barrier presence are the byte setg flags v710/v711/v714 (= *(int*)(msg+0x34/0x38/0x3c) > 0) written at element +0x24/+0x34/+0x3c. The readers (Target::Init / SparseCoreTarget::Init) test & 0x100000000 on the packed qword. The repeated-field copy is operator new(4*count) + memcpy. This matches the proto→element→struct chain exactly.
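
The (value, present) packing described above can be modelled in a few lines of C++. This is an illustrative sketch, not recovered code: PackScalar, IsPresent and ValueOf are hypothetical names, and the only facts taken from the text are the bit layout (SFLAG number in bits[31:0], presence in bit 32) and the value > 0 presence rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling the packed scalar qword described in the text.
constexpr uint64_t kPresentBit = 0x100000000ULL;  // bit 32

// Keep the proto value verbatim in the low 32 bits; set bit 32 when value > 0.
constexpr uint64_t PackScalar(int32_t value) {
    // Cast through uint32_t so a negative value cannot sign-extend into bit 32.
    return static_cast<uint64_t>(static_cast<uint32_t>(value)) |
           (value > 0 ? kPresentBit : 0);
}

// The reader-side gate: test bit 32.
constexpr bool IsPresent(uint64_t packed) { return (packed & kPresentBit) != 0; }

// The reader-side store: keep only the low 32 bits.
constexpr int32_t ValueOf(uint64_t packed) {
    return static_cast<int32_t>(static_cast<uint32_t>(packed));
}
```

Under this model a zero or negative proto value round-trips through ValueOf unchanged but fails IsPresent, which is the "reads as absent" behaviour noted above.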


3. The runtime container — EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3>

The three per-core elements live in a fixed-capacity EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3> embedded directly inside TpuChipConfig at offset +0x2a0. The map is keyed by TpuCoreType (the enum value is the element index), with a separate per-core presence bitmask at +0x360. It is a fixed array, not a hash map: three 0x40-byte slots at +0x2a0, +0x2e0, +0x320.

3.1 The element layout (0x40 bytes)

| Element offset | Content | Source |
|---|---|---|
| +0x00 | element key / runtime core (0/1/2) | core |
| +0x08 | compiler_reserved vector data ptr | _Znwm result |
| +0x10 | compiler_reserved size | cr_count |
| +0x18 | compiler_reserved cap (== size) | cr_count |
| +0x20 | tile_overlay (int32) | proto f5 @msg+0x34 |
| +0x24 | tile_overlay present (bool) | setg (value > 0) |
| +0x28 | sequencer_overlay (int32) | proto f4 @msg+0x30 |
| +0x2c | sequencer_overlay present (bool) | bit-32 packing |
| +0x30 | global_barrier_sflag (int32) | proto f6 @msg+0x38 |
| +0x34 | global_barrier_sflag present (bool) | setg |
| +0x38 | local_barrier_sflag (int32) | proto f7 @msg+0x3c |
| +0x3c | local_barrier_sflag present (bool) | setg |
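
One way to keep a reimplementation honest about this layout is to mirror it as a struct and let the compiler check the offsets. The sketch below assumes an x86-64 (LP64, little-endian) target; the type and member names are hypothetical, only the offsets come from the table. On a little-endian machine the bool at byte 4 of each 8-byte slot is exactly bit 32 of the slot read as a qword, which is why the writers can store a byte and the readers can test & 0x100000000.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of one named scalar slot: int32 value, presence byte, padding.
struct OptionalSflag {
    int32_t value;
    bool    present;
};

// Hypothetical mirror of one 0x40-byte EnumMap element.
struct SyncFlagsElement {
    int32_t       core;               // +0x00 element key
    int32_t*      cr_data;            // +0x08 compiler_reserved data pointer
    uint64_t      cr_size;            // +0x10
    uint64_t      cr_cap;             // +0x18
    OptionalSflag tile_overlay;       // +0x20 (present at +0x24)
    OptionalSflag sequencer_overlay;  // +0x28 (present at +0x2c)
    OptionalSflag global_barrier;     // +0x30 (present at +0x34)
    OptionalSflag local_barrier;      // +0x38 (present at +0x3c)
};

static_assert(sizeof(OptionalSflag) == 8, "scalar slot is one qword");
static_assert(offsetof(SyncFlagsElement, cr_data) == 0x08, "cr data");
static_assert(offsetof(SyncFlagsElement, tile_overlay) == 0x20, "tile_overlay");
static_assert(offsetof(SyncFlagsElement, sequencer_overlay) == 0x28, "sequencer_overlay");
static_assert(offsetof(SyncFlagsElement, local_barrier) == 0x38, "local_barrier");
static_assert(sizeof(SyncFlagsElement) == 0x40, "element stride");

// Read a scalar slot as one qword, the way the Init readers are described to.
inline uint64_t AsPackedQword(const OptionalSflag& s) {
    uint64_t q = 0;
    std::memcpy(&q, &s, sizeof q);
    return q;
}
```

Note that tile_overlay precedes sequencer_overlay here, matching the element order rather than the proto field order.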

3.2 Teardown — EnumMap::Clear frees only the vector

EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3>::Clear @0x20b08200 walks the three elements (+0x0/+0x40/+0x80), and for each whose presence bit (+0xc0 in the map) is set, frees the compiler_reserved vector (begin@+0x8, cap@+0x18 << 2 bytes) and zeros the size at +0x10. It never touches +0x20..+0x3c. Consequence: the four named scalars persist in the element padding — they are stored at FromProto time and read on demand by Target::Init / SparseCoreTarget::Init, not consumed and discarded.

NOTE — this is why the four scalars are described as living "in the element" rather than "in a parsed result struct". The compiler_reserved vector is the only heap allocation the element owns; everything else is plain-old-data in the 0x40-byte slot.
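
The teardown behaviour can be sketched as follows. This is a simplified model under stated assumptions, not the recovered function: the Slot and SyncFlagsMap types and the ClearVectorsOnly name are hypothetical, the vector is modelled as a plain new[] allocation, and nulling the data pointer after the free is added here for safety. Whether the real Clear also resets the presence bitmask is not stated above, so the sketch leaves it unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one slot: a heap-owned vector plus four packed scalars.
struct Slot {
    int32_t  core = 0;
    int32_t* cr_data = nullptr;          // the only heap allocation the slot owns
    uint64_t cr_size = 0;
    uint64_t cr_cap = 0;
    uint64_t scalars[4] = {0, 0, 0, 0};  // packed (value | present << 32)
};

// Hypothetical model of the fixed three-slot map with its presence bitmask.
struct SyncFlagsMap {
    Slot     slots[3];
    uint64_t present_mask = 0;
};

// For each present slot, free the compiler_reserved storage and zero its size.
// The scalar qwords are deliberately left untouched.
inline void ClearVectorsOnly(SyncFlagsMap& m) {
    for (int core = 0; core < 3; ++core) {
        if (((m.present_mask >> core) & 1) == 0) continue;
        Slot& s = m.slots[core];
        delete[] s.cr_data;
        s.cr_data = nullptr;  // assumption: not described in the text
        s.cr_size = 0;
    }
}
```

After ClearVectorsOnly runs, a reader that gates on bit 32 of a scalar slot still sees the value stored at FromProto time.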


4. The accessor — GetSpecialPurposeSyncFlags(coreType)

GetSpecialPurposeSyncFlags(core) @0x20afcf40 is the single retrieval path for a SpecialPurposeSyncFlags element. It is a presence-gated, bounds-checked address computation over the EnumMap at chip+0x2a0:

// TpuChipConfig::GetSpecialPurposeSyncFlags(TpuCoreType core) @0x20afcf40
SpecialPurposeSyncFlags* GetSpecialPurposeSyncFlags(TpuChipConfig* chip, TpuCoreType core) {
    uint64_t mask = *(uint64_t*)((char*)chip + 0x360);   // per-core presence bitmask (byte offset)
    if (!_bittest64(&mask, core)) return NULL;       // bt core, mask; jae → NULL
    if ((unsigned)core >= 3) ud1();                  // CHECK core ∈ {0,1,2}
    // byte arithmetic, not TpuChipConfig* arithmetic: element `core`, 0x40-byte stride
    return (SpecialPurposeSyncFlags*)((char*)chip + 0x2a0 + ((uint64_t)core << 6));
}

The contract a reimplementer must honour:

  • Presence is checked first, then bounds. If the chip config carried no SpecialPurposeSyncFlags for core, the accessor returns NULL before the core < 3 check. A caller that does not check the return value will dereference a null pointer.
  • The index is core << 6, not +core. The element stride is 0x40 (the EnumMap element size); a reimplementation that indexes +0x2a0 + core returns a misaligned address 1 or 2 bytes into the TensorCore element (inside its +0x00 key field) instead of the slot at +0x2e0 or +0x320, so every field read for kSparseCore / kBarnaCore is garbage.
  • The TensorCore entry is mandatory. Target::Init calls the accessor for kTensorCore and, on NULL, dies via DieBecauseNull with the string "chip_config.GetSpecialPurposeSyncFlags( ::tpu::TpuCoreType::kTensorCore)". Every valid chip config must carry a TC special-purpose entry.
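
The contract above can be exercised against a plain byte buffer. The sketch below is a model, not the library's accessor: GetSyncFlagsSlot is a hypothetical name, the chip object is represented as raw bytes, and std::abort stands in for the ud1 trap. The offsets 0x2a0 and 0x360 and the 0x40 stride are the ones given in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

constexpr size_t kMapBase = 0x2a0;  // EnumMap base inside the chip config
constexpr size_t kStride  = 0x40;   // element size
constexpr size_t kMaskOff = 0x360;  // per-core presence bitmask

// Returns the slot address for `core`, or nullptr when its presence bit is clear.
inline uint8_t* GetSyncFlagsSlot(uint8_t* chip, unsigned core) {
    uint64_t mask = 0;
    std::memcpy(&mask, chip + kMaskOff, sizeof mask);
    // Presence is tested first. The "& 63" only keeps the C++ shift well-defined.
    if (((mask >> (core & 63)) & 1) == 0) return nullptr;
    if (core >= 3) std::abort();  // stands in for the ud1 trap
    return chip + kMapBase + static_cast<size_t>(core) * kStride;  // core << 6
}
```

A caller would treat nullptr as "no entry for this core type" for kSparseCore and kBarnaCore, and as a fatal configuration error for kTensorCore.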

CONFIRMED (decompile) — the accessor body is exactly v2 = *(chip+864); if (!_bittest64(&v2, core)) return 0; if ((unsigned)core >= 3) ud1; return chip + 672 + (core << 6) — i.e. +0x360 bitmask, +0x2a0 base (672 = 0x2a0), core << 6 stride. The six callers of this accessor in the build are Target::Init (×2), SparseCoreTarget::Init (×2), TpuPxcDriver::InitializeCores, and TpuProfilerControlListener::CanStartProfiler.

This accessor is also summarised on overview §5.1; this page is the authoritative derivation.


5. The overlay semantics — the four named scalars

The four named scalars (sequencer_overlay, tile_overlay, global_barrier_sflag, local_barrier_sflag) are each a single SFLAG number (an index into the SFLAG MemorySpace), not a bitmask. They overlay the general SFLAG window — i.e. they reserve specific SFLAG indices outside the compiler_reserved per-id range, pinned to the top of the per-gen SFLAG address space. After FromProto stores them into the EnumMap element, Target::Init / SparseCoreTarget::Init copy the present ones into named Target/SparseCoreTarget fields, each guarded by the bit-32 presence test.

5.1 The presence-gated qword

Each scalar occupies one 8-byte element qword: bits[31:0] = the SFLAG number, bit[32] (mask 0x100000000) = the proto-presence bool. The reader tests bit 32 (test $0x100000000, qword or bt $0x20, qword) and stores the low 32 bits only if present:

// SparseCoreTarget::Init @0x1d612b20 — scalar store (representative; one of four)
v74 = *(uint64_t*)(elem + 0x30);            // global_barrier_sflag (packed)
if (v74 & 0x100000000)                       // bit-32 presence gate
    *(int*)(sparse_core_target + 0x204) = (int)v74;   // store low 32 bits = the SFLAG number

5.2 The struct sinks and consumers

Target::Init copies only sequencer_overlay into a named TC field; SparseCoreTarget::Init copies all four into SC fields. The TC tile_overlay/global_barrier_sflag/local_barrier_sflag scalars are never read for the TC — the named-scalar barrier/overlay path is SparseCore-only. (The TC global/megacore/all-reduce barriers come from compiler_reserved via base+count+{0,2,3,4} — see Barrier-to-SFLAG Binding.)

| Scalar (proto field) | Element off | Presence gate | TC sink | SC sink | Consumer |
|---|---|---|---|---|---|
| sequencer_overlay (f4) | +0x28 | bit 32 | Target+0x534 | SparseCoreTarget+0x200 | TC: GetOverlayReservedSyncFlagNumber → overlay-reserved SFLAG (single index). SC: none (stored, never read in this build) |
| tile_overlay (f5) | +0x20 | bit 32 | (not copied) | SparseCoreTarget+0x1e8 | SC: overlayer::OverlayProgram encodes it as a SyImm32 MC immediate; also read by EmitFinishDescriptorDma/EmitEmulatedContinuation |
| global_barrier_sflag (f6) | +0x30 | bit 32 | (not copied) | SparseCoreTarget+0x204 | SC: LoweringEmitter::Emit / CustomKernelEmitter::MaybeInsertGlobalBarrier (mlir::sparse_core::MemorySpaceAttr::get(ctx, 14) global-barrier SFLAG MemRef; SC MLIR enum sflag-band value 14, not the jellyfish MemorySpace enum) |
| local_barrier_sflag (f7) | +0x38 | bit 32 | (not copied) | SparseCoreTarget+0x208 | SC: lowering_util::ReservedLocalBarrierSflag (1-elt i32 MemRef, mlir::sparse_core::MemorySpaceAttr::get(ctx, 5); SC MLIR enum sflag-band value 5, not jellyfish MemorySpace) |
| compiler_reserved (f3) | +0x08/+0x10 | (vector, always) | Target+0x8c0/+0x8c4 (count = size − 5) | SparseCoreTarget+0x1d0/+0x1d4 (count = size, no −5) | per-id barrier window; see Barrier-to-SFLAG Binding |
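
The difference between the two sinks can be sketched as a pair of small functions. Everything named here (ElementView, TcFields, ScFields, InitTcFields, InitScFields) is hypothetical and stands in for the real Target / SparseCoreTarget structs. The sketch assumes the compiler_reserved vector is non-empty (at least 5 entries on the TC side), takes the base as the first vector entry, and leaves a destination field at 0 when the scalar is absent, which is an assumption since the text does not say what an unset field holds.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPresentBit = 0x100000000ULL;  // bit 32 of a packed scalar

// Hypothetical view of the element fields the two Init sinks read.
struct ElementView {
    const int32_t* cr_data;                         // element +0x08
    uint64_t       cr_size;                         // element +0x10
    uint64_t tile_ov, seq_ov, glob_bar, local_bar;  // +0x20 / +0x28 / +0x30 / +0x38
};

struct TcFields { int32_t cr_base = 0, cr_count = 0, sequencer_overlay = 0; };
struct ScFields {
    int32_t cr_base = 0, cr_count = 0;
    int32_t tile_overlay = 0, sequencer_overlay = 0, global_barrier = 0, local_barrier = 0;
};

// Store the low 32 bits only when bit 32 is set.
inline void CopyIfPresent(uint64_t packed, int32_t& dst) {
    if (packed & kPresentBit) dst = static_cast<int32_t>(static_cast<uint32_t>(packed));
}

// TC side: count is size - 5, and only sequencer_overlay is copied.
inline TcFields InitTcFields(const ElementView& e) {
    assert(e.cr_size >= 5);  // assumption of this sketch
    TcFields t;
    t.cr_base  = e.cr_data[0];
    t.cr_count = static_cast<int32_t>(e.cr_size) - 5;
    CopyIfPresent(e.seq_ov, t.sequencer_overlay);
    return t;
}

// SC side: count is the full size, and all four scalars are copied.
inline ScFields InitScFields(const ElementView& e) {
    assert(e.cr_size >= 1);  // assumption of this sketch
    ScFields s;
    s.cr_base  = e.cr_data[0];
    s.cr_count = static_cast<int32_t>(e.cr_size);
    CopyIfPresent(e.tile_ov, s.tile_overlay);
    CopyIfPresent(e.seq_ov, s.sequencer_overlay);
    CopyIfPresent(e.glob_bar, s.global_barrier);
    CopyIfPresent(e.local_bar, s.local_barrier);
    return s;
}
```

In a real chip config the TC and SC sinks read different elements (core 0 and core 1); feeding one view to both functions is only a convenient way to compare the two count rules side by side.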

5.3 Why "overlay" is an index, not a bitmask

sequencer_overlay lands at Target+0x534 and is returned by GetOverlayReservedSyncFlagNumber() @0x1d617900 (mov 0x534(rdi), eax; ret). All four of its consumers treat it as one int SFLAG number, never AND/test it:

  • EmitContinuationTailcall @0x12718ca0: number → SflagImmPtr(int, "overlay reserved sync flag") → a single SFLAG pointer used as a DMA completion sflag.
  • LinkAndFinishProgram @0x10a25a20: inserts {number → "overlay"} into a std::map<int, string> (next to {GetGlobalBarrierSyncFlagNumber → "global barrier"}) — a single map key.
  • CodeGenerationHelper::emit_routine<1> / <3> @0x1409a5a0 / 0x14062da0: number → linked_hash_map<int, BundleImmediatesMetadata> key.

The per-gen values track the top of the SFLAG index range: 254 (0xFE, JF/DF), 511 (0x1FF, PF/VF/GL), 4095 (0xFFF, GF). 511 and 4095 are exactly 2^9 − 1 and 2^12 − 1; 254 is 2^8 − 2, one below the 8-bit maximum. The pattern arises not because the value is a bitmask, but because the overlay-reserved SFLAG is pinned at (or, for JF/DF, one below) the topmost addressable SFLAG index, and the SFLAG-number encoding width grows per generation (8→9→12 bits; gen ISA SyncFlagCountType). The SC tile_overlay/sequencer_overlay (7167/7157) likewise land as single SFLAG indices, materialised as 1-element i32 MemRefs in MemorySpace::sflag.
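
The arithmetic behind the width inference is easy to check at compile time. The helpers below are hypothetical and only do integer math on the three TC values quoted above; they say nothing about how the binary derives the width. They confirm that 511 and 4095 are exactly 2^n − 1 for n = 9 and 12, while 254 needs 8 bits and sits one below 2^8 − 1.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent `value` (0 for value == 0).
constexpr int BitWidth(uint32_t value) {
    return value == 0 ? 0 : 1 + BitWidth(value >> 1);
}

// Largest index representable in n bits: 2^n - 1.
constexpr uint32_t TopIndex(int n) { return (1u << n) - 1u; }

static_assert(BitWidth(254) == 8 && TopIndex(8) - 254 == 1, "JF/DF: one below the 8-bit top");
static_assert(BitWidth(511) == 9 && TopIndex(9) == 511, "PF/VF/GL: 9-bit top");
static_assert(BitWidth(4095) == 12 && TopIndex(12) == 4095, "GF: 12-bit top");
```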

LOW — the per-gen SFLAG-number bit-width (8/9/12) is attributed from the gen ISA SyncFlagCountType plus the 2^n − 1 overlay-number progression; it is not read as a single binary literal in one accessor. The index-not-bitmask conclusion is CONFIRMED (no consumer masks the value); the width is inferred. The literal scalar values per gen are an embedded-memfile dependency (see Per-Codename Compiler-Reserved).

QUIRK — the SC sequencer_overlay (SparseCoreTarget+0x200, value 7157) is stored by SparseCoreTarget::Init but has no reader anywhere in .text in this build — it is ingested-but-unused. The SC overlayer reads tile_overlay (+0x1e8), not sequencer_overlay. Whether this is a retired field, a forward-declared field for a future generation, or a flag-disabled path is not determinable from the binary. A reimplementer should still ingest it (the proto carries it) but need not wire a consumer.


6. The full chip-config → element → struct → consumer chain

The complete data path from proto to consumer, byte-anchored end-to-end:

TpuChipConfigProto.special_purpose_sync_flags[core]   (field 13, repeated SpecialPurposeSyncFlags)
        │   core_type f1 @+0x2c ; compiler_reserved f3 ; seq_ov f4 ; tile_ov f5 ; glob f6 ; local f7
        │
  FromProto @0x20aea100  — per element:
        │   _Znwm + memcpy(compiler_reserved) ; pack 4 scalars (value | present<<32)
        ▼
EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3>  @TpuChipConfig+0x2a0  (stride 0x40, bitmask +0x360)
        │   element: +0x08/+0x10/+0x18 cr vector ; +0x20/+0x28/+0x30/+0x38 scalars (+presence)
        │   Clear @0x20b08200 frees only the cr vector → scalars persist
        │
  GetSpecialPurposeSyncFlags(core) @0x20afcf40  — presence-gated, bounds-checked, +0x2a0+(core<<6)
        │
        ├── Target::Init @0x1d60fc20  (core=TC, NULL → DieBecauseNull)
        │     element +0x10 size / +0x08 data → contiguity-check → Target+0x8c0 base, +0x8c4 count−5
        │     element +0x28 sequencer_overlay (if present) → Target+0x534 = GetOverlayReservedSyncFlagNumber
        │       → EmitContinuationTailcall / LinkAndFinishProgram / emit_routine<1,3>  (single index)
        │
        ├── SparseCoreTarget::Init @0x1d612b20  (core=SC)
        │     element +0x10 size / +0x08 data → SparseCoreTarget+0x1d0 base, +0x1d4 count (no −5)
        │     +0x20 tile_overlay → +0x1e8   (→ overlayer::OverlayProgram SyImm32)
        │     +0x28 sequencer_overlay → +0x200   (stored, NO reader)
        │     +0x30 global_barrier_sflag → +0x204 (→ MaybeInsertGlobalBarrier, SC MLIR MemSpace 14)
        │     +0x38 local_barrier_sflag → +0x208  (→ ReservedLocalBarrierSflag, SC MLIR MemSpace 5)
        │
        ├── TpuPxcDriver::InitializeCores @0xe806500  (reads only the cr vector +0x08/+0x10)
        └── TpuProfilerControlListener::CanStartProfiler @0xf3328c0  (profiler gate)

The compiler_reserved carve (the −5 reservation on TC, the SC full-range, and the per-id / global / megacore / all-reduce SFLAG formulas built from base/count) is the subject of Barrier-to-SFLAG Binding. This page stops at the field stores; it does not re-derive the number formulas.


7. Verification notes

Byte-exact in libtpu.so v0.0.40:

  • GetSpecialPurposeSyncFlags @0x20afcf40: v2 = *(chip+864); if (!_bittest64(&v2, core)) return 0; if ((unsigned)core >= 3) ud1; return chip + 672 + (core<<6) — i.e. +0x360 bitmask, +0x2a0 base, core<<6 stride — byte-exact.
  • TpuChipConfig::FromProto @0x20aea100: in the special_purpose_sync_flags loop the element base is core << 6; the four scalars are stored verbatim (>>8/<<8|low-byte reconstruction), sequencer_overlay presence carried as v705 + 0x100000000 and the other three as setg (value > 0) bytes at element +0x24/+0x34/+0x3c; compiler_reserved via operator new(4*count) + memcpy — confirms the element build and the (value, present) packing.
  • Target::Init @0x1d60fc20: *((_DWORD*)target + 560) = *base (= Target+0x8c0 base); *((_DWORD*)target + 561) = size − 5 (= Target+0x8c4 count); sequencer_overlay gated by & 0x100000000 at element +0x28; DieBecauseNull("…GetSpecialPurposeSyncFlags( ::tpu::TpuCoreType::kTensorCore)") on NULL — exact.
  • SparseCoreTarget::Init @0x1d612b20: +464/+468 = +0x1d0/+0x1d4 (base / count, no −5); the four scalar stores +488/+512/+516/+520 (= +0x1e8/+0x200/+0x204/+0x208), each gated by & 0x100000000 — exact.

[LOW] The literal per-gen compiler_reserved integers and the four scalar values (e.g. sequencer_overlay = 254/511/4095, SC global_barrier_sflag = 7156, local_barrier_sflag = 7155) are runtime-resolved from embedded chip-config memfile binarypb blobs and were not statically extracted here. The per-gen SFLAG-number bit-width (8/9/12) is inferred from the 2^n − 1 overlay progression + the gen ISA SyncFlagCountType, not read as a single literal. The index-not-bitmask overlay semantics, the FromProto sink, the accessor, the EnumMap layout, and the struct sinks are CONFIRMED byte-anchored.


Cross-References

Barrier algorithms (this section)

  • overview — the barrier subsystem map; §5.1 summarises the GetSpecialPurposeSyncFlags accessor (this page is the authoritative derivation)
  • Barrier-to-SFLAG Binding — the general SFLAG number formulas built from compiler_reserved base/count (base+count+4 global, base+count megacore, base+id per-key)
  • Per-Codename Compiler-Reserved — the literal per-(codename, deployment) compiler_reserved integers and scalar values (memfile-resolved)
  • Infer Barrier Config — the pincer-fusion CUSTOM → GLOBAL/REPLICA normaliser that consumes the carved TC count
  • Global-Barrier Window — the base+count+4 global SFLAG slot and the per-core barrier window

Sibling subsystems