SpecialPurposeSyncFlags — the chip-config SFLAG reservation message, its FromProto sink, and the GetSpecialPurposeSyncFlags accessor
Binary: extracted/libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64/libtpu/libtpu.so (build-id 89edbbe81c5b328a958fe628a9f2207d, build libtpu_lts_20260413_b_RC00; .text VMA == file offset 0xe63c000, .rodata VMA == file offset 0x84a0000).
Status: Reimplementation-grade · Evidence grade: Confirmed (byte-anchored). The GetSpecialPurposeSyncFlags accessor, the TpuChipConfig::FromProto element build, the 0x40-byte runtime element layout, the EnumMap::Clear scalar-persistence, and the Target::Init / SparseCoreTarget::Init struct sinks are all byte-exact (+core<<6, & 0x100000000 presence gates, size − 5). The literal per-gen SFLAG integers are an embedded-memfile dependency (LOW; see Per-Codename Compiler-Reserved).
Abstract
Every barrier on a TPU is a reserved sync-flag (SFLAG) number, and the numbers a barrier may take are not hard-coded — they are carried, one set per core type, in the chip config. The carrier is a proto submessage, SpecialPurposeSyncFlags, embedded as the repeated field 13 of TpuChipConfigProto. Each instance pairs a compiler_reserved repeated-int32 range (the per-id barrier window) with four named scalar SFLAG numbers (sequencer_overlay, tile_overlay, global_barrier_sflag, local_barrier_sflag). The general SFLAG number formulas that consume compiler_reserved live on Barrier-to-SFLAG Binding; the literal per-(codename, deployment) integers live on Per-Codename Compiler-Reserved. This page owns the parsing and the access path: the TpuChipConfig::FromProto runtime sink that lands the proto into an EnumMap element, the GetSpecialPurposeSyncFlags(coreType) accessor that retrieves it (chip+0x2a0 + (core<<6), gated by the chip+0x360 presence bitmask), the overlay semantics of the four named scalars (each a single SFLAG index — not a bitmask — packed with a presence bool, surviving EnumMap::Clear), and the EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3> that keys the set by core type.
For reimplementation, the contract is:
- The proto carrier: TpuChipConfigProto.special_purpose_sync_flags (field 13, repeated SpecialPurposeSyncFlags), one element per TpuCoreType. Each element has core_type (f1), compiler_reserved (f3, repeated int32), and four scalar SFLAG numbers sequencer_overlay / tile_overlay / global_barrier_sflag / local_barrier_sflag (f4–f7).
- The FromProto sink: TpuChipConfig::FromProto @0x20aea100 heap-copies compiler_reserved into a vector<int32> and stores the four scalars verbatim (low 32 bits = number, bit 32 = presence) into a 0x40-byte EnumMap element, then sets the EnumMap presence bit for that core. The scalars are not folded into compiler_reserved; they persist in the element padding.
- The accessor: GetSpecialPurposeSyncFlags(core) @0x20afcf40 returns NULL if the per-core presence bit (chip+0x360) is clear, ud1-traps if core >= 3, else returns chip + 0x2a0 + (core<<6) (element stride 0x40). The TensorCore entry is mandatory: Target::Init dies via DieBecauseNull if it is absent.
- The overlay semantics: each named scalar is a single SFLAG number, presence-gated, copied into a Target / SparseCoreTarget field on demand. The named-scalar path is SparseCore-only; on the TC only sequencer_overlay (→ Target+0x534) is consumed. The four scalars survive EnumMap::Clear (which frees only the compiler_reserved vector).
| Proto carrier | TpuChipConfigProto.special_purpose_sync_flags (field 13, repeated SpecialPurposeSyncFlags) |
| Proto element fields | core_type f1 @msg+0x2c; compiler_reserved f3 (count @+0x1c, data @+0x18/+0x20); sequencer_overlay f4 @+0x30; tile_overlay f5 @+0x34; global_barrier_sflag f6 @+0x38; local_barrier_sflag f7 @+0x3c |
| FromProto sink | TpuChipConfig::FromProto @0x20aea100 (special_purpose_sync_flags loop) |
| Runtime container | EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3> @TpuChipConfig+0x2a0, stride 0x40, presence bitmask @+0x360 |
| Accessor | GetSpecialPurposeSyncFlags(core) @0x20afcf40 → chip + 0x2a0 + (core<<6) |
| EnumMap teardown | EnumMap<…SpecialPurposeSyncFlags,3>::Clear @0x20b08200 (frees compiler_reserved vector only) |
| TC struct sink | Target::Init @0x1d60fc20 → Target+0x8c0/+0x8c4 (cr base/count−5), +0x534 (sequencer_overlay) |
| SC struct sink | SparseCoreTarget::Init @0x1d612b20 → +0x1d0/+0x1d4 (cr base/count, no −5), +0x1e8/+0x200/+0x204/+0x208 (the four scalars) |
| Scalar semantics | single SFLAG index, not a bitmask; value@bits[31:0], present@bit[32] |
1. The proto carrier — SpecialPurposeSyncFlags
SpecialPurposeSyncFlags is a submessage of TpuChipConfigProto, carried as the repeated field 13 special_purpose_sync_flags. The chip config carries one instance per TpuCoreType (kTensorCore=0, kSparseCore=1, kBarnaCore=2). Each instance bundles a range of SFLAG numbers reserved for the compiler with four individual named SFLAG numbers used by the overlay / barrier lowering.
The generated-C++ message struct offsets (recovered from the generated _InternalSerialize @0x20b0f040, which is the authoritative layout witness — WriteInt32<ILi4>(+0x30) … <ILi7>(+0x3c)) are:
| Proto field | Name | Type | Message struct offset | Role |
|---|---|---|---|---|
| f1 | core_type | enum TpuCoreTypeProto | @+0x2c | selects which core type this element describes |
| f3 | compiler_reserved | repeated int32 | count @+0x1c, data @+0x18 (inline) / @+0x20 (heap) | the per-id barrier SFLAG window (ascending contiguous) |
| f4 | sequencer_overlay | int32 | @+0x30 | overlay-reserved SFLAG number (single index) |
| f5 | tile_overlay | int32 | @+0x34 | tile-overlay SFLAG number (SC; single index) |
| f6 | global_barrier_sflag | int32 | @+0x38 | SC global-barrier SFLAG number (single index) |
| f7 | local_barrier_sflag | int32 | @+0x3c | SC local-barrier SFLAG number (single index) |
NOTE: the hasbits word sits at @+0x10. The core_type slot is +0x2c in the runtime message struct, distinct from +0x1c, which is the proto field-NUMBER / descriptor list offset, not the runtime struct offset. The compiler_reserved field-3 range, and the general SFLAG number formulas built from its base/count (base+count+4 global, base+count megacore, base+id per-key), are documented on Barrier-to-SFLAG Binding; the per-(codename, deployment) literal integers are on Per-Codename Compiler-Reserved.
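The offset table above can be pinned down as a compile-time layout check. The following is a minimal sketch, assuming an x86-64 LP64 target; the struct name SpecialPurposeSyncFlagsMsgMirror and its member names are hypothetical, and every byte the table does not name is modelled as opaque padding rather than as a claim about the real generated class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the generated message struct (LP64 assumed).
// Only the offsets named in the table are meaningful; the rest is padding.
struct SpecialPurposeSyncFlagsMsgMirror {
  std::uint8_t  opaque_00[0x10];       // +0x00 not described on this page
  std::uint32_t hasbits;               // +0x10 hasbits word
  std::uint8_t  opaque_14[0x04];       // +0x14 not described on this page
  std::uint32_t cr_inline;             // +0x18 compiler_reserved inline slot
  std::int32_t  cr_count;              // +0x1c compiler_reserved element count
  std::int32_t* cr_heap;               // +0x20 compiler_reserved heap data
  std::uint8_t  opaque_28[0x04];       // +0x28 not described on this page
  std::int32_t  core_type;             // +0x2c f1
  std::int32_t  sequencer_overlay;     // +0x30 f4
  std::int32_t  tile_overlay;          // +0x34 f5
  std::int32_t  global_barrier_sflag;  // +0x38 f6
  std::int32_t  local_barrier_sflag;   // +0x3c f7
};

// Each assertion restates one row of the table.
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, hasbits) == 0x10, "hasbits");
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, cr_count) == 0x1c, "f3 count");
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, cr_heap) == 0x20, "f3 heap data");
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, core_type) == 0x2c, "f1");
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, sequencer_overlay) == 0x30, "f4");
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, tile_overlay) == 0x34, "f5");
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, global_barrier_sflag) == 0x38, "f6");
static_assert(offsetof(SpecialPurposeSyncFlagsMsgMirror, local_barrier_sflag) == 0x3c, "f7");
```

A mirror like this is only useful for reading a message image at known offsets; it says nothing about the total size or the vtable of the real class.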
2. The FromProto sink — TpuChipConfig::FromProto
TpuChipConfig::FromProto @0x20aea100 carries a dedicated special_purpose_sync_flags loop that, per SpecialPurposeSyncFlags proto element, decodes the core type, heap-copies the compiler_reserved range, captures the four scalars with their presence bits, and writes a 0x40-byte EnumMap element. The loop iterates the repeated field 13 (begin @proto+0x98, count @proto+0xa0). The byte-exact body:
// TpuChipConfig::FromProto @0x20aea100: special_purpose_sync_flags loop (one element)
for (each SpecialPurposeSyncFlags msg in proto.special_purpose_sync_flags) {  // field 13
    core     = TpuCoreTypeFromProto(*(int*)(msg + 0x2c));     // 0=TC, 1=SC, 2=BC
    cr_count = *(int*)(msg + 0x1c);                           // compiler_reserved size
    cr_data  = (hasbit on msg+0x18) ? *(int**)(msg + 0x20)    // heap repeated field
                                    : (int*)(msg + 0x18);     // inline
    // heap copy of the compiler_reserved range
    v.data = (int*)_Znwm(cr_count * 4);
    memcpy(v.data, cr_data, cr_count * 4);
    v.size = v.cap = cr_count;
    // the four scalars: value read verbatim, presence = (value > 0), i.e. setg
    seq_ov    = *(int*)(msg + 0x30);  seq_ov_present    = (seq_ov    > 0);  // f4
    tile_ov   = *(int*)(msg + 0x34);  tile_ov_present   = (tile_ov   > 0);  // f5
    glob_bar  = *(int*)(msg + 0x38);  glob_bar_present  = (glob_bar  > 0);  // f6
    local_bar = *(int*)(msg + 0x3c);  local_bar_present = (local_bar > 0);  // f7
    elem = chip + 0x2a0 + (core << 6);                        // EnumMap element, 0x40-byte stride
    *(int*)  (elem + 0x00) = core;                            // element key / runtime core marker
    *(int**) (elem + 0x08) = v.data;                          // compiler_reserved vector data ptr
    *(long*) (elem + 0x10) = v.size;                          // compiler_reserved size
    *(long*) (elem + 0x18) = v.cap;                           // compiler_reserved cap (== size)
    // each (int32 value, bool present) pair reads back as one qword: value | present<<32
    *(int*)  (elem + 0x20) = tile_ov;    *(bool*)(elem + 0x24) = tile_ov_present;
    *(int*)  (elem + 0x28) = seq_ov;     *(bool*)(elem + 0x2c) = seq_ov_present;
    *(int*)  (elem + 0x30) = glob_bar;   *(bool*)(elem + 0x34) = glob_bar_present;
    *(int*)  (elem + 0x38) = local_bar;  *(bool*)(elem + 0x3c) = local_bar_present;
    set_enummap_presence_bit(chip + 0x360, core);             // bts core, *(chip+0x360)
}
Two properties matter for a reimplementer:
- The scalars are stored verbatim. The writer's >>8 / <<8 split + low-byte movzbl (visible in the raw disassembly) is a register-allocation artifact: it reconstructs the exact proto value while carrying the presence bool separately in bit 32 of the packed qword. The presence bool is computed with setg (value > 0), so a zero or negative value reads as absent.
- The element ordering is not the proto order. The element stores tile_overlay first (+0x20), then sequencer_overlay (+0x28), inverting the proto field order (f4 sequencer_overlay, f5 tile_overlay). The struct sinks (§5) read the element offsets, not the proto offsets: Target::Init reads element +0x28 for sequencer_overlay.
CONFIRMED (decompile): in the special_purpose_sync_flags loop of FromProto, the element base is v1047 << 6 (= core * 0x40), and per element the four scalars are reconstructed verbatim (>>8 / <<8 | low-byte register split) with the presence bool stored separately: sequencer_overlay presence is carried as v706 = v705 + 0x100000000 (= *(int*)(msg+0x30), bit 32 = value > 0); tile_overlay / global_barrier / local_barrier presence are the byte setg flags v710 / v711 / v714 (= *(int*)(msg+0x34/0x38/0x3c) > 0) written at element +0x24 / +0x34 / +0x3c. The readers (Target::Init / SparseCoreTarget::Init) test & 0x100000000 on the packed qword. The repeated-field copy is operator new(4*count) + memcpy. This matches the proto→element→struct chain exactly.
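The (value, present) packing described in this section can be written as a one-line function. This is a minimal sketch, not the binary's code: PackScalar and kPresentBit are hypothetical names, and the value is zero-extended so that a negative input cannot leak a sign bit into bit 32, which is how the separate int32 and bool stores behave when read back as one qword.

```cpp
#include <cassert>
#include <cstdint>

// Bit 32 of the packed qword carries the presence bool.
constexpr std::uint64_t kPresentBit = 0x100000000ULL;

// Low 32 bits hold the proto value verbatim (zero-extended); bit 32 is set
// only when value > 0, mirroring the setg test described in the text.
constexpr std::uint64_t PackScalar(std::int32_t value) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(value)) |
         (value > 0 ? kPresentBit : std::uint64_t{0});
}

// A positive value is present; zero and negative values read as absent.
static_assert(PackScalar(254) == 0x1000000FEULL, "positive value is present");
static_assert(PackScalar(0) == 0, "zero reads as absent");
static_assert((PackScalar(-1) & kPresentBit) == 0, "negative reads as absent");
```

The static_asserts encode the three cases the text calls out; the input 254 is only an illustrative value.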
3. The runtime container — EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3>
The three per-core elements live in a fixed-capacity EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3> embedded directly inside TpuChipConfig at offset +0x2a0. The map is keyed by TpuCoreType (the enum value is the element index), with a separate per-core presence bitmask at +0x360. It is a fixed array, not a hash map: three 0x40-byte slots at +0x2a0, +0x2e0, +0x320.
3.1 The element layout (0x40 bytes)
| Element offset | Content | Source |
|---|---|---|
| +0x00 | element key / runtime core (0/1/2) | core |
| +0x08 | compiler_reserved vector data ptr | _Znwm result |
| +0x10 | compiler_reserved size | cr_count |
| +0x18 | compiler_reserved cap (== size) | cr_count |
| +0x20 | tile_overlay (int32) | proto f5 @msg+0x34 |
| +0x24 | tile_overlay present (bool) | setg (value > 0) |
| +0x28 | sequencer_overlay (int32) | proto f4 @msg+0x30 |
| +0x2c | sequencer_overlay present (bool) | bit-32 packing |
| +0x30 | global_barrier_sflag (int32) | proto f6 @msg+0x38 |
| +0x34 | global_barrier_sflag present (bool) | setg |
| +0x38 | local_barrier_sflag (int32) | proto f7 @msg+0x3c |
| +0x3c | local_barrier_sflag present (bool) | setg |
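The element table translates directly into a struct whose layout the compiler can verify. The sketch below assumes an x86-64 LP64 target; ScalarSlot, ElementMirror, and EnumMapMirror are hypothetical names, not the binary's types. The last assertion checks that three 0x40-byte slots put the presence bitmask at +0xc0 within the map, which agrees with chip+0x360 minus chip+0x2a0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One named scalar: int32 value at +0, presence bool at +4 (bit 32 of the qword).
struct ScalarSlot {
  std::int32_t value;
  bool         present;
};

// Hypothetical mirror of one 0x40-byte EnumMap element.
struct ElementMirror {
  std::int32_t  core;                  // +0x00 element key
  std::int32_t* cr_data;               // +0x08 compiler_reserved data ptr
  std::int64_t  cr_size;               // +0x10 compiler_reserved size
  std::int64_t  cr_cap;                // +0x18 compiler_reserved capacity
  ScalarSlot    tile_overlay;          // +0x20
  ScalarSlot    sequencer_overlay;     // +0x28
  ScalarSlot    global_barrier_sflag;  // +0x30
  ScalarSlot    local_barrier_sflag;   // +0x38
};

// Hypothetical mirror of the fixed three-slot map plus its presence bitmask.
struct EnumMapMirror {
  ElementMirror slots[3];
  std::uint64_t presence;
};

static_assert(sizeof(ScalarSlot) == 8, "value + present fill one qword");
static_assert(offsetof(ScalarSlot, present) == 4, "presence byte sits at bit 32");
static_assert(sizeof(ElementMirror) == 0x40, "element stride");
static_assert(offsetof(ElementMirror, tile_overlay) == 0x20, "tile_overlay first");
static_assert(offsetof(ElementMirror, sequencer_overlay) == 0x28, "then sequencer_overlay");
static_assert(offsetof(ElementMirror, local_barrier_sflag) == 0x38, "last scalar");
static_assert(offsetof(EnumMapMirror, presence) == 0xc0, "0x360 - 0x2a0");
```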
3.2 Teardown — EnumMap::Clear frees only the vector
EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3>::Clear @0x20b08200 walks the three elements (+0x0/+0x40/+0x80), and for each whose presence bit (+0xc0 in the map) is set, frees the compiler_reserved vector (begin@+0x8, cap@+0x18 << 2 bytes) and zeros the size at +0x10. It never touches +0x20..+0x3c. Consequence: the four named scalars persist in the element padding — they are stored at FromProto time and read on demand by Target::Init / SparseCoreTarget::Init, not consumed and discarded.
NOTE: this is why the four scalars are described as living "in the element" rather than "in a parsed result struct". The compiler_reserved vector is the only heap allocation the element owns; everything else is plain-old-data in the 0x40-byte slot.
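The teardown behaviour can be modelled in a few lines. This is a sketch under stated assumptions: ElementSketch, EnumMapSketch, FillSlot, and ClearSketch are hypothetical names; the sketch also nulls the freed pointer for safety and leaves the presence mask untouched, neither of which the text above confirms for the real Clear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

struct ElementSketch {
  std::int32_t  core;
  std::int32_t* cr_data;     // the only heap allocation the element owns
  std::int64_t  cr_size;
  std::int64_t  cr_cap;
  std::uint64_t scalars[4];  // +0x20..+0x3f: tile, sequencer, global, local
};

struct EnumMapSketch {
  ElementSketch slots[3];
  std::uint64_t presence;
};

// Hypothetical helper: populate one slot the way the FromProto loop is described.
inline void FillSlot(EnumMapSketch& map, int core, const std::int32_t* values,
                     std::int64_t count, const std::uint64_t (&packed)[4]) {
  ElementSketch& e = map.slots[core];
  const std::size_t bytes = sizeof(std::int32_t) * static_cast<std::size_t>(count);
  e.core = core;
  e.cr_data = static_cast<std::int32_t*>(::operator new(bytes));
  std::memcpy(e.cr_data, values, bytes);
  e.cr_size = e.cr_cap = count;
  std::memcpy(e.scalars, packed, sizeof packed);
  map.presence |= std::uint64_t{1} << core;
}

// Frees only the compiler_reserved storage of present slots; never touches scalars.
inline void ClearSketch(EnumMapSketch& map) {
  for (int core = 0; core < 3; ++core) {
    if (((map.presence >> core) & 1) == 0) continue;  // absent slot: nothing to free
    ElementSketch& e = map.slots[core];
    ::operator delete(e.cr_data);
    e.cr_data = nullptr;  // sketch-only safety; not claimed for the original
    e.cr_size = 0;        // the text says the size at +0x10 is zeroed
  }
}
```

After ClearSketch, a slot's cr_size is zero while its four packed scalars still hold whatever FillSlot stored, which is the persistence property this subsection describes.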
4. The accessor — GetSpecialPurposeSyncFlags(coreType)
GetSpecialPurposeSyncFlags(core) @0x20afcf40 is the single retrieval path for a SpecialPurposeSyncFlags element. It is a presence-gated, bounds-checked address computation over the EnumMap at chip+0x2a0:
// TpuChipConfig::GetSpecialPurposeSyncFlags(TpuCoreType core) @0x20afcf40
SpecialPurposeSyncFlags* GetSpecialPurposeSyncFlags(TpuChipConfig* chip, TpuCoreType core) {
    char* base = (char*)chip;                                  // all offsets below are byte offsets
    uint64_t mask = *(uint64_t*)(base + 0x360);                // per-core presence bitmask
    if (!_bittest64(&mask, core)) return NULL;                 // bt core, mask; jae → NULL
    if ((unsigned)core >= 3) ud1();                            // CHECK core ∈ {0,1,2}
    return (SpecialPurposeSyncFlags*)(base + 0x2a0 + ((uint64_t)core << 6));  // element `core`, 0x40-byte stride
}
The contract a reimplementer must honour:
- Presence is checked first, then bounds. If the chip config carried no SpecialPurposeSyncFlags for core, the accessor returns NULL before the core < 3 check. A caller that does not check the return value will dereference a null pointer.
- The index is core << 6, not +core. The element stride is 0x40 (the EnumMap element size); a reimplementation that indexes +0x2a0 + core as a byte offset lands at +0x2a1 / +0x2a2, misaligned inside the TensorCore element's own key field, instead of at the kSparseCore (+0x2e0) and kBarnaCore (+0x320) elements, and so returns garbage for both.
- The TensorCore entry is mandatory. Target::Init calls the accessor for kTensorCore and, on NULL, dies via DieBecauseNull with the string "chip_config.GetSpecialPurposeSyncFlags( ::tpu::TpuCoreType::kTensorCore)". Every valid chip config must carry a TC special-purpose entry.
CONFIRMED (decompile): the accessor body is exactly v2 = *(chip+864); if (!_bittest64(&v2, core)) return 0; if ((unsigned)core >= 3) ud1; return chip + 672 + (core << 6), i.e. +0x360 bitmask (864 = 0x360), +0x2a0 base (672 = 0x2a0), core << 6 stride. The six call sites of this accessor in the build are in Target::Init (×2), SparseCoreTarget::Init (×2), TpuPxcDriver::InitializeCores, and TpuProfilerControlListener::CanStartProfiler.
This accessor is also summarised on overview §5.1; this page is the authoritative derivation.
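The accessor contract can be exercised against a plain byte buffer. The following is a minimal model, not the binary's function: GetSpecialPurposeSyncFlagsSketch and the three constants are hypothetical names, std::abort stands in for the ud1 trap, and the bit test is modelled modulo 64 on the assumption that the check compiles to a register-operand bt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kMapOffset      = 0x2a0;  // EnumMap base inside the chip config
constexpr std::size_t kPresenceOffset = 0x360;  // per-core presence bitmask
constexpr unsigned    kStrideShift    = 6;      // 0x40-byte element stride

// Returns nullptr for an absent core, aborts on an out-of-range core,
// otherwise returns the address of the element for `core`.
inline std::uint8_t* GetSpecialPurposeSyncFlagsSketch(std::uint8_t* chip, unsigned core) {
  std::uint64_t mask;
  std::memcpy(&mask, chip + kPresenceOffset, sizeof mask);
  if (((mask >> (core & 63)) & 1) == 0) return nullptr;  // presence gate comes first
  if (core >= 3) std::abort();                           // stand-in for the ud1 trap
  return chip + kMapOffset + (static_cast<std::size_t>(core) << kStrideShift);
}
```

A caller would zero a buffer of at least 0x368 bytes, write a mask such as 0x3 at offset 0x360 to mark the TensorCore and SparseCore entries present, and then expect slot addresses at +0x2a0 and +0x2e0 and a null result for core 2.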
5. The overlay semantics — the four named scalars
The four named scalars (sequencer_overlay, tile_overlay, global_barrier_sflag, local_barrier_sflag) are each a single SFLAG number (an index into the SFLAG MemorySpace), not a bitmask. They overlay the general SFLAG window — i.e. they reserve specific SFLAG indices outside the compiler_reserved per-id range, pinned to the top of the per-gen SFLAG address space. After FromProto stores them into the EnumMap element, Target::Init / SparseCoreTarget::Init copy the present ones into named Target/SparseCoreTarget fields, each guarded by the bit-32 presence test.
5.1 The presence-gated qword
Each scalar occupies one 8-byte element qword: bits[31:0] = the SFLAG number, bit[32] (mask 0x100000000) = the proto-presence bool. The reader tests bit 32 (test $0x100000000, qword or bt $0x20, qword) and stores the low 32 bits only if present:
// SparseCoreTarget::Init @0x1d612b20 — scalar store (representative; one of four)
v74 = *(uint64_t*)(elem + 0x30); // global_barrier_sflag (packed)
if (v74 & 0x100000000) // bit-32 presence gate
*(int*)(sparse_core_target + 0x204) = (int)v74; // store low 32 bits = the SFLAG number
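The reader side of the packed qword can be isolated as a small helper. This is a sketch with a hypothetical name (ReadScalarIfPresent); it only restates the bit-32 gate and the low-32-bit store shown above, and leaves the destination untouched when the field is absent.

```cpp
#include <cassert>
#include <cstdint>

// Copies the SFLAG number into *out only when bit 32 (the presence bool) is set.
// Returns true when a value was stored.
inline bool ReadScalarIfPresent(std::uint64_t packed, std::int32_t* out) {
  if ((packed & 0x100000000ULL) == 0) return false;  // absent: leave *out as is
  *out = static_cast<std::int32_t>(static_cast<std::uint32_t>(packed));
  return true;
}
```

Leaving the destination untouched on absence matches the described behaviour of the struct sinks, where an absent scalar means the target field keeps whatever value it already had.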
5.2 The struct sinks and consumers
Target::Init copies only sequencer_overlay into a named TC field; SparseCoreTarget::Init copies all four into SC fields. The tile_overlay, global_barrier_sflag, and local_barrier_sflag scalars of the TC element are never read: the named-scalar barrier/overlay path is SparseCore-only. (The TC global/megacore/all-reduce barriers come from compiler_reserved via base+count+{0,2,3,4}; see Barrier-to-SFLAG Binding.)
| Scalar (proto field) | Element off | Presence gate | TC sink | SC sink | Consumer |
|---|---|---|---|---|---|
| sequencer_overlay (f4) | +0x28 | bit 32 | Target+0x534 | SparseCoreTarget+0x200 | TC: GetOverlayReservedSyncFlagNumber → overlay-reserved SFLAG (single index). SC: none (stored, never read in this build) |
| tile_overlay (f5) | +0x20 | bit 32 | (not copied) | SparseCoreTarget+0x1e8 | SC: overlayer::OverlayProgram encodes it as a SyImm32 MC immediate; also read by EmitFinishDescriptorDma / EmitEmulatedContinuation |
| global_barrier_sflag (f6) | +0x30 | bit 32 | (not copied) | SparseCoreTarget+0x204 | SC: LoweringEmitter::Emit / CustomKernelEmitter::MaybeInsertGlobalBarrier (mlir::sparse_core::MemorySpaceAttr::get(ctx, 14) global-barrier SFLAG MemRef; SC MLIR enum sflag-band value 14, not the jellyfish MemorySpace enum) |
| local_barrier_sflag (f7) | +0x38 | bit 32 | (not copied) | SparseCoreTarget+0x208 | SC: lowering_util::ReservedLocalBarrierSflag (1-elt i32 MemRef, mlir::sparse_core::MemorySpaceAttr::get(ctx, 5); SC MLIR enum sflag-band value 5, not jellyfish MemorySpace) |
| compiler_reserved (f3) | +0x08/+0x10 | (vector, always) | Target+0x8c0/+0x8c4 (count = size − 5) | SparseCoreTarget+0x1d0/+0x1d4 (count = size, no −5) | per-id barrier window; see Barrier-to-SFLAG Binding |
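The last table row, the compiler_reserved carve, differs between the two sinks only in the count. The sketch below illustrates that difference under stated assumptions: SflagWindow, IsAscendingContiguous, TensorCoreWindow, and SparseCoreWindow are hypothetical names, the input values are made up, and what the real Target::Init does when its contiguity check fails is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// base = first reserved SFLAG number, count = size of the per-id window.
struct SflagWindow {
  std::int32_t base;
  std::int32_t count;
};

// The window is described as ascending and contiguous; this checks that property.
inline bool IsAscendingContiguous(const std::vector<std::int32_t>& cr) {
  for (std::size_t i = 1; i < cr.size(); ++i) {
    if (cr[i] != cr[i - 1] + 1) return false;
  }
  return true;
}

// TC sink: base is the first entry, count is size minus the five reserved slots.
inline SflagWindow TensorCoreWindow(const std::vector<std::int32_t>& cr) {
  assert(!cr.empty());
  return SflagWindow{cr.front(), static_cast<std::int32_t>(cr.size()) - 5};
}

// SC sink: same base, but the full size is kept (no -5).
inline SflagWindow SparseCoreWindow(const std::vector<std::int32_t>& cr) {
  assert(!cr.empty());
  return SflagWindow{cr.front(), static_cast<std::int32_t>(cr.size())};
}
```

For a made-up twelve-entry range starting at 100, the TC window would be base 100 with count 7, and the SC window base 100 with count 12.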
5.3 Why "overlay" is an index, not a bitmask
sequencer_overlay lands at Target+0x534 and is returned by GetOverlayReservedSyncFlagNumber() @0x1d617900 (mov 0x534(rdi), eax; ret). All four of its consumers treat it as one int SFLAG number, never AND/test it:
- EmitContinuationTailcall @0x12718ca0: number → SflagImmPtr(int, "overlay reserved sync flag") → a single SFLAG pointer used as a DMA completion sflag.
- LinkAndFinishProgram @0x10a25a20: inserts {number → "overlay"} into a std::map<int, string> (next to {GetGlobalBarrierSyncFlagNumber → "global barrier"}), i.e. a single map key.
- CodeGenerationHelper::emit_routine<1> / <3> @0x1409a5a0 / 0x14062da0: number → linked_hash_map<int, BundleImmediatesMetadata> key.
The per-gen values sit at or next to 2^n − 1: 254 (0xFE = 2^8 − 2, JF/DF), 511 (0x1FF = 2^9 − 1, PF/VF/GL), 4095 (0xFFF = 2^12 − 1, GF). This is not because the value is a bitmask, but because the overlay-reserved SFLAG is pinned to the top of the addressable SFLAG index range (the topmost index in the 9-bit and 12-bit cases, one below it in the 8-bit case), and the SFLAG-number encoding width grows per generation (8→9→12 bits; gen ISA SyncFlagCountType). The SC tile_overlay/sequencer_overlay (7167/7157) likewise land as single SFLAG indices, materialised as 1-element i32 MemRefs in MemorySpace::sflag.
LOW: the per-gen SFLAG-number bit-width (8/9/12) is attributed from the gen ISA SyncFlagCountType plus the 2^n − 1 overlay-number progression; it is not read as a single binary literal in one accessor. The index-not-bitmask conclusion is CONFIRMED (no consumer masks the value); the width is inferred. The literal scalar values per gen are an embedded-memfile dependency (see Per-Codename Compiler-Reserved).
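The width inference can be checked with plain arithmetic. The sketch below uses a hypothetical helper, TopIndexForWidth, and takes the three overlay numbers quoted in this section as given; it is a consistency check on the stated widths, not evidence extracted from the binary.

```cpp
#include <cassert>
#include <cstdint>

// Largest index representable in an n-bit SFLAG-number field: 2^n - 1.
constexpr std::uint32_t TopIndexForWidth(unsigned bits) {
  return (std::uint32_t{1} << bits) - 1u;
}

// 9-bit and 12-bit generations: the quoted overlay number is the topmost index.
static_assert(TopIndexForWidth(9) == 511, "0x1FF");
static_assert(TopIndexForWidth(12) == 4095, "0xFFF");
// 8-bit generation: the quoted overlay number 254 is one below the top index 255.
static_assert(TopIndexForWidth(8) - 1 == 254, "0xFE");
```

All three quoted values fit their inferred widths, which is consistent with an index encoding; only the 8-bit case leaves the very top index unused.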
QUIRK: the SC sequencer_overlay (SparseCoreTarget+0x200, value 7157) is stored by SparseCoreTarget::Init but has no reader anywhere in .text in this build; it is ingested but unused. The SC overlayer reads tile_overlay (+0x1e8), not sequencer_overlay. Whether this is a retired field, a forward-declared field for a future generation, or a flag-disabled path is not determinable from the binary. A reimplementer should still ingest it (the proto carries it) but need not wire a consumer.
6. The full chip-config → element → struct → consumer chain
The complete data path from proto to consumer, byte-anchored end-to-end:
TpuChipConfigProto.special_purpose_sync_flags[core] (field 13, repeated SpecialPurposeSyncFlags)
│ core_type f1 @+0x2c ; compiler_reserved f3 ; seq_ov f4 ; tile_ov f5 ; glob f6 ; local f7
│
FromProto @0x20aea100 — per element:
│ _Znwm + memcpy(compiler_reserved) ; pack 4 scalars (value | present<<32)
▼
EnumMap<TpuCoreType, SpecialPurposeSyncFlags, 3> @TpuChipConfig+0x2a0 (stride 0x40, bitmask +0x360)
│ element: +0x08/+0x10/+0x18 cr vector ; +0x20/+0x28/+0x30/+0x38 scalars (+presence)
│ Clear @0x20b08200 frees only the cr vector → scalars persist
│
GetSpecialPurposeSyncFlags(core) @0x20afcf40 — presence-gated, bounds-checked, +0x2a0+(core<<6)
│
├── Target::Init @0x1d60fc20 (core=TC, NULL → DieBecauseNull)
│ element +0x10 size / +0x08 data → contiguity-check → Target+0x8c0 base, +0x8c4 count−5
│ element +0x28 sequencer_overlay (if present) → Target+0x534 = GetOverlayReservedSyncFlagNumber
│ → EmitContinuationTailcall / LinkAndFinishProgram / emit_routine<1,3> (single index)
│
├── SparseCoreTarget::Init @0x1d612b20 (core=SC)
│ element +0x10 size / +0x08 data → SparseCoreTarget+0x1d0 base, +0x1d4 count (no −5)
│ +0x20 tile_overlay → +0x1e8 (→ overlayer::OverlayProgram SyImm32)
│ +0x28 sequencer_overlay → +0x200 (stored, NO reader)
│ +0x30 global_barrier_sflag → +0x204 (→ MaybeInsertGlobalBarrier, SC MLIR MemSpace 14)
│ +0x38 local_barrier_sflag → +0x208 (→ ReservedLocalBarrierSflag, SC MLIR MemSpace 5)
│
├── TpuPxcDriver::InitializeCores @0xe806500 (reads only the cr vector +0x08/+0x10)
└── TpuProfilerControlListener::CanStartProfiler @0xf3328c0 (profiler gate)
The compiler_reserved carve (the −5 reservation on TC, the SC full-range, and the per-id / global / megacore / all-reduce SFLAG formulas built from base/count) is the subject of Barrier-to-SFLAG Binding. This page stops at the field stores; it does not re-derive the number formulas.
7. Verification notes
Byte-exact in libtpu.so v0.0.40:
- GetSpecialPurposeSyncFlags @0x20afcf40: v2 = *(chip+864); if (!_bittest64(&v2, core)) return 0; if ((unsigned)core >= 3) ud1; return chip + 672 + (core<<6), i.e. +0x360 bitmask, +0x2a0 base, core<<6 stride. Byte-exact.
- TpuChipConfig::FromProto @0x20aea100: in the special_purpose_sync_flags loop the element base is core << 6; the four scalars are stored verbatim (>>8 / <<8 | low-byte reconstruction), sequencer_overlay presence carried as v705 + 0x100000000 and the other three as setg (value > 0) bytes at element +0x24 / +0x34 / +0x3c; compiler_reserved via operator new(4*count) + memcpy. This confirms the element build and the (value, present) packing.
- Target::Init @0x1d60fc20: *((_DWORD*)target + 560) = *base (= Target+0x8c0 base); *((_DWORD*)target + 561) = size − 5 (= Target+0x8c4 count); sequencer_overlay gated by & 0x100000000 at element +0x28; DieBecauseNull("…GetSpecialPurposeSyncFlags( ::tpu::TpuCoreType::kTensorCore)") on NULL. Exact.
- SparseCoreTarget::Init @0x1d612b20: +464 / +468 = +0x1d0 / +0x1d4 (base / count, no −5); the four scalar stores +488 / +512 / +516 / +520 (= +0x1e8 / +0x200 / +0x204 / +0x208), each gated by & 0x100000000. Exact.

[LOW] The literal per-gen compiler_reserved integers and the four scalar values (e.g. sequencer_overlay = 254/511/4095, SC global_barrier_sflag = 7156, local_barrier_sflag = 7155) are runtime-resolved from embedded chip-config memfile binarypb blobs and were not statically extracted here. The per-gen SFLAG-number bit-width (8/9/12) is inferred from the 2^n − 1 overlay progression + the gen ISA SyncFlagCountType, not read as a single literal. The index-not-bitmask overlay semantics, the FromProto sink, the accessor, the EnumMap layout, and the struct sinks are CONFIRMED byte-anchored.
Cross-References
Barrier algorithms (this section)
- overview: the barrier subsystem map; §5.1 summarises the GetSpecialPurposeSyncFlags accessor (this page is the authoritative derivation)
- Barrier-to-SFLAG Binding: the general SFLAG number formulas built from compiler_reserved base/count (base+count+4 global, base+count megacore, base+id per-key)
- Per-Codename Compiler-Reserved: the literal per-(codename, deployment) compiler_reserved integers and scalar values (memfile-resolved)
- Infer Barrier Config: the pincer-fusion CUSTOM → GLOBAL/REPLICA normaliser that consumes the carved TC count
- Global-Barrier Window: the base+count+4 global SFLAG slot and the per-core barrier window
Sibling subsystems
- SFLAG Sync-Flag Tier — the SFLAG atomic-counter substrate every barrier (and overlay reservation) is built on