TCE Field-Offsets & Flag Defaults

All addresses on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id md5 89edbbe81c5b328a958fe628a9f2207d, internal tag libtpu_lts_20260413_b_RC00). The ELF is not stripped; symbol names below are read verbatim from .symtab. Other libtpu builds will differ.

Abstract

The TpuCompilationEnvironment (TCE) is the TPU-private master config proto — 1121 live fields, each one registered as an absl::Flag. Two questions remain after the field dictionary has named and typed every field: where does each knob physically live in the materialized C++ object, and what literal value is it seeded to before any flag, AUTO arm, or per-codename overlay touches it. This page owns both answers — the field# → struct-byte-offset map and the field# → flag-default-value map — plus the write-path functions that move a parsed value into the object.

The offset oracle is not a hand-disassembled accessor list. It is the protobuf fast-parse table TpuCompilationEnvironment::_table_ (a TcParseTableBase), whose 1121-entry FieldEntry array carries the exact struct offset of every field, sorted by ascending field number. The default oracle is the union at FlagImpl+0x48 inside each FLAGS_<name> object: either a relocation pointing at an AbslFlagDefaultGenFor<name>::Gen body (which movs an immediate), or — when no reloc is present — eight inline literal bytes. 507 fields use a Gen function; 614 use an inline literal. The materialization driver, CreateDefaultTpuCompEnv @0x1d73dfa0, walks the descriptor and runs GetFlagForField → ReadFlag → SetEnvField once per field; the two surgical writers, SetFieldFromFlagString @0x1d73fcc0 and OverwriteFieldIfNotDefault @0x1d73f360, override individual fields afterward.

The reader should treat this page as the bridge layer. The schema (name, type, wrapper enum, HOT tag) is on the dictionary pages; the AUTO-arm resolution of message/oneof fields is on AutoProto / AutoOr Resolution; the structural overview of the object is on TpuCompilationEnvironment. Here, the field# is joined to offset and default value, and the flag-write path is traced.

For reimplementation, the contract is:

  • The offset formula — how to compute struct_offset(field N) from the TcParseTableBase header constants and the FieldEntry array, deterministically, for any of the 1121 fields.
  • The default-value mechanism — the FlagImpl+0x48 union, the kGenFunc-vs-kOneWord discriminator (presence of a reloc), and how to decode each base type's immediate.
  • The seeding driver and the two overriders: CreateDefaultTpuCompEnv's per-field loop, and the SetFieldFromFlagString / OverwriteFieldIfNotDefault single-field write path with its "both set" conflict rule.
  • The default census — which values are non-trivial (the 120 MiB combiner thresholds, the INT64_MAX fuel knobs, the MSA overlap-ratio triples, the 7 wrapper-enum defaults) so a reimplementer can reproduce a byte-identical default env.

| Item | Location / value |
|---|---|
| Offset oracle | TpuCompilationEnvironment::_table_ @0x21cfa9e0 (.data.rel.ro) |
| FieldEntry array | table +0x370, 1121 × 12 B, ascending field# |
| sizeof(TpuCompilationEnvironment) | 0x15e8 (5608 B); field data +0xa8..+0x15e0 |
| Default union | FlagImpl+0x48 per FLAGS_<name> (reloc ⇒ gen body, none ⇒ inline) |
| Default split | 507 kGenFunc / 614 kOneWord (1121 total, perfect 1:1 field↔flag) |
| Seeding driver | CreateDefaultTpuCompEnv @0x1d73dfa0 |
| Single-field writers | SetFieldFromFlagString @0x1d73fcc0, OverwriteFieldIfNotDefault @0x1d73f360 |
| Default singleton | GetTpuCompEnvWithDefaultValues @0x1d73f100 (NoDestructor, lazy) |

Field# → Offset: the Parse-Table Oracle

Purpose

A serialized or in-memory TpuCompilationEnvironment is a flat 5608-byte C++ object. A consumer that wants to read a specific knob — say, the post-optimization pipeline string or the pipelined-loop-unroll Tristate — needs the byte offset of that field inside the object. The generated protobuf fast-parse table already encodes every offset; this section recovers the offsets from that table rather than from per-field accessor disassembly, so all 1121 are available, not just the handful that have hand-written gates.

The TcParseTableBase header

TpuCompilationEnvironment::_table_ @0x21cfa9e0 opens with the standard protobuf TcParseTableBase header. Decoded byte-for-byte:

TcParseTableBase @0x21cfa9e0
  +0x00  uint16  has_bits_offset      = 16     (0x10)   hasbits start at struct +0x10
  +0x02  uint16  extension_offset     = 0               no extensions
  +0x04  uint32  max_field_number     = 1218   (0x4c2)  matches the descriptor max
  +0x08  uint32  fast_idx_mask        = 0xf8
  +0x10  uint32  field_entries_offset = 0x370            FieldEntry array start
  +0x14  uint16  num_field_entries    = 1121
  +0x16  uint16  num_aux_entries      = 349              = the 349 message-typed fields
  +0x18  uint32  aux_offset           = 0x3800

max_field_number is 1218, but num_field_entries is 1121: the gap is the 97 declared-but-reserved field numbers. The FieldEntry array holds one entry per live field, so it is dense at 1121 entries even though the highest field number is 1218.
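
The header decode above can be reproduced mechanically. The following is a minimal Python sketch, assuming the table bytes have already been extracted from the ELF, that the target is little-endian, and that `has_bits_offset` is a 16-bit value (it must be, since `extension_offset` begins at +0x02). The function name `parse_tc_header` and the returned dictionary keys are illustrative, not names from libtpu or protobuf.

```python
import struct

FIELD_ENTRY_SIZE = 12  # bytes per FieldEntry, as stated on this page


def parse_tc_header(table: bytes) -> dict:
    """Decode the TcParseTableBase fields listed above (little-endian assumed)."""
    has_bits_offset, extension_offset, max_field_number = struct.unpack_from(
        "<HHI", table, 0x00
    )
    field_entries_offset, num_field_entries, num_aux_entries, aux_offset = (
        struct.unpack_from("<IHHI", table, 0x10)
    )
    return {
        "has_bits_offset": has_bits_offset,
        "extension_offset": extension_offset,
        "max_field_number": max_field_number,
        "field_entries_offset": field_entries_offset,
        "num_field_entries": num_field_entries,
        "num_aux_entries": num_aux_entries,
        "aux_offset": aux_offset,
        # Field numbers that exist below the maximum but have no live entry.
        "reserved_numbers": max_field_number - num_field_entries,
        # First byte past the dense FieldEntry array.
        "field_entries_end": field_entries_offset
        + FIELD_ENTRY_SIZE * num_field_entries,
    }
```

Two sanity checks fall out of the result: `reserved_numbers` should equal 97, and `field_entries_end` (0x370 + 12 × 1121 = 0x37fc) should not exceed `aux_offset` (0x3800).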

The FieldEntry layout and the offset formula

Each FieldEntry is 12 bytes:

FieldEntry (12 bytes, array @ table+0x370, 1121 entries)
  +0x00  uint32  offset      struct byte-offset of this field's storage
  +0x04  uint32  has_idx     presence-bit index (into the hasbit words at struct +0x10)
  +0x08  uint16  aux_idx     index into the aux array (message/enum types only)
  +0x0a  uint16  type_card   16-bit protobuf type card (see below)

The array is sorted by ascending field number, so FieldEntry[i] is the i-th live field. The offset of field number N is therefore:

// Deterministic offset lookup — no disassembly needed per field.
uint32 struct_offset(int N):
    i = index_of(N)                       // = (N-1) - (count of reserved field#s below N)
    return FieldEntry[i].offset           // table+0x370 + 12*i, read uint32 at +0x00
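
The lookup above can be sketched in runnable form. This is a minimal Python sketch under stated assumptions: `table` holds the raw little-endian bytes of the parse table, and `live_field_numbers` is a sorted list of the live field numbers taken from the field dictionary. Both names, and the function itself, are hypothetical helpers rather than anything present in libtpu.

```python
import bisect
import struct

FIELD_ENTRY_SIZE = 12  # bytes per FieldEntry


def struct_offset(table: bytes, live_field_numbers: list, n: int) -> int:
    """Return the struct byte offset of field number n.

    The index into the dense FieldEntry array is the rank of n among the
    live field numbers, i.e. (n - 1) minus the reserved numbers below n.
    """
    i = bisect.bisect_left(live_field_numbers, n)
    if i == len(live_field_numbers) or live_field_numbers[i] != n:
        raise KeyError(f"field #{n} is reserved or out of range")
    # field_entries_offset lives at header +0x10.
    (entries_off,) = struct.unpack_from("<I", table, 0x10)
    # FieldEntry.offset is the uint32 at +0x00 of the entry.
    (offset,) = struct.unpack_from("<I", table, entries_off + FIELD_ENTRY_SIZE * i)
    return offset
```

Using a binary search over the live numbers avoids materializing the reserved set; the rank it returns is the same index the formula computes.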

Three offsets that were previously pinned by hand-disassembling consumer gates reproduce exactly from this table, validating the formula:

| Field# | Name | Index | Offset | Source of cross-check |
|---|---|---|---|---|
| #132 | xla_tpu_verify_or_assign_tiling_before_lowering | 120 | +0xDFC | hand-pinned accessor gate |
| #648 | xla_while_loop_unroll_count (int64) | 591 | +0x1328 | parse-table FieldEntry[591] |
| #867 | xla_tpu_enable_pipelined_loop_unrolling (message) | 787 | +0x2f0 | parse-table FieldEntry[787] |

NOTE — field #2 (xla_tpu_sdc_checker_instrument_megacore_fusion, bool) sits at +0xBC in the parse table, not in the +0x1206 region. +0x1206 (4614) is a different field — the collective-producer "must-fuse" bool that the producer-priority cost model reads. The FieldEntry array resolves each field to the exact byte offset.

The type_card cross-check

The 16-bit type_card in each FieldEntry is a 1:1 proxy for the field's proto type — every base type maps to exactly one card, and the per-card population reproduces the dictionary's type histogram with zero disagreement. This is an independent confirmation that the offset table and the field dictionary describe the same 1121 fields in the same order.

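| Type | type_card | Count |
|---|---|---|
| bool | 0x0011 | 418 |
| int64 | 0x10d1 | 148 |
| message | 0x0416 | 349 |
| enum | 0x1891 | 74 |
| uint32 | 0x0891 | 11 |
| uint64 | 0x08d1 | 4 |
| string | 0x0c15 | 37 |
| float | 0x1893 | 34 |
| int32 | 0x1091 | 32 |
| double | 0x18d3 | 14 |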
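
The histogram cross-check is easy to automate. The sketch below, in Python, assumes the caller has already read the 16-bit `type_card` from each of the 1121 FieldEntry records into a list; `EXPECTED` transcribes the table above, and `check_type_cards` is a hypothetical helper name.

```python
from collections import Counter

# type_card -> (proto type, expected population), transcribed from the table.
EXPECTED = {
    0x0011: ("bool", 418),
    0x10D1: ("int64", 148),
    0x0416: ("message", 349),
    0x1891: ("enum", 74),
    0x0891: ("uint32", 11),
    0x08D1: ("uint64", 4),
    0x0C15: ("string", 37),
    0x1893: ("float", 34),
    0x1091: ("int32", 32),
    0x18D3: ("double", 14),
}


def check_type_cards(type_cards):
    """Compare observed type_card populations against the expected histogram.

    Returns (mismatches, unknown): cards whose count differs, as
    {card: (seen, expected)}, and cards that are not in the table at all.
    """
    seen = Counter(type_cards)
    mismatches = {
        card: (seen.get(card, 0), count)
        for card, (_name, count) in EXPECTED.items()
        if seen.get(card, 0) != count
    }
    unknown = set(seen) - set(EXPECTED)
    return mismatches, unknown
```

An empty `mismatches` dictionary and an empty `unknown` set together correspond to the "zero disagreement" claim; the expected counts sum to 1121.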

NOTE — the has_idx in each FieldEntry is recovered, but the packing of has_idx → (hasbit word offset, bit position) was not re-walked. It follows protobuf's standard sequential packing from has_bits_offset=16; the hasbit region spans struct +0x10..~+0xb0 for ~1121 presence bits. A reimplementer reading field presence from a serialized env must derive this packing (one disassembly of TpuCompilationEnvironment::Clear confirms it). Marked LOW until walked.
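
Until the packing is walked, the following Python sketch shows what sequential packing would imply. It rests on two unconfirmed assumptions: that `has_idx` counts bits relative to `has_bits_offset` (if the table stores an absolute bit index from the start of the object, pass `has_bits_offset=0`), and that hasbit words are 32 bits wide. `hasbit_location` is a hypothetical helper.

```python
def hasbit_location(has_idx: int, has_bits_offset: int = 16, word_bits: int = 32):
    """Map a presence-bit index to its location under sequential packing.

    Returns (word_offset, bit_in_word, byte_offset, byte_mask), where the
    offsets are struct byte offsets and byte_mask selects the bit within
    the single byte at byte_offset (little-endian words assumed).
    """
    word, bit = divmod(has_idx, word_bits)
    word_offset = has_bits_offset + (word_bits // 8) * word
    byte_offset = word_offset + bit // 8
    byte_mask = 1 << (bit % 8)
    return word_offset, bit, byte_offset, byte_mask
```

As a plausibility check only: under these assumptions a presence index of 77 lands on byte +25 with mask 0x20, which is the same byte and mask the FDO-config copy in CreateDefaultTpuCompEnv sets. Whether that field's `has_idx` is in fact 77 has not been confirmed here.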


Field# → Default: the FlagImpl+0x48 Union

Purpose

Every TCE field is the typed mirror of an absl::Flag named FLAGS_<field_name>. The field's default — the value the materialized env carries before any command-line flag or per-codename overlay is applied — is the flag's registered default. This section locates that default inside the FlagImpl object and gives the decode rule per base type.

FlagImpl layout

Each FLAGS_<name> is a 0x60-byte absl::FlagImpl:

FlagImpl (0x60 bytes)
  +0x00  vtable (0x22040f18)
  +0x08  name (const char*)
  +0x10  type_name (const char*)
  +0x18  filename (const char*)
  +0x20  FlagOps<T> (op dispatch)
  +0x28  HelpGen
  +0x30  packed metadata
  +0x38  lazy sentinel (-1)
  +0x40  default-kind discriminator
  +0x48  DEFAULT-VALUE UNION  ← the literal lives here

The discriminator is implicit in the relocation table, not just the +0x40 byte:

  • kGenFunc — if an R_X86_64_RELATIVE reloc targets +0x48, the union holds a function pointer to AbslFlagDefaultGenFor<name>::Gen. That body writes the default into the caller's buffer with a single mov imm. 507 fields.
  • kOneWord — if no reloc targets +0x48, the eight bytes are the literal value. 614 fields.
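
The reloc-presence test can be sketched as follows. This Python sketch assumes the raw bytes of the relevant relocation section (for example `.rela.dyn`) have been extracted and use the standard 24-byte Elf64_Rela layout, with R_X86_64_RELATIVE being relocation type 8. The helper names and the idea of passing a flag's address directly are illustrative.

```python
import struct

R_X86_64_RELATIVE = 8
DEFAULT_UNION_OFFSET = 0x48  # FlagImpl+0x48, per the layout above


def relative_reloc_targets(rela: bytes) -> dict:
    """Map r_offset -> r_addend for every R_X86_64_RELATIVE entry.

    Each Elf64_Rela record is (r_offset: u64, r_info: u64, r_addend: i64);
    the relocation type is the low 32 bits of r_info.
    """
    targets = {}
    for r_offset, r_info, r_addend in struct.iter_unpack("<QQq", rela):
        if r_info & 0xFFFFFFFF == R_X86_64_RELATIVE:
            targets[r_offset] = r_addend
    return targets


def default_kind(flag_addr: int, targets: dict) -> str:
    """Classify a FLAGS_<name> object by whether a reloc targets its union."""
    if flag_addr + DEFAULT_UNION_OFFSET in targets:
        return "kGenFunc"
    return "kOneWord"
```

For a kGenFunc flag, `targets[flag_addr + 0x48]` is the addend, which for a RELATIVE reloc is the link-time address of the Gen body to decode next.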

The Gen body decode

A Gen body is the smallest possible function: store an immediate, return. Four byte-confirmed examples (decompiled bodies, exact addresses):

AbslFlagDefaultGenForxla_tpu_register_selection_policy::Gen:   // 0x1d723540
    *(uint8*)dst = 6;                       // RegSelectPolicy DISREGARD_RECENTLY_USED

AbslFlagDefaultGenForxla_msa_enable::Gen:                      // 0x1d705500
    *(uint8*)dst = 2;                       // Tristate ENABLED

AbslFlagDefaultGenForxla_tpu_msa_inefficient_use_to_copy_ratio::Gen: // 0x1d721c60
    *(uint32*)dst = 0x3F000000;             // 1056964608 == 0.5f

AbslFlagDefaultGenForrematerialization_algorithm::Gen:        // 0x1d718ee0
    *(uint64*)dst     = 0x7464697765657274; // "treewidt"
    *(uint64*)(dst+8) = 104;                // 'h' + SSO terminator → "treewidth"
    *(uint64*)(dst+16)= 0x0900000000000000; // std::string SSO length byte (9)
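
The three stores in the string example can be checked by laying the words out in memory. A minimal Python sketch, assuming little-endian stores and, as the comment on the third store indicates, that the short-string length sits in the last byte of the 24-byte object:

```python
import struct

# The three 64-bit immediates written by the Gen body, in store order.
words = (0x7464697765657274, 104, 0x0900000000000000)

# Lay them out as 24 consecutive little-endian bytes.
buf = struct.pack("<3Q", *words)

# Assumed SSO layout: length in the final byte, characters from byte 0.
length = buf[23]
default = buf[:length].decode("ascii")
```

With these inputs `default` is the nine-character string `treewidth`: eight characters come from the first word and the trailing `h` (104) from the low byte of the second.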

The instruction skeleton tells the type directly:

| Pattern | Type | Example |
|---|---|---|
| c6 07 imm8 c3 (movb) | bool / 1-byte enum | register_selection_policy = 6 |
| c7 07 imm32 c3 (movl) | float / int32 / uint32 / 4-byte enum | ratio 0x3F000000 = 0.5f |
| 48 c7 07 … / 48 b8 … (movabs) | int64 / uint64 / double | vliw_fuel = 0x7fffffffffffffff |
| c5 f8 57 c0 … / movabs-SSO | string (SSO / empty) | "treewidth" |
| 66 c7 07 0000 c3 / vxorps | empty AutoProto / message | oneof-case 0 = AUTO |
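
The scalar rows of the pattern table translate into a small byte matcher. The Python sketch below covers only the first three patterns (it does not attempt the string or message bodies) and assumes the Gen body bytes start at the function's first instruction. `decode_gen_body` and `as_f32` are hypothetical helper names.

```python
import struct


def decode_gen_body(code: bytes):
    """Return (store_width_in_bytes, raw_integer_value) for a scalar Gen body."""
    if code[:2] == b"\xc6\x07":        # movb $imm8, (%rdi)
        return 1, code[2]
    if code[:2] == b"\xc7\x07":        # movl $imm32, (%rdi)
        return 4, struct.unpack_from("<I", code, 2)[0]
    if code[:3] == b"\x48\xc7\x07":    # movq $imm32, (%rdi), sign-extended
        return 8, struct.unpack_from("<i", code, 3)[0]
    if code[:2] == b"\x48\xb8":        # movabs $imm64, %rax (stored afterward)
        return 8, struct.unpack_from("<Q", code, 2)[0]
    raise ValueError("not a recognized scalar Gen body")


def as_f32(bits: int) -> float:
    """Reinterpret a 32-bit immediate as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

The width alone does not settle the type: a 4-byte store may be a float, int32, uint32, or enum, so the caller still needs the field's type from the dictionary (or the `type_card`) to pick the interpretation, for example `as_f32` for the 0x3F000000 ratio.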

For kOneWord fields, the eight bytes at +0x48 are read directly — e.g. xla_tpu_arf_combiner_threshold_in_bytes carries inline 0x0000000007800000 (= 125,829,120 = 120 MiB), and xla_jf_loop_trip_count carries inline 4.
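
Decoding an inline literal is a typed read of the low bytes of the eight-byte union. A minimal Python sketch, assuming little-endian storage and that narrower types occupy the low-order bytes with the rest zeroed; the format map and `decode_one_word` are illustrative, and reading enums as 4-byte signed integers is an assumption.

```python
import struct

# base type -> struct format applied at byte 0 of the 8-byte union
_FORMATS = {
    "bool": "<?",
    "int32": "<i",
    "uint32": "<I",
    "enum": "<i",   # assumption: enum read as a 4-byte signed integer
    "float": "<f",
    "int64": "<q",
    "uint64": "<Q",
    "double": "<d",
}


def decode_one_word(raw8: bytes, base_type: str):
    """Interpret the eight inline bytes at FlagImpl+0x48 as base_type."""
    if len(raw8) != 8:
        raise ValueError("the kOneWord union is exactly 8 bytes")
    return struct.unpack_from(_FORMATS[base_type], raw8, 0)[0]
```

For instance, the bytes `00 00 80 07 00 00 00 00` read as int64 give 125,829,120, and a union beginning `01 00 00 00` read as bool gives true.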

Default census

The 1121 defaults break down by base type as follows. The census is the reimplementer's checklist: reproduce these distributions and the named non-trivial values, and the default env is byte-identical.

| Type | Count | Default distribution |
|---|---|---|
| bool | 418 | 153 true, 265 false |
| enum | 74 | 67 Tristate (21 ENABLED, 37 AUTO, 9 DISABLED) + 7 wrapper enums |
| int64 | 148 | 108 non-{0,−1}; combiner thresholds, BRKGA limits, INT64_MAX fuel knobs |
| int32 | 32 | 18 non-{0,−1}; trip counts, send/recv limits |
| float | 34 | MSA overlap ratios, megacore margins, oblongness 50.0 |
| double | 14 | MSA scaling factors |
| string | 37 | 30 empty "", 7 non-empty literals |
| message | 349 | all empty/AUTO instance (oneof-case 0) |
| uint32 | 11 | |
| uint64 | 4 | |

GOTCHA — the registered absl flag default is the byte-authoritative source for a field's default. Help-string text is not. For xla_tpu_rwb_fusion and xla_tpu_accumulate_into_mrb, the =value help-string text reads false, but their FlagImpl+0x48 inline literal is 01 00 00 00 = true in both cases. The help string describes behavior, not the seeded default. When the two disagree, trust the union at +0x48.


The Write Path

CreateDefaultTpuCompEnv — the per-field seeding loop

CreateDefaultTpuCompEnv @0x1d73dfa0 is the driver that materializes a fresh default env. It is not a static initializer list; it walks the descriptor and reads each field's flag default at runtime. Allocation, loop, and the two post-loop steps are all byte-confirmed:

function CreateDefaultTpuCompEnv(sdm_fdo_config):           // 0x1d73dfa0
    env = operator new(0x15E8)                              // sizeof(TCE) = 5608
    TpuCompilationEnvironment::ctor(env, /*arena=*/0)
    md = GetMetadata(&TpuCompilationEnvironment_globals_)
    fields = md.descriptor.field_array                      // md+64, stride 88 B per field
    for field in fields:                                    // loop while i < md.field_count
        flag = TpuCompEnvReflection::GetFlagForField(field) // CHECK OK @reflection: line 5755
        value = TpuCompEnvReflection::ReadFlag(flag, field) // CHECK OK @line 5758
        if field.options.is_used_at_runtime:                // field+56 → options+125
            runtime_fields.push_back(field)                 // collected for the diff below
        TpuCompEnvReflection::SetEnvField(value, field, env) // CHECK OK @line 5765
    if sdm_fdo_config != null:
        env.flag_bits[25] |= 0x20                           // mark fdo-config present
        env.sparse_dense_matmul_fdo_config (+0x280) = sdm_fdo_config.CopyFrom()
    // Deprecated-flag tripwire:
    diff = MessageDifferencer(report_to_string)
    diff.CompareWithFields(env, GetTpuCompEnvWithDefaultValues(), runtime_fields)
    if diff != empty:
        LOG("[DEPRECATED_XLA_TPU_FLAG_USE] Deprecated TpuCompilationEnvironment "
            "flags were present and not matching their default values:\n" + diff)  // line 5776
    return env

The loop is the core mechanism: for each FieldDescriptor, GetFlagForField resolves the absl::Flag via a FlagFieldMappings Swiss-table keyed on the descriptor pointer (_mm_crc32_u64 hash, vpcmpeqb/vpmovmskb group probe), ReadFlag dispatches to a ReadFlagImpl<T> template that pulls the flag's default through the flag vtable (vtable+24 validate, vtable+56 TypeId check against FastTypeTag<T>, vtable+72 read default), and SetEnvField stores it at the field's offset. Each GetFlagForField / ReadFlag failure is a CHECK-fail (fatal), so a build with a flag missing for any field aborts here rather than silently using a wrong default.
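
The control flow of the loop can be modeled in a few lines. This Python sketch is a toy model only: fields are plain strings, each flag is modeled as a zero-argument callable returning its default, and a `RuntimeError` stands in for the fatal CHECK. None of these names correspond to real libtpu symbols.

```python
def create_default_env(fields, flag_for_field, runtime_used=frozenset()):
    """Toy model of the per-field seeding loop.

    fields          -- iterable of field names, in descriptor order
    flag_for_field  -- dict: field name -> callable returning the flag default
    runtime_used    -- names whose options mark them is_used_at_runtime
    Returns (env, runtime_fields).
    """
    env = {}
    runtime_fields = []
    for field in fields:
        if field not in flag_for_field:
            # Models the CHECK-fail: a field with no registered flag is fatal.
            raise RuntimeError(f"no flag registered for field {field!r}")
        value = flag_for_field[field]()      # ReadFlag: pull the default
        if field in runtime_used:
            runtime_fields.append(field)     # collected for the later diff
        env[field] = value                   # SetEnvField
    return env, runtime_fields
```

The point of the model is the failure mode: a missing mapping aborts seeding instead of leaving the field at a zero value.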

QUIRK — the FDO-config copy writes struct offset +0x280 (640) and sets bit 0x20 of the flag byte at env+25. This is the one field CreateDefaultTpuCompEnv writes outside the descriptor loop — it is sourced from the caller's argument, not from a flag default. Every other field is seeded purely from its FLAGS_<name> default.

GetTpuCompEnvWithDefaultValues — the cached default singleton

GetTpuCompEnvWithDefaultValues @0x1d73f100 is a lazy absl::NoDestructor singleton guarded by a __cxa_guard. On first call it materializes the default env (via the same $_0 lambda the descriptor loop uses), placement-news it into a static NoDestructor slot, then runs SharedDtor on the stack temporary. Subsequent calls return the cached pointer. It is the reference the two single-field overriders diff against to decide whether a field is "still default".

SetFieldFromFlagString — parse-and-set a single field

SetFieldFromFlagString @0x1d73fcc0 writes one field from a string value, given the absl::CommandLineFlag for that field:

function SetFieldFromFlagString(flag, value_str, env):      // 0x1d73fcc0
    field = TpuCompEnvReflection::GetFieldForFlag(flag)      // reverse map; err @line 5905
    value = TpuCompEnvReflection::ParseFlagFromString(flag, field, value_str)
    if value is error:                                       // err @line 5908
        return error
    return TpuCompEnvReflection::SetEnvField(value, field, env)

GetFieldForFlag @0x1d74ab20 is the inverse of GetFlagForField — the same FlagFieldMappings structure, keyed the other way. The parsed value is a 20-alternative std::variant (the scalar types plus RangeSpecProto, RepeatedStrings, SparseDenseMatmulFdoConfig, the MSA option messages, BufferContentsSanitizerConfig, BufferIsolationConfig, AutoProto); SetEnvField @0x1d752ae0 visits the variant and stores into the field at its offset.

OverwriteFieldIfNotDefault — the conflict-aware overrider

OverwriteFieldIfNotDefault @0x1d73f360 takes a source field and a destination field and copies source→dest only when safe. It is used to migrate a value from a renamed/deprecated field onto its replacement without clobbering an explicitly-set replacement:

function OverwriteFieldIfNotDefault(src_name, dst_name, env):  // 0x1d73f360
    md       = GetMetadata(&TpuCompilationEnvironment_globals_)
    dst_fd   = md.FindFieldByName(dst_name)   // err "Could not find field … " @line 5851
    src_fd   = md.FindFieldByName(src_name)   // err "Could not find field … " @line 5858
    defaults = GetTpuCompEnvWithDefaultValues()
    src_cur  = GetFieldValueAsString(src_fd, env)       // @line 5869
    src_def  = GetFieldValueAsString(src_fd, defaults)  // @line 5872
    if src_cur == src_def:
        return OK                             // source untouched → nothing to migrate
    dst_cur  = GetFieldValueAsString(dst_fd, env)       // @line 5885
    dst_def  = GetFieldValueAsString(dst_fd, defaults)  // @line 5888
    if dst_cur == dst_def:                    // dest still default → safe to overwrite
        value = GetFieldValue(src_fd, env)              // @line 5881
        return SetEnvField(value, dst_fd, env)
    // both non-default → conflict, keep dest:
    LOG("Both " + src_name + " and " + dst_name +
        " were set to non-default values; keeping the value of " + src_name)  // line 5890
    return OK

GOTCHA — the conflict message at line 5890 says "keeping the value of src_name", but the code path that emits it does not write — it leaves the destination at its already-set non-default value. The log text names the source, yet the dest is what survives. A reimplementer copying this string verbatim must not also copy a misread of its semantics: when both are non-default, the destination is kept untouched and the source's value is discarded.
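
The three outcomes of the conflict rule can be pinned down with a small model. In this Python sketch the env and the default env are plain dictionaries and values are compared directly, whereas the real function compares string renderings of the fields; the function name and the returned status strings are hypothetical.

```python
def overwrite_field_if_not_default(src, dst, env, defaults):
    """Model of the source-to-destination migration rule.

    Mutates env in place and returns a status string describing the outcome.
    """
    if env[src] == defaults[src]:
        return "source-default"        # nothing to migrate
    if env[dst] == defaults[dst]:
        env[dst] = env[src]            # destination untouched: safe to copy
        return "migrated"
    # Both were set explicitly: the destination keeps its own value.
    return "conflict-kept-dest"
```

A reimplementation that follows the log text instead of the code would copy the source in the third case, which is exactly the misreading the note above warns about.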


Notable Default Values

These are the non-trivial defaults — the "magic numbers" a reimplementer must reproduce. Field numbers are from the dictionary; offsets are from the parse table; values are byte-exact from FlagImpl+0x48.

The 7 wrapper-enum defaults

The 67 Tristate-typed enums default per the AUTO resolution split (21 ENABLED, 37 AUTO, 9 DISABLED). The 7 non-Tristate wrapper enums carry these specific defaults:

| Field# | Name | Enum | Default |
|---|---|---|---|
| #31 | xla_memory_scheduler | MemorySchedulerProto.Value | 0 (DEFAULT) |
| #132 | xla_tpu_verify_or_assign_tiling_before_lowering | VerifyOrAssignTilingFlags.Value | 1 (VERIFY) |
| #487 | xla_tpu_vmac_transform_strategy | TpuVmacTransformStrategy.Value | 0 (NONE) |
| #583 | xla_tpu_sdc_checker_checksum_algo | ChecksumAlgoProto.Value | 0 (DEFAULT) |
| #631 | xla_tpu_register_selection_policy | RegSelectPolicyProto.Value | 6 (DISREGARD_RECENTLY_USED) |
| #723 | xla_tpu_precision_tracer_mode | PrecisionTracerModeProto.Value | 0 (NONE) |
| #827 | xla_sc_async_wrapper_fusion_type | ScAsyncWrapperFusionType.Value | 3 (SINGLE_TPU_CUSTOM_CALL) |

#631 = 6 is byte-confirmed against the Gen body @0x1d723540 (movb $0x6).

The 7 non-empty string defaults

The other 30 string fields default to "".

| Field# | Name | Default |
|---|---|---|
| #198 | xla_jf_hlo_deduplicate_only | "true" |
| #209 | config_criterion | "min" |
| #212 | rematerialization_algorithm | "treewidth" |
| #393 | xla_tpu_nested_dot_fusion_supported_custom_ops | "PartialReduce" |
| #578 | xla_tpu_alternate_memory_benefit_scaling_factor_for_large_buffers | "SQRT" |
| #656 | xla_tpu_collect_sflag_wait_stats_filter | "all" |
| #739 | xla_tpu_synthetic_compute_in_sflag_wait_filter | "all" |

#212 = "treewidth" is byte-confirmed against the Gen body @0x1d718ee0 (movabs 0x7464697765657274 = "treewidt" + 'h').

Notable integer defaults

| Field#(s) | Name | Default |
|---|---|---|
| #55/56/57, #184 | arf / ars / agf / crs combiner threshold (bytes) | 125,829,120 (120 MiB) |
| #58 | xla_jf_crs_combiner_threshold_count | 256 |
| #41 | xla_hlo_scheduling_brkga_generation_limit | 1200 |
| #42 | xla_hlo_scheduling_brkga_computation_limit | 3 |
| #12/13 | net-router ring limits (cross-module / cross-replica) | 8 / 16 |
| #18/19 | cmem max outstanding prefetches / evictions | 40 / 40 |
| #40 | xla_hbm_logging_buffer_size_bytes | 1,048,576 (1 MiB) |
| #74 | xla_tpu_rematerialization_min_size_in_bytes | 10,485,760 (10 MiB) |
| #107 | xla_jf_vliw_fuel | INT64_MAX (unlimited) |
| #128 | xla_tpu_min_elements_for_while_loop_concat_code_motion | INT64_MAX |
| #149 | xla_max_concurrent_send_recv | INT32_MAX (2147483647) |
| #151 | xla_tpu_licm_analysis_allowance | 100,000 |
| #166 | xla_jf_loop_trip_count | 4 |
| #255 | xla_jf_overlay_compression_threshold | 2,044,723,200 (0x79e00000) |

#166 = 4 and #255 = 0x79e00000 are byte-confirmed inline literals (no reloc at +0x48).
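
The byte-size and limit values above can be re-derived arithmetically, which is a cheap guard against transcription errors when copying them into a reimplementation. A minimal Python sketch; the dictionary keys are informal labels chosen here, not flag names, and `inline_literal` is a hypothetical helper.

```python
MIB = 1 << 20

# Informal label -> value derived from its stated meaning.
notable = {
    "combiner threshold (120 MiB)": 120 * MIB,
    "hbm logging buffer (1 MiB)": 1 * MIB,
    "remat min size (10 MiB)": 10 * MIB,
    "overlay compression threshold": 0x79E00000,
    "INT64_MAX": (1 << 63) - 1,
    "INT32_MAX": (1 << 31) - 1,
}


def inline_literal(value: int) -> str:
    """Render a value as the eight little-endian bytes a kOneWord union holds."""
    return value.to_bytes(8, "little").hex(" ")
```

`inline_literal` gives the byte string to compare against a hex dump of the union, which is how the #166 and #255 literals can be matched by eye.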

Notable float / double defaults

The MSA overlap-ratio triples (max / min / pref) repeat across the five memory families. They are the single most reused default pattern; reproduce them exactly per family.

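| Family | Fields (max/min/pref) | max | min | pref |
|---|---|---|---|---|
| jf | #280 / #284 / #285 | 32.0 | 1.0 | 2.0 |
| vf | #442 / #446 / #447 | 8.0 | 1.0 | 2.0 |
| gf | #540 / #544 / #545 | 8.0 | 1.0 | 2.0 |
| cmem | #309 / #313 / #314 | 8.0 | 1.0 | 2.0 |
| msa | #790 / #788 / #789 | 8.0 | 1.0 | 2.0 |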

Other notable scalars: #592 MSA inefficient-use-to-copy ratio = 0.5 (byte-confirmed movl 0x3F000000 @0x1d721c60), #459 auto-SPMD memory-budget ratio = 1.1 (f32 1.10000002), #389 megacore-fusion scaling factor = 2.0, #319 copy-fusion pad/unpad ratio = 300.0, #364 experimental max padding = 0.125 GiB, #30 embedding-table oblongness = 50.0, #176 fusion max vmem = 15.0 MiB.
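
Float defaults are stored as IEEE-754 bit patterns, so a decimal such as 1.1 only matches after rounding to single precision. The Python sketch below shows the conversion in both directions; `f32_bits` and `f32_value` are hypothetical helper names.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the 32-bit pattern that stores x as an IEEE-754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_value(x: float) -> float:
    """Return x rounded to single precision, widened back to a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]
```

`f32_bits(0.5)` is 0x3F000000, the immediate in the Gen body for #592, and `f32_value(1.1)` is approximately 1.10000002, which is why #459 is listed with that expansion. A byte-identical default env therefore requires comparing bit patterns, not decimal literals.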

NOTE — these are the flag-default baseline only. The effective per-chip default is baseline ⊕ overlay: ComputeMemorySpaceAssignmentOptions and the target jump table overwrite the per-family MSA fields per TPU generation (v0/1→jf, v2→cmem, v3→vf, v4/5→gf). What a JAX/PJRT user gets on a given chip is the baseline composed with that overlay. The overlay itself is out of scope here; see the MSA per-codename material referenced from TpuCompilationEnvironment.


What Is Not Covered Here

  • The has-bit packing. has_idx per field is recovered, but the has_idx → (struct word, bit) map was not re-walked (LOW). Needed to read field presence from a serialized env.
  • The per-version overlay values. This page is the flag-default baseline; the ComputeMemorySpaceAssignmentOptions family overrides are owned elsewhere.
  • The field#→name→type schema. Owned by TCE Field Dictionary (A) / (B).
  • The AutoProto message-arm sub-defaults. A message field defaults to its empty instance here; the 12 message-typed AutoProto arms each have their own sub-message defaults, owned by AutoProto / AutoOr Resolution.

Cross-References