Flag Catalog (Full)

All counts, symbols, and offsets on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build libtpu_lts_20260413_b_RC00, build-id md5 89edbbe81c5b328a958fe628a9f2207d, 781,691,048 bytes, ELF x86-64 DYN, not stripped). Other versions differ.

Abstract

This appendix is the exhaustive prefix index of libtpu's flag surface: every XLA_FLAGS / LIBTPU_INIT_ARGS / xla.DebugOptions name string the binary registers, grouped by prefix, with per-group counts, a representative enumerated subset, the inferred type, and, where an error-string remedy clause survives, the default that clause implies. It is the machine-style companion to the grouped narrative in xla-flag-atlas.md: the atlas explains what the high-signal knobs do; this page is the complete reference table a reader greps. Where the two disagree on a count, this page wins, because its numbers come directly from the binary.

The authoritative name census is the mangled helper-symbol set. Every absl::Flag<T> FLAGS_<name> global emits an _ZN<len>AbslFlagHelpGenFor<name>8NonConstEv helper symbol, so that symbol set is a 1:1 enumeration of registered flags; length-prefix parsing recovers each <name> exactly. The binary carries 2048 such distinct symbols — the registered-flag count. A further set of names appears only in .rodata (deprecated aliases, error-message-only references) that are not backed by a live AbslFlagHelpGenFor symbol; folding those in yields 2107 distinct flag names. The two numbers answer two different questions: 2048 is "how many flags can you set," 2107 is "how many flag names exist as strings in the binary." Both are used below, labelled explicitly per row.
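The length-prefix recovery described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used to produce the census: the function names are hypothetical, and the input is assumed to be a plain list of mangled symbol strings (for example, the name column of a symbol-table dump). It relies only on the symbol shape stated above: the decimal length covers AbslFlagHelpGenFor plus the flag name, and the component is followed by the fixed tail 8NonConstEv.

```python
import re
from typing import Iterable, Optional, Set

HELPER_PREFIX = "AbslFlagHelpGenFor"  # fixed part of the struct name (18 chars)
HELPER_SUFFIX = "8NonConstEv"         # "<8>NonConst" + E (end of nested name) + v (void)

_SYMBOL_RE = re.compile(r"^_ZN(\d+)" + HELPER_PREFIX)


def flag_name_from_symbol(symbol: str) -> Optional[str]:
    """Return the flag name encoded in a helper symbol, or None if it does not match."""
    m = _SYMBOL_RE.match(symbol)
    if m is None:
        return None
    # The length prefix counts the whole component: fixed prefix + flag name.
    name_len = int(m.group(1)) - len(HELPER_PREFIX)
    if name_len <= 0:
        return None
    start = m.end()
    name = symbol[start:start + name_len]
    rest = symbol[start + name_len:]
    # Slicing by length keeps names that end in digits (e.g. ..._v2) unambiguous.
    if len(name) != name_len or rest != HELPER_SUFFIX:
        return None
    return name


def registered_flags(symbols: Iterable[str]) -> Set[str]:
    """Collect the distinct flag names from an iterable of mangled symbols."""
    names = set()
    for sym in symbols:
        name = flag_name_from_symbol(sym)
        if name is not None:
            names.add(name)
    return names


if __name__ == "__main__":
    sym = "_ZN36AbslFlagHelpGenForxla_tpu_rwb_fusion8NonConstEv"
    print(flag_name_from_symbol(sym))  # xla_tpu_rwb_fusion
```

Slicing by the declared length, rather than searching for the 8NonConstEv tail, is the design choice that makes the recovery exact even when a flag name itself ends in a digit.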

Every flag is settable through one funnel: LIBTPU_INIT_ARGS (env string @ file 0x918c880) is read by GetLibTpuInitArguments @ 0x20ccca20, split argv-style, and handed to absl::ParseCommandLine inside RealInitGoogle @ 0x210ae860. Because the parse is generic, the entire 2048-flag set is reachable through that one variable — there is no init-args-private subset. The xla_* (non-TPU) flags also bind to xla::DebugOptions fields via MakeDebugOptionsFlags @ 0x1e66ce80; the TPU-private families (xla_tpu_*, xla_jf_*, codenames, megascale_*, barna_core_*) are standalone globals that land in the TpuCompilationEnvironment via OverrideTpuCompEnvByCmdLineFlags @ 0x1d73e640, not in DebugOptions. Which proto a name lands in is owned by flag-families.md; this page owns the grouped name index.
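As a usage-side illustration of the funnel, the following sketch assembles a LIBTPU_INIT_ARGS string from a mapping of flag names to values. The helper name build_init_args is hypothetical, the numeric value is an arbitrary placeholder, and the exact quoting rules of the argv-style split are not documented on this page, so the sketch simply refuses values that contain whitespace rather than guessing at a quoting scheme. The variable presumably has to be set before libtpu is loaded, since it is read during initialization.

```python
import os


def build_init_args(flags: dict) -> str:
    """Render {flag_name: value} as a space-separated --name=value string."""
    parts = []
    for name, value in flags.items():
        # bool must be checked before anything else: bool is a subclass of int.
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        if any(ch.isspace() for ch in text):
            # Quoting behaviour of the split is unknown; fail loudly instead.
            raise ValueError(f"value for {name} contains whitespace: {text!r}")
        parts.append(f"--{name}={text}")
    return " ".join(parts)


if __name__ == "__main__":
    args = build_init_args({
        "xla_tpu_rwb_fusion": False,              # bool flag from the fusion table
        "xla_tpu_scoped_vmem_limit_kib": 16384,   # placeholder value, not a recommendation
    })
    os.environ["LIBTPU_INIT_ARGS"] = args
    print(args)
    # --xla_tpu_rwb_fusion=false --xla_tpu_scoped_vmem_limit_kib=16384
```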

This appendix is a pure-reference catalog: it carries no reimplementation contract of its own (the registration mechanism is reimplemented from xla-flag-atlas.md and registry-mediated-flags.md). It provides:

  • A per-prefix count table — registered count and rodata-name count for each of the ~14 prefix namespaces, with scope and confidence.
  • Per-prefix sections — for each prefix, the subsystem split (where applicable) and a substantial enumerated subset of names with inferred type, the highest-value TPU-specific flags spelled out in full.
  • The certainty boundary — the 13 flags whose error-string =value remedy clauses survive (these spell the non-default direction, not the default), and the convention-inference caveat on the type column for the rest.

| Quantity | Value |
|---|---|
| Registered flags (AbslFlagHelpGenFor symbols) | 2048 |
| Distinct rodata flag names (registered + rodata-only) | 2107 |
| Enumeration symbol | _ZN<len>AbslFlagHelpGenFor<name>8NonConstEv (1 per registered flag) |
| DebugOptions registrar | MakeDebugOptionsFlags @ 0x1e66ce80 (binds generic xla_* fields) |
| TCE flag→field bridge | OverrideTpuCompEnvByCmdLineFlags @ 0x1d73e640 (TPU families) |
| Funnel | LIBTPU_INIT_ARGS (str @ file 0x918c880) → GetLibTpuInitArguments @ 0x20ccca20 → absl::ParseCommandLine (RealInitGoogle @ 0x210ae860) |
| Type split (all 2107 names) | bool 1431 (68%) · int 434 (21%) · string 93 · float 79 · enum/string 70 |
| Error-string =value remedy clauses | 13 (each spells the non-default direction, not the default); actual defaults live in .text initializers |
| xla_gpu_* / xla_cpu_* registered flags | 0 (GPU/CPU flag wiring stripped from this TPU build) |

NOTE — two different xla_tpu_* counts measure two different things. The binary's AbslFlagHelpGenFor symbol set holds 909 registered xla_tpu_* flags — the settable surface. There are 968 distinct xla_tpu_* name strings in .rodata (909 registered + 59 rodata-only references), which is not the count of settable flags. This page splits the two columns per prefix so the distinction is explicit: use 909 for "how many --xla_tpu_*=... you can pass," 968 for "how many xla_tpu_* strings exist."


Per-Prefix Index (at a glance)

Every prefix namespace in the catalog, with the registered-flag count (binary AbslFlagHelpGenFor census) and the rodata-name count (registered + rodata-only). The "rodata names" column matches the prior 2107-union tally; the "registered" column is what is actually settable. Confidence is CERTAIN for counts derived directly from the symbol census, HIGH where the rodata-name delta carries concatenation noise.

| Prefix | Registered | Rodata names | Scope | Owner proto |
|---|---|---|---|---|
| xla_tpu_ | 909 | 968 | TPU-specific compiler + runtime knobs | TCE |
| megascale_ | 150 | 150 | Megascale DCN collective runtime | standalone |
| xla_jf_ | 148 | 148 | Jellyfish TPU XLA backend (all gens) | TCE |
| xla_ (plain) | 121 | 138 | Generic XLA (scheduler / MSA / collective / dump) | DebugOptions |
| xla_sc_ | 92 | 92 | SparseCore compiler (SCS/SCC LLVM backend) | TCE |
| tpu_ | 69 | 69 | TPU runtime / compilation-cache / driver | standalone |
| barna_core_ | 61 | 61 | BarnaCore embedding-engine runtime | standalone |
| xla_msa_ | 22 | 22 | Memory-Space-Assignment (dedicated namespace) | TCE |
| tf_ | 20 | 20 | TensorFlow-TPU bridge (tf_jf_* etc.) | standalone |
| xla_vf_ | 16 | 16 | Gen-specific VMEM/MSA override mirror (vf codename) | TCE |
| xla_gf_ | 14 | 14 | Gen-specific VMEM/MSA override mirror (gf codename) | TCE |
| xla_mosaic_ | 8 | 8 | Mosaic MLIR custom-kernel dialect | TCE |
| xla_ior_ | 4 | 4 | "IOR" fast-mem round-trip MSA variant | TCE |
| xla_pf_ | 1 | 1 | Gen-specific ND-allreduce override (pf codename) | TCE |
| xla_llo_ | 1 | 1 | LLO annotation lifecycle | TCE |
| xla_gpu_ | 0 | | GPU backend (proto-only, no flag) | DebugOptions |
| xla_cpu_ | 0 | | CPU backend (proto-only, no flag) | DebugOptions |
| (other / no XLA prefix) | 412 | 395 | abseil / grpc / protobuf / OR-tools library flags | standalone |
| Total | 2048 | 2107 | | |
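For readers who want to reproduce the prefix grouping on their own name list, the following is a minimal sketch of one plausible bucketing rule. The rule itself is an assumption: the page does not state the exact procedure, and a pure startswith match would file DANGEROUS_tpu_runtime_abi_verification_disabled under "other" even though this page lists it in the tpu_ section. The names bucket and count_by_prefix are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Specific prefixes from the table above; plain "xla_" is the fallback for
# any xla_* name that matches none of them.
SPECIFIC_PREFIXES = (
    "xla_tpu_", "megascale_", "xla_jf_", "xla_sc_", "tpu_", "barna_core_",
    "xla_msa_", "tf_", "xla_vf_", "xla_gf_", "xla_mosaic_", "xla_ior_",
    "xla_pf_", "xla_llo_", "xla_gpu_", "xla_cpu_",
)
PLAIN = "xla_ (plain)"
OTHER = "(other / no XLA prefix)"


def bucket(name: str) -> str:
    """Assign a flag name to its prefix row (assumed rule: literal prefix match)."""
    for prefix in SPECIFIC_PREFIXES:
        if name.startswith(prefix):
            return prefix
    if name.startswith("xla_"):
        return PLAIN
    return OTHER


def count_by_prefix(names: Iterable[str]) -> Counter:
    """Tally distinct names per prefix row."""
    return Counter(bucket(n) for n in set(names))


if __name__ == "__main__":
    sample = [
        "xla_tpu_rwb_fusion",
        "xla_enable_megacore_hbm_spill",
        "xla_msa_enable",
        "tpu_use_tfrt",
        "alsologtostderr",
    ]
    for name in sample:
        print(f"{name:35s} -> {bucket(name)}")
```

Checking the specific prefixes before the plain xla_ fallback is what keeps xla_tpu_*, xla_jf_* and the other codename families out of the "xla_ (plain)" row.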

NOTE — "registered total 2048" and "rodata names 2107" are both byte-derived: the registered total is the count of distinct AbslFlagHelpGenFor<name>8NonConstEv symbols (2048, confirmed by rg over the names sidecar); the 2107 is that set unioned with the rodata-only references (deprecated aliases, error-string mentions). The "other / no XLA prefix" registered count is the residual (2048 − the 1636 XLA/TPU-prefixed registered flags = 412); those are statically-linked library flags (alsologtostderr, alarm_on_failure, OR-tools cp_model_*, grpc internals) of no compiler interest.
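The residual arithmetic in the note can be checked mechanically. The sketch below copies the per-prefix counts from the table above and recomputes the two "other" cells as totals minus the prefixed sums; nothing in it comes from outside this page.

```python
# (registered, rodata_names) for each XLA/TPU-prefixed row of the table above.
COUNTS = {
    "xla_tpu_": (909, 968),
    "megascale_": (150, 150),
    "xla_jf_": (148, 148),
    "xla_ (plain)": (121, 138),
    "xla_sc_": (92, 92),
    "tpu_": (69, 69),
    "barna_core_": (61, 61),
    "xla_msa_": (22, 22),
    "tf_": (20, 20),
    "xla_vf_": (16, 16),
    "xla_gf_": (14, 14),
    "xla_mosaic_": (8, 8),
    "xla_ior_": (4, 4),
    "xla_pf_": (1, 1),
    "xla_llo_": (1, 1),
}
TOTAL_REGISTERED = 2048
TOTAL_RODATA_NAMES = 2107

prefixed_registered = sum(reg for reg, _ in COUNTS.values())
prefixed_rodata = sum(names for _, names in COUNTS.values())
other_registered = TOTAL_REGISTERED - prefixed_registered
other_rodata = TOTAL_RODATA_NAMES - prefixed_rodata

if __name__ == "__main__":
    print(prefixed_registered, other_registered)  # 1636 412
    print(prefixed_rodata, other_rodata)          # 1712 395
```

Both "other" cells therefore reproduce as residuals. Note that under these figures the residual row has fewer rodata names (395) than registered flags (412), which is consistent with the table caption's warning that the rodata-name delta carries concatenation noise.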


xla_tpu_ — TPU Compiler + Runtime Knobs (909 registered / 968 names)

The dominant family — the TPU-private compiler and runtime knob surface, registered as standalone absl::Flag globals that land in TpuCompilationEnvironment, not in DebugOptions. (The sole two exceptions wired to DebugOptions are xla_tpu_detect_nan and xla_tpu_detect_inf — see the xla_ section.) Subsystem split, by keyword classification over all 968 xla_tpu_* name strings (sums to 968, not the 909 registered subset):

| Subsystem | Count | Keyword signature |
|---|---|---|
| misc / uncategorized | 288 | (no dominant keyword) |
| ICI / collectives | 174 | ici, all_reduce, all_gather, reduce_scatter, all_to_all, collective, sflag, dcn, barrier |
| fusion | 101 | fusion, fuse, rwb, dot_dot, nested_dot, multi_output, horizontal |
| debug / dump / log / trace | 77 | dump, debug, log, trace, verify, nan, recorder, assert |
| MSA / prefetch / scoped mem | 55 | msa, memory_space, prefetch, scoped_(v\|c)mem, async_copy, cmem, telamalloc |
| SparseCore (TC-side) | 50 | sparse_core, sparsecore, _sc_, embedding, minibatch, offload |
| scheduler | 47 | schedul, latency_hiding, lhs, ilp, brkga, critical_path |
| auto-sharding / SPMD | 40 | sharding, spmd, partition, shardonnay, propagat |
| layout | 29 | layout, minor_dim, 2nd_minor, transpose, relayout, _x16/_x8/_x4 |
| memory / allocation | 27 | allocat, hbm, vmem, spill, oom, defragment |
| dot / conv | 24 | dot, conv, matmul, mxu, gemm, einsum |
| autotune / autofdo | 24 | autotun, autofdo, flagnet |
| numerics / precision | 21 | accurate_, _exp, _log, precision, bf16, fp8, stochastic |
| cost-model | 8 | cost_model, cycle, learned_cost, roofline |
| runtime | 3 | runtime, init |

Scheduler — five engine gates

The scheduler family advertises five distinct engines, each behind its own gate. The Default column is filled only where a surviving string evidences a value; every other default is unrecoverable from strings.

| Flag | Type | Default | Purpose |
|---|---|---|---|
| xla_tpu_enable_latency_hiding_scheduler | bool | | master LHS gate |
| xla_tpu_enable_ilp_latency_hiding_scheduler | bool | | ILP-based LHS engine |
| xla_tpu_enable_brkga_latency_hiding_scheduler | bool | | genetic (BRKGA) scheduler engine |
| xla_tpu_brkga_latency_hiding_scheduler_generation_limit | int | | BRKGA generation cap |
| xla_tpu_brkga_latency_hiding_scheduler_num_chromosomes | int | | BRKGA population size |
| xla_tpu_brkga_latency_hiding_scheduler_num_top_heap_computations | int | | BRKGA elite-set size |
| xla_tpu_brgka_latency_hiding_scheduler_no_progress_limit | int | | BRKGA stall cutoff (note brgka typo) |
| xla_tpu_enable_dozer_latency_hiding_scheduler | bool | | "Dozer" scheduler variant |
| xla_tpu_enable_lem_scheduler | bool | | LEM scheduler variant |
| xla_tpu_consider_lp_llo_scheduler | bool | | LP-based LLO scheduler |
| xla_tpu_enable_depth_memory_pressure_reduction | bool | | depth-based memory-pressure reduction |
| xla_tpu_enable_cp_send_done_scheduling | bool | | collective-permute send/done sched |
| xla_tpu_aggressive_flexible_annotation_scheduling | bool | | scheduling-annotation aggressiveness |
| xla_tpu_scheduling_annotation_deannotate_unsupported_groups | bool | false (errstr remedy =true) | deannotate annotation gaps |
| xla_tpu_enable_all_experimental_scheduler_features | bool | | enable all experimental sched features |

QUIRK — the name xla_tpu_brgka_latency_hiding_scheduler_no_progress_limit carries a transposed-letter typo (brgka vs the brkga used by its three siblings). It is a distinct registered flag string, not an alias — a reimplementer must register the misspelt name verbatim or this knob is unreachable.

ICI / Collectives — largest subsystem (174)

| Flag | Type | Default | Purpose |
|---|---|---|---|
| xla_tpu_debug_sflag_wait_timeout_ms | int | | TC sflag-wait watchdog |
| xla_tpu_debug_sc_sflag_wait_timeout_ms | int | | SparseCore sflag-wait watchdog |
| xla_tpu_use_resilient_collective_emitter | bool | | fault-aware route table |
| xla_tpu_collect_sflag_wait_hang_core | bool | | hang-attribution telemetry |
| xla_tpu_collect_sflag_wait_hang_rate | float | | hang-rate statistic |
| xla_tpu_force_startup_barrier_in_binomial_all_reduce | bool | | startup barrier injection |
| xla_tpu_binomial_all_reduce_use_physical_core_ids | bool | | physical-core-id binomial AR |
| xla_tpu_all_gather_collective_matmul_mode | enum/string | | collective-matmul AG mode |
| xla_tpu_all_gather_step_count | int | | AG ring step count |
| xla_tpu_all_reduce_vmem_contingency_kib | int | | AR VMEM reserve |
| xla_tpu_all_to_all_max_rdma_size_kib | int | | A2A RDMA chunk cap |
| xla_tpu_async_ragged_all_to_all_max_rdma_size_kib | int | | ragged A2A RDMA cap |
| xla_tpu_add_barriers_around_aggregated_collectives | bool | | barrier wrapping |
| xla_tpu_aggressive_opt_barrier_removal | bool | | opt-barrier removal |
| xla_tpu_checksum_all_reduce_transfers | bool | | checksum AR transfers |
| xla_tpu_1d_uni_direction_ring_min_input_size_chunks | int | | 1-D ring threshold |

The ICI-SDC test harness contributes a 10-flag sub-family: xla_tpu_ici_sdc_test_{iterations, packet_size_chunks, buffer_size_chunks, delay_mask, pipeline_depth, max_distance} (all int), xla_tpu_ici_sdc_test_{emit_compact_code, run_on_program_start, inject_mismatch_for_testing_only} (bool), xla_tpu_ici_sdc_test_sflag_wait_timeout_ms (int).
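Because this page compresses sibling flags with {a, b, c} brace shorthand, a reader who wants grep-ready names needs to expand it. The following is a small sketch of such an expander; the function name expand is hypothetical and the sketch handles only flat (non-nested) brace groups, which is all this page uses.

```python
import itertools
import re
from typing import List

_BRACE_RE = re.compile(r"\{([^{}]*)\}")


def expand(pattern: str) -> List[str]:
    """Expand flat {a, b, c} groups into the full list of names."""
    # re.split with one capture group alternates literal text (even indices)
    # and brace contents (odd indices).
    parts = _BRACE_RE.split(pattern)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])
        else:
            choices.append([alt.strip() for alt in part.split(",")])
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    family = (
        expand("xla_tpu_ici_sdc_test_{iterations, packet_size_chunks, "
               "buffer_size_chunks, delay_mask, pipeline_depth, max_distance}")
        + expand("xla_tpu_ici_sdc_test_{emit_compact_code, run_on_program_start, "
                 "inject_mismatch_for_testing_only}")
        + expand("xla_tpu_ici_sdc_test_sflag_wait_timeout_ms")
    )
    print(len(family))  # 10
    for name in family:
        print(name)
```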

Fusion (101)

| Flag | Type | Default | Purpose |
|---|---|---|---|
| xla_tpu_rwb_fusion | bool | true (errstr remedy =false) | read-write-buffer fusion |
| xla_tpu_dot_dot_fusion | bool | true (errstr remedy =false) | dot→dot fusion |
| xla_tpu_nested_dot_fusion | bool | false (errstr remedy =true) | nested-dot (PartialReduce) fusion |
| xla_tpu_accumulate_into_mrb | bool | true (errstr remedy =false) | MRB accumulation fusion |
| xla_tpu_allow_deeply_nested_fusion_numerical_diff | bool | | tolerate deep-fusion numerics |
| xla_tpu_allow_input_fusion_in_certain_reduce_ops | bool | | reduce-op input fusion |
| xla_tpu_allow_conv_input_fusion_with_downcast_convert | bool | | conv input fusion w/ downcast |
| xla_tpu_async_collective_fusion_fuse_multiple_collectives | bool | | multi-collective async fusion |
| xla_tpu_enable_async_collective_fusion_fuse_all_gather | bool | | AG async collective-fusion fuse |
| xla_tpu_enable_async_collective_fusion_fuse_all_reduce | bool | | AR async collective-fusion fuse |
| xla_tpu_copy_fusion_minimum_copy_size_in_bytes | int | | copy-fusion size floor |
| xla_tpu_enable_experimental_fusion_cost_model | bool | | experimental fusion cost model |
| xla_tpu_fusion_debugger_instrument_inputs | bool | | fusion-debugger input instrumentation |

MSA / scoped memory (55)

| Flag | Type | Default | Purpose |
|---|---|---|---|
| xla_tpu_alternate_memory_benefit_scaling_factor_for_large_buffers | float | | MSA benefit scaling |
| xla_tpu_async_copy_bandwidth_scaling_factor | float | | async-copy BW model |
| xla_tpu_allocate_scoped_vmem_at_same_offset | int | | scoped VMEM offset reuse |
| xla_tpu_allocate_scoped_cmem_at_same_offset | int | | scoped CMEM offset reuse |
| xla_tpu_allow_in_cmem_copy | bool | | permit copies into CMEM |
| xla_tpu_scoped_cmem_for_all_reduce | bool | | AR result in scoped CMEM |
| xla_tpu_cmem_max_outstanding_prefetches | int | | CMEM prefetch cap |
| xla_tpu_cmem_max_overlap_to_mem_size_async_copy_ratio | float | | CMEM overlap ratio |
| xla_tpu_vmem_use_telamalloc | bool | | telamalloc VMEM allocator |
| xla_tpu_scoped_vmem_limit_kib | int | | scoped VMEM byte limit (KiB) |
| xla_tpu_autotune_memory_space_assignment | bool | | MSA autotune |

SparseCore TC-side (50)

| Flag | Type | Default | Purpose |
|---|---|---|---|
| xla_tpu_enable_offloading_gather_to_sparsecore | bool | | gather offload to SC |
| xla_tpu_enable_offloading_scatter_to_sparsecore | bool | | scatter offload to SC |
| xla_tpu_enable_offloading_copy_to_sparsecore | bool | | copy offload to SC |
| xla_tpu_enable_offloading_reduce_to_sparsecore | bool | | reduce offload to SC |
| xla_tpu_enable_sparse_core_reduce_scatter_v2 | bool | false (errstr remedy =true) | SC ND reduce-scatter v2 |
| xla_tpu_enable_sc_log_recorder | bool | false (errstr remedy =true) | SC log recorder |
| xla_tpu_enable_async_sc_call | bool | | async SC call |
| xla_tpu_embedding_table_oblongness_threshold | int | — (errstr remedy =1) | embedding-table oblongness cutoff |
| xla_tpu_aggregate_data_dependent_sc_ops | bool | | data-dependent SC aggregation |

Other high-signal xla_tpu_ knobs

  • Numerics: xla_tpu_accurate_{exp, exp2, expm1, log1p, log2, logistic, sigshift} (bool), xla_tpu_bf16_emission_mode (enum), xla_tpu_auto_reduce_precision (bool), xla_tpu_experimental_enable_dynamic_int8_quantization (bool).
  • Dot/conv: xla_tpu_enable_dot_strength_reduction, xla_tpu_enable_ragged_dot_kernel, xla_tpu_choose_faster_windowed_einsum_over_mem (all bool), xla_tpu_impure_contract_ragged_conv_with (string).
  • Layout: xla_tpu_allow_layout_negotiation (bool), xla_tpu_enable_large_2nd_minor_layout, xla_tpu_allow_large_2nd_minor_layout_for_{x16, x8, x4} (int).
  • Auto-sharding: xla_tpu_auto_spmd_partitioning_memory_budget_gb (int), xla_tpu_auto_spmd_partitioning_memory_budget_ratio (float), xla_tpu_auto_spmd_partitioning_solver_timeout_seconds (int), xla_tpu_auto_spmd_keep_all_user_shardings (bool).
  • Cost-model: xla_tpu_emitter_learned_cost_model_options (string/proto), xla_tpu_enable_instruction_cycle_checking (bool), xla_tpu_hbm_initial_cycle_penalty (int), xla_tpu_impure_cost_model_logging_options (string).
  • Debug/memory: xla_tpu_enable_tile_log_recorder (bool; errstr remedy =true, so default false), xla_tpu_impure_oom_fast_exit_threshold (int; errstr remedy =-1 for verbose logging), xla_tpu_always_spill_to_default_memory (bool).

xla_jf_ — Jellyfish XLA Backend (148)

The jf codename is the TPU XLA backend namespace (shared across generations). Lands in TCE. Subsystem split: misc 63, debug/dump 24, memory/alloc 15, fusion 10, MSA 9, ICI 8, dot/conv 7, sharding 4, scheduler 3, SparseCore 2, cost-model 2, numerics 1.

| Flag | Type | Default | Purpose |
|---|---|---|---|
| xla_jf_debug_level | int | — (errstr remedy =2 for stack traces) | JF backend debug verbosity |
| xla_jf_run_verifier | bool | | run the JF HLO verifier |
| xla_jf_vliw_scheduler | bool | | JF VLIW scheduler engine |
| xla_jf_critical_path_scheduler | bool | | critical-path scheduler |
| xla_jf_conv_{input,output,reshape}_fusion | bool | | conv fusion variants |
| xla_jf_enable_multi_output_fusion | bool | | multi-output fusion |
| xla_jf_fusion_max_vmem_mib | int | | fusion VMEM ceiling (MiB) |
| xla_jf_conv_{full_precision,increased_precision} | bool | | conv precision controls |
| xla_jf_auto_assign_mxu | bool | | auto MXU assignment |
| xla_jf_use_cost_based_memory_coloring | bool | | cost-based memory coloring |
| xla_jf_dump_{hlo_text,debug_info,llo_html} | bool | | JF dump variants |
| xla_jf_dump_isa_program_proto | string | | ISA-program-proto dump path |
| xla_jf_experimental_{cmem,vmem}_for_hlo_outputs | bool | | experimental output placement |
| xla_jf_spmd_threshold_for_windowed_einsum_mib | float | | windowed-einsum SPMD threshold |

xla_ (plain) — Generic XLA / DebugOptions-Backed (121 registered / 138 names)

The non-codename xla_* flags. Unlike the TPU families, these bind to xla::DebugOptions fields via MakeDebugOptionsFlags @ 0x1e66ce80. Subsystem split over all 138 xla_* (plain) name strings (sums to 138, not the 121 registered subset): misc 32, ICI 23, scheduler 21, SparseCore 17, memory 15, MSA 14, debug 10, others 6.

| Flag | Type | Default | Purpose |
|---|---|---|---|
| xla_enable_megacore_hbm_spill | bool | false (errstr remedy =true, untested) | enable megacore HBM spill |
| xla_enable_cross_program_prefetch | bool | | cross-program prefetch gate |
| xla_default_cross_program_prefetch_heuristic | bool | | CPP heuristic default |
| xla_enable_async_{all_gather,all_reduce,collective_permute} | bool | | async collective gates |
| xla_enable_async_reduce_scatter_fusion | bool | | async RS fusion |
| xla_{all_gather,all_reduce,all_to_all}_latency_bound_threshold_in_bytes | float | | latency-bound thresholds |
| xla_all_gather_combiner_threshold_count | float | | AG combiner threshold |
| xla_enable_all_gather_{2d,3d}_emitter | bool | | dimensional AG emitters |
| xla_hlo_scheduling_brkga_{computation_limit,generation_limit} | int | | HLO BRKGA scheduler tuning |
| xla_latency_hiding_scheduler_rerun | int | | LHS rerun count |
| xla_hbm_logging_buffer_size_bytes | int | | HBM logging buffer size |
| xla_enable_post_msa_sync_slice_fusion | bool | | post-MSA sync-slice fusion |
| xla_hlo_parse_memory_schedule_from_file | string | | external memory schedule path |

GOTCHA — the classic XLA dump/HLO knobs (xla_dump_to, xla_hlo_profile, xla_dump_hlo_as_proto, xla_step_marker_location, xla_disable_hlo_passes) appear as xla.DebugOptions fields and as .rodata strings, but they are not registered absl::Flag globals in this build. A direct cross-match of all 290 DebugOptions field names against the registered-flag set finds exactly two overlaps: xla_tpu_detect_nan (DebugOptions field 135) and xla_tpu_detect_inf (field 136). Every other dump/HLO knob is settable only through the PJRT CompileOptions.debug_options proto path, never through LIBTPU_INIT_ARGS. A reimplementer who exposes --xla_dump_to= as a libtpu command-line flag is wrong about this build.


SparseCore & Embedding — xla_sc_ (92), barna_core_ (61)

xla_sc_* are the SparseCore-compiler LLVM-backend knobs (lands in TCE); barna_core_* are the BarnaCore embedding-engine runtime knobs (standalone).

xla_sc_ representative subset

| Flag | Type | Purpose |
|---|---|---|
| xla_sc_enable_instruction_fusion | bool | SC instruction fusion |
| xla_sc_enable_latency_hiding_scheduler | bool | SC LHS |
| xla_sc_enable_scheduler_memory_pressure_tracking | bool | SC mem-pressure tracking |
| xla_sc_enable_tile_overlays / _scs_overlays | bool | SC tile/SCS overlays |
| xla_sc_enable_stack_eliding | bool | SC stack eliding |
| xla_sc_enable_hbm_optimization_mode | bool | SC HBM optimization mode |
| xla_sc_detect_nan | bool | SC NaN detection |
| xla_sc_assert_level | enum | SC assert level |
| xla_sc_compiler_backtrace_depth | int | SC backtrace depth |
| xla_sc_elementwise_shape_scaling_factor | float | SC elementwise scaling |
| xla_sc_async_wrapper_fusion_type | enum | SC async-wrapper fusion type |
| xla_sc_dump_{llvm_ir_to,mlir_to,bundles_to} | string | SC IR/MLIR/bundle dump paths |
| xla_sc_use_legacy_embeddings_loop_configs | bool | legacy embedding loop configs |

barna_core_ representative subset

| Flag | Type | Purpose |
|---|---|---|
| barna_core_max_hbm_fraction_for_embeddings | int | HBM fraction cap for embeddings |
| barna_core_hbm_savings_threshold_for_optimized_hbm_packing | float | optimized-packing savings threshold |
| barna_core_fraction_batches_to_process_locally | bool | local-batch processing fraction |
| barna_core_master_partitioner_thread_count | int | partitioner thread count |
| barna_core_hot_id_profiler_top_n_multiple | float | hot-id profiler top-N multiple |
| barna_core_enable_software_deduplication | bool | software dedup |
| barna_core_enable_software_row_sharding | bool | software row sharding |
| barna_core_file_operation_timeout | int | file-op timeout |
| barna_core_embedding_common_config_proto_path | string | embedding-config proto path |
| barna_core_partitioner_optimization_objective | enum | partitioner objective |

MSA Namespaces — xla_msa_ (22), xla_vf_ (16), xla_gf_ (14), xla_ior_ (4), xla_pf_ (1), xla_llo_ (1)

The dedicated memory-space-assignment namespaces. xla_msa_* is the generic MSA option set; xla_vf_* and xla_gf_* are gen-specific VMEM/MSA override sets (the vf / gf codename prefixes) carrying the same knob names scoped to that generation; xla_ior_* is the IOR fast-mem round-trip variant; xla_pf_* is a single ND-allreduce override; xla_llo_* is a single LLO-lifecycle flag.

xla_msa_ — full enumeration (22)

| Flag | Type | Purpose |
|---|---|---|
| xla_msa_enable | bool | MSA master gate |
| xla_msa_max_cross_program_prefetches | int | CPP prefetch cap |
| xla_msa_max_outstanding_evictions | int | eviction cap |
| xla_msa_max_outstanding_prefetches | int | prefetch cap |
| xla_msa_max_repacks | int | repack cap |
| xla_msa_max_retries | int | retry cap |
| xla_msa_{min,preferred}_overlap_to_async_copy_ratio | float | overlap-to-async-copy ratios |
| xla_msa_max_overlap_to_mem_size_async_copy_ratio | float | overlap-to-mem-size ratio |
| xla_msa_enable_cross_program_prefetch_freeing | bool | CPP freeing |
| xla_msa_enable_sync_copy_replacement | bool | sync-copy replacement |
| xla_msa_enable_sync_slice_replacement | bool | sync-slice replacement |
| xla_msa_enable_while_redundant_eviction_elimination | bool | redundant-eviction elimination |
| xla_msa_enable_window_prefetch | bool | window prefetch |
| xla_msa_cross_program_prefetch_permissive_mode | bool | permissive CPP mode |
| xla_msa_default_cross_program_prefetch_heuristic | bool | CPP heuristic default |
| xla_msa_expanded_scoped_alternate_memory_mode | enum | expanded scoped-AM mode |
| xla_msa_use_bundle_aware_cost_model | bool | bundle-aware cost model |
| xla_msa_cost_model_options | string | cost-model config |
| xla_msa_experimental_ior_algorithm | enum | experimental IOR algorithm |
| xla_msa_experimental_use_telamalloc | bool | experimental telamalloc |
| xla_msa_allocate_scoped_memory_at_same_offset | bool | scoped-mem offset reuse |

xla_vf_ (16), xla_gf_ (14), xla_ior_ (4), xla_pf_ (1)

xla_gf_vmem_{max_outstanding_evictions, max_repacks, max_retries} (int), xla_gf_vmem_use_ior_algorithm (enum), xla_gf_vmem_enable_while_redundant_eviction_elimination (bool) — the gen-specific VMEM mirror of the xla_msa_* set; xla_vf_* carries the same vmem_* knob set (16 names, including xla_vf_allow_replicated_vmem_writes and xla_vf_allow_split_vmem). xla_ior_{fast_mem_round_trip_production_msa, fast_mem_run_production_msa, stored_solution_path, use_stored_solution} (4) carry the IOR fast-mem round-trip variant. xla_pf_enable_nd_allreduce (1, bool) is the lone xla_pf_* flag; xla_llo_annotation_lifecycle_strict_mode (1, enum) is the lone xla_llo_* flag.


Runtime / Driver — tpu_ (69), megascale_ (150), tf_ (20), xla_mosaic_ (8)

tpu_ — runtime / compilation-cache / driver (69)

Standalone runtime flags (not compiler knobs). Representative subset:

| Flag | Type | Default | Purpose |
|---|---|---|---|
| tpu_use_tfrt | bool | — (errstr deprecates =false) | use TFRT runtime path |
| tpu_compilation_cache_disable_coordination_service | bool | | disable cache coordination |
| tpu_persistent_compilation_cache_location | string | | persistent cache path |
| tpu_persistent_compilation_cache_ttl_secs | int | | cache TTL |
| tpu_local_compilation_cache_size_bytes | int | | local cache size |
| tpu_program_cache_eviction_policy | enum | | cache eviction policy |
| tpu_program_proto_compression | bool | | proto compression |
| tpu_link_up_check_timeout | int | | link-up check timeout |
| tpu_driver_callback_watchdog_timeout | int | | driver-callback watchdog |
| tpu_core_dump_directory | string | | core-dump directory |
| tpu_hbm_report_enable | bool | | HBM report toggle |
| tpu_log_allocations_on_oom | bool | | log allocations on OOM |
| tpu_hlo_breakpoint_debugger_server_port | int | | HLO breakpoint debugger port |
| DANGEROUS_tpu_runtime_abi_verification_disabled | bool | | disable ABI verification (dangerous) |

NOTE — the lowercase libtpu_* identifiers (libtpu_init_utils, libtpu_lockfile, libtpu_sdk_*, libtpu_telemetry_*, libtpu_version, the libtpu_lts_20260413_b_ build tag) are not flags — they are translation-unit / module name strings in .rodata. They are excluded from the 2107 catalog.

megascale_ (150) — DCN collective runtime

Top knobs: megascale_num_slices (int), megascale_slice_id (int), megascale_coordinator_address (string), megascale_transport_type (enum), megascale_enable_tpu_premapping (bool), megascale_enable_watchdog (bool), megascale_graph_hang_threshold (int), megascale_heartbeat_{interval,timeout}_ms (int), megascale_error_reporter_abort_on_{error,hang} (bool), megascale_use_heartbeat (bool), megascale_grpc_num_channels (int), megascale_use_mtls_for_grpc (bool), megascale_verify_checksums (bool), megascale_use_numa_aware_threadpool (bool; errstr remedy =false, so default true).

tf_ (20) and xla_mosaic_ (8)

tf_* are the TensorFlow-TPU bridge flags (tf_jf_* and similar). xla_mosaic_* are the Mosaic MLIR custom-kernel dialect flags, including the legacy xla_mosaic_deprecated_allow_implicit_single_buffering.


Defaults — the Certainty Boundary

proto3 carries no descriptor-level defaults, and the per-flag defaults live in xla::DefaultDebugOptions() and the FLAGS_<name> static initializers — both in .text, not recoverable from strings. What does survive is a set of help/error strings that spell a --flag=value clause. Critically, every such surviving clause is a remedy — the value the message tells the user to set when something goes wrong ("use --flag=false in the meantime", "set =true to enable") — so the spelled value is the non-default, and the implied actual default is its opposite. 13 flags carry such a =value remedy clause; the remaining flags' defaults leave no string at all.

| Flag | Type | Remedy =value (errstr) | Implied default |
|---|---|---|---|
| xla_tpu_accumulate_into_mrb | bool | =false ("in the meantime") | true |
| xla_tpu_rwb_fusion | bool | =false (reverted-on-fallback) | true |
| xla_tpu_dot_dot_fusion | bool | =false (if failure persists) | true |
| xla_tpu_nested_dot_fusion | bool | =true ("did you forget to set") | false |
| xla_tpu_scheduling_annotation_deannotate_unsupported_groups | bool | =true (to deannotate gaps) | false |
| xla_tpu_enable_tile_log_recorder | bool | =true (to enable logging) | false |
| xla_tpu_enable_sc_log_recorder | bool | =true (to enable logging) | false |
| xla_tpu_enable_sparse_core_reduce_scatter_v2 | bool | =true (SC ND RS needs) | false |
| xla_tpu_impure_oom_fast_exit_threshold | int | =-1 (more detailed logging) | not string-recoverable |
| xla_tpu_embedding_table_oblongness_threshold | int | =1 (avoid tiled layout) | not string-recoverable |
| xla_enable_megacore_hbm_spill | bool | =true (to activate, untested) | false |
| xla_jf_debug_level | int | =2 (enable stack traces) | not string-recoverable |
| megascale_use_numa_aware_threadpool | bool | =false (to disable) | true |

GOTCHA — none of these are byte-confirmed defaults. Each =value is the value the message tells the user to set (the remedy), which is the non-default; the "Implied default" column is the inferred opposite for the booleans, and is genuinely unrecoverable for the int-valued knobs (where the remedy is a specific tuning value, not a sentinel-vs-default flip). Do not read the remedy value as the default — it is the opposite. The authoritative defaults for every flag require disassembling DefaultDebugOptions() and the FLAGS_* ctors in .text. Four further flags sometimes cited with byte-defaults (xla_tpu_allow_deeply_nested_fusion_numerical_diff, xla_tpu_enable_offloading_{gather,scatter}_to_sparsecore, xla_tpu_fusion_debugger_instrument_inputs) carry no =value string at all and have no string-derivable default.
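The inference rule behind the "Implied default" column can be written down directly. The sketch below encodes the 13 rows of the table above and derives the column: a boolean remedy is flipped, and a non-boolean remedy yields no default at all. The function name implied_default is hypothetical, and the result is exactly as uncertain as the rule itself; it is an inference from strings, not a byte-confirmed default.

```python
from typing import Optional

# (flag, type, remedy value spelled in the error string), copied from the table above.
REMEDIES = [
    ("xla_tpu_accumulate_into_mrb", "bool", "false"),
    ("xla_tpu_rwb_fusion", "bool", "false"),
    ("xla_tpu_dot_dot_fusion", "bool", "false"),
    ("xla_tpu_nested_dot_fusion", "bool", "true"),
    ("xla_tpu_scheduling_annotation_deannotate_unsupported_groups", "bool", "true"),
    ("xla_tpu_enable_tile_log_recorder", "bool", "true"),
    ("xla_tpu_enable_sc_log_recorder", "bool", "true"),
    ("xla_tpu_enable_sparse_core_reduce_scatter_v2", "bool", "true"),
    ("xla_tpu_impure_oom_fast_exit_threshold", "int", "-1"),
    ("xla_tpu_embedding_table_oblongness_threshold", "int", "1"),
    ("xla_enable_megacore_hbm_spill", "bool", "true"),
    ("xla_jf_debug_level", "int", "2"),
    ("megascale_use_numa_aware_threadpool", "bool", "false"),
]


def implied_default(flag_type: str, remedy: str) -> Optional[str]:
    """Flip a boolean remedy; return None when no default can be inferred."""
    if flag_type != "bool":
        # An int remedy is a specific tuning value, not one side of a flip.
        return None
    if remedy == "true":
        return "false"
    if remedy == "false":
        return "true"
    raise ValueError(f"unexpected boolean remedy: {remedy!r}")


if __name__ == "__main__":
    for flag, flag_type, remedy in REMEDIES:
        default = implied_default(flag_type, remedy)
        print(f"{flag}: remedy={remedy} -> {default or 'not string-recoverable'}")
```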

For the type column: the 13 types above are byte-corroborated from =value evidence; the rest are convention-inferred from the flag-name suffix (enable_/use_/allow_ ⇒ bool; _ms/_kib/_count/_size/_n ⇒ int; _ratio/_factor/_fraction ⇒ float; _file/_path/_dir/_proto ⇒ string; _mode/_type/_level ⇒ enum). This is XLA's own registration convention, so it is reliable but not per-flag byte-confirmed. A _threshold suffix may be int or float; _mode/_level may be int-enum or string — those are HIGH, not CERTAIN.
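The suffix convention can be expressed as a small lookup. This is a sketch of the convention as stated in the paragraph above, not a recovered algorithm: the precedence (suffix rules first, then the enable/use/allow tokens) and the function name infer_type are assumptions, and names that match no rule are reported as unknown rather than guessed. Where the tables on this page give a type that differs from what the convention predicts, the table is the source to follow.

```python
from typing import Optional

# Suffix -> type, as listed in the paragraph above.
SUFFIX_RULES = [
    (("_ms", "_kib", "_count", "_size", "_n"), "int"),
    (("_ratio", "_factor", "_fraction"), "float"),
    (("_file", "_path", "_dir", "_proto"), "string"),
    (("_mode", "_type", "_level"), "enum"),
]
# Tokens that mark a boolean gate when they appear as a whole word in the name.
BOOL_TOKENS = {"enable", "use", "allow"}


def infer_type(name: str) -> Optional[str]:
    """Guess a flag's type from its name; None means the convention is silent."""
    for suffixes, flag_type in SUFFIX_RULES:
        if name.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
            return flag_type
    if BOOL_TOKENS & set(name.split("_")):
        return "bool"
    return None


if __name__ == "__main__":
    for flag in (
        "xla_tpu_debug_sflag_wait_timeout_ms",
        "xla_tpu_async_copy_bandwidth_scaling_factor",
        "barna_core_embedding_common_config_proto_path",
        "xla_tpu_all_gather_collective_matmul_mode",
        "xla_msa_enable_window_prefetch",
        "xla_tpu_rwb_fusion",
    ):
        print(f"{flag}: {infer_type(flag)}")
```

Matching enable/use/allow as whole underscore-delimited tokens, rather than as substrings, avoids false hits on names that merely contain those letters.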


Cross-References

  • xla_* Flag Atlas — the curated sibling: grouped narrative + per-subsystem deep-dive into the ~100 highest-signal knobs (point here for what a flag does)
  • Flag Families — prefix → owner routing: which proto (DebugOptions vs TCE vs standalone) each prefix lands in, live-vs-inert verdict per family
  • DebugOptions Proto — the xla.DebugOptions message (290 fields, 17 nested enums), the 2-field flag-wiring overlap (xla_tpu_detect_nan/inf)
  • Default DebugOptions — where the per-flag defaults live (DefaultDebugOptions() + FLAGS_* static initializers in .text)
  • Registry-Mediated Flags — the AbslFlagHelpGenFor registration mechanism and the MakeDebugOptionsFlags / OverrideTpuCompEnvByCmdLineFlags bind sites
  • Flag Prefix Dispatch — the TpuVersion-aware prefix-strip/select mechanism for the codename families
  • Environment Variables — LIBTPU_INIT_ARGS, LIBTPU_ON_GCE, TPU_LOAD_LIBRARY, the parse funnel into absl::ParseCommandLine