TCE Field Dictionary (A)

All field numbers, names, types, offsets, and addresses on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d, libtpu_lts_20260413_b_RC00). Other versions renumber and re-offset fields.

Abstract

TpuCompilationEnvironment (TCE) is the TPU-private master compile-config protobuf: a single editions-proto message carrying 1121 live fields, every one of which is also a registered absl::Flag (a perfect 1:1 mapping, the structural inverse of the GPU/CPU-shared xla.DebugOptions). This page is the field-number → name → proto type dictionary for the lower half of that proto: field numbers 1 through 560. The matching upper half — field 561 through 1218 (max field number 1218, with 97 deletion gaps) — lives on TCE Field Dictionary (B). The split point is exact and stated below.

The dictionary is reconstructed from the TpuCompilationEnvironment FileDescriptorProto carved out of the binary at protodesc_cold VA 0xbfa6060 (size 137,692 B) and cross-checked against the generated fast-parse table TpuCompilationEnvironment::_table_ @ 0x21cfa9e0, whose 1121-entry FieldEntry array is sorted by ascending field number. Each line of this page gives the field number, the verbatim field name (every name on this page was confirmed byte-present in the binary's .rodata string pool), the proto base type, and — for the 423 wrapped fields — the wrapper message/enum type. The C++ struct offset and the per-field literal default value are deliberately not repeated here; they are the subject of TCE Field-Offsets & Flag Defaults, and the structural overview (parse-table header, type histogram, HOT-tag taxonomy, AutoProto switch) is on TpuCompilationEnvironment.

For a reimplementer, the contract this page satisfies is narrow and precise:

  • Field number → name. The canonical numbering a serialized TCE proto uses on the wire and that GetTpuCompEnvWithDefaultValues walks when it resolves each field to its absl::Flag.
  • Name → proto base type. bool / int32 / int64 / uint32 / uint64 / float / double / string / enum / message — the wire type that decides how the value is parsed and stored.
  • Wrapper type, where present. Which enum (TristateProto.Value, MemorySchedulerProto.Value, …) or which message (AutoProto, RangeSpecProto, the typed helpers) a non-scalar field carries.

| Property | Value |
|---|---|
| Proto | TpuCompilationEnvironment (editions proto, all fields optional) |
| Descriptor source | FileDescriptorProto @ 0xbfa6060 (137,692 B) |
| Parse table | TpuCompilationEnvironment::_table_ @ 0x21cfa9e0 (1121 FieldEntry, ascending field#) |
| This page covers | field #1 – #560 (lower half) |
| Page B covers | field #561 – #1218 (dictionary B) |
| Split boundary | #560 xla_tpu_ici_sdc_test_run_on_program_start ends here; #561 xla_tpu_debug_sflag_wait_timeout_ms opens page B |
| Names verified | 22 spot-checked verbatim in .rodata strings; 0 misses |
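
The base-type column is what fixes the wire encoding of each field. The following is a minimal sketch, assuming standard protobuf wire types and the default length-prefixed message encoding (this page does not say whether any editions feature overrides it). The names `WIRE_TYPE`, `encode_varint`, and `field_tag` are illustrative and are not taken from libtpu.

```python
# Protobuf wire types, per the public protobuf encoding rules.
VARINT, I64, LEN, I32 = 0, 1, 2, 5

# Base-type column of this page -> wire type of the field's value.
# Assumption: message fields use the default length-prefixed encoding.
WIRE_TYPE = {
    "bool": VARINT, "int32": VARINT, "int64": VARINT,
    "uint32": VARINT, "uint64": VARINT, "enum": VARINT,
    "double": I64, "float": I32,
    "string": LEN, "message": LEN,
}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def field_tag(field_number: int, base_type: str) -> bytes:
    """Wire tag = (field_number << 3) | wire_type, varint-encoded."""
    return encode_varint((field_number << 3) | WIRE_TYPE[base_type])


# Fields #1 (enum), #132 (enum) and #30 (float) from the tables on this page.
print(field_tag(1, "enum").hex())    # 08
print(field_tag(132, "enum").hex())  # a008
print(field_tag(30, "float").hex())  # f501
```

Any field number above 15 needs a tag of two or more bytes, which covers almost every field in this dictionary.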

How to Read These Tables

Each row is one live field, in ascending field-number order. Columns are uniform across every group:

  • # — the proto field number (the wire tag). Numbers are not contiguous: TCE has 97 deletion gaps in 1..1218 (no reserved_range; deleted fields simply vanish from the descriptor). Gaps inside this page's range are listed in §Field-Number Gaps.
  • Field name — the verbatim proto field name, identical to the registered absl::Flag name. Most carry the xla_tpu_ / xla_jf_ / xla_ prefixes; a handful are bare (config_criterion, loop_invert, rematerialization_algorithm).
  • Type — the proto base type. For enum and message, the Wrapper / message type column names the concrete type.

NOTE — the 423 non-scalar fields split into 74 enum-typed (67 of them TristateProto.Value = AUTO/DISABLED/ENABLED, plus 7 direct helper enums) and 349 message-typed (330 are the 30-arm AutoProto switch wrapper; the remaining 19 are typed helper messages such as RangeSpecProto, RepeatedStrings, and the msa.* option messages). The wrapper enum value-by-value tables and the AutoProto arm list live on TpuCompilationEnvironment; this page only names the wrapper per field.
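
The counts quoted on this page can be cross-checked with plain arithmetic. The sketch below uses only the figures stated above. The scalar-field count it derives is an inference from those figures, not a number read from the binary.

```python
# Figures quoted on this page.
MAX_FIELD_NUMBER = 1218
DELETION_GAPS = 97
LIVE_FIELDS = 1121

ENUM_TRISTATE, ENUM_HELPER = 67, 7
MSG_AUTOPROTO, MSG_HELPER = 330, 19

enum_fields = ENUM_TRISTATE + ENUM_HELPER
message_fields = MSG_AUTOPROTO + MSG_HELPER
wrapped_fields = enum_fields + message_fields

# Derived, not stated on the page: whatever is left must be scalar-typed.
scalar_fields = LIVE_FIELDS - wrapped_fields

assert MAX_FIELD_NUMBER - DELETION_GAPS == LIVE_FIELDS
assert enum_fields == 74
assert message_fields == 349
assert wrapped_fields == 423
print(scalar_fields)  # 698
```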

GOTCHA — a field's position on this page is its field number, not its struct offset. The two are unrelated: the parse table sorts FieldEntry by field number, but the C++ _impl_ layout interleaves them by type/alignment (field #2 sits at struct offset +0xBC, field #1 at +0xB8, field #3 at +0xA8). Do not infer offset from field number; use TCE Field-Offsets & Flag Defaults.
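
A tiny illustration of the point, using only the three offsets quoted above (the dictionary name `OFFSETS` is illustrative):

```python
# Field number -> struct offset, for the three fields quoted in the GOTCHA.
OFFSETS = {1: 0xB8, 2: 0xBC, 3: 0xA8}

by_number = sorted(OFFSETS)                   # parse-table order
by_offset = sorted(OFFSETS, key=OFFSETS.get)  # memory-layout order

print(by_number)  # [1, 2, 3]
print(by_offset)  # [3, 1, 2]
```

The two orders already disagree for the first three fields, so no offset can be inferred from a field's position in the tables.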


#1 – #30 — Collectives, Cross-Program Prefetch, CMEM, Embedding

The lowest field numbers are the oldest jellyfish-era knobs: all-gather emitters, the net-router ring limits, the (now mostly deprecated) cross-program-prefetch / scoped-CMEM cluster, and the first embedding tunable. Field #21 is a gap.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 1 | xla_enable_async_collective_permute | enum | TristateProto.Value |
| 2 | xla_tpu_sdc_checker_instrument_megacore_fusion | bool | |
| 3 | xla_tpu_scoped_vmem_limit_sweep_profile_path | string | |
| 4 | xla_tpu_allocate_scoped_vmem_at_same_offset | bool | |
| 5 | xla_enable_all_gather_2d_emitter | bool | |
| 6 | xla_always_enable_all_gather_2d_asymmetric | bool | |
| 7 | xla_enable_all_gather_3d_emitter | bool | |
| 8 | xla_always_enable_all_gather_3d_asymmetric | bool | |
| 9 | xla_tpu_enable_minor_all_gather | bool | |
| 10 | xla_tpu_use_routing_table_indices_in_all_gather | bool | |
| 11 | xla_tpu_enable_net_router_in_all_gather | bool | |
| 12 | xla_tpu_cross_module_net_router_ring_limit | int64 | |
| 13 | xla_tpu_cross_replica_net_router_ring_limit | int64 | |
| 14 | xla_tpu_max_cmem_used_by_memory_space_assignment | int64 | |
| 15 | xla_enable_cross_program_prefetch | bool | |
| 16 | xla_tpu_enable_cross_program_prefetch_freeing | bool | |
| 17 | xla_tpu_cmem_enable_while_redundant_eviction_elimination | bool | |
| 18 | xla_tpu_cmem_max_outstanding_prefetches | int64 | |
| 19 | xla_tpu_cmem_max_outstanding_evictions | int64 | |
| 20 | xla_tpu_allocate_scoped_cmem_at_same_offset | bool | |
| 22 | xla_default_cross_program_prefetch_heuristic | bool | |
| 23 | xla_tpu_ior_cmem | string | |
| 24 | megascale_use_one_to_all_for_gather | bool | |
| 25 | xla_tpu_enable_async_pincer_emitter | bool | |
| 26 | xla_tpu_write_cmem_output_via_stores_on_megacore | bool | |
| 27 | megascale_use_dcn_all_to_all_in_collectives_mask | int64 | |
| 28 | xla_hlo_scheduling_brkga_compute_runtime_estimates | bool | |
| 29 | xla_tpu_wait_n_cycles_before_program_termination | int64 | |
| 30 | xla_tpu_embedding_table_oblongness_threshold | float | (is_used_at_runtime) |

NOTE — field #30 is one of only four fields in the whole proto with is_used_at_runtime=true; the others (#354, #432, and xla_tpu_num_embedding_devices) live further on. The bit lives in the TpuCompEnvFieldOptions extension (#535801365), not in the parse table. Fields #4, #14, #15, #16, #17, #18, #19, #20, #22, #23 are among the 101 deprecated=true fields, superseded by the per-family xla_{jf,vf,gf,cmem}_vmem_* knobs.


#31 – #95 — Scheduler, Tracing, BRKGA, LLO/LSRA, SDC Checker

This block opens with the memory-scheduler selector (#31), then a run of trace/profile toggles, the BRKGA HLO-scheduling parameters, the early LLO/LSRA register-allocation knobs, the combiner thresholds, and the first big SDC-checker cluster. Fields #44, #45, #54, #59, and #70 are gaps in this range.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 31 | xla_memory_scheduler | enum | MemorySchedulerProto.Value |
| 32 | xla_enable_profiler | bool | |
| 33 | xla_enable_hlo_trace | bool | |
| 34 | xla_trace_only_stalling_hlo | bool | |
| 35 | xla_enable_module_trace | bool | |
| 36 | xla_enable_mxu_trace | bool | |
| 37 | xla_enable_transpose_trace | bool | |
| 38 | xla_jf_trivial_traces | bool | |
| 39 | xla_jf_module_tracemarks | bool | |
| 40 | xla_hbm_logging_buffer_size_bytes | int64 | |
| 41 | xla_hlo_scheduling_brkga_generation_limit | int64 | |
| 42 | xla_hlo_scheduling_brkga_computation_limit | int64 | |
| 43 | xla_hlo_scheduling_brkga_enable_as_fallback | bool | |
| 46 | xla_enable_megacore_hbm_spill | bool | |
| 47 | xla_enable_lru_free_reg_assignment | bool | |
| 48 | xla_jf_emit_annotations | bool | |
| 49 | xla_jf_llo_level | int32 | |
| 50 | xla_jf_naive_bundle_packer | message | RangeSpecProto |
| 51 | xla_jf_track_bundle_dependency_indices | bool | |
| 52 | xla_jf_pack_latches | bool | |
| 53 | xla_jf_bf16_propagation | bool | |
| 55 | xla_tpu_arf_combiner_threshold_in_bytes | int64 | |
| 56 | xla_tpu_ars_combiner_threshold_in_bytes | int64 | |
| 57 | xla_tpu_agf_combiner_threshold_in_bytes | int64 | |
| 58 | xla_jf_crs_combiner_threshold_count | int64 | |
| 60 | xla_jf_bounds_check_annotate_only | message | RangeSpecProto |
| 61 | xla_jf_reconstruct_hlo_from_proto | bool | |
| 62 | xla_jf_slot_tracker_hoist_limit | int64 | |
| 63 | xla_jf_enable_multi_output_fusion | bool | |
| 64 | xla_jf_emit_global_barrier_at_start | bool | |
| 65 | xla_jf_lsra_v2_alloc_only | message | RangeSpecProto |
| 66 | xla_jf_lsra_v2_reserved_smem | int64 | |
| 67 | xla_jf_lsra_v2_annotate | bool | |
| 68 | xla_jf_llo_rematerialization_parameter_threshold | int64 | |
| 69 | xla_ra_split | message | RangeSpecProto |
| 71 | xla_jf_rematerialization_percent_shared_memory_limit | int64 | |
| 72 | xla_tpu_rematerialization_max_block_size | int64 | |
| 73 | xla_tpu_block_rematerialization_factor | int64 | |
| 74 | xla_tpu_rematerialization_min_size_in_bytes | int64 | |
| 75 | xla_jf_order_barna_core_feed_serialize | bool | |
| 76 | xla_jf_order_barna_core_feed_overlap | bool | |
| 77 | xla_jf_conditional_code_motion | bool | |
| 78 | xla_jf_auto_cross_replica_sharding | bool | |
| 79 | xla_jf_use_vdelay | bool | |
| 80 | xla_tpu_autotune_windows_service | string | |
| 81 | xla_tpu_autotune_windows | bool | |
| 82 | xla_tpu_autotune_phase_ordering | bool | |
| 83 | xla_tpu_log_post_opt_fingerprints | bool | |
| 84 | xla_tpu_enable_sdc_checker | bool | |
| 85 | xla_tpu_sdc_check_fail_ratio_debug_only | int64 | |
| 86 | xla_tpu_sdc_check_repeat_count | int64 | |
| 87 | xla_tpu_sdc_check_inputs | bool | |
| 88 | xla_tpu_sdc_replicate_llo | bool | |
| 89 | tpu_sdc_checker_filter_loop_iteration_depth | int32 | |
| 90 | tpu_sdc_checker_filter_loop_iteration_check | int32 | |
| 91 | tpu_sdc_checker_filter_loop_iteration_skip | int32 | |
| 92 | xla_tpu_sdc_check_halt_on_detection | bool | |
| 93 | xla_tpu_sdc_check_log_full_hlo | bool | |
| 94 | xla_tpu_sdc_extra_llo_replica_count | int64 | |
| 95 | xla_tpu_sdc_duplicate_mxu_instructions | bool | |

#96 – #159 — VLIW Scheduling, Convolution Precision, Bounds-Check, IOR/LLVM Backend

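A scheduling-and-codegen block: the XLU optimizer and the VLIW scheduler with its fuel (#100–#107), the cross-replica-sharding and verification toggles, the verify-or-assign tiling enum (#132), the reduce and ring/pincer collective-emitter parameters (#135–#148), and the latency-hiding-scheduler and SPMD auto-partitioning switches (#150–#154). Gaps: #121, #124–#127, #156.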

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 96 | xla_enable_async_all_gather | enum | TristateProto.Value |
| 97 | xla_tpu_enable_log_recorder | bool | |
| 98 | xla_jf_all_to_all_shard_kib | int64 | |
| 99 | xla_tpu_all_to_all_with_different_output_addresses | bool | |
| 100 | xla_jf_xlu_optimizer | bool | |
| 101 | xla_jf_auto_assign_xlu | bool | |
| 102 | xla_jf_critical_path_scheduler | bool | |
| 103 | xla_jf_load_cse_and_s2l_forwarding | bool | |
| 104 | xla_tpu_load_store_optimizations | bool | |
| 105 | xla_jf_accumulation_reassociation | bool | |
| 106 | xla_jf_vliw_scheduler | bool | |
| 107 | xla_jf_vliw_fuel | int64 | |
| 108 | xla_tpu_force_send_recv_host_on_same_resource | enum | TristateProto.Value |
| 109 | xla_jf_allow_cross_replica_sharding_on_certain_reduce | bool | |
| 110 | xla_jf_cross_replica_sharding_also_try_reducing_min_span_by_factor | float | |
| 111 | xla_tpu_allow_input_fusion_in_certain_reduce_ops | bool | |
| 112 | xla_tpu_allow_sharding_on_minor_dim | bool | |
| 113 | xla_jf_enable_buffer_alias | bool | |
| 114 | xla_jf_verify_sync_flags | bool | |
| 115 | xla_tpu_verify_matmul | bool | |
| 116 | xla_jf_use_rng_bit_generator_emitter | bool | |
| 117 | xla_tpu_expand_rng_bit_generator | bool | |
| 118 | xla_jf_module_group_simplifier | bool | |
| 119 | xla_tpu_ior_remat | bool | |
| 120 | xla_enable_scalar_multiply_reduction | bool | |
| 122 | xla_optimize_llo_for_llvm | bool | |
| 123 | xla_tpu_run_space_to_batch_on_new_platforms | bool | |
| 128 | xla_tpu_min_elements_for_while_loop_concat_code_motion | int64 | |
| 129 | xla_tpu_sharding_metadata | bool | |
| 130 | xla_tpu_reverse_layout_computation_order | bool | |
| 131 | xla_tpu_spmd_threshold_for_allgather_cse | int64 | |
| 132 | xla_tpu_verify_or_assign_tiling_before_lowering | enum | VerifyOrAssignTilingFlagsProto.Value |
| 133 | xla_tpu_spmd_decompose_sharded_concats | bool | |
| 134 | xla_tpu_enable_graph_splitting | bool | |
| 135 | xla_jf_always_use_windowed_reduce | bool | |
| 136 | xla_tpu_max_kept_sublanes_for_reduce | int64 | |
| 137 | xla_tpu_check_nan_on_reduce | bool | |
| 138 | xla_tpu_use_tree_reduce | bool | |
| 139 | xla_tpu_uni_direction_ring_max_size_2d_plane | int64 | |
| 140 | xla_tpu_1d_uni_direction_ring_min_input_size_chunks | int64 | |
| 141 | xla_tpu_nd_short_transfer_max_chunks | int64 | |
| 142 | xla_tpu_enable_pincer_short_emitter | bool | |
| 143 | xla_tpu_permute_size4_cross_module_rings | bool | |
| 144 | xla_tpu_enable_2d_cross_module_reduce_scatter | bool | |
| 145 | xla_tpu_use_strided_strategy_nd | bool | |
| 146 | xla_tpu_checksum_all_reduce_transfers | bool | |
| 147 | xla_tpu_enforce_two_phase_sharding_topology | bool | |
| 148 | xla_tpu_use_routing_table_indices_in_all_reduce | bool | |
| 149 | xla_max_concurrent_send_recv | int32 | |
| 150 | xla_tpu_enable_latency_hiding_scheduler | bool | |
| 151 | xla_tpu_licm_analysis_allowance | int64 | |
| 152 | xla_tpu_scheduler_percent_shared_memory_limit | int64 | |
| 153 | xla_tpu_spmd_auto_partitioning | bool | |
| 154 | xla_tpu_perform_spmd_cse_prevention | bool | |
| 155 | xla_tpu_verify_device_assignment_in_runtime | bool | |
| 157 | xla_jf_experimental_vmem_for_hlo_outputs | int64 | |
| 158 | xla_jf_use_cost_based_memory_coloring | bool | |
| 159 | xla_jf_always_overlay | bool | |

QUIRK — field #132 xla_tpu_verify_or_assign_tiling_before_lowering is a VerifyOrAssignTilingFlagsProto.Value enum (NONE=0 / VERIFY=1 / ASSIGN=2), not a bool. A reimplementer who treats every xla_tpu_*verify* knob as a boolean toggle will mis-parse this one — it gates the tile-mode selector consumed at struct offset +0xDFC.
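
The failure mode can be sketched as follows. The enum values are the ones quoted above. The class name `VerifyOrAssignTiling`, the two parser functions, and the accepted text forms are hypothetical, since this page does not document the flag's textual syntax.

```python
from enum import IntEnum


class VerifyOrAssignTiling(IntEnum):
    """Values quoted for VerifyOrAssignTilingFlagsProto.Value."""
    NONE = 0
    VERIFY = 1
    ASSIGN = 2


def parse_tiling_flag(text: str) -> VerifyOrAssignTiling:
    """Accept an enum name or its integer value (assumed text forms)."""
    token = text.strip()
    if token.lstrip("-").isdigit():
        return VerifyOrAssignTiling(int(token))  # ValueError if out of range
    return VerifyOrAssignTiling[token.upper()]   # KeyError if unknown name


def parse_as_bool(text: str) -> bool:
    """The mistaken reading: treat the knob as an on/off toggle."""
    return text.strip().lower() not in ("", "0", "false", "none")


print(parse_tiling_flag("ASSIGN").name)  # ASSIGN
print(parse_tiling_flag("2").name)       # ASSIGN

# A boolean reading collapses ASSIGN to True, which round-trips as VERIFY.
print(VerifyOrAssignTiling(int(parse_as_bool("ASSIGN"))).name)  # VERIFY
```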


#160 – #219 — Loop-Invert, ISA Emitter, IOR/MSA, LLVM, Instrumentation

The loop_invert / ISA-emitter RangeSpec cluster, the IOR fast-mem and stored-solution MSA knobs, the LLVM-backend group, the HLO-dedup pair, and the bf16-coalescing / rematerialization-algorithm knobs. The two bare-named strings config_criterion (#209) and rematerialization_algorithm (#212) appear here. Gaps: #214, #218.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 160 | loop_invert | message | RangeSpecProto |
| 161 | loop_invert_modules | message | RangeSpecProto |
| 162 | xla_llvm_isa_emitter | bool | |
| 163 | xla_llvm_isa_emitter_bundles | message | RangeSpecProto |
| 164 | xla_llvm_isa_emitter_force | bool | |
| 165 | xla_run_scoped_memory_assignment | bool | |
| 166 | xla_jf_loop_trip_count | int32 | |
| 167 | xla_jf_internal_prefetch_overlays | bool | |
| 168 | xla_tpu_last_overlay_prefetches_first | bool | |
| 169 | xla_tpu_put_trap_at_hlo_marker_index | int64 | |
| 170 | xla_tpu_single_overlay_mode | bool | |
| 171 | internal_embedding_emitter_fraction_vmem_available | double | |
| 172 | xla_allow_hoisting_across_branch | bool | |
| 173 | xla_jf_avoid_cross_slot_vmem_bank_conflicts | bool | |
| 174 | xla_jf_shard_f32_weight_across_loop_iterations | bool | |
| 175 | xla_jf_allow_cross_replica_sharding_on_batch_matmul | bool | |
| 176 | xla_jf_fusion_max_vmem_mib | double | |
| 177 | xla_tpu_pack_vloads | bool | |
| 178 | xla_tpu_pack_cloads | bool | |
| 179 | xla_tpu_sublane_shift_scratchpad_size | int64 | |
| 180 | xla_tpu_small_operand_count_for_loop_fusion | int64 | |
| 181 | xla_jf_fusion_max_instruction_count_for_window_config | int64 | |
| 182 | xla_tpu_copy_fusion_allow_split | bool | |
| 183 | xla_tpu_allow_in_cmem_copy | bool | |
| 184 | xla_jf_crs_combiner_threshold_in_bytes | int64 | |
| 185 | xla_jf_enable_hlo_pipeline | bool | |
| 186 | xla_ior_fast_mem_run_production_msa | bool | |
| 187 | xla_ior_fast_mem_round_trip_production_msa | bool | |
| 188 | xla_ior_use_stored_solution | bool | |
| 189 | xla_ior_stored_solution_path | string | |
| 190 | xla_use_llvm_backend | bool | |
| 191 | xla_llvm_generate_xla_compatible_dwg | bool | |
| 192 | xla_jf_llvm_use_fast_opt | bool | |
| 193 | xla_jf_llo2llvm_timing_info | bool | |
| 194 | xla_jf_llvm_use_bitcode_dump | bool | |
| 195 | xla_jf_llvm_flags | message | RepeatedStrings |
| 196 | xla_jf_tanh_increased_precision | bool | |
| 197 | xla_jf_poison_vmem_allocations | bool | |
| 198 | xla_jf_hlo_deduplicate_only | string | |
| 199 | xla_jf_hlo_deduplication | bool | |
| 200 | xla_tpu_instrument_hlo_operations | bool | |
| 201 | xla_tpu_instrumentation_config | string | |
| 202 | xla_tpu_merge_small_overlays_into_big_neighbors | bool | |
| 203 | xla_tpu_net_router_trace_me_instrumentation | bool | |
| 204 | xla_tpu_use_custom_tree_barrier | bool | |
| 205 | xla_tpu_use_routing_table_indices_in_net_router | bool | |
| 206 | xla_tpu_use_routing_table_indices_in_tree_barrier | bool | |
| 207 | bf16_coalescing_dump_killed_candidate_pairs | bool | |
| 208 | bf16_coalescing_ignore_distance_for_pairing | bool | |
| 209 | config_criterion | string | |
| 210 | jf_xla_partial_reduce_hbm_bw_ratio | double | |
| 211 | jf_xla_partial_reduce_use_roofline_cost_fn | bool | |
| 212 | rematerialization_algorithm | string | |
| 213 | rematerialize_even_if_memory_limit_not_reached | bool | |
| 215 | tpu_use_continuations | bool | |
| 216 | treewidth_rematerialization_minimize_memory | bool | |
| 217 | use_op_tuner_config | bool | |
| 219 | xla_enable_trace | bool | |

#220 – #294 — AutoFDO Start, Conv Fusion, Pipelining, Overlay Compression, JF VMEM Family

The first AutoFDO toggle, the conv-fusion / conv-precision numerics block, the JF pipelining knobs, the overlay-compression and single-phase-ring transfer parameters, and the start of the per-family JF VMEM MSA cluster (#275–#286). Gaps: #244–#247, #250–#254, #268, #292, #294.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 220 | xla_tpu_autofdo | bool | |
| 221 | xla_jf_auto_assign_mxu | bool | |
| 222 | xla_jf_auto_latch_lmr | bool | |
| 223 | xla_jf_bf16_inside_cross_replica_sum | bool | |
| 224 | xla_jf_bounds_check | message | RangeSpecProto |
| 225 | xla_jf_bounds_check_stride | bool | |
| 226 | xla_jf_bounds_check_verbose | bool | |
| 227 | xla_jf_collect_llo_stack_trace | bool | |
| 228 | xla_jf_conv_base_dilation_adversary | bool | |
| 229 | xla_jf_conv_full_precision | bool | (NUMERICS) |
| 230 | xla_jf_conv_increased_precision | bool | (NUMERICS) |
| 231 | xla_jf_conv_input_fusion | bool | |
| 232 | xla_jf_conv_min_limit_vmem_mib | double | |
| 233 | xla_jf_conv_output_fusion | bool | |
| 234 | xla_jf_conv_prefers_padding_input_feature | bool | |
| 235 | xla_jf_conv_reshape_fusion | bool | |
| 236 | xla_jf_convolution_performance_target | double | |
| 237 | xla_jf_cp_pass_enabled | bool | |
| 238 | xla_tpu_crs_bounds_check_threshold_chips_count | int64 | |
| 239 | xla_jf_debug_level | int64 | |
| 240 | xla_jf_enable_final_priority_fusion | bool | |
| 241 | xla_jf_enable_pipelining | bool | |
| 242 | xla_jf_enable_producer_consumer_multi_output_fusion | bool | |
| 243 | xla_jf_experimental_cmem_for_hlo_outputs | int64 | |
| 248 | xla_jf_log_hlo_output | bool | |
| 249 | xla_jf_line_info_in_symbol_table | bool | |
| 255 | xla_jf_overlay_compression_threshold | int64 | |
| 256 | xla_jf_poison_operands_before_emitter | bool | |
| 257 | xla_jf_profile_cheap_ops | bool | |
| 258 | xla_jf_program_hbm_alignment_in_kib | int64 | |
| 259 | xla_jf_random_latency | bool | |
| 260 | xla_vf_vmem_use_ior_algorithm | string | |
| 261 | xla_jf_simplifier_pass_enabled | bool | |
| 262 | xla_jf_single_phase_ring_max_kib | int64 | |
| 263 | xla_jf_single_phase_ring_threshold | int64 | |
| 264 | xla_jf_span_size_in_kib | int64 | |
| 265 | xla_jf_spmd_conv_halo_exchange_always_on_lhs | bool | |
| 266 | xla_jf_spmd_report_instruction_count | int64 | |
| 267 | xla_jf_spmd_threshold_for_windowed_einsum_mib | int64 | |
| 269 | xla_jf_tune_large_vmem | bool | |
| 270 | xla_jf_use_hw_constants | bool | |
| 271 | xla_jf_use_multi_colors_in_all_reduce_if_supported | bool | |
| 272 | xla_jf_use_rotated_pincer_emitter | bool | |
| 273 | xla_jf_use_rotated_pincer_ring_emitter | bool | |
| 274 | xla_jf_use_single_phase_ring_emitter | bool | |
| 275 | xla_jf_vmem_default_cross_program_prefetch_heuristic | bool | |
| 276 | xla_jf_vmem_enable_cross_program_prefetch | bool | |
| 277 | xla_jf_vmem_enable_while_redundant_eviction_elimination | bool | |
| 278 | xla_jf_vmem_max_outstanding_evictions | int64 | |
| 279 | xla_jf_vmem_max_outstanding_prefetches | int64 | |
| 280 | xla_jf_vmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 281 | xla_jf_vmem_max_repacks | int64 | |
| 282 | xla_jf_vmem_max_retries | int64 | |
| 283 | xla_jf_vmem_memory_space_assignment | bool | |
| 284 | xla_jf_vmem_min_overlap_to_async_copy_ratio | float | |
| 285 | xla_jf_vmem_preferred_overlap_to_async_copy_ratio | float | |
| 286 | xla_jf_vmem_use_ior_algorithm | string | |
| 287 | xla_tpu_vmem_use_telamalloc | bool | |
| 288 | xla_max_concurrent_async_all_gathers | int32 | |
| 289 | xla_max_concurrent_async_collective_permutes | int32 | |
| 290 | xla_max_concurrent_host_send_recv | int32 | |
| 291 | xla_max_decomposed_all_reduces | int64 | |
| 293 | xla_set_split_input_output_dmas | bool | |

NOTE — the xla_jf_vmem_* block (#275–#286) is the jellyfish arm of the per-family MSA overlap-ratio family. The xla_vf_vmem_* (#436–#447), xla_tpu_cmem_* (#309–#315), and xla_gf_vmem_* (#533–#547) arms later on this page mirror it, though not strictly field-for-field: the cmem arm carries only 7 fields, the vf and gf arms add enable_cross_program_prefetch_freeing, and the vf arm's use_ior_algorithm string sits outside its block at #260. The three float ratios present in every arm (max / min / preferred overlap-to-async-copy) are the MSA cost-model tuning surface; the per-version overlay picks one arm.
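
As a sketch of how the four arms line up, the lookup below resolves the three overlap-ratio fields per arm. Field numbers and names are copied from the tables on this page. The arm keys ("jf", "vf", "cmem", "gf") and the function name `ratio_field` are illustrative only, and how libtpu itself selects an arm is not shown here.

```python
# Name prefix of each per-family arm, as it appears in the tables.
PREFIX = {
    "jf": "xla_jf_vmem_",
    "vf": "xla_vf_vmem_",
    "cmem": "xla_tpu_cmem_",
    "gf": "xla_gf_vmem_",
}

# Shared name suffix of the three float ratios.
SUFFIX = {
    "max": "max_overlap_to_mem_size_async_copy_ratio",
    "min": "min_overlap_to_async_copy_ratio",
    "preferred": "preferred_overlap_to_async_copy_ratio",
}

# Field numbers of the three ratios in each arm.
NUMBER = {
    "jf": {"max": 280, "min": 284, "preferred": 285},
    "vf": {"max": 442, "min": 446, "preferred": 447},
    "cmem": {"max": 309, "min": 313, "preferred": 314},
    "gf": {"max": 540, "min": 544, "preferred": 545},
}


def ratio_field(arm: str, kind: str) -> tuple[int, str]:
    """Return (field number, field name) of one overlap ratio in one arm."""
    return NUMBER[arm][kind], PREFIX[arm] + SUFFIX[kind]


print(ratio_field("cmem", "min"))
# (313, 'xla_tpu_cmem_min_overlap_to_async_copy_ratio')
```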


#295 – #354 — SparseCore Tracing, All-Reduce VMEM, CMEM MSA Family, Detect-NaN, Megacore Fusion

The first AutoProto-typed field (#295), the all-reduce VMEM contingency knobs, the xla_tpu_cmem_* MSA arm, the copy-fusion cluster, the NaN/Inf/special-FP detection toggles, and the megacore-fusion enable Tristate (#344). Gaps: #303, #335.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 295 | xla_sparse_core_enable_hardware_tracing | message | AutoProto (SPARSE_CORE) |
| 296 | xla_tpu_add_llo_regions_to_symbol_table | bool | |
| 297 | xla_tpu_all_reduce_vmem_contingency_kib | int64 | |
| 298 | xla_tpu_assign_all_reduce_scatter_layout | bool | |
| 299 | xla_tpu_auto_reduce_precision | bool | |
| 300 | xla_tpu_auto_spmd_keep_all_user_shardings | bool | |
| 301 | xla_tpu_auto_spmd_partitioning_memory_budget_gb | int64 | |
| 302 | xla_tpu_autofdo_skip_hlo_fingerprints | message | RepeatedStrings |
| 304 | xla_tpu_autotune_database | string | |
| 305 | xla_tpu_binomial_all_reduce_use_physical_core_ids | bool | |
| 306 | xla_tpu_block_rematerialization_record_stats | bool | |
| 307 | xla_tpu_check_llo_types | bool | |
| 308 | xla_tpu_choose_faster_windowed_einsum_over_mem | bool | |
| 309 | xla_tpu_cmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 310 | xla_tpu_cmem_max_repacks | int64 | |
| 311 | xla_tpu_cmem_max_retries | int64 | |
| 312 | xla_tpu_cmem_memory_space_assignment | bool | |
| 313 | xla_tpu_cmem_min_overlap_to_async_copy_ratio | float | |
| 314 | xla_tpu_cmem_preferred_overlap_to_async_copy_ratio | float | |
| 315 | xla_tpu_cmem_use_telamalloc | bool | |
| 316 | xla_tpu_conditional_code_motion_config | string | |
| 317 | xla_tpu_copy_elision_analysis_allowance | int64 | |
| 318 | xla_tpu_copy_fusion_minimum_copy_size_in_bytes | int64 | |
| 319 | xla_tpu_copy_fusion_pad_unpad_ratio | double | |
| 320 | xla_tpu_copy_fusion_threshold | int64 | |
| 321 | xla_tpu_copy_insertion_use_region_analysis | bool | |
| 322 | xla_tpu_copy_insertion_use_region_analysis_limit | int64 | |
| 323 | xla_tpu_decompose_all_gather_fusion | bool | |
| 324 | xla_tpu_decompose_all_reduce_bidirectional_communication | bool | |
| 325 | xla_tpu_decompose_reduce_scatter_fusion | bool | |
| 326 | xla_tpu_deduplicated_hlo_min_bundle_count | int64 | |
| 327 | xla_tpu_detect_inf | bool | |
| 328 | xla_tpu_detect_llo_nan | bool | |
| 329 | xla_tpu_detect_nan | bool | |
| 330 | xla_tpu_detect_only_on_fusion | bool | |
| 331 | xla_tpu_detect_special_fp | bool | |
| 332 | xla_tpu_dot_dot_fusion | bool | |
| 333 | xla_tpu_dot_dot_fusion_duplicated | bool | |
| 334 | xla_tpu_dot_dot_fusion_separable_convs_only | bool | |
| 336 | xla_tpu_enable_all_experimental_scheduler_features | bool | |
| 337 | xla_tpu_enable_all_reduce_scatter_fusion | bool | |
| 338 | xla_tpu_enable_asymmetric_max_colors | bool | |
| 339 | xla_tpu_enable_copy_fusion | bool | |
| 340 | xla_tpu_enable_copy_permute_minor_fusion | bool | |
| 341 | xla_tpu_enable_cross_module_binomial_all_reduce | bool | |
| 342 | xla_tpu_enable_deduplicated_calls | enum | TristateProto.Value |
| 343 | xla_tpu_enable_experimental_fusion_cost_model | bool | |
| 344 | xla_tpu_enable_megacore_fusion | enum | TristateProto.Value |
| 345 | xla_tpu_enable_megascale_barrier | bool | |
| 346 | xla_tpu_enable_multi_level_input_dot_dot_fusion | bool | |
| 347 | xla_tpu_enable_multi_level_nested_dot_fusion | bool | |
| 348 | xla_tpu_enable_multi_level_nested_loop_fusion | bool | |
| 349 | xla_tpu_enable_multi_level_output_dot_dot_fusion | bool | |
| 350 | xla_tpu_enable_nd_wus_on_partial_active_dims | bool | |
| 351 | xla_tpu_enable_pincer_short_fusion_emitter | bool | |
| 352 | xla_tpu_enable_scheduler_memory_pressure_tracking | enum | TristateProto.Value |
| 353 | xla_tpu_enable_sparse_gradient_rewrite | bool | |
| 354 | xla_tpu_enable_untiled_layout | bool | (is_used_at_runtime) |

GOTCHA — xla_tpu_detect_inf (#327) and xla_tpu_detect_nan (#329) are TCE proto fields distinct from the same-named xla.DebugOptions fields (#136/#135). The TCE fields are flag-wired and materialized from the absl default; the DebugOptions ones are born proto3-zero (FALSE) unless the flag override layer sets them. Do not conflate the two protos when reimplementing the detect-NaN path.


#355 – #432 — Untiled Layout, Experimental Padding, Megacore-Fusion Tuning, Poison-Padding, SPMD Windowed Einsum

The VMEM-to-VMEM DMA toggle (#355), the experimental quantization/padding knobs, the megacore-fusion margin/scaling floats, the MSA while-execution count, the nested-dot-fusion cluster, the poison-padding pair, the RWB-fusion bool, the SPMD windowed-einsum decomposition block, and the 1D-only untiled-layout flag (#432). Gaps: #356–#359, #362, #379, #401, #419, #422.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 355 | xla_tpu_enable_vmem_to_vmem_dmas | bool | |
| 360 | xla_tpu_experimental_allow_fast_quantization_conversions | bool | |
| 361 | xla_tpu_experimental_cmem_fraction_for_hlo_outputs | float | |
| 363 | xla_tpu_experimental_max_concat_padding_ratio | double | |
| 364 | xla_tpu_experimental_max_padding_gib | double | |
| 365 | xla_tpu_extra_hoisting_range_for_register_producers | int64 | |
| 366 | xla_tpu_force_startup_barrier_in_binomial_all_reduce | bool | |
| 367 | xla_tpu_force_vmem_dma_and_spans | bool | |
| 368 | xla_tpu_fusion_config_collection | string | |
| 369 | xla_tpu_handle_reduce_window_as_convolution | bool | |
| 370 | xla_tpu_hbm_bw | double | |
| 371 | xla_tpu_hbm_initial_cycle_penalty | int64 | |
| 372 | xla_tpu_input_conv_multi_users | bool | |
| 373 | xla_tpu_insert_dummy_fusions_on_conv_kernels | bool | |
| 374 | xla_tpu_licm_size_inflation_ratio | float | |
| 375 | xla_tpu_llo_compilation_max_retries | int32 | |
| 376 | xla_tpu_local_dma_pipe_dma_from_cmem_min_chunks | int64 | |
| 377 | xla_tpu_local_dma_pipe_dma_to_cmem_min_chunks | int64 | |
| 378 | xla_tpu_log_current_and_new_fusion_cost_models | bool | |
| 380 | xla_tpu_max_clustered_loads | int64 | |
| 381 | xla_tpu_max_vld_live_range | int64 | |
| 382 | xla_tpu_max_vmreg_live_range | int64 | |
| 383 | xla_tpu_megacore_fusion_allow_ags | bool | |
| 384 | xla_tpu_megacore_fusion_disable_live_out_ags | bool | |
| 385 | xla_tpu_megacore_fusion_good_overload_margin | float | |
| 386 | xla_tpu_megacore_fusion_latency_bound_ar_fusion_size | int64 | |
| 387 | xla_tpu_megacore_fusion_orthogonal_ag | bool | |
| 388 | xla_tpu_megacore_fusion_orthogonal_ars | bool | |
| 389 | xla_tpu_megacore_fusion_scaling_factor | float | |
| 390 | xla_tpu_memory_space_assignment_while_execution_count | int64 | |
| 391 | xla_tpu_multioutput_fusion_max_operands | int64 | |
| 392 | xla_tpu_nested_dot_fusion | bool | |
| 393 | xla_tpu_nested_dot_fusion_supported_custom_ops | string | |
| 394 | xla_tpu_nested_dot_fusion_vmem_fraction | double | |
| 395 | xla_tpu_op_tracemarks | bool | |
| 396 | xla_tpu_optimize_bf16_math | bool | (NUMERICS) |
| 397 | xla_tpu_order_dot_after_layout | bool | |
| 398 | xla_tpu_override_rwb_tpu_limitation | bool | |
| 399 | xla_tpu_poison_padding | bool | |
| 400 | xla_tpu_poison_padding_value | int32 | |
| 402 | xla_tpu_prefer_async_allgather_to_allreduce | bool | |
| 403 | xla_tpu_prefer_binomial_single_phase_ring_emitter | bool | |
| 404 | xla_tpu_prefer_dynamic_pad | bool | |
| 405 | xla_tpu_profile_traceme_level | int32 | |
| 406 | xla_tpu_random_all_reduce_delay_mask | uint32 | |
| 407 | xla_tpu_reduce_window_reduction_dim_max | int64 | |
| 408 | xla_tpu_remove_bf16_bitcast_converts_for_all | bool | |
| 409 | xla_tpu_room_for_potential_register_dependency | int64 | |
| 410 | xla_tpu_rotated_pincer_pack_allgather_fusion | bool | |
| 411 | xla_tpu_run_all_reduce_simplifier | bool | |
| 412 | xla_tpu_run_space_to_batch | bool | |
| 413 | xla_tpu_rwb_fusion | bool | |
| 414 | xla_tpu_schedule_send_recvs | enum | TristateProto.Value |
| 415 | xla_tpu_scheduler_using_real_cost_model | enum | TristateProto.Value |
| 416 | xla_tpu_scoped_cmem_for_all_reduce | int64 | |
| 417 | xla_tpu_scoped_hbm_for_all_reduce | int64 | |
| 418 | xla_tpu_scoped_vmem_limit_kib | int64 | |
| 420 | xla_tpu_sg_sshfl_ignore_bits | int64 | |
| 421 | xla_tpu_short_pincer_single_step_max_chunks_per_chip | int64 | |
| 423 | xla_tpu_spmd_bidirectional_windowed_einsum | bool | |
| 424 | xla_tpu_spmd_f32_accum_for_bf16_ar | bool | |
| 425 | xla_tpu_spmd_rng_bit_generator_unsafe | bool | |
| 426 | xla_tpu_spmd_unroll_windowed_einsum | bool | |
| 427 | xla_tpu_spmd_windowed_einsum_decompose_ag | bool | |
| 428 | xla_tpu_spmd_windowed_einsum_decompose_rs | bool | |
| 429 | xla_tpu_store_to_load_forwarding_window | int64 | |
| 430 | xla_tpu_threshold_for_long_live_range_in_flowdown | int64 | |
| 431 | xla_tpu_uni_direction_ring_max_size | int64 | |
| 432 | xla_tpu_untiled_layout_for_1D_only | bool | (is_used_at_runtime) |

NOTE — xla_tpu_rwb_fusion (#413) is one of two fields (with xla_tpu_accumulate_into_mrb, #597 on page B) whose absl default is TRUE despite an earlier help-string reading of false. The registered flag default is byte-authoritative; the help text describes behavior, not the default. See TCE Field-Offsets & Flag Defaults.


#433 – #497 — Aggressive Scheduling, VF VMEM Family, Async Collective Fusion, AutoFDO/Autotune

The aggressive-scheduling Tristate, the full xla_vf_vmem_* MSA arm (#436–#447), the send/recv aggregation cluster, the async-collective-fusion block (the first SparseDenseMatmulFdoConfig message at #462), the AutoFDO module flags, and the autotune-by-pass toggles. Gaps: #434, #448, #468, #470, #481, #484, #491.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 433 | xla_tpu_use_aggressive_scheduling | enum | TristateProto.Value |
| 435 | xla_tpu_use_resilient_collective_emitter | bool | |
| 436 | xla_vf_vmem_default_cross_program_prefetch_heuristic | bool | |
| 437 | xla_vf_vmem_enable_cross_program_prefetch | bool | |
| 438 | xla_vf_vmem_enable_cross_program_prefetch_freeing | bool | |
| 439 | xla_vf_vmem_enable_while_redundant_eviction_elimination | bool | |
| 440 | xla_vf_vmem_max_outstanding_evictions | int64 | |
| 441 | xla_vf_vmem_max_outstanding_prefetches | int64 | |
| 442 | xla_vf_vmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 443 | xla_vf_vmem_max_repacks | int64 | |
| 444 | xla_vf_vmem_max_retries | int64 | |
| 445 | xla_vf_vmem_memory_space_assignment | bool | |
| 446 | xla_vf_vmem_min_overlap_to_async_copy_ratio | float | |
| 447 | xla_vf_vmem_preferred_overlap_to_async_copy_ratio | float | |
| 449 | xla_tpu_enable_send_recv_aggregation | bool | |
| 450 | xla_tpu_enable_data_parallel_all_reduce_opt | bool | (PIPELINER) |
| 451 | xla_tpu_vector_load_fusion_window | int64 | |
| 452 | xla_tpu_fuse_only_phase0_in_2d_reduce_scatter | bool | |
| 453 | xla_tpu_sdc_checker_log_inputs_on_sdc_event | bool | |
| 454 | xla_tpu_enable_expression_constant_splitter | bool | |
| 455 | xla_jf_hlo_deduplication_all_unique | bool | |
| 456 | xla_tpu_pre_fusion_remat | bool | |
| 457 | xla_tpu_sc_megachip_temporal_reuse | bool | |
| 458 | xla_tpu_enable_async_all_to_all | bool | |
| 459 | xla_tpu_auto_spmd_partitioning_memory_budget_ratio | float | |
| 460 | xla_sc_disable_megacore_partitioning | bool | |
| 461 | xla_tpu_enable_async_collective_fusion | bool | |
| 462 | xla_tpu_sparse_dense_matmul_fdo_config | message | SparseDenseMatmulFdoConfig |
| 463 | xla_tpu_enable_async_collective_fusion_multiple_steps | bool | |
| 464 | xla_vf_max_vmem_used_by_memory_space_assignment | int64 | |
| 465 | xla_tpu_enable_async_collective_fusion_fuse_all_gather | enum | TristateProto.Value |
| 466 | xla_tpu_enable_async_collective_fusion_fuse_all_reduce | bool | |
| 467 | xla_tpu_megacore_fusion_orthogonal_ars_margin | float | |
| 469 | xla_tpu_autofdo_module_flags | bool | |
| 471 | xla_tpu_autofdo_module_layouts | bool | |
| 472 | xla_tpu_autofdo_skip_module_fingerprints | message | RepeatedStrings |
| 473 | xla_tpu_autotune_dots | bool | |
| 474 | xla_tpu_autotune_flags | bool | |
| 475 | xla_tpu_autotune_fusions | bool | |
| 476 | xla_tpu_autotune_layouts | bool | |
| 477 | xla_tpu_autotune_memory_space_assignment | bool | |
| 478 | xla_tpu_split_cluster_gap | int64 | |
| 479 | xla_tpu_split_cluster_size | int64 | |
| 480 | xla_tpu_autofdo_hlo_module_size_threshold | int32 | |
| 482 | xla_tpu_custom_fusion_no_global_unfusible | bool | |
| 483 | xla_tpu_custom_fusion_traverse_edges_twice | bool | |
| 485 | xla_tpu_overlap_compute_collective_tc | bool | |
| 486 | xla_tpu_max_send_recv_aggregation | int32 | |
| 487 | xla_tpu_vmac_transform_strategy | enum | TpuVmacTransformStrategyProto.Value |
| 488 | xla_tpu_vector_store_fusion_window | int64 | |
| 489 | xla_tpu_async_copy_bandwidth_scaling_factor | float | |
| 490 | xla_tpu_metrics_filename_prefix | string | |
| 492 | xla_tpu_enable_domain_passes | bool | |
| 493 | xla_tpu_experimental_enable_dynamic_int8_quantization | bool | |
| 494 | xla_max_cross_program_prefetches | int64 | |
| 495 | xla_tpu_enable_host_aware_passes | bool | |
| 496 | xla_tpu_disallow_in_alt_mem | string | |
| 497 | xla_tpu_allow_deeply_nested_fusion_numerical_diff | bool | |

#498 – #560 — Prefetch FIFO, ICI-SDC Test, Data-Parallel DCN, GF VMEM Family, AG Pipelining

The closing block of this page: prefetch-interval-picker overrides, the numerics accurate_log2, the ICI-SDC self-test cluster (#516–#521, #560), the data-parallel DCN pipelining knobs, the full xla_gf_vmem_* MSA arm (#533–#547), the overlay-profiler toggles, and the AG-backward-pipelining group. Gaps: #506, #509, #511, #515, #527, #530, #531, #546.

| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 498 | xla_tpu_prefetch_interval_picker_size_override | int64 | |
| 499 | xla_tpu_force_1d_allreduce_at_chunk_count | int64 | |
| 500 | xla_tpu_enable_aggressive_loop_fusion_layout_opt | bool | |
| 501 | xla_tpu_use_repeated_instance_for_preferred_prefetch_time | bool | |
| 502 | xla_tpu_enforce_prefetch_fifo_order | bool | |
| 503 | xla_tpu_reduce_loop_fusion_dup_with_unfusable_user | bool | |
| 504 | xla_tpu_accurate_log2 | bool | (NUMERICS) |
| 505 | xla_tpu_dcn_max_overlap_estimation | float | |
| 507 | xla_tpu_autofdo_op_windows | bool | |
| 508 | xla_tpu_dcn_overlap_limit | int64 | |
| 510 | xla_tpu_enable_log_recorder_partitioned_logging | bool | |
| 512 | xla_tpu_no_crash_on_oom | bool | |
| 513 | xla_lhs_enable_release_start_policy | bool | |
| 514 | xla_tpu_debug_sflag_wait_shalt_on_detection | uint32 | |
| 516 | xla_tpu_ici_sdc_test_buffer_size_chunks | int32 | |
| 517 | xla_tpu_ici_sdc_test_packet_size_chunks | int32 | |
| 518 | xla_tpu_ici_sdc_test_iterations | int32 | |
| 519 | xla_tpu_ici_sdc_test_max_distance | int32 | |
| 520 | xla_tpu_ici_sdc_test_pipeline_depth | int32 | |
| 521 | xla_tpu_ici_sdc_test_inject_mismatch_for_testing_only | bool | |
| 522 | xla_tpu_mock_send_recv_host | bool | |
| 523 | xla_tpu_data_parallel_dcn_ar_dual_pipelining | uint32 | (PIPELINER) |
| 524 | xla_tpu_enable_aggressive_broadcast_priority_update | bool | |
| 525 | xla_tpu_data_parallel_opt_different_sized_ops | uint32 | (PIPELINER) |
| 526 | xla_tpu_ars_halo_exchange_count | int64 | |
| 528 | xla_tpu_dus_emitter_desired_update_window_chunk_count | int64 | |
| 529 | xla_tpu_spmd_f32_accum_for_bf16_ar_min_subgroup_size | int64 | |
| 532 | xla_tpu_show_overlay_waits_in_profiler | bool | |
| 533 | xla_gf_vmem_use_ior_algorithm | string | |
| 534 | xla_gf_vmem_default_cross_program_prefetch_heuristic | bool | |
| 535 | xla_gf_vmem_enable_cross_program_prefetch | bool | |
| 536 | xla_gf_vmem_enable_cross_program_prefetch_freeing | bool | |
| 537 | xla_gf_vmem_enable_while_redundant_eviction_elimination | bool | |
| 538 | xla_gf_vmem_max_outstanding_evictions | int64 | |
| 539 | xla_gf_vmem_max_outstanding_prefetches | int64 | |
| 540 | xla_gf_vmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 541 | xla_gf_vmem_max_repacks | int64 | |
| 542 | xla_gf_vmem_max_retries | int64 | |
| 543 | xla_gf_vmem_memory_space_assignment | bool | |
| 544 | xla_gf_vmem_min_overlap_to_async_copy_ratio | float | |
| 545 | xla_gf_vmem_preferred_overlap_to_async_copy_ratio | float | |
| 547 | xla_gf_max_vmem_used_by_memory_space_assignment | int64 | |
| 548 | xla_tpu_allow_multi_dim_reduce_rwb | bool | |
| 549 | xla_tpu_show_overlay_overhead_in_profiler | bool | |
| 550 | xla_tpu_show_overlay_details_in_profiler | bool | |
| 551 | xla_tpu_overlay_allocation_table_size | int64 | |
| 552 | xla_tpu_enable_ag_backward_pipelining | bool | (PIPELINER) |
| 553 | xla_tpu_prefuse_self_attention | bool | |
| 554 | xla_tpu_enable_indexing_optimizations | int64 | |
| 555 | xla_tpu_allow_layout_negotiation | bool | |
| 556 | xla_tpu_backward_propagate_reduce | bool | |
| 557 | xla_tpu_decompose_all_gather_einsum | bool | |
| 558 | xla_tpu_decompose_einsum_reduce_scatter | bool | |
| 559 | xla_tpu_keep_hlo_proto_literals_up_to | uint64 | |
| 560 | xla_tpu_ici_sdc_test_run_on_program_start | bool | |

NOTE — field #560 xla_tpu_ici_sdc_test_run_on_program_start is the last field on this page. The next field, #561 xla_tpu_debug_sflag_wait_timeout_ms (uint32), opens TCE Field Dictionary (B). The boundary is exact: this page owns 1 ≤ field# ≤ 560, page B owns 561 ≤ field# ≤ 1218.
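Because the field number is what a serialized TCE carries on the wire, a row of the dictionary can be turned into the tag bytes a parser would see. The sketch below uses only the standard protobuf tag rule, `(field_number << 3) | wire_type`, encoded as a varint. It assumes singular scalar fields; message-typed fields are left out on purpose, and the helper names `encode_varint` and `field_tag` are illustrative, not symbols from libtpu.

```python
# Standard protobuf wire types for the scalar base types in this dictionary.
# Message-typed fields are intentionally omitted (assumption: scalars only).
WIRE_TYPE = {
    "bool": 0, "int32": 0, "int64": 0, "uint32": 0, "uint64": 0, "enum": 0,
    "double": 1,   # fixed 64-bit
    "string": 2,   # length-delimited
    "float": 5,    # fixed 32-bit
}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_tag(field_number: int, base_type: str) -> bytes:
    """Return the wire tag bytes for a field number and proto base type."""
    return encode_varint((field_number << 3) | WIRE_TYPE[base_type])


if __name__ == "__main__":
    # Three rows taken from the table on this page.
    print(field_tag(560, "bool").hex())    # 8023
    print(field_tag(505, "float").hex())   # cd1f
    print(field_tag(533, "string").hex())  # aa21
```

Every field from #16 through #2047 has a two-byte tag under this rule, so the tag width alone does not distinguish this page's fields from those on page B.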


Field-Number Gaps in #1–#560

The numbering is not contiguous: TCE carries no reserved_range or reserved_name, so a deleted field simply leaves a hole. The gaps inside this page's range — every field number in 1..560 that is not a live field above — are:

21, 44, 45, 54, 59, 70, 121, 124, 125, 126, 127, 156, 214, 218,
244, 245, 246, 247, 250, 251, 252, 253, 254, 268, 292, 294, 303,
335, 356, 357, 358, 359, 362, 379, 401, 419, 422, 434, 448, 468,
470, 481, 484, 491, 506, 509, 511, 515, 527, 530, 531, 546

GOTCHA — these holes are deletions, not declared-reserved ranges. A reimplementer carving the descriptor must not assume the field set is 1..N contiguous; iterate the actual FieldEntry array (sorted ascending) and skip the missing numbers. The full 97-gap list across 1..1218 lives on TpuCompilationEnvironment; the slice above is the portion below #561.
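The skip-the-holes iteration can be sketched as follows. This is a minimal illustration, not the carving tool itself: `find_gaps` is a hypothetical helper, and the live field list is synthesized here from the gap list above rather than read from the FieldEntry array.

```python
# Deleted field numbers in 1..560, copied from the list on this page.
GAPS_1_560 = [
    21, 44, 45, 54, 59, 70, 121, 124, 125, 126, 127, 156, 214, 218,
    244, 245, 246, 247, 250, 251, 252, 253, 254, 268, 292, 294, 303,
    335, 356, 357, 358, 359, 362, 379, 401, 419, 422, 434, 448, 468,
    470, 481, 484, 491, 506, 509, 511, 515, 527, 530, 531, 546,
]


def find_gaps(live_sorted, lo, hi):
    """Return the numbers in lo..hi missing from an ascending list of live field numbers."""
    gaps = []
    expected = lo
    for n in live_sorted:
        if n < lo:
            continue
        if n > hi:
            break
        gaps.extend(range(expected, n))  # everything skipped before n is a hole
        expected = n + 1
    gaps.extend(range(expected, hi + 1))  # holes after the last live field
    return gaps


# Stand-in for the field numbers read from the sorted FieldEntry array
# (assumption: in a real carve these come from the parse table).
_gap_set = set(GAPS_1_560)
live = [n for n in range(1, 561) if n not in _gap_set]

if __name__ == "__main__":
    assert find_gaps(live, 1, 560) == GAPS_1_560
    print(len(GAPS_1_560), len(live))  # 52 508
```

The counts agree with the rest of the page: 52 holes in 1..560 leave 508 live fields.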


Type Distribution Within #1–#560

For orientation, this is the proto base-type mix of the 508 live fields on this page (560 field numbers minus the 52 gaps listed above). The lower half is bool-heavy, consistent with the early jellyfish toggle era:

| Base type | Notes |
|---|---|
| bool | the dominant type; most early knobs are simple feature toggles |
| int64 | thresholds, counts, byte limits, ring sizes |
| int32 | a smaller integer set (trip counts, retry caps, ICI-SDC test sizes) |
| uint32 / uint64 | a handful: delay masks (#406), keep_hlo_proto_literals_up_to (#559), DCN-pipelining (#523/#525), sflag-wait shalt (#514) |
| float / double | the MSA overlap ratios and the megacore-fusion / cost-model scaling factors |
| string | profile paths, config selectors, filename prefixes, config_criterion / rematerialization_algorithm |
| enum | 13 in this range; TristateProto.Value dominates, plus MemorySchedulerProto.Value (#31), VerifyOrAssignTilingFlagsProto.Value (#132), TpuVmacTransformStrategyProto.Value (#487) |
| message | RangeSpecProto (bounds/ISA-emitter knobs), RepeatedStrings (LLVM flags, FDO skip lists), the first AutoProto (#295), and typed helpers (SparseDenseMatmulFdoConfig #462) |

The whole-proto histogram (418 bool / 148 int64 / 349 message / 74 enum / 37 string / 34 float / 32 int32 / 14 double / 11 uint32 / 4 uint64) and the parse-table-derived type_card proof are on TpuCompilationEnvironment.
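As a quick consistency check, the whole-proto histogram can be summed against the other totals quoted in this dictionary. The snippet is only arithmetic over the numbers stated on this page; the variable names are illustrative.

```python
# Whole-proto base-type histogram as quoted on this page.
HISTOGRAM = {
    "bool": 418, "int64": 148, "message": 349, "enum": 74, "string": 37,
    "float": 34, "int32": 32, "double": 14, "uint32": 11, "uint64": 4,
}

MAX_FIELD_NUMBER = 1218  # highest field number in the proto
TOTAL_GAPS = 97          # deletion gaps across 1..1218

total_live = sum(HISTOGRAM.values())
wrapped = HISTOGRAM["message"] + HISTOGRAM["enum"]  # fields with a wrapper type

if __name__ == "__main__":
    # The histogram must account for every live field exactly once.
    assert total_live == MAX_FIELD_NUMBER - TOTAL_GAPS
    print(total_live, wrapped)  # 1121 423
```

The sum reproduces the 1121 live fields, and message plus enum gives the 423 wrapped fields cited in the abstract.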


Confidence

Every field number, name, and type on this page is CERTAIN. The names derive from the TpuCompilationEnvironment FileDescriptorProto carved from the binary and were cross-checked against the 1121-entry FieldEntry array of TpuCompilationEnvironment::_table_ @ 0x21cfa9e0, whose entries are sorted by field number. Twenty-two of the names spanning the full #1–#560 range were additionally confirmed byte-present, verbatim, in the binary's .rodata string pool (0 misses), including the wrapper type strings TristateProto, VerifyOrAssignTilingFlags, AutoProto, and RangeSpecProto. The HOT-subsystem tags shown in italics for a few fields (NUMERICS, PIPELINER, SPARSE_CORE) and the four is_used_at_runtime markers come from the TpuCompEnvFieldOptions field-option extension (#535801365). They are HIGH confidence because they were recovered from the descriptor, not the parse table. Not every tagged field is annotated here; the complete HOT breakdown is on TpuCompilationEnvironment. No field on this page is LOW confidence.


Cross-References