TCE Field Dictionary (A)
All field numbers, names, types, offsets, and addresses on this page apply to
libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d, libtpu_lts_20260413_b_RC00). Other versions renumber and re-offset fields.
Abstract
TpuCompilationEnvironment (TCE) is the TPU-private master compile-config protobuf: a single editions-proto message carrying 1121 live fields, every one of which is also a registered absl::Flag (a perfect 1:1 mapping, the structural inverse of the GPU/CPU-shared xla.DebugOptions). This page is the field-number → name → proto type dictionary for the lower half of that proto: field numbers 1 through 560. The matching upper half — field 561 through 1218 (max field number 1218, with 97 deletion gaps) — lives on TCE Field Dictionary (B). The split point is exact and stated below.
The dictionary is reconstructed from the TpuCompilationEnvironment FileDescriptorProto carved out of the binary at protodesc_cold VA 0xbfa6060 (size 137,692 B) and cross-checked against the generated fast-parse table TpuCompilationEnvironment::_table_ @ 0x21cfa9e0, whose 1121-entry FieldEntry array is sorted by ascending field number. Each line of this page gives the field number, the verbatim field name (every name on this page was confirmed byte-present in the binary's .rodata string pool), the proto base type, and — for the 423 wrapped fields — the wrapper message/enum type. The C++ struct offset and the per-field literal default value are deliberately not repeated here; they are the subject of TCE Field-Offsets & Flag Defaults, and the structural overview (parse-table header, type histogram, HOT-tag taxonomy, AutoProto switch) is on TpuCompilationEnvironment.
For a reimplementer, the contract this page satisfies is narrow and precise:
- Field number → name. The canonical numbering that a serialized TCE proto uses on the wire and that GetTpuCompEnvWithDefaultValues walks when it resolves each field to its absl::Flag.
- Name → proto base type. bool / int32 / int64 / uint32 / uint64 / float / double / string / enum / message: the declared type, which decides how the value is parsed and stored.
- Wrapper type, where present. Which enum (TristateProto.Value, MemorySchedulerProto.Value, …) or which message (AutoProto, RangeSpecProto, the typed helpers) a non-scalar field carries.
| Property | Value |
|---|---|
| Proto | TpuCompilationEnvironment (editions proto, all fields optional) |
| Descriptor source | FileDescriptorProto @ 0xbfa6060 (137,692 B) |
| Parse table | TpuCompilationEnvironment::_table_ @ 0x21cfa9e0 (1121 FieldEntry, ascending field#) |
| This page covers | field #1 – #560 (lower half) |
| Page B covers | field #561 – #1218 (dictionary B) |
| Split boundary | #560 xla_tpu_ici_sdc_test_run_on_program_start ends here; #561 xla_tpu_debug_sflag_wait_timeout_ms opens page B |
| Names verified | 22 spot-checked verbatim in .rodata strings; 0 misses |
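The number → name → type → wrapper contract above maps naturally onto a small lookup structure. The following is a minimal sketch, not the page's tooling: the `FieldInfo` class, the `BY_NUMBER` / `BY_NAME` indexes, and the `dictionary_page` helper are hypothetical names introduced here, and only five rows from the tables below are loaded for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldInfo:
    """One dictionary row (hypothetical container, not a libtpu type)."""
    number: int                    # proto field number
    name: str                      # verbatim proto field / flag name
    base_type: str                 # bool, int64, enum, message, ...
    wrapper: Optional[str] = None  # enum or message type, if any


# A handful of rows copied from the tables on this page.
_ROWS = [
    FieldInfo(1, "xla_enable_async_collective_permute", "enum", "TristateProto.Value"),
    FieldInfo(2, "xla_tpu_sdc_checker_instrument_megacore_fusion", "bool"),
    FieldInfo(3, "xla_tpu_scoped_vmem_limit_sweep_profile_path", "string"),
    FieldInfo(31, "xla_memory_scheduler", "enum", "MemorySchedulerProto.Value"),
    FieldInfo(50, "xla_jf_naive_bundle_packer", "message", "RangeSpecProto"),
]

BY_NUMBER = {row.number: row for row in _ROWS}
BY_NAME = {row.name: row for row in _ROWS}


def dictionary_page(field_number: int) -> str:
    """Return which dictionary page covers a field number (split at #560/#561)."""
    if 1 <= field_number <= 560:
        return "A"
    if 561 <= field_number <= 1218:
        return "B"
    raise ValueError(f"field number {field_number} is outside 1..1218")
```

Keeping both indexes makes the two directions of the contract (wire number to name, flag name to number) equally cheap; a full table would be loaded the same way.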
How to Read These Tables
Each row is one live field, in ascending field-number order. Columns are uniform across every group:
- #: the proto field number (the wire tag). Numbers are not contiguous: TCE has 97 deletion gaps in 1..1218 (no reserved_range; deleted fields simply vanish from the descriptor). Gaps inside this page's range are listed in §Field-Number Gaps.
- Field name: the verbatim proto field name, identical to the registered absl::Flag name. Most carry the xla_tpu_ / xla_jf_ / xla_ prefixes; a handful are bare (config_criterion, loop_invert, rematerialization_algorithm).
- Type: the proto base type. For enum and message, the Wrapper / message type column names the concrete type.
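Because deleted fields leave no reserved_range behind, gaps can only be recovered by scanning the live field numbers. A minimal sketch of that scan follows; the `find_gaps` helper is a hypothetical name, and the two inputs are the live numbers read off the #1–#30 table and the #120–#128 rows of this page.

```python
def find_gaps(live_numbers, first, last):
    """Return inclusive (start, end) ranges in [first, last] with no live field."""
    live = set(live_numbers)
    gaps = []
    start = None  # first number of the gap currently being scanned, if any
    for n in range(first, last + 1):
        if n not in live:
            if start is None:
                start = n
        elif start is not None:
            gaps.append((start, n - 1))
            start = None
    if start is not None:
        gaps.append((start, last))
    return gaps


# Live numbers in the #1-#30 group: everything except #21.
LIVE_1_TO_30 = list(range(1, 21)) + list(range(22, 31))

# Live numbers among rows #120-#128 of the #96-#159 group.
LIVE_120_TO_128 = [120, 122, 123, 128]

print(find_gaps(LIVE_1_TO_30, 1, 30))        # [(21, 21)]
print(find_gaps(LIVE_120_TO_128, 120, 128))  # [(121, 121), (124, 127)]
```

Collapsing consecutive missing numbers into ranges matches how the per-group gap lists on this page are written (for example #124–#127).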
NOTE — the 423 non-scalar fields split into 74
enum-typed (67 of them TristateProto.Value = AUTO/DISABLED/ENABLED, plus 7 direct helper enums) and 349 message-typed (330 are the 30-arm AutoProto switch wrapper; the remaining 19 are typed helper messages such as RangeSpecProto, RepeatedStrings, and the msa.* option messages). The wrapper enum value-by-value tables and the AutoProto arm list live on TpuCompilationEnvironment; this page only names the wrapper per field.
GOTCHA — a field's position on this page is its field number, not its struct offset. The two are unrelated: the parse table sorts
FieldEntry by field number, but the C++ _impl_ layout interleaves them by type/alignment (field #2 sits at struct offset +0xBC, field #1 at +0xB8, field #3 at +0xA8). Do not infer offset from field number; use TCE Field-Offsets & Flag Defaults.
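The three offsets quoted above are enough to show the mismatch concretely. This is a small illustration using only those values; the `STRUCT_OFFSETS` mapping is a hypothetical container, and the full offset table lives on the TCE Field-Offsets & Flag Defaults page.

```python
# Field number -> struct offset, for the three fields quoted in the GOTCHA.
STRUCT_OFFSETS = {1: 0xB8, 2: 0xBC, 3: 0xA8}

# Order used by the parse table: ascending field number.
by_field_number = sorted(STRUCT_OFFSETS)

# Order in memory: ascending struct offset.
by_offset = sorted(STRUCT_OFFSETS, key=STRUCT_OFFSETS.get)

print(by_field_number)  # [1, 2, 3]
print(by_offset)        # [3, 1, 2]
```

Field #3 precedes field #1 in memory even though it follows it on the wire, so any offset must come from an explicit table rather than from arithmetic on the field number.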
#1 – #30 — Collectives, Cross-Program Prefetch, CMEM, Embedding
The lowest field numbers are the oldest jellyfish-era knobs: all-gather emitters, the net-router ring limits, the (now mostly deprecated) cross-program-prefetch / scoped-CMEM cluster, and the first embedding tunable. Field #21 is a gap.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 1 | xla_enable_async_collective_permute | enum | TristateProto.Value |
| 2 | xla_tpu_sdc_checker_instrument_megacore_fusion | bool | |
| 3 | xla_tpu_scoped_vmem_limit_sweep_profile_path | string | |
| 4 | xla_tpu_allocate_scoped_vmem_at_same_offset | bool | |
| 5 | xla_enable_all_gather_2d_emitter | bool | |
| 6 | xla_always_enable_all_gather_2d_asymmetric | bool | |
| 7 | xla_enable_all_gather_3d_emitter | bool | |
| 8 | xla_always_enable_all_gather_3d_asymmetric | bool | |
| 9 | xla_tpu_enable_minor_all_gather | bool | |
| 10 | xla_tpu_use_routing_table_indices_in_all_gather | bool | |
| 11 | xla_tpu_enable_net_router_in_all_gather | bool | |
| 12 | xla_tpu_cross_module_net_router_ring_limit | int64 | |
| 13 | xla_tpu_cross_replica_net_router_ring_limit | int64 | |
| 14 | xla_tpu_max_cmem_used_by_memory_space_assignment | int64 | |
| 15 | xla_enable_cross_program_prefetch | bool | |
| 16 | xla_tpu_enable_cross_program_prefetch_freeing | bool | |
| 17 | xla_tpu_cmem_enable_while_redundant_eviction_elimination | bool | |
| 18 | xla_tpu_cmem_max_outstanding_prefetches | int64 | |
| 19 | xla_tpu_cmem_max_outstanding_evictions | int64 | |
| 20 | xla_tpu_allocate_scoped_cmem_at_same_offset | bool | |
| 22 | xla_default_cross_program_prefetch_heuristic | bool | |
| 23 | xla_tpu_ior_cmem | string | |
| 24 | megascale_use_one_to_all_for_gather | bool | |
| 25 | xla_tpu_enable_async_pincer_emitter | bool | |
| 26 | xla_tpu_write_cmem_output_via_stores_on_megacore | bool | |
| 27 | megascale_use_dcn_all_to_all_in_collectives_mask | int64 | |
| 28 | xla_hlo_scheduling_brkga_compute_runtime_estimates | bool | |
| 29 | xla_tpu_wait_n_cycles_before_program_termination | int64 | |
| 30 | xla_tpu_embedding_table_oblongness_threshold | float | (is_used_at_runtime) |
NOTE — field #30 is one of only four fields in the whole proto with
is_used_at_runtime=true; the others (#354, #432, and xla_tpu_num_embedding_devices) live further on. The bit lives in the TpuCompEnvFieldOptions extension (#535801365), not in the parse table. Fields #4, #14, #15, #16, #17, #18, #19, #20, #22, #23 are among the 101 deprecated=true fields, superseded by the per-family xla_{jf,vf,gf,cmem}_vmem_* knobs.
#31 – #95 — Scheduler, Tracing, BRKGA, LLO/LSRA, SDC Checker
This block opens with the memory-scheduler selector (#31), then a run of trace/profile toggles, the BRKGA HLO-scheduling parameters, the early LLO/LSRA register-allocation knobs, the combiner thresholds, and the first big SDC-checker cluster. Fields #44, #45, #54, #59, #70 are gaps in this range.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 31 | xla_memory_scheduler | enum | MemorySchedulerProto.Value |
| 32 | xla_enable_profiler | bool | |
| 33 | xla_enable_hlo_trace | bool | |
| 34 | xla_trace_only_stalling_hlo | bool | |
| 35 | xla_enable_module_trace | bool | |
| 36 | xla_enable_mxu_trace | bool | |
| 37 | xla_enable_transpose_trace | bool | |
| 38 | xla_jf_trivial_traces | bool | |
| 39 | xla_jf_module_tracemarks | bool | |
| 40 | xla_hbm_logging_buffer_size_bytes | int64 | |
| 41 | xla_hlo_scheduling_brkga_generation_limit | int64 | |
| 42 | xla_hlo_scheduling_brkga_computation_limit | int64 | |
| 43 | xla_hlo_scheduling_brkga_enable_as_fallback | bool | |
| 46 | xla_enable_megacore_hbm_spill | bool | |
| 47 | xla_enable_lru_free_reg_assignment | bool | |
| 48 | xla_jf_emit_annotations | bool | |
| 49 | xla_jf_llo_level | int32 | |
| 50 | xla_jf_naive_bundle_packer | message | RangeSpecProto |
| 51 | xla_jf_track_bundle_dependency_indices | bool | |
| 52 | xla_jf_pack_latches | bool | |
| 53 | xla_jf_bf16_propagation | bool | |
| 55 | xla_tpu_arf_combiner_threshold_in_bytes | int64 | |
| 56 | xla_tpu_ars_combiner_threshold_in_bytes | int64 | |
| 57 | xla_tpu_agf_combiner_threshold_in_bytes | int64 | |
| 58 | xla_jf_crs_combiner_threshold_count | int64 | |
| 60 | xla_jf_bounds_check_annotate_only | message | RangeSpecProto |
| 61 | xla_jf_reconstruct_hlo_from_proto | bool | |
| 62 | xla_jf_slot_tracker_hoist_limit | int64 | |
| 63 | xla_jf_enable_multi_output_fusion | bool | |
| 64 | xla_jf_emit_global_barrier_at_start | bool | |
| 65 | xla_jf_lsra_v2_alloc_only | message | RangeSpecProto |
| 66 | xla_jf_lsra_v2_reserved_smem | int64 | |
| 67 | xla_jf_lsra_v2_annotate | bool | |
| 68 | xla_jf_llo_rematerialization_parameter_threshold | int64 | |
| 69 | xla_ra_split | message | RangeSpecProto |
| 71 | xla_jf_rematerialization_percent_shared_memory_limit | int64 | |
| 72 | xla_tpu_rematerialization_max_block_size | int64 | |
| 73 | xla_tpu_block_rematerialization_factor | int64 | |
| 74 | xla_tpu_rematerialization_min_size_in_bytes | int64 | |
| 75 | xla_jf_order_barna_core_feed_serialize | bool | |
| 76 | xla_jf_order_barna_core_feed_overlap | bool | |
| 77 | xla_jf_conditional_code_motion | bool | |
| 78 | xla_jf_auto_cross_replica_sharding | bool | |
| 79 | xla_jf_use_vdelay | bool | |
| 80 | xla_tpu_autotune_windows_service | string | |
| 81 | xla_tpu_autotune_windows | bool | |
| 82 | xla_tpu_autotune_phase_ordering | bool | |
| 83 | xla_tpu_log_post_opt_fingerprints | bool | |
| 84 | xla_tpu_enable_sdc_checker | bool | |
| 85 | xla_tpu_sdc_check_fail_ratio_debug_only | int64 | |
| 86 | xla_tpu_sdc_check_repeat_count | int64 | |
| 87 | xla_tpu_sdc_check_inputs | bool | |
| 88 | xla_tpu_sdc_replicate_llo | bool | |
| 89 | tpu_sdc_checker_filter_loop_iteration_depth | int32 | |
| 90 | tpu_sdc_checker_filter_loop_iteration_check | int32 | |
| 91 | tpu_sdc_checker_filter_loop_iteration_skip | int32 | |
| 92 | xla_tpu_sdc_check_halt_on_detection | bool | |
| 93 | xla_tpu_sdc_check_log_full_hlo | bool | |
| 94 | xla_tpu_sdc_extra_llo_replica_count | int64 | |
| 95 | xla_tpu_sdc_duplicate_mxu_instructions | bool |
#96 – #159 — VLIW Scheduling, Convolution Precision, Bounds-Check, IOR/LLVM Backend
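A scheduling-and-codegen block: the XLU optimizer and critical-path scheduler, the VLIW scheduler and its fuel, the cross-replica-sharding and reduce knobs, the tiling verify-or-assign enum (#132), the ring and pincer collective limits, and the latency-hiding-scheduler and SPMD toggles. The conv-precision numerics (#229–#230), the bounds-check RangeSpec fields (#60, #224), and the loop_invert pair (#160–#161) are listed in other groups on this page. Gaps: #121, #124–#127, #156.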
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 96 | xla_enable_async_all_gather | enum | TristateProto.Value |
| 97 | xla_tpu_enable_log_recorder | bool | |
| 98 | xla_jf_all_to_all_shard_kib | int64 | |
| 99 | xla_tpu_all_to_all_with_different_output_addresses | bool | |
| 100 | xla_jf_xlu_optimizer | bool | |
| 101 | xla_jf_auto_assign_xlu | bool | |
| 102 | xla_jf_critical_path_scheduler | bool | |
| 103 | xla_jf_load_cse_and_s2l_forwarding | bool | |
| 104 | xla_tpu_load_store_optimizations | bool | |
| 105 | xla_jf_accumulation_reassociation | bool | |
| 106 | xla_jf_vliw_scheduler | bool | |
| 107 | xla_jf_vliw_fuel | int64 | |
| 108 | xla_tpu_force_send_recv_host_on_same_resource | enum | TristateProto.Value |
| 109 | xla_jf_allow_cross_replica_sharding_on_certain_reduce | bool | |
| 110 | xla_jf_cross_replica_sharding_also_try_reducing_min_span_by_factor | float | |
| 111 | xla_tpu_allow_input_fusion_in_certain_reduce_ops | bool | |
| 112 | xla_tpu_allow_sharding_on_minor_dim | bool | |
| 113 | xla_jf_enable_buffer_alias | bool | |
| 114 | xla_jf_verify_sync_flags | bool | |
| 115 | xla_tpu_verify_matmul | bool | |
| 116 | xla_jf_use_rng_bit_generator_emitter | bool | |
| 117 | xla_tpu_expand_rng_bit_generator | bool | |
| 118 | xla_jf_module_group_simplifier | bool | |
| 119 | xla_tpu_ior_remat | bool | |
| 120 | xla_enable_scalar_multiply_reduction | bool | |
| 122 | xla_optimize_llo_for_llvm | bool | |
| 123 | xla_tpu_run_space_to_batch_on_new_platforms | bool | |
| 128 | xla_tpu_min_elements_for_while_loop_concat_code_motion | int64 | |
| 129 | xla_tpu_sharding_metadata | bool | |
| 130 | xla_tpu_reverse_layout_computation_order | bool | |
| 131 | xla_tpu_spmd_threshold_for_allgather_cse | int64 | |
| 132 | xla_tpu_verify_or_assign_tiling_before_lowering | enum | VerifyOrAssignTilingFlagsProto.Value |
| 133 | xla_tpu_spmd_decompose_sharded_concats | bool | |
| 134 | xla_tpu_enable_graph_splitting | bool | |
| 135 | xla_jf_always_use_windowed_reduce | bool | |
| 136 | xla_tpu_max_kept_sublanes_for_reduce | int64 | |
| 137 | xla_tpu_check_nan_on_reduce | bool | |
| 138 | xla_tpu_use_tree_reduce | bool | |
| 139 | xla_tpu_uni_direction_ring_max_size_2d_plane | int64 | |
| 140 | xla_tpu_1d_uni_direction_ring_min_input_size_chunks | int64 | |
| 141 | xla_tpu_nd_short_transfer_max_chunks | int64 | |
| 142 | xla_tpu_enable_pincer_short_emitter | bool | |
| 143 | xla_tpu_permute_size4_cross_module_rings | bool | |
| 144 | xla_tpu_enable_2d_cross_module_reduce_scatter | bool | |
| 145 | xla_tpu_use_strided_strategy_nd | bool | |
| 146 | xla_tpu_checksum_all_reduce_transfers | bool | |
| 147 | xla_tpu_enforce_two_phase_sharding_topology | bool | |
| 148 | xla_tpu_use_routing_table_indices_in_all_reduce | bool | |
| 149 | xla_max_concurrent_send_recv | int32 | |
| 150 | xla_tpu_enable_latency_hiding_scheduler | bool | |
| 151 | xla_tpu_licm_analysis_allowance | int64 | |
| 152 | xla_tpu_scheduler_percent_shared_memory_limit | int64 | |
| 153 | xla_tpu_spmd_auto_partitioning | bool | |
| 154 | xla_tpu_perform_spmd_cse_prevention | bool | |
| 155 | xla_tpu_verify_device_assignment_in_runtime | bool | |
| 157 | xla_jf_experimental_vmem_for_hlo_outputs | int64 | |
| 158 | xla_jf_use_cost_based_memory_coloring | bool | |
| 159 | xla_jf_always_overlay | bool |
QUIRK — field #132
xla_tpu_verify_or_assign_tiling_before_lowering is a VerifyOrAssignTilingFlagsProto.Value enum (NONE=0 / VERIFY=1 / ASSIGN=2), not a bool. A reimplementer who treats every xla_tpu_*verify* knob as a boolean toggle will mis-parse this one; it gates the tile-mode selector consumed at struct offset +0xDFC.
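To make the hazard concrete, here is a minimal sketch of decoding field #132 from raw wire bytes using the standard protobuf varint encoding, where the tag is `(field_number << 3) | wire_type` and enums use wire type 0. The `TilingMode` class and both helper functions are hypothetical names introduced for illustration; only the three enum values come from this page.

```python
from enum import IntEnum


class TilingMode(IntEnum):
    """Stand-in for VerifyOrAssignTilingFlagsProto.Value (values from this page)."""
    NONE = 0
    VERIFY = 1
    ASSIGN = 2


def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_field_132(payload: bytes) -> TilingMode:
    """Decode a payload holding only field #132 as a varint-encoded enum."""
    tag, pos = read_varint(payload, 0)
    field_number, wire_type = tag >> 3, tag & 0x7
    if field_number != 132 or wire_type != 0:
        raise ValueError(f"unexpected tag: field {field_number}, wire type {wire_type}")
    raw, _ = read_varint(payload, pos)
    return TilingMode(raw)


# Tag (132 << 3) | 0 = 1056 encodes as bytes A0 08; the value byte 02 is ASSIGN.
payload = bytes([0xA0, 0x08, 0x02])
mode = decode_field_132(payload)

# A boolean reading cannot tell VERIFY from ASSIGN: both are truthy.
collapsed = bool(TilingMode.VERIFY) == bool(TilingMode.ASSIGN)
```

Bool and enum share the same wire type, so a bool-typed reader would not fail loudly on this field; it would silently merge VERIFY and ASSIGN, which is why the declared type in the table matters.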
#160 – #219 — Loop-Invert, ISA Emitter, IOR/MSA, LLVM, Instrumentation
The loop_invert / ISA-emitter RangeSpec cluster, the IOR fast-mem and stored-solution MSA knobs, the LLVM-backend group, the HLO-dedup pair, and the bf16-coalescing / rematerialization-algorithm knobs. The two bare-named strings config_criterion (#209) and rematerialization_algorithm (#212) appear here. Gaps: #214, #218.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 160 | loop_invert | message | RangeSpecProto |
| 161 | loop_invert_modules | message | RangeSpecProto |
| 162 | xla_llvm_isa_emitter | bool | |
| 163 | xla_llvm_isa_emitter_bundles | message | RangeSpecProto |
| 164 | xla_llvm_isa_emitter_force | bool | |
| 165 | xla_run_scoped_memory_assignment | bool | |
| 166 | xla_jf_loop_trip_count | int32 | |
| 167 | xla_jf_internal_prefetch_overlays | bool | |
| 168 | xla_tpu_last_overlay_prefetches_first | bool | |
| 169 | xla_tpu_put_trap_at_hlo_marker_index | int64 | |
| 170 | xla_tpu_single_overlay_mode | bool | |
| 171 | internal_embedding_emitter_fraction_vmem_available | double | |
| 172 | xla_allow_hoisting_across_branch | bool | |
| 173 | xla_jf_avoid_cross_slot_vmem_bank_conflicts | bool | |
| 174 | xla_jf_shard_f32_weight_across_loop_iterations | bool | |
| 175 | xla_jf_allow_cross_replica_sharding_on_batch_matmul | bool | |
| 176 | xla_jf_fusion_max_vmem_mib | double | |
| 177 | xla_tpu_pack_vloads | bool | |
| 178 | xla_tpu_pack_cloads | bool | |
| 179 | xla_tpu_sublane_shift_scratchpad_size | int64 | |
| 180 | xla_tpu_small_operand_count_for_loop_fusion | int64 | |
| 181 | xla_jf_fusion_max_instruction_count_for_window_config | int64 | |
| 182 | xla_tpu_copy_fusion_allow_split | bool | |
| 183 | xla_tpu_allow_in_cmem_copy | bool | |
| 184 | xla_jf_crs_combiner_threshold_in_bytes | int64 | |
| 185 | xla_jf_enable_hlo_pipeline | bool | |
| 186 | xla_ior_fast_mem_run_production_msa | bool | |
| 187 | xla_ior_fast_mem_round_trip_production_msa | bool | |
| 188 | xla_ior_use_stored_solution | bool | |
| 189 | xla_ior_stored_solution_path | string | |
| 190 | xla_use_llvm_backend | bool | |
| 191 | xla_llvm_generate_xla_compatible_dwg | bool | |
| 192 | xla_jf_llvm_use_fast_opt | bool | |
| 193 | xla_jf_llo2llvm_timing_info | bool | |
| 194 | xla_jf_llvm_use_bitcode_dump | bool | |
| 195 | xla_jf_llvm_flags | message | RepeatedStrings |
| 196 | xla_jf_tanh_increased_precision | bool | |
| 197 | xla_jf_poison_vmem_allocations | bool | |
| 198 | xla_jf_hlo_deduplicate_only | string | |
| 199 | xla_jf_hlo_deduplication | bool | |
| 200 | xla_tpu_instrument_hlo_operations | bool | |
| 201 | xla_tpu_instrumentation_config | string | |
| 202 | xla_tpu_merge_small_overlays_into_big_neighbors | bool | |
| 203 | xla_tpu_net_router_trace_me_instrumentation | bool | |
| 204 | xla_tpu_use_custom_tree_barrier | bool | |
| 205 | xla_tpu_use_routing_table_indices_in_net_router | bool | |
| 206 | xla_tpu_use_routing_table_indices_in_tree_barrier | bool | |
| 207 | bf16_coalescing_dump_killed_candidate_pairs | bool | |
| 208 | bf16_coalescing_ignore_distance_for_pairing | bool | |
| 209 | config_criterion | string | |
| 210 | jf_xla_partial_reduce_hbm_bw_ratio | double | |
| 211 | jf_xla_partial_reduce_use_roofline_cost_fn | bool | |
| 212 | rematerialization_algorithm | string | |
| 213 | rematerialize_even_if_memory_limit_not_reached | bool | |
| 215 | tpu_use_continuations | bool | |
| 216 | treewidth_rematerialization_minimize_memory | bool | |
| 217 | use_op_tuner_config | bool | |
| 219 | xla_enable_trace | bool |
#220 – #294 — AutoFDO Start, Conv Fusion, Pipelining, Overlay Compression, JF VMEM Family
The first AutoFDO toggle, the conv-fusion / conv-precision numerics block, the JF pipelining knobs, the overlay-compression and single-phase-ring transfer parameters, and the start of the per-family JF VMEM MSA cluster (#275–#286). Gaps: #244–#247, #250–#254, #268, #292, #294.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 220 | xla_tpu_autofdo | bool | |
| 221 | xla_jf_auto_assign_mxu | bool | |
| 222 | xla_jf_auto_latch_lmr | bool | |
| 223 | xla_jf_bf16_inside_cross_replica_sum | bool | |
| 224 | xla_jf_bounds_check | message | RangeSpecProto |
| 225 | xla_jf_bounds_check_stride | bool | |
| 226 | xla_jf_bounds_check_verbose | bool | |
| 227 | xla_jf_collect_llo_stack_trace | bool | |
| 228 | xla_jf_conv_base_dilation_adversary | bool | |
| 229 | xla_jf_conv_full_precision | bool | (NUMERICS) |
| 230 | xla_jf_conv_increased_precision | bool | (NUMERICS) |
| 231 | xla_jf_conv_input_fusion | bool | |
| 232 | xla_jf_conv_min_limit_vmem_mib | double | |
| 233 | xla_jf_conv_output_fusion | bool | |
| 234 | xla_jf_conv_prefers_padding_input_feature | bool | |
| 235 | xla_jf_conv_reshape_fusion | bool | |
| 236 | xla_jf_convolution_performance_target | double | |
| 237 | xla_jf_cp_pass_enabled | bool | |
| 238 | xla_tpu_crs_bounds_check_threshold_chips_count | int64 | |
| 239 | xla_jf_debug_level | int64 | |
| 240 | xla_jf_enable_final_priority_fusion | bool | |
| 241 | xla_jf_enable_pipelining | bool | |
| 242 | xla_jf_enable_producer_consumer_multi_output_fusion | bool | |
| 243 | xla_jf_experimental_cmem_for_hlo_outputs | int64 | |
| 248 | xla_jf_log_hlo_output | bool | |
| 249 | xla_jf_line_info_in_symbol_table | bool | |
| 255 | xla_jf_overlay_compression_threshold | int64 | |
| 256 | xla_jf_poison_operands_before_emitter | bool | |
| 257 | xla_jf_profile_cheap_ops | bool | |
| 258 | xla_jf_program_hbm_alignment_in_kib | int64 | |
| 259 | xla_jf_random_latency | bool | |
| 260 | xla_vf_vmem_use_ior_algorithm | string | |
| 261 | xla_jf_simplifier_pass_enabled | bool | |
| 262 | xla_jf_single_phase_ring_max_kib | int64 | |
| 263 | xla_jf_single_phase_ring_threshold | int64 | |
| 264 | xla_jf_span_size_in_kib | int64 | |
| 265 | xla_jf_spmd_conv_halo_exchange_always_on_lhs | bool | |
| 266 | xla_jf_spmd_report_instruction_count | int64 | |
| 267 | xla_jf_spmd_threshold_for_windowed_einsum_mib | int64 | |
| 269 | xla_jf_tune_large_vmem | bool | |
| 270 | xla_jf_use_hw_constants | bool | |
| 271 | xla_jf_use_multi_colors_in_all_reduce_if_supported | bool | |
| 272 | xla_jf_use_rotated_pincer_emitter | bool | |
| 273 | xla_jf_use_rotated_pincer_ring_emitter | bool | |
| 274 | xla_jf_use_single_phase_ring_emitter | bool | |
| 275 | xla_jf_vmem_default_cross_program_prefetch_heuristic | bool | |
| 276 | xla_jf_vmem_enable_cross_program_prefetch | bool | |
| 277 | xla_jf_vmem_enable_while_redundant_eviction_elimination | bool | |
| 278 | xla_jf_vmem_max_outstanding_evictions | int64 | |
| 279 | xla_jf_vmem_max_outstanding_prefetches | int64 | |
| 280 | xla_jf_vmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 281 | xla_jf_vmem_max_repacks | int64 | |
| 282 | xla_jf_vmem_max_retries | int64 | |
| 283 | xla_jf_vmem_memory_space_assignment | bool | |
| 284 | xla_jf_vmem_min_overlap_to_async_copy_ratio | float | |
| 285 | xla_jf_vmem_preferred_overlap_to_async_copy_ratio | float | |
| 286 | xla_jf_vmem_use_ior_algorithm | string | |
| 287 | xla_tpu_vmem_use_telamalloc | bool | |
| 288 | xla_max_concurrent_async_all_gathers | int32 | |
| 289 | xla_max_concurrent_async_collective_permutes | int32 | |
| 290 | xla_max_concurrent_host_send_recv | int32 | |
| 291 | xla_max_decomposed_all_reduces | int64 | |
| 293 | xla_set_split_input_output_dmas | bool |
NOTE — the
xla_jf_vmem_* block (#275–#286) is the jellyfish arm of the per-family MSA overlap-ratio family. It is mirrored, with small per-family differences in membership, by the xla_vf_vmem_* (#436–#447), xla_tpu_cmem_* (#309–#315), and xla_gf_vmem_* (#533–#547) arms later on this page. The three float ratios in each arm (max / min / preferred overlap-to-async-copy) are the MSA cost-model tuning surface; the per-version overlay picks one arm.
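The shared naming pattern of the three float ratios can be expressed as prefix plus suffix. The sketch below covers only the jf, vf, and cmem arms, whose field numbers are taken from the tables on this page; the gf arm is omitted here, and `ARM_PREFIXES`, `RATIO_SUFFIXES`, `RATIO_FIELD_NUMBERS`, and `ratio_fields` are hypothetical names introduced for illustration.

```python
# Flag-name prefix for each arm (family key -> prefix).
ARM_PREFIXES = {
    "jf": "xla_jf_vmem",
    "vf": "xla_vf_vmem",
    "cmem": "xla_tpu_cmem",
}

# The three float ratio suffixes shared by every arm.
RATIO_SUFFIXES = (
    "max_overlap_to_mem_size_async_copy_ratio",
    "min_overlap_to_async_copy_ratio",
    "preferred_overlap_to_async_copy_ratio",
)

# Field numbers as listed in this page's tables, in RATIO_SUFFIXES order.
RATIO_FIELD_NUMBERS = {
    "jf": (280, 284, 285),
    "vf": (442, 446, 447),
    "cmem": (309, 313, 314),
}


def ratio_fields(family: str):
    """Return [(field_number, field_name), ...] for one arm's three ratios."""
    prefix = ARM_PREFIXES[family]
    numbers = RATIO_FIELD_NUMBERS[family]
    return [(num, f"{prefix}_{suffix}") for num, suffix in zip(numbers, RATIO_SUFFIXES)]


for family in ARM_PREFIXES:
    for number, name in ratio_fields(family):
        print(f"#{number:<4} {name}")
```

Generating names from a prefix is convenient for reading the family as a unit, but the field numbers still have to come from the table: they do not follow a fixed stride between arms.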
#295 – #354 — SparseCore Tracing, All-Reduce VMEM, CMEM MSA Family, Detect-NaN, Megacore Fusion
The first AutoProto-typed field (#295), the all-reduce VMEM contingency knobs, the xla_tpu_cmem_* MSA arm, the copy-fusion cluster, the NaN/Inf/special-FP detection toggles, and the megacore-fusion enable Tristate (#344). Gaps: #303, #335.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 295 | xla_sparse_core_enable_hardware_tracing | message | AutoProto (SPARSE_CORE) |
| 296 | xla_tpu_add_llo_regions_to_symbol_table | bool | |
| 297 | xla_tpu_all_reduce_vmem_contingency_kib | int64 | |
| 298 | xla_tpu_assign_all_reduce_scatter_layout | bool | |
| 299 | xla_tpu_auto_reduce_precision | bool | |
| 300 | xla_tpu_auto_spmd_keep_all_user_shardings | bool | |
| 301 | xla_tpu_auto_spmd_partitioning_memory_budget_gb | int64 | |
| 302 | xla_tpu_autofdo_skip_hlo_fingerprints | message | RepeatedStrings |
| 304 | xla_tpu_autotune_database | string | |
| 305 | xla_tpu_binomial_all_reduce_use_physical_core_ids | bool | |
| 306 | xla_tpu_block_rematerialization_record_stats | bool | |
| 307 | xla_tpu_check_llo_types | bool | |
| 308 | xla_tpu_choose_faster_windowed_einsum_over_mem | bool | |
| 309 | xla_tpu_cmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 310 | xla_tpu_cmem_max_repacks | int64 | |
| 311 | xla_tpu_cmem_max_retries | int64 | |
| 312 | xla_tpu_cmem_memory_space_assignment | bool | |
| 313 | xla_tpu_cmem_min_overlap_to_async_copy_ratio | float | |
| 314 | xla_tpu_cmem_preferred_overlap_to_async_copy_ratio | float | |
| 315 | xla_tpu_cmem_use_telamalloc | bool | |
| 316 | xla_tpu_conditional_code_motion_config | string | |
| 317 | xla_tpu_copy_elision_analysis_allowance | int64 | |
| 318 | xla_tpu_copy_fusion_minimum_copy_size_in_bytes | int64 | |
| 319 | xla_tpu_copy_fusion_pad_unpad_ratio | double | |
| 320 | xla_tpu_copy_fusion_threshold | int64 | |
| 321 | xla_tpu_copy_insertion_use_region_analysis | bool | |
| 322 | xla_tpu_copy_insertion_use_region_analysis_limit | int64 | |
| 323 | xla_tpu_decompose_all_gather_fusion | bool | |
| 324 | xla_tpu_decompose_all_reduce_bidirectional_communication | bool | |
| 325 | xla_tpu_decompose_reduce_scatter_fusion | bool | |
| 326 | xla_tpu_deduplicated_hlo_min_bundle_count | int64 | |
| 327 | xla_tpu_detect_inf | bool | |
| 328 | xla_tpu_detect_llo_nan | bool | |
| 329 | xla_tpu_detect_nan | bool | |
| 330 | xla_tpu_detect_only_on_fusion | bool | |
| 331 | xla_tpu_detect_special_fp | bool | |
| 332 | xla_tpu_dot_dot_fusion | bool | |
| 333 | xla_tpu_dot_dot_fusion_duplicated | bool | |
| 334 | xla_tpu_dot_dot_fusion_separable_convs_only | bool | |
| 336 | xla_tpu_enable_all_experimental_scheduler_features | bool | |
| 337 | xla_tpu_enable_all_reduce_scatter_fusion | bool | |
| 338 | xla_tpu_enable_asymmetric_max_colors | bool | |
| 339 | xla_tpu_enable_copy_fusion | bool | |
| 340 | xla_tpu_enable_copy_permute_minor_fusion | bool | |
| 341 | xla_tpu_enable_cross_module_binomial_all_reduce | bool | |
| 342 | xla_tpu_enable_deduplicated_calls | enum | TristateProto.Value |
| 343 | xla_tpu_enable_experimental_fusion_cost_model | bool | |
| 344 | xla_tpu_enable_megacore_fusion | enum | TristateProto.Value |
| 345 | xla_tpu_enable_megascale_barrier | bool | |
| 346 | xla_tpu_enable_multi_level_input_dot_dot_fusion | bool | |
| 347 | xla_tpu_enable_multi_level_nested_dot_fusion | bool | |
| 348 | xla_tpu_enable_multi_level_nested_loop_fusion | bool | |
| 349 | xla_tpu_enable_multi_level_output_dot_dot_fusion | bool | |
| 350 | xla_tpu_enable_nd_wus_on_partial_active_dims | bool | |
| 351 | xla_tpu_enable_pincer_short_fusion_emitter | bool | |
| 352 | xla_tpu_enable_scheduler_memory_pressure_tracking | enum | TristateProto.Value |
| 353 | xla_tpu_enable_sparse_gradient_rewrite | bool | |
| 354 | xla_tpu_enable_untiled_layout | bool | (is_used_at_runtime) |
GOTCHA —
xla_tpu_detect_inf (#327) and xla_tpu_detect_nan (#329) are TCE proto fields distinct from the same-named xla.DebugOptions fields (#136/#135). The TCE fields are flag-wired and materialized from the absl default; the DebugOptions ones are born proto3-zero (FALSE) unless the flag override layer sets them. Do not conflate the two protos when reimplementing the detect-NaN path.
#355 – #432 — Untiled Layout, Experimental Padding, Megacore-Fusion Tuning, Poison-Padding, SPMD Windowed Einsum
The untiled-VMEM-DMA toggle, the experimental quantization/padding knobs, the megacore-fusion margin/scaling floats, the MSA while-execution count, the nested-dot-fusion cluster, the poison-padding pair, the RWB-fusion bool, and the SPMD windowed-einsum decomposition block. Gaps: #356–#359, #362, #379, #401, #419, #422.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 355 | xla_tpu_enable_vmem_to_vmem_dmas | bool | |
| 360 | xla_tpu_experimental_allow_fast_quantization_conversions | bool | |
| 361 | xla_tpu_experimental_cmem_fraction_for_hlo_outputs | float | |
| 363 | xla_tpu_experimental_max_concat_padding_ratio | double | |
| 364 | xla_tpu_experimental_max_padding_gib | double | |
| 365 | xla_tpu_extra_hoisting_range_for_register_producers | int64 | |
| 366 | xla_tpu_force_startup_barrier_in_binomial_all_reduce | bool | |
| 367 | xla_tpu_force_vmem_dma_and_spans | bool | |
| 368 | xla_tpu_fusion_config_collection | string | |
| 369 | xla_tpu_handle_reduce_window_as_convolution | bool | |
| 370 | xla_tpu_hbm_bw | double | |
| 371 | xla_tpu_hbm_initial_cycle_penalty | int64 | |
| 372 | xla_tpu_input_conv_multi_users | bool | |
| 373 | xla_tpu_insert_dummy_fusions_on_conv_kernels | bool | |
| 374 | xla_tpu_licm_size_inflation_ratio | float | |
| 375 | xla_tpu_llo_compilation_max_retries | int32 | |
| 376 | xla_tpu_local_dma_pipe_dma_from_cmem_min_chunks | int64 | |
| 377 | xla_tpu_local_dma_pipe_dma_to_cmem_min_chunks | int64 | |
| 378 | xla_tpu_log_current_and_new_fusion_cost_models | bool | |
| 380 | xla_tpu_max_clustered_loads | int64 | |
| 381 | xla_tpu_max_vld_live_range | int64 | |
| 382 | xla_tpu_max_vmreg_live_range | int64 | |
| 383 | xla_tpu_megacore_fusion_allow_ags | bool | |
| 384 | xla_tpu_megacore_fusion_disable_live_out_ags | bool | |
| 385 | xla_tpu_megacore_fusion_good_overload_margin | float | |
| 386 | xla_tpu_megacore_fusion_latency_bound_ar_fusion_size | int64 | |
| 387 | xla_tpu_megacore_fusion_orthogonal_ag | bool | |
| 388 | xla_tpu_megacore_fusion_orthogonal_ars | bool | |
| 389 | xla_tpu_megacore_fusion_scaling_factor | float | |
| 390 | xla_tpu_memory_space_assignment_while_execution_count | int64 | |
| 391 | xla_tpu_multioutput_fusion_max_operands | int64 | |
| 392 | xla_tpu_nested_dot_fusion | bool | |
| 393 | xla_tpu_nested_dot_fusion_supported_custom_ops | string | |
| 394 | xla_tpu_nested_dot_fusion_vmem_fraction | double | |
| 395 | xla_tpu_op_tracemarks | bool | |
| 396 | xla_tpu_optimize_bf16_math | bool | (NUMERICS) |
| 397 | xla_tpu_order_dot_after_layout | bool | |
| 398 | xla_tpu_override_rwb_tpu_limitation | bool | |
| 399 | xla_tpu_poison_padding | bool | |
| 400 | xla_tpu_poison_padding_value | int32 | |
| 402 | xla_tpu_prefer_async_allgather_to_allreduce | bool | |
| 403 | xla_tpu_prefer_binomial_single_phase_ring_emitter | bool | |
| 404 | xla_tpu_prefer_dynamic_pad | bool | |
| 405 | xla_tpu_profile_traceme_level | int32 | |
| 406 | xla_tpu_random_all_reduce_delay_mask | uint32 | |
| 407 | xla_tpu_reduce_window_reduction_dim_max | int64 | |
| 408 | xla_tpu_remove_bf16_bitcast_converts_for_all | bool | |
| 409 | xla_tpu_room_for_potential_register_dependency | int64 | |
| 410 | xla_tpu_rotated_pincer_pack_allgather_fusion | bool | |
| 411 | xla_tpu_run_all_reduce_simplifier | bool | |
| 412 | xla_tpu_run_space_to_batch | bool | |
| 413 | xla_tpu_rwb_fusion | bool | |
| 414 | xla_tpu_schedule_send_recvs | enum | TristateProto.Value |
| 415 | xla_tpu_scheduler_using_real_cost_model | enum | TristateProto.Value |
| 416 | xla_tpu_scoped_cmem_for_all_reduce | int64 | |
| 417 | xla_tpu_scoped_hbm_for_all_reduce | int64 | |
| 418 | xla_tpu_scoped_vmem_limit_kib | int64 | |
| 420 | xla_tpu_sg_sshfl_ignore_bits | int64 | |
| 421 | xla_tpu_short_pincer_single_step_max_chunks_per_chip | int64 | |
| 423 | xla_tpu_spmd_bidirectional_windowed_einsum | bool | |
| 424 | xla_tpu_spmd_f32_accum_for_bf16_ar | bool | |
| 425 | xla_tpu_spmd_rng_bit_generator_unsafe | bool | |
| 426 | xla_tpu_spmd_unroll_windowed_einsum | bool | |
| 427 | xla_tpu_spmd_windowed_einsum_decompose_ag | bool | |
| 428 | xla_tpu_spmd_windowed_einsum_decompose_rs | bool | |
| 429 | xla_tpu_store_to_load_forwarding_window | int64 | |
| 430 | xla_tpu_threshold_for_long_live_range_in_flowdown | int64 | |
| 431 | xla_tpu_uni_direction_ring_max_size | int64 | |
| 432 | xla_tpu_untiled_layout_for_1D_only | bool | (is_used_at_runtime) |
NOTE —
xla_tpu_rwb_fusion (#413) is one of two fields (with xla_tpu_accumulate_into_mrb, #597 on page B) whose absl default is TRUE despite an earlier help-string reading of false. The registered flag default is byte-authoritative; the help text describes behavior, not the default. See TCE Field-Offsets & Flag Defaults.
#433 – #497 — Aggressive Scheduling, VF VMEM Family, Async Collective Fusion, AutoFDO/Autotune
The aggressive-scheduling Tristate, the full xla_vf_vmem_* MSA arm (#436–#447), the send/recv aggregation cluster, the async-collective-fusion block (the first SparseDenseMatmulFdoConfig message at #462), the AutoFDO module flags, and the autotune-by-pass toggles. Gaps: #434, #448, #468, #470, #481, #484, #491.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 433 | xla_tpu_use_aggressive_scheduling | enum | TristateProto.Value |
| 435 | xla_tpu_use_resilient_collective_emitter | bool | |
| 436 | xla_vf_vmem_default_cross_program_prefetch_heuristic | bool | |
| 437 | xla_vf_vmem_enable_cross_program_prefetch | bool | |
| 438 | xla_vf_vmem_enable_cross_program_prefetch_freeing | bool | |
| 439 | xla_vf_vmem_enable_while_redundant_eviction_elimination | bool | |
| 440 | xla_vf_vmem_max_outstanding_evictions | int64 | |
| 441 | xla_vf_vmem_max_outstanding_prefetches | int64 | |
| 442 | xla_vf_vmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 443 | xla_vf_vmem_max_repacks | int64 | |
| 444 | xla_vf_vmem_max_retries | int64 | |
| 445 | xla_vf_vmem_memory_space_assignment | bool | |
| 446 | xla_vf_vmem_min_overlap_to_async_copy_ratio | float | |
| 447 | xla_vf_vmem_preferred_overlap_to_async_copy_ratio | float | |
| 449 | xla_tpu_enable_send_recv_aggregation | bool | |
| 450 | xla_tpu_enable_data_parallel_all_reduce_opt | bool | (PIPELINER) |
| 451 | xla_tpu_vector_load_fusion_window | int64 | |
| 452 | xla_tpu_fuse_only_phase0_in_2d_reduce_scatter | bool | |
| 453 | xla_tpu_sdc_checker_log_inputs_on_sdc_event | bool | |
| 454 | xla_tpu_enable_expression_constant_splitter | bool | |
| 455 | xla_jf_hlo_deduplication_all_unique | bool | |
| 456 | xla_tpu_pre_fusion_remat | bool | |
| 457 | xla_tpu_sc_megachip_temporal_reuse | bool | |
| 458 | xla_tpu_enable_async_all_to_all | bool | |
| 459 | xla_tpu_auto_spmd_partitioning_memory_budget_ratio | float | |
| 460 | xla_sc_disable_megacore_partitioning | bool | |
| 461 | xla_tpu_enable_async_collective_fusion | bool | |
| 462 | xla_tpu_sparse_dense_matmul_fdo_config | message | SparseDenseMatmulFdoConfig |
| 463 | xla_tpu_enable_async_collective_fusion_multiple_steps | bool | |
| 464 | xla_vf_max_vmem_used_by_memory_space_assignment | int64 | |
| 465 | xla_tpu_enable_async_collective_fusion_fuse_all_gather | enum | TristateProto.Value |
| 466 | xla_tpu_enable_async_collective_fusion_fuse_all_reduce | bool | |
| 467 | xla_tpu_megacore_fusion_orthogonal_ars_margin | float | |
| 469 | xla_tpu_autofdo_module_flags | bool | |
| 471 | xla_tpu_autofdo_module_layouts | bool | |
| 472 | xla_tpu_autofdo_skip_module_fingerprints | message | RepeatedStrings |
| 473 | xla_tpu_autotune_dots | bool | |
| 474 | xla_tpu_autotune_flags | bool | |
| 475 | xla_tpu_autotune_fusions | bool | |
| 476 | xla_tpu_autotune_layouts | bool | |
| 477 | xla_tpu_autotune_memory_space_assignment | bool | |
| 478 | xla_tpu_split_cluster_gap | int64 | |
| 479 | xla_tpu_split_cluster_size | int64 | |
| 480 | xla_tpu_autofdo_hlo_module_size_threshold | int32 | |
| 482 | xla_tpu_custom_fusion_no_global_unfusible | bool | |
| 483 | xla_tpu_custom_fusion_traverse_edges_twice | bool | |
| 485 | xla_tpu_overlap_compute_collective_tc | bool | |
| 486 | xla_tpu_max_send_recv_aggregation | int32 | |
| 487 | xla_tpu_vmac_transform_strategy | enum | TpuVmacTransformStrategyProto.Value |
| 488 | xla_tpu_vector_store_fusion_window | int64 | |
| 489 | xla_tpu_async_copy_bandwidth_scaling_factor | float | |
| 490 | xla_tpu_metrics_filename_prefix | string | |
| 492 | xla_tpu_enable_domain_passes | bool | |
| 493 | xla_tpu_experimental_enable_dynamic_int8_quantization | bool | |
| 494 | xla_max_cross_program_prefetches | int64 | |
| 495 | xla_tpu_enable_host_aware_passes | bool | |
| 496 | xla_tpu_disallow_in_alt_mem | string | |
| 497 | xla_tpu_allow_deeply_nested_fusion_numerical_diff | bool | |
#498 – #560 — Prefetch FIFO, ICI-SDC Test, Data-Parallel DCN, GF VMEM Family, AG Pipelining
The closing block of this page: prefetch-interval-picker overrides, the numerics accurate_log2, the ICI-SDC self-test cluster (#516–#521, #560), the data-parallel DCN pipelining knobs, the full xla_gf_vmem_* MSA arm (#533–#547), the overlay-profiler toggles, and the AG-backward-pipelining group. Gaps: #506, #509, #511, #515, #527, #530, #531, #546.
| # | Field name | Type | Wrapper / message type |
|---|---|---|---|
| 498 | xla_tpu_prefetch_interval_picker_size_override | int64 | |
| 499 | xla_tpu_force_1d_allreduce_at_chunk_count | int64 | |
| 500 | xla_tpu_enable_aggressive_loop_fusion_layout_opt | bool | |
| 501 | xla_tpu_use_repeated_instance_for_preferred_prefetch_time | bool | |
| 502 | xla_tpu_enforce_prefetch_fifo_order | bool | |
| 503 | xla_tpu_reduce_loop_fusion_dup_with_unfusable_user | bool | |
| 504 | xla_tpu_accurate_log2 | bool | (NUMERICS) |
| 505 | xla_tpu_dcn_max_overlap_estimation | float | |
| 507 | xla_tpu_autofdo_op_windows | bool | |
| 508 | xla_tpu_dcn_overlap_limit | int64 | |
| 510 | xla_tpu_enable_log_recorder_partitioned_logging | bool | |
| 512 | xla_tpu_no_crash_on_oom | bool | |
| 513 | xla_lhs_enable_release_start_policy | bool | |
| 514 | xla_tpu_debug_sflag_wait_shalt_on_detection | uint32 | |
| 516 | xla_tpu_ici_sdc_test_buffer_size_chunks | int32 | |
| 517 | xla_tpu_ici_sdc_test_packet_size_chunks | int32 | |
| 518 | xla_tpu_ici_sdc_test_iterations | int32 | |
| 519 | xla_tpu_ici_sdc_test_max_distance | int32 | |
| 520 | xla_tpu_ici_sdc_test_pipeline_depth | int32 | |
| 521 | xla_tpu_ici_sdc_test_inject_mismatch_for_testing_only | bool | |
| 522 | xla_tpu_mock_send_recv_host | bool | |
| 523 | xla_tpu_data_parallel_dcn_ar_dual_pipelining | uint32 | (PIPELINER) |
| 524 | xla_tpu_enable_aggressive_broadcast_priority_update | bool | |
| 525 | xla_tpu_data_parallel_opt_different_sized_ops | uint32 | (PIPELINER) |
| 526 | xla_tpu_ars_halo_exchange_count | int64 | |
| 528 | xla_tpu_dus_emitter_desired_update_window_chunk_count | int64 | |
| 529 | xla_tpu_spmd_f32_accum_for_bf16_ar_min_subgroup_size | int64 | |
| 532 | xla_tpu_show_overlay_waits_in_profiler | bool | |
| 533 | xla_gf_vmem_use_ior_algorithm | string | |
| 534 | xla_gf_vmem_default_cross_program_prefetch_heuristic | bool | |
| 535 | xla_gf_vmem_enable_cross_program_prefetch | bool | |
| 536 | xla_gf_vmem_enable_cross_program_prefetch_freeing | bool | |
| 537 | xla_gf_vmem_enable_while_redundant_eviction_elimination | bool | |
| 538 | xla_gf_vmem_max_outstanding_evictions | int64 | |
| 539 | xla_gf_vmem_max_outstanding_prefetches | int64 | |
| 540 | xla_gf_vmem_max_overlap_to_mem_size_async_copy_ratio | float | |
| 541 | xla_gf_vmem_max_repacks | int64 | |
| 542 | xla_gf_vmem_max_retries | int64 | |
| 543 | xla_gf_vmem_memory_space_assignment | bool | |
| 544 | xla_gf_vmem_min_overlap_to_async_copy_ratio | float | |
| 545 | xla_gf_vmem_preferred_overlap_to_async_copy_ratio | float | |
| 547 | xla_gf_max_vmem_used_by_memory_space_assignment | int64 | |
| 548 | xla_tpu_allow_multi_dim_reduce_rwb | bool | |
| 549 | xla_tpu_show_overlay_overhead_in_profiler | bool | |
| 550 | xla_tpu_show_overlay_details_in_profiler | bool | |
| 551 | xla_tpu_overlay_allocation_table_size | int64 | |
| 552 | xla_tpu_enable_ag_backward_pipelining | bool | (PIPELINER) |
| 553 | xla_tpu_prefuse_self_attention | bool | |
| 554 | xla_tpu_enable_indexing_optimizations | int64 | |
| 555 | xla_tpu_allow_layout_negotiation | bool | |
| 556 | xla_tpu_backward_propagate_reduce | bool | |
| 557 | xla_tpu_decompose_all_gather_einsum | bool | |
| 558 | xla_tpu_decompose_einsum_reduce_scatter | bool | |
| 559 | xla_tpu_keep_hlo_proto_literals_up_to | uint64 | |
| 560 | xla_tpu_ici_sdc_test_run_on_program_start | bool | |
NOTE — field #560
xla_tpu_ici_sdc_test_run_on_program_start is the last field on this page. The next field, #561 xla_tpu_debug_sflag_wait_timeout_ms (uint32), opens TCE Field Dictionary (B). The boundary is exact: this page owns 1 ≤ field# ≤ 560, page B owns 561 ≤ field# ≤ 1218.
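The page split can be expressed as a small routing helper. This is a minimal sketch, assuming only the two ranges stated in the note above; the function name `page_for_field` and the constants are hypothetical and do not come from the binary.

```python
# Hypothetical helper: route a TCE field number to the dictionary page that owns it.
# The two ranges are the ones stated in the note above; nothing else is assumed.
PAGE_A_RANGE = (1, 560)
PAGE_B_RANGE = (561, 1218)


def page_for_field(field_number: int) -> str:
    """Return "A" or "B" for a field number inside 1..1218, else raise ValueError."""
    lo_a, hi_a = PAGE_A_RANGE
    lo_b, hi_b = PAGE_B_RANGE
    if lo_a <= field_number <= hi_a:
        return "A"
    if lo_b <= field_number <= hi_b:
        return "B"
    raise ValueError(f"field number {field_number} is outside 1..{hi_b}")


print(page_for_field(560))  # A
print(page_for_field(561))  # B
```

The helper routes by number only. A deleted number such as 546 still maps to page A even though no live field carries it, so a caller that needs liveness must also consult the gap list.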
Field-Number Gaps in #1–#560
The numbering is not contiguous: TCE carries no reserved_range or reserved_name, so a deleted field simply leaves a hole. The gaps inside this page's range — every field number in 1..560 that is not a live field above — are:
21, 44, 45, 54, 59, 70, 121, 124, 125, 126, 127, 156, 214, 218,
244, 245, 246, 247, 250, 251, 252, 253, 254, 268, 292, 294, 303,
335, 356, 357, 358, 359, 362, 379, 401, 419, 422, 434, 448, 468,
470, 481, 484, 491, 506, 509, 511, 515, 527, 530, 531, 546
GOTCHA — these holes are deletions, not declared-reserved ranges. A reimplementer carving the descriptor must not assume the field set is
1..N contiguous; iterate the actual FieldEntry array (sorted ascending) and skip the missing numbers. The full 97-gap list across 1..1218 lives on TpuCompilationEnvironment; the slice above is the portion below #561.
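The gap bookkeeping above can be checked mechanically. The following is a sketch under stated assumptions: the gap list is copied from this page, the sample of live numbers is taken from the first rows of the #498 – #560 table, and the helper names `find_gaps` and `live_count` are hypothetical (they are not symbols from libtpu).

```python
from typing import Iterable, List

# Gap list for 1..560, copied from this page.
GAPS_1_TO_560 = [
    21, 44, 45, 54, 59, 70, 121, 124, 125, 126, 127, 156, 214, 218,
    244, 245, 246, 247, 250, 251, 252, 253, 254, 268, 292, 294, 303,
    335, 356, 357, 358, 359, 362, 379, 401, 419, 422, 434, 448, 468,
    470, 481, 484, 491, 506, 509, 511, 515, 527, 530, 531, 546,
]


def find_gaps(live_numbers: Iterable[int], lo: int, hi: int) -> List[int]:
    """Return every number in lo..hi (inclusive) that is absent from live_numbers."""
    live = set(live_numbers)
    return [n for n in range(lo, hi + 1) if n not in live]


def live_count(gaps: Iterable[int], lo: int, hi: int) -> int:
    """Number of live fields in lo..hi, given the holes that fall in that range."""
    holes = {g for g in gaps if lo <= g <= hi}
    return (hi - lo + 1) - len(holes)


# Field numbers from the first rows of the #498 - #560 table on this page.
sample_live = [498, 499, 500, 501, 502, 503, 504, 505, 507, 508,
               510, 512, 513, 514, 516]

print(find_gaps(sample_live, 498, 516))   # [506, 509, 511, 515]
print(len(GAPS_1_TO_560))                 # 52
print(live_count(GAPS_1_TO_560, 1, 560))  # 508
```

The last two values agree with the page: 52 holes in 1..560 leave 508 live fields. In a real carve, `live_numbers` would come from the field numbers read out of the FieldEntry array rather than from a hand-typed list.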
Type Distribution Within #1–#560
For orientation, the proto base-type mix of the 508 live fields on this page (the lower half is bool-heavy, consistent with the early jellyfish toggle era):
| Base type | Notes |
|---|---|
| bool | the dominant type; most early knobs are simple feature toggles |
| int64 | thresholds, counts, byte limits, ring sizes |
| int32 | a smaller integer set (trip counts, retry caps, ICI-SDC test sizes) |
| uint32 / uint64 | a handful: delay masks (#406), keep_hlo_proto_literals_up_to (#559), DCN-pipelining (#523/#525), sflag-wait shalt (#514) |
| float / double | the MSA overlap ratios and the megacore-fusion / cost-model scaling factors |
| string | profile paths, config selectors, filename prefixes, config_criterion / rematerialization_algorithm |
| enum | 13 in this range; TristateProto.Value dominates, plus MemorySchedulerProto.Value (#31), VerifyOrAssignTilingFlagsProto.Value (#132), TpuVmacTransformStrategyProto.Value (#487) |
| message | RangeSpecProto (bounds/ISA-emitter knobs), RepeatedStrings (LLVM flags, FDO skip lists), the first AutoProto (#295), and typed helpers (SparseDenseMatmulFdoConfig #462) |
The whole-proto histogram (418 bool / 148 int64 / 349 message / 74 enum / 37 string / 34 float / 32 int32 / 14 double / 11 uint32 / 4 uint64) and the parse-table-derived type_card proof are on TpuCompilationEnvironment.
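As a quick arithmetic cross-check, the whole-proto histogram can be summed against the other totals quoted on this page. This is only a consistency sketch over numbers already stated here (1121 live fields, 423 wrapped fields, max field number 1218 with 97 gaps); the dictionary name `HISTOGRAM` is hypothetical.

```python
# Whole-proto base-type histogram, copied from the sentence above.
HISTOGRAM = {
    "bool": 418, "int64": 148, "message": 349, "enum": 74, "string": 37,
    "float": 34, "int32": 32, "double": 14, "uint32": 11, "uint64": 4,
}

total_fields = sum(HISTOGRAM.values())
wrapped_fields = HISTOGRAM["message"] + HISTOGRAM["enum"]

print(total_fields)    # 1121, the live-field count
print(wrapped_fields)  # 423, the fields that carry a wrapper message/enum type
print(1218 - 97)       # 1121, max field number minus deletion gaps
```

All three figures line up, which is the minimum a re-derived histogram should satisfy before it is compared entry by entry.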
Confidence
Every field number, name, and type on this page is CERTAIN: the names derive from the TpuCompilationEnvironment FileDescriptorProto carved from the binary and were cross-checked against the 1121-entry FieldEntry array of TpuCompilationEnvironment::_table_ @ 0x21cfa9e0, whose entry order is field-number-sorted. Twenty-two of the names spanning the full #1–#560 range were additionally confirmed byte-present, verbatim, in the binary's .rodata string pool (0 misses), including the wrapper type strings TristateProto, VerifyOrAssignTilingFlags, AutoProto, and RangeSpecProto. The HOT-subsystem tags shown in italics for a few fields (NUMERICS, PIPELINER, SPARSE_CORE) and the four is_used_at_runtime markers come from the TpuCompEnvFieldOptions field-option extension (#535801365) and are HIGH confidence — recovered from the descriptor, not the parse table; not every tagged field is annotated here (the complete HOT breakdown is on TpuCompilationEnvironment). No field on this page is LOW confidence.
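A byte-presence check of the kind described above can be sketched as a plain substring search. This is an illustrative approximation, not the procedure actually used for this page: it searches an arbitrary byte blob instead of parsing the ELF .rodata section, the function name `names_present` is hypothetical, and the blob and the name `xla_tpu_not_a_real_field` are synthetic stand-ins for the example.

```python
from typing import Dict, Iterable


def names_present(blob: bytes, names: Iterable[str]) -> Dict[str, bool]:
    """Map each name to whether its ASCII bytes occur verbatim anywhere in blob."""
    return {name: name.encode("ascii") in blob for name in names}


# Synthetic stand-in for a string pool; a real check would read the binary's bytes.
blob = b"\x00xla_tpu_rwb_fusion\x00TristateProto\x00RangeSpecProto\x00"

result = names_present(
    blob,
    ["xla_tpu_rwb_fusion", "TristateProto", "xla_tpu_not_a_real_field"],
)
misses = [name for name, found in result.items() if not found]

print(result["xla_tpu_rwb_fusion"])  # True
print(misses)                        # ['xla_tpu_not_a_real_field']
```

A plain substring search cannot tell a name from a longer name that starts with it: xla_tpu_spmd_f32_accum_for_bf16_ar (#424) would match inside xla_tpu_spmd_f32_accum_for_bf16_ar_min_subgroup_size (#529). A stricter check would need to anchor on whatever delimiters the string pool actually uses.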
Cross-References
- TCE Field Dictionary (B) — the upper half: field #561 – #1218, same field#→name→type grammar
- TpuCompilationEnvironment — the structural overview: parse-table header, type histogram, HOT-tag taxonomy, wrapper-enum value tables, the 30-arm AutoProto switch
- TCE Field-Offsets & Flag Defaults — the C++ struct offset and literal default value of each field; the kGenFunc-vs-inline FlagImpl mechanism
- Configuration & Compile Knobs — overview — how the TCE proto sits in the PJRT compile path alongside DebugOptions and the flag registry
- Part XVI — Configuration & Compile Knobs — index entry