User-Facing Hint Strings

All offsets on this page are file offsets (hex) into libtpu.so from the libtpu-0.0.40-cp314 wheel (libtpu_lts_20260413_b_RC00, build-id md5 89edbbe81c5b328a958fe628a9f2207d, 781,691,048 bytes, ELF x86-64 DYN, not stripped). In this build .rodata VMA == file offset (1:1, base 0x84a0000), so a file offset is also the load address. Other wheels will differ.
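
Because file offset and load address coincide for .rodata in this build, a catalogued offset can be used directly as an index into the file's bytes. The following is a minimal sketch of that lookup, assuming the hint strings are NUL-terminated; `read_cstring` and `file_offset_to_vma` are hypothetical helper names, and the demo buffer is synthetic rather than real libtpu bytes.

```python
def read_cstring(blob: bytes, offset: int, max_len: int = 4096) -> str:
    """Return the NUL-terminated string that starts at a file offset."""
    end = blob.find(b"\x00", offset, offset + max_len)
    if end == -1:
        raise ValueError(f"no NUL terminator within {max_len} bytes of {offset:#x}")
    return blob[offset:end].decode("utf-8", errors="replace")


def file_offset_to_vma(offset: int, section_file_offset: int, section_vma: int) -> int:
    """Translate a file offset into a load address for one ELF section."""
    return offset - section_file_offset + section_vma


# Synthetic stand-in for a slice of .rodata (not real libtpu bytes).
demo = b"\x00Reduce TPU memory usage\x00Please file a bug\x00"
print(read_cstring(demo, 1))  # Reduce TPU memory usage

# In this build the section's file offset equals its VMA, so the
# translation is the identity.
RODATA_BASE = 0x84A0000
print(hex(file_offset_to_vma(0xA083064, RODATA_BASE, RODATA_BASE)))  # 0xa083064
```

For a different wheel the two base values would have to be read from that file's section headers before any offset on this page could be reused.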

Abstract

This page catalogs libtpu's hint strings: the advisory prose the runtime and the XLA:TPU compiler emit alongside an error to tell the operator or JAX user what to do about it. They are the actionable companions to the raw error/status templates catalogued on error-templates.md. Where an error template states the fact ("X must have rank >= N: got %s"), a hint appends the remedy ("Please use --xla_tpu_rwb_fusion=false", "Reduce TPU memory usage", "Please remove the hosts from the fleet"). The two live adjacent in .rodata and are joined at a .text absl::StrCat/StrFormat callsite; this page owns the hint half and its trigger/knob mapping, not the templates.

The hint surface is signposted by five marker tokens (--flag=value, "Reduce … memory", go/<link>, b/<id>, and "please file a bug"), and the choice of token is itself a taxonomy of who fixes the problem. --flag=value and "Reduce … memory", along with operator prose such as "Please remove the hosts …", are operator-actionable: the user has a self-service remedy. b/<id>, go/<link>, and "please file a bug" point the user back at the XLA/TPU team: they signal an internal limitation or a known issue, not a knob the user can turn. This split (operator-actionable vs file-it-upstream) is the page's primary axis, refined into seven actionability classes: flag-suggestion, doc-link, bug-report, capacity/OOM, deprecation, perf-tuning, and operator-action.

This is a reference catalog, not an algorithm. Each section groups hints by trigger and, where the binary shows it, pairs the hint with the condition that fires it and the flag/knob it names. Flag names are byte-confirmed real absl::Flag globals (each has an AbslFlagHelpGenFor<name> symbol); their help-text bodies live in the config flag families and are out of scope here. The page does not claim to know which error template each hint is concatenated onto — that link is a .text callsite not yet traced (see Not Traced).

The catalog contract:

  • The five marker tokens and how they partition operator-actionable from file-it-upstream.
  • The seven actionability classes, each with its hints grouped by subsystem.
  • For each hint: the verbatim string, its file offset, and the flag/knob or tracker coordinate it references.
  • The direction-of-default caveat: whether the advised flag value is the workaround or the on-switch — directional, not byte-confirmed.

| Property | Value |
| --- | --- |
| Section | .rodata of libtpu.so (base 0x84a0000, file-offset == VMA) |
| Marker tokens | --<flag>=<val> · go/<link> · b/<id> · "file a bug" · "Reduce … memory" |
| Distinct go/ links | 54 (514 occurrences are per-gen ISA-counter docs, not remedies) |
| Distinct b/<id> ids | 31 (all with recovered context) |
| Flag-suggestion hints | ~28 strings; 10/10 spot-checked flags confirmed as registered absl::Flag |
| Concatenation callsite | .text absl::StrCat/StrFormat; not traced (hint ↔ template link open) |

At a glance, by trigger:

| Actionability class | Trigger (what fires the hint) | Primary token | Count (TPU/XLA subset) |
| --- | --- | --- | --- |
| flag-suggestion | a knob is unset / mis-set, or a fallback path was hit | --<flag>=<val> | ~28 strings |
| doc-link (action) | a failure with a documented resolution playbook | go/<link> (non-counter) | ~14 |
| doc-link (ISA) | reading the ISA spec for a perf counter | go/<gen>-isa#anchor | 514 occ / ~16 distinct |
| bug-report | an internal invariant broke / unimplemented path | "file a bug" · b/<id> | ~40 prose + 31 b/<id> |
| capacity / OOM | HBM/VMEM/Smem pressure, allocation failure | "Reduce … memory" · OOM | ~15 |
| deprecation | a removed/migrating flag or feature is used | "deprecated" · "no longer" | ~7 |
| perf-tuning | a suboptimal-but-correct code path was taken | "suboptimal" · "inefficient" | ~6 |
| operator-action | fleet/topology/sequencing remedy, no bug-id | "Please …" · "Ensure …" | ~25 |

The Five Marker Tokens

The hint surface was located by rg over the string table for five token families. They are not just search anchors — the token a hint carries is a reliable predictor of its actionability class, because libtpu's authors use each token consistently.

--<flag>=<value>     OPERATOR-ACTIONABLE  → set this knob and retry
"Reduce … memory"    OPERATOR-ACTIONABLE  → free HBM / raise a limit
"Please remove …"    OPERATOR-ACTIONABLE  → fleet/topology remedy
go/<link>            FILE-IT-UPSTREAM     → read the team's doc/playbook
b/<id>               FILE-IT-UPSTREAM     → known issue, comment/track
"please file a bug"  FILE-IT-UPSTREAM     → internal invariant broke

NOTE — the split is a heuristic, not a hard partition: a single hint can carry both ("…suboptimal MXU throughput … Please file a bug with XLA:TPU compiler team, and use --xla_tpu_accumulate_into_mrb=false in the meantime." at 0xa083064 is both a bug-report and a flag-suggestion). When a hint mixes tokens it is catalogued under its primary remedy and cross-noted.
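
The token-to-audience mapping can be expressed as a small classifier. This is a sketch only: the regular expressions are illustrative guesses at each token family, not the expressions used to build this catalog, and `classify` is a hypothetical name. It returns every token found, so a mixed hint reports both audiences instead of being forced into one.

```python
import re

# Token family -> (pattern, who is expected to act). Patterns are assumptions.
MARKERS = {
    "flag": (re.compile(r"--[A-Za-z0-9_]+"), "operator"),
    "reduce-memory": (re.compile(r"Reduce .*memory"), "operator"),
    "please-remove": (re.compile(r"Please remove\b"), "operator"),
    "go-link": (re.compile(r"\bgo/[A-Za-z0-9_-]+"), "upstream"),
    "bug-id": (re.compile(r"\bb/\d{6,}"), "upstream"),
    "file-a-bug": (re.compile(r"file a bug", re.IGNORECASE), "upstream"),
}


def classify(hint: str) -> dict:
    """Return the marker tokens present in a hint and the audiences they imply."""
    tokens = [name for name, (pattern, _) in MARKERS.items() if pattern.search(hint)]
    audiences = sorted({MARKERS[name][1] for name in tokens})
    return {"tokens": tokens, "audiences": audiences}


mixed = ("Please file a bug with XLA:TPU compiler team, and use "
         "--xla_tpu_accumulate_into_mrb=false in the meantime.")
print(classify(mixed))
# {'tokens': ['flag', 'file-a-bug'], 'audiences': ['operator', 'upstream']}
```

Choosing a primary remedy for a mixed hint is still an editorial decision; the sketch only surfaces that a choice is needed.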

GOTCHA — the go/ and b/ tokens are dominated by non-actionable uses. 514 of ~640 go/ occurrences are per-gen ISA-spec links embedded in hardware-perf-counter description strings ([counter](http://go/glc-sc-isa#anchor)), not error remedies. A reimplementer scanning for "actionable hints" by grepping go/ alone will drown in counter documentation; filter to the non-counter set (Doc-Link Hints).
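
One way to apply that filter is to separate strings whose go/ link sits inside a Markdown link, the shape the counter descriptions use, from strings that mention a go/ link in prose. The sketch below assumes that shape is a sufficient discriminator, which is only checked against the examples quoted on this page; the function names are hypothetical.

```python
import re

# Counter descriptions embed the link as [name](http://go/...).
COUNTER_LINK = re.compile(r"\]\(http://go/[^)\s]+\)")
ANY_GO_LINK = re.compile(r"\bgo/[A-Za-z0-9_#-]+")


def is_counter_doc(s: str) -> bool:
    """True when the string carries a Markdown-wrapped go/ link."""
    return COUNTER_LINK.search(s) is not None


def non_counter_go_strings(strings):
    """Keep strings that mention a go/ link outside a Markdown counter link."""
    return [s for s in strings if ANY_GO_LINK.search(s) and not is_counter_doc(s)]


samples = [
    "This counter counts hold-scalar-issue cycles for "
    "[yN_reserved](http://go/vfc-sc-isaa#yn-reserved).",
    ". See go/scoped-vmem for more details.",
    "Reduce TPU memory usage",
]
print(non_counter_go_strings(samples))  # ['. See go/scoped-vmem for more details.']
```

The surviving set still contains links from statically linked library prose, so a second manual pass is needed to isolate the TPU-specific playbooks.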


Flag-Suggestion Hints

The self-service remedy class: an error or log line names a specific --xla_* / --megascale_* / --tpu_* / --deepsea_* flag and the value to set. Every flag named below was confirmed to be a registered absl::Flag global by the presence of its AbslFlagHelpGenFor<name> symbol (10/10 spot-checked resolved 1:1; flag help-text bodies belong to the config section).
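
The spot check described above can be approximated in a few lines: pull each --flag out of a hint, then look for a matching AbslFlagHelpGenFor<name> entry in a symbol listing. This is a sketch under stated assumptions: the symbol lines are taken to be demangled text in which the flag name is followed by a non-identifier character, the sample symbol lines are invented for the demo, and the --no<flag> handling relies on absl's negated spelling for boolean flags.

```python
import re

FLAG_RE = re.compile(r"--([A-Za-z0-9_]+)(?:=([^\s,.;)`]+))?")
SYMBOL_RE = re.compile(r"AbslFlagHelpGenFor([A-Za-z0-9_]+)")


def flags_in_hint(hint: str):
    """Return (flag_name, advised_value_or_None) for each --flag in a hint."""
    return [(m.group(1), m.group(2)) for m in FLAG_RE.finditer(hint)]


def registered_flags(demangled_symbols):
    """Collect flag names from demangled AbslFlagHelpGenFor<name> symbols."""
    return {m.group(1) for line in demangled_symbols for m in SYMBOL_RE.finditer(line)}


def resolve(name: str, registered: set):
    """Map a spelled flag to a registered one, allowing the --no<flag> form."""
    if name in registered:
        return name, False
    if name.startswith("no") and name[2:] in registered:
        return name[2:], True  # boolean flag written in negated form
    return None, False


# Hypothetical demangled symbol lines, shaped like `nm -C` output.
symbols = [
    "AbslFlagHelpGenForxla_tpu_rwb_fusion::NonConst()",
    "AbslFlagHelpGenFortpu_use_continuations::NonConst()",
]
known = registered_flags(symbols)
hint = "Please use --xla_tpu_rwb_fusion=false (and --xla_tpu_dot_dot_fusion=false if failure persists)"
for name, value in flags_in_hint(hint):
    print(name, value, resolve(name, known))
```

A name that resolves to `(None, False)` is not proof the flag is absent; it only means the listing supplied to the sketch did not contain it.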

Fusion / MXU

| Offset | Hint string (verbatim) | Flag referenced |
| --- | --- | --- |
| 0x858bbcf | "(1) Please use --xla_tpu_rwb_fusion=false (and --xla_tpu_dot_dot_fusion=false if failure persists), Reason: found fallback window config while lowering fusion." | xla_tpu_rwb_fusion, xla_tpu_dot_dot_fusion |
| 0xa2b7034 | (duplicate of 0x858bbcf: same rwb/dot_dot fusion remedy; both copies byte-identical) | (same) |
| 0x96c35ed | "PartialReduce is designed to be used with fusion. Did you forget to set `--xla_tpu_nested_dot_fusion=true`?" | xla_tpu_nested_dot_fusion |
| 0xa083064 | "…suboptimal MXU throughput on this HLO. Please file a bug with XLA:TPU compiler team, and use --xla_tpu_accumulate_into_mrb=false in the meantime." | xla_tpu_accumulate_into_mrb |

SparseCore / embedding

| Offset | Hint string (verbatim) | Flag referenced |
| --- | --- | --- |
| 0x9fe2e58 | "If you are seeing this error when attempting to compile a distributed embedding model try running with --xla_tpu_embedding_table_oblongness_threshold=1 to ensure the embedding table is not given a tiled layout." | xla_tpu_embedding_table_oblongness_threshold |
| 0xa073044 | "Skipping reduce-scatter decomposition as SC ND RS needs --xla_tpu_enable_sparse_core_reduce_scatter_v2=true." | xla_tpu_enable_sparse_core_reduce_scatter_v2 |
| 0xa07b05e | "Number of sparse cores for scatter offloading should not be 0. To disable, please set xla_tpu_enable_offloading_scatter_to_sparsecore to false." | xla_tpu_enable_offloading_scatter_to_sparsecore |
| 0xa07b0ee | "Number of SparseCores for gather offloading should not be 0. To disable, please set xla_tpu_enable_offloading_gather_to_sparsecore to false." | xla_tpu_enable_offloading_gather_to_sparsecore |
| 0x85c0718 | "Unsupported: computing max_nz_per_row requires --xla_sc_fused_scatter_in_compute_loop" | xla_sc_fused_scatter_in_compute_loop |
| 0x85c076e | "Unsupported: computing max_unique_nz_per_row requires --xla_sc_fused_scatter_in_compute_loop" | xla_sc_fused_scatter_in_compute_loop |
| 0x85c07cb | "Unsupported: computing max_unique_nz_per_row requires --xla_sc_fused_gather_in_compute_loop" | xla_sc_fused_gather_in_compute_loop |
| 0x86678da | "…To avoid this failure, try to set a larger number to '--fdo_config_sparsecore_allowed_dropped_id_count_per_epoch' flag" | fdo_config_sparsecore_allowed_dropped_id_count_per_epoch |

Scheduler / collective / infeed

| Offset | Hint string (verbatim) | Flag referenced |
| --- | --- | --- |
| 0xa008ce4 | "…has gaps on the way from that operand to itself. You can use --xla_tpu_scheduling_annotation_deannotate_unsupported_groups=true to deannotate the unsupported groups." | xla_tpu_scheduling_annotation_deannotate_unsupported_groups |
| 0xa060e03 | "Attempted to use profile-guided latency estimator without the real cost model. Either enable --xla_tpu_scheduler_using_real_cost_model or unset --xla_tpu_impure_latency_hiding_scheduler_profile_path." | xla_tpu_scheduler_using_real_cost_model, xla_tpu_impure_latency_hiding_scheduler_profile_path |
| 0xa04a55e | "Sinking DCN collectives requires --xla_tpu_use_megascale_host_reduction." | xla_tpu_use_megascale_host_reduction |
| 0xa05844c | "Sharded infeed with non-uniform layouts is not supported. Try turning off the infeed layout optimization (--transpose_tpu_infeed=false) and report to XLA team." | transpose_tpu_infeed |

Memory / OOM / runtime / debug

| Offset | Hint string (verbatim) | Flag referenced |
| --- | --- | --- |
| 0x9e6e39e | "…Aborting compilation early because it's unlikely to have enough device memory. Requires %s, has %s available. If more detailed logging is desired, set --xla_tpu_impure_oom_fast_exit_threshold=-1" | xla_tpu_impure_oom_fast_exit_threshold |
| 0xa074100 | ". Reduce TPU memory usage or set --jellyfish_executor_max_wait_time_for_releasing_memory_on_oom to a larger value." | jellyfish_executor_max_wait_time_for_releasing_memory_on_oom |
| 0xa01af59 | "Spilling sregs to HBM on megacore is implemented but untested; use --xla_enable_megacore_hbm_spill=true to activate, and please update b/177274769 with findings." | xla_enable_megacore_hbm_spill |
| 0xa07a6ba | ". Restart the job or disable set --megascale_use_numa_aware_threadpool=false." | megascale_use_numa_aware_threadpool |
| 0x96deddc | "…bytes. Pass --xla_hbm_logging_buffer_size_bytes=" | xla_hbm_logging_buffer_size_bytes |
| 0xa06c059 | "Unable to log from the tile because tile log is NOT enabled. Use --xla_tpu_enable_tile_log_recorder=true to enable logging." | xla_tpu_enable_tile_log_recorder |
| 0xa06c0d5 | "…was invoked but logging was not enabled. Use --xla_tpu_enable_sc_log_recorder=true to enable logging." | xla_tpu_enable_sc_log_recorder |
| 0xb52d9a0 | "You need to set --deepsea_chips_per_host_bounds, --deepsea_host_bounds to match your setup. Contact tfrt-devs@ if you have any questions." | deepsea_chips_per_host_bounds, deepsea_host_bounds |
| 0x96c2015 | "…the threadpool for tensorflow operations is too small. Try increasing --45eac_num_operation_threads and please notify barna-core-devs@" | _45eac_num_operation_threads (BarnaCore) |
| 0x9fff409 | "Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results." | xla_dump_to |
| 0xa00da29 | "…if you would like to reliably retrieve the error message, try running it again with --notpu_use_continuations." | tpu_use_continuations (negated) |
| 0x87fb08b | "[enable stack trace via --xla_jf_collect_llo_stack_trace]" | xla_jf_collect_llo_stack_trace |

Direction-of-default caveat

GOTCHA — the value a hint tells you to set is not necessarily the opposite of the default; read the verb. The "use --flag=false … in the meantime" / "Please use --flag=false" hints (rwb_fusion, dot_dot_fusion, accumulate_into_mrb) name the workaround (a non-default toggle to escape a broken path). The "use --flag=true to enable" hints (tile_log_recorder, sc_log_recorder, sparse_core_reduce_scatter_v2, nested_dot_fusion) name a feature that is off by default, where =true is the on-switch. Treat these as direction-of-default; only the ~13 rodata-evidenced defaults are byte-confirmed (config section). Do not assume the advised value equals "the default".
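
The verb-based reading can be written down as a small rule set. The sketch below is deliberately conservative and purely directional: it encodes only the two phrasings named above, consults no real flag defaults, and returns "unknown" for anything else. `advised_direction` is a hypothetical name.

```python
import re


def advised_direction(hint: str, flag: str) -> str:
    """Guess whether the advised value is an on-switch or a workaround."""
    m = re.search(rf"--{re.escape(flag)}=(true|false)\b", hint)
    if m is None:
        return "unknown"
    value = m.group(1)
    if value == "true":
        # "use --flag=true to enable": the feature is read as off by default.
        return "on-switch"
    if "Please use" in hint or "in the meantime" in hint:
        # "Please use --flag=false" / "... =false in the meantime".
        return "workaround"
    return "unknown"


print(advised_direction(
    "Use --xla_tpu_enable_tile_log_recorder=true to enable logging.",
    "xla_tpu_enable_tile_log_recorder"))  # on-switch
print(advised_direction(
    "use --xla_tpu_accumulate_into_mrb=false in the meantime.",
    "xla_tpu_accumulate_into_mrb"))  # workaround
```

Hints that advise a value in prose without the --flag=value form fall through to "unknown", which is the honest answer until the defaults are confirmed.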


Doc-Link Hints

The go/ shortlink token splits cleanly into a small actionable playbook set and a large ISA-counter-documentation family. Only the first is a remedy; the second is reference material the profiler attaches to perf counters.

| go/ link | Hint string (offset) | Subsystem |
| --- | --- | --- |
| go/scoped-vmem | 0xa011573 ". See go/scoped-vmem for more details." | MSA / scoped VMEM |
| go/sc-dynamic-bounded-slice | 0xa050247 "DynamicBoundedSlice lowering is only supported inside of custom fusions, see go/sc-dynamic-bounded-slice for more information." | SparseCore lowering |
| go/xla_compile_runtime_flag_error | 0xa047216 "Runtime flags … must be set consistently between compile options and ABSL flags. … See go/xla_compile_runtime_flag_error for resolution." | compile-env flag consistency |
| go/megascale-debug-playbook | 0xa21a318 "Debug dumping triggered. Refer to go/megascale-debug-playbook for further debugging. megascale_debug_dir = " | megascale runtime |
| go/jfc-errata | 0x8644f10 / 0xa0d46d9 / 0xa0ee662 "…See errata entry go/jfc-errata#" (counter-errata caveats) | JXC ISA errata |
| go/llvm-crash-bug | 0xa033687 "invalid profile created. Please file a bug at: http://go/llvm-crash-bug and include the profraw files that caused this error." | PGO / LLVM crash report |

NOTE — these are the only go/ links a user can act on to resolve a TPU/XLA failure. A second cluster (go/protection-key-fault, go/general-protection-fault, go/stacktraces, go/cppstackoverflow, go/lsan, go/prod-naming-1-5, go/grpc-without-init-google-exemption, go/streamz-force-collection, go/no_file_or_rpc_during_init, go/redact-debug-string) comes from statically-linked Google library prose (signal handlers, sanitizers, gRPC, telemetry), not from TPU-specific code paths. They are present in the binary but are not libtpu remedies; treat them as library noise (MEDIUM relevance).

The dominant go/ use is not a remedy. Each hardware performance counter the profiler exposes carries a Markdown link to its ISA-spec section inside the counter's description string, e.g. 0xa0dcfac "This counter counts hold-scalar-issue cycles for [yN_reserved](http://go/vfc-sc-isaa#yn-reserved)." The link prefix encodes the TPU generation:

go/jfc-isa, go/jfc-errata                       JXC  (Jellyfish / Dragonfish)
go/pfc-isa, go/pfc-sc-isa, go/pfc-bc-isa,
  go/pfc-memory-system                          PXC  (Pufferfish)
go/vfc-sc-isa (94+ occ), go/vfc-mem             VXC vector-fetch-core (Viperfish)
go/vxc-isa, go/vxc-sc-isa                        VXC ISA (Viperfish / Ghostlite vector)
go/glc-isa, go/glc-sc-isa (94 occ)              GXC general-load-core (Ghostlite)
go/gfc-isa, go/gfc-sc-isa (97 occ), go/gfc-power GXC general-fetch-core
go/core (70 occ)                                generic core-doc anchor

QUIRK — one of these links is misspelled in the shipped binary: go/vfc-sc-isaa (confirmed at 0xa0dcfac, double a). A reimplementer reproducing the counter-description corpus should preserve the typo to byte-match this build, not "fix" it. The full counter→anchor table is a separate large extraction and is not enumerated here.
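
The prefix-to-generation mapping above lends itself to a lookup keyed on the first path component of the link. The sketch below copies the family labels from the listing and keeps the link text exactly as found, so the misspelled anchor document survives unchanged; `parse_isa_link` is a hypothetical helper, and links outside the listing map to "unknown".

```python
import re

# Family labels copied from the prefix listing above.
FAMILY_BY_PREFIX = {
    "jfc": "JXC (Jellyfish / Dragonfish)",
    "pfc": "PXC (Pufferfish)",
    "vfc": "VXC vector-fetch-core (Viperfish)",
    "vxc": "VXC ISA (Viperfish / Ghostlite vector)",
    "glc": "GXC general-load-core (Ghostlite)",
    "gfc": "GXC general-fetch-core",
    "core": "generic core-doc anchor",
}
LINK_RE = re.compile(r"go/([a-z]+)((?:-[a-z0-9]+)*)(?:#([A-Za-z0-9_-]+))?")


def parse_isa_link(text: str):
    """Return (doc, anchor, family) for the first go/ link in text, or None."""
    m = LINK_RE.search(text)
    if m is None:
        return None
    prefix, rest, anchor = m.group(1), m.group(2), m.group(3)
    # The doc name is reported verbatim; no spelling correction is applied.
    return prefix + rest, anchor, FAMILY_BY_PREFIX.get(prefix, "unknown")


print(parse_isa_link("[yN_reserved](http://go/vfc-sc-isaa#yn-reserved)"))
# ('vfc-sc-isaa', 'yn-reserved', 'VXC vector-fetch-core (Viperfish)')
```

Keying on the prefix rather than the full document name is what lets the typo'd link still resolve to its family without being rewritten.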


Bug-Report Hints

The file-it-upstream class. These prose strings point the user at a tracker, a team, or a b/<id> known issue. They signal that the user has hit an internal limitation, not a knob to turn.

"file a bug" prose with tracker coordinates

| Offset | Hint string (verbatim, abbreviated) | Destination |
| --- | --- | --- |
| 0x9f1d09c | "DeepseaPlatform and --tpu_use_tfrt=false are now deprecated. … please file a bug at: https://b.stripped_domain/issues/new?component=670280" | tracker component 670280 (TFRT) |
| 0xa039ed8 | "RoutingTableAnalyzer detects a potential deadlock! File a bug against SliceBuilder (https://b.stripped_domain/issues/new?component=503036). Please attach core dumps retrieved from Coroner." | tracker component 503036 (SliceBuilder) |
| 0xaf0bf90 | "Please report a bug at: https://github.com/google/jax/issues/new?assignees=apaszke" | github.com/google/jax (public) |
| 0xab9e630 (also 0xaba4a30/0xaba4b70/0xaba4c50) | "Please file a bug under Platforms > Performance > BarnaCore > Software with repro instructions" | BarnaCore component path (×4) |
| 0x858c562 | "Kernel body fingerprint collision detected for key: %016x%016x. Please file a bug with the XLA team and provide the colliding kernel bodies." | XLA team |
| 0x8a2941d | "Fatal error in creation of RWB Fusion. Please file a bug with XLA-TPU" | XLA-TPU |
| 0x96c1211 | "XLA has not implemented dynamic sized slice with non-trival stride yet. Please file a bug against XLA" | XLA |
| 0x96c12d4 | "Unimplemented reduce-window in fusion cost modeling. Please file a bug with XLA" | XLA |
| 0x96c1fd3 / 0xa0c9b76 | "Allocated address is not aligned, please file a bug to tfrt-devs@" / "Duplicate allocation found, please file a bug to tfrt-devs@." | tfrt-devs@ |
| 0xa036ccd | "… is an unsupported memory space in TpuCustomCallScopedVmemAdjuster. Please file a P2 feature request, against the XLA-TPU team …" | XLA-TPU (P2 FR) |
| 0xa03eb45 / 0xa03eb8b | "Manual sub-axis isn't supported. Please file a bug with a reproducer." / "Non-divisible sharding with unreduced axes isn't supported. …" | XLA (Shardy) |
| 0xa1b0507 | "The auto-sharding solver could not find a valid solution within the given time limit. Please report this as a bug!" | XLA (auto-sharding) |

The TPU-lowering invariant block (0xa0c8b1e..0xa0c9758)

A contiguous run of "should not happen" templates, one per HLO op the TPU backend expects to have been legalised away before lowering. These are pure internal-bug markers — there is no user remedy. The block is contiguous but not strictly homogeneous: the first string in the run, at 0xa0c8b1e, is a sibling variant phrased about layouts rather than an un-eliminated op — "Encountered mismatched layouts for select-and-scatter. This should not happen - please file a bug against XLA." The "Encountered <Op> op during TPU lowering…" template proper begins at 0xa0c8b8d (Dot).

0xa0c8b1e  Encountered mismatched layouts for select-and-scatter.
           This should not happen - please file a bug against XLA.
0xa0c8b8d  Encountered <Op> op during TPU lowering that should have been eliminated
           during an earlier phase of compilation.
           This should not happen - please file a bug against XLA.
   <Op> ∈ { Dot, Call, BatchNormTraining, BatchNormInference, BatchNormGrad,
            Pad, Reverse, select-and-scatter, custom-fusion, OutputFusion, … }   (~14 ops)
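
A log consumer can recognise this block with one pattern and pull out the op name. The sketch below assumes the template's lines are joined by whitespace in the emitted message (the exact separator is not confirmed here, so the pattern tolerates any run of whitespace), and `unlowered_op` is a hypothetical helper name.

```python
import re

INVARIANT_RE = re.compile(
    r"Encountered (?P<op>\S+) op during TPU lowering that should have been "
    r"eliminated\s+during an earlier phase of compilation\."
)
BUG_SUFFIX = "This should not happen - please file a bug against XLA."


def unlowered_op(message: str):
    """Return the op named by the invariant template, or None if no match."""
    m = INVARIANT_RE.search(message)
    return m.group("op") if m else None


def is_internal_bug(message: str) -> bool:
    """True for any member of the block, including the layout variant."""
    return BUG_SUFFIX in message


msg = ("Encountered BatchNormGrad op during TPU lowering that should have been "
       "eliminated during an earlier phase of compilation. " + BUG_SUFFIX)
layout = "Encountered mismatched layouts for select-and-scatter. " + BUG_SUFFIX
print(unlowered_op(msg))     # BatchNormGrad
print(unlowered_op(layout))  # None
```

Matching on the shared suffix catches the layout sibling at the head of the block, while the op pattern alone would miss it.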

b/<id> known-issue / TODO references (31 distinct)

These pin a limitation to a tracked bug. They are not actionable beyond "comment/track" — the most a user does is leave a note on the bug. Representative rows from the 31 distinct ids follow.

| b/<id> | Context (offset) | Class |
| --- | --- | --- |
| b/167392593 | 0xa00f6f4 (×4) "TODO(b/167392593): Support {bitcasts merging dims, sliced prefetches, …}." | MSA/prefetch TODO |
| b/177274769 | 0xa01af59 "…use --xla_enable_megacore_hbm_spill=true … please update b/177274769 with findings." | megacore HBM spill |
| b/433785288 | 0xa041d37 "…fixed by Shardy partitioner in the future, tracked in b/433785288. Contact Shardy or XLA team for help." | Shardy partitioner |
| b/36072659 | 0xa0af6ba "Infeed buffer size … bug in the infeed operation (b/36072659). See the bug for a workaround …" | infeed deadlock |
| b/147787375 | 0x99e1405 / 0x99e144d "Close() appears to be hanging, this might be a deadlock see b/147787375" / "Disable() appears to be hanging, …" | runtime deadlock |
| b/282055166 | 0x9929df3 "Error may indicate firmware queue fullness: b/282055166" | runtime/firmware |
| b/30481585 | 0xa12ba16 "Device reset is not yet supported on this platform (b/30481585)" | runtime/device-reset |
| b/488336614 | 0xa0a0fb0 "b/488336614 Only single-SC element scatter add is supported." | SparseCore scatter |
| b/422762004 | 0xa12c03b "Gather with implicit convert is not supported. (b/422762004)" | gather lowering |
| b/343490729 | 0x87050df "Not implemented: TODO(b/343490729): stores packed int2 are not supported" | int2 lowering |

NOTE — b/<id> ids that point a user at a workaround (b/36072659 infeed, b/494604538 cancellables) are the closest this class gets to actionable; the rest are progress-tracking only. The 6-digit-minimum rg threshold means shorter b/<id> or cl/<id> changelist references may exist below it (LOW confidence on completeness).
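
The effect of the digit threshold is easy to see with a parameterised pattern. This is a sketch of such a scan, not the exact rg expression used for the catalog; `bug_ids` and its `min_digits` parameter are hypothetical.

```python
import re


def bug_ids(text: str, min_digits: int = 6):
    """Return the b/<id> numbers in text that have at least min_digits digits."""
    pattern = re.compile(rf"\bb/(\d{{{min_digits},}})\b")
    return pattern.findall(text)


print(bug_ids("Not implemented: TODO(b/343490729): stores packed int2 are not supported"))
# ['343490729']
print(bug_ids("Device reset is not yet supported on this platform (b/30481585)"))
# ['30481585']
print(bug_ids("see b/12345"))                 # []
print(bug_ids("see b/12345", min_digits=1))   # ['12345']
```

Lowering `min_digits` widens recall at the cost of more false positives from unrelated path fragments, which is the trade-off behind the completeness caveat.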


Capacity / OOM Hints

The operator-actionable memory-pressure class: free memory, raise a limit, or wait for compaction. Grouped by where the pressure is detected.

Compile-time and runtime OOM

| Offset | Hint string (verbatim, abbreviated) | Remedy |
| --- | --- | --- |
| 0x9e6e39e | "Aborting compilation early because it's unlikely to have enough device memory. Requires %s, has %s available. … set --xla_tpu_impure_oom_fast_exit_threshold=-1" | compile-time OOM abort + logging knob |
| 0xa074100 | ". Reduce TPU memory usage or set --jellyfish_executor_max_wait_time_for_releasing_memory_on_oom to a larger value." | runtime OOM → free memory or wait longer |
| 0xa07b361 | "Not enough HBM spill stack available, please increase." | increase HBM spill stack |
| 0xa04bde4 | "Allocation fails. Try again after compaction. Please note that compaction can be slow. If you want to achieve better performance, please manage TPU buffers carefully to avoid this compaction." | retry-after-compaction (capacity + perf) |

Embedding capacity

| Offset | Hint string (verbatim, abbreviated) | Trigger |
| --- | --- | --- |
| 0xa01cc4f | "TPU embedding: out of memory allocating %lld bytes." | embedding HBM allocation |
| 0xa030f0a | "The current embedding configuration does not fit on the TPU due to HBM space constraints or more optimization algorithms than supported: aggregate_min_resource_count (%d) > total_resource_count (%d)." | embedding config exceeds HBM/optimizer resources |
| 0xa056bd4 | "The number of unique optimizers in the TPU embedding configuration exceeds the capacity of the TPU system." | too many distinct optimizers |
| 0xa07d172 | "Scatter operand has %d elements, which exceeds the 32-bit limit. Unsupported on SparseCore." | SparseCore 32-bit index limit |

Fusion / VMEM / Smem capacity guards

| Offset | Hint string (verbatim) | Limit |
| --- | --- | --- |
| 0x84b1b50 / 0x84b1b7d / 0x9fd4955 | "Nested dot fusion would exceed vmem capacity" / "Custom Fusion would exceed vmem capacity" / "' exceeds VMEM capacity." | VMEM-capacity fusion guard |
| 0x84b1a2b / 0xa1b5992 | "Smem high-water mark exceeds memory capacity" / "SCS high-Smem usage exceeds Smem capacity" | Smem capacity |
| 0x9ff331b | "The input tensor is not on HBM/VMEM and it exceeds the HBM logging buffer limit." | HBM logging-buffer cap (paired with --xla_hbm_logging_buffer_size_bytes) |
| 0xa1eed4d | "Too many buffers are colored in the alternate memory. Could not reserve alternate memory for colored output of instruction " | alternate-memory coloring |
| 0xa122204 / 0xa122248 | "The size of next-hop routing table (%d) exceeds the chip limit (%d)" / "The size of routing table (%d) exceeds the chip limit (%d)" | ICI routing-table cap |

NOTE — the VMEM/Smem/routing-table guards are capacity rejections, not remedies — they tell the user the limit was hit but name no flag. The actionable OOM hints (xla_tpu_impure_oom_fast_exit_threshold, jellyfish_executor_max_wait_time_for_releasing_memory_on_oom) are also flag-suggestions and cross-appear above.


Deprecation Notices

Sparse, and mostly flag-migration. Library "constexprs are no longer supported"-style noise is excluded; only TPU/XLA notices are listed.

| Offset | Notice (verbatim, abbreviated) | Migration |
| --- | --- | --- |
| 0x9f1d09c | "DeepseaPlatform and --tpu_use_tfrt=false are now deprecated. This flag is slated for removal. …" | remove; file bug if it was needed |
| 0x9feecc8 | "--xla_tpu_impure_enable_packed_bf16_math_ops is deprecated. Please use --xla_tpu_bf16_emission_mode in TpuCompilationEnvironment." | xla_tpu_bf16_emission_mode |
| 0x8a293e0 | "Non-SPMD model parallelism is no longer supported by XLA:TPU" | use SPMD |
| 0xa26350a | "[DEPRECATED_XLA_TPU_FLAG_USE] Deprecated TpuCompilationEnvironment flags were overridden: " | runtime marker (flags overridden) |
| 0xa2ad13b | "[DEPRECATED_XLA_TPU_FLAG_USE] Deprecated TpuCompilationEnvironment flags were present and not matching their default values:" | runtime marker (non-default deprecated flag) |
| 0xa0b19dd (also 0xa0b1a9a) | "The SegmentShardingHelperDivSimple class can only be used when … BarnaCores per row per task is equal to 1 … Use the SegmentShardingHelperDiv class instead." (also …ModSimple…Mod) | BarnaCore class migration |
| 0xa0b1c7c | "DescriptionForDevice.* is not supported for TPU. Please use one of the GetExecutor methods instead." | GetExecutor methods |

NOTE — the two [DEPRECATED_XLA_TPU_FLAG_USE] strings are runtime markers the runtime prints when a deprecated TpuCompilationEnvironment flag is set; they prefix the offending flag list rather than naming a single flag. The deprecated-flag names themselves belong to the config section.


Perf-Tuning Suggestions

"Suboptimal but correct" notices: the code path works, but the user can do better.

| Offset | Suggestion (verbatim, abbreviated) |
| --- | --- |
| 0xa083064 | "…suboptimal MXU throughput on this HLO. … use --xla_tpu_accumulate_into_mrb=false in the meantime." (also a flag-suggestion + bug-report) |
| 0x9fd3a5e | ". Switch to allocate_output to avoid performance penalty." |
| 0x9ff1304 | "Concatenate fusion is inefficient." |
| 0xa041d37 | "…SPMD will replicate the tensor and then partition it … which is inefficient. … tracked in b/433785288." |
| 0xa29fd14 | "Layout inefficient dot whose output shape has small lane/sublane dimensions" |
| 0xa04bde4 | "…If you want to achieve better performance, please manage TPU buffers carefully to avoid this compaction." (also a capacity hint) |

NOTE — the MSA knob xla_tpu_msa_inefficient_use_to_copy_ratio (a proto field at 0xbfc053e) is the heuristic behind the "inefficient use-to-copy" perf path; it is named in the config layer, not in a hint string (LOW confidence that any user-facing hint prints it directly).


Operator-Action Prose

Self-service remedies that carry no bug-id: fleet/topology actions, op-sequencing protocols, init/lifecycle guidance, and team pointers.

Megascale hang digest (8 cause-branch remedies)

Each "Megascale detects a hang that is likely caused by <cause>. <remedy>." string names a different operator action keyed to the detected cause. The cause→remedy dispatch (which Cause enum selects which string) is a .text switch and is not traced.

| Offset | Cause → remedy (verbatim, abbreviated) |
| --- | --- |
| 0xa058553 | bad TPU sparse core chips → "Please remove the hosts from the fleet and restart the workload. If problem persists please contact Megascale XLA team." |
| 0xa05862f | bad TPU tensor core chips → "Please remove the hosts from the fleet and restart the workload. …" |
| 0x9ffc2f2 | networking issue → "Please examine the underlying networking stack for the following hosts." |
| 0x9fd7519 | data-input stall → "Please check the workers to make sure the data input pipeline is working properly." |
| 0xa06d7d2 | different modules on different devices → "Please confirm that all workers is running the exact same program. …" |
| 0xa079562 | inconsistent HLO module compilation → "This is likely a bug in JAX tracing or XLA compiler. Please inspect the HLO dumps …" |
| 0xa0d7cec | worker not queuing programs → "Check if your application is blocked/crashing and preventing JAX to queue the next TPU program (jitted functions)." |
| 0x9fe02ae | unknown cause → "Megascale detects a hang but cannot determine the root cause. Please inspect the full digest below." |

Associated aborts: 0xa045c93 "Aborting the coordinator as an unrecoverable error is reported …"; 0xa045d37 "Aborting the coordinator after collecting errors … as megascale_error_reporter_abort_on_hang is set to true …".

QUIRK — the bad-SC-chips (0xa058553) and bad-TC-chips (0xa05862f) remedies are near-identical English with a one-word grammatical drift ("If problem persists" vs "If the problem persists"). They are two distinct rodata strings, one per cause-branch, not a single shared template — a reimplementer building the digest must emit the matching variant per cause, not deduplicate them.
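
On the consuming side, a digest line can be split back into cause and remedy without normalising the remedy text, which keeps the per-cause variants distinct. The sketch assumes the emitted line has exactly the "Megascale detects a hang that is likely caused by <cause>. <remedy>" shape described above; the sample line is assembled from the abbreviated table row and is an assumption, not a recovered string. `parse_digest_line` is a hypothetical name.

```python
PREFIX = "Megascale detects a hang that is likely caused by "
UNKNOWN = "Megascale detects a hang but cannot determine the root cause."


def parse_digest_line(line: str):
    """Split a hang-digest line into (cause, remedy); cause is None if unknown."""
    if line.startswith(UNKNOWN):
        return None, line[len(UNKNOWN):].strip()
    if not line.startswith(PREFIX):
        raise ValueError("not a Megascale hang-digest line")
    # The first ". " ends the cause; the remedy is kept byte-for-byte.
    cause, _, remedy = line[len(PREFIX):].partition(". ")
    return cause, remedy


# Assembled for illustration from the sparse-core row; the exact cause
# wording in the binary is not asserted here.
sample = (PREFIX + "bad TPU sparse core chips. Please remove the hosts from the "
          "fleet and restart the workload. If problem persists please contact "
          "Megascale XLA team.")
cause, remedy = parse_digest_line(sample)
print(cause)   # bad TPU sparse core chips
print(remedy)
```

Because the remedy is returned unmodified, two branches that differ by a single word compare unequal, which is the behaviour a faithful reimplementation needs.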

TPU embedding op-sequencing protocol (0xa009da1..0xa04698b)

~10 "Ensure that … before …" hints enforce the embedding-parameter op ordering (load_*, retrieve_*, enqueue_*, and infeed ops). They describe a strict op-protocol the JAX/TF-TPU user must follow; the underlying state machine is not traced here.

"Cannot retrieve embedding parameters … until the previous retrieve/load operation
 has completed. Ensure that the previous … ops have completed before …"

"TPU embedding infeeds cannot be run in the middle of retrieving/loading the embedding
 parameters. Please quiesce all infeeds before retrieving/loading parameters and restart
 the infeeds after … is complete. Ensure that the retrieve_/load_tpu_embedding_*_parameters
 ops have completed before the enqueue_tpu_embedding_*_batch ops are run."   (0xa046624 / 0xa046853)

"Embedding parameters can be loaded … only after the TPU system has been first initialized.
 Ensure that the load_… ops are run only after tpu.initialize_system() is complete."

Init / driver-lifecycle hints

| Offset | Hint (verbatim, abbreviated) |
| --- | --- |
| 0x9fe0d47 | "Failed to get tpu system. Please call tf.tpu.experimental.initialize_tpu_system() before running any ops on tpu." |
| 0x9fd4688 | "PjRtClient already exists for TPU. This probably means you have already implicitly initialized PJRT. … call tf.tpu.experimental.initialize_tpu_system() explicitly …" |
| 0xa094bb2 | "No TPU_SYSTEM device found. Please ensure that you're connected to a host with a TPU_SYSTEM device." |
| 0x9fe65a1 | "Failed to initialize TPU system, please contact Cloud TPU Support." |
| 0xa07188f | "WARNING: could not determine %s, please set env var `%s` manually, otherwise libtpu.so may not properly initialize." |
| 0xa1e3f95 | "Are you using the right LibTPU version? This LibTPU is built for <…>" |
| 0xa1a96ba | "executable is built for device %s of type "%s"; cannot run it on device %s of type "%s"" |
| 0xa0012c5 | "… DMA buffers were still outstanding when the driver was destroyed. … please ensure all buffers are destroyed before destroying driver objects." |
| 0xa079d96 | "TPU driver close was incomplete; kernel reported %d device handles outstanding. … Ensure that all userspace access to the device has ended before invoking driver close." |
| 0x9ffb075 | "Lost connection to the SliceBuilder controller task (normally worker task 0). Please check controller task status." |
| 0xa1fd28c | "Please ensure that you have only created one TPUEmbedding" |

Contact-team pointers

tfrt-devs@         alignment / allocation / topology-bounds bugs
barna-core-devs@   embedding hot-id-replication threadpool sizing
Shardy team / XLA  manual-axes / unreduced-axes sharding
Megascale XLA team persistent hang after host removal
Cloud TPU Support  TPU-system init failure (the one external channel)

The CPU-Feature Startup Guard (go/sigill-fail-fast)

A fatal pre-main guard that aborts if the host CPU lacks an ISA feature the build requires. Twelve variants at 0xbe7d460+, one per feature:

FATAL ERROR: This binary was compiled with <isa> enabled, but this feature is not
available on this processor (go/sigill-fail-fast).
   <isa> ∈ { aes, avx, mmx, pclmul, popcnt, sse, sse2, sse3,
             sse4.1, sse4.2, ssse3, cmpxchg16b }                 (12 variants)

NOTE — this is the absl CPU-feature startup guard from statically-linked Google code, not a TPU-specific path — it tells the operator their host CPU is too old for the build. It is included here because it is user-facing fatal prose with a go/ remedy link, but it is not a libtpu remedy. The earlier error-template catalog named 11 of the 12 variants; the twelfth, cmpxchg16b, is confirmed present at 0xbe7d460+ (CERTAIN).
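
An operator can check a host against the twelve features before loading the library. The sketch below parses /proc/cpuinfo-style text; the mapping from the guard's spellings to Linux cpuinfo flag names (for example sse3 → pni, cmpxchg16b → cx16) is an assumption of this example and is not recovered from the binary. `missing_features` is a hypothetical helper.

```python
# Guard spelling -> flag name as printed by Linux in /proc/cpuinfo (assumed).
CPUINFO_NAME = {
    "aes": "aes", "avx": "avx", "mmx": "mmx", "pclmul": "pclmulqdq",
    "popcnt": "popcnt", "sse": "sse", "sse2": "sse2", "sse3": "pni",
    "sse4.1": "sse4_1", "sse4.2": "sse4_2", "ssse3": "ssse3",
    "cmpxchg16b": "cx16",
}


def missing_features(cpuinfo_text: str):
    """Return the guarded features that the given cpuinfo text does not list."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return sorted(f for f, name in CPUINFO_NAME.items() if name not in flags)


# Synthetic cpuinfo excerpt for a host without AES and AVX.
old_host = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 pni ssse3 cx16 sse4_1 sse4_2 popcnt pclmulqdq\n"
print(missing_features(old_host))  # ['aes', 'avx']
```

On a Linux host the same function could be fed the contents of /proc/cpuinfo; an empty result means none of the twelve guards would be expected to fire.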


Not Traced

This page is the hint half of the diagnostic surface; several links to the rest of the surface are deliberately left open because they require disassembly this catalog did not do.

  • Hint ↔ error-template link. Hints sit adjacent to their parent error template in .rodata, but the concatenation is a .text absl::StrCat/StrFormat callsite. Which of the error-template surface each hint is appended to — and the resulting absl::StatusCode — is not traced.
  • Direction-of-default. Whether a flag-suggestion's advised value is the default or the workaround is a DefaultDebugOptions() / FLAGS_* ctor question, not byte-confirmed for every flag (only ~13 rodata-evidenced defaults are). The verb-based heuristic above is directional, not authoritative.
  • Log severity. The severity each hint is emitted at (WARNING / ERROR / FATAL) is set at the callsite and is not traced, except where the prose itself says "FATAL" / "Aborting" / "WARNING".
  • Megascale cause→remedy dispatch. The 8 hang-digest strings are recovered, but the Cause-enum → string switch is not traced.
  • go/<link> content. The links resolve to internal Google URLs that are not present in the binary; only the link tokens and the referencing prose are recovered.
  • go/<gen>-isa#anchor → counter map. The 514-occurrence ISA-doc-link family is summarised by gen-codename mapping only; the per-counter anchor table is a separate large extraction.

Cross-References

  • Error Templates — the raw error/status templates these hints are appended to; the fact, where the hint is the remedy.
  • Runtime Overview — where in the runtime the diagnostic surface sits.
  • Internal Pass Names — the pass/lowering names that appear in many "should not happen during TPU lowering" bug-report hints.
  • Flag Families (config) — the registered absl::Flag globals these hints name, with their help-text bodies and defaults.
  • XLA Flag Atlas (config) — the full --xla_tpu_* / --megascale_* knob catalog the flag-suggestion hints point to.