Environment Variables

All addresses on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build libtpu_lts_20260413_b_RC00, build-id md5 89edbbe81c5b328a958fe628a9f2207d, 781,691,048 bytes, ELF x86-64 DYN, not stripped; demangled C++ symbols quoted verbatim). Other versions differ.

Abstract

libtpu reads its configuration from two sources that look the same from a shell but are wired completely differently inside the .so. The first is a small, fixed set of environment variables that the TPU code reads directly through getenv("LITERAL_NAME") — these are the variables a reimplementer must hard-code, because the name string is baked into a specific reader function and there is no registry behind them. The second, and far larger, surface is the absl/XLA flag machinery: XLA_FLAGS, TF_XLA_FLAGS, and the --xla_* / --tpu_* flags injected through LIBTPU_INIT_ARGS are not read by getenv at all — they are parsed once at bootstrap by ParseCommandLineNonHelpFlags against the absl flag tables the dlopen constructor storm pre-registered. This page owns the direct-getenv catalog: every literal env-var name the binary passes to getenv, the function that reads it, and what the value does. The flag surface and its injection channel live elsewhere (see the cross-references).

The decisive fact for a reimplementer is the line between the two. A grep of .rodata finds hundreds of TPU_*, MEGASCALE_*, GRPC_*, and TF_* strings, but most are not env-var reads: TPU_CORE_TYPE_*, TPU_SEQUENCER_TYPE_*, and similar are enum-name strings; TPU_VISIBLE_DEVICES, TPU_HOST_BOUNDS, and TPU_SKIP_MDS_QUERY are absl flag names, bound by the flag parser, not by getenv; the MEGASCALE_* and GRPC_* families are flag/option strings consumed by the distributed-runtime and gRPC layers respectively. Only ~205 distinct strings reach a literal getenv() in the decompiled corpus, and of those only a handful are TPU-specific; the rest are inherited from vendored libraries (hwloc, libpfm, gRPC, GCS, the Google base/ init runtime, absl test harness). The catalog below separates the TPU-owned direct reads (CONFIRMED reader + address) from the flag-bound names (CONFIRMED string, parsed not getenv'd) so a reimplementer knows which to implement as getenv and which to implement as a flag.

A note on secure_getenv: despite the dispatch hardening elsewhere in the runtime, no literal-argument secure_getenv("NAME") call survives in the decompiled corpus — every confirmed env read is plain getenv. The bootstrap is not running with elevated privileges by the time it parses its environment, so the runtime does not bother to drop env access for setuid contexts.

| Item | Detail |
|---|---|
| Flag-injection channel | LIBTPU_INIT_ARGS, read by GetLibTpuInitArguments @ 0x20ccca20 (851 B) |
| Lock + topology gate reader | tensorflow::tpu::TryAcquireTpuLock @ 0x20ccbc40 (3531 B) |
| TfRT runtime selector | tpu::ShouldUseTfrt @ 0x1d0fc740 (getenv in lambda $_0 @ 0x1d0fc800) reads ENABLE_TFRT_TPU_RUNTIME |
| Megascale gate | PJRT_Client_Create @ 0xe6a8840 reads SKIP_MEGASCALE_PJRT_CLIENT |
| Uptime telemetry reader | InitializeUptimeMetricViaEnvironmentVariables @ 0x20a65720 reads TPU_ML_PLATFORM[_VERSION] |
| Premapped-buffer reader | TpuStatesManager::GetOrCreateTpuSystemState @ 0xf956e40 |
| XLA_FLAGS / TF_XLA_FLAGS | NOT getenv'd; parsed by the absl/tsl flag-from-env machinery |
| TPU_LIBRARY_PATH | set by the wheel's __init__.py, read by the framework loader, not by libtpu |
| Distinct literal getenv args (whole .so) | ~205 (mostly vendored: hwloc, libpfm, gRPC, GCS, Google base) |
| secure_getenv literal-arg sites | none observed |
| Confidence | CONFIRMED = literal getenv("NAME") in decompiled reader at the cited address |

How to Read This Catalog

Each row carries a Reader (the function and address that consumes the variable) and a Confidence that means a specific thing here:

  • CONFIRMED — the binary contains a literal getenv("NAME") (or the flag string for a flag-bound row) inside the cited function. The reimplementer should reproduce this exactly.
  • HIGH — the env-var string is present and its consuming subsystem is identified, but the precise reader call was inferred from the surrounding function rather than pinned to a single getenv line.
  • LOW — the string is present but whether it is read (vs. defined as a flag name, an enum name, or a value emitted into a log) was not resolved.

GOTCHA — a string in .rodata is not evidence of an env-var read. The single most common reimplementation error here is treating an absl flag name (e.g. TPU_VISIBLE_DEVICES) as a getenv target. libtpu binds those through the flag parser; calling getenv("TPU_VISIBLE_DEVICES") in a reimplementation would read a variable libtpu never reads. The catalog marks every flag-bound name explicitly.
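
The getenv-versus-flag split can be made mechanical in a reimplementation. The following is a minimal sketch, not the binary's own logic: the two sets hold a subset of the names this page catalogs, and `binding_for` is a hypothetical helper name introduced only for illustration.

```python
# Subset of the names catalogued on this page, grouped by how libtpu binds them.
# The helper name and the "unresolved" label are illustrative assumptions.
DIRECT_GETENV = {
    "LIBTPU_INIT_ARGS",
    "TPU_CHIPS_PER_HOST_BOUNDS",
    "TPU_CHIPS_PER_PROCESS_BOUNDS",
    "ALLOW_MULTIPLE_LIBTPU_LOAD",
    "TPU_LOAD_LIBRARY",
    "ENABLE_TFRT_TPU_RUNTIME",
    "SKIP_MEGASCALE_PJRT_CLIENT",
    "TPU_PREMAPPED_BUFFER_SIZE",
    "TPU_PREMAPPED_BUFFER_TRANSFER_THRESHOLD_BYTES",
}

FLAG_BOUND = {
    "TPU_VISIBLE_DEVICES",
    "TPU_VISIBLE_CHIPS",
    "TPU_VISIBLE_DEVICE_PATHS",
    "TPU_HOST_BOUNDS",
    "TPU_SKIP_MDS_QUERY",
    "TPU_ACCELERATOR_TYPE",
}


def binding_for(name: str) -> str:
    """Return how a reimplementation should bind `name`."""
    if name in DIRECT_GETENV:
        return "getenv"
    if name in FLAG_BOUND:
        return "flag"
    # A string that is merely present in .rodata proves nothing by itself.
    return "unresolved"
```

Keeping the two sets disjoint and explicit makes the GOTCHA above hard to reintroduce: a name that is not in either set has to be resolved before it is wired up.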


1. Flag Injection

The primary configuration channel. A plugin .so has no command line of its own, so libtpu fabricates one from LIBTPU_INIT_ARGS and feeds it to the absl flag parser. The mechanics of the ingest (the space split, the argv[0] synthesis, the vector<string> to char** flatten) are documented in full on ../lifecycle/tftpu-initialize-bootstrap.md §2; the flags that string carries are catalogued on xla-flag-atlas.md. This page records only the env-var reads themselves.

| Variable | Reader (symbol @ addr) | Effect | Default |
|---|---|---|---|
| LIBTPU_INIT_ARGS | tensorflow::tpu::GetLibTpuInitArguments @ 0x20ccca20 | Read by literal getenv; the value is space-split (on ASCII 0x20) into a vector<string> and a parallel vector<char const*> argv, which the function returns. The argv[0] synthesis, prepend of Cloud-TPU defaults, and the ParseCommandLineNonHelpFlags call happen in the bootstrap caller, not in this function. The injection point for every --xla_* / --tpu_* flag. | unset → empty argv (no injected flags) |
| XLA_FLAGS | absl/tsl ParseFlagsFromEnvAndDieIfUnknown (string @ .rodata, not getenv'd by TPU code) | Standard XLA flag-from-env channel; merged with the LIBTPU_INIT_ARGS argv into the same absl flag tables. | unset → no env flags |
| TF_XLA_FLAGS | xla::ParseFlagsFromEnvAndDieIfUnknown("TF_XLA_FLAGS", …) (in AllocateAndParseFlags @ 0xfe5fe80, not getenv'd) | TensorFlow-bridge XLA flag channel; same flag tables. | unset |

NOTE — LIBTPU_INIT_ARGS is the only TPU-specific variable in this group read by a literal getenv. XLA_FLAGS / TF_XLA_FLAGS are read by the generic flag-from-env code (the ParseFlagsFromEnvAndDieIfUnknown path cited in the table above), so they are environment variables in effect but are not getenv'd at any TPU call site. A reimplementer must wire all three into one flag parse, but only LIBTPU_INIT_ARGS is a hand-rolled getenv in the TPU layer.
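
The LIBTPU_INIT_ARGS read described above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the `argv0` placeholder stands in for a synthesized value this page does not state, the Cloud-TPU default prepend is omitted, and dropping empty tokens from repeated spaces is an assumption rather than confirmed behavior.

```python
def split_init_args(value):
    """Split a LIBTPU_INIT_ARGS value on ASCII 0x20 only (tabs are not separators)."""
    if not value:
        # Unset or empty: no injected flags.
        return []
    # Assumption: runs of spaces yield no empty tokens.
    return [token for token in value.split(" ") if token]


def build_argv(env, argv0="libtpu"):
    """Assemble the argv a bootstrap caller would hand to the flag parser.

    `argv0` is a placeholder; the real synthesized argv[0] is not given here.
    """
    return [argv0] + split_init_args(env.get("LIBTPU_INIT_ARGS"))
```

For example, `build_argv({"LIBTPU_INIT_ARGS": "--xla_dump_to=/tmp/hlo"})` yields a two-element argv, while an empty environment yields only the placeholder `argv[0]`.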


2. Device Selection and Topology

These variables describe the slice geometry the process should see. The split here is sharp and easy to get wrong: the bounds variables are read directly by getenv inside the lock gate, while the visible-device selectors are absl flag names.

Direct getenv reads

| Variable | Reader (symbol @ addr) | Effect | Default |
|---|---|---|---|
| TPU_CHIPS_PER_HOST_BOUNDS | tensorflow::tpu::TryAcquireTpuLock @ 0x20ccbc40 | Per-host chip grid (x,y,z) used when computing the device lock and the local topology footprint. Read alongside the lock acquisition. | unset → derived from detected hardware |
| TPU_CHIPS_PER_PROCESS_BOUNDS | tensorflow::tpu::TryAcquireTpuLock @ 0x20ccbc40 | Per-process chip grid; bounds the chips this process claims, narrowing the host bounds for multi-process-per-host layouts. | unset → equals host bounds |
| ALLOW_MULTIPLE_LIBTPU_LOAD | tensorflow::tpu::TryAcquireTpuLock @ 0x20ccbc40 | When set, relaxes the single-loader cross-process lock so more than one libtpu instance may bind the device. | unset → single-load enforced |
| TPU_LOAD_LIBRARY | tensorflow::tpu::TryAcquireTpuLock @ 0x20ccbc40 | Gates whether the cross-process TPU lock is even attempted (see ../lifecycle/tftpu-initialize-bootstrap.md §5). | unset → lock attempted |

Flag-bound names (NOT getenv'd)

| Name | Where bound | Effect |
|---|---|---|
| TPU_VISIBLE_DEVICES | absl flag table | Comma-separated device index allow-list; consumed by the device enumerator via the flag parser, not getenv. |
| TPU_VISIBLE_CHIPS | absl flag table | Chip-level visibility allow-list (newer spelling alongside TPU_VISIBLE_DEVICES). |
| TPU_VISIBLE_DEVICE_PATHS | absl flag table | Explicit device node paths to bind. |
| TPU_HOST_BOUNDS | absl flag table | Host grid (x,y,z); the flag form, distinct from the getenv'd TPU_CHIPS_PER_HOST_BOUNDS. |
| TPU_SKIP_MDS_QUERY | absl flag table | Skips the metadata-server topology query, forcing reliance on the locally supplied bounds/topology. |
| TPU_TOPOLOGY_WRAP / TPU_TOPOLOGY_ALT | absl flag table | Torus wrap mode / alternate topology selector for the slice geometry. |
| TPU_MEGACORE | absl flag table | Megacore pairing mode for the chip generation. |
| TPU_ACCELERATOR_TYPE | absl flag table | Accelerator-type label (e.g. v5e-4), mapped to a topology by the PJRT topology name table. |

QUIRK — there are two host-bounds knobs with different plumbing. TPU_CHIPS_PER_HOST_BOUNDS is a real getenv read inside TryAcquireTpuLock; TPU_HOST_BOUNDS is a flag name parsed by absl. They overlap in meaning but reach the runtime by different paths — a reimplementer must reproduce the first as getenv and the second as a flag, or one of the two will silently do nothing.
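
The defaulting chain for the two getenv'd bounds variables can be sketched as below. This is a sketch under stated assumptions: the value format is taken to be a comma-separated `x,y,z` integer triple (the page gives the grid shape, not the exact syntax), and both function names are hypothetical.

```python
def parse_bounds(text):
    """Parse an assumed 'x,y,z' string into a 3-tuple of ints."""
    parts = text.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated integers, got {text!r}")
    return tuple(int(part) for part in parts)


def chip_bounds(env, detected_host_bounds):
    """Return (host_bounds, process_bounds) following the catalog's defaults.

    `detected_host_bounds` stands in for whatever hardware detection yields.
    """
    host_raw = env.get("TPU_CHIPS_PER_HOST_BOUNDS")
    # unset → derived from detected hardware
    host = parse_bounds(host_raw) if host_raw else detected_host_bounds

    process_raw = env.get("TPU_CHIPS_PER_PROCESS_BOUNDS")
    # unset → equals host bounds
    process = parse_bounds(process_raw) if process_raw else host
    return host, process
```

Note that TPU_HOST_BOUNDS does not appear here at all: it belongs to the flag parse, not to this getenv path.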


3. Runtime Mode and Buffer Tuning

Direct getenv reads that select the runtime backend and tune transfer buffers.

| Variable | Reader (symbol @ addr) | Effect | Default |
|---|---|---|---|
| ENABLE_TFRT_TPU_RUNTIME | tpu::ShouldUseTfrt @ 0x1d0fc740 (getenv in lambda $_0 @ 0x1d0fc800) | Selects the TfRT-based TPU runtime path over the legacy StreamExecutor path. Read once by the ShouldUseTfrt predicate. | unset → backend default |
| SKIP_MEGASCALE_PJRT_CLIENT | pjrt::tpu_plugin::PJRT_Client_Create @ 0xe6a8840 | When set, PJRT_Client_Create (PJRT slot 15) bypasses wrapping the client in the Megascale multi-slice client and returns the single-slice client directly. | unset → Megascale client built when multi-slice config present |
| TPU_PREMAPPED_BUFFER_SIZE | xla::TpuStatesManager::GetOrCreateTpuSystemState @ 0xf956e40 | Size (bytes) of the pre-mapped DMA staging buffer reserved per TPU system. | unset → runtime-chosen size |
| TPU_PREMAPPED_BUFFER_TRANSFER_THRESHOLD_BYTES | xla::TpuStatesManager::GetOrCreateTpuSystemState @ 0xf956e40 | Transfer-size threshold above which the pre-mapped buffer path is used instead of per-transfer mapping. | unset → runtime default threshold |
| DISABLE_HOST_SEND_RECV_REGISTRATION | _GLOBAL__sub_I_sendrecv_ops.cc @ 0x212c9af0 (static ctor) | Suppresses registration of the host-side send/recv ops at module-init time. Read in a file-static constructor during the dlopen storm. | unset → host send/recv registered |
| PJRT_NPROC | xla::DefaultThreadPoolSize @ 0x1d7f4800 | Process count used to size the default thread pool; falls back to NPROC when unset. Read by literal getenv. | unset → falls back to NPROC, then a derived size |
| CLOUD_TPU_TASK_ID | tpu::TpuHal::GetTaskId @ 0x1e8142c0 | This process's task index within the Cloud-TPU job; required for multi-host jobs (the reader errors with "'CLOUD_TPU_TASK_ID' not specified for a multi-host job." if absent and one is needed). Read by literal getenv. | unset → single-host / derived |
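
Two of the reads in this section have a fallback or failure path worth spelling out. The sketch below is illustrative only: the function names are hypothetical, integer parsing of the values is assumed, `os.cpu_count()` stands in for the unspecified "derived size", and returning 0 for a single-host job is an assumption about the "single-host / derived" default.

```python
import os


def default_thread_pool_size(env, derived=None):
    """PJRT_NPROC first, then NPROC, then a derived size."""
    for name in ("PJRT_NPROC", "NPROC"):
        raw = env.get(name)
        if raw:
            return int(raw)
    # Assumption: the derived size is approximated by the CPU count.
    return derived if derived is not None else (os.cpu_count() or 1)


def task_id(env, multi_host):
    """Return this process's task index, failing as the catalog describes."""
    raw = env.get("CLOUD_TPU_TASK_ID")
    if raw is None:
        if multi_host:
            # Error text quoted from the catalog row above.
            raise RuntimeError(
                "'CLOUD_TPU_TASK_ID' not specified for a multi-host job."
            )
        return 0  # Assumption: a single-host job behaves as task 0.
    return int(raw)
```

Passing the environment in as a mapping (rather than reading `os.environ` inside the functions) keeps the fallback order easy to test without mutating the process environment.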

4. Distributed / Megascale and Coordination

The multi-slice (Megascale) layer is configured almost entirely by flag-bound MEGASCALE_* strings, not by getenv. The one direct getenv in the Megascale path is the bypass switch in §3 (SKIP_MEGASCALE_PJRT_CLIENT). The coordination/topology exchange that consumes these is documented on ../megascale/bootstrap/overview.md.

| Name | Consumer | Effect |
|---|---|---|
| MEGASCALE_COORDINATOR_ADDRESS | Megascale bootstrap (flag/option) | Address of the coordination service used for cross-slice rendezvous and coordinator election. |
| MEGASCALE_NUM_SLICES | Megascale bootstrap | Total slice count in the multi-slice job; drives barrier and topology-exchange sizing. |
| MEGASCALE_SLICE_ID | Megascale bootstrap | This process's slice index. |
| MEGASCALE_PORT / MEGASCALE_DEBUG_PORT | Megascale transport | Service / debug listen ports for the inter-slice transport. |
| MEGASCALE_TRANSPORT_TYPE | Megascale transport | Selects the cross-slice transport (e.g. gRPC vs. DCN). |
| MEGASCALE_TOPOLOGY | Megascale bootstrap | Multi-slice topology descriptor. |
| MEGASCALE_AUTHENTICATION | Megascale transport | Auth mode for the coordination channel. |
| MEGASCALE_TRACING / MEGASCALE_GRPC_ENABLE_XOR_TRACER | Megascale tracing | Enables Megascale request tracing / the gRPC XOR tracer. |
| TPU_WORKER_ID | distributed init (flag/string) | Worker index within the distributed TPU job. |
| TPU_WORKER_HOSTNAMES | distributed init (flag/string) | Comma-separated worker hostnames for the job. |
| TF_TASK_ID / TF_JOB_NAME | getenv (TF distributed identity) | Task index / job name in a TensorFlow distributed setup. Read by literal getenv. |

NOTE — the MEGASCALE_* family is consumed by the multi-slice runtime through its options/flag parsing rather than discrete getenv calls, which is why none appear in the literal-getenv set even though all ten strings are present in .rodata. The bypass (SKIP_MEGASCALE_PJRT_CLIENT) is the exception: it is a genuine getenv inside PJRT_Client_Create, because it must short-circuit before any Megascale option parsing happens.
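
The ordering described in the note (the bypass is checked before any Megascale option is consulted) can be sketched as below. This is a minimal model, not the PJRT_Client_Create implementation: the function name and the string return values are hypothetical, and treating mere presence of the variable as "set" is an assumption, since the page does not say which values count.

```python
def select_client(env, multi_slice_config_present):
    """Decide which client a PJRT_Client_Create-like entry point returns."""
    # The bypass is a direct environment read and is checked first, before
    # any Megascale option or flag would be looked at.
    # Assumption: presence alone counts as "set".
    if env.get("SKIP_MEGASCALE_PJRT_CLIENT") is not None:
        return "single-slice"
    # Only now does the multi-slice configuration matter.
    if multi_slice_config_present:
        return "megascale"
    return "single-slice"
```

With this ordering, setting the bypass wins even when a complete multi-slice configuration is present.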


5. Profiling, Dump, and Telemetry

A mix of direct getenv reads (telemetry platform labels, TF graph dumps) and flag-bound dump directives. The XLA_FLAGS-driven HLO dump knobs (--xla_dump_to, etc.) are not env vars — they are flags carried through the LIBTPU_INIT_ARGS channel (see xla-flag-atlas.md).

| Variable | Reader / Consumer | Effect |
|---|---|---|
| TPU_ML_PLATFORM | libtpu::telemetry::InitializeUptimeMetricViaEnvironmentVariables @ 0x20a65720 | ML-platform label (e.g. the framework name) stamped onto the uptime/runtime telemetry gauge. |
| TPU_ML_PLATFORM_VERSION | libtpu::telemetry::InitializeUptimeMetricViaEnvironmentVariables @ 0x20a65720 | Platform version string for the same telemetry gauge. |
| TF_DUMP_GRAPH_PREFIX | getenv (TF graph dump) | Directory prefix for TensorFlow graph dumps. Read by literal getenv (multiple sites). |
| TF_DUMP_GRAPH_NAME_FILTER / _GROUPS / _WRAPPED | getenv (TF graph dump) | Name filter / grouping / wrap controls for graph dumps. Read by literal getenv (DebugDataDumper::LoadEnvvars). |
| TF_DUMP_GRAPH_FMT | tsl::ReadStringFromEnvVar (GetDumpGraphFormatLowerCase @ 0x10d8cf60) | Output format for graph dumps (default "TXT"). Read through the tsl env-var helper, not a literal getenv. |
| TF_GRAPH_TO_HLO_COMPILER_DUMP_DIR | getenv | Dump directory for the graph→HLO compiler. |
| TF_LOG_XLA_ACTIVITY | getenv | Enables XLA activity logging. |
| MLIR_CRASH_REPRODUCER_DIRECTORY | getenv (MLIR) | Directory for MLIR crash reproducers. |
| MLIR_BRIDGE_LOG_ENABLE_ONLY_TOP_LEVEL_PASSES | getenv (MLIR) | Restricts MLIR bridge logging to top-level passes. |
| XPROF_SKIP_DROP_EXCESS_XPLANE_BYTES | getenv (xprof) | Profiler XPlane byte-budget control. |
| TPU_CORE_DUMP_DIRECTORY | flag/option (string present) | Directory for TPU core dumps. |
| TPU_LOG_DIR / TPU_MAX_LOG_SIZE_MB | flag/option (string present) | TPU log directory / size cap. |
| TPU_VMODULE / TPU_VLOG_LEVEL / TPU_STDERR_LOG_LEVEL | flag/option (string present) | Per-module / global / stderr verbose-logging levels. |
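
The TF_DUMP_GRAPH_FMT row is the one read in this table that goes through a helper with a default rather than a bare getenv. A minimal sketch of that shape, assuming the helper substitutes the default only when the variable is absent and that the result is lowercased as the reader's name suggests (the Python function name is hypothetical):

```python
def dump_graph_format_lower(env):
    """Read TF_DUMP_GRAPH_FMT with the catalog's default ("TXT"), lowercased."""
    # Mirrors a read-with-default helper: absent → "TXT".
    return env.get("TF_DUMP_GRAPH_FMT", "TXT").lower()
```

A reimplementation that uses a bare getenv here has to supply the "TXT" default itself, which is the practical difference from the other rows.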

6. Inherited (Non-TPU) Environment

libtpu statically links a large stack of Google and third-party libraries, each of which reads its own environment. These are present and functional in the loaded .so but are not part of the TPU configuration surface a reimplementer of the TPU runtime needs to reproduce. They are catalogued by family rather than enumerated, because the list runs to ~180 distinct strings and reproducing them is reproducing those libraries, not libtpu.

| Family | Example variables | Origin |
|---|---|---|
| Process bring-up | GOOGLE_LOG_DIR, GOOGLE_STDERRTHRESHOLD, GOOGLE_MLOCK_HINT, GOOGLE_MAX_LOG_MB, GOOGLE_DEBUG_ON_FAILURE | Google base/ init runtime (runs inside RealInitGoogle) |
| Topology discovery | HWLOC_* (~50 vars: HWLOC_XMLFILE, HWLOC_COMPONENTS, HWLOC_FSROOT, …) | vendored hwloc |
| PMU / profiling | LIBPFM_*, CPUPROFILE_*, FREQUENCY, JITDUMPDIR | vendored libpfm / CPU profiler |
| gRPC | GRPC_* (large family) | vendored gRPC |
| Cloud storage | GCS_* (~30 vars), GCE_METADATA_HOST, GOOGLE_APPLICATION_CREDENTIALS, NO_GCE_CHECK | TF/GCS filesystem |
| Symbolization | LLVM_SYMBOLIZER_PATH, LLVM_DISABLE_SYMBOLIZATION, LLVM_OVERRIDE_PRODUCER | vendored LLVM |
| Accelerator paths | CUDA_HOME / CUDA_PATH / CUDA_ROOT, ROCM_HOME / ROCM_PATH / ROCM_ROOT | XLA host probes (no effect on the TPU path) |
| Test harness | TEST_TMPDIR, TEST_SRCDIR, TEST_UNDECLARED_OUTPUTS_DIR, UNITTEST_ON_BORG, XML_OUTPUT_FILE | Google test runtime (dormant in production) |
| Standard POSIX | HOME, PATH, PWD, TMPDIR / TMP / TEMP, TZ / TZDIR | libc / TF utilities |

NOTE — XLA_ALLOW_GET_DEFAULT_PLATFORM is a genuine literal getenv read (in the XLA platform-manager layer), but it gates XLA's default-platform fallback, not anything TPU-specific. It is listed here rather than in §2 because a TPU-only reimplementation never reaches the code that consults it.


7. The TPU_LIBRARY_PATH Special Case

TPU_LIBRARY_PATH is the one variable users most associate with libtpu, yet libtpu.so itself does not getenv it. The wheel's Python wrapper sets it:

# libtpu/__init__.py (from the wheel)
if not os.environ.get('TPU_LIBRARY_PATH'):
    os.environ['TPU_LIBRARY_PATH'] = get_library_path()   # path to this libtpu.so

The variable is then read by the framework loader (JAX/PJRT's plugin-discovery code in the host process) to locate the .so to dlopen. By the time libtpu's own code runs, the path has already done its job. A reimplementer of libtpu does not implement TPU_LIBRARY_PATH; a reimplementer of the plugin loader does. The two confirmed references to TPU_LIBRARY_PATH are the read and the assignment in this Python snippet, not a getenv inside the binary.

QUIRK — the wheel ships TPU_LIBRARY_PATH as a set-if-unset in __init__.py, so importing libtpu is what wires the framework to the bundled .so. Setting TPU_LIBRARY_PATH in the environment before import overrides which libtpu.so is loaded — the bundled __init__.py honors a pre-existing value. This is the override hook for pointing JAX at a custom libtpu build.
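
The set-if-unset behavior can be isolated into a small function to make the override semantics concrete. This is a sketch modeled on the wheel snippet quoted earlier in this section; the function name and the example paths are hypothetical, and it operates on a plain mapping rather than `os.environ`.

```python
def resolve_library_path(env, bundled_path):
    """Apply the wheel's set-if-unset rule to a mapping and return the result."""
    # `not env.get(...)` treats both a missing key and an empty string as unset,
    # which is stricter than dict.setdefault (setdefault keeps an empty string).
    if not env.get("TPU_LIBRARY_PATH"):
        env["TPU_LIBRARY_PATH"] = bundled_path
    return env["TPU_LIBRARY_PATH"]
```

One consequence of the truthiness check is that exporting an empty TPU_LIBRARY_PATH does not disable the bundled library; it is treated the same as leaving the variable unset.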


Cross-References

  • overview.md — the configuration-surface map: flags, knobs, the compilation environment, and where env vars sit among them
  • ../lifecycle/tftpu-initialize-bootstrap.md — the LIBTPU_INIT_ARGS ingest mechanics (space split, argv[0] synthesis, the flag parse) and the TPU_LOAD_LIBRARY lock gate
  • xla-flag-atlas.md — the --xla_* / --tpu_* flags that LIBTPU_INIT_ARGS and XLA_FLAGS actually inject
  • flag-families.md — how the flag names are grouped and prefix-dispatched once parsed
  • ../megascale/bootstrap/overview.md — the multi-slice bootstrap that consumes the MEGASCALE_* options and is gated by SKIP_MEGASCALE_PJRT_CLIENT