Reconstructed-Proto Index

Every message name, field count, and descriptor VA on this page was read from the protodesc_cold descriptor pool of libtpu.so in the libtpu-0.0.40-cp314 wheel (BuildID md5 89edbbe81c5b328a958fe628a9f2207d, wheel 0.0.40, not stripped). Field numbers are stable wire identifiers; other versions may add fields but will not renumber these. Field counts and descriptor addresses may differ in other versions.

Abstract

libtpu.so embeds its entire protobuf type universe as serialized google.protobuf.FileDescriptorProto records in the protodesc_cold section (VA 0xbe8af30, size 0x334180 = 3.20 MiB). The runtime rebuilds this descriptor pool at startup, calling DescriptorPool::BuildFile() for each generated _pbz_… symbol; the reflection layer then serves every message by name. The full enumeration — 760 file records, ~8 021 messages, ~2 031 enums — is catalogued on the sibling protodesc-cold-catalog page. This page is the narrower, higher-value index: of the messages in those 760 file descriptors, which ones has this wiki reconstructed field-by-field, and on which page does each reconstruction live.

A "reconstruction" here means the message schema was decoded from the descriptor bytes to reimplementation grade — field number → name → label → wire type → nested-type structure → oneof grouping — sufficient that a reader could regenerate the .proto and round-trip the wire format. The index distinguishes that from mere cataloguing (the sibling page lists all 760 by name/size/syntax but does not field-dump most of them). Every row carries the descriptor VA so a verifier can re-parse the source bytes with FileDescriptorProto.ParseFromString, and a Confidence label so a reimplementer knows which schemas to trust verbatim.

The reconstructed set clusters into five families, each owned by a deep page: the XLA HLO serialization core (the front-end ↔ TPU-backend contract: HloModuleProto / HloInstructionProto / HloComputationProto and the xla_data.proto type universe they ride on); the compiler config protos (DebugOptions, 290 fields; TpuCompilationEnvironment, 1121 fields); the TPU executable + topology protos (TpuCoreProgramProto, TpuChipPartsProto, the shared-memory locators); the runtime / collective protos (the Megascale graph, tpu_telemetry, the ContinuationQueue config, the toroidal route-cache codec); and the profiler protos (the XPlane hierarchy, the xprof Task record). The grouped detail sections below give per-family field counts, descriptor addresses, and owning pages.

For a reader using this index, the contract is:

  • Message identity: the fully-qualified proto message name as it appears as a descriptor string in the pool (verifiable by rg over the binary's strings).
  • Field count: the number of declared fields in this build's descriptor (not an upstream .proto you might find elsewhere; the count is binary-truth and can disagree with public XLA).
  • Owning page: the deep page that carries the field-by-field reconstruction; this index never duplicates the field roster, it points to it.
  • Descriptor VA: the protodesc_cold address of the enclosing FileDescriptorProto, so the schema can be re-derived from source bytes.
  • Confidence: CERTAIN (descriptor fully decoded + cross-checked against an owning page and the binary), HIGH (descriptor decoded, owning page authored), MEDIUM (decoded but partial field-dump on the owning page).

| Item | Value |
| --- | --- |
| Descriptor pool | protodesc_cold @ VA 0xbe8af30, size 0x334180 (3.20 MiB) |
| Records in pool | 760 FileDescriptorProto; ~8 021 messages, ~2 031 enums |
| Reconstructed (this index) | ~30 messages across 5 families, ~14 owning pages |
| Full enumeration | protodesc-cold-catalog: all 760 by name/size/syntax |
| Headline counts | DebugOptions 290 fields · TpuCompilationEnvironment 1121 fields |
| Re-parse path | FileDescriptorProto.ParseFromString(bytes_at_VA) |
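
The re-parse path can also be followed without the protobuf runtime. The sketch below is a minimal, dependency-free wire-format walker that counts the declared fields of each top-level message in one serialized FileDescriptorProto. It assumes the record bytes have already been cut out of the binary and are plain (uncompressed) descriptor bytes; the VA-to-file-offset translation is not shown. The function names are illustrative, not anything found in libtpu.so. The field numbers used (message_type = 4 in FileDescriptorProto, name = 1 and field = 2 in DescriptorProto) come from the public descriptor.proto.

```python
from typing import Dict, Iterator, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, object]]:
    """Yield (field_number, wire_type, value) for each field in buf."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed64
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:      # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:      # fixed32
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, wire_type, value


def field_counts(file_descriptor: bytes) -> Dict[str, int]:
    """Map each top-level message name to its declared field count."""
    counts: Dict[str, int] = {}
    for number, wire_type, value in iter_fields(file_descriptor):
        if number == 4 and wire_type == 2:        # FileDescriptorProto.message_type
            name, n_fields = "", 0
            for sub_no, sub_wt, sub_val in iter_fields(value):
                if sub_no == 1 and sub_wt == 2:   # DescriptorProto.name
                    name = sub_val.decode("utf-8")
                elif sub_no == 2 and sub_wt == 2: # DescriptorProto.field
                    n_fields += 1
            counts[name] = n_fields
    return counts


def length_delimited(number: int, payload: bytes) -> bytes:
    """Encode a short (< 128 byte) length-delimited field, for the demo only."""
    return bytes([(number << 3) | 2, len(payload)]) + payload


# Synthetic record: file "demo.proto" declaring message Demo with two fields.
demo_message = (length_delimited(1, b"Demo")
                + length_delimited(2, length_delimited(1, b"a"))
                + length_delimited(2, length_delimited(1, b"b")))
demo_file = length_delimited(1, b"demo.proto") + length_delimited(4, demo_message)
print(field_counts(demo_file))  # {'Demo': 2}
```

The walker counts only fields declared directly on each top-level message, which matches how the Fields column in this index is defined; nested messages would need a recursive pass over DescriptorProto field 3 (nested_type).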

The Reconstructed-Proto Index

One row per message this wiki has reconstructed field-by-field. The Fields column is this build's descriptor count; the Owning page is where the field roster lives; the Descriptor VA is the enclosing FileDescriptorProto in protodesc_cold. Descriptor VAs for the HLO core (hlo.proto, xla_data.proto, xla.proto) are from the file-record map; the per-message VA is the VA of the file that declares it.

| Message | Fields | Owning page | Descriptor VA (file) |
| --- | --- | --- | --- |
| xla.HloModuleProto | 25 | compiler/hlo-ingestion | 0xc189a60 (hlo.proto) |
| xla.HloComputationProto | 8 | compiler/hlo-ingestion | 0xc189a60 (hlo.proto) |
| xla.HloInstructionProto | 70 | compiler/hlo-ingestion | 0xc189a60 (hlo.proto) |
| xla.HloScheduleProto | 1 | compiler/hlo-ingestion | 0xc189a60 (hlo.proto) |
| xla.HloProto | 2 | compiler/hlo-ingestion | 0xc189a60 (hlo.proto) |
| xla.ShapeProto | 5 | compiler/hlo-ingestion | 0xc1b7e20 (xla_data.proto) |
| xla.OpSharding | 14 | compiler/hlo-ingestion | 0xc1b7e20 (xla_data.proto) |
| xla.OpMetadata | 16 | compiler/hlo-ingestion | 0xc1b7e20 (xla_data.proto) |
| xla.HloModuleConfigProto | 42 | compiler/hlo-ingestion | 0xc021470 (xla.proto) |
| xla.DebugOptions | 290 | config/debugoptions-proto | 0xc021470 (xla.proto) |
| xla::jellyfish.TpuCompilationEnvironment | 1121 | config/tpu-compilation-environment | (tpu_compilation_environment.proto) |
| tpu.TpuCoreProgramProto | 11 | compiler/tpu-program-serialization | (tpu_core_program.proto) |
| tpu.TpuSequencerProgramProto | 5 | compiler/tpu-program-serialization | (tpu_sequencer_program.proto) |
| tpu.TpuChipPartsProto | 9 | targets/chip-parts-binarypb | (tpu_chip_parts.proto) |
| tpu.TpuSharedMemoryOnChipProto | 2 | dma/continuation-queue | (tpu_chip_parts_locators.proto) |
| TpuChipConfigProto.ContinuationQueue | 7+ | dma/continuation-queue | (tpu_chip_config.proto) |
| …tpu_telemetry.CurrentCoreStateSummary | 7 | profiling/tpu-telemetry-proto | (tpu_telemetry.proto) |
| …tpu_telemetry.AllCoreStateSummaries | 1 | profiling/tpu-telemetry-proto | (tpu_telemetry.proto) |
| xla.megascale.runtime.MegaScaleInfoProto | 8 | megascale/overview | (megascale_info.proto) |
| xla.megascale.runtime.MegaScaleRuntimeError | 13 | megascale/error-aggregator | (megascale_status.proto) |
| xla.megascale.runtime.RapidEyeErrorDigestProto | 17 | megascale/error-aggregator | 0xc169340 (rapideye_logging.proto) |
| CompressedToroidalRouteCache | | routing/toroidal-route-cache | (route_cache.proto) |
| tensorflow.profiler.Task | 18 | profiling/task-proto | 0xbe999a0 (task.proto) |
| tensorflow.profiler.XPlane | 6 | profiling/xplane-xstat-traceme | (xplane.proto) |
| tensorflow.profiler.XEvent | 4 | profiling/trace-entry-to-xevent | (xplane.proto) |

NOTE — the Fields column is the count of top-level declared fields on that one message, not the file's total. HloModuleProto declares 25 fields directly; HloInstructionProto is the wide "union of all op attributes" at 70 fields; the file hlo.proto as a whole carries 22 messages. The file totals are on the protodesc-cold-catalog sibling.

GOTCHA — these counts are binary-truth for this build, and several disagree with the public XLA .proto you would find upstream. DebugOptions here has 290 live fields with a max field number of 501 (211 numbering gaps for retired GPU/CPU flags). Drive a reimplementation off the descriptor count, never off an upstream .proto.
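
The gap figure is simple arithmetic over the descriptor's field numbers. A minimal sketch, assuming field numbering starts at 1 and every live field holds a distinct number (both function names are illustrative):

```python
from typing import Iterable, List


def numbering_gaps(live_fields: int, max_field_number: int) -> int:
    """Count unused field numbers in the range 1..max_field_number."""
    return max_field_number - live_fields


def missing_numbers(field_numbers: Iterable[int]) -> List[int]:
    """List the unused numbers, given the field numbers read from a descriptor."""
    used = set(field_numbers)
    return [n for n in range(1, max(used) + 1) if n not in used]


print(numbering_gaps(290, 501))          # 211, the DebugOptions figure above
print(missing_numbers([1, 2, 4, 7]))     # [3, 5, 6]
```

The same subtraction applied to the TpuCompilationEnvironment figures on this page (1121 fields, max field number 1218) gives 97 unused numbers.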


Family 1 — XLA HLO Serialization Core

The stable contract between the JAX/TF/PyTorch front-ends and the TPU JellyfishCompiler. Three FileDescriptorProto records carry the entire schema, with two supporting records closing the type graph. The dependency chain is xla_data.proto → hlo.proto → xla.proto, so the universe is self-contained inside these records.

| File | Descriptor VA | Size (bytes) | Msgs | Enums | Role |
| --- | --- | --- | --- | --- | --- |
| …/service/hlo.proto | 0xc189a60 | 10 242 | 22 | 5 | graph spine: module / computation / instruction |
| …/xla_data.proto | 0xc1b7e20 | 11 143 | 50 | 12 | type universe: shapes, layouts, dtypes, sharding |
| …/xla.proto | 0xc021470 | 39 523 | 13 | 0 | config layer: HloModuleConfigProto, DebugOptions |
| …/service/metrics.proto | | | | | HloPassMetadata.kv_metrics source |
| …/hlo_profile_printer_data.proto | | | | | per-instruction profile catalog |

The graph spine is an id-edge DAG, not a tree: HloModuleProto → repeated HloComputationProto → repeated HloInstructionProto, where every data edge is an int64 operand_ids reference into the sibling instruction list and every call edge is an int64 called_computation_ids reference into the module's computation list. root_id per computation names the output. This index representation survives serialization without pointer fixups — the reason the proto is a clean DAG-by-index.
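
The id-edge model described above can be sketched with plain dataclasses. This is a toy stand-in, not the generated proto classes: only the field names operand_ids, called_computation_ids, and root_id come from the schema, and the traversal helper is hypothetical. It resolves operand ids against the sibling instruction list and returns an order in which every operand precedes its user.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instruction:
    id: int
    opcode: str
    operand_ids: List[int] = field(default_factory=list)
    called_computation_ids: List[int] = field(default_factory=list)


@dataclass
class Computation:
    id: int
    root_id: int
    instructions: List[Instruction] = field(default_factory=list)


def topological_order(comp: Computation) -> List[int]:
    """Return instruction ids reachable from root_id, operands before users."""
    by_id: Dict[int, Instruction] = {inst.id: inst for inst in comp.instructions}
    state: Dict[int, str] = {}
    order: List[int] = []

    def visit(inst_id: int) -> None:
        if state.get(inst_id) == "done":
            return
        if state.get(inst_id) == "active":
            raise ValueError(f"cycle through instruction {inst_id}")
        if inst_id not in by_id:
            raise KeyError(f"dangling operand id {inst_id}")
        state[inst_id] = "active"
        for operand_id in by_id[inst_id].operand_ids:
            visit(operand_id)
        state[inst_id] = "done"
        order.append(inst_id)

    visit(comp.root_id)
    return order


# Instruction 1 feeds both 3 and 4, so the graph is a DAG rather than a tree.
demo = Computation(id=0, root_id=4, instructions=[
    Instruction(1, "parameter"),
    Instruction(2, "parameter"),
    Instruction(3, "add", [1, 2]),
    Instruction(4, "multiply", [3, 1]),
])
print(topological_order(demo))  # [1, 2, 3, 4]
```

The recursive walk is fine for a small example; a real module with deep operand chains would want an explicit stack to stay clear of Python's recursion limit.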

HloInstructionProto is the widest message in the family: 70 fields, a deliberately sparse "union of all op attributes" where the string opcode = 2 selects which subset is meaningful (a dot reads dot_dimension_numbers + precision_config; a custom-call reads custom_call_target + backend_config + custom_call_api_version). The field-number space runs to 99 with gaps for retired attributes.

QUIRK — HloOpcode is not a proto enum. The opcode is string opcode = 2; an exhaustive scan of all 760 descriptors finds no xla.HloOpcode enum anywhere in the pool. Opcodes serialize as lowercase text mnemonics ("add", "dot", "convolution", "fusion", "all-reduce") via the C++ HloOpcodeString() / StringToHloOpcode() pair. This is why the proto is forward/backward-compatible across XLA versions that add opcodes — a new opcode is just a new string, no enum-number coordination. A reimplementer building a numeric opcode dispatch off the descriptor pool will find nothing to dispatch on; the mnemonic vocabulary is a C++-side table, not a descriptor.
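
Because the opcode is a string, a reader keys its dispatch on the mnemonic. A minimal sketch, using only the two opcode-to-attribute pairings named above; the table and function names are illustrative, and a full table would have to be built from the owning page rather than from the descriptor pool:

```python
from typing import Dict, Tuple

# Which attribute fields are meaningful for a given opcode mnemonic.
# Only the two pairings stated on this page are listed.
RELEVANT_ATTRS: Dict[str, Tuple[str, ...]] = {
    "dot": ("dot_dimension_numbers", "precision_config"),
    "custom-call": ("custom_call_target", "backend_config",
                    "custom_call_api_version"),
}


def meaningful_attrs(opcode: str) -> Tuple[str, ...]:
    """Return the attribute subset for an opcode, or () if it is not listed."""
    return RELEVANT_ATTRS.get(opcode, ())


print(meaningful_attrs("dot"))
print(meaningful_attrs("some-future-op"))  # () rather than a decode failure
```

An unlisted mnemonic falls through to the empty tuple instead of failing, which mirrors the compatibility property: a new opcode needs a new table entry, not a schema change.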

NOTE — the HLO core reconstruction lives on hlo-ingestion, the page that owns the StableHLO→HLO conversion and the HLO proto schema the front-end hands in. The id-graph edge model, the 70-field attribute union, the sharding dialects (classic OpSharding + Shardy NamedShardingProto), and the two-tier source provenance (OpMetadata + StackFrameIndexProto) are reconstructed there from the descriptor pool. The xla_data.proto type universe (ShapeProto, OpSharding, OpMetadata, the dimension-numbers messages) and the xla.proto config layer (HloModuleConfigProto) ride the same descriptor records and are described alongside it.


Family 2 — Compiler Config Protos

Two master flag messages gate the entire compilation pipeline, and they are the two highest-field-count reconstructions in the index. They are siblings: DebugOptions is the backend-shared (GPU/CPU/TPU) flag set that travels with an HLO module across the PJRT boundary; TpuCompilationEnvironment is the TPU-specific master switch table the Jellyfish backend actually consumes.

| Message | Fields | Max field # | Type breakdown | Owning page |
| --- | --- | --- | --- | --- |
| xla.DebugOptions | 290 | 501 | 290 live, 211 numbering gaps, 17 nested enums, 2 map-entry msgs | config/debugoptions-proto |
| …TpuCompilationEnvironment | 1121 | 1218 | 418 bool, 349 msg, 148 int64, 74 enum, 37 string, 34 float, 32 int32, 14 double, 11 uint32, 4 uint64 | config/tpu-compilation-environment |

DebugOptions lives in xla.proto (VA 0xc021470). Every scalar is a proto3 optional wrapped in a synthetic single-member oneof, giving explicit has-bit presence — a DebugOptions on the wire records exactly which knobs the front-end touched. The HLO-dump and HLO-pass-control flags are the part that governs serialization of the rest of the family: xla_dump_hlo_as_proto (113) → HloProto, xla_dump_hlo_snapshots (118) → HloSnapshot, xla_dump_module_metadata (144) → HloModuleMetadataProto, xla_dump_latency_hiding_schedule (182) → ScheduleProto, xla_dump_full_hlo_config (381) → HloModuleConfigProto, xla_dump_hlo_unoptimized_snapshots (405) → HloUnoptimizedSnapshot.

TpuCompilationEnvironment (TCE) is the largest single proto in the catalog (file 137 692 bytes). Its 1121 fields are numbered 1…1218 with gaps for retired flags, and at least 80% are wrapped in AutoProto so each can be in state AUTO / DISABLED / ENABLED or carry a typed value. The field naming is the subsystem map: xla_…, xla_tpu_…, xla_jf_…, megascale_…, xla_sc_…, xla_tpu_sdc_checker_…, xla_tpu_autofdo_…. This single message is the surface area for the entire TPU compiler config; its field roster is split across the field-dictionary pages, the offsets/defaults page, and the AutoOr resolver page (all cross-linked from the owning page). The DebugOptions count is the authoritative 290 live fields — see the debugoptions-proto page.
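
The tri-state idea can be illustrated with a small resolver. This is a hypothetical sketch only: the three state names come from the paragraph above, but the class names, the resolver function, and the rule that AUTO defers to a compiler-chosen default are assumptions, not the reconstructed AutoProto semantics (those live on the AutoOr resolver page).

```python
from dataclasses import dataclass
from enum import Enum


class AutoState(Enum):
    AUTO = 0
    DISABLED = 1
    ENABLED = 2


@dataclass
class AutoBool:
    """Hypothetical stand-in for a boolean knob wrapped in a tri-state."""
    state: AutoState = AutoState.AUTO


def resolve(setting: AutoBool, compiler_default: bool) -> bool:
    """Assumed rule: AUTO defers to the compiler, otherwise the user wins."""
    if setting.state is AutoState.AUTO:
        return compiler_default
    return setting.state is AutoState.ENABLED


print(resolve(AutoBool(), compiler_default=True))                    # True
print(resolve(AutoBool(AutoState.DISABLED), compiler_default=True))  # False
```

The point of the third state is that an unset knob stays distinguishable from an explicit DISABLED, so the compiler can keep choosing per target.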

QUIRK — the TCE schema and the flag surface are the same 1121 names, 1:1: every TCE field is also a registered absl::Flag. A reimplementer can therefore treat the proto field roster and the command-line/env flag set as one namespace, not two. The TCE is an editions-syntax proto (proto2/proto3's successor); DebugOptions is proto3. Both reach RunHloPasses via the CompilationEnvironmentsProto.environments Any-packed list.


Family 3 — TPU Executable + Topology Protos

The output side of compilation and the hardware description it targets. TpuCoreProgramProto is the compiled TPU executable (the JellyfishCompiler's output); TpuChipPartsProto is the per-generation hardware-constant description the cost model and ISA emitter read.

| Message | Fields | Role | Owning page |
| --- | --- | --- | --- |
| tpu.TpuCoreProgramProto | 11 | compiled executable; oneof core = TensorCore / BarnaCore / SparseCore | compiler/tpu-program-serialization |
| tpu.TpuSequencerProgramProto | 5 | per-sequencer program; oneof entry_point = 16 per-chip-family variants | compiler/tpu-program-serialization |
| tpu.TpuChipPartsProto | 9 | per-codename HW constants (HBM/VMEM/SMEM, MXU geometry, clocks, DMA granules) | targets/chip-parts-binarypb |
| tpu.TpuSharedMemoryOnChipProto | 2 | shared-memory locator (type + index), used by ContinuationQueue mapping | dma/continuation-queue |

TpuSequencerProgramProto.entry_point confirms the supported chip families are exactly {jellyfish, pufferfish (pxc), viperfish (vxc), ghostlite (gxc/glc), 6acc60406 (gxc/gfc)} with their per-core sub-programs (SCS = SparseCore Sequencer, TAC = Tile Access Core, TEC = Tile Execute Core). For TpuCoreProgramProto, the tpu::l serializer dispatches the oneof core on a discriminant *((_DWORD*)this + 22) with (unsigned)(v20 - 5) <= 2, so tags 5/6/7 are mutually exclusive arms — see the owning page's reconstruction.
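
The decompiled test (unsigned)(v20 - 5) <= 2 is the usual single-comparison range check: subtracting the lower bound and comparing as unsigned makes every value below 5 wrap to a very large number, so only 5, 6, and 7 pass. A short sketch of the same check, with the 32-bit wraparound emulated by a mask (the function name is illustrative):

```python
def is_core_arm(tag: int) -> bool:
    """Emulate the 32-bit C test (unsigned)(tag - 5) <= 2."""
    return ((tag - 5) & 0xFFFFFFFF) <= 2


print([tag for tag in range(10) if is_core_arm(tag)])  # [5, 6, 7]
print((4 - 5) & 0xFFFFFFFF)                            # 4294967295, so tag 4 fails
```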

TpuChipPartsProto is reconstructed on chip-parts-binarypb as the schema for the embedded <codename>_chip_parts.binarypb blobs that the runtime parses into the xla::jellyfish::Target object — a data-driven HAL where one C++ Target class is specialized only by the bytes it loads. Its companion locator file (tpu_chip_parts_locators.proto) declares TpuCoreOnChipProto, TpuSharedMemoryOnChipProto, TpuSyncFlagRangeOnChipProto, and TpuSegmentMemoryOnChipProto, referenced from both the chip-parts schema and the ContinuationQueue config.


Family 4 — Runtime / Collective Protos

The wire formats that drive the multi-chip / multi-host collective fabric, the device-state snapshot, the async continuation queue, and the toroidal route cache.

| Message | Fields | Role | Owning page |
| --- | --- | --- | --- |
| …MegaScaleInfoProto | 8 | per-collective-op metadata: TransferType, reduce op, ragged + FP-compression params | megascale/overview |
| …MegaScaleRuntimeError | 13 | runtime error record: ErrorType, hostname, task id, embedded runtime state | megascale/error-aggregator |
| …RapidEyeErrorDigestProto | 17 | fleet-wide root-cause digest; 9-value Cause enum, embeds first error verbatim | megascale/error-aggregator |
| …tpu_telemetry.CurrentCoreStateSummary | 7 | per-core live state: sequencer info, fingerprint, launch id, queued programs | profiling/tpu-telemetry-proto |
| …tpu_telemetry.AllCoreStateSummaries | 1 | map int32 global_core_id → CurrentCoreStateSummary | profiling/tpu-telemetry-proto |
| TpuChipConfigProto.ContinuationQueue | 7+ | async continuation queue config: producer sflags, per-core ring window | dma/continuation-queue |
| CompressedToroidalRouteCache | | baked route-cache blob, decompressed into the per-link route map | routing/toroidal-route-cache |

MegaScaleInfoProto types the metadata for every cross-slice collective: a TransferType ∈ {ALL_TO_ALL, ONE_TO_ONE, REDUCE_SCATTER, ALL_GATHER, ALL_REDUCE, BROADCAST, RAGGED_ALL_TO_ALL}, an OperationType reduce op, optional ragged parameters, and optional FP-compression knobs. It binds to HLO at HloInstructionProto.channel_id + replica_group_list for the collectives the TPU backend lowers across the DCN.

RapidEyeErrorDigestProto is the most thoroughly reconstructed runtime proto: 17 live fields, 13 nested message types, a 9-value Cause enum, recovered from the rapideye_logging.proto record at VA 0xc169340 (4 460 bytes). It embeds the inbound MegaScaleRuntimeError verbatim (field 11, first_recorded_error), so the digest is self-describing.

The ContinuationQueue config is reconstructed on continuation-queue as the static record (TpuChipConfigProto.ContinuationQueue, one per core) that tells both host runtime and device producer where the async descriptor ring lives — the producer sync-flag offsets on the message head, the ring region and consumer flag in the per_core vector. The toroidal route cache (CompressedToroidalRouteCache) is reconstructed at MEDIUM on toroidal-route-cache: the baked (src,dst) → route blob, its decompress codec, and its dedup form, consumed by GetDistanceFromCache with a per-pair cache-miss fall-through to the live torus metric.

GOTCHA — tpu_telemetry is a point-in-time state snapshot, structurally the opposite of the XPlane event stream — the two never share a wire blob. AllCoreStateSummaries is a map of one CurrentCoreStateSummary per core captured at one instant; the profiler's XSpace/XPlane/XLine/XEvent/XStat is an append-only event log. A reimplementer must not unify them; their only meeting point is the PC→HLO metadata both read.


Family 5 — Profiler Protos

The XPlane trace hierarchy and the xprof per-worker metadata record. These are XPlane-compatible: a 7-message hierarchy with interned metadata dictionaries.

| Message | Fields | Role | Owning page |
| --- | --- | --- | --- |
| tensorflow.profiler.Task | 18 | per-worker device metadata: provenance, host id, clock rates, profile window | profiling/task-proto |
| tensorflow.profiler.XPlane | 6 | trace plane: lines + interned event/stat metadata dictionaries | profiling/xplane-xstat-traceme |
| tensorflow.profiler.XEvent | 4 | one timeline event: metadata id, duration, stats, offset/occurrence oneof | profiling/trace-entry-to-xevent |

Task is the most precisely-anchored profiler reconstruction: a flat proto3 message of 18 singular scalar fields, carved byte-exact from the descriptor record at VA 0xbe999a0 (path …/tensorboard_plugin_profile/protobuf/task.proto). At runtime it is the value of a Map<uint32 task_id, Task> — one record per worker ordinal — read by xprof::XlaJfProfileCheapOps(Task const&) @ 0xf2ca280. The XPlane hierarchy (XSpace → XPlane → XLine → XEvent → XStat, with interned XEventMetadata / XStatMetadata) is reconstructed across the xplane/traceme and trace-entry pages; the per-chip trace_entries.proto families that feed it are catalogued separately on the tracepoints-master-registry.


Verification Provenance

Every message name and the two headline field counts were re-checked against the binary before this index shipped:

  • Message-name presence: each fully-qualified name (HloModuleProto, HloInstructionProto, HloComputationProto, TpuCompilationEnvironment, DebugOptions, TpuChipPartsProto, TpuCoreProgramProto, TpuSequencerProgramProto, TpuSharedMemoryOnChipProto, MegaScaleInfoProto, MegaScaleRuntimeError, ContinuationQueue, RouteCache, tpu_telemetry, CurrentCoreStateSummary, AllCoreStateSummaries) was confirmed to occur as a descriptor string in the binary's extracted strings.
  • File-path presence: the source paths (service/hlo.proto, xla_data.proto, tpu_chip_parts.proto, tpu_core_program.proto, tpu_sequencer_program.proto, megascale_info.proto, tpu_telemetry.proto) were confirmed present, anchoring each FileDescriptorProto record.
  • Headline counts: DebugOptions = 290 and TpuCompilationEnvironment = 1121 are corroborated against both the descriptor decode and the two owning pages, which independently state the same figures.
  • Owning-page existence: every page linked in the index table was confirmed to exist under src/. The HLO-core family rows point at hlo-ingestion, the page that owns the front-end HLO proto schema; the config rows point at the config/ field-dictionary pages; the rest point at the deep page that field-dumps each message.

What this index does not do: it does not reproduce field rosters (those live on the owning pages), and it does not claim a per-message descriptor VA for messages whose enclosing file VA was not separately recovered — those rows carry the file name in the VA column rather than a fabricated address. A blank VA is an honest gap, never a guessed one.


Cross-References