Internal Pass-Name Catalog

All addresses and offsets on this page apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (libtpu_lts_20260413_b_RC00, build-id md5 89edbbe81c5b328a958fe628a9f2207d, 781,691,048 bytes, ELF x86-64 DYN, not stripped, clang/LLVM trunk). The image carries its full C++ symbol table, so the pass names below are demangled name()/getArgument() symbols and verbatim .rodata strings, not inferred labels. Other wheels will differ.

Abstract

This page is the name index of every internal compiler pass libtpu registers — the verbatim identifier each pass returns from xla::HloPassInterface::name(), the -flag argument an mlir::Pass registers with the MLIR PassRegistry, and the capitalized stage string an xla::HloPassPipeline carries in its constructor. It is a reference catalog, not an algorithm page: its value is the grouped, deduplicated table of names cross-linked to the Part-V compiler pages that document each pass body. Where hlo-pass-registry.md explains how passes are added and ordered, this page answers "what is this string I just saw in a VLOG, an xla_dump directory, or a --pass-pipeline= spec, and which page documents it?"

Because the image is unstripped, the names are recovered with near-1:1 fidelity by two complementary symbol enumerations rather than by disassembly. The HLO surface comes from the _ZNK3xla*4nameEv symbol set — every XLA HLO pass overrides HloPassInterface::name(), so demangling those symbols enumerates the registered HLO passes directly; 322 of them are real passes across three provenance tiers. The MLIR surface comes from the create*Pass / Create*Pass factory symbols plus the getArgument() pass-flag strings; it splits cleanly by dialect (StableHLO/CHLO/VHLO, Shardy, XTile, tpu, mosaic_sc, sparse_core, llo) and is capped at the bottom by the 242 tpu-* LLVM-backend MachineFunction pass flags. A handful of capitalized HloPassPipeline stage strings and a phaseN_* separate-compilation metric taxonomy round out the inventory.

The two name() spellings a reader will meet differ by convention. An HLO pass's runtime name is the kebab-case of its class (ShardingPropagation → sharding-propagation); ~25 of these are byte-confirmed as literal strings, while the rest are built at runtime and inferred from the class name (marked INFER below). An MLIR pass's name is the literal it hands to mlir::PassRegistration and is byte-confirmed wherever its getArgument() symbol carries the string. The page groups names by pipeline phase (HLO-opt, sharding/SPMD, layout, fusion, collective, scheduling, MSA, host-offload, SparseCore, then the MLIR lowering cascade and the LLVM-backend codegen tail) and gives, per group, the documenting wiki page. Status/error templates are on error-templates.md and advisory hint strings on hint-strings.md; this page owns only the pass-name index and the name → page cross-link.

The catalog contract:

  • The four name families and how each is recovered (name() RTTI · create*Pass factory · getArgument() flag · pipeline stage string).
  • Per phase: the verbatim registered name, its provenance tier (T TPU-private / O open-source XLA), and a confidence tag (byte-confirmed string vs convention-inferred).
  • The name → documenting-page cross-link for every group that has an owning Part-V page.
  • The honest gap: ~300 HLO name() strings are convention-inferred, the per-pass add order is owned by the registry pages, and the flag→pass gating is not on this page.

| Phase group | Family | Count | Owning page |
| --- | --- | --- | --- |
| HLO-opt (expanders / canonicalizers / simplifiers) | HLO name() | ~95 | hlo-pre-passes.md, algebraic-simplifier.md |
| Sharding / SPMD / partition | HLO name() | ~16 | sharding-propagation.md, auto-sharding-spmd.md |
| Layout / tiling / mem-space | HLO name() | ~12 | layout-assignment.md |
| Fusion | HLO name() | ~13 | fusion-patterns.md, fusion-cost-model.md |
| Collective / async | HLO name() | ~40 | n/a (collective rewriting; not individually paged) |
| Scheduling | HLO name() | ~14 | hlo-pass-registry.md |
| MSA (memory-space assignment) | symbol | ~8 | msa-overview.md |
| Host-offload | HLO name() | ~17 | |
| While-loop / pipelining / misc | HLO name() | ~25 | loop-tiling-unrolling.md, optimization-barrier.md |
| SparseCore-offload (HLO-level) | HLO name() | ~10 | |
| SparseCore tile-task (MLIR) | Create*Pass | 36 | lower-to-sparsecore-llvm.md |
| StableHLO / CHLO / VHLO legalizers | create*Pass/flag | ~40 | mhlo-xtile-tpu-lowering.md |
| Shardy (sdy) round-trip | create*Pass/flag | ~35 | sharding-propagation.md |
| XTile | create*Pass/flag | ~13 | mhlo-xtile-tpu-lowering.md |
| tpu dialect (TensorCore vector IR) | create*Pass/flag | 14 | tpu-dialect-and-ops.md, tpu-to-llo-ods.md |
| mosaic_sc dialect | create*Pass | 3 | mosaic-overview.md |
| llo / LLO bridge | create*Pass/flag | ~5 | lower-to-mlo-dma-bridge.md, llvmtpu-intrinsic-catalog.md |
| TPU LLVM-backend MachineFunction flags | cl::opt flag | 242 | n/a (codegen tail; flags only) |
| Pipeline-stage strings (monolithic) | HloPassPipeline ctor | ~15 | compile-phases.md |
| Separate-compilation phase metrics | .rodata token | 11 | compile-phases.md |

NOTE — the counts are name-complete for the symbol families (HLO name(), create*Pass, getArgument()) and string-confirmed for the pipeline stages. The add order of any pass inside any pipeline is not on this page — it is owned by hlo-pass-registry.md and the per-phase registry. This is the name index; the order index is the registry.


How the Names Are Recovered

Four name families and one metric-token family, each with its own extraction surface. The provenance matters because it sets the confidence: a demangled symbol is CERTAIN for the class, while the runtime string it returns is only CERTAIN when the literal is also present in .rodata.

HLO pass name      ── _ZNK3xla*4nameEv symbol  ── kebab(ClassName)         ── string CERTAIN iff literal in .rodata, else INFER
MLIR pass flag     ── create*Pass / Create*Pass factory symbol             ── string CERTAIN where getArgument() carries it
MLIR pass flag     ── PassIN…getArgumentEv symbol body                     ── byte-confirmed literal
pipeline stage     ── HloPassPipeline ctor std::string at a .rodata offset ── byte-confirmed literal
phaseN_* metric    ── .rodata token referenced from RegisterAllPhases      ── byte-confirmed literal
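
The first row of that map can be sketched in Python. This is a minimal sketch, not the tooling used for this page: it assumes the symbol names have already been dumped to text, and it handles only plain nested Itanium names (`_ZNK`, then length-prefixed identifiers, then `Ev`). The function name `parse_name_symbol` is hypothetical.

```python
import re


def parse_name_symbol(mangled: str):
    """Return the qualified class behind an `xla::...::name() const` symbol.

    Returns None for anything that is not a plain nested name ending in
    `name` with an empty argument list. Templates and substitutions are
    deliberately out of scope for this sketch.
    """
    if not mangled.startswith("_ZNK"):
        return None
    pos, parts = 4, []
    # Each component is <decimal length><identifier>.
    while pos < len(mangled) and mangled[pos].isdigit():
        digits = re.match(r"\d+", mangled[pos:]).group()
        start = pos + len(digits)
        parts.append(mangled[start:start + int(digits)])
        pos = start + int(digits)
    # "E" closes the nested name, "v" is the empty parameter list.
    if mangled[pos:] != "Ev" or len(parts) < 3:
        return None
    if parts[0] != "xla" or parts[-1] != "name":
        return None
    return "::".join(parts[:-1])


print(parse_name_symbol("_ZNK3xla19ShardingPropagation4nameEv"))
# xla::ShardingPropagation
```

A symbol that fails the filter (a non-const method, a method other than name(), or a templated class) returns None rather than a guess, which keeps the enumeration conservative.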

The confidence convention used in every table:

| Tag | Meaning |
| --- | --- |
| CERTAIN | The registered name string is a literal in the binary (a name() .rodata constant, a getArgument() literal, or a stage string); spot-confirmed below. |
| HIGH | The pass class symbol is present and demangled; the runtime name follows XLA's CamelCase → kebab convention but the literal was not located (the name() body builds it at runtime). |
| INFER | Same as HIGH, used for the bulk OSS rows where neither the literal nor a per-pass disassembly was done; trustworthy by convention, not byte-anchored. |

Tier column: T = TPU-private (xla::jellyfish::*, xla::tpu::sparse_core::*, xla::Tpu*, xla::megascale::*) — the Deepsea/Jellyfish backend's private surface; O = open-source XLA (xla::*), shared with the CPU/GPU backends.
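
The tier rule above is a prefix test on the demangled class name. A minimal sketch, assuming fully qualified names as input; the function name `tier` and the sample classes in the usage lines are illustrative only:

```python
# Namespace prefixes the page treats as TPU-private (tier T).
TPU_PRIVATE_PREFIXES = (
    "xla::jellyfish::",
    "xla::tpu::sparse_core::",
    "xla::megascale::",
)


def tier(qualified_class: str) -> str:
    """Map a demangled class name to the page's T / O tier column."""
    if qualified_class.startswith(TPU_PRIVATE_PREFIXES):
        return "T"
    # xla::Tpu* classes sit directly in xla:: but are still TPU-private.
    if qualified_class.startswith("xla::Tpu"):
        return "T"
    if qualified_class.startswith("xla::"):
        return "O"
    raise ValueError(f"not an xla:: class: {qualified_class}")


print(tier("xla::jellyfish::TpuInstructionFusion"))  # T
print(tier("xla::ShardingPropagation"))              # O
```

The private namespaces are checked before the generic `xla::` fallback, since every T class also matches the O pattern.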

QUIRK — an HLO pass's registered name is the kebab-case of the class, not of the file or the factory. xla::jellyfish::TpuInt2AutoUpDownCaster registers tpu-int2-auto-up-down-caster — the Tpu prefix survives, the namespace does not. A reimplementer who derives the name from the source filename (tpu_int2_caster.cc) or the namespace path will mismatch every VLOG/xla_dump filter.
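
The convention can be written down as a small converter. This is a sketch of the CamelCase → kebab rule as this page describes it, not code recovered from the binary; the regular expression and the function name `kebab_name` are assumptions.

```python
import re

# Word splitter: an acronym run before a capitalised word, an optionally
# capitalised lowercase/digit word, or a trailing acronym run.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z0-9]+|[A-Z]+")


def kebab_name(qualified_class: str) -> str:
    """Drop the namespace path, then kebab-case the bare class name."""
    cls = qualified_class.rsplit("::", 1)[-1]
    return "-".join(word.lower() for word in _WORD.findall(cls))


print(kebab_name("xla::jellyfish::TpuInt2AutoUpDownCaster"))
# tpu-int2-auto-up-down-caster
print(kebab_name("xla::ShardingPropagation"))
# sharding-propagation
```

A mechanical rule does not reproduce every name in the tables: this converter yields b-float16-propagation and sparse-core-latency-hiding-scheduler, while the page lists bfloat16-propagation and the byte-confirmed sparsecore-latency-hiding-scheduler. That gap is why convention-derived names carry the INFER tag rather than CERTAIN.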


HLO Pass Names

The HLO surface is 322 passes across three provenance tiers: 137 xla::jellyfish::* (TPU-private), 24 xla::tpu::sparse_core::* (SparseCore-HLO), 161 open-source xla::*, plus ~6 megascale/sdy/spmd/conditional_opt. Grouped below by the pipeline phase that adds them. Names are the kebab-case the class returns from name(); the string is CERTAIN only where a literal was located, else INFER.

HLO-Opt — expanders / canonicalizers / simplifiers (Phase 1)

The input-scrub phase: lower high-level ops to primitive arithmetic, canonicalize shapes and conditionals, fold constants, and run the algebraic simplifier. Documented on hlo-pre-passes.md; the simplifier itself on algebraic-simplifier.md.

| Registered name | Class | Tier |
| --- | --- | --- |
| tpu-int2-auto-up-down-caster | jellyfish::TpuInt2AutoUpDownCaster | T |
| algebraic-simplifier / TPU superset | jellyfish::TpuAlgebraicSimplifier, AlgebraicSimplifier | T/O |
| batch-norm-expander | BatchNormExpander | O |
| cholesky-expander | TpuCholeskyExpander, CholeskyExpander | T/O |
| qr-expander / eigh-expander | TpuQrExpander/TpuEighExpander (+ OSS) | T/O |
| triangular-solve-expander | TpuTriangularSolveExpander, TriangularSolveExpander | T/O |
| fft-expander, lu-decomposition-expander | FftExpander, LuDecompositionExpander | O |
| dynamic-padder, dynamic-index-splitter, dynamic-dimension-simplifier | DynamicPadder etc. | O |
| scatter-expander / gather-expander | TpuScatterExpander/TpuGatherExpander (+ OSS) | T/O |
| comparison-expander, logistic-expander, optimization-barrier-expander | ComparisonExpander etc. | O |
| dot-decomposer, dot-canonicalizer, dot-algorithm-normalizer | DotDecomposer, jellyfish::Dot* | O/T |
| x-precision-rewriter | jellyfish::XPrecisionRewriter (x6/x9/x128 dot) | T |
| tpu-hlo-precision-tracer, float-type-correction | jellyfish::TpuHloPrecisionTracer, FloatTypeCorrection | T |
| tpu-broadcast-rewriter, tpu-degenerate-dimension-rewriter | jellyfish::TpuBroadcastRewriter etc. | T |
| tpu-reduce-rewriter, tpu-reduce-window-{rewriter,normalizer} | jellyfish::TpuReduce* | T |
| tpu-convolution-rewriter, convolution-folding | jellyfish::TpuConvolutionRewriter, ConvolutionFolding | T |
| tpu-call-inliner (-must-fuse/-inner-must-fuse/-non-must-fuse) | jellyfish::TpuCallInliner | T |
| tpu-hlo-support-checker | TpuHloSupportChecker (canonical acceptance test) | T |
| hlo-dce / hlo-module-dce / hlo-cse | HloDCE/HloModuleDCE/HloCSE (re-run between stages) | O |
| flatten-call-graph, call-inliner, map-inliner | FlattenCallGraph etc. | O |
| float-normalization, bfloat16-propagation | FloatNormalization, BFloat16Propagation | O |
| gmm, ragged-dot-expander, flash-attention | jellyfish::Gmm/RaggedDotExpander/FlashAttention | T |

NOTE — the jellyfish::Tpu* rewriters are supersets of the OSS pass of the same root name, not replacements; TpuAlgebraicSimplifier re-runs the OSS AlgebraicSimplifier rule set plus TPU-only rules. Both classes have a distinct name() symbol, so a VLOG filter sees two strings.

Sharding / SPMD / Partition (Phase 2-3)

Sharding propagation and SPMD partitioning. sharding-propagation is byte-confirmed (literal appears three times — once per registration in the propagation, dump, and verifier paths). Documented on sharding-propagation.md and auto-sharding-spmd.md.

| Registered name | Class | Tier |
| --- | --- | --- |
| sharding-propagation | ShardingPropagation (×3 registration) | O |
| auto-sharding | AutoSharding, TpuAutoSharding | O/T |
| tpu-spmd-partitioning | jellyfish::TpuSpmdPartitioner | T |
| tpu-partition-assignment | jellyfish::TpuPartitionAssignment | T |
| tpu-spmd-concat-rewriter | jellyfish::TpuSpmdConcatRewriter | T |
| tpu-sparse-core-hierarchical-spmd-partitioning | sparse_core::SparseCoreHierarchicalSpmdPartitioner | T |
| whole-graph-manual-pass | WholeGraphManualPass | O |
| spmd-prepare, spmd-partitioner | spmd::SpmdPrepare/SpmdPartitioner | O |
| shardy-xla | sdy::ShardyXLA (Shardy frontend import) | O |
| hlo-domain-isolator / hlo-domain-remover | sharding-domain brackets | O |
| canonicalize-all-gather-for-cse | CanonicalizeAllGatherForCSE | O |

Layout / Tiling / Memory-Space (Phase 4-5)

Layout assignment, TPU tiling propagation, and memory-space propagation. Documented on layout-assignment.md.

| Registered name | Class | Tier |
| --- | --- | --- |
| layout-assignment | LayoutAssignment | O |
| tpu-tiling-propagation, tpu-tiling-rewriter | jellyfish::TpuTilingAssignment/TpuTilingRewriter | T |
| tpu-peephole-optimize-layouts | jellyfish::TpuPeepholeOptimizeLayouts | T |
| window-config-assignment (-via-service) | jellyfish::WindowConfigAssignment[ViaService] | T |
| memory-space-propagation | MemorySpacePropagation | O |
| copy-insertion, tpu-copy-normalizer | CopyInsertion, jellyfish::TpuCopyNormalizer | O/T |
| convert-memory-placement-to-internal-annotations | ConvertMemoryPlacementToInternalAnnotations | O |
| optimize-input-output-buffer-alias | OptimizeInputOutputBufferAlias | O |

Fusion (Phases "Pre main fusion" / "Main fusion" / "Post main fusion")

The fusion bucket. tpu-instruction-fusion (the main pass, using TpuPriorityFusionQueue) builds its name at runtime — the class symbol xla::jellyfish::TpuInstructionFusion::name() is present at 0x12e037a0 but carries no .rodata literal, so the name is HIGH-confidence by convention. Documented on fusion-patterns.md and fusion-cost-model.md.

| Registered name | Class | Tier |
| --- | --- | --- |
| tpu-instruction-fusion | jellyfish::TpuInstructionFusion (name() @ 0x12e037a0) | T |
| tpu-multi-output-fusion | jellyfish::TpuMultiOutputFusion | T |
| instruction-fusion / multi-output-fusion / megacore-fusion | OSS InstructionFusion etc. | O |
| tpu-loop-fusion-enhancer | jellyfish::TpuLoopFusionEnhancer | T |
| mosaic-fusion | jellyfish::MosaicFusion (Pallas/Mosaic, HloPassFix) | T |
| tpu-subgraph-simplifier | jellyfish::PostFusionTpuSubgraphSimplifier | T |
| user-guided-fusion-id-assigner | jellyfish::UserGuidedFusionIdAssigner | T |
| wrap-fusion-output-for-debug | jellyfish::WrapFusionOutputForDebug | T |
| fusion-tuple-deduplicator, masked-fusion-iteration-skipper | jellyfish::FusionTupleDeduplicator etc. | T |
| tpu-trivial-fusion-remover / tpu-trivial-instruction-unfuser | jellyfish::TpuTrivial* | T |

Collective / Async Rewriting (Phase 4 onward)

All-reduce / all-gather / reduce-scatter / all-to-all combiners, legalizers, and async-collective creators, including the MegaScale cross-slice family. Not individually paged; this is the name index.

| Registered name | Class | Tier |
| --- | --- | --- |
| tpu-all-reduce-legalizer, tpu-all-reduce-scatter-fusion | jellyfish::TpuAllReduceLegalizer etc. | T |
| tpu-all-gather-combiner, tpu-all-gather-normalizer | jellyfish::TpuAllGather* | T |
| all-reduce-combiner / all-reduce-reassociate / all-reduce-simplifier | OSS AllReduce* | O |
| all-gather-combiner / all-gather-decomposer / all-gather-simplifier | OSS AllGather* | O |
| tpu-async-collective-creator | jellyfish::TpuAsyncCollectiveCreator | T |
| async-collective-merger / async-collective-fusion / async-op-thread-renamer | jellyfish::Async* | T |
| collective-pipeliner, collective-permute-combiner/-cse/-motion | OSS Collective* | O |
| reduce-scatter-decomposer / reduce-scatter-legalizer | OSS + jellyfish::ReduceScatterLegalizer | O/T |
| ragged-all-to-all-expander, tpu-all-to-all-compaction | jellyfish::Ragged*/TpuAllToAllCompaction | T |
| tpu-quantized-all-reduce-backend-config-setter | jellyfish::TpuQuantizedAllReduceBackendConfigSetter | T |
| barrier-assignment, async-barrier-creator | jellyfish::BarrierAssignment etc. | T |
| MegaScale: tpu-all-reduce-merger, cross-slice-legalizer/-rewrites | megascale::compiler::* | T |

Scheduling (Phase 7: "final_scheduler" / "async_scheduling")

The latency-hiding scheduler family. Three scheduler name() strings are byte-confirmed dashed literals: latency-hiding-layer-scheduler, sparsecore-latency-hiding-scheduler, and legalize-scheduling-annotations (one hit each). The OSS latency-hiding-scheduler name is built at runtime: the text latency-hiding-scheduler is present in .rodata only as a substring of longer TPU-specific literals (of the rows below, sparsecore-latency-hiding-scheduler contains it; latency-hiding-layer-scheduler does not), so that row is HIGH, not CERTAIN. The scheduler placement is owned by hlo-pass-registry.md.
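
The difference between a standalone literal and a substring hit can be checked mechanically. The sketch below assumes names are stored as NUL-terminated C strings and that the .rodata bytes are already in memory; the helper name and the demo blob are illustrative, not bytes from the image.

```python
def has_standalone_literal(rodata: bytes, name: str) -> bool:
    """True if `name` occurs as a whole NUL-delimited string in `rodata`."""
    needle = name.encode("ascii")
    start = 0
    while (i := rodata.find(needle, start)) != -1:
        end = i + len(needle)
        begins_string = i == 0 or rodata[i - 1] == 0
        ends_string = end < len(rodata) and rodata[end] == 0
        if begins_string and ends_string:
            return True
        start = i + 1  # keep scanning past a substring-only hit
    return False


# Synthetic blob containing only the three literals named above.
blob = (b"\0latency-hiding-layer-scheduler"
        b"\0sparsecore-latency-hiding-scheduler"
        b"\0legalize-scheduling-annotations\0")

print(b"latency-hiding-scheduler" in blob)                       # True
print(has_standalone_literal(blob, "latency-hiding-scheduler"))  # False
print(has_standalone_literal(blob, "sparsecore-latency-hiding-scheduler"))  # True
```

One caveat on the assumption: a linker that tail-merges string constants can store a short string as the suffix of a longer one, so a negative result here is evidence that no separate literal exists, not proof that the short name is never referenced.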

| Registered name | Class | Tier |
| --- | --- | --- |
| latency-hiding-scheduler | LatencyHidingScheduler | O |
| latency-hiding-layer-scheduler | jellyfish::LatencyHidingLayerScheduler (name() @ 0x10b85160) | T |
| sparsecore-latency-hiding-scheduler | sparse_core::SparseCoreLatencyHidingScheduler (name() @ 0x13077ea0) | T |
| legalize-scheduling-annotations | LegalizeSchedulingAnnotations (name() @ 0x12e9a060) | O |
| check-no-data-dependency-in-scheduling-annotations | CheckNoDataDependencyInSchedulingAnnotations | O |
| annotate-traceable-loops | jellyfish::TpuAnnotateTraceableLoops | T |
| while-loop-all-reduce-code-motion | WhileLoopAllReduceCodeMotion | O |
| hlo-memory-scheduler, hlo-memory-scheduler-with-brkga-fallback, hlo-descheduler | OSS schedulers | O |
| async-op-scheduler, barna-core-feed-scheduler, slack-in-trough, constant-deferring | jellyfish::* / OSS | T/O |

QUIRK — final_scheduler and async_scheduling are pipeline stage names, not pass names — they are the HloPassPipeline constructor strings for the two scheduler pipelines, and they live in the pipeline-stage section below. The passes inside those pipelines are the *-scheduler rows above. Do not confuse the pipeline label with the pass it contains.

MSA — Memory-Space Assignment (after scheduling)

MSA is one engine, not an HloPass with a name(); it is driven by xla::jellyfish::RunMemorySpaceAssignment. Documented on msa-overview.md.

| Symbol / name | Tier |
| --- | --- |
| memory_space_assignment::MemorySpaceAssignment (engine) | O |
| MemorySpaceAssignmentBestFitRepacker (telamalloc-class repacker) | O |
| jellyfish::RunMemorySpaceAssignment (driver fn) | T |
| jellyfish::ScopedMemoryAssignment (scoped-vmem anchor) | T |
| sparse_core::SparseCoreMemorySpaceAssignment | T |
| tpu-memory-precolor (jellyfish::TpuMemoryPrecolor) | T |
| jellyfish::PlumbVmemInfo, PinPrecoloring, PinUserAnnotatedInstructions | T |

Host-Offload (Phase 5-6)

Host-offload analysis and asyncification. Not individually paged.

| Registered name | Class | Tier |
| --- | --- | --- |
| host-offloader / host-offload-legalize / host-offloading-prepare | OSS HostOffload* | O |
| host-offloading-layout-analysis, annotate-host-compute-offload | OSS | O |
| host-compute-asyncifier, host-memory-transfer-asyncifier | OSS | O |
| add-random-host-offloading | jellyfish::AddRandomHostOffloading (debug) | T |
| host-offloading-lookahead, host-offload-normalization | jellyfish::* | T |
| host-collective-rewriter, host-transfer-channel-id-assigner | jellyfish::* | T |
| infeed-decomposer / infeed-flattener / outfeed-decomposer | jellyfish::* | T |
| infeed-token-propagation | InfeedTokenPropagation | O |

While-Loop / Pipelining / Misc

While-loop simplification, unrolling, rematerialization, and assorted TPU rewrites. Loop transforms documented on loop-tiling-unrolling.md; opt-barrier handling on optimization-barrier.md.

| Registered name | Class | Tier |
| --- | --- | --- |
| while-loop-simplifier / while-loop-constant-sinking / while-loop-invariant-code-motion | OSS WhileLoop* | O |
| while-loop-unroller / while-loop-pipeline-unroller / while-loop-trip-count-annotator | OSS | O |
| while-loop-concat-code-motion / -expensive-invariant-code-motion / -fusible-sinking | OSS | O |
| hlo-rematerialization / hlo-rematerialization-with-stats | OSS | O |
| add-original-value | AddOriginalValue (provenance metadata) | O |
| tpu-initial-weight-broadcaster | jellyfish::TpuInitialWeightBroadcaster | T |
| tpu-indexed-instruction-analyzer | jellyfish::TpuIndexedInstructionAnalyzer | T |
| convert-frontend-attributes-to-backend-config | jellyfish::ConvertFrontendAttributesToBackendConfig (last in HLO domain) | T |
| cross-slice-code-motion, pred-packing, operand-span-exposure | jellyfish::* | T |
| always-crash | jellyfish::AlwaysCrash (testing, gated by --xla_tpu_always_crash) | T |
| mock-while-loop-bounding | jellyfish::MockWhileLoopBounding (testing) | T |
| conditional-code-motion | conditional_opt::ConditionalCodeMotion | O |

SparseCore-Offload (HLO-level)

The HLO-level SparseCore offloaders that run before the SparseCore MLIR lowering. The 24 xla::tpu::sparse_core::* HLO passes (embedding / sparse-dense matmul decomposers) are the SparseCore-HLO tier; the MLIR tile-task passes are in the SparseCore MLIR section. Documented on lower-to-sparsecore-llvm.md.

| Registered name | Class | Tier |
| --- | --- | --- |
| sparse-core-compute-offloader | jellyfish::SparseCoreComputeOffloader | T |
| sparse-core-collective-offload (+ -remover/-aggregator) | jellyfish::SparseCoreCollective* | T |
| offload-gather-to-sparse-core / offload-scatter-to-sparse-core / offload-sort-to-sparse-core | jellyfish::Offload*ToSparseCore | T |
| sparse-core-queue-assignment / sparse-core-queue-legalizer | jellyfish::SparseCoreQueue* | T |
| sparse-core-mismatch-detector, aggregate-data-dependent-sc-ops | jellyfish::* | T |
| 24 sparse_core::* HLO passes (embedding/SDM decomposers) | tpu::sparse_core::* | T |

MLIR Pass Names

The MLIR surface is recovered from create*Pass/Create*Pass factory symbols and getArgument() pass-flag literals. It splits by dialect and forms a 4-stage lowering cascade. Names below are the getArgument() literal (byte-confirmed where a getArgument() symbol carries it).

StableHLO ── createStablehloLowerToXtilePass ──> XTile
XTile     ── vector-layout / tiling passes     ──> tpu (TensorCore vector IR)
tpu       ── createLowerToLLOPass              ──> LLO ── tpu-bundle-packer / tpu-encode-mcinst-bundles ──> ISA bundles
tpu       ── (SparseCore fork) sc 36 passes    ──> Mlo ── createLowerToSparseCoreLlvmPass ──> LLVM-TPU IR

StableHLO / CHLO / VHLO Legalizers

Input wire-format legalization. Documented on mhlo-xtile-tpu-lowering.md.

| Pass-arg flag | Factory |
| --- | --- |
| chlo-legalize-to-stablehlo / -hlo / -high-level-mhlo | createChloLegalize*Pass |
| stablehlo-canonicalize-dynamism | createStablehloCanonicalizeDynamismPass |
| stablehlo-complex-math-expander | createStablehloComplexMathExpanderPass |
| stablehlo-convert-to-signless | createStablehloConvertToSignlessPass |
| stablehlo-legalize-quant-to-math | createStablehloLegalizeQuantToMathPass |
| stablehlo-legalize-to-{hlo,linalg,vhlo} | createStablehloLegalizeTo*Pass |
| stablehlo-refine-{arguments,shapes} | createStablehloRefine*Pass |
| vhlo-to-version, vhlo-legalize-to-stablehlo | createVhlo*Pass |
| legalize-tf, legalize-tf-{collective,communication} | createLegalizeTFPass |
| stablehlo-ext-* (8 variants: canonicalize-dynamism, refine-shapes, sink-constants-to-control-flow, …) | extension passes |

Shardy (sdy) Round-Trip Sharding

The Shardy import/export round-trip. Documented on sharding-propagation.md.

| Pass-arg flag | Factory |
| --- | --- |
| sdy-import-shardings / -import-constants / -import-func-calls | xla::sdy::createImport*Pass |
| sdy-export-ops / -export-named-computations | createExportOpsPass etc. |
| sdy-round-trip-{dedup-meshes,export-ops} | createSdyRoundTrip*Pass |
| sdy-apply-sharding-constraints, -close-shardings, -collective | sdy passes |
| sdy-insert-explicit-reshards, -reshard, -reshard-to-collectives | sdy passes |
| sdy-flatten-call-graph, -inline-meshes, -lift-inlined-meshes | sdy passes |
| ~35 sdy-* flags total (constant-or-scalar-{merger,splitter}, remove-*, manual-axes-cleanup, …) | sdy passes |

XTile

The StableHLO → XTile lowering layer, plus the XTile-CPU bufferization/vectorization passes. Documented on mhlo-xtile-tpu-lowering.md.

| Pass-arg flag | Factory |
| --- | --- |
| stablehlo-lower-to-xtile | xla::xtile::createStablehloLowerToXtilePass |
| stablehlo-lower-to-arith | createStablehloLowerToArithPass |
| xtile-verify-legal-ops | createVerifyLegalXTileOpsPass |
| xtile-cpu-* (bufferization, fuse-elementwise, linalg-elementwise-to-vector, lower-xtile-entry, shlo-to-vector, vector-to-scalar, unpack-sub-byte-vector-write, …) | XTile-CPU passes |

tpu Dialect — TensorCore Vector IR (14 create*Pass)

The TensorCore vector-layout and lowering passes. lower-to-llo and lower-to-mlo are byte-confirmed (getArgument() literals located). Documented on tpu-dialect-and-ops.md and tpu-to-llo-ods.md.

| Pass-arg flag | Factory |
| --- | --- |
| lower-to-llo | mlir::tpu::createLowerToLLOPass (getArgument() @ 0x112041c0) |
| lower-to-mlo | mlir::tpu::createLowerToMloPass |
| tpu-tiling-propagation / tiling-propagation | createTilingPropagationPass (getArgument() @ 0x132e0d80) |
| tpu-apply-vector-layout | createApplyVectorLayoutPass |
| tpu-infer-vector-layout / infer-vector-layout | createInferVectorLayoutPass |
| tpu-infer-memref-layout | createInferMemRefLayoutPass |
| tpu-relayout-insertion / relayout-insertion | createRelayoutInsertionPass |
| tpu-canonicalize-{memory-space,mosaic,operations} | createCanonicalize*Pass |
| (no flag) createConvertIntegerMemrefsPass, createDebugAssertInsertionPass, createLogicalToPhysicalDeviceIdPass, createPreCanonicalizationOptimizationPass | tpu passes |

mosaic_sc Dialect — Mosaic-for-SparseCore (3 create*Pass)

Documented on mosaic-overview.md.

Factory
mlir::mosaic_sc::createApplyVectorLayoutPass
mlir::mosaic_sc::createInferVectorLayoutPass
mlir::mosaic_sc::createInsertRelayoutPass

SparseCore Tile-Task Passes (36 Create*Pass)

The xla::tpu::sparse_core::Create*Pass family — the SparseCore tile-task lowering chain (bounds-checks, tile outlining, allocation, prefetch scheduling, LLVM lowering). Documented on lower-to-sparsecore-llvm.md. Named by function rather than dumped flat:

| Sub-group | Representative Create*Pass factories |
| --- | --- |
| Bounds / safety checks | AddStreamBoundChecks, IndirectStreamBoundsCheck, LoadStoreBoundsCheck, InsertRunIdCheck, LlvmIntToPtrSafety, SyncFlagVerifier |
| Tile outlining / structure | TileTaskOutlining, SinkOpsIntoTileTask, ExpandTiledMemRefs, TileTaskArgumentsSpill, InsertClearTileIbuf |
| Allocation | AllocationAssignment, GlobalAllocationAssignment, TileOverlayAllocation, LegalizeHbmGlobals, PrepareHbmSpill |
| Scheduling / prefetch | ScheduleTileTaskPrefetch, LowerPrefetchTileTask, DelaysInsertion, TileBarrierInsertion |
| Lowering | LowerToSparseCoreLlvm, LowerDynamicDimensionSize, LowerSequencerFunctions, PackedOperandsLowering, AlgebraicDecomposer |
| Misc | HardwareTracing, SflagWaitInstrumentation, EmitTileOverlayHandler, ScsOverlaysTrampoline, MemrefAliasScopeAnnotation, PackUnpackCleaner, VectorUnroll, LoopUnroll, SparseCoreCanonicalizer, TileHostTransfer, LegalizeMegaChipModule |
| Pass-arg flags | legalize-hbm-globals, legalize-megachip-module, lower-to-mlo, lower-to-sc, lower-to-sparse-core-llvm |

llo / LLO Bridge

The LLO-extension elimination pass that closes the MLIR cascade before bundle packing. Documented on lower-to-mlo-dma-bridge.md and llvmtpu-intrinsic-catalog.md.

Factory / class
mlir::llo::createEliminateLLOExtensionsPass / EliminateLLOExtensionsPass

TPU LLVM-Backend MachineFunction Pass Flags (242)

The bottom of the codegen stack, below LLO: the TPU LLVM target's machine-level passes, exposed as 242 tpu-* cl::opt flag strings. Only the flag strings are recovered — the llvm::Pass subclass symbols behind them are not on this page (codegen tail). Grouped by function rather than dumped flat:

| Function | Representative flags |
| --- | --- |
| Scheduling | tpu-machine-scheduler[-fast], tpu-misched[-dcpl], tpu-fifo-scheduler, tpu-use-{fifo-sched,swing-modulo-sched}, tpu-pipeliner[-analysis/-info/-strategy/-verbose], tpu-pipeline-dovetailing, tpu-slack-lazy-backtrack |
| Super-passes | tpu-ssa-super-pass, tpu-enable-pipeliner-super-pass, tpu-enable-overlayer-passes |
| Bundle packing / encoding | tpu-bundle-packer[-maintain-order/-maintain-topologial-order], tpu-encode-mcinst-bundles, tpu-post-bundle-lower, tpu-print-{mcinst,sc-mcinst}-encodings |
| Register alloc / spill | tpu-register-prepare, tpu-round-robin-registers, tpu-pre-spill[-per-fill], tpu-spill-to-dreg, tpu-opt-spill-to-dreg, tpu-crosscall-spill-packer, tpu-fixed-{vregs,maskregs}, tpu-liveness-*, tpu-remat-pass |
| ISel / lowering | tpu-isel, tpu-early-post-isel, tpu-opt-prepare, tpu-codegen-prep[are], tpu-frame-lowering, tpu-gep-lowering, tpu-rotate-lowering, tpu-memop-intrinsics, tpu-vliw-prepare, tpu-mem-alloc |
| Verifiers | tpu-verifier, tpu-machine-memoperands-verifier, tpu-validate-{inputs,session-inputs,function-sizes}, tpu-fatal-{function-size-assertions,mem-alloc-error,verifier-error} |
| V1-island / TF-bridge | tpu-v1-island-{coarsening,inlining,outlining}, tpu-cluster-formation, tpu-device-propagation, tpu-merge-variables-with-execute, tpu-colocate-{composite-resource-ops,splits} |

GOTCHA — several tpu-* strings are dual-surface: tpu-call-inliner, tpu-instruction-fusion, tpu-int2-auto-up-down-caster, tpu-tiling-propagation, tpu-partition-assignment appear both as HLO name() strings (above) and as tpu-* flag args here. They are the same logical pass exposed at two registry levels. Counting the tpu-* flags and the HLO name() strings as disjoint sets double-counts these; treat the HLO-level pass as canonical and the flag as its CLI alias.
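
The double-counting risk is plain set arithmetic. A minimal sketch, assuming the two name lists have already been extracted; the function name and the tiny sample sets are illustrative, not the full inventories.

```python
def surface_counts(hlo_names, flag_names):
    """Compare the naive sum of both surfaces with the distinct-name count."""
    hlo, flags = set(hlo_names), set(flag_names)
    return {
        "naive_total": len(hlo) + len(flags),   # counts dual-surface names twice
        "distinct_total": len(hlo | flags),     # each logical pass counted once
        "dual_surface": sorted(hlo & flags),    # names present on both surfaces
    }


# Toy inputs: two dual-surface names plus one name unique to each surface.
hlo = {"tpu-call-inliner", "tpu-instruction-fusion", "sharding-propagation"}
flags = {"tpu-call-inliner", "tpu-instruction-fusion", "tpu-isel"}
counts = surface_counts(hlo, flags)
print(counts["naive_total"], counts["distinct_total"])  # 6 4
print(counts["dual_surface"])
```

The intersection is the list to treat as CLI aliases of the HLO-level pass; the union is the count to report.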


Pipeline-Stage Names

Two phase taxonomies coexist. The monolithic compile labels its HloPassPipeline stages with capitalized strings (the std::string passed to the pipeline constructor); the separate compile (tpu_phase_functions.cc) labels its phases with phaseN_* metric tokens. They are not interchangeable. Both documented on compile-phases.md.

Monolithic HloPassPipeline Stage Strings

| Stage string | File-off | Role |
| --- | --- | --- |
| pre-optimization | 0x85ea920 | PreOptimizationPipeline (Phase 1) |
| post-optimization | 0x85ea85e | PostOptimizationPipeline |
| Pre layout assignment | 0x84ef7bf | pre-layout HLO |
| Layout assignment / XLA::JF Layout Assignment | 0x84ef7d5 / 0x84efbdf | LayoutAssignment |
| Post layout assignment | 0x84ef6db | post-layout HLO |
| HLO passes through layout assignment | 0x84ef757 | HloOptimizeThroughLayoutAssignment |
| HLO passes after layout assignment | 0x84ef6f2 | HloOptimizeAfterLayoutAssignment |
| Pre main fusion | 0x10963b16 | PostLayoutPreFusionHloOptimize |
| Main fusion / XLA::TPU main fusion pass | n/a / 0x8522e3e | TpuInstructionFusion bucket |
| Post main fusion / JF after main fusion | 0x85f7605 / 0x85f7616 | PostMainFusionHloOptimize |
| final_scheduler | (SSO strcpy) | base memory-schedule pipeline (Phase 7) |
| async_scheduling | 0x865dcab | LHS + async-overlap pipeline (Phase 7) |
| final_scheduled | 0x8713d6a | post-scheduling marker |
| simplification after layout assignment | 0x84ef715 | post-layout simplify |

Separate-Compilation Phase Metric Tokens

The multi-phase compile names its phases with phaseN_* tokens (the metric keys in the separate-compilation timing table). phase1_hlo_opts and phase3_linking are byte-confirmed as literals referenced from xla::TpuCompiler::RegisterAllPhases @ 0xf849ec0. "TLP" = TPU-Level-Program.

| Token | Role |
| --- | --- |
| phase0_stablehlo_to_hlo | StableHLO → HLO import |
| phase1_hlo_opts | HLO optimization (the HLO passes above) |
| phase2a_tlp_lowering | HLO → MLIR-tpu (TPU-Level-Program) lowering |
| phase2b_deduped_lowering | dedup'd per-chunk lowering |
| phase3_linking (+ _test_only) | HLO-module linking |
| phase2_compilation_{status,time} | phase-2 status/timing metric fields |
| phase0_{transfer_dim,shard_size_bytes}, phase1_transfer_dim | per-phase transfer/shard metric fields |

NOTE — the phaseN_* tokens are metric keys, not pass names — they appear in timing/transfer tables, never as a pass a reader can run. A VLOG or xla_dump filter will match the stage strings above and the pass names earlier; it will not match phaseN_* unless reading the separate-compilation timing path. The RegisterAllPhases symbol that anchors phase1_hlo_opts/phase3_linking is the registration entry, not a per-phase pass body.
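
A reader triaging a string from a log can separate the two taxonomies before looking up a pass. The sketch below is an assumption-laden helper, not part of libtpu: the regular expression is inferred from the token spellings on this page, and STAGE_STRINGS holds only a subset of the stage strings listed above.

```python
import re

# phase<digits>, optional sub-phase letter, underscore, snake_case tail.
PHASE_TOKEN = re.compile(r"phase\d+[a-z]?_[a-z0-9_]+")

# Subset of the monolithic pipeline-stage strings from this page.
STAGE_STRINGS = {
    "pre-optimization",
    "post-optimization",
    "Pre layout assignment",
    "Post layout assignment",
    "Pre main fusion",
    "Post main fusion",
    "final_scheduler",
    "async_scheduling",
}


def classify(name: str) -> str:
    """Return which taxonomy a string belongs to, as far as this sketch knows."""
    if PHASE_TOKEN.fullmatch(name):
        return "phase-metric"
    if name in STAGE_STRINGS:
        return "pipeline-stage"
    return "pass-name-or-other"


for s in ("phase2a_tlp_lowering", "final_scheduler", "latency-hiding-scheduler"):
    print(s, "->", classify(s))
```

Anything that falls through to the last bucket is a candidate pass name or flag and should be looked up in the HLO and MLIR sections rather than in compile-phases.md.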


Not Confirmed / Not on This Page

Honest scope limits, so a reimplementer knows what to re-derive:

  • ~300 HLO name() strings are convention-inferred (INFER), not byte-confirmed. The name() body builds the string at runtime (a static const or a string-builder); only ~25 dashed names are present as literals in .rodata (the CERTAIN rows). The class symbol is certain; the exact returned string follows the CamelCase → kebab convention but was not located per pass.
  • The add order of each pass is not here. This is the name index; the order is owned by hlo-pass-registry.md (Phase-1..6 sequence), the scheduler placement, and the fusion-bucket order.
  • The flag → pass gating is not here. Which xla_tpu_* flag enables which AddPass<T> is a .text branch in RunHloPasses (0x1093a420), not a string this page recovers.
  • The 242 tpu-* flags' llvm::Pass class symbols are not recovered — only the cl::opt flag strings.
  • The per-TpuVersion (jxc/pxc/vxc/glc/gfc) pass-set delta is not isolated — the HAL modules are named per version, but which passes are skipped for v3 vs v6e is a Target::tpu_version() branch not traced here.

Cross-References