Dynamic-Shape Support

Symbol names, VAs, and the build-id below apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d). Other versions differ; treat every VA as version-pinned.

Abstract

TPU does not allocate buffers whose byte-size depends on a runtime value. Its entire dynamic-shape strategy is static pad-to-bound with the real runtime size carried in a fixed-width metadata prefix. A dimension marked dynamic carries a static upper bound (the extent stored in the xla::Shape) plus a per-dimension is_dynamic bit; the runtime value lives in [0, bound]. The compiler sizes every buffer at the bound, and the xla::DynamicPadder pass — running twice in the HLO pre-pass set — rewrites the module so that after it runs, layout assignment, tiling, MSA, and LLO lowering all see a fully static module. Dynamic-ness survives only as (a) ordinary S32 scalar SSA values that compute each dimension's runtime size, and (b) two boundary custom-calls, PadToStatic and SliceToDynamic, that read and write a 1024-byte metadata prefix prepended to every dynamic buffer.

This page documents the three layers a reimplementation must get right: the bounded-dynamic Shape model and the size-operand threading (DynamicParameterBinding, DynamicDimensionInference, SetDimensionSize/GetDimensionSize); the DynamicPadder pass and its companions (DynamicIndexSplitter, DynamicDimensionSimplifier) — what they rewrite and where they insert the boundary custom-calls; and the pad-to-static-tile policy — how the prefix is sized, how Target::ShapeWithMetadataSizeBytes computes a buffer's physical bytes, and how the two boundary emitters lower the prefix to LLO. The window/conv same-padding bound arithmetic and the SparseCore MLIR dynamic-dim subsystem are out of scope and noted where they branch off.

The contract a reimplementation must honor:

  • Static pad-to-bound, never runtime-sized allocation. Target::ShapeWithMetadataSizeBytes(shape) is ShapeSizeBytesRaw(shape) + prefix for a dynamic shape and ShapeSizeBytesRaw(shape) for a static one. ShapeSizeBytesRaw uses the static (upper-bound) extents. No code path makes a buffer's physical size a function of a runtime scalar.
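  • The metadata prefix defaults to a compile-time constant: 1024 bytes. Target::DynamicShapeMetadataPrefixBytes() returns 1024 on every TPU generation, asserted divisible by sizeof(int32_t) (256 int32 dim-size slots) and <= ChunkSizeBytes(). The size calculation reads the per-layout dynamic_shape_metadata_prefix_bytes() field first and falls back to 1024 only when that field is zero.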
  • DynamicPadder is the master lowering pass; it runs before layout. It sits at step 21 of the PreOptimization phase (after DynamicDimensionSimplifier at step 20, with DynamicIndexSplitter earlier at step 4 — see hlo-pre-passes.md). After it, only Set/GetDimensionSize S32 scalars and the two boundary custom-calls remain.
  • Dimension sizes are S32 scalars threaded through the HLO graph. Entry-parameter dynamic dims bind to another S32 entry parameter via DynamicParameterBinding; intra-graph the DynamicDimensionInference map carries each dim's size SSA value; a 37-override forward visitor propagates them.
  • TPU rejects unbounded dynamism. Only bounded-dynamic dims (a dynamic dim with a static upper bound) compile; the gate is the diagnostic "unbounded dynamism is not supported".

| Item | Detail |
| --- | --- |
| Master pass | xla::DynamicPadder::RunImpl(HloModule*, exec_threads) @ 0x16998ca0 (~5,918 decompiled lines) |
| Companion (step 4) | xla::DynamicIndexSplitter::RunImpl @ 0x164ae740 |
| Companion (step 20) | xla::DynamicDimensionSimplifier::RunImpl @ 0x164d0020 |
| Dim-size analysis | xla::DynamicDimensionInference::Run @ 0x1e39ad20; visitor DynamicDimensionInferenceVisitor::Run @ 0x1e3984c0 |
| Boundary insert helpers | DynamicDimensionInferenceVisitor::RequiresPadToStatic @ 0x1e39a7e0; ::InsertPadToStaticOnInstruction @ 0x1e390920 |
| Un-pad rewriter | (anon)::DynamicShapeRemovingVisitor::ConvertToDynamic @ 0x169a7bc0 |
| Boundary emitters (LLO) | jellyfish::PadToStaticEmitter::Emit @ 0x10c9ad40; jellyfish::SliceToDynamicEmitter::Emit @ 0x10c9c6c0 |
| Sizing | Target::ShapeWithMetadataSizeBytes @ 0x1d619f20; TransferSizeUtil::ShapeWithMetadataSizeBytes @ 0x1d6aea00; Target::DynamicShapeMetadataPrefixBytes @ 0x1d61c4e0; Target::DynamicShapeMetadataPrefixShape @ 0x1d61c500 |
| Source unit | third_party/tensorflow/compiler/xla/service/dynamic_padder.cc (string-anchored) |
| Confidence | CONFIRMED (byte-anchored) unless a row or callout says otherwise |

The Bounded-Dynamic Shape Model

The model is the standard XLA per-dimension dynamic bit paired with the static extent serving as the upper bound. Each xla::Shape carries, per dimension i, a static extent D[i] and a boolean is_dynamic[i]. When is_dynamic[i] is true, D[i] is the upper bound and the runtime value lies in [0, D[i]]. The relevant accessors are byte-anchored:

| Method | VA | Role |
| --- | --- | --- |
| xla::Shape::is_dynamic_dimension(int) | 0x1e52c9e0 | per-dim dynamic bit query |
| xla::Shape::set_dynamic_dimension(int, bool) | 0x20cd8c60 | set per-dim dynamic bit |
| xla::Shape::is_static() | 0x20cd8f80 | true iff no dim is dynamic |

A dynamic dimension's runtime size is always an S32 scalar. The shape-inference helpers for the two dimension-size ops fix this:

  • xla::ShapeInference::InferSetDimensionSizeShape(Shape, Shape, long) @ 0x1e541d20
  • xla::ShapeInference::InferGetDimensionSizeShape(Shape, long) @ 0x1e541960

with the diagnostic "SetDimensionSize's value has to be S32 scalar, got %s" enforcing the operand dtype. The HloOpcode bytes for the dynamic-dim op family (authoritative map cross-referenced from the opcode catalog):

0x35 dynamic-reshape       0x36 dynamic-slice        0x37 dynamic-update-slice
0x3f get-dimension-size    0x70 set-dimension-size   0x51 pad

NOTE: decompile cross-check. DynamicPadder::RunImpl (0x16998ca0) discriminates these by raw opcode: at line ~1077 it tests v25 == 63 || v25 == 112 (63 = 0x3f, get-dimension-size; 112 = 0x70, set-dimension-size) and special-cases the SetBound custom-call before deciding an instruction is "already-static". [Confidence: CONFIRMED, read directly from the decompiled switch.]
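
The opcode test described in the note can be written out as a small compile-time check. This is only a sketch: the enum name DynOpcode and the helper IsDimensionSizeOp are hypothetical names introduced here, and the byte values are the ones listed in the map above, valid for this libtpu build only.

```cpp
#include <cassert>
#include <cstdint>

// Opcode bytes as listed in the map above (version-pinned to this build).
// DynOpcode is a hypothetical name, not a symbol from the binary.
enum class DynOpcode : std::uint8_t {
  kDynamicReshape     = 0x35,
  kDynamicSlice       = 0x36,
  kDynamicUpdateSlice = 0x37,
  kGetDimensionSize   = 0x3f,
  kPad                = 0x51,
  kSetDimensionSize   = 0x70,
};

// Hypothetical helper mirroring the decompiled test `v25 == 63 || v25 == 112`.
constexpr bool IsDimensionSizeOp(std::uint8_t raw_opcode) {
  return raw_opcode == static_cast<std::uint8_t>(DynOpcode::kGetDimensionSize) ||
         raw_opcode == static_cast<std::uint8_t>(DynOpcode::kSetDimensionSize);
}

// The decimal constants in the decompile are the same bytes as the hex map.
static_assert(static_cast<int>(DynOpcode::kGetDimensionSize) == 63, "0x3f == 63");
static_assert(static_cast<int>(DynOpcode::kSetDimensionSize) == 112, "0x70 == 112");
```

Calling IsDimensionSizeOp(0x36) returns false, since dynamic-slice is not one of the two dimension-size ops.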

Compile-time invariants on the bound are guarded by verbatim diagnostics (the rewriters never silently truncate):

  • Dimension size has to be less-equal than upper bound %lld for dimension %lld in shape %s
  • dynamic size must be less than or equal to static size
  • Shape size has to be less than %d in dynamic shape bounded by %s
  • Shape size has to be greater or equal than 0 in dynamic shape bounded by %s
  • Non-positive constant for dynamic size
  • requires 'shape' to have at most one dynamic dimension, but got multiple dynamic dimensions at indices {0} and {1} — some ops cap at a single dynamic dim.
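
The shape model and the bound check can be illustrated with a minimal sketch. BoundedShape and CheckRuntimeSize are hypothetical stand-ins, not the xla::Shape API. Only the second message is taken from the diagnostics quoted above; the other two are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-dimension part of xla::Shape.
struct BoundedShape {
  std::vector<std::int64_t> extents;  // D[i]: static extent, or upper bound if dynamic
  std::vector<bool> is_dynamic;       // per-dimension dynamic bit

  // True iff no dimension is dynamic (the role listed for Shape::is_static()).
  bool is_static() const {
    for (bool d : is_dynamic) {
      if (d) return false;
    }
    return true;
  }
};

// Returns an empty string when `runtime_size` is legal for dimension `dim`,
// otherwise an error message.
inline std::string CheckRuntimeSize(const BoundedShape& s, std::size_t dim,
                                    std::int64_t runtime_size) {
  assert(s.extents.size() == s.is_dynamic.size() && dim < s.extents.size());
  if (runtime_size < 0) return "runtime size must be non-negative";
  if (runtime_size > s.extents[dim]) {
    return "dynamic size must be less than or equal to static size";
  }
  if (!s.is_dynamic[dim] && runtime_size != s.extents[dim]) {
    return "static dimension must equal its extent";
  }
  return "";
}
```

For a shape with extents {8, 128} whose first dimension is dynamic, a runtime size of 5 on dimension 0 passes and 9 is rejected, because the runtime value must lie in [0, 8].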

The jellyfish layer additionally models a per-dimension tiling/parallel extent as std::variant<long, jellyfish::DynamicBound> — a tiling dim is either a concrete long extent or a DynamicBound (a bounded-dynamic extent). This surfaces in VerifyParallelAttributes(..., absl::Span<variant<long, DynamicBound> const>, ...) (0x14516f40) and LiteralBase::Piece::CopyElementsWithDynamicBound<T> for literal evaluation under a bound. [Confidence: HIGH — symbol-anchored, body not fully traced.]


Dimension-Size Threading

Dynamic dim sizes are threaded as ordinary S32 scalar SSA values through three layers.

Layer 1 — entry binding: DynamicParameterBinding

For an entry parameter whose dimension is dynamic, the runtime size is supplied as another entry parameter (an S32 scalar). The binding is a map<DynamicDimension{param, index, dim} -> DynamicSizeParameter{param, index}>. The verbatim diagnostic anchors the mechanism:

-- Input param number %lld at %s has dim %lld as dynamic dimension,
   which is represented by param number %lld at %s

DynamicParameterBinding is the analysis seed: DynamicDimensionInferenceVisitor::Run consumes it, and conditionals carry a per-branch binding (dynamic_parameter_binding for conditional branch).
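
The binding map can be pictured as an ordered map from "which dimension is dynamic" to "which parameter holds its size". The following is a sketch under that reading: the struct fields follow the {param, index, dim} and {param, index} tuples given above, while the class name ParameterBindingSketch and its methods are hypothetical and are not claimed to match the binary's interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <tuple>
#include <vector>

using ShapeIndex = std::vector<std::int64_t>;  // path into a (possibly tuple) shape

// Identifies one dynamic dimension of an entry parameter.
struct DynamicDimension {
  std::int64_t param;
  ShapeIndex index;
  std::int64_t dim;
  bool operator<(const DynamicDimension& o) const {
    return std::tie(param, index, dim) < std::tie(o.param, o.index, o.dim);
  }
};

// Identifies the S32 scalar parameter that supplies the runtime size.
struct DynamicSizeParameter {
  std::int64_t param;
  ShapeIndex index;
};

// Hypothetical container; an ordered map is used only for illustration.
class ParameterBindingSketch {
 public:
  void Bind(const DynamicSizeParameter& size, const DynamicDimension& dim) {
    bindings_[dim] = size;
  }
  std::optional<DynamicSizeParameter> GetBinding(const DynamicDimension& dim) const {
    auto it = bindings_.find(dim);
    if (it == bindings_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<DynamicDimension, DynamicSizeParameter> bindings_;
};
```

Binding parameter 1 as the size of dimension 0 of parameter 0 corresponds to the quoted diagnostic: input param 0 has dim 0 as a dynamic dimension, represented by param 1.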

Layer 2 — intra-graph map: DynamicDimensionInference

This analysis maintains map<DynamicDimension{HloInstruction*, ShapeIndex, dim} -> HloInstruction*> — the S32 SSA value that holds each dim's runtime size. It is backed by two ordered-tree containers (one keyed by HloInstruction*, one by the DynamicDimension struct via operator< @ 0x1e3a8520). Public API (all byte-anchored):

| Method | VA |
| --- | --- |
| Run(module, op_dynamism_support_fn, custom_call_handler, ShapeCheckMode, assertion_generator, exec_threads) | 0x1e39ad20 |
| SetDynamicSize(inst, ShapeIndex, dim, size_inst) | 0x1e38f1e0 |
| GetDynamicSize(inst, ShapeIndex, dim) -> HloInstruction* | 0x1e39bbe0 (const variant) |
| GetDynamicSizes(inst, ShapeIndex) | 0x1e39bc00 |
| GetDynamicShape(inst) | 0x1e39b980 |
| ForwardDynamicSize(inst, new_inst, ShapeIndex) | 0x1e39b580 |
| ReplaceAllDynamicDimensionUsesWith(a, b) | 0x1e3927a0 |
| CopyMapping(from, to, replacement_map) | 0x1e3982c0 |
| AnalyzeDynamicDimensions() | 0x1e39b0a0 |

NOTE: decompile cross-check on Run signature. The mangled symbol at 0x1e39ad20 demangles to Run(HloModule*, std::function<OpDynamismSupport(HloInstruction*)>, std::function<bool(HloInstruction*, DynamicDimensionInference*)>, ShapeCheckMode, std::function<void(HloInstruction*)> const&, absl::flat_hash_set<string_view> const&). The first callback is the per-op dynamism-support query (OpDynamismSupport); the second is the custom_call_handler (the registered per-custom-call dynamism inferer, taking the instruction and the inference object); the third callback (the fifth parameter, following ShapeCheckMode) is the assertion_generator; the final flat_hash_set<string_view> is the execution-thread set. This matches the call from DynamicPadder::RunImpl (line ~1236) exactly. [Confidence: CONFIRMED.]

Run builds a DynamicParameterBinding from the entry layout, runs DynamicDimensionInferenceVisitor to propagate dim-size SSA values forward across every instruction, then calls AnalyzeDynamicDimensions.

Layer 3 — propagation: DynamicDimensionInferenceVisitor (37 visitor overrides + 3 helpers)

The forward visitor (Run @ 0x1e3984c0) computes each output dim-size SSA from operand dim-sizes. Each handler receives a callback of signature (HloInstruction* inst, ShapeIndex, long dynamic_dim, long operand_dim, HloInstruction* dynamic_size). The binary exports 37 single-argument Handle*(HloInstruction*) visitor overrides plus 3 multi-argument helpers (HandleDynamicConvolutionForward(HloInstruction*, long, long, HloInstruction*), HandleDynamicConvolutionInputGrad(HloInstruction*, long, long), HandleDynamicWindowSamePadding(HloInstruction*, HloInstruction*, long, long)) invoked from HandleCustomCall for the three DynamicConvolution* custom-calls — 40 Handle* member functions total. Representative subset with byte anchors:

| Handler | VA | Propagation rule |
| --- | --- | --- |
| HandleParameter | 0x1e39a6c0 | seeds from DynamicParameterBinding |
| HandleSetDimensionSize | 0x1e3929e0 | binds dim → operand-1 (the S32 size) |
| HandleGetDimensionSize | 0x1e3923a0 | materializes the stored size SSA |
| HandleBroadcast | 0x1e38fae0 | maps via broadcast dims; diagnostic "Broadcast input and output dynamism mismatch" |
| HandleConstant | 0x1e38fba0 | clears dynamism (constants are static) |
| HandleConcatenate | 0x1e391c20 | sum of operand dyn-sizes on the concat dim |
| HandleReshape | 0x1e394980 | factorizes dim groups; threads sizes per group |
| HandleDynamicReshape | 0x1e3947e0 | uses the explicit dim-size operands |
| HandlePad | 0x1e391360 | size combined with pad config |
| HandleDot / HandleConvolution | 0x1e3918a0 / 0x1e391b60 | propagate batch/spatial dyn dims |
| HandleDynamicConvolutionForward | 0x1e392d00 | GetWindowedOutputSize bound |
| HandleDynamicWindowSamePadding | 0x1e393060 | same-padding bound |
| HandleWhile | 0x1e398900 | threads dyn-size through the loop-carried tuple |
| HandleConditional | 0x1e395d00 | per-branch binding |
| HandleCustomCall | 0x1e390580 | fails with "Dynamic inferencing on custom call %s is not supported" unless a handler is registered |
| PassThroughDynamicDimension | 0x1e3935e0 | generic forward helper (elementwise, select, clamp, transpose, slice…) |
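
Two of the propagation rules in the table, pass-through and concatenate, are simple enough to model on plain integers. This sketch replaces the S32 SSA values with concrete numbers and uses hypothetical names (DimInfo, PassThrough, Concatenate); it illustrates the stated rules only and is not a trace of the handler bodies, which were sampled rather than fully recovered.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One operand's extent along the dimension of interest. `runtime_size` is
// empty when the dimension is static, in which case the size equals `bound`.
struct DimInfo {
  std::int64_t bound;
  std::optional<std::int64_t> runtime_size;
  std::int64_t size() const { return runtime_size.value_or(bound); }
};

// Pass-through rule: the output dimension inherits the operand's size.
inline DimInfo PassThrough(const DimInfo& operand) { return operand; }

// Concatenate rule: bounds add up, and when any operand is dynamic the output
// runtime size is the sum of the operand sizes on the concat dimension.
inline DimInfo Concatenate(const std::vector<DimInfo>& operands) {
  DimInfo out{0, std::nullopt};
  bool any_dynamic = false;
  std::int64_t total = 0;
  for (const DimInfo& op : operands) {
    assert(op.size() >= 0 && op.size() <= op.bound);  // runtime value in [0, bound]
    out.bound += op.bound;
    total += op.size();
    any_dynamic = any_dynamic || op.runtime_size.has_value();
  }
  if (any_dynamic) out.runtime_size = total;
  return out;
}
```

Concatenating a dynamic operand (bound 8, runtime size 5) with a static operand of extent 4 gives a bound of 12 and a runtime size of 9; concatenating two static operands stays static.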

The visitor also decides where the static boundary goes, via RequiresPadToStatic(inst, ShapeIndex) (0x1e39a7e0) and InsertPadToStaticOnInstruction(inst) (0x1e390920): when a downstream op cannot consume a dynamic operand, a PadToStatic custom-call is inserted. The ShapeCheckMode enum selects whether dynamic-dim consistency is verified at compile time or deferred to a runtime assertion emitted by the assertion_generator callback; the runtime-mode anchor is the string "dynamic dimensions size %d did not match number of dimensions %d".


The DynamicPadder Pass

xla::DynamicPadder::RunImpl(HloModule*, exec_threads) @ 0x16998ca0 is the master dynamic-shape lowering pass — the open-source XLA DynamicPadder, configured by a DynamicPadderOptions and added with AddPass<DynamicPadder>(DynamicPadderOptions). It sits at PreOptimization step 21, immediately after DynamicDimensionSimplifier (step 20) and ConditionalCanonicalizer, with DynamicIndexSplitter earlier at step 4 (see hlo-pre-passes.md, where it is recorded as a 2× AddPass).

RunImpl flow, recovered from the decompiled call sequence:

  1. Build dynamism info. Call DynamicDimensionInference::Run(module, op_dynamism_support_fn, custom_call_handler, shape_check_mode, assertion_generator, exec_threads) (line ~1236). The op_dynamism_support_fn returns an OpDynamismSupport per op; the anchor op_support != OpDynamismSupport::kNoSupport decides whether an op keeps its dynamic shape or must be padded. The TPU tpu_compile_op_support.cc supplies the per-op support table.

  2. Rewrite ops that cannot consume a dynamic operand. All rewriters live in the dynamic_padder.cc anonymous namespace:

    • PadWithScalar(inst, dim, dynamic_size, pad_value) @ 0x169a2fe0 — pads the bounded region beyond the runtime size with an identity scalar (0 for add-reduce, −inf for max-reduce, …) so padded lanes do not perturb the result.
    • GenerateBinaryMask(inst, dim, dims, dyn_sizes, iota, lt, is_lower) @ 0x169a6360 — builds an iota < dynamic_size comparison mask (the "padding mask" of the pass's output invariant).
    • RewriteDynamicReshape(inst, ddi) @ 0x169a11c0 (+ RewriteDynamicReshapeSingleGroup @ 0x169a3600) — splits the reshape into dim-groups and threads dynamic sizes.
    • RewriteInputWithDynamicPadding(inst, operand, pad, dims, Window*, size_fn) @ 0x169a6ec0 — pads a conv/window input, using GetWindowedOutputSize (0x1e3a93e0) / GetWindowedInputGradSize (0x1e3a9b40) for the bound. Driven by the three DynamicConvolution{Forward, InputGrad, KernelGrad} custom-call branches.
    • RewriteDynamicBinaryOp / RewriteDynamicSort (lambdas), and the reduce-window/select-and-scatter same-padding rewriters.

    The rewriters build HLO via MakePadHlo (0x1e3e5560), CreateSlice (0x1e593160), CreateDynamicSlice (0x1e5947e0), CreateDynamicUpdateSlice (0x1e594860), CreateReshape (0x1e594de0), and friends.

  3. Insert the two boundary custom-calls.

    • PadToStatic — at every point a dynamic value must become a static-shaped value (entry to a static-only region). Inserted by InsertPadToStaticOnInstruction / RequiresPadToStatic.
    • SliceToDynamic — the inverse, at every point a static value must be re-annotated as dynamic (output boundary).

    Targets: jellyfish::dynamic_padding_handler::kPadToStatic / kSliceToDynamic, registered via CustomCallRegistration::RegisterLoweringEmitter("SliceToDynamic", dynamic_padding_emit_helper) from custom_ops/dynamic_padding_handler.cc.

  4. Un-pad ops that support dynamic shapes natively. (anon)::DynamicShapeRemovingVisitor::ConvertToDynamic(inst) @ 0x169a7bc0 (+ ConvertOperandsToDynamic @ 0x169a8ac0) walks ops whose OpDynamismSupport is "supported" and strips the PadToStatic that DynamicPadder would otherwise have inserted, avoiding an unnecessary pad/slice round-trip. The mirror anchor is the string "Input to RemoveDynamicShapeMetadataIfPresent should be static".

NOTE — decompile cross-check. DynamicPadder::RunImpl (5,918 decompiled lines) was confirmed to call, in order, Shape::is_static (the already-static early-out at line ~1071), DynamicDimensionInference::Run (~1236), RewriteDynamicReshape (~1410), GetDynamicSize/ForwardDynamicSize/GetDynamicSizes/HasDynamicDimension throughout, PadWithScalar and RewriteInputWithDynamicPadding inside the three DynamicConvolution* custom-call branches (~3098/3172/3332/3381/3528), and DynamicShapeRemovingVisitor::ConvertToDynamic at the tail (~5178). A ret_check kernel->shape().is_static() (~3267) guards the conv-kernel path. [Confidence: CONFIRMED.]
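
The effect of the step-2 rewriters PadWithScalar and GenerateBinaryMask can be shown on a 1-D buffer. This is a host-side sketch with hypothetical function names (the Model suffix marks them as illustrations); it demonstrates why the pad value is the reduction's identity, not how the pass builds the HLO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Models the `iota < dynamic_size` comparison mask along one dimension.
inline std::vector<bool> BinaryMaskModel(std::size_t bound, std::int64_t dynamic_size) {
  std::vector<bool> mask(bound);
  for (std::size_t i = 0; i < bound; ++i) {
    mask[i] = static_cast<std::int64_t>(i) < dynamic_size;
  }
  return mask;
}

// Overwrites every element beyond the runtime size with `pad_value`.
inline std::vector<float> PadWithScalarModel(std::vector<float> data,
                                             std::int64_t dynamic_size,
                                             float pad_value) {
  assert(dynamic_size >= 0 &&
         dynamic_size <= static_cast<std::int64_t>(data.size()));
  const std::vector<bool> mask = BinaryMaskModel(data.size(), dynamic_size);
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (!mask[i]) data[i] = pad_value;
  }
  return data;
}

inline float ReduceAdd(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

inline float ReduceMax(const std::vector<float>& v) {
  float acc = -std::numeric_limits<float>::infinity();
  for (float x : v) acc = x > acc ? x : acc;
  return acc;
}
```

With a bound of 5 and a runtime size of 3, padding with 0 before an add-reduce and with negative infinity before a max-reduce makes both reductions ignore the two stale trailing elements.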

Companions

  • xla::DynamicIndexSplitter::RunImpl @ 0x164ae740 (step 4) — splits multi-dim dynamic indices on DynamicSlice/DynamicUpdateSlice into per-dim scalar index operands, so each index is a single S32 SSA value.
  • xla::DynamicDimensionSimplifier::RunImpl @ 0x164d0020 (step 20) — folds redundant Get/SetDimensionSize chains and <= K dynamic-dim ops before DynamicPadder runs.

Pad-to-Static-Tile Policy and the Runtime Representation

After DynamicPadder, every buffer is sized at its static bound, and the runtime sizes live in a 1024-byte metadata prefix prepended to each dynamic buffer.

Buffer sizing

Target::ShapeWithMetadataSizeBytes(shape) (0x1d619f20) delegates to TransferSizeUtil::ShapeWithMetadataSizeBytes (0x1d6aea00). The decompiled body:

// xla::jellyfish::TransferSizeUtil::ShapeWithMetadataSizeBytes  @0x1d6aea00
if (element_type == 13)            // element type 13 (TUPLE in the open-source PrimitiveType enum): no prefix
    prefix = 0;
else if (Shape::is_static(shape))  // fully static: no prefix
    prefix = 0;
else {                             // dynamic: prefix from layout, default 1024
    prefix = shape.layout().dynamic_shape_metadata_prefix_bytes();
    if (prefix == 0) prefix = 1024;
}
return prefix + ShapeSizeBytesRaw(shape);   // ShapeSizeBytesRaw uses static extents

So the physical byte-size is ShapeSizeBytesRaw(shape) + prefix, where ShapeSizeBytesRaw is computed from the static (upper-bound) extents, never from a runtime value. The prefix is 1024 by default and can be overridden per-buffer by the xla::Layout::dynamic_shape_metadata_prefix_bytes() field.

NOTE: the prefix is layout-overridable, not a flat constant. The dynamic branch reads layout().dynamic_shape_metadata_prefix_bytes() and only falls back to 1024 when that field is zero; the element-type-13 branch (TUPLE in the open-source PrimitiveType enum) takes no prefix. The constant 1024 comes from Target::DynamicShapeMetadataPrefixBytes() (below), which is what populates the layout field. [Confidence: CONFIRMED, read from 0x1d6aea00.]
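
The sizing rule can be exercised with a small runnable model. Everything here is a simplification: ShapeSketch is a hypothetical struct, and the raw size is assumed to be the plain product of static extents and element width, ignoring any tiling padding the real ShapeSizeBytesRaw may apply.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::int64_t kDefaultPrefixBytes = 1024;
// The element-type value that the decompiled body tests first.
constexpr int kNoPrefixElementType = 13;

// Hypothetical, simplified shape: dense array without tiling padding.
struct ShapeSketch {
  int element_type;
  std::int64_t bytes_per_element;
  std::vector<std::int64_t> extents;     // static extents (upper bounds if dynamic)
  std::vector<bool> is_dynamic;          // per-dimension dynamic bit
  std::int64_t layout_prefix_bytes = 0;  // models the layout's prefix-bytes field
};

inline bool IsStatic(const ShapeSketch& s) {
  for (bool d : s.is_dynamic) {
    if (d) return false;
  }
  return true;
}

// ASSUMPTION: raw size = element width times the product of static extents.
inline std::int64_t ShapeSizeBytesRawModel(const ShapeSketch& s) {
  assert(s.extents.size() == s.is_dynamic.size());
  std::int64_t bytes = s.bytes_per_element;
  for (std::int64_t e : s.extents) bytes *= e;
  return bytes;
}

// Same branch structure as the decompiled body above.
inline std::int64_t ShapeWithMetadataSizeBytesModel(const ShapeSketch& s) {
  std::int64_t prefix = 0;
  if (s.element_type != kNoPrefixElementType && !IsStatic(s)) {
    prefix = s.layout_prefix_bytes != 0 ? s.layout_prefix_bytes : kDefaultPrefixBytes;
  }
  return prefix + ShapeSizeBytesRawModel(s);
}
```

For a 4-byte element type with extents {8, 128} and a dynamic first dimension, the model returns 1024 + 4096 = 5120 bytes regardless of the runtime size; the same shape with no dynamic dimension returns 4096.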

The prefix constant

Target::DynamicShapeMetadataPrefixBytes() @ 0x1d61c4e0 is, in full:

__int64 xla::jellyfish::Target::DynamicShapeMetadataPrefixBytes(Target *this) {
  return 1024;
}

1024 bytes, on every TPU generation. Asserted invariants (verbatim CHECK strings):

  • b.target().DynamicShapeMetadataPrefixBytes() % sizeof(int32_t) == 0 — 1024 = 256 four-byte int32 dim-size slots.
  • b.target().DynamicShapeMetadataPrefixBytes() <= b.target().ChunkSizeBytes() — the prefix fits inside one HBM/VMEM chunk.
  • metadata_offset == target.DynamicShapeMetadataPrefixBytes() — the data region begins exactly after the prefix.

Target::DynamicShapeMetadataPrefixShape() @ 0x1d61c500 builds a 1-D S32 shape of dim sizes — the logical type of the prefix.

On-device buffer layout

+------------------------------+--------------------------------------------+
| metadata prefix (1024 bytes) | data, padded to the static upper bound     |
| = DynamicShapeMetadataPrefix |  (= ShapeSizeBytesRaw(shape) bytes)         |
|   Bytes() = 0x400            |                                            |
| up to 256 int32 dim sizes    |  physical extent independent of runtime    |
+------------------------------+--------------------------------------------+

SliceToDynamic writes the prefix; PadToStatic reads it.
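
A host-side model of this layout makes the prefix/data split concrete. It is a sketch under a loud assumption: dimension i's size is stored in int32 slot i in host byte order. The real slot ordering was not recovered, and the function names below are hypothetical models, not the emitters themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kPrefixBytes = 1024;
constexpr std::size_t kPrefixSlots = kPrefixBytes / sizeof(std::int32_t);
static_assert(kPrefixBytes % sizeof(std::int32_t) == 0, "prefix holds whole int32 slots");
static_assert(kPrefixSlots == 256, "1024 bytes = 256 int32 slots");

// Models the write side: builds [zeroed prefix with dim sizes | padded data].
// ASSUMPTION: dim i lives in slot i; the actual ordering is unknown.
inline std::vector<std::uint8_t> SliceToDynamicModel(
    const std::vector<std::uint8_t>& padded_data,
    const std::vector<std::int32_t>& dim_sizes) {
  assert(dim_sizes.size() <= kPrefixSlots);
  std::vector<std::uint8_t> buffer(kPrefixBytes + padded_data.size(), 0);
  if (!dim_sizes.empty()) {
    std::memcpy(buffer.data(), dim_sizes.data(),
                dim_sizes.size() * sizeof(std::int32_t));
  }
  if (!padded_data.empty()) {
    std::memcpy(buffer.data() + kPrefixBytes, padded_data.data(), padded_data.size());
  }
  return buffer;
}

struct PadToStaticResult {
  std::vector<std::uint8_t> data;       // static-shaped, pad-to-bound payload
  std::vector<std::int32_t> dim_sizes;  // one runtime size per dimension
};

// Models the read side: splits the buffer back into data and dim sizes.
inline PadToStaticResult PadToStaticModel(const std::vector<std::uint8_t>& buffer,
                                          std::size_t rank) {
  assert(buffer.size() >= kPrefixBytes && rank <= kPrefixSlots);
  PadToStaticResult out;
  out.dim_sizes.resize(rank);
  if (rank != 0) {
    std::memcpy(out.dim_sizes.data(), buffer.data(), rank * sizeof(std::int32_t));
  }
  out.data.assign(buffer.begin() + kPrefixBytes, buffer.end());
  return out;
}
```

Round-tripping a 32-byte payload with dim sizes {5, 4} yields a 1056-byte buffer and returns the same payload and sizes; the payload length never depends on the runtime sizes.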

Layout / tiling / MSA treatment

Because DynamicPadder runs before layout, all later passes operate on static array shapes (with the per-dim is_dynamic bit retained for bookkeeping):

  • Layout (layout-assignment.md): TpuLayoutAssignment chooses layouts from the static (upper-bound) dims via ChooseCompactLayoutForShape. The dynamic_shape_metadata_prefix_bytes() field rides on the xla::Layout (anchor: input_shape.layout().dynamic_shape_metadata_prefix_bytes() is expected to be non-zero, where input_shape = ), recording that the buffer carries a prefix.
  • Tiling (loop-tiling-unrolling.md): tiling uses the static extents. lowering_util::DynamicShapeSizeCompactRaw (0x1c6ca220) and DynamicShapeSizeCompactForDmaRaw (0x1c6ca8a0) compute the actual runtime transfer byte-size from the dim-size scalars at DMA time, so DMAs move only the live region, not the full pad-to-bound buffer.
  • MSA (memory-space assignment): allocates Target::ShapeWithMetadataSizeBytes(shape) per buffer — pad-to-bound + prefix. Asserts allocation->size() == target.ShapeWithMetadataSizeBytes(allocation->shape()). MSA never sees a runtime-sized buffer.

The TF/host boundary uses tensorflow::XlaTpuPaddedShapeFn(TpuTopology, Shape, Shape*) @ 0xf7d1cc0 (→ TransferSizeUtil::SetPaddedShape) to compute the on-device padded shape for a dynamic XLA shape (pads each dynamic dim to its bound, walks tuples).


DynamicSlice / DynamicUpdateSlice / Reshape Lowering

After DynamicPadder, the module is static plus the two boundary custom-calls. dynamic-pad and dynamic-reshape never survive to LLO — DynamicPadder fully rewrites them into Pad + DynamicSlice/Reshape + mask sequences at HLO time. The remaining first-class dynamic-index ops are lowered directly by the TPU HLO→LLO LoweringEmitter:

| Emitter | VA | Role |
| --- | --- | --- |
| jellyfish::LoweringEmitter::HandleDynamicSlice(hlo) | 0x10c3b0c0 | static-extent indexed load with a runtime base offset |
| jellyfish::LoweringEmitter::HandleDynamicUpdateSlice(hlo) | 0x10c3b640 | static-extent indexed store |
| jellyfish::DynamicUpdateSliceEmitter | 0x10c66ec0 | DUS variant (OpEmitter::Emit<>) |
| jellyfish::AsyncDynamicIndexEmitter | 0x10c415e0 | pipelined form: index scalar computed in one bundle, the indexed load/store deferred to a later one |

Because the slice extents are static (the slice is into a pad-to-bound buffer), the lowering is a static-extent indexed copy with a runtime base offset computed from the S32 index scalars that DynamicIndexSplitter produced. The X128/precision and sharding visitors reuse the same handlers: XPrecisionRewriteVisitor::HandleDynamicSlice/UpdateSlice (0x1115fb40/0x1115fd00) and DimLabelPropagation::HandleDynamicSlice/UpdateSlice (0x11197ca0/0x11198220). A peephole SliceToDynamicCopyMover (0x10fc0240) pushes a Copy through SliceToDynamic(Copy(x)) to eliminate a redundant copy at the dynamic boundary.
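
The phrase "static-extent indexed copy with a runtime base offset" can be illustrated in one dimension. The sketch below follows XLA's documented dynamic-slice semantics, which clamp the start index so the slice stays in bounds; whether the TPU emitter performs the clamp at this point was not traced, and DynamicSlice1D is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 1-D dynamic-slice model: `slice_size` is fixed at compile time, and only
// the base offset `start` is a runtime S32 value.
inline std::vector<float> DynamicSlice1D(const std::vector<float>& operand,
                                         std::int32_t start,
                                         std::size_t slice_size) {
  assert(slice_size <= operand.size());
  // Clamp the runtime offset into [0, operand_size - slice_size].
  const std::int64_t max_start =
      static_cast<std::int64_t>(operand.size() - slice_size);
  const std::int64_t base = std::clamp<std::int64_t>(start, 0, max_start);
  // The copy length is the static extent; only `base` varies at runtime.
  return std::vector<float>(
      operand.begin() + base,
      operand.begin() + base + static_cast<std::int64_t>(slice_size));
}
```

Slicing 3 elements from an 8-element buffer at start 7 is clamped to start 5, so the copy always moves exactly 3 elements.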

The two boundary emitters

These are the heart of the runtime contract.

PadToStaticEmitter::Emit() @ 0x10c9ad40 takes a dynamic-shaped input buffer (prefix + pad-to-bound data) and produces a tuple of {static-shaped data array, S32 dim-size scalars}. Confirmed LLO sequence:

DynamicShapeMetadataPrefixBytes()                 // locate the 1024-byte prefix
  -> CHECK prefix % sizeof(int32_t) == 0
LloRegionBuilder::Vld(...)                          // vector-load dim-size metadata
lowering_util::ScalarToSreg(...)                    // move size into a scalar reg
lowering_util::ComputeBoundsInChunks(sizes, b)      // dim sizes -> chunk counts
lowering_util::CalculateDynamicCompact2ndMinorRatio // re-derive compact tiling ratio
Target::ChunkCountsWithTmp(shape, tmp)              // pad-to-bound chunk counts
PipelineEmitter::SetDynamicIterationBounds(...)     // iteration count driven by runtime size
PipelineEmitter::Emit(...)                          // software-pipelined chunk transfer

NOTE — decompile cross-check. All of the above were read directly from 0x10c9ad40: the DynamicShapeMetadataPrefixBytes() & 3 divisibility check with the matching CHECK string (line ~274/277), Vld (~291), ScalarToSreg (~446), ComputeBoundsInChunks (~499), ChunkCountsWithTmp (~270/271), CalculateDynamicCompact2ndMinorRatio (~705/841), and PipelineEmitter::SetDynamicIterationBounds (~630). [Confidence: CONFIRMED.]

SliceToDynamicEmitter::Emit() @ 0x10c9c6c0 is the inverse: it takes a static-shaped data array + S32 dim-size scalars, writes the prefix, and produces a dynamic-shaped buffer. Confirmed LLO sequence:

DynamicShapeMetadataPrefixShape()                   // the 1-D S32 prefix shape
  + DynamicShapeMetadataPrefixBytes() (% 4 check)
LloRegionBuilder::Vlaneseq / VimmS32 / VeqS32 / Vselect   // build per-lane validity mask
lowering_util::BroadcastScalarToVreg(...)            // broadcast each dim size
deep_copy_util::MemsetInGranules(...)                // zero/init the prefix region
lowering_util::ComputeBoundsInChunks(...)            // sizes -> chunk counts
lowering_util::CalculateDynamicCompact2ndMinorRatio  // compact tiling ratio
... CopyArray after the prefix

NOTE — decompile cross-check. Read from 0x10c9c6c0: Vlaneseq/VimmS32/VeqS32/Vselect mask construction (lines ~291–317), BroadcastScalarToVreg (~314), the prefix % sizeof(int32_t) check (~324/327), DynamicShapeMetadataPrefixShape (~336), ComputeBoundsInChunks (~430), MemsetInGranules (~541), and CalculateDynamicCompact2ndMinorRatio (~613/744). [Confidence: CONFIRMED.]


Unsupported and Rejected Cases

TPU explicitly rejects unbounded dynamism and several op-specific dynamic cases. Verbatim string anchors:

| Anchor | Meaning |
| --- | --- |
| unbounded dynamism is not supported / Unbounded dynamism is disabled for instruction: %s | only bounded-dynamic dims compile |
| AllToAll does not support bounded dynamic shapes / AllToAllTuple does not support unbounded dynamic shapes | collectives reject dynamism |
| CustomCall "%s" is not supported to have a dynamic dimension | most custom-calls must be static-shaped |
| Dynamic inferencing on custom call %s is not supported | no registered dynamism handler |
| bitcast-convert is not valid for dynamic shape %s->%s | bitcast-convert rejects dynamism |
| The output of iota must not have dynamic dimensions | iota output must be static |
| Dynamic shapes are not supported for host buffers / dynamic shapes not supported in allocations | host/pinned allocs reject dynamism |
| MemRefType don't support dynamic shapes | MLIR memref must be static |

The StableHLO unbounded-Dynamic*Op family (DynamicReshapeOp, DynamicBroadcastInDimOp, DynamicConvOp, DynamicGatherOp, …) has verifiers present in the binary. Whether any of them is reachable on TPU, or whether they are purely a front-end import artifact rejected by "unbounded dynamism is not supported", was not established. [Confidence: LOW.]


What Is Not Recovered

  • Exact internal byte/dim-ordering of the 1024-byte prefix. Inferred to be a 1-D S32 array of dim sizes (DynamicShapeMetadataPrefixShape); whether the ordering is major-to-minor or minor-to-major, and whether tuple sub-shapes share a prefix or each own one, was not byte-dumped from a live buffer. [Confidence: MEDIUM.]
  • Full DynamicPadderOptions proto field set. Confirmed via the Run signature + anchors: shape_check_mode, op_support_from_compute (the OpDynamismSupport fn), assertion_generator, slice_dynamic_output. The complete dynamic_padding.proto descriptor was not field-by-field decoded. [Confidence: MEDIUM.]
  • The per-op OpDynamismSupport table (tpu_compile_op_support.cc) — which ops keep a dynamic operand vs require a PadToStatic boundary — was not enumerated. [Confidence: LOW.]
  • Per-handler arithmetic of each DynamicDimensionInferenceVisitor::Handle*. The open-source propagation algorithm is known and the TPU binary follows it; handler bodies were sampled, not exhaustively traced. [Confidence: MEDIUM.]
  • The SparseCore dynamic-dim subsystem (LowerDynamicDimensionSizePass, sc_tpu.set/get_dynamic_dimension_size, DynamicBoundedSlicedInput, ConvertStaticToDynamicEmitter) — a distinct MLIR-level path, out of this page's scope.
  • Conv/window same-padding bound arithmetic (GetWindowedOutputSize / RewriteDynamicConvolution*) — anchored here but not traced to reimplementation accuracy.

Cross-References

  • hlo-pre-passes.md — the ordered pre-pass table where DynamicIndexSplitter (#4), DynamicDimensionSimplifier (#20), and DynamicPadder (#21, 2× AddPass) live; this page is the algorithm detail for those rows.
  • compile-phases.md — the top-level phase ordering; DynamicPadder runs inside the PreOptimization phase, before layout/MLIR.
  • overview.md — compiler orientation and the HLO → … → LLO IR-layer stack; owns the MLIR handoff the boundary emitters feed into.
  • layout-assignment.md: TpuLayoutAssignment carries the dynamic_shape_metadata_prefix_bytes() layout field; it sees a static module after DynamicPadder.
  • loop-tiling-unrolling.md — tiling uses static extents; DynamicShapeSizeCompactForDmaRaw sizes DMAs to the live region at runtime.