LHS: ILP Variant

Addresses and offsets apply to libtpu.so from the libtpu-0.0.40-cp314 wheel (build-id 89edbbe81c5b328a958fe628a9f2207d). Other versions differ.

Abstract

libtpu.so ships a class literally named for an integer-linear-programming schedule — xla::ILPMemoryScheduler — together with a flag family named xla_tpu_enable_ilp_latency_hiding_scheduler and a six-field protobuf named xla::jellyfish::IlpLatencyHidingSchedulerOptions. A reader expecting "the ILP variant of the LatencyHidingScheduler" to be a single solver-backed replacement for the greedy list scheduler will be misled. Static analysis shows the name covers two structurally unrelated code paths, named alike by a shared upstream options message, but wired into two different subsystems.

The first path is a per-computation memory-scheduling MIP — xla::ILPMemoryScheduler::Run (@0x10acd020) delegating to the worker ILPMemorySchedulerForComputation::Run (@0x10acdf00). This is a genuine integer program: it builds an operations_research::math_opt::Model, declares binary placement/liveness variables and one continuous peak_memory variable, emits eleven linear-constraint families, minimizes peak_memory, and solves it through OR-Tools math_opt::Solve with SolverType = CP_SAT (4). It is the memory scheduler selected when the MemorySchedulerProto::Value enum dispatched by xla::GetMemorySchedulerAlgorithm (@0x10abd6a0) is set to ILP (6): case 6 constructs the class (vptr off_217FA470), and the dispatcher is called from live pipeline bring-up (RunHloScheduler, PostMainFusionHloOptimize). It is an on-demand scheduler, not the default — DEFAULT (0) selects DefaultMemoryScheduler — but it is fully reachable.

The other path drives the latency-hiding scheduler and contains no ILP: inside the canonical scheduler bring-up RunHloScheduler (@0x1096fac0), xla::jellyfish::EnableIlpLatencyHidingScheduler (@0x1d6b7e00) selects between two async-op classifier lambdas. When enabled, the wider lambda $_1 exposes more instruction kinds as async-overlap candidates to the canonical xla::LatencyHidingScheduler — which is the pass that runs in both cases. The "ILP-LHS" flag, as wired in 0.0.40, means "expose more async candidates," not "switch to an ILP scheduler." This flag is independent of the MemorySchedulerProto::Value::ILP enum that selects the memory-scheduling MIP: the two share the IlpLatencyHidingSchedulerOptions message name but are not the same switch.

For reimplementation, the contract is:

The xla::ILPMemoryScheduler object layout (112 bytes) and its hard-coded ≤ 8 HloValues fallback to xla::BacktrackScheduler, gated additionally by the use_backtrack_fallback flag at object offset +104.
The full ILP model: three binary variable families (I_, Vs_, Ve_) plus continuous peak_memory, eleven AddLinearConstraint families, the single-term min peak_memory objective, and the CP-SAT solve with default (unbounded) SolveParameters.
The dispatcher wiring: GetMemorySchedulerAlgorithm constructs ILPMemoryScheduler at case ILP (6) of the MemorySchedulerProto::Value switch; the MIP runs only when the compilation environment selects that enum value.
The EnableIlpLatencyHidingScheduler gate inside RunHloScheduler and the $_1 vs $_2 classifier semantics — a separate switch that does not touch the MIP.
The IlpLatencyHidingSchedulerOptions proto: which of its six fields is read (one) and which five are inert.


MIP outer Run	`xla::ILPMemoryScheduler::Run(HloComputation*, ...)` @ `0x10acd020`
MIP worker Run	`(anon)::ILPMemorySchedulerForComputation::Run` @ `0x10acdf00`
MIP vtable	`0x217fa460` (vptr `off_217FA470`), typeinfo `0x217fa490` — one consumer: dispatcher case 6
Dispatcher	`xla::GetMemorySchedulerAlgorithm` @ `0x10abd6a0` — case `ILP (6)` builds the MIP
Live gate	`xla::jellyfish::EnableIlpLatencyHidingScheduler` @ `0x1d6b7e00`
Gate caller	`(anon)::RunHloScheduler` @ `0x1096fac0` (gate at line 1084)
Async classifiers	`$_1` (ILP-on) @ `0x10977140`, `$_2` (regular) @ `0x10977420`
Options proto	`xla::jellyfish::IlpLatencyHidingSchedulerOptions` (6 fields, 1 read)
Solver backend	`math_opt::Solve` @ `0x10af9d20`, `SolverType = CP_SAT (4)` only

The Two Paths Behind One Name

The flag family xla_tpu_enable_ilp_latency_hiding_scheduler does not select a scheduler. Tracing every reference to the symbols carrying the string ILP and IlpLatencyHiding lands on two disjoint subsystems, reached by two independent switches:

two independent "ILP" switches
   │
   ├── (A) async-op classifier switch in canonical LHS
   │       flag xla_tpu_enable_ilp_latency_hiding_scheduler / proto +48
   │       EnableIlpLatencyHidingScheduler(env) @0x1d6b7e00
   │          └─ RunHloScheduler @0x1096fac0  →  picks lambda $_1 vs $_2
   │                └─ AddPass<xla::LatencyHidingScheduler>   (canonical, unchanged)
   │
   └── (B) per-computation memory-scheduling MIP
           MemorySchedulerProto::Value::ILP (6)   ← env field at +268
           GetMemorySchedulerAlgorithm @0x10abd6a0  case 6 → constructs ILPMemoryScheduler
              └─ xla::ILPMemoryScheduler::Run @0x10acd020
                    └─ ILPMemorySchedulerForComputation::Run @0x10acdf00
                          └─ math_opt::Solve(..., CP_SAT, ...) @0x10af9d20

The two paths share no state, no cost model, and no entry point. Path (B) is the only one that contains an integer program; path (A) widens the async surface fed to the greedy LHS. Both are reachable, but by different knobs: path (A) by the enable_ilp_latency_hiding_scheduler flag/proto bool, path (B) by the MemorySchedulerProto::Value enum at env offset +268. Naming both "ILP" is a build-time artifact of the upstream source: the proto field that switches path (A) lives in the same options message as the (inert-in-this-build) knobs that would tune path (B).

GOTCHA — the IlpLatencyHidingSchedulerOptions proto is a single message whose only live consumer is path (A)'s classifier switch (the bool at +48). Five of its six fields (max_solver_deterministic_time, computation_size_threshold, use_ilp_schedule_sequence, also_minimize_total_lifetime, min_compute_latency) describe path (B)'s solver, but path (B) reads none of them — it takes its AliasInfo and size function from the GetMemorySchedulerAlgorithm call args, not from this proto. A reimplementer wiring this options proto to the CP-SAT solver would reproduce nothing the binary connects: the MIP is selected by the MemorySchedulerProto::Value enum, not by these fields.

Path (B): the `ILPMemoryScheduler` MIP

Class hierarchy and dispatcher case

xla::ILPMemoryScheduler is a subclass of xla::ComputationSchedulerAlgorithm, the same base shared by every memory scheduler. Its deleting destructor (@0x10ad6900) resets the base vptr to the ComputationSchedulerAlgorithm vtable (off_21CF7F08) and frees 112 bytes, confirming the inheritance and object size. The concrete subclass is selected by an enum dispatch in xla::GetMemorySchedulerAlgorithm (@0x10abd6a0), which switches on the MemorySchedulerProto::Value stored at env + 268:

// sub_10ABD6A0 — verbatim switch structure
switch ( *(_DWORD *)(a1 + 268) ) {
  case 0: ...  // DEFAULT       → xla::DefaultMemoryScheduler
  case 1: ...  // LIST          → xla::ListMemoryScheduler
  case 2: ...  // DFS           → xla::DFSMemoryScheduler
  case 3: ...  // POST_ORDER    → xla::PostOrderScheduler
  case 4: ...  // BRKGA         → xla::BrkgaMemoryScheduler    (vptr off_217FA2C0)
  case 5: ...  // BFS           → xla::BFScheduler
  case 6: ...  // ILP           → xla::ILPMemoryScheduler      (vptr off_217FA470)
  case 7: ...  // BACKTRACKING  → xla::BacktrackMemoryScheduler
  case 8: ...  // BRUTE_FORCE   → xla::BruteForceMemoryScheduler
  case 9: ...  // LOCAL_ORDER   → xla::LocalOrderScheduler (+ inner RandomOrderScheduler)
  default:
    LogMessage("hlo_scheduling_selector.cc", 62);
    CopyToEncodedBuffer("Unexpected memory scheduler: ", 29);
    return make_unique<xla::DefaultMemoryScheduler>(...);   // fallback
}

The enum names and numeric values are byte-recovered from the embedded proto descriptor (MemorySchedulerProto.Value): DEFAULT=0, LIST=1, DFS=2, POST_ORDER=3, BRKGA=4, BFS=5, ILP=6, BACKTRACKING=7, BRUTE_FORCE=8, LOCAL_ORDER=9. Case 6 is ILP and constructs xla::ILPMemoryScheduler: it allocates 112 bytes, fills the shared ComputationSchedulerAlgorithm layout, then sets the final vptr to off_217FA470 (the ILP vtable's first virtual slot, vtable + 0x10) and the bool at +104 to 1. The default branch logs "Unexpected memory scheduler: " at hlo_scheduling_selector.cc:62 and falls back to DefaultMemoryScheduler.

NOTE — every case 1..8 builds a 112-byte object with the identical layout — base vptr off_21CF7F08, AliasInfo* at +8, the size-function AnyInvocable at +16/+64, an empty post-process std::function hook at +88/+96, and a bool at +104 — then patches the final vptr to the subclass vtable. BRKGA (case 4) patches to off_217FA2C0; ILP (case 6) patches to off_217FA470. The schedulers are siblings sharing one storage shape; only the vptr and the +104 bool value differ.

Object layout and the hard fallback

The outer Run (@0x10acd020, ~861 lines) has the reconstructed signature:

StatusOr<HloInstructionSequence>
xla::ILPMemoryScheduler::Run(HloComputation* comp,
                             TuplePointsToAnalysis const& pts,
                             HloAliasAnalysis const& alias) const;

Offset	Field (inferred)
0	vptr → `ILPMemoryScheduler` vtable first slot (`off_217FA470`, vtable base `0x217fa460`)
8	`AliasInfo const*`
16 / 64	size-function `AnyInvocable<int64_t(BufferValue const&)>` (RAII storage + live callable)
48	bool — alive marker for the inlined `AnyInvocable`
88 / 96	post-process `std::function<HloInstructionSequence(...)>` hook (empty by default)
104	bool — `use_backtrack_fallback`

The fallback test is byte-exact in the decompile (outer Run line 122):

// a3 = TuplePointsToAnalysis&;  a3+88 = values().size()
// a2 = this;  a2+104 = use_backtrack_fallback
if ( *(__int64 *)(a3 + 88) <= 8 && *(_BYTE *)(a2 + 104) )
    return xla::BacktrackScheduler::Run(this, &fallback_state, ...);   // greedy

The ILP path is taken only when (number of HloValues > 8) OR (use_backtrack_fallback is clear). The threshold 8 is hard-coded — it is not driven by the proto field computation_size_threshold, which exists but is never read. For graphs above 8 values, or with the fallback toggle cleared, the MIP runs unconditionally with no problem-size guard.

QUIRK — the use_backtrack_fallback member (offset +104) is the only bool that gates the fallback, and the comparison operand 8 is a literal in the instruction stream. The options-proto field that was clearly intended to drive this (computation_size_threshold, int64) is inert. A reimplementation reading computation_size_threshold to size the fallback would diverge from the binary, which ignores it.

Pre-solve model construction (outer Run)

// outer Run, line 168
operations_research::math_opt::Model::Model(&model, "ilp_memory_scheduling", 21);
// line 175 — the single continuous variable
ModelStorage::AddVariable(model_storage,
                          /*is_int=*/0, "peak_memory", /*name_len=*/11,
                          /*lb=*/0.0, /*ub=*/+inf);
// ... walk HloAliasAnalysis::buffers(), group HloValues by defining-instruction id
//     into absl::btree_map<HloValue const*, vector<HloInstruction*>, OrderHloValuesById>,
//     keeping only values whose defining instruction lives in *this* computation ...
// line 788 — delegate
ILPMemorySchedulerForComputation::Run(worker);

The model name string "ilp_memory_scheduling" (length 21) and the continuous-variable name "peak_memory" (length 11, is_int = 0) are both literal in the decompile. The size of each HloValue is priced by the user-supplied AnyInvocable<int64_t(BufferValue const&)> at offset 64 — there is no HighWaterMark-style streaming estimator; the peak is recovered from the MIP solution itself.

The ILP Formulation (worker Run)

The worker ILPMemorySchedulerForComputation::Run (@0x10acdf00, ~6764 lines) builds and solves the model. All anchors below are byte-verified against the decompile; source-line numbers refer to platforms/xla/service/jellyfish/hlo_scheduling/memory_schedulers.cc, recovered from LogMessage/AddSourceLocationImpl call sites.

Decision variables

All three families are binary, built via ModelStorage::AddVariable(model, /*is_int=*/1, name, name_len, /*lb=*/0, /*ub=*/1). The continuous peak_memory (built in the outer Run, is_int = 0) is the fourth.

Family	Name pattern	Decompile line	Indexed over	Meaning
`I_`	`"I_<inst>,<slot>"`	817 / 843	(instruction id, slot `0..N-1`)	`I_i,t = 1` iff instruction `i` is placed at slot `t`
`Vs_`	`"Vs_<value>,<slot>"`	1685 / 1710	(HloValue id, slot `0..N-1`)	`Vs_v,t = 1` iff value `v` has started being live by slot `t`
`Ve_`	`"Ve_<value>,<slot>"`	1807 / 1833	(HloValue id, slot `0..N-1`)	`Ve_v,t = 1` iff value `v` has ended by slot `t`
—	`peak_memory`	outer Run 175	scalar	continuous; lower-bounded by every per-slot live total

The prefix string literals "I_", "Vs_", "Ve_" are emitted by absl::StrCat with absl::numbers_internal::FastIntToBuffer for the index and slot. Variable-to-Variable maps are absl::flat_hash_map<std::pair<HloInstruction*,int>, math_opt::Variable> for instructions and flat_hash_map<std::pair<long,int>, math_opt::Variable> for values. Vs_/Ve_ model liveness as monotone step functions (the cumulative "started by" / "ended by" indicators), which is what lets a per-slot memory sum be written as a linear expression.

Constraints

There are exactly eleven Model::AddLinearConstraint call sites in the worker, at the byte-verified lines below. The functional grouping is inferred from the surrounding loop structure and the variable family each site iterates; the logical statements are reconstruction, not literal coefficient recovery.

#	Line	Family	Logical statement
1	1362	per-instruction	`Σ_t I_i,t == 1` — every instruction in exactly one slot
2	1573	per-slot	`Σ_i I_i,t == 1` — every slot holds exactly one instruction
3	2252	start init	`Vs_v,0 == I_def(v),0` — start CDF seeded by defining instruction
4	2584	start step	`Vs_v,t − Vs_v,t-1 == I_def(v),t` — monotone start stepped at defining slot
5	2952	start coupling	a user of `v` cannot run before `v` starts
6	3356	end step	`Ve_v,t − Ve_v,t-1 ≥ I_last_use(v),t` — end CDF stepped at last-use slot
7	3583	end ≤ start	`Ve_v,t ≤ Vs_v,t` — a value cannot end before it starts
8	4084	precedence (data)	`Σ_{t'≤t} I_pred,t' ≥ I_succ,(t+1)` for each HLO data edge
9	4483	precedence (control)	same shape for control dependencies
10	4887	per-slot memory	`Σ_v size(v)·(Vs_v,t − Ve_v,t) ≤ peak_memory` for each slot `t`
11	5491	objective coupling	closes `peak_memory` against the bounded per-slot expression

Sites 1–2 (the I_ placement constraints) are the assignment-problem core — a permutation of instructions onto slots. Sites 3–5 iterate the Vs_ maps and sites 6–7 the Ve_ maps (both two-loop: outer HloValue, inner slot). Sites 8–9 iterate the OrderHloValuesById-grouped predecessor/user map. Site 10 is the liveness-to-memory reduction.

NOTE — the exact coefficient vectors and RHS values inside each LinearExpression constructor are not recovered: the constructors are inlined and stamped with anonymous flat_hash_map<Variable, double> arguments that require per-call symbolic inspection. The variable families, the count (eleven), and the call-site line numbers are CERTAIN; the precise algebra of each constraint is the MEDIUM-confidence reconstruction above.

Objective

The objective is a single continuous variable, minimized (worker Run lines 5567–5573):

v1196 = 0x3FF0000000000000LL;        // IEEE-754 double 1.0  — peak_memory coefficient
LinearExpression::LinearExpression(expr, /*terms=*/{peak_memory}, /*n=*/1, /*offset=*/0.0);
Model::SetObjective(model, expr, /*maximize=*/0);   // 0 == minimize

min peak_memory, coefficient 1.0, offset 0.0, maximize = false. There is no secondary objective: the proto field also_minimize_total_lifetime is never read.

Logging

The worker emits a one-shot model summary at memory_schedulers.cc:283:

Computation: <name>
Number of instructions: <N>
Number of variables: <model.Variables().size()>
Number of constraints: <model.LinearConstraints().size()>
Number of values related to computation: <values.size()>

The string literals ("Computation: " len 13, "Number of variables: " len 21, "Number of constraints: " len 23, "Number of values related to computation: " len 41) are byte-verified at worker lines 5594/5606/5611/5616. On a successful solve it logs "Peak memory: " at line 300; at VLOG ≥ 2 it dumps per-instruction " scheduled at " (line 423) and per-value Value defined at ... with size : ... (line 454).

Solver Backend, Parameters, Termination

CP-SAT is the only backend

Solve dispatches through math_opt::Solver::New (@0x10b259a0), which validates the init args and model, then calls AllSolversRegistry::Create(registry, solver_type, model, init_args). The registry is populated exactly once, at static init in _GLOBAL__sub_I_cp_sat_solver.cc (@0x212ca300):

v4[0] = operations_research::math_opt::CpSatSolver::New;
operations_research::math_opt::AllSolversRegistry::Register(registry, /*solver_type=*/4u, v4);

AllSolversRegistry::Register (@0x10b273c0) has this single caller. SolverType = 4 is SOLVER_TYPE_CP_SAT. Every other SolverTypeProto value resolves to a missing registry entry and a failed status. The MIP's solve site passes 4 literally (worker line 5833):

operations_research::math_opt::Solve(result, model, /*solver_type=*/4, args, init_args);

Parameters: no bounds applied

The SolveArguments block constructed before the solve (worker lines ~5780–5832) is zero-initialized: empty message-callback std::function, empty solver-callback std::function, null SolveInterrupter. The SolveParametersProto is left at defaults — enable_output = false (raised only under VLOG), unbounded time, no iteration/node limit. The per-solver parameter structs (SatParameters, GScipParameters, GlopParameters, etc.) are constructed but only the CP-SAT SatParameters is consumed, and it is left at defaults.

GOTCHA — the flag xla_tpu_ilp_scheduler_max_solver_deterministic_time (data @0x223c1db0, registered in tpu_compilation_environment.cc) and the proto field max_solver_deterministic_time both have a stored value, but neither is read in the worker Run. The SatParameters deterministic-time and wall-time limits stay at their defaults (CP-SAT default: no limit). The solver runs to completion or to a hard failure — there is no convergence bound.

Termination

SolveResult r = Solve(...);
if ( !r.ok() )                                  // → memory_schedulers.cc:297
    return r.status().AddSourceLocation(...);
if ( !r->termination.EnsureIsOptimalOrFeasible().ok() )   // → memory_schedulers.cc:298
    return ...;
// success: read peak_memory and every I_i,t via SolveResult::variable_values()

EnsureIsOptimalOrFeasible (worker line 5872) accepts both OPTIMAL and FEASIBLE — any feasible primal is consumed even if non-optimal. INFEASIBLE, UNBOUNDED, IMPRECISE, and NO_SOLUTION_FOUND all convert to a failed absl::Status at source line 298. The schedule is then recovered by RetrieveSchedule: for every recorded (instruction, slot) pair, read the variable's optimal value, build std::pair<long /*slot*/, HloInstruction*>, and std::__stable_sort (instantiations @0x10adf140 / @0x10adf3a0 / @0x10adf5e0) by slot to emit the final HloInstructionSequence.

Path (A): the live async-op classifier switch

The only caller of EnableIlpLatencyHidingScheduler (@0x1d6b7e00) is RunHloScheduler (@0x1096fac0). The gate is at decompile line 1084:

if ( EnableIlpLatencyHidingScheduler(env) ) {
    is_async_pred = &RunTensorCoreAsyncOpScheduler::$_1;   // 0x10977140 — wider
} else {
    is_async_pred = &RunTensorCoreAsyncOpScheduler::$_2;   // 0x10977420 — narrower
    use_full_kind_set = EnableSchedulingAnnotationPropagation(env);   // line 1113
}
// both paths land here:
pipeline.AddPass<xla::LegalizeSchedulingAnnotations>(cfg);   // line 1136
pipeline.AddPass<xla::LatencyHidingScheduler>(ctx, core);    // line 1137 — canonical
pipeline.AddPass<xla::ConstantDeferring>();                  // line 1138

The canonical xla::LatencyHidingScheduler runs in both cases (line 1137). The gate is ahead of the AddPass, so the scheduler pass itself cannot tell which classifier was used — only the async-candidate set it receives changes. See LatencyHidingScheduler Core.

Classifier `$_1` (ILP-on, `@0x10977140`)

Byte-verified from the decompile:

// is_async($_1)(inst):
attr = inst->get_frontend_attribute("keep_original_sequence_order_in_group", 37);
if ( attr.present && attr.value_len == 4 && *(uint32*)attr.value == 1702195828 )  // "true"
    return true;                       // honour user-pinned ordering
return inst->IsOutputFusion()
    || inst->IsLoopFusion()
    || inst->opcode() == 0x82
    || inst->IsCustomCall("tpu_custom_call")
    || inst->IsCustomFusion()
    || inst->opcode() == 40;           // kDot

The literal 1702195828 is the little-endian dword for the ASCII bytes "true"; the value-length guard (4) is the comparison against the 4-byte string region. The 37-char attribute name is interned.

Classifier `$_2` (regular, `@0x10977420`)

// is_async($_2)(inst):
return inst->IsOutputFusion()
    || inst->IsCustomCall("tpu_custom_call")
    || inst->IsCustomFusion()
    || inst->opcode() == 40;           // kDot

The delta is exactly: $_1 additionally treats IsLoopFusion and opcode 0x82 as async, and honours the keep_original_sequence_order_in_group="true" pin. The ILP-LHS flag therefore widens the async-overlap surface the canonical scheduler sees — which changes the recurrence latencies DefaultSchedulerCore works with — but does not change the scheduling algorithm. The cost model is the regular LHS cost model (xla::DefaultSchedulerCore + xla::LatencyEstimator); see Scheduler Overview and Cost Model Overview.

The `IlpLatencyHidingSchedulerOptions` Proto

The message is registered, parseable from a flag, and reachable through GetIlpLatencyHidingSchedulerOptions (@0x1d6b7e60). Its Clear (@0x1db24ea0) zeroes a 6-bit presence bitmap at struct offset +16 (& 0x3F) and a payload region spanning offsets 0x18..0x32; the enable_ilp_latency_hiding_scheduler bool sits at offset +48 (0x30), inside that payload.

message IlpLatencyHidingSchedulerOptions {
  optional bool   enable_ilp_latency_hiding_scheduler = 1;  // C++ struct +48 — LIVE
  optional double max_solver_deterministic_time      = 2;   // inert
  optional int64  computation_size_threshold         = 3;   // inert
  optional bool   use_ilp_schedule_sequence          = 4;   // inert
  optional bool   also_minimize_total_lifetime       = 5;   // inert
  optional double min_compute_latency                = 6;   // inert
}

The live consumer reads only the bool at +48 (gate @0x1d6b7e00):

bool EnableIlpLatencyHidingScheduler(env) {
    auto opts = GetIlpLatencyHidingSchedulerOptions(env);
    char field48 = ((char*)&opts)[48];               // enable_ilp_latency_hiding_scheduler
    ~opts.~IlpLatencyHidingSchedulerOptions();
    if ( field48 ) return true;                       // proto override on
    AutoOr<bool> f = AutoOr<bool>::FromProtoOrDie(env + 1608);  // else fall back to the flag
    return (~f & 0x101) == 0;
}

Either the proto override or the xla_tpu_enable_ilp_latency_hiding_scheduler absl flag (data @0x223c1d50) switches the classifier to $_1. The (~v & 0x101) == 0 idiom is the AutoOr<bool> unwrap (the value bit and the present bit both set).

Field	Read in 0.0.40?	Effect
`enable_ilp_latency_hiding_scheduler` (+48)	YES	switches the canonical-LHS async classifier `$_2 → $_1`
`max_solver_deterministic_time`	no	none — `SatParameters` left at defaults
`computation_size_threshold`	no	none — fallback threshold is hard-coded `8`
`use_ilp_schedule_sequence`	no	none
`also_minimize_total_lifetime`	no	none — single-term objective
`min_compute_latency`	no	none

The five inert fields appear only in generated reflection code (_table_ @0x21cfa308, Clear, MergeImpl, _InternalSerialize, ByteSizeLong, CopyFrom, InternalSwap, GetMetadata, GetClassData). No set_*/get_* accessor is called from any non-generated function.

Flags

Flag (data symbol)	Reachable consumer in 0.0.40
`FLAGS_xla_tpu_enable_ilp_latency_hiding_scheduler` (`@0x223c1d50`)	`EnableIlpLatencyHidingScheduler` (via `AutoOr<bool>`)
`FLAGS_xla_tpu_ilp_latency_hiding_scheduler_options` (`@0x223c4750`)	`GetIlpLatencyHidingSchedulerOptions` (via `AutoOr<message>`)
`FLAGS_xla_tpu_ilp_scheduler_max_solver_deterministic_time` (`@0x223c1db0`)	none — registered but unread

Replacement Conditions vs the Greedy List Scheduler

Direct answer for the reimplementer: in libtpu-0.0.40, the MIP replaces the greedy memory scheduler only when the compilation environment sets MemorySchedulerProto::Value::ILP (6) — it is opt-in, not the default.

The MIP path (B) replaces the greedy memory scheduler when GetMemorySchedulerAlgorithm dispatches case 6. The default is DEFAULT (0) → DefaultMemoryScheduler, so an unconfigured build never runs the MIP. The greedy BacktrackScheduler is still reachable from path (B), but only as the ≤ 8 HloValues fallback inside the ILP scheduler's own Run.
The live path (A) — the enable_ilp_latency_hiding_scheduler flag — does not replace the list scheduler at all. It keeps the canonical xla::LatencyHidingScheduler pass and only widens the async-candidate set fed to it. The greedy DefaultSchedulerCore walk is unchanged; only its input changes. Path (A) is orthogonal to the path (B) enum selection.

The MIP is a complete, reachable scheduler — Model, eleven constraints, min peak_memory, CP-SAT solve — selected by one enum value. It is not the default and was not observed in a default pipeline run, but it is wired into the dispatcher and constructed by case 6.

NOTE — xla::ILPMemoryScheduler is the only memory-scheduling MIP in libtpu.so and one of the OR-Tools math_opt::Solve consumers (alongside auto-sharding's FormulateAndSolveMIPFromProblem and the MSA ILP pass). Among memory schedulers it is the lone solver-backed option; the greedy DefaultMemoryScheduler is the default and the path most compilations take.

Function & Symbol Map

Symbol	Address	Role
`xla::ILPMemoryScheduler::Run`	`0x10acd020`	outer Run: layout, `≤8` fallback, model bootstrap
`(anon)::ILPMemorySchedulerForComputation::Run`	`0x10acdf00`	worker: variables, 11 constraints, objective, solve
`xla::ILPMemoryScheduler::~ILPMemoryScheduler`	`0x10ad6900`	deleting dtor; resets base vptr
vtable `xla::ILPMemoryScheduler`	`0x217fa460`	vptr `off_217FA470`, typeinfo `0x217fa490`; one consumer (dispatcher case 6)
`xla::GetMemorySchedulerAlgorithm`	`0x10abd6a0`	enum dispatch; case `ILP (6)` builds the MIP
`xla::jellyfish::EnableIlpLatencyHidingScheduler`	`0x1d6b7e00`	live gate; reads proto +48 then flag
`xla::jellyfish::GetIlpLatencyHidingSchedulerOptions`	`0x1d6b7e60`	options accessor
`(anon)::RunHloScheduler`	`0x1096fac0`	gate caller; gate at line 1084
classifier `$_1` (ILP-on)	`0x10977140`	wider async predicate + sequence-order pin
classifier `$_2` (regular)	`0x10977420`	narrower async predicate
`IlpLatencyHidingSchedulerOptions::Clear`	`0x1db24ea0`	6-bit presence mask, payload `0x18..0x32`
`math_opt::Solve`	`0x10af9d20`	CP-SAT solve entry
`math_opt::Solver::New`	`0x10b259a0`	registry dispatch over `SolverType`
`math_opt::AllSolversRegistry::Register`	`0x10b273c0`	single caller registers CP-SAT only
`math_opt::CpSatSolver::New`	`0x10adff20`	the one registered backend
`_GLOBAL__sub_I_cp_sat_solver.cc`	`0x212ca300`	static-init: `Register(..., 4u, CpSatSolver::New)`
`RetrieveSchedule` stable-sort	`0x10adf140` / `0x10adf3a0` / `0x10adf5e0`	slot-keyed merge of the solved schedule

Cross-References

LatencyHidingScheduler Core — the canonical scheduler pass that path (A) feeds; the only LHS algorithm that runs
LHS: post_layout / final Variant — the final post-layout scheduling variant in the same pipeline
Scheduler Overview — where scheduling sits between lowering and encoding
ResourceType Taxonomy — the scheduler's 47-ID concurrency model, distinct from this MIP's peak-memory objective
Cost Model Overview — the HLO-level cost model behind GetLatencyBetween, used by path (A)'s canonical scheduler (this MIP uses only a peak-memory size function, not the cost model)
back to index — Part VIII — Instruction Scheduling & Bundle Packing

Keyboard shortcuts

libtpu Internals — Reverse-Engineering Reference