CycleTable Family

Every offset, value, and address on this page was read byte-exactly from libtpu.so in the libtpu-0.0.40-cp314 wheel (BuildID md5 89edbbe81c5b328a958fe628a9f2207d). Other versions differ. All .rodata addresses are virtual addresses; for this binary .rodata VMA == file offset (section [11] at 0x84a0000), and .text VMA == file offset.

Abstract

xla::jellyfish::CycleTable is the abstract base of the per-generation throughput half of the TensorCore cost model. Its one job is to translate a cycle class — an opaque CycleTable::Instruction enum value, not an LLO opcode — into the bundle-issue cycle cost of the slowest functional unit that class occupies. A reimplementer should picture two collaborating objects: the CycleTable (this family), which answers "how many issue cycles does cycle-class I cost, and which resource lane does it block?", and a per-gen Performance grid, which is the flat .rodata-baked array the CycleTable actually reads. The CycleTable owns no cycle numbers itself; it holds a Performance* at +0x10 and indexes into it.

There is exactly one subclass per tpu::TpuVersion. The six registered factories (TpuVersion 0..5) collapse to four distinct read strategies: a flat offset-LUT (JfCycleTable, shared by the two oldest gens), and three switch-over-Performance::GetResourceUsage forms (PfCycleTable, VfCycleTable, and the helper-wrapped GlcCycleTable/GfcCycleTable). The cycle-class enumeration is a dense 0x00..0x20 (33 values) shared across all gens; each gen prices a subset and returns the default 1 cycle for the rest.

This page is the framing reference for the family: it gives the class hierarchy, the registration/dispatch path, the structure of the 33-value cycle-class enum, and how the family sits next to the Performance grids and the MXU latency reservation matrices. The per-gen read paths and the baked constants live in JfCycleTable, VfCycleTable, and Per-Opcode Cycle Constants.

The contract a reimplementer must honor:

  • One CycleTable per TpuVersion, selected by a factory registry, not by if-chains. CycleTable::Create(const Target&) reads target.tpu_version() and invokes the registered lambda; an unregistered version is fatal.
  • The Instruction argument is a cycle class, not an opcode. LLO opcodes are first folded into one of 33 classes by CycleTableInstruction(LloInstruction*); only MXU matmul/latch/matprep opcodes are classified there.
  • GetCyclesForThroughput(I) is bundle-issue throughput, not dependency latency. The latency axis is a separate object (LatencyTable*); the JfCycleTable vtable has no latency slot.
  • GetResource(I) names the lane the class blocks. The scheduler adds (double)cycles into ResourceVector[GetResource(I)], so classes sharing a lane add; a bundle's contribution is the max over the per-lane totals, not their sum.
  • Transcendentals are priced by scalar virtual overrides (EstimateSinCosCost, EstimateTanCost), not by the cycle-class LUT.

Item | Detail
Abstract base | xla::jellyfish::CycleTable (pure-virtual GetCyclesForThroughput)
JfCycleTable vtable | 0x21c1ffb8; slots: dtor pair, GetCyclesForThroughput @ 0x1c89dce0, EstimateSinCosCost, EstimateTanCost
Factory | CycleTable::Create(const Target&) @ 0x1c89cc00; dispatches on target.tpu_version()
Sibling factory | LatencyTable::Create(TpuVersion) @ 0x1c89fba0; ordinal-indexed AnyInvocable vector, not a hash registry
Registration | _GLOBAL__sub_I_cycle_table.cc @ 0x21353460; 6 × FunctionRegistry::Register(version, λ)
Cycle-class enum | CycleTable::Instruction; dense 0x00..0x20 (33 values), per-gen priced subset
Opcode → class | CycleTableInstruction(LloInstruction*) @ 0x1c89ca80; MXU band only
Throughput accessor | GetCyclesForThroughput(Instruction); per-gen virtual, default 1
Resource accessor | CycleTable::GetResource(Instruction) @ 0x1c89ce20; flat LUT dword_B438AEC
Underlying grid | per-gen Performance object held at CycleTable+0x10

What A CycleTable Is

The cost model is split into two strictly separated levels (the split is the central design fact of this subsystem):

  1. CycleTable: a thin per-gen virtual dispatcher keyed on a cycle class. It answers two questions about a class I: GetCyclesForThroughput(I) (issue cycles) and GetResource(I) (which functional-unit lane). It stores no cycle numbers of its own; it owns a Performance*.
  2. Performance: the per-gen flat array of baked cycle constants, allocated 0xe00 bytes wide and read with a single indexed load (Performance[byte_offset]). The CycleTable is the only thing that knows how to map a class to a byte offset (JF/DF) or to an (instruction, resource) pair (PF and later).

Both the throughput cycle (this family) and the dependency latency (the LatencyTable* family) ultimately read the same per-gen Performance object; they are just different accessor methods over it (GetCyclesForThroughput/GetResourceUsage vs GetLatency/GetLatencyBetween). The JfCycleTable vtable carries no GetLatency slot at all — its slots are throughput (+0x10), EstimateSinCos (+0x18), EstimateTan (+0x20), and two device-detail accessors. On Jellyfish/Dragonfish the latency axis is supplied entirely by a separate LatencyTableJellyfish; see bundle-aware cost.

A reimplementer should not try to merge the two levels. The Performance grid is gen-private data; the CycleTable is the gen-private access pattern over it. The 33-class enum is the shared vocabulary between them.
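The split can be illustrated with a minimal C++ sketch. It is not the binary's layout: the class names mirror the text, but OffsetLutCycleTable, MakeUnpricedOffsets, CellAt/SetCellAt and every byte offset are hypothetical, and the sketch does not reproduce the Performance* at +0x10 or the real vtable slots. What it does show is the shape described above: the CycleTable holds only a pointer into a 0xe00-byte grid, maps a class to a byte offset (the JF/DF strategy), and returns the default 1 cycle for classes the gen leaves unpriced.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch only: the 0xe00 grid width and the "default 1" rule come from the
// text; member names and all offsets used with this code are hypothetical.
constexpr std::size_t kPerformanceBytes = 0xe00;
constexpr int kNumCycleClasses = 33;  // CycleTable::Instruction 0x00..0x20

// Level 2: gen-private data, a flat array of baked cycle constants.
struct Performance {
  unsigned char bytes[kPerformanceBytes] = {};

  int32_t CellAt(std::size_t byte_offset) const {  // host byte order
    int32_t v;
    std::memcpy(&v, bytes + byte_offset, sizeof v);
    return v;
  }
  void SetCellAt(std::size_t byte_offset, int32_t v) {
    std::memcpy(bytes + byte_offset, &v, sizeof v);
  }
};

// Level 1: gen-private access pattern; owns no cycle numbers, only a pointer.
class CycleTable {
 public:
  explicit CycleTable(const Performance* perf) : perf_(perf) {}
  virtual ~CycleTable() = default;
  virtual int GetCyclesForThroughput(int instruction) const = 0;

 protected:
  const Performance* perf_;
};

using OffsetLut = std::array<int, kNumCycleClasses>;  // -1 = unpriced

OffsetLut MakeUnpricedOffsets() {
  OffsetLut lut;
  lut.fill(-1);
  return lut;
}

// JF/DF-style strategy: cycle class -> byte offset into the grid.
class OffsetLutCycleTable : public CycleTable {
 public:
  OffsetLutCycleTable(const Performance* perf, const OffsetLut& offsets)
      : CycleTable(perf), offsets_(offsets) {}

  int GetCyclesForThroughput(int instruction) const override {
    int off = offsets_.at(static_cast<std::size_t>(instruction));
    if (off < 0) return 1;  // class not priced by this gen: default 1 cycle
    return perf_->CellAt(static_cast<std::size_t>(off));
  }

 private:
  OffsetLut offsets_;
};
```

Keeping the offsets in the CycleTable and the integers in the Performance object means re-pricing a gen touches only data, while a new read strategy touches only the subclass.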


The Class Hierarchy And Per-Version Dispatch

                           xla::jellyfish::CycleTable   (abstract base; JfCycleTable vtable @ 0x21c1ffb8)
                                        |
    +-----------------+-----------------+-----------------+-----------------+
    |                 |                 |                 |                 |
JfCycleTable      PfCycleTable      VfCycleTable      GlcCycleTable     GfcCycleTable
TpuVersion 0, 1   TpuVersion 2      TpuVersion 3      TpuVersion 4      TpuVersion 5
flat offset-LUT   switch            switch            helper + CHECK    helper + CHECK
jellyfish v2,     pufferfish v4     viperfish v5      ghostlite v6L     6acc60406 (TPU7x)
dragonfish v3

The six factories are registered once by the cycle_table.cc static initializer (_GLOBAL__sub_I_cycle_table.cc @ 0x21353460), each as a FunctionRegistry<TpuVersion, unique_ptr<CycleTable>(const Target&)> entry. Selection happens in CycleTable::Create @ 0x1c89cc00, which reads the version off the Target, looks up the lambda, and invokes it; a missing registration is a fatal log ("No cycle table registered for platform: ", cycle_table.cc:960).

TpuVersion | Codename | Subclass | GetCyclesForThroughput | Read strategy
0 | jellyfish (v2) | JfCycleTable | 0x1c89dce0 | flat offset-LUT (kUnusedRegisterJfCycleTable)
1 | dragonfish (v3) | JfCycleTable | 0x1c89dce0 | flat offset-LUT (kUnusedRegisterDfCycleTable)
2 | pufferfish (v4) | PfCycleTable | 0x1c89de60 | switch over GetResourceUsage(instr, res)
3 | viperfish (v5) | VfCycleTable | 0x1c89e2c0 | switch over GetResourceUsage(instr, res)
4 | ghostlite (v6 lite) | GlcCycleTable | 0x1c89e980 (wrapper) | helper 0x1c89ed20 + CHECK(ok)
5 | 6acc60406 (TPU7x) | GfcCycleTable | 0x1c89f060 (wrapper) | helper 0x1c89f400 + CHECK(ok)

NOTE — TpuVersion 0 and 1 share one class. The jellyfish (TpuVersion 0) and dragonfish (TpuVersion 1) registry entries bind the symbols kUnusedRegisterJfCycleTable and kUnusedRegisterDfCycleTable respectively, and both point at the same JfCycleTable factory lambda. Both therefore read GetCyclesForThroughput @ 0x1c89dce0 over the same throughput offset-LUT. The two gens differ only in their Performance grids, and none of the differing cells is a throughput-LUT target, so GetCyclesForThroughput returns identical results for JF and DF. See JfCycleTable.
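The registry dispatch can be sketched in C++ as follows. This is an illustration under assumptions, not the binary's code: std::unordered_map stands in for FunctionRegistry, an exception stands in for the fatal log, and Target, Name(), RegisterCycleTable, CreateCycleTable and the registered subset (versions 0, 1 and 2 only) are hypothetical. It shows the two points above: selection by lookup rather than an if-chain, and versions 0 and 1 bound to one shared JfCycleTable factory.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the real Target and subclasses.
struct Target {
  int version;
  int tpu_version() const { return version; }
};

struct CycleTable {
  virtual ~CycleTable() = default;
  virtual std::string Name() const = 0;  // demo-only accessor
};
struct JfCycleTable : CycleTable {
  std::string Name() const override { return "JfCycleTable"; }
};
struct PfCycleTable : CycleTable {
  std::string Name() const override { return "PfCycleTable"; }
};

using CycleTableFactory = std::function<std::unique_ptr<CycleTable>(const Target&)>;

// Hash registry keyed by TpuVersion (stand-in for FunctionRegistry).
std::unordered_map<int, CycleTableFactory>& CycleTableRegistry() {
  static std::unordered_map<int, CycleTableFactory> registry;
  return registry;
}

void RegisterCycleTable(int version, CycleTableFactory factory) {
  CycleTableRegistry()[version] = std::move(factory);
}

std::unique_ptr<CycleTable> CreateCycleTable(const Target& target) {
  auto it = CycleTableRegistry().find(target.tpu_version());
  if (it == CycleTableRegistry().end())  // the binary logs fatally here
    throw std::runtime_error("No cycle table registered for platform: " +
                             std::to_string(target.tpu_version()));
  return it->second(target);
}

// One-time registration (the binary does this in a static initializer).
void RegisterDemoCycleTables() {
  CycleTableFactory jf = [](const Target&) -> std::unique_ptr<CycleTable> {
    return std::make_unique<JfCycleTable>();
  };
  RegisterCycleTable(0, jf);  // jellyfish
  RegisterCycleTable(1, jf);  // dragonfish: same factory, same class
  RegisterCycleTable(2, [](const Target&) -> std::unique_ptr<CycleTable> {
    return std::make_unique<PfCycleTable>();
  });
}
```

Because the map is keyed by version, adding a gen is one more registration call; nothing in CreateCycleTable changes.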

The two later gens (Glc, Gfc) wrap a …Helper that returns an absl::Status-style result; the public GetCyclesForThroughput calls the helper and CHECKs success (MakeCheckFailString(..., "cycles is OK"), fatal at cycle_table.cc:817) before returning the cycle integer. A reimplementation can flatten this into a direct return; the wrapper exists only to surface "unschedulable class" as a fatal rather than a silent default.
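The wrapper shape, reduced to its essentials, might look like the sketch below. It assumes std::optional<int> as a stand-in for the absl::Status-style helper result and std::abort for the fatal CHECK; GetCyclesForThroughputHelper, the class it prices, and its "reject anything outside 0x00..0x20" rule are invented for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <optional>

// Hypothetical helper: std::optional stands in for the absl::Status-style
// result. The priced class and the rejection rule are invented for the demo.
std::optional<int> GetCyclesForThroughputHelper(int instruction) {
  if (instruction < 0x00 || instruction > 0x20) return std::nullopt;  // unschedulable
  if (instruction == 0x00) return 8;  // made-up priced class
  return 1;                           // default cost
}

// Public entry point: call the helper, CHECK success, return the integer.
int GetCyclesForThroughput(int instruction) {
  std::optional<int> cycles = GetCyclesForThroughputHelper(instruction);
  if (!cycles.has_value()) {
    std::fprintf(stderr, "Check failed: cycles is OK\n");  // cf. cycle_table.cc:817
    std::abort();
  }
  return *cycles;
}
```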


The Sibling Factory — LatencyTable::Create(TpuVersion)

The throughput half (CycleTable) and the dependency-latency half (LatencyTable) are parallel per-gen families, but their factories use two structurally different dispatch mechanisms, and a reimplementer must not assume one mirrors the other.

CycleTable::Create(const Target&) @ 0x1c89cc00 keys a FunctionRegistry<TpuVersion, …> (a hash map). LatencyTable::Create(tpu::TpuVersion) @ 0x1c89fba0 instead keys a dense inlined-vector indexed by the version ordinal — there is no cmp $0xN switch and no hash lookup. The ordinal is bounds-checked against the vector size, then the stored absl::AnyInvocable factory at byte offset +0x18 of the version-th 32-byte slot is called directly:

// xla::jellyfish::LatencyTable::Create @ 0x1c89fba0 (decompiled, exact shape)
unique_ptr<LatencyTable> LatencyTable::Create(tpu::TpuVersion v) {
    Vector *reg = registry;                              // @ 0x225799f8, file-local inlined_vector
    if (reg == nullptr)
        LogFatal("registry", /*latency_table.cc:0x78*/); // "registry" non-null CHECK
    if ((int64_t)v < 0 || (size_t)v >= reg->size())      // signed + size bound, both fatal
        LogFatal(/*latency_table.cc:0x7a / 0x7b*/);
    char *entry = (char *)reg->data() + ((size_t)v << 5); // 32-byte AnyInvocable stride
    using Invoker = unique_ptr<LatencyTable> (*)(void *);
    Invoker fn = *(Invoker *)(entry + 0x18);             // stored invoker pointer at +0x18
    if (fn == nullptr)
        LogFatal("registered", /*latency_table.cc:0x7c*/);
    return fn(entry);                                    // call *%rax  @ 0x1c89fc0f
}

The version→ctor binding is therefore not visible inside Create. It is written at static-init time by five translation-unit initializers (latency_table_jf.cc registers both ordinal 0 and ordinal 1), each calling LatencyTable::Register(version, AnyInvocable) @ 0x1c89fac0, which resizes the vector to version+1 and stores the invoker at slot +0x18. The dispatch tail is the union of those five initializers:

Ordinal | Codename | Initializer (.text.startup) | Register(v, λ) arg | Factory invoker λ | Concrete ctor (new size)
0 | jellyfish (v2) | _GLOBAL__sub_I_latency_table_jf.cc @ 0x21353860 | mov $0x0,%edi @ 0x21353885 | LocalInvoker<jellyfish::$_0> @ 0x1c8a1280 | LatencyTableJellyfish::C1 @ 0x1c8a0c20 (new 0x58)
1 | dragonfish (v3) | _GLOBAL__sub_I_latency_table_jf.cc @ 0x21353860 | mov $0x1,%edi @ 0x213538a9 | LocalInvoker<jellyfish::$_1> @ 0x1c8a12c0 | LatencyTableJellyfish::C1 @ 0x1c8a0c20 (new 0x58)
2 | pufferfish (v4) | _GLOBAL__sub_I_latency_table_pf.cc @ 0x213538d0 | mov $0x2,%edi @ 0x213538f3 | LocalInvoker<pufferfish::$_0> @ 0x1c8a31c0 | LatencyTablePufferfish::C1 @ 0x1c8a1960 (new 0x1e0)
3 | viperfish (v5) | _GLOBAL__sub_I_latency_table_vf.cc @ 0x21353920 | mov $0x3,%edi @ 0x21353943 | LocalInvoker<viperfish::$_0> @ 0x1c8a5280 | LatencyTableViperfish::C1 @ 0x1c8a3f20 (new 0x1e0)
4 | ghostlite (v6 lite) | _GLOBAL__sub_I_latency_table_gl.cc @ 0x21353970 | mov $0x4,%edi @ 0x21353993 | LocalInvoker<ghostlite::$_0> @ 0x1c8b28e0 | LatencyTableGhostlite::C1 @ 0x1c8b0c00 (new 0x1e0)
5 | 6acc60406 (TPU7x) | _GLOBAL__sub_I_latency_table_gf.cc @ 0x213539c0 | mov $0x5,%edi @ 0x213539e3 | GF invoker λ @ 0x1c8bb180 (symbol-coalesced) | GF LatencyTable ctor @ 0x1c8b9520 (new 0x1e0)

NOTE — LatencyTable and CycleTable factories are not the same machine. CycleTable::Create uses a FunctionRegistry hash map and dispatches by lambda lookup; LatencyTable::Create uses a flat ordinal-indexed inlined_vector<AnyInvocable, 8> (file-local registry @ 0x225799f8) and dispatches by registry[version](entry) (call *0x18(%rdi,version<<5)). The "no factory registered" failure modes also differ: CycleTable logs "No cycle table registered for platform: "; LatencyTable::Create instead emits three distinct fatal CHECKs (registry non-null at latency_table.cc:0x78, ordinal in-bounds at 0x7a/0x7b, slot non-null at 0x7c). A reimplementer can collapse both into one ordinal-keyed table but must preserve the bounds CHECK before the indirect call.
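A sketch of the collapsed, ordinal-keyed form that note allows, under stated assumptions: std::vector<std::function> replaces the inlined_vector<AnyInvocable, 8>, exceptions replace the fatal CHECKs, and RegisterLatencyTable, CreateLatencyTable and ObjectSize are hypothetical names (only the 0x58 / 0x1e0 sizes come from the table above). Grow-only resizing is also an assumption; the text says only that Register resizes to version+1. The order of checks follows the decompiled Create: ordinal bounds first, then the empty-slot test, then the indirect call. A registry-non-null check is unnecessary here because the sketch uses a function-local static.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins; only the object sizes come from the table.
struct LatencyTable {
  virtual ~LatencyTable() = default;
  virtual int ObjectSize() const = 0;
};
struct LatencyTableJellyfish : LatencyTable {
  int ObjectSize() const override { return 0x58; }
};
struct LatencyTablePufferfish : LatencyTable {
  int ObjectSize() const override { return 0x1e0; }
};

using LatencyFactory = std::function<std::unique_ptr<LatencyTable>()>;

// Dense registry indexed by TpuVersion ordinal.
std::vector<LatencyFactory>& LatencyRegistry() {
  static std::vector<LatencyFactory> registry;
  return registry;
}

// Grow to at least version+1 slots, then store the factory at that ordinal.
void RegisterLatencyTable(int version, LatencyFactory factory) {
  if (version < 0) throw std::out_of_range("negative TpuVersion");
  auto& reg = LatencyRegistry();
  if (static_cast<std::size_t>(version) >= reg.size())
    reg.resize(static_cast<std::size_t>(version) + 1);
  reg[static_cast<std::size_t>(version)] = std::move(factory);
}

// Bounds CHECK first, then the empty-slot CHECK, then the indirect call.
std::unique_ptr<LatencyTable> CreateLatencyTable(int version) {
  const auto& reg = LatencyRegistry();
  if (version < 0 || static_cast<std::size_t>(version) >= reg.size())
    throw std::out_of_range("ordinal out of range");  // cf. latency_table.cc:0x7a/0x7b
  const LatencyFactory& factory = reg[static_cast<std::size_t>(version)];
  if (!factory) throw std::runtime_error("registered");  // cf. latency_table.cc:0x7c
  return factory();
}
```

Keeping the ordinal check ahead of the slot access matters because indexing a std::vector out of range is undefined behavior, the same hazard the binary's 0x7a/0x7b CHECKs guard against.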

NOTE — JF and DF share LatencyTableJellyfish, like the cycle side. Ordinals 0 and 1 register two distinct lambdas (jellyfish::$_0 @ 0x1c8a1280, jellyfish::$_1 @ 0x1c8a12c0) but both new (0x58) and tail-call the same ctor LatencyTableJellyfish::C1 @ 0x1c8a0c20. The two lambdas exist only because JF and DF are registered as separate ordinals; the constructed object type is identical. This mirrors the JfCycleTable-for-both-gens fact above. Note the JF object is 0x58 bytes whereas PF/VF/GL/GF objects are all 0x1e0 bytes — the later gens carry the per-MatmulModifier/VlxmrModifier MXU-latency maps inline.

QUIRK — the GF (ordinal 5) factory and ctor are symbol-stripped. Unlike the other five arms, the GF invoker (0x1c8bb180) and the GF LatencyTable ctor (0x1c8b9520) have no demangled symbol of their own in nm; both are attributed to the neighbouring raw_hash_set<…VlxmrModifier…>::find symbol (at +0x1d80 and +0x120 respectively; both offsets resolve back to the same symbol address, 0x1c8b9400). The GF identity is still byte-anchored three ways: (1) latency_table_gf.cc's initializer Registers ordinal 5 with this exact invoker pointer (lea 0x1c8bb180 @ 0x213539c9); (2) the invoker new (0x1e0)s and calls 0x1c8b9520; (3) 0x1c8b9520 calls the LatencyTable base ctor (0x1c89f800), zero-fills a 0x1e0-byte body, installs its own pair of vtables (0x21c20930+0x10 at [obj], +0x48 at [obj+0x18]), and loads the ghostlite GetSharedMxuLatency singleton (0x22579a70). The VlxmrModifier (variable-latency MXU modifier) type and the gf TU name jointly mark it as the GF/TPU7x generation. The missing names are a symbol-attribution artifact (tools report the nearest preceding symbol plus an offset), not a missing function.

NOTE — base vs subclass LatencyTable. xla::jellyfish::LatencyTable is the abstract base (C2 ctor @ 0x1c89f800, providing LatencyBetween @ 0x1c89f820, IsTrueDependencyBetween, HasSetPermutePatternReservation, etc.). Every per-gen arm above subclasses it: the JF arm in the jellyfish namespace, PF in pufferfish, VF in viperfish, GL/GF in ghostlite (the GF ctor reuses the ghostlite shared MXU-latency table). The base LatencyTable::Create/Register/registry triple lives in the jellyfish namespace and is shared across all gens.


The Cycle-Class Enumeration (CycleTable::Instruction)

CycleTable::Instruction is a dense enum 0x00..0x20 (33 values). It is not an LLO opcode and not the Performance/GhPerf::Instruction grid index — it is a coarse bucketing of MXU and vector functional behavior, shared verbatim across all six gens. The role each class plays is stable across gens even though the cycle integers differ.

Class | Role | JF/DF
0x00 | Vector matprep, bf16 | 8
0x01..0x04 | matprep variants (bf16/fp8 family) | default 1
0x05 | Latch / push gains, bf16 | 8
0x06 | Latch, int4 | default 1
0x07, 0x08 | 6acc60406 new latch paths (latch_mode 48/50) | default 1
0x09 | Latch, fp8 | default 1
0x0a | PushGainsS4; fatal on PF/VF ("Unsupported PushGainsS4.", cycle_table.cc:682) | default 1
0x0b | Transposed bf16 latch | 8
0x0c | Transposed int8 latch | default 1
0x0d, 0x0e | 6acc60406 transposed latch (latch_mode 49/51) | default 1
0x0f | Transposed fp8 latch | default 1
0x10 | Transposed PushGainsS4; fatal on PF/VF, same "Unsupported PushGainsS4." string as 0x0a | default 1
0x11 | Vector EUP class | default 1
0x12, 0x13 | XLU rotate in/out (RotIn/RotOut) | 1
0x14 | Shuffle / permute | 1
0x15, 0x16 | Broadcast / reduce | 1
0x17 | Matrix-result read (TC) | 8
0x18 | Read/write transpose register | 1
0x19 | Cross-lane reduction | 1
0x1a | Lane comparison / EUP edge | 1
0x1b | Matrix-result read, primary | 8
0x1c | Matrix-result read, secondary | 8
0x1d, 0x1e | EUP unary primary/secondary | default 1
0x1f | Matrix-result read | 8
0x20 | Transcendental class (vector ALU "any") | 1

QUIRK — "33 classes" vs "16 priced." The enum spans 0x00..0x20 but JfCycleTable prices only 16 of them (the rest fall through to the default 1). The priced set is pinned by the literal mask 0x19FFC0821 (see JfCycleTable); later gens price additional bf16/fp8 latch and matprep variants. The JF/DF column above shows 8 for the seven MXU classes and 1 for the nine priced vector classes; "default 1" marks the classes the JF mask leaves unpriced. The role labels are stable across gens — a reimplementer ports the role-to-class map once and re-prices per gen.
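The 16-of-33 count can be checked directly against the mask. The C++ sketch below assumes only that bit k of 0x19FFC0821 marks cycle class k as priced; PricedClasses, JfThroughput and CountPricedAt are illustrative helpers, and JfThroughput simply restates the JF/DF column of the table above rather than reading any Performance grid.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kJfPricedMask = 0x19FFC0821ULL;  // from JfCycleTable
constexpr int kNumCycleClasses = 33;                // 0x00..0x20

// Bit k set => cycle class k is priced by JfCycleTable (assumed convention).
std::vector<int> PricedClasses(uint64_t mask) {
  std::vector<int> out;
  for (int cls = 0; cls < kNumCycleClasses; ++cls)
    if ((mask >> cls) & 1) out.push_back(cls);
  return out;
}

// JF/DF column of the table: 8 for the seven MXU classes, 1 otherwise
// (whether priced at 1 or falling through to the default 1).
int JfThroughput(int cls) {
  switch (cls) {
    case 0x00: case 0x05: case 0x0b: case 0x17:
    case 0x1b: case 0x1c: case 0x1f:
      return 8;
    default:
      return 1;
  }
}

// Number of priced classes that cost the given number of cycles.
int CountPricedAt(uint64_t mask, int cycles) {
  int n = 0;
  for (int cls : PricedClasses(mask))
    if (JfThroughput(cls) == cycles) ++n;
  return n;
}
```

Decoding the mask yields exactly 0x00, 0x05, 0x0b, 0x12..0x1c, 0x1f and 0x20: all seven 8-cycle MXU classes plus the nine priced vector classes, which matches the table row for row.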

Folding opcodes into classes — CycleTableInstruction

xla::jellyfish::CycleTableInstruction(LloInstruction*) @ 0x1c89ca80 is the only producer of MXU cycle classes. It classifies exactly two opcode bands and is fatal on anything else:

// xla::jellyfish::CycleTableInstruction @ 0x1c89ca80 (decompiled, exact shape)
uint32_t CycleTableInstruction(const LloInstruction *insn) {
    uint32_t op = insn->opcode;
    if ((uint16_t)(op - 141) <= 9) {                 // opcodes 141..150 = matmul/latch band
        uint8_t lm = insn->latch_mode();
        if (lm >= 0x34 || !bittest64(0xF000003FFFC3F, lm))
            LogFatal("Unsupported gain latch mode ", /*cycle_table.cc:431*/);
        return unk_B4389F4[lm];                       // latchLUT @ 0xb4389f4, 52 × int32
    }
    if ((uint16_t)(op - 155) <= 0xA) {                // opcodes 155..165 = matprep/matpush band
        uint8_t f = insn->matmul_data_format() - 1;
        if (f >= 0xA)
            LogFatal("Unsupported matmul data format ", /*cycle_table.cc:464*/);
        return unk_B438AC4[f];                         // fmtLUT @ 0xb438ac4, indices 0..9 read
    }
    LogFatal("Unsupported instruction ", /*cycle_table.cc:470*/);
}

Two .rodata lookup tables turn the MXU modifier into a cycle class:

Table | Address | Shape | Maps
latchLUT (unk_B4389F4) | 0xb4389f4 | 52 × int32, valid mask 0xF000003FFFC3F | GainLatchMode → Instruction
fmtLUT (unk_B438AC4) | 0xb438ac4 | int32[], indexed by matmul_data_format()-1; the classifier reads indices 0..9 only (< 0xA guard) | MatmulDataFormat → Instruction

The latch LUT (mask bits verified against the raw table; latch modes on the left, resulting cycle classes on the right, both in hex): 0x00/0x02/0x04 → 0x05, 0x01/0x03/0x05 → 0x0b, 0x0a → 0x0c, 0x0b/0x0e/0x10 → 0x06, 0x0c → 0x09, 0x0d → 0x0f, 0x0f/0x11 → 0x0c, 0x12/0x14/0x16/0x18 → 0x09, 0x13/0x15/0x17/0x19 → 0x0f, 0x30 → 0x07, 0x31 → 0x0d, 0x32 → 0x08, 0x33 → 0x0e. The fmt LUT (index = format-1) reads [0,1,1,1,4,4,4,4,2,3,1]; the < 0xA guard means only indices 0..9 are reachable, i.e. fmt 1 → 0, fmt 2/3/4 → 1, fmt 5/6/7/8 → 4, fmt 9 → 2, fmt 10 → 3. The 11th entry (fmt 11 → 1) exists in .rodata but is rejected by this classifier as a fatal "Unsupported matmul data format " (cycle_table.cc:464); it is read only by later-gen paths. CycleTableInstruction itself is gen-independent: the same classifier produces MXU cycle classes for every gen. The vector/EUP/matrix-result classes (0x11..0x20) are produced by non-MXU emitter paths, not by CycleTableInstruction.
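The two-band classifier can be restated as a testable C++ sketch. Assumptions: the opcode, latch mode and data format are passed in as plain integers (the real function reads them off the LloInstruction), std::optional replaces LogFatal, and the Try* function names are hypothetical. The latch table is rebuilt from the listing above, with results written as hex cycle classes; slots outside the valid mask are never read and are left as 0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <optional>

// Valid GainLatchMode bits, as tested by the decompiled bittest64.
constexpr uint64_t kLatchValidMask = 0xF000003FFFC3FULL;

// latchLUT rebuilt from the listing (values are cycle classes).
const std::array<uint32_t, 52>& LatchLut() {
  static const std::array<uint32_t, 52> lut = [] {
    std::array<uint32_t, 52> t{};
    for (int m : {0x00, 0x02, 0x04}) t[m] = 0x05;
    for (int m : {0x01, 0x03, 0x05}) t[m] = 0x0b;
    for (int m : {0x0a, 0x0f, 0x11}) t[m] = 0x0c;
    for (int m : {0x0b, 0x0e, 0x10}) t[m] = 0x06;
    for (int m : {0x0c, 0x12, 0x14, 0x16, 0x18}) t[m] = 0x09;
    for (int m : {0x0d, 0x13, 0x15, 0x17, 0x19}) t[m] = 0x0f;
    t[0x30] = 0x07; t[0x31] = 0x0d; t[0x32] = 0x08; t[0x33] = 0x0e;
    return t;
  }();
  return lut;
}

// fmtLUT entries 0..9 (index = matmul_data_format() - 1).
constexpr uint32_t kFmtLut[10] = {0, 1, 1, 1, 4, 4, 4, 4, 2, 3};

std::optional<uint32_t> TryClassifyLatch(uint8_t latch_mode) {
  if (latch_mode >= 0x34 || !((kLatchValidMask >> latch_mode) & 1))
    return std::nullopt;  // binary: "Unsupported gain latch mode "
  return LatchLut()[latch_mode];
}

std::optional<uint32_t> TryClassifyFormat(uint8_t format) {
  uint8_t f = static_cast<uint8_t>(format - 1);  // format 0 wraps to 0xFF
  if (f >= 0xA) return std::nullopt;             // binary: "Unsupported matmul data format "
  return kFmtLut[f];
}

// Opcode bands as in the decompiled classifier (uint16 wraparound included).
std::optional<uint32_t> TryCycleTableInstruction(uint32_t opcode, uint8_t latch_mode,
                                                 uint8_t format) {
  if (static_cast<uint16_t>(opcode - 141) <= 9) return TryClassifyLatch(latch_mode);
  if (static_cast<uint16_t>(opcode - 155) <= 0xA) return TryClassifyFormat(format);
  return std::nullopt;  // binary: "Unsupported instruction "
}
```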

NOTE — format values beyond 10 are not dead. The 11th-and-beyond format values (packed int8/int4) are consumed by later-gen Performance/MxuLatency paths and are documented with matmul mode modifiers; only the shared classifier rejects them as fatal.
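The addresses quoted on this page also bound the size of fmtLUT. The worked check below assumes 4-byte entries and takes the three addresses at face value: latchLUT's 52 entries end exactly where fmtLUT starts, and only 10 int32 slots fit between fmtLUT and resLUT. If that reading is right, the 11th value in the fmt dump (1) sits at 0xb438aec, which would make it resLUT[0]; that is consistent with class 0x00 occupying R[1] (Matmul) in the resource table. Treat this as an inference from the addresses, not a verified fact.

```cpp
#include <cassert>
#include <cstdint>

// .rodata addresses quoted on this page (VMA == file offset for this binary).
constexpr uint64_t kLatchLutAddr = 0xb4389f4;  // latchLUT, 52 x int32
constexpr uint64_t kFmtLutAddr   = 0xb438ac4;  // fmtLUT
constexpr uint64_t kResLutAddr   = 0xb438aec;  // resLUT, 33 x int32
constexpr uint64_t kEntryBytes   = 4;          // sizeof(int32)

// Address of the index-th int32 in fmtLUT.
constexpr uint64_t FmtEntryAddr(uint64_t index) { return kFmtLutAddr + index * kEntryBytes; }

// latchLUT ends exactly where fmtLUT begins.
static_assert(kLatchLutAddr + 52 * kEntryBytes == kFmtLutAddr, "latchLUT abuts fmtLUT");
// The gap to resLUT is a whole number of int32 slots, and there are ten of them.
static_assert((kResLutAddr - kFmtLutAddr) % kEntryBytes == 0, "gap is whole int32 slots");
static_assert((kResLutAddr - kFmtLutAddr) / kEntryBytes == 10, "fmtLUT holds 10 entries");
```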


The Resource Side — CycleTable::GetResource

CycleTable::GetResource(Instruction) @ 0x1c89ce20 is a single flat lookup, shared by all gens:

// xla::jellyfish::CycleTable::GetResource @ 0x1c89ce20 (decompiled, exact)
int GetResource(int instruction) {
    return dword_B438AEC[instruction];               // resLUT @ 0xb438aec, 33 × int32
}

The returned value is the slot index into a per-op ResourceVector: AccumulateInstructionUsage does ResourceVector::Acc(GetResource(I), (double)GetCyclesForThroughput(I)), and ResourceVector::Acc (0x1c89adc0) is [rdi + Resource*8] += cycles behind a cmp esi, 0x17 bound (23 slots). The JF/DF resource LUT emits only the values 0..6, the MXU/vector head of the 23-slot accumulator (see the resource enum):

GetResource value | ResourceVector slot | Name | JF/DF occupant classes
0 | R[0] | Matpush | 0x05..0x10 (latch band)
1 | R[1] | Matmul | 0x00..0x04 (matprep band)
2 | R[2] | Xlu | 0x17, 0x1b..0x1f (matrix-result / cross-lane)
3 | R[3] | VectorAlu0 | 0x14
4 | R[4] | VectorAlu1 | 0x12, 0x13
5 | R[5] | VectorAluAny | 0x15, 0x16, 0x19, 0x20
6 | R[6] | VectorEup | 0x11, 0x18, 0x1a

This is how the cost model expresses resource conflict: two classes that map to the same lane add (they run sequentially on that unit), while two that map to different lanes overlap, because the bundle cost is the maximum over the per-lane totals. The ResourceVector slot names come from the symbol table; the semantic micro-port mapping behind each R[k] name is an interpretation.
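The accumulate-then-max rule can be sketched in C++. Assumptions: kResLut is transcribed from the occupant column of the resource table, JfCycles restates the JF/DF throughput column of the class table, and BundleCost is a hypothetical simplification of the scheduler that ignores how the latency axis is combined in.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>

constexpr int kNumResourceSlots = 23;  // ResourceVector::Acc bound (cmp esi, 0x17)

// Cycle class -> ResourceVector slot, transcribed from the resource table.
constexpr std::array<int, 33> kResLut = {
    1, 1, 1, 1, 1,                       // 0x00..0x04 matprep band -> Matmul
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  // 0x05..0x10 latch band   -> Matpush
    6,                                   // 0x11        -> VectorEup
    4, 4,                                // 0x12, 0x13  -> VectorAlu1
    3,                                   // 0x14        -> VectorAlu0
    5, 5,                                // 0x15, 0x16  -> VectorAluAny
    2,                                   // 0x17        -> Xlu
    6,                                   // 0x18        -> VectorEup
    5,                                   // 0x19        -> VectorAluAny
    6,                                   // 0x1a        -> VectorEup
    2, 2, 2, 2, 2,                       // 0x1b..0x1f  -> Xlu
    5,                                   // 0x20        -> VectorAluAny
};

// JF/DF throughput: 8 for the seven MXU classes, 1 for everything else.
int JfCycles(int cls) {
  switch (cls) {
    case 0x00: case 0x05: case 0x0b: case 0x17:
    case 0x1b: case 0x1c: case 0x1f:
      return 8;
    default:
      return 1;
  }
}

struct ResourceVector {
  std::array<double, kNumResourceSlots> slot{};

  // Same lane: costs add.
  void Acc(int resource, double cycles) {
    if (resource < 0 || resource >= kNumResourceSlots) throw std::out_of_range("resource");
    slot[resource] += cycles;
  }
  // Different lanes overlap: the bundle costs as much as its busiest lane.
  double MaxLane() const { return *std::max_element(slot.begin(), slot.end()); }
};

// Hypothetical, simplified bundle cost over a list of cycle classes.
double BundleCost(std::initializer_list<int> classes) {
  ResourceVector rv;
  for (int cls : classes)
    rv.Acc(kResLut.at(static_cast<std::size_t>(cls)), static_cast<double>(JfCycles(cls)));
  return rv.MaxLane();
}
```

For example, {0x00, 0x01} costs 9 (both on the Matmul lane: 8 + 1), while {0x00, 0x05} costs 8 (Matmul and Matpush overlap).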


How This Relates To Performance And MxuLatency

Table family | What it answers | Accessor | Page
CycleTable (this page) | issue cycles + lane for a cycle class | GetCyclesForThroughput, GetResource | here
Performance | the baked per-gen cycle grid the CycleTable reads | GetResourceUsage(instr, res), GetLatency | overview
MxuLatency | per-(MatmulModifier × Resource) matmul/matprep cycles | GetResourceUsage (keyed map) | overview
LatencyTable* | read-after-write dependency latency | GetLatency, GetLatencyBetween | bundle-aware cost

The clean way to read the picture: the CycleTable is the index logic, the Performance grid is the data, and the MxuLatency map is a per-gen override of the matmul/matprep cells for cases where the simple (instruction, resource) lookup is too coarse (matmul cost depends on format, transpose flag and MSR/MRB target, not on a single per-opcode constant). The LatencyTable* is an orthogonal axis that the scheduler combines with the throughput cycles via the per-op ResourceVector. The concrete per-gen integers are tabulated in Per-Opcode Cycle Constants. They include the seven JF/DF 8-cycle MXU cells and the matmul base-latency clusters, resolved against the per-gen Performance latency arrays: bf16/F32 = 131 / 192 / 212 and fp8 = 114-115 / 192 / 204 on Vf/Gl/Gf. These are op base latencies, not the small per-resource throughput cells the MxuLatencyTable returns; the GL/GF matmul_latencies_ map carries the sibling pairs 192/182 (GL) and 211/204 (GF) (see MXU Latency: GL/GF).


Cross-References

  • JfCycleTable — the flat offset-LUT read path, the 16-of-33 priced subset, and the 7-column resource naming for the oldest gens.
  • VfCycleTable — the Viperfish switch-over-GetResourceUsage read path.
  • Per-Opcode Cycle Constants — the baked .rodata cycle tables grouped by gen/engine, and how the bundle-latency cost model sums them.
  • Performance Family Overview — the per-gen Performance<gen> grid the CycleTable indexes into.
  • MXU Latency Overview — the per-(MatmulModifier × Resource) reservation matrices that override the simple matmul cells.
  • Resource Enum — the 23-slot ResourceVector whose head R[0]..R[6] the JF/DF resource LUT emits.
  • Bundle-Aware Cost — how per-op throughput cycles and per-op resource vectors combine into a bundle issue cost.
  • Bundle Model Overview — the VLIW bundle layer the cost model attributes cycles to.
  • Binary: extracted/libtpu-0.0.40-cp314-cp314-manylinux_2_31_x86_64/libtpu/libtpu.so (build-id 89edbbe81c5b328a958fe628a9f2207d)
  • Index entry: Part VII, Cost & Latency Model / CycleTable