cuda_tile Tree Mapping

Abstract

The public cuda-tile repository contains two C++ source files that describe the dialect as actual code: Interfaces.cpp, which is mostly a stub that hosts the ODS-generated interface implementation; and CudaTileOptimizer.cpp, the standalone tool that takes TileIR in, runs an optimizer pipeline, and emits TileIR or LLVM bitcode out. Together they cover the dialect contract (what verifiers must check) and the dialect's user-facing entry point (how a developer drives the optimizer).

This page maps both files to their tileiras counterparts. The mapping is not symmetric: Interfaces.cpp corresponds to a distributed pattern in tileiras (ODS-generated interface code spread across parser/verifier/printer), and CudaTileOptimizer.cpp has no standalone counterpart at all — its role is absorbed into the full compile-to-GPU pipeline.

Interfaces.cpp and Interfaces.td

The OSS Interfaces.cpp is a one-screen stub:

// Interfaces.cpp (OSS, abbreviated)
#include "CudaTile/IR/Interfaces.h"
#include "CudaTile/IR/Interfaces.cpp.inc"        // ODS-generated TypeInterface bodies
#include "CudaTile/IR/AttrInterfaces.cpp.inc"    // ODS-generated AttrInterface bodies

All real interface code lives in the ODS-generated .cpp.inc files. The declarations in Interfaces.td are what matter for the comparison.

AssumePredicateAttrInterface

Upstream declaration:

// Interfaces.td (OSS)
def AssumePredicateAttrInterface : AttrInterface<"AssumePredicateAttrInterface"> {
  let cppNamespace = "::mlir::cuda_tile";
  let methods = [
    InterfaceMethod<
      "Verify that the predicate is well-formed for a given assume op.",
      "::mlir::LogicalResult", "verifyWithAssumeOp",
      (ins "::mlir::Operation *":$assumeOp)
    >,
  ];
}

Tileiras carries the same interface with the same method signature. The implementation pattern in tileiras has three parts. Each concrete predicate attribute (DivByAttr, SameElementsAttr, BoundedAttr) declares AssumePredicateAttrInterface in its ODS interfaces field. The ODS expansion produces a per-attribute verifyWithAssumeOp body that runs the attribute-specific check. The cuda_tile.assume op verifier resolves the predicate attribute through the interface and dispatches to the concrete implementation.

| Aspect | OSS | Tileiras | Status |
|---|---|---|---|
| Interface declaration | Interfaces.td | matching ODS declaration | PRESENT |
| ODS-generated dispatch glue | Interfaces.cpp.inc | inlined into each concrete attribute's verifier slab | INLINED |
| Per-attribute verifier body | one implementation per predicate attribute | one implementation per predicate attribute | PRESENT |
| Interface TypeID | one interned TypeID shared by all implementors | same: single interned TypeID per interface | PRESENT |

The divergence is the location of the dispatch glue. OSS keeps it in one .cpp.inc that Interfaces.cpp includes; tileiras inlines the same dispatch into each concrete attribute's slab during ODS expansion. Both call the same per-attribute verifyWithAssumeOp body. The semantic contract is identical.
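The resolve-then-dispatch path described here can be modeled without MLIR as a plain virtual interface. The sketch below is a toy, not the ODS-generated code: `Operation`, `Attribute`, `UnrelatedAttr`, the `divisor` field, and the rule inside `verifyWithAssumeOp` are hypothetical stand-ins. Only the interface name, the method name, and the `DivByAttr` name come from the declaration above.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for mlir::Operation and mlir::Attribute (hypothetical).
struct Operation {
  std::string name;
  int numOperands;
};

struct Attribute {
  virtual ~Attribute() = default;
};

// Models the attribute interface: one method, named as in Interfaces.td.
struct AssumePredicateAttrInterface {
  virtual ~AssumePredicateAttrInterface() = default;
  // true stands in for mlir::success(), false for mlir::failure().
  virtual bool verifyWithAssumeOp(const Operation &assumeOp) const = 0;
};

// A concrete predicate attribute. The rule checked here is illustrative only.
struct DivByAttr : Attribute, AssumePredicateAttrInterface {
  long divisor;
  explicit DivByAttr(long d) : divisor(d) {}
  bool verifyWithAssumeOp(const Operation &assumeOp) const override {
    return divisor > 0 && assumeOp.numOperands == 1;
  }
};

// An attribute that does not implement the interface.
struct UnrelatedAttr : Attribute {};

// Models the cuda_tile.assume verifier: resolve the interface, then dispatch.
bool verifyAssumeOp(const Operation &op, const Attribute &predicate) {
  assert(op.name == "cuda_tile.assume" && "verifier called on the wrong op");
  const auto *iface =
      dynamic_cast<const AssumePredicateAttrInterface *>(&predicate);
  if (!iface)
    return false; // the attribute does not implement the interface
  return iface->verifyWithAssumeOp(op);
}
```

Whether the glue sits in one included file or is repeated next to each attribute, the call that matters is the final `iface->verifyWithAssumeOp(op)`, which is why the two trees can differ in layout while sharing one contract.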

TileView TypeInterface

Upstream declaration:

// Interfaces.td (OSS)
def TileView : TypeInterface<"TileView"> {
  let cppNamespace = "::mlir::cuda_tile";
  let methods = [
    InterfaceMethod<
      "Returns the rank of the view's index space.",
      "int64_t", "getViewIndexRank"
    >,
    InterfaceMethod<
      "Returns the tile type produced when the view is fully indexed.",
      "::mlir::Type", "getViewTileType"
    >,
  ];
}

Tileiras carries the same two methods on the same interface. The view types implementing it are cuda_tile.tensor_view and cuda_tile.partition_view.

| Aspect | OSS | Tileiras | Status |
|---|---|---|---|
| Interface declaration | Interfaces.td | matching ODS declaration | PRESENT |
| getViewIndexRank() | per-view-type implementation | per-view-type implementation | PRESENT |
| getViewTileType() | per-view-type implementation | per-view-type implementation | PRESENT |
| Consumers | view-consuming op verifiers call interface methods | same set of view-consuming ops use the same interface methods | PRESENT |

Same status as AssumePredicateAttrInterface: declaration identical, dispatch glue location differs, semantics preserved.
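As a rough illustration of how a view-consuming verifier can be written against the two interface methods alone, here is a self-contained model. It is a sketch under stated assumptions: the payload fields of `PartitionViewType` and `TensorViewType`, the `tile<...>` string spelling, and the `verifyViewLoad` consumer are all hypothetical, and a string stands in for `::mlir::Type`. Only the two method names come from the declaration above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy model of the TileView type interface; a string stands in for mlir::Type.
struct TileView {
  virtual ~TileView() = default;
  virtual int64_t getViewIndexRank() const = 0;
  virtual std::string getViewTileType() const = 0;
};

// Formats ({64, 64}, "f32") as "tile<64x64xf32>" (illustrative spelling).
std::string formatTile(const std::vector<int64_t> &shape,
                       const std::string &elem) {
  std::string s = "tile<";
  for (int64_t dim : shape)
    s += std::to_string(dim) + "x";
  return s + elem + ">";
}

// Hypothetical payload: a grid of fixed-shape tiles.
struct PartitionViewType : TileView {
  int64_t gridRank;
  std::vector<int64_t> tileShape;
  std::string elem;
  PartitionViewType(int64_t rank, std::vector<int64_t> shape, std::string e)
      : gridRank(rank), tileShape(std::move(shape)), elem(std::move(e)) {
    assert(gridRank >= 0 && "rank must be non-negative");
  }
  int64_t getViewIndexRank() const override { return gridRank; }
  std::string getViewTileType() const override {
    return formatTile(tileShape, elem);
  }
};

// Hypothetical payload: indexed element-wise, yielding a rank-0 tile.
struct TensorViewType : TileView {
  std::vector<int64_t> shape;
  std::string elem;
  TensorViewType(std::vector<int64_t> s, std::string e)
      : shape(std::move(s)), elem(std::move(e)) {}
  int64_t getViewIndexRank() const override {
    return static_cast<int64_t>(shape.size());
  }
  std::string getViewTileType() const override { return formatTile({}, elem); }
};

// A view-consuming verifier written only against the interface.
bool verifyViewLoad(const TileView &view, std::size_t numIndices,
                    const std::string &resultType) {
  if (static_cast<int64_t>(numIndices) != view.getViewIndexRank())
    return false;
  return resultType == view.getViewTileType();
}
```

The consumer never names a concrete view type, which is what lets the same set of view-consuming ops work unchanged across both view types.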

AllElementTypeMatch Predicate

Upstream declaration:

// Interfaces.td (OSS, predicate trait)
class AllElementTypeMatch<list<int> indices> : PredOpTrait<...> { ... }

This is not a runtime-dispatched interface — it is a generated ODS predicate that emits a static check into each consuming op's verifier. OSS centralizes the predicate template in Interfaces.td and lets the TableGen expander inline it per use.

Tileiras follows the same model. The predicate is INLINED at every consuming op verifier: the ODS expander emits the same element-type-match check into each verifier body. No central helper exists at runtime in either tree; both spell out the check at every use site.

| Aspect | OSS | Tileiras | Status |
|---|---|---|---|
| Predicate template | Interfaces.td | identical template | PRESENT |
| Runtime helper function | none (generated inline) | none (generated inline) | INLINED |
| Per-op verifier code | one inlined predicate per consuming op | one inlined predicate per consuming op | PRESENT |
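A loose C++ analogue of per-use expansion is a function template whose index list is a compile-time parameter: each consuming verifier instantiates its own copy of the check. This is only an analogy, since TableGen emits the check textually rather than through a template. The `Value` struct and both op verifiers below are hypothetical; only the `AllElementTypeMatch` idea of comparing element types at listed indices comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy operand: only the element type matters here (hypothetical model).
struct Value {
  std::string elementType;
};

// Each distinct index list instantiates its own copy of the check, which is
// the C++ analogue of TableGen expanding the predicate at every use site.
template <std::size_t First, std::size_t... Rest>
bool allElementTypesMatch(const std::vector<Value> &operands) {
  const std::size_t indices[] = {First, Rest...};
  for (std::size_t i : indices) {
    assert(i < operands.size() && "operand index out of range");
    if (operands[i].elementType != operands[First].elementType)
      return false;
  }
  return true;
}

// Hypothetical binary op: operands 0 and 1 must agree.
bool verifyBinaryLikeOp(const std::vector<Value> &operands) {
  return operands.size() == 2 && allElementTypesMatch<0, 1>(operands);
}

// Hypothetical select-like op: operand 0 is a condition; 1 and 2 must agree.
bool verifySelectLikeOp(const std::vector<Value> &operands) {
  return operands.size() == 3 && allElementTypesMatch<1, 2>(operands);
}
```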

CudaTileOptimizer.cpp

The OSS driver is a standalone tool. Its main function follows a textbook MLIR-tool shape:

// CudaTileOptimizer.cpp (OSS, abbreviated)
int main(int argc, char **argv) {
  mlir::registerAllPasses();
  cuda_tile::registerOptimizerPasses();

  mlir::DialectRegistry registry;
  cuda_tile::registerDialects(registry);

  MLIRContext ctx(registry);
  ctx.loadAllAvailableDialects();

  // Parse input — accepts TileIR bytecode or textual MLIR.
  OwningOpRef<Operation *> module = parseSourceFile(input_file, &ctx);
  if (!module) return 1;

  // Build pass manager rooted at cuda_tile::EntryOp.
  PassManager pm(&ctx);
  pm.addNestedPass<cuda_tile::EntryOp>(createFuseFMAPass());
  pm.addPass(createCanonicalizerPass());
  pm.addPass(createCSEPass());
  pm.addPass(createLoopInvariantCodeMotionPass());
  pm.addNestedPass<cuda_tile::EntryOp>(createLoopSplitPass());

  // Accept optional pre/post textual pipeline fragments.
  applyTextualPipelineFragments(pm, pre_fragment, post_fragment);

  if (failed(pm.run(*module))) return 1;

  // Emit TileIR bytecode, memory bytecode, MLIR file, or MLIR stdout.
  return emitOutput(*module, output_kind, output_file);
}

Tileiras has no standalone optimizer entry point. Four of the five OSS passes (FMA fusion, canonicalization, CSE, and LICM) exist or have replacements in the full compile pipeline; loop splitting has none. The surviving passes are reached as part of tileiras_compile(), not through a cuda-tile-opt-style tool. The compile pipeline does not stop at cuda_tile; it lowers through nv_tileaa, nv_tileas, cute_nvgpu, cutlass, and nvvm, then runs the NVPTX backend.

| Driver component | OSS behavior | Tileiras behavior | Divergence kind |
|---|---|---|---|
| Input format | TileIR bytecode or textual MLIR | TileIR bytecode only | Semantic (textual MLIR rejected) |
| Optimizer anchor | cuda_tile::EntryOp-nested pass manager | full pipeline; per-pass anchors vary | Anchor-op |
| FMA fusion | FuseFMA at cuda_tile layer | tileas-legalize-fma-dot plus NVPTX -nvptx-fma-level | Layering (SUPERSEDED) |
| Canonicalization | createCanonicalizerPass() at cuda_tile layer | canonicalizer runs after every lowering stage | Granularity (split) |
| CSE | createCSEPass() at cuda_tile layer | CSE runs at multiple lowering layers | Granularity (split) |
| LICM | createLoopInvariantCodeMotionPass() at cuda_tile layer | LICM runs at the nv_tileas and LLVM layers | Layering (REWRITTEN) |
| Loop splitting | LoopSplit at cuda_tile layer | no equivalent at any layer | ABSENT |
| Textual pipeline fragments | applyTextualPipelineFragments() | no equivalent; pipeline is fixed by opt-level | ABSENT |
| Output: TileIR bytecode | emit bytecode | not exposed as terminal output | ABSORBED |
| Output: TileIR memory bytecode | emit memory bytecode | not exposed as terminal output | ABSORBED |
| Output: MLIR file/stdout | emit textual MLIR | not exposed as terminal output | ABSORBED |
| Output: LLVM bitcode | emit LLVM bitcode | not exposed as terminal output | ABSORBED |
| Terminal output | one of the four above | PTX text or CUBIN binary | Layering |
| Pass registration | one helper that adds the optimizer passes | distributed across dialect and extension installers | Structural |

The driver's anchor — cuda_tile::EntryOp — is the structural reason the OSS optimizer cannot be lifted directly into tileiras. Once the pipeline lowers past cuda_tile, no EntryOp exists to anchor a nested pass manager against. The OSS-style pass scheduling assumes the IR stays in cuda_tile for the entire optimizer run; tileiras's pipeline schedules each pass against whichever dialect is current at that point.
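The anchoring problem can be shown with a toy IR. The sketch below models a nested pass manager as "run this pass on every child op whose name matches the anchor" and then shows that the same schedule touches nothing once a lowering step has replaced the entry ops. Everything here is hypothetical: the `Op` struct, both helper functions, and the op names `cuda_tile.entry` and `lowered.func`, which are placeholders rather than the real mnemonics.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy IR: an op with a name and a flat list of nested ops (hypothetical).
struct Op {
  std::string name;
  std::vector<Op> body;
};

// Models addNestedPass<AnchorOp>: the pass runs only on children whose name
// matches the anchor. Returns how many ops it ran on.
int runNestedOn(Op &module, const std::string &anchor,
                const std::function<void(Op &)> &pass) {
  assert(!anchor.empty() && "a nested pass needs an anchor op name");
  int runs = 0;
  for (Op &child : module.body) {
    if (child.name == anchor) {
      pass(child);
      ++runs;
    }
  }
  return runs;
}

// Models a lowering step that replaces every entry op. Both op names are
// placeholders, not the real mnemonics.
void lowerEntryOps(Op &module) {
  for (Op &child : module.body)
    if (child.name == "cuda_tile.entry")
      child.name = "lowered.func";
}
```

Calling `runNestedOn(module, "cuda_tile.entry", pass)` before `lowerEntryOps` visits every entry op; calling it afterwards returns 0 without reporting an error, which is why a schedule anchored on the entry op cannot simply be replayed later in the pipeline.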

What Survives

The pass concepts survive. FMA fusion is a real concern in tileiras; it happens at the TileAS and NVPTX backend layers rather than at cuda_tile. Canonicalization, CSE, and LICM are also real concerns, but they are spread across the lowering pipeline rather than run in one batch: canonicalization after every lowering stage, CSE at multiple lowering layers, and LICM at the nv_tileas and LLVM layers. Loop splitting is the one OSS pass with no tileiras counterpart at any layer; a reimplementer adding it would be extending tileiras's capabilities rather than reproducing them.

What Does Not Survive

The standalone cuda-tile-opt-style tool does not survive. The four output kinds do not survive at the tool level. The textual pipeline fragments do not survive — tileiras's pipeline is built per opt-level by a fixed builder rather than assembled from caller-supplied textual fragments.
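The difference between the two pipeline-construction styles can be sketched with pass names as plain strings. This is a model under stated assumptions, not either tool's code: the string labels are not real pass flags, placing the `pre` fragment before and the `post` fragment after the default passes is a guess at what "pre/post" means, and the contents of `buildFixed` are placeholders rather than the real tileiras pipeline.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

using Pipeline = std::vector<std::string>;

// OSS-driver shape: the default passes plus caller-supplied fragments.
// Placing `pre` before and `post` after the defaults is an assumption.
Pipeline buildWithFragments(const Pipeline &pre, const Pipeline &post) {
  Pipeline p = pre;
  for (const char *name :
       {"fuse-fma", "canonicalize", "cse", "licm", "loop-split"})
    p.push_back(name);
  p.insert(p.end(), post.begin(), post.end());
  return p;
}

// Fixed-builder shape: the opt-level is the only input. Stage names and the
// per-level contents are placeholders, not the real tileiras pipeline.
Pipeline buildFixed(int optLevel) {
  assert(optLevel >= 0 && "opt-level must be non-negative");
  Pipeline p = {"lowering-stage"};
  if (optLevel >= 1) {
    p.push_back("canonicalize");
    p.push_back("cse");
  }
  return p;
}
```

The design point is the signature: `buildFixed` has no parameter through which a caller could inject passes, so two calls with the same opt-level always yield the same pipeline.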

A tileiras-compatible compiler should not expose a CudaTileOptimizer-shaped tool unless the goal is to add a tile-level optimizer that does not exist in the released binary. The full compile pipeline is the supported entry point.

Generated Code Layout

Both files include ODS-generated .cpp.inc content. The mapping for the generated pieces:

| Generated artifact | OSS | Tileiras | Status |
|---|---|---|---|
| Dialect.cpp.inc (dialect registration) | included in dialect translation unit | inlined into the dialect ctor slab | INLINED |
| Ops.cpp.inc (op classes) | included in ops translation unit | inlined into per-op slabs | INLINED |
| Types.cpp.inc (type classes) | included in types translation unit | inlined into per-type slabs | INLINED |
| AttrDefs.cpp.inc (attribute classes) | included in attrs translation unit | inlined into per-attribute slabs | INLINED |
| Interfaces.cpp.inc | included in Interfaces.cpp | inlined into each concrete implementor | INLINED |
| AttrInterfaces.cpp.inc | included in Interfaces.cpp | inlined into each concrete implementor | INLINED |
| Passes.cpp.inc (pass registration helpers) | included in pass-registration TU | spread across dialect and extension installers | REWRITTEN |

The cross-cutting pattern: tileiras's LTO build inlines ODS-generated dispatch into each concrete consumer rather than concentrating it in central includes. The behavior at the source-language level is identical; the build-time factoring differs.

Reimplementation Guidance

For a tileiras-compatible reimplementation:

  • Use OSS Interfaces.td as the authoritative declaration of the dialect's interfaces. The two interfaces (AssumePredicateAttrInterface, TileView) and the AllElementTypeMatch predicate trait are unchanged.
  • Implement AssumePredicateAttrInterface on the three predicate attributes (DivByAttr, SameElementsAttr, BoundedAttr). Implement TileView on the two view types (cuda_tile.tensor_view, cuda_tile.partition_view).
  • Do not expose a standalone cuda-tile-opt-shaped tool. The driver layer in tileiras is tileiras_create_program + tileiras_compile_program + tileiras_get_output; reimplement those, not the OSS optimizer.
  • Do not accept textual MLIR input — tileiras consumes TileIR bytecode only.
  • Do not register a LoopSplit pass for compatibility. It has no tileiras counterpart.
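  • FMA fusion, canonicalization, CSE, and LICM should run at the tileiras-equivalent layers, not at the cuda_tile layer with the OSS scheduling: FMA at the TileAS and NVPTX backend layers, canonicalization after every lowering stage, CSE at multiple lowering layers, and LICM at the nv_tileas and LLVM layers.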
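The create, compile, and get-output shape of the driver layer, together with the bytecode-only input rule, might be modeled as below. This is a sketch only: the real signatures of tileiras_create_program, tileiras_compile_program, and tileiras_get_output are not given on this page, so the functions live in a hypothetical `sketch` namespace under different names. The `Status` enum, the `Program` struct, and the caller-supplied `magic` prefix are likewise assumptions; the real TileIR bytecode header is not specified here.

```cpp
#include <cassert>
#include <string>

namespace sketch {

enum class Status { Ok, NotBytecode, NotCompiled };

struct Program {
  std::string input;
  std::string output;
  bool compiled = false;
};

// Create step: accept only buffers that begin with `magic`. The magic is a
// parameter because this page does not give the real TileIR bytecode header.
Status createProgram(const std::string &buffer, const std::string &magic,
                     Program &program) {
  assert(!magic.empty() && "a bytecode magic prefix is required");
  if (buffer.compare(0, magic.size(), magic) != 0)
    return Status::NotBytecode; // textual MLIR ends up here
  program.input = buffer;
  program.output.clear();
  program.compiled = false;
  return Status::Ok;
}

// Compile step: stands in for the whole lowering pipeline. The output string
// is a placeholder for PTX text or a CUBIN image.
Status compileProgram(Program &program) {
  program.output = "<compiled output placeholder>";
  program.compiled = true;
  return Status::Ok;
}

// Output step: only valid after a successful compile.
Status getOutput(const Program &program, std::string &out) {
  if (!program.compiled)
    return Status::NotCompiled;
  out = program.output;
  return Status::Ok;
}

} // namespace sketch
```

Note what the model leaves out on purpose: there is no entry point that runs tile-level passes and returns TileIR, and no parameter for a pass pipeline, matching the guidance in the list above.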

Cross-References