This crate compiles Rust DSL kernels into Tile IR bytecode for GPU execution
via tileiras. Most users interact with it indirectly through cutile and
cutile-macro.
The runtime resolves `tileiras` in this order:

- `CUTILE_TILEIRAS_PATH`, when set.
- `$CUDA_TOOLKIT_PATH/bin/tileiras`, when `CUDA_TOOLKIT_PATH` is set and the binary exists there.
- Standard CUDA 13.3/13.2 install locations, when they contain `bin/tileiras`.
- `tileiras` through normal `PATH` lookup.
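The lookup order above can be sketched as a small pure function. This is an illustration only, not the crate's actual code: the function name `resolve_tileiras`, the injected `env` and `exists` callbacks, and the two install prefixes in `ASSUMED_CUDA_PREFIXES` are all hypothetical, since this README does not spell out the standard install locations.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical install prefixes; the real list is not given in this README.
const ASSUMED_CUDA_PREFIXES: [&str; 2] = ["/usr/local/cuda-13.3", "/usr/local/cuda-13.2"];

/// Returns the first `tileiras` candidate that passes the documented checks.
/// `env` looks up an environment variable; `exists` reports whether a file is present.
pub fn resolve_tileiras(
    env: &dyn Fn(&str) -> Option<String>,
    exists: &dyn Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // 1. An explicit override wins whenever it is set.
    if let Some(path) = env("CUTILE_TILEIRAS_PATH") {
        return Some(PathBuf::from(path));
    }
    // 2. $CUDA_TOOLKIT_PATH/bin/tileiras, only if the binary exists there.
    if let Some(root) = env("CUDA_TOOLKIT_PATH") {
        let candidate = Path::new(&root).join("bin").join("tileiras");
        if exists(candidate.as_path()) {
            return Some(candidate);
        }
    }
    // 3. Standard install locations (assumed prefixes, see above).
    for prefix in ASSUMED_CUDA_PREFIXES {
        let candidate = Path::new(prefix).join("bin").join("tileiras");
        if exists(candidate.as_path()) {
            return Some(candidate);
        }
    }
    // 4. Ordinary PATH lookup: first directory that contains the binary.
    let path_var = env("PATH")?;
    let found = std::env::split_paths(&path_var)
        .map(|dir| dir.join("tileiras"))
        .find(|candidate| exists(candidate.as_path()));
    found
}
```

Passing the environment and the file check as callbacks keeps the sketch testable without touching the real filesystem; a caller could pass `&|k| std::env::var(k).ok()` and `&|p| p.is_file()` to run it against the real environment.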
Set `CUTILE_TILEIRAS_PATH` to force a specific binary:

    CUTILE_TILEIRAS_PATH=/opt/cuda-tile/bin/tileiras \
    cargo test -p cutile-compiler

Set `CUTILE_SETUP_DIAGNOSTICS=1` to print CUDA toolkit and `tileiras` discovery
decisions during setup:

    CUTILE_SETUP_DIAGNOSTICS=1 cargo test -p cutile-compiler

Set `CUTILE_DUMP` to inspect the compiler's internal state after each pass.
Output goes to stderr.
    # Dump the Tile IR for all kernels:
    CUTILE_DUMP=ir cargo test -p cutile --test my_test -- --nocapture
    # Dump multiple stages:
    CUTILE_DUMP=resolved,typed,ir cargo test ...
    # Dump everything:
    CUTILE_DUMP=all cargo test ...

| Stage | Description |
|---|---|
| `ast` | Raw syn AST before any passes |
| `resolved` | After name resolution (paths resolved) |
| `typed` | After type inference (types annotated) |
| `instantiated` | After monomorphization (no generics remain) |
| `ir` | cutile-ir Module, pretty-printed |
| `bytecode` / `bc` | Encoded bytecode, decoded to human-readable text |
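To make the accepted `CUTILE_DUMP` values concrete, here is a minimal sketch of how a comma-separated stage list could be parsed. The `Stage` enum, `stage_from_name`, and `parse_dump_stages` are hypothetical names, and rejecting unknown stage names with an error is an assumption of this sketch; the crate may handle them differently.

```rust
/// One variant per stage name in the table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Ast,
    Resolved,
    Typed,
    Instantiated,
    Ir,
    Bytecode,
}

const ALL_STAGES: [Stage; 6] = [
    Stage::Ast,
    Stage::Resolved,
    Stage::Typed,
    Stage::Instantiated,
    Stage::Ir,
    Stage::Bytecode,
];

/// Maps a single stage name to a `Stage`; `bc` is the short form of `bytecode`.
fn stage_from_name(name: &str) -> Option<Stage> {
    match name {
        "ast" => Some(Stage::Ast),
        "resolved" => Some(Stage::Resolved),
        "typed" => Some(Stage::Typed),
        "instantiated" => Some(Stage::Instantiated),
        "ir" => Some(Stage::Ir),
        "bytecode" | "bc" => Some(Stage::Bytecode),
        _ => None,
    }
}

/// Parses a value such as "resolved,typed,ir" or "all" into a de-duplicated list.
/// Assumption: unknown names are an error rather than being silently ignored.
pub fn parse_dump_stages(value: &str) -> Result<Vec<Stage>, String> {
    let mut stages: Vec<Stage> = Vec::new();
    for name in value.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let selected = if name == "all" {
            ALL_STAGES.to_vec()
        } else {
            let stage = stage_from_name(name)
                .ok_or_else(|| format!("unknown dump stage: {}", name))?;
            vec![stage]
        };
        for stage in selected {
            if !stages.contains(&stage) {
                stages.push(stage);
            }
        }
    }
    Ok(stages)
}
```

Under these assumptions, `parse_dump_stages("resolved,typed,ir")` yields the three named stages in the order given, and `parse_dump_stages("all")` yields all six.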
Use `CUTILE_DUMP_FILTER` to limit output to specific kernels:

    # By function name (matches in any module):
    CUTILE_DUMP=ir CUTILE_DUMP_FILTER=my_kernel cargo test ...
    # By qualified path (module::function):
    CUTILE_DUMP=ir CUTILE_DUMP_FILTER=my_module::my_kernel cargo test ...
    # Multiple filters (comma-separated):
    CUTILE_DUMP=ir CUTILE_DUMP_FILTER=add,gemm cargo test ...

`TILE_IR_DUMP=1` is still supported as an alias for `CUTILE_DUMP=ir`.
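The `CUTILE_DUMP_FILTER` rules can be read as the following matching logic. This is a sketch under stated assumptions, not the crate's implementation: `filter_matches` is a hypothetical name, and it assumes exact (not substring) comparison, with a qualified pattern matching either the full kernel path or a trailing `module::function` suffix of it.

```rust
/// Returns true when `kernel_path` (for example "my_module::my_kernel")
/// is selected by a comma-separated `filter` value.
pub fn filter_matches(filter: &str, kernel_path: &str) -> bool {
    // The bare function name is the last `::`-separated segment of the path.
    let function_name = kernel_path.rsplit("::").next().unwrap_or(kernel_path);
    filter
        .split(',')
        .map(str::trim)
        .filter(|pattern| !pattern.is_empty())
        .any(|pattern| {
            if pattern.contains("::") {
                // Qualified pattern (assumption): whole path or a trailing suffix.
                kernel_path == pattern
                    || kernel_path.ends_with(&format!("::{}", pattern))
            } else {
                // Bare name: matches the function in any module.
                pattern == function_name
            }
        })
}
```

With these assumptions, the filter `add,gemm` selects a kernel at `linalg::gemm`, while `other::my_kernel` does not select `my_module::my_kernel`.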