# Chapter 3: High-level Language-Specific Analysis and Transformation

[TOC]

Creating a dialect that closely represents the semantics of an input language
enables analyses, transformations and optimizations in MLIR that require
high-level language information and are generally performed on the language AST.
For example, `clang` has a fairly
[heavy mechanism](https://clang.llvm.org/doxygen/classclang_1_1TreeTransform.html)
for performing template instantiation in C++.

We divide compiler transformations into two categories: local and global. In
this chapter, we focus on how to leverage the Toy Dialect and its high-level
semantics to perform local pattern-match transformations that would be difficult
in LLVM. For this, we use MLIR's
[Generic DAG Rewriter](../../PatternRewriter.md).

There are two methods that can be used to implement pattern-match
transformations:

1.  Imperative: C++ pattern-match and rewrite.
2.  Declarative: rule-based pattern-match and rewrite using table-driven
    [Declarative Rewrite Rules](../../DeclarativeRewrites.md) (DRR).

Note that the use of DRR requires that the operations be defined using ODS, as
described in [Chapter 2](Ch-2.md).

## Optimize Transpose using C++ style pattern-match and rewrite

Let's start with a simple pattern and try to eliminate a sequence of two
transposes that cancel out: `transpose(transpose(X)) -> X`. Here is the
corresponding Toy example:

```toy
def transpose_transpose(x) {
  return transpose(transpose(x));
}
```

This corresponds to the following IR:

```mlir
toy.func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
  %1 = toy.transpose(%0 : tensor<*xf64>) to tensor<*xf64>
  toy.return %1 : tensor<*xf64>
}
```

This is a good example of a transformation that is trivial to match on the Toy
IR but that would be quite hard for LLVM to figure out. For example, today Clang
can't optimize away the temporary array, and the computation with the naive
transpose is expressed with these loops:

```c++
#define N 100
#define M 100

void sink(void *);
void double_transpose(int A[N][M]) {
  int B[M][N];
  for(int i = 0; i < N; ++i) {
    for(int j = 0; j < M; ++j) {
      B[j][i] = A[i][j];
    }
  }
  for(int i = 0; i < N; ++i) {
    for(int j = 0; j < M; ++j) {
      A[i][j] = B[j][i];
    }
  }
  sink(A);
}
```

For a simple C++ approach to pattern-match and rewrite, involving matching a
tree-like pattern in the IR and replacing it with a different set of
operations, we can plug into the MLIR `Canonicalizer` pass by implementing a
`RewritePattern`:

```c++
/// Fold transpose(transpose(x)) -> x
struct SimplifyRedundantTranspose : public mlir::OpRewritePattern<TransposeOp> {
  /// We register this pattern to match every toy.transpose in the IR.
  /// The "benefit" is used by the framework to order the patterns and process
  /// them in order of profitability.
  SimplifyRedundantTranspose(mlir::MLIRContext *context)
      : OpRewritePattern<TransposeOp>(context, /*benefit=*/1) {}

  /// This method attempts to match a pattern and rewrite it. The rewriter
  /// argument is the orchestrator of the sequence of rewrites. It is expected
  /// to interact with it to perform any changes to the IR from here.
  mlir::LogicalResult
  matchAndRewrite(TransposeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Look through the input of the current transpose.
    mlir::Value transposeInput = op.getOperand();
    TransposeOp transposeInputOp = transposeInput.getDefiningOp<TransposeOp>();

    // Input defined by another transpose? If not, no match.
    if (!transposeInputOp)
      return failure();

    // Otherwise, we have a redundant transpose. Use the rewriter.
    rewriter.replaceOp(op, {transposeInputOp.getOperand()});
    return success();
  }
};
```

The implementation of this rewriter is in `ToyCombine.cpp`. The
[canonicalization pass](../../Canonicalization.md) applies transformations
defined by operations in a greedy, iterative manner. To ensure that the
canonicalization pass applies our new transform, we set
[hasCanonicalizer = 1](../../DefiningDialects/Operations.md/#hascanonicalizer)
and register the pattern with the canonicalization framework.
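
For reference, a minimal sketch of what this declaration could look like in the
op's ODS definition (the `Toy_Op` base and `TransposeOp` are the ones defined in
Chapter 2; the elided fields are unchanged):

```tablegen
def TransposeOp : Toy_Op<"transpose"> {
  // ...

  // Declaring hasCanonicalizer generates the declaration of the
  // getCanonicalizationPatterns hook that we implement in C++ below.
  let hasCanonicalizer = 1;
}
```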

```c++
// Register our patterns for rewrite by the Canonicalization framework.
void TransposeOp::getCanonicalizationPatterns(
    RewritePatternSet &results, MLIRContext *context) {
  results.add<SimplifyRedundantTranspose>(context);
}
```

We also need to update our main file, `toyc.cpp`, to add an optimization
pipeline. In MLIR, the optimizations are run through a `PassManager` in a
similar way to LLVM:

```c++
  mlir::PassManager pm(module.getContext());
  pm.addNestedPass<mlir::toy::FuncOp>(mlir::createCanonicalizerPass());
```
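
To actually apply the pipeline, the pass manager is then run over the module. A
minimal sketch of that driver code, assuming `module` is the `mlir::ModuleOp`
produced by the frontend (the error-handling convention here is illustrative):

```c++
  // Apply the canonicalization pipeline; bail out if any pass signals failure.
  if (mlir::failed(pm.run(module)))
    return 1; // illustrative error code, not the exact one used by toyc.cpp
```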

Finally, we can run `toyc-ch3 test/Examples/Toy/Ch3/transpose_transpose.toy
-emit=mlir -opt` and observe our pattern in action:

```mlir
toy.func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
  toy.return %arg0 : tensor<*xf64>
}
```

As expected, we now directly return the function argument, bypassing any
transpose operation. However, one of the transposes still hasn't been
eliminated. That is not ideal! What happened is that our pattern replaced the
last transpose with the function input and left behind the now-dead transpose
that fed it. The Canonicalizer knows to clean up dead operations; however, MLIR
conservatively assumes that operations may have side effects. We can fix this by
adding a new trait, `NoMemoryEffect`, to our `TransposeOp`:

```tablegen
def TransposeOp : Toy_Op<"transpose", [NoMemoryEffect]> {...}
```

Let's retry, now running `toyc-ch3 test/transpose_transpose.toy -emit=mlir -opt`:

```mlir
toy.func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  toy.return %arg0 : tensor<*xf64>
}
```

Perfect! No `transpose` operation is left - the code is optimal.

In the next section, we use DRR for pattern-match optimizations associated with
the Reshape op.

## Optimize Reshapes using DRR

Declarative, rule-based pattern-match and rewrite (DRR) is an operation
DAG-based declarative rewriter that provides a table-based syntax for
pattern-match and rewrite rules:

```tablegen
class Pattern<
    dag sourcePattern, list<dag> resultPatterns,
    list<dag> additionalConstraints = [],
    dag benefitsAdded = (addBenefit 0)>;
```

(The `Pat` class used below is a convenience subclass of `Pattern` for rewrites
that produce a single result pattern.)

A redundant reshape optimization similar to `SimplifyRedundantTranspose` can be
expressed more simply using DRR as follows:

```tablegen
// Reshape(Reshape(x)) = Reshape(x)
def ReshapeReshapeOptPattern : Pat<(ReshapeOp(ReshapeOp $arg)),
                                   (ReshapeOp $arg)>;
```

The automatically generated C++ code corresponding to each of the DRR patterns
can be found under `path/to/BUILD/tools/mlir/examples/toy/Ch3/ToyCombine.inc`.

DRR also provides a method for adding argument constraints when the
transformation is conditional on some properties of the arguments and results.
An example is a transformation that eliminates reshapes when they are redundant,
i.e. when the input and output shapes are identical.

```tablegen
def TypesAreIdentical : Constraint<CPred<"$0.getType() == $1.getType()">>;
def RedundantReshapeOptPattern : Pat<
  (ReshapeOp:$res $arg), (replaceWithValue $arg),
  [(TypesAreIdentical $res, $arg)]>;
```

Some optimizations may require additional transformations on instruction
arguments. This is achieved using `NativeCodeCall`, which allows for more
complex transformations either by calling into a C++ helper function or by
using inline C++. An example of such an optimization is
`FoldConstantReshapeOptPattern`, where we optimize the reshape of a constant
value by reshaping the constant in place and eliminating the reshape operation.

```tablegen
def ReshapeConstant : NativeCodeCall<"$0.reshape(($1.getType()).cast<ShapedType>())">;
def FoldConstantReshapeOptPattern : Pat<
  (ReshapeOp:$res (ConstantOp $arg)),
  (ConstantOp (ReshapeConstant $arg, $res))>;
```
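
As with the C++ pattern above, DRR-generated patterns must still be registered
with the canonicalization framework. A minimal sketch of that hook, assuming
`ReshapeOp` also declares `hasCanonicalizer = 1` and that the generated
`ToyCombine.inc` is included in `ToyCombine.cpp`:

```c++
/// Register the DRR-generated reshape patterns with the canonicalization
/// framework.
void ReshapeOp::getCanonicalizationPatterns(mlir::RewritePatternSet &results,
                                            mlir::MLIRContext *context) {
  results.add<ReshapeReshapeOptPattern, RedundantReshapeOptPattern,
              FoldConstantReshapeOptPattern>(context);
}
```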

We demonstrate these reshape optimizations using the following
`trivial_reshape.toy` program:

```toy
def main() {
  var a<2,1> = [1, 2];
  var b<2,1> = a;
  var c<2,1> = b;
  print(c);
}
```

```mlir
module {
  toy.func @main() {
    %0 = toy.constant dense<[1.000000e+00, 2.000000e+00]> : tensor<2xf64>
    %1 = toy.reshape(%0 : tensor<2xf64>) to tensor<2x1xf64>
    %2 = toy.reshape(%1 : tensor<2x1xf64>) to tensor<2x1xf64>
    %3 = toy.reshape(%2 : tensor<2x1xf64>) to tensor<2x1xf64>
    toy.print %3 : tensor<2x1xf64>
    toy.return
  }
}
```

We can try to run `toyc-ch3 test/Examples/Toy/Ch3/trivial_reshape.toy
-emit=mlir -opt` and observe our patterns in action:

```mlir
module {
  toy.func @main() {
    %0 = toy.constant dense<[[1.000000e+00], [2.000000e+00]]> : tensor<2x1xf64>
    toy.print %0 : tensor<2x1xf64>
    toy.return
  }
}
```

As expected, no reshape operations remain after canonicalization.

Further details on the declarative rewrite method can be found at
[Table-driven Declarative Rewrite Rule (DRR)](../../DeclarativeRewrites.md).

In this chapter, we saw how to use certain core transformations through
always-available hooks. In the [next chapter](Ch-4.md), we will see how to use
generic solutions that scale better through Interfaces.