对LLVM Tutorial Toy前五章做了一个总结, 侧重Codegen部分.
Kaleidoscope支持的语法 #
表达式运算
:
4+5;
5-1;
5*10;
10/5;
函数定义
:
def bar(a) foo(a, 4.0) + bar(31337);
函数调用
:
cos(1.234);
if/then/else
:
extern foo();
extern bar();
def baz(x) if x then foo() else bar();
for
:
extern putchard(char);
def printstar(n)
for i = 1, i < n, 1.0 in
putchard(42); # ascii 42 = '*'
# print 100 '*' characters
printstar(100);
词法分析 #
词法分析很简单. 我这里简单画了下处理的流程图.
![](/docs-pic/llvm/Lexer And Parser.png)
具体的分析请看: Lexer and Parse
抽象语法树 #
生成抽象语法树(AST) #
这块比较简单. 简单介绍下:
Expr的(AST)生成 #
我们对所有的表达式进行集中处理.
/// expression
/// ::= primary binoprhs
///
static std::unique_ptr<ExprAST> ParseExpression() {
auto LHS = ParsePrimary();
if (!LHS)
return nullptr;
return ParseBinOpRHS(0, std::move(LHS));
}
static std::unique_ptr<ExprAST> ParsePrimary() {
switch (CurTok) {
default:
return LogError("unknown token when expecting an expression");
case tok_identifier:
return ParseIdentifierExpr();
case tok_number:
return ParseNumberExpr();
case '(':
return ParseParenExpr();
case tok_if:
return ParseIfExpr();
case tok_for:
return ParseForExpr();
}
}
ParseIdentifierExpr
,ParseNumberExpr
和ParseParenExpr
的实现请参考:
Implementing a Parser and AST.
ParseIfExpr
和ParseForExpr
的实现请参考:
Extending the Language Control Flow.
函数定义和申明的AST生成 #
请参考: Implementing a Parser and AST
Codegen(产生LLVM IR) #
四个基本表达式的codegen #
请参考: code generation to llvm ir
if/then/else #
Kaleidoscpe: if/then/else #
extern foo();
extern bar();
def baz(x) if x then foo() else bar(); // 如果 x != 0, then foo, else bar;
未经优化的LLVM IR #
根据我们上面编写的Kaleidoscope的代码, 可以生成下面的LLVM IR.
declare double @foo()
declare double @bar()
define double @baz(double %x) {
entry:
%ifcond = fcmp one double %x, 0.000000e+00
br i1 %ifcond, label %then, label %else
then: ; preds = %entry
%calltmp = call double @foo()
br label %ifcont
else: ; preds = %entry
%calltmp1 = call double @bar()
br label %ifcont
ifcont: ; preds = %else, %then
%iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] ; 返回值
ret double %iftmp
}
IR解析 #
IR指令 | 例子 | 作用 |
---|---|---|
\(\bold{fcmp}\) | \(\color{red}{fcmp}\ \color{black}{one}\ \color{black}{double}\ \color{black}{x}\ \color{black}{0.0e+00}\ \) | 比较指令, 相等则返回 0, 不相等返回 1 |
\(\bold{call}\) | \(\color{red}{call}\ \color{black}{double}\ \color{black}{@ foo()} \) | 调用函数 |
\(\bold{br}\) | \(\color{red}{br}\ \color{black}{\{label\}}\ \color{black}{\{\%ifcont\}} \) | 直接跳转 |
\(\bold{br}\) | \(\color{red}{br}\enspace\color{black}{i1\enspace \%ifcond, label \enspace then, label\enspace \%else}\) | 条件跳转 |
\(\bold{phi}\) | \(\color{red}{phi}\enspace \color{black}{double [ \% calltmp, \%then], [\%calltmp1, \%else]}\) | 调用${phi}$ 函数 |
\(\bold{ret}\) | \(\color{red}{ret}\enspace \color{black}{double\enspace \% 0}\) | 函数返回 |
Label | means |
---|---|
\(\bold{entry}\) | 函数入口 |
\(\bold{then}\) | 条件分支then |
\(\bold{else}\) | 条件分支else |
\(\bold{ifcont}\) | 结束条件分支 |
if/then/else’s CFG #
使用opt -analyze -view-cfg if.bc
生成的CFG.
if/then/else’s dominator (判断哪里需要插入phi函数) #
这里只是对插入$\phi$函数时机的一个分析. (不想了解的壳跳过).
根据我前面翻译的文章:
Dominator来分析我们if/then/else
的控制流.
我们将: entry
视为节点
\(\bold{1}\)
。 then
视为节点
\(\bold{2}\)
。
else
视为节点
\(\bold{3}\)
。 ifcont
视为节点
\(\bold{4}\)
。
1 | dom | \(\color{Gray}{1}\) | \(\color{Red}{2}\) | \(\color{Red}{3}\) | 4 |
---|---|---|---|---|---|
2 | dom | \(\color{Gray}{2}\) | |||
3 | dom | \(\color{Gray}{3}\) | |||
4 | dom | \(\color{Gray}{4}\) |
节点 | 支配分析 |
---|---|
1 | 支配: 1, 2, 3, 4. 严格支配: 2, 3, 4. 直接支配: 2, 3. 支配边界: set(). |
2 | 支配: 2. 严格支配: 无. 直接支配: 无. 支配边界: set(4). |
3 | 支配: 3. 严格支配: 无. 直接支配: 无. 支配边界: set(4). |
4 | 支配: 4. 严格支配: 无. 直接支配: 无. 支配边界: set(). |
根据 SSA理论, 我们可以在 4 节点处插入 \(\bold{phi}\) 函数.
Builder创建IR指令 #
用Builder来产生每条指令的IR.
下面是一些if/then/else
会涉及到的Builder创建指令的操作
。
创建cmp指令 #
CondV = Builder.CreateFCmpONE(
CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond");
创建条件br指令 #
Builder.CreateCondBr(CondV, ThenBB, ElseBB);
创建直接br指令 #
Builder.CreateBr(MergeBB);
创建phi指令 #
PHINode *PN =
Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp");
PN->addIncoming(ThenV, ThenBB);
PN->addIncoming(ElseV, ElseBB);
return PN;
创建Block #
- 构造block时直接在"TheFunction"后面添加该block.
BasicBlock *ThenBB =
BasicBlock::Create(TheContext, "then", TheFunction);
-
单纯创建Block, 当未添加到函数中.
BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else");
IRBuilder中插入Block #
控制流的独特性之处在于, 它有自己的内部的作用域(block). 所以我们在编写codegen代码时, 会调用内部block的codegen. 而不会直接去处理它.(内部block的codegen是它自己处理的, 不归我们的 ifExpr.Codegen管)
所以我们会这样做:
Builder.SetInsertPoint(ThenBB); // 将builder此时的插入点设置为ThenBB.
Value *ThenV = Then->codegen(); // 调用 then block的codegen
将未添加的Block添加到函数中 #
在"TheFunction"后面添加"merge block", 因为在我们创建时, 只是单纯的创建的block, 还未添加到函数中.
TheFunction->getBasicBlockList().push_back(MergeBB); // TheFunction中添加Block
Builder.SetInsertPoint(MergeBB); // IRBuilder中插入BB
Builder流程图 #
生成if/then/else 的IR可以分为四个部分生成:
- cond
- 创建cmp IR. 目的是根据条件判断返回bool值.
- 创建条件跳转IR. 目的是根据返回的bool值做分支跳转.
- then
- then.codegen(). 目的是进行then block的codegen.
- 创建直接跳转IR.目的是跳转到ifcond处.
- else
- else.codegen(). 目的是进行else block的codegen.
- 创建直接跳转IR,目的是跳转到ifcond处.
- ifcond(分支合并后):
- 创建phi函数调用IR. 目的是判断我们是从哪个分支跳转来的.
注意, 这里不会涉及ret
指令的codegen ,我们在ifExprAST
中最后做的仅仅是将$\bold{\phi}$函数的返回值返回. ret
指令的codegen应该在函数
的codegen时做.
(在上文例子中, ret
指令的codegen 是baz函数生成的)
graph TD Start("If Codegen") builderCreateCmp["Builder.CreateCmpOne"] builderCreateCond["Builder.CreateCondBr"] Start --> builderCreateCmp builderCreateCmp --> builderCreateCond subgraph Then builderSetInsertThenBB["Builder.SetInsertPoint(ThenBB);"] ThenCodegen["Then->codegen();"] builderCreateBr1["Builder.CreateBr"] builderSetInsertThenBB --> ThenCodegen ThenCodegen --> builderCreateBr1 end builderCreateCond --> builderSetInsertThenBB subgraph Else builderSetInsertElseBB["Builder.SetInsertPoint(ElseBB);"] ElseCodegen["Else->codegen();"] builderCreateBr2["Builder.CreateBr"] builderSetInsertElseBB --> ElseCodegen ElseCodegen --> builderCreateBr2 end builderCreateBr1 --> builderSetInsertElseBB subgraph Merge builderSetInsertIfContBB["builder.SetInsertPoint(MergeBB);"] builderCreatePHI["Builder.CreatePHI"] builderSetInsertIfContBB --> builderCreatePHI end builderCreateBr2 --> builderSetInsertIfContBB builderCreatePHI --> End("End Codegen") style builderSetInsertThenBB fill:#ffcc66 style builderSetInsertElseBB fill:#ffcc66 style builderSetInsertIfContBB fill:#ffcc66 style ThenCodegen fill:#ff66cc style ElseCodegen fill:#ff66cc style Start fill:#ff9966 style End fill:#ff9966
Loop #
Kaleidoscope: for #
extern putchard(char);
def printstar(n)
for i = 1, i < n, 1.0 in
putchard(42); # ascii 42 = '*'
# print 100 '*' characters
printstar(100);
未经优化的LLVM IR #
declare double @putchard(double)
define double @printstar(double %n) {
entry:
; initial value = 1.0 (inlined into phi)
br label %loop
loop: ; preds = %loop, %entry
%i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
; body
%calltmp = call double @putchard(double 4.200000e+01)
; increment
%nextvar = fadd double %i, 1.000000e+00
; termination test
%cmptmp = fcmp ult double %i, %n
%booltmp = uitofp i1 %cmptmp to double
%loopcond = fcmp one double %booltmp, 0.000000e+00
br i1 %loopcond, label %loop, label %afterloop
afterloop: ; preds = %loop
; loop always returns 0.0
ret double 0.000000e+00
}
有三个Label
, 意味者三个BasicBlock
: entry, loop, afterloop.
有四个主要的Expression
: Start, Cond, Body, Increment . 当然还有afterloop, 它总是ret 0.0
.
IR解析 #
与if/then/else
的IR指令相比较, 发现多了一条
\(\bold{uitofp}\)
指令.
IR指令 | 例子 | 作用 |
---|---|---|
\(\bold{fcmp}\) | \(\color{red}{fcmp}\enspace \color{black}{one\enspace double\enspace \%booltmp, 0.0e+00}\) | 比较指令, 相等则返回 0, 不相等返回 1 |
\(\bold{call}\) | \(\color{red}{call}\enspace \color{black}{double\enspace @ putchard(double\enspace4.2000e+01)}\) | 调用函数 |
\(\bold{br}\) | \(\color{red}{br}\enspace \color{black}{label\enspace \%loop}\) | 直接跳转 |
\(\bold{br}\) | \(\color{red}{br}\enspace \color{black}{i1\enspace \%loopcond\enspace label\enspace \%loop, \enspace label\enspace \%afterloop}\) | 条件跳转 |
\(\bold{phi}\) | \(\color{red}{phi}\enspace \color{black}{double [1.0e+0, \%entry], [\%nextvar, \%loop]}\) | 调用 \(\bold{phi}\) 函数 |
\(\bold{uitofp}\) | \(\color{red}{uitofp}\enspace \color{black}{i1\ \% cmptmp\enspace to\enspace double}\) | 将bool(1/0)转换为double(1.0/0.0) |
\(\bold{ret}\) | \(\color{red}{ret}\enspace \color{black}{double\ \% 0}\) | 函数返回 |
Label | means |
---|---|
\(\bold{entry}\) | 函数入口 |
\(\bold{loop}\) | loop循环入口 |
\(\bold{afterloop}\) | loop循环结束 |
for’s CFG #
使用opt -analyze -view-cfg loop.bc
生成的CFG.
Builder流程图 #
graph TB Start("Loop Codegen") --> StartCodegen["Start->codegen();"] StartCodegen --> builderCreateBr["Builder.CreateBr"] subgraph Loop builderSetInsertLoopBB["Builder.SetInsertPoint(LoopBB)"] builderCreatePHI["Builder.CreatePHI"] BodyCodegen["Body->codegen();"] builderSetInsertLoopBB --> builderCreatePHI builderCreatePHI --> BodyCodegen BodyCodegen --> ifStep{"Step != NULL"} ifStep --Y--> StepCodegen["Step->codegen();"] ifStep --N--> SetStep["Set Step to 1.0"] StepCodegen --> builderCreateFAdd["Builder.CreateFAdd"] SetStep --> builderCreateFAdd builderCreateFAdd --> EndCodegen["End->codegen();"] EndCodegen --> builderCreateFCmp["Builder.CreateFCmpOne"] builderCreateFCmp --> builderCreateCondBr["Builder.CreateCondBr"] end builderCreateBr --> builderSetInsertLoopBB subgraph afterloop builderSetInsertAfterBB["Builder.SetInsertPoint(AfterBB)"] end builderCreateCondBr --> builderSetInsertAfterBB builderSetInsertAfterBB --> End["End Codegen"] style builderSetInsertLoopBB fill:#ffcc66 style builderSetInsertAfterBB fill:#ffcc66 style StartCodegen fill:#ff66cc style BodyCodegen fill:#ff66cc style StepCodegen fill:#ff66cc style EndCodegen fill:#ff66cc style Start fill:#ff9966 style End fill:#ff9966
JIT和优化 #
参考: Adding JIT and Optimizer Support
graph TB subgraph CALL funcall("foo(1);") end subgraph expr nodeExpr("1+2;") end jitAddModule["JIT中 添加此Module"] jitRemoveModule["JIT中 删除此Module"] anoCodegen["执行 匿名函数Codegen"] AnonymousCall["匿名函数包装"] optimizeFunc["执行 Pass优化"] nodeExpr --> AnonymousCall funcall --> AnonymousCall AnonymousCall --> anoCodegen anoCodegen --> jitAddModule jitAddModule --> optimizeFunc optimizeFunc --> exec["jit运行此表达式, 并计算值"] exec --> jitRemoveModule subgraph func definition nodeFoo("def foo(x) x+1;") funcCodegen["执行 函数定义Codegen"] nodeFoo --> funcCodegen end funcCodegen --> jitAddFuncModule["jit中 添加此Module"] jitAddFuncModule --> initPassManager["执行 pass优化"] subgraph extern nodeExtern("extern sin(x);") protoCodegen["执行 函数申明Codegen"] nodeExtern --> protoCodegen end style jitAddModule fill:#f9f style jitRemoveModule fill:#f9f style jitAddFuncModule fill:#f9f style optimizeFunc fill:#ccff66 style initPassManager fill:#ccff66 style exec fill:#ff3300
Driver #
/// top ::= definition | external | expression | ';'
static void MainLoop() {
while (true) {
fprintf(stderr, "ready> ");
switch (CurTok) {
case tok_eof:
return;
case ';': // ignore top-level semicolons.
getNextToken();
break;
case tok_def:
HandleDefinition();
break;
case tok_extern:
HandleExtern();
break;
default:
HandleTopLevelExpression();
break;
}
}
}