Kaleidoscope: 前五章总结

对LLVM Tutorial Toy前五章做了一个总结, 侧重Codegen部分.

Kaleidoscope支持的语法 #

表达式运算:

4+5;
5-1;
5*10;
10/5;

函数定义:

def bar(a) foo(a, 4.0) + bar(31337);

函数调用:

cos(1.234);

if/then/else:

extern foo();
extern bar();
def baz(x) if x then foo() else bar();

for:

extern putchard(char);
def printstar(n)
  for i = 1, i < n, 1.0 in
    putchard(42);  # ascii 42 = '*'

# print 100 '*' characters
printstar(100);

词法分析 #

词法分析很简单. 我这里简单画了下处理的流程图.

![](/docs-pic/llvm/Lexer And Parser.png)

具体的分析请看: Lexer and Parse

抽象语法树 #

生成抽象语法树(AST) #

这块比较简单. 简单介绍下:

Expr的(AST)生成 #

我们对所有的表达式进行集中处理.

/// expression
///   ::= primary binoprhs
///
static std::unique_ptr<ExprAST> ParseExpression() {
  auto LHS = ParsePrimary();
  if (!LHS)
    return nullptr;

  return ParseBinOpRHS(0, std::move(LHS));
}

static std::unique_ptr<ExprAST> ParsePrimary() {
  switch (CurTok) {
  default:
    return LogError("unknown token when expecting an expression");
  case tok_identifier:
    return ParseIdentifierExpr();
  case tok_number:
    return ParseNumberExpr();
  case '(':
    return ParseParenExpr();
  case tok_if:
    return ParseIfExpr();
  case tok_for:
    return ParseForExpr();
  }
}

ParseIdentifierExpr,ParseNumberExprParseParenExpr的实现请参考: Implementing a Parser and AST.

ParseIfExprParseForExpr的实现请参考: Extending the Language Control Flow.

函数定义和申明的AST生成 #

请参考: Implementing a Parser and AST

Codegen(产生LLVM IR) #

四个基本表达式的codegen #

请参考: code generation to llvm ir

if/then/else #

Kaleidoscpe: if/then/else #

extern foo();
extern bar();
def baz(x) if x then foo() else bar();  // 如果 x != 0, then foo, else bar;

未经优化的LLVM IR #

根据我们上面编写的Kaleidoscope的代码, 可以生成下面的LLVM IR.

declare double @foo()

declare double @bar()

define double @baz(double %x) {
entry:
  %ifcond = fcmp one double %x, 0.000000e+00
  br i1 %ifcond, label %then, label %else

then:       ; preds = %entry
  %calltmp = call double @foo()
  br label %ifcont

else:       ; preds = %entry
  %calltmp1 = call double @bar()
  br label %ifcont

ifcont:     ; preds = %else, %then
  %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] ; 返回值
  ret double %iftmp
}

IR解析 #

IR指令 例子 作用
\(\bold{fcmp}\) \(\color{red}{fcmp}\ \color{black}{one}\ \color{black}{double}\ \color{black}{x}\ \color{black}{0.0e+00}\ \) 比较指令, 相等则返回 0, 不相等返回 1
\(\bold{call}\) \(\color{red}{call}\ \color{black}{double}\ \color{black}{@ foo()} \) 调用函数
\(\bold{br}\) \(\color{red}{br}\ \color{black}{\{label\}}\ \color{black}{\{\%ifcont\}} \) 直接跳转
\(\bold{br}\) \(\color{red}{br}\enspace\color{black}{i1\enspace \%ifcond, label \enspace then, label\enspace \%else}\) 条件跳转
\(\bold{phi}\) \(\color{red}{phi}\enspace \color{black}{double [ \% calltmp, \%then], [\%calltmp1, \%else]}\) 调用${phi}$ 函数
\(\bold{ret}\) \(\color{red}{ret}\enspace \color{black}{double\enspace \% 0}\) 函数返回
Label means
\(\bold{entry}\) 函数入口
\(\bold{then}\) 条件分支then
\(\bold{else}\) 条件分支else
\(\bold{ifcont}\) 结束条件分支

if/then/else’s CFG #

使用opt -analyze -view-cfg if.bc生成的CFG.

if/then/else’s dominator (判断哪里需要插入phi函数) #

这里只是对插入$\phi$函数时机的一个分析. (不想了解的壳跳过).

根据我前面翻译的文章: Dominator来分析我们if/then/else的控制流.

我们将: entry 视为节点 \(\bold{1}\) then视为节点 \(\bold{2}\)

else 视为节点 \(\bold{3}\) ifcont视为节点 \(\bold{4}\)

1 dom \(\color{Gray}{1}\) \(\color{Red}{2}\) \(\color{Red}{3}\) 4
2 dom \(\color{Gray}{2}\)
3 dom \(\color{Gray}{3}\)
4 dom \(\color{Gray}{4}\)
节点 支配分析
1 支配: 1, 2, 3, 4. 严格支配: 2, 3, 4. 直接支配: 2, 3. 支配边界: set().
2 支配: 2. 严格支配: 无. 直接支配: 无. 支配边界: set(4).
3 支配: 3. 严格支配: 无. 直接支配: 无. 支配边界: set(4).
4 支配: 4. 严格支配: 无. 直接支配: 无. 支配边界: set().

根据 SSA理论, 我们可以在 4 节点处插入 \(\bold{phi}\) 函数.

Builder创建IR指令 #

用Builder来产生每条指令的IR.

下面是一些if/then/else会涉及到的Builder创建指令的操作

创建cmp指令 #

CondV = Builder.CreateFCmpONE(
      CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond");

创建条件br指令 #

Builder.CreateCondBr(CondV, ThenBB, ElseBB);

创建直接br指令 #

Builder.CreateBr(MergeBB);

创建phi指令 #

  PHINode *PN =
    Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp");

  PN->addIncoming(ThenV, ThenBB);
  PN->addIncoming(ElseV, ElseBB);
  return PN;

创建Block #

  1. 构造block时直接在"TheFunction"后面添加该block.
BasicBlock *ThenBB =
    BasicBlock::Create(TheContext, "then", TheFunction);
  1. 单纯创建Block, 当未添加到函数中.

    BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else");
    

IRBuilder中插入Block #

控制流的独特性之处在于, 它有自己的内部的作用域(block). 所以我们在编写codegen代码时, 会调用内部block的codegen. 而不会直接去处理它.(内部block的codegen是它自己处理的, 不归我们的 ifExpr.Codegen管)

所以我们会这样做:

Builder.SetInsertPoint(ThenBB);          // 将builder此时的插入点设置为ThenBB.
Value *ThenV = Then->codegen();          // 调用 then block的codegen

将未添加的Block添加到函数中 #

在"TheFunction"后面添加"merge block", 因为在我们创建时, 只是单纯的创建的block, 还未添加到函数中.

  TheFunction->getBasicBlockList().push_back(MergeBB); // TheFunction中添加Block
  Builder.SetInsertPoint(MergeBB); // IRBuilder中插入BB

Builder流程图 #

生成if/then/else 的IR可以分为四个部分生成:

  • cond
    1. 创建cmp IR. 目的是根据条件判断返回bool值.
    2. 创建条件跳转IR. 目的是根据返回的bool值做分支跳转.
  • then
    1. then.codegen(). 目的是进行then block的codegen.
    2. 创建直接跳转IR.目的是跳转到ifcond处.
  • else
    1. else.codegen(). 目的是进行else block的codegen.
    2. 创建直接跳转IR,目的是跳转到ifcond处.
  • ifcond(分支合并后):
    1. 创建phi函数调用IR. 目的是判断我们是从哪个分支跳转来的.

注意, 这里不会涉及ret指令的codegen ,我们在ifExprAST中最后做的仅仅是将$\bold{\phi}$函数的返回值返回. ret指令的codegen应该在函数codegen时做.

(在上文例子中, ret指令的codegenbaz函数生成的)

graph TD Start("If Codegen") builderCreateCmp["Builder.CreateCmpOne"] builderCreateCond["Builder.CreateCondBr"] Start --> builderCreateCmp builderCreateCmp --> builderCreateCond subgraph Then builderSetInsertThenBB["Builder.SetInsertPoint(ThenBB);"] ThenCodegen["Then->codegen();"] builderCreateBr1["Builder.CreateBr"] builderSetInsertThenBB --> ThenCodegen ThenCodegen --> builderCreateBr1 end builderCreateCond --> builderSetInsertThenBB subgraph Else builderSetInsertElseBB["Builder.SetInsertPoint(ElseBB);"] ElseCodegen["Else->codegen();"] builderCreateBr2["Builder.CreateBr"] builderSetInsertElseBB --> ElseCodegen ElseCodegen --> builderCreateBr2 end builderCreateBr1 --> builderSetInsertElseBB subgraph Merge builderSetInsertIfContBB["builder.SetInsertPoint(MergeBB);"] builderCreatePHI["Builder.CreatePHI"] builderSetInsertIfContBB --> builderCreatePHI end builderCreateBr2 --> builderSetInsertIfContBB builderCreatePHI --> End("End Codegen") style builderSetInsertThenBB fill:#ffcc66 style builderSetInsertElseBB fill:#ffcc66 style builderSetInsertIfContBB fill:#ffcc66 style ThenCodegen fill:#ff66cc style ElseCodegen fill:#ff66cc style Start fill:#ff9966 style End fill:#ff9966

Loop #

Kaleidoscope: for #

extern putchard(char);
def printstar(n)
  for i = 1, i < n, 1.0 in
    putchard(42);  # ascii 42 = '*'

# print 100 '*' characters
printstar(100);

未经优化的LLVM IR #

declare double @putchard(double)

define double @printstar(double %n) {
entry:
  ; initial value = 1.0 (inlined into phi)
  br label %loop

loop:       ; preds = %loop, %entry
  %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
  ; body
  %calltmp = call double @putchard(double 4.200000e+01)
  ; increment
  %nextvar = fadd double %i, 1.000000e+00

  ; termination test
  %cmptmp = fcmp ult double %i, %n
  %booltmp = uitofp i1 %cmptmp to double
  %loopcond = fcmp one double %booltmp, 0.000000e+00
  br i1 %loopcond, label %loop, label %afterloop

afterloop:      ; preds = %loop
  ; loop always returns 0.0
  ret double 0.000000e+00
}

有三个Label, 意味者三个BasicBlock: entry, loop, afterloop.

有四个主要的Expression: Start, Cond, Body, Increment . 当然还有afterloop, 它总是ret 0.0.

IR解析 #

if/then/elseIR指令相比较, 发现多了一条 \(\bold{uitofp}\) 指令.

IR指令 例子 作用
\(\bold{fcmp}\) \(\color{red}{fcmp}\enspace \color{black}{one\enspace double\enspace \%booltmp, 0.0e+00}\) 比较指令, 相等则返回 0, 不相等返回 1
\(\bold{call}\) \(\color{red}{call}\enspace \color{black}{double\enspace @ putchard(double\enspace4.2000e+01)}\) 调用函数
\(\bold{br}\) \(\color{red}{br}\enspace \color{black}{label\enspace \%loop}\) 直接跳转
\(\bold{br}\) \(\color{red}{br}\enspace \color{black}{i1\enspace \%loopcond\enspace label\enspace \%loop, \enspace label\enspace \%afterloop}\) 条件跳转
\(\bold{phi}\) \(\color{red}{phi}\enspace \color{black}{double [1.0e+0, \%entry], [\%nextvar, \%loop]}\) 调用 \(\bold{phi}\) 函数
\(\bold{uitofp}\) \(\color{red}{uitofp}\enspace \color{black}{i1\ \% cmptmp\enspace to\enspace double}\) 将bool(1/0)转换为double(1.0/0.0)
\(\bold{ret}\) \(\color{red}{ret}\enspace \color{black}{double\ \% 0}\) 函数返回
Label means
\(\bold{entry}\) 函数入口
\(\bold{loop}\) loop循环入口
\(\bold{afterloop}\) loop循环结束

for’s CFG #

使用opt -analyze -view-cfg loop.bc生成的CFG.

loop

Builder流程图 #

graph TB Start("Loop Codegen") --> StartCodegen["Start->codegen();"] StartCodegen --> builderCreateBr["Builder.CreateBr"] subgraph Loop builderSetInsertLoopBB["Builder.SetInsertPoint(LoopBB)"] builderCreatePHI["Builder.CreatePHI"] BodyCodegen["Body->codegen();"] builderSetInsertLoopBB --> builderCreatePHI builderCreatePHI --> BodyCodegen BodyCodegen --> ifStep{"Step != NULL"} ifStep --Y--> StepCodegen["Step->codegen();"] ifStep --N--> SetStep["Set Step to 1.0"] StepCodegen --> builderCreateFAdd["Builder.CreateFAdd"] SetStep --> builderCreateFAdd builderCreateFAdd --> EndCodegen["End->codegen();"] EndCodegen --> builderCreateFCmp["Builder.CreateFCmpOne"] builderCreateFCmp --> builderCreateCondBr["Builder.CreateCondBr"] end builderCreateBr --> builderSetInsertLoopBB subgraph afterloop builderSetInsertAfterBB["Builder.SetInsertPoint(AfterBB)"] end builderCreateCondBr --> builderSetInsertAfterBB builderSetInsertAfterBB --> End["End Codegen"] style builderSetInsertLoopBB fill:#ffcc66 style builderSetInsertAfterBB fill:#ffcc66 style StartCodegen fill:#ff66cc style BodyCodegen fill:#ff66cc style StepCodegen fill:#ff66cc style EndCodegen fill:#ff66cc style Start fill:#ff9966 style End fill:#ff9966

JIT和优化 #

参考: Adding JIT and Optimizer Support

graph TB subgraph CALL funcall("foo(1);") end subgraph expr nodeExpr("1+2;") end jitAddModule["JIT中 添加此Module"] jitRemoveModule["JIT中 删除此Module"] anoCodegen["执行 匿名函数Codegen"] AnonymousCall["匿名函数包装"] optimizeFunc["执行 Pass优化"] nodeExpr --> AnonymousCall funcall --> AnonymousCall AnonymousCall --> anoCodegen anoCodegen --> jitAddModule jitAddModule --> optimizeFunc optimizeFunc --> exec["jit运行此表达式, 并计算值"] exec --> jitRemoveModule subgraph func definition nodeFoo("def foo(x) x+1;") funcCodegen["执行 函数定义Codegen"] nodeFoo --> funcCodegen end funcCodegen --> jitAddFuncModule["jit中 添加此Module"] jitAddFuncModule --> initPassManager["执行 pass优化"] subgraph extern nodeExtern("extern sin(x);") protoCodegen["执行 函数申明Codegen"] nodeExtern --> protoCodegen end style jitAddModule fill:#f9f style jitRemoveModule fill:#f9f style jitAddFuncModule fill:#f9f style optimizeFunc fill:#ccff66 style initPassManager fill:#ccff66 style exec fill:#ff3300

Driver #

/// top ::= definition | external | expression | ';'
static void MainLoop() {
  while (true) {
    fprintf(stderr, "ready> ");
    switch (CurTok) {
    case tok_eof:
      return;
    case ';': // ignore top-level semicolons.
      getNextToken();
      break;
    case tok_def:
      HandleDefinition();
      break;
    case tok_extern:
      HandleExtern();
      break;
    default:
      HandleTopLevelExpression();
      break;
    }
  }
}