## Table of Contents

- 1. Lecture 2, CPSC 539 Formal Reasoning about Compilers, 4 January 2019
- 2. Lecture 3, CPSC 539 Formal Reasoning about Compilers, 7 January 2019
- 3. Lecture 4, CPSC 539 Formal Reasoning about Compilers, 9 January 2019
- 4. Lecture 5, CPSC 539 Formal Reasoning about Compilers, 11 January 2019
- 5. Lecture 6, CPSC 539 Formal Reasoning about Compilers, 14 January 2019
- 6. Lecture 7, CPSC 539 Formal Reasoning about Compilers, 16 January 2019
- 7. Lecture 8, CPSC 539 Formal Reasoning about Compilers, 18 January 2019
- 8. Lecture 9, CPSC 539 Formal Reasoning about Compilers, 21 January 2019

## 1 Lecture 2, CPSC 539 Formal Reasoning about Compilers, 4 January 2019

### 1.1 Intro

#### 1.1.1 Goal is formal reasoning about compilers

#### 1.1.2 A compiler is a translation between languages. Some examples:

- GCC (C -> x86)
- clang (C -> LLVM)
- LLVM (LLVM IR -> x86)
- Babel between JS versions
- Minifiers

### 1.2 What is a language? How to model?

- Syntax
- Semantics
- Paradigms: functional/imperative
- Calculi to model into: Lambda Calculus, etc.
- We model a language into the calculus using Sets, Logic
- Type theory?

#### 1.2.1 A language is:

- A collection of expressions (syntax)
- Some operations on those expressions (semantics)
- Shared properties

  - Type system gives you static guarantees
  - The runtime provides dynamic guarantees

- Defined inductively, in our case in BNF form

The example given is a language over the natural numbers and the booleans, with addition, subtraction, and if statements.

How is an expression different from a program? Given just a BNF, we can say that `1 + 2` is an expression, but since we don't know what a program is (yet), we can't say it's a program.

- How do we get from expressions to programs?

What makes a program? It has meaning; it has some evaluation interpretation. So let us define some function, `eval: e -> v`, from expressions to values. We can define eval in terms of Denotational or Operational Semantics.

- Denotational Semantics

Interpret your expressions as math. Set theory, category theory, first order logic, domain theory, etc.

It's just math (set theory), so we can use all the mathematical tools at our disposal, but it also means we have to model everything in that math, which can be a huge lift. For example, let us say that the denotation of n in e is a natural number; then the arithmetic operations denote the corresponding operations on the naturals, which is easy to do. But what about modeling a heap?

- Operational Semantics

Interpreting the raw symbols on the page. Define a reduction relation from individual bits of syntax into simpler bits of syntax, `e -> e`.

In the case of our natural and boolean language, we define relations between, for instance,

$$\mathsf{if}\ \mathsf{true}\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 \to e_1$$

$$\mathsf{if}\ \mathsf{false}\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 \to e_2$$

Note that `->` is a relation between expressions. In the if case, we say that if is an *eliminator* applied to a boolean value.

But what about `e_1 + e_2`? If we define the natural numbers in Peano-like `(0, add1 0, add1 add1 0)` fashion, then we can define addition recursively like so:

$$0 + e_2 \to e_2$$

$$(\mathsf{add1}\ e_1) + e_2 \to e_1 + (\mathsf{add1}\ e_2)$$

Let us interpret `(add1 0) + (add1 (add1 0))`. We can apply the second addition rule to get

$$0 + (\mathsf{add1}\ (\mathsf{add1}\ (\mathsf{add1}\ 0)))$$

which reduces to

$$\mathsf{add1}\ (\mathsf{add1}\ (\mathsf{add1}\ 0))$$

We define a Conversion Relation, `e ->* e`, that uses the reduction relation:

$$\frac{}{e \to^* e}\ \text{refl} \qquad \frac{e_1 \to e_2 \qquad e_2 \to^* e_3}{e_1 \to^* e_3}\ \text{trans}$$

Given a nested expression:

$$\mathsf{if}\ (\mathsf{if}\ \mathsf{true}\ \mathsf{then}\ \mathsf{true}\ \mathsf{else}\ \mathsf{false})\ \mathsf{then}\ 1\ \mathsf{else}\ 2$$

Let's try using the trans rule. We can't make progress, because we only have two rules for if, and they require values, not expressions. So we need to define congruence rules:

$$\frac{e_1 \to^* e_1' \qquad e_2 \to^* e_2'}{e_1 + e_2 \to^* e_1' + e_2'}\ \text{cong-add}$$

$$\frac{e \to^* e'}{\mathsf{if}\ e\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 \to^* \mathsf{if}\ e'\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2}\ \text{cong-if}$$

We construct a derivation tree for `1 + 2` using trans and the reduction relations for addition.

We can define an equivalence relation `≡`:

$$\frac{e \to^* e'}{e \equiv e'} \qquad \frac{e' \equiv e}{e \equiv e'}$$

Now we define a function `eval: e -> v`:

$$\mathrm{eval}(e) = v \quad \text{if } e \to^* v$$

where `v ::= nat | bool`.
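The reduction, congruence, and eval definitions above can be sketched as a small interpreter. This is a minimal sketch, not the lecture's own code: the tuple encoding of expressions and the names `step`, `evaluate`, and `nat` are assumptions made for illustration, and the congruence rules are folded into `step` so that it always reduces the leftmost redex.

```python
# Hypothetical encoding (not from the lecture):
#   "zero" | ("add1", e) | True | False | ("add", e1, e2) | ("if", e, e1, e2)

def is_value(e):
    """v ::= nat | bool, with nats in Peano form."""
    if isinstance(e, bool) or e == "zero":
        return True
    return isinstance(e, tuple) and e[0] == "add1" and is_value(e[1])

def step(e):
    """One reduction step, descending into subterms when no rule applies
    at the top. Returns None when e cannot step."""
    if is_value(e) or not isinstance(e, tuple):
        return None
    tag = e[0]
    if tag == "add1":
        inner = step(e[1])
        return None if inner is None else ("add1", inner)
    if tag == "if":
        _, cond, then_e, else_e = e
        if cond is True:
            return then_e                      # if true then e1 else e2 -> e1
        if cond is False:
            return else_e                      # if false then e1 else e2 -> e2
        cond2 = step(cond)                     # congruence: reduce the test
        return None if cond2 is None else ("if", cond2, then_e, else_e)
    if tag == "add":
        _, a, b = e
        if a == "zero":
            return b                           # 0 + e2 -> e2
        if isinstance(a, tuple) and a[0] == "add1":
            return ("add", a[1], ("add1", b))  # (add1 e1) + e2 -> e1 + (add1 e2)
        a2 = step(a)                           # congruence: reduce the left operand
        return None if a2 is None else ("add", a2, b)
    return None

def evaluate(e):
    """eval(e) = v if e ->* v; raises if e gets stuck on a non-value."""
    while True:
        nxt = step(e)
        if nxt is None:
            break
        e = nxt
    if not is_value(e):
        raise ValueError(f"stuck: {e!r}")
    return e

def nat(n):
    """Build the Peano numeral for a Python int."""
    e = "zero"
    for _ in range(n):
        e = ("add1", e)
    return e
```

For instance, `evaluate(("add", nat(1), nat(2)))` follows the same two steps as the worked example above, and an ill-formed term such as `("add", True, nat(1))` raises because no rule applies, which is the partiality of eval discussed in later lectures.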

- Denotational Semantics

## 2 Lecture 3, CPSC 539 Formal Reasoning about Compilers, 7 January 2019

### 2.1 Retrospective

- Small Step Semantics
- Induction
- Quick quiz: add pair construction and destruction (first, second) to the language from last lecture

  - Syntax
  - Reduction Rules `->`
  - Conversion Rules `->*`

Overview: the language of booleans and natural numbers, with if, addition, and subtraction. We want to add pairs. First, syntax: let us define `(e_1, e_2)` to build pairs, then `first e` and `second e` to deconstruct pairs.

Now to define the reduction relation, `e -> e'`. We have two reduction rules:

$$\mathsf{first}\ (e_1, e_2) \to e_1 \qquad \mathsf{second}\ (e_1, e_2) \to e_2$$

Now to define congruence rules:

$$\frac{e_1 \to^* e_1' \qquad e_2 \to^* e_2'}{(e_1, e_2) \to^* (e_1', e_2')} \qquad \frac{e \to^* e'}{\mathsf{first}\ e \to^* \mathsf{first}\ e'} \qquad \frac{e \to^* e'}{\mathsf{second}\ e \to^* \mathsf{second}\ e'}$$

Lots of discussion about why we need congruence rules. Mostly because we need a way to descend into terms before we can reach those forms that match reduction rules.

### 2.2 Review of Inductively Defined Judgements

A judgement:

$$\frac{\text{premises } p_1 \ldots p_n \text{ hold}}{\text{judgement } j \text{ holds}}\ \text{jHold}$$

Everything we're talking about in this class is about judgements. We have judgement rules for syntax, reduction, and congruence.

So about syntax:

We have the axiomatic, or "boring", judgement rules, but let's say that `e ::= e + e` is shorthand for

$$\frac{e_1, e_2 \in \mathit{expr}}{e_1 + e_2 \in \mathit{expr}}\ \text{addition}$$

This is important for proofs. Let us go back to our definition of the natural numbers.

We defined the natural numbers as:

$$\frac{}{0 \in \mathit{Nat}} \qquad \frac{s \in \mathit{Nat}}{\mathsf{add1}\ s \in \mathit{Nat}}$$

Suppose we want to prove some theorem. If we have an inductive definition, we can do a proof by induction over the (inductive) structure of the definition.

Let us say we are trying to prove some property `P` over the `Nat`s. First we need to show that `P` holds for `0`. Then we need to show that if `s ∈ Nat` and `P` holds for `s`, then `P` holds for `add1 s`. Having shown both, `P` holds for all `Nat`.

For those who are familiar with functional programming, `foldr` can be seen as induction on pairs:

Let us define pairs:

$$\frac{}{\mathsf{nil} \in \mathit{Pair}} \qquad \frac{e_2 \in \mathit{Pair}}{\mathsf{cons}\ e_1\ e_2 \in \mathit{Pair}}$$

So now: `fold base_case inductive_case over p`, where `p ∈ Pair`.
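To make the correspondence between `fold` and induction concrete, here is a small sketch. The encoding of `nil` as `None` and `cons` as a Python tuple is an assumption for illustration; `fold` takes one argument per rule of the `Pair` judgement, exactly as an inductive proof has one case per rule.

```python
NIL = None  # hypothetical encoding of nil

def cons(head, tail):
    """cons e_1 e_2, where the tail must itself be a Pair."""
    return (head, tail)

def fold(base_case, inductive_case, p):
    """Structural recursion over p ∈ Pair.

    base_case      answers the nil rule (no premises).
    inductive_case answers the cons rule, given the head and the result
                   already computed for the tail (the induction hypothesis).
    """
    if p is NIL:
        return base_case
    head, tail = p
    return inductive_case(head, fold(base_case, inductive_case, tail))

xs = cons(1, cons(2, cons(3, NIL)))
total = fold(0, lambda head, acc: head + acc, xs)
length = fold(0, lambda _head, acc: 1 + acc, xs)
```

Here `total` sums the elements and `length` counts them; both only say what to do for `nil` and for `cons`, and `fold` supplies the recursion.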

### 2.3 Functions: Lambda

Let us add functions to our language:

e ::= λ x.e | ( e e ) | x

Let's talk about reduction for Lambda:

(λ x.e) e' -> e[e'/x]

Let us define a substitution judgement, $\boxed{e[e'/x] = e}$, inductively:

$$\frac{}{\mathsf{true}[e'/x] = \mathsf{true}} \qquad \frac{}{\mathsf{false}[e'/x] = \mathsf{false}} \qquad \ldots$$

$$\frac{}{x[e'/x] = e'} \qquad \frac{x_1 \neq x_2}{x_1[e'/x_2] = x_1}$$

Example:

$$(\lambda x.x)\ 1 \to x[1/x] = 1$$

$$(\lambda x.\lambda y.x)\ 1 \to (\lambda y.x)[1/x] = \ldots$$

Substitution under a λ is not defined so far, so let us define:

$$\frac{e[e'/x_2] = e_2}{(\lambda x_1.e)[e'/x_2] = \lambda x_1.e_2}$$

Now we can continue our previous example to

$$(\lambda y.x)[1/x] = \lambda y.1$$

Final example:

$$(\lambda x.\lambda y.x)\ y \to (\lambda y.x)[y/x] = \lambda y.y$$

Oops. The free `y` was captured by the binder; here we have dynamic scope.

So our initial substitution rules need to be defined to be capture avoiding:

$$\frac{e[x_3/x_1] = e_3 \qquad e_3[e'/x_2] = e_3' \qquad x_1 \neq x_2 \qquad x_3 \text{ is fresh}}{(\lambda x_1.e)[e'/x_2] = \lambda x_3.e_3'}$$
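The capture-avoiding rule can be sketched as a recursive function. This is an illustrative sketch under stated assumptions: terms are encoded as tuples, `fresh` is a hypothetical name generator that assumes source variables never contain an underscore followed by digits, and the shadowing case (`x_1 = x_2`), which the rule above leaves out via its side condition, is handled by leaving the term unchanged.

```python
import itertools

_counter = itertools.count()

def fresh(base):
    """Return a variable name assumed not to occur anywhere in the program."""
    return f"{base}_{next(_counter)}"

def subst(e, new, x):
    """Compute e[new/x] for terms encoded as
    ("var", x) | ("lam", x, body) | ("app", f, a) | Python constants."""
    if not isinstance(e, tuple):
        return e                                   # true[e'/x] = true, etc.
    tag = e[0]
    if tag == "var":
        return new if e[1] == x else e             # x[e'/x] = e', x_1[e'/x_2] = x_1
    if tag == "app":
        return ("app", subst(e[1], new, x), subst(e[2], new, x))
    if tag == "lam":
        _, bound, body = e
        if bound == x:
            return e                               # assumption: x is shadowed, stop
        renamed_var = fresh(bound)                 # x_3 is fresh
        renamed = subst(body, ("var", renamed_var), bound)    # e[x_3/x_1] = e_3
        return ("lam", renamed_var, subst(renamed, new, x))   # e_3[e'/x_2] = e_3'
    raise ValueError(f"unknown term: {e!r}")

# The problematic example: (λ y.x)[y/x] must not become λ y.y.
result = subst(("lam", "y", ("var", "x")), ("var", "y"), "x")
```

With the renaming in place, `result` is a λ whose binder is a fresh name and whose body is still the free variable `y`, so the substituted `y` is not captured.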

## 3 Lecture 4, CPSC 539 Formal Reasoning about Compilers, 9 January 2019

### 3.1 Overview

- Language over the booleans and the natural numbers
- Formalizing Lambda: functions, application, variables, substitution, capture avoiding subst
- Reduction rules for substitution
- We use capture avoiding substitution by convention (especially in the literature)

### 3.2 Moving On

Let us add variables/references to our language.

e ::= ... | x := e | deref x

But we don't have a way to represent that memory yet, because our reduction relation is only defined over expressions. We need to augment our model:

M(emory), e -> M, e

Inductively defined:

M ::= · | M, [ x |-> e ]

So now our previous reduction rules (those that have no memory effects) need to refer to this M, but their reductions just pass it through.

$$\frac{}{(M,\ x := e) \to (M[x \mapsto e],\ e)}\ \text{assign} \qquad \frac{[x \mapsto e] \in M}{(M,\ \mathsf{deref}\ x) \to (M,\ e)}\ \text{deref}$$

But we're still missing sequencing; or else how could we set a value, then read one?

$$e ::= \ldots \mid e ; e$$

$$\frac{(M, e_1) \to (M', e_1')}{(M,\ e_1 ; e_2) \to (M',\ e_1' ; e_2)}\ \text{seq} \qquad \frac{}{(M,\ v ; e_2) \to (M,\ e_2)}\ \text{seq}$$

So how does this affect our conversion rules?

First they are now of the form

(M, e) ->* (M, e)

As before:

Structural rules (two):

$$\frac{}{(M, e) \to^* (M, e)} \qquad \frac{(M, e) \to (M', e') \qquad (M', e') \to^* (M'', e'')}{(M, e) \to^* (M'', e'')}$$

Addition:

$$\frac{(M, e_1) \to^* (M', e_1') \qquad (M', e_2) \to^* (M'', e_2')}{(M,\ e_1 + e_2) \to^* (M'',\ e_1' + e_2')}$$

Note that we appear to have enforced a memory evaluation order here. But not quite! We can apply the reflexivity rule to `e_1`, and then `e_2` would go first to the memory.

$$\frac{(M, e_1) \to^* (M', e_1') \qquad (M', e_2) \to^* (M'', e_2')}{(M,\ e_1 ; e_2) \to^* (M'',\ e_1' ; e_2')}$$

But this is problematic as before; we could use the reflexivity rule here for `e_1`, and `e_2` would modify Memory first.

So one approach here would be to enforce a precedence ordering, Left-to-Right, for example.

Now on to an evaluation relation, $\boxed{\mathrm{eval}(e) = v}$:

$$\frac{(\cdot,\ e) \to^* (M', v)}{\mathrm{eval}(e) = v}$$

But we still have the issue of nondeterministic evaluation.
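One way to remove the nondeterminism is to fix a left-to-right strategy, as suggested above. The sketch below does that for the memory fragment. It is an assumption-laden illustration, not the lecture's definition: expressions are tuples, values are Python ints, memory is a dict, and the right-hand side of an assignment is reduced to a value before it is stored (the assign rule above stores `e` as written).

```python
def is_value(e):
    return isinstance(e, int)

def step(mem, e):
    """One step (M, e) -> (M', e'), always reducing the leftmost redex.
    Expressions: int | ("assign", x, e) | ("deref", x) | ("seq", e1, e2) | ("add", e1, e2)."""
    tag = e[0]
    if tag == "assign":
        _, x, rhs = e
        if is_value(rhs):
            return {**mem, x: rhs}, rhs           # (M, x := v) -> (M[x |-> v], v)
        mem2, rhs2 = step(mem, rhs)
        return mem2, ("assign", x, rhs2)
    if tag == "deref":
        return mem, mem[e[1]]                     # KeyError if x is unset: eval is partial
    if tag == "seq":
        _, e1, e2 = e
        if is_value(e1):
            return mem, e2                        # (M, v; e2) -> (M, e2)
        mem2, e1_next = step(mem, e1)
        return mem2, ("seq", e1_next, e2)
    if tag == "add":
        _, e1, e2 = e
        if not is_value(e1):                      # left operand first
            mem2, e1_next = step(mem, e1)
            return mem2, ("add", e1_next, e2)
        if not is_value(e2):                      # then the right operand
            mem2, e2_next = step(mem, e2)
            return mem2, ("add", e1, e2_next)
        return mem, e1 + e2
    raise ValueError(f"unknown expression: {e!r}")

def evaluate(e):
    """eval(e) = v if (empty memory, e) ->* (M', v)."""
    mem = {}
    while not is_value(e):
        mem, e = step(mem, e)
    return e

# x := 1; (deref x + (x := 10; deref x))
program = ("seq", ("assign", "x", 1),
           ("add", ("deref", "x"), ("seq", ("assign", "x", 10), ("deref", "x"))))
```

Under this strategy `evaluate(program)` reads `x` as 1 before the second assignment runs, so the sum is 11; a right-to-left strategy would read 10 on the left and produce 20, which is exactly the ambiguity the conversion rules leave open.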

Remember, we want a language to have three things:

- Syntax
- Semantics
- Shared Properties

Our language has all sorts of undefined behavior. Eval is a partial function.

Maybe we could use…types?

Let us add a judgement that rules out programs we don't want to include in our language, and one way to do that would be to use a type system.

#### 3.2.1 Type Systems

Γ ::= · | Γ, x

Let's add a judgement, $\boxed{\Gamma \vdash e}$:

$$\frac{x \in \Gamma}{\Gamma \vdash x} \qquad \frac{}{\Gamma \vdash n}\ \text{nat} \qquad \frac{}{\Gamma \vdash b}\ \text{bool} \qquad \frac{\Gamma, x \vdash e}{\Gamma \vdash \lambda x.e}$$

Let us add a precondition for eval:

$$\frac{\Gamma \vdash e : A \qquad e \to^* v}{\mathrm{eval}(e) = v}$$

To make our system richer, we need to add types so our judgements yield more information:

Let us define types:

A, B := Nat | Bool | A -> B

(`A` & `B` are metavariables for the same set)

We extend our type judgements:

The judgement now has the form $\boxed{\Gamma \vdash e : A}$:

$$\frac{x : A \in \Gamma}{\Gamma \vdash x : A}$$

(similar for Nat and Bool)

For functions:

$$\frac{\Gamma, x : A \vdash e : B}{\Gamma \vdash \lambda x{:}A.e : A \to B}\ \text{function} \qquad \frac{\Gamma \vdash e_1 : A \to B \qquad \Gamma \vdash e_2 : A}{\Gamma \vdash e_1\ e_2 : B}\ \text{application}$$

$$\frac{\Gamma \vdash e_1 : \mathit{Nat} \qquad \Gamma \vdash e_2 : \mathit{Nat}}{\Gamma \vdash e_1 + e_2 : \mathit{Nat}}\ \text{addition}$$
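Because every rule is syntax directed once λ carries an annotation, the typing judgement reads off directly as a recursive checker. The following is a sketch under assumed encodings (tuple terms, the strings `"Nat"` and `"Bool"` for base types, `("->", A, B)` for function types, and a dict for Γ); none of these names come from the lecture.

```python
def type_of(ctx, e):
    """Return A such that ctx ⊢ e : A, or raise TypeError if no rule applies.
    Terms: bool | int | ("var", x) | ("lam", x, A, body) | ("app", e1, e2) | ("add", e1, e2)."""
    if isinstance(e, bool):          # check bool before int: bool is a subclass of int
        return "Bool"
    if isinstance(e, int):
        return "Nat"
    tag = e[0]
    if tag == "var":
        if e[1] not in ctx:
            raise TypeError(f"unbound variable {e[1]}")
        return ctx[e[1]]                                   # x : A ∈ Γ
    if tag == "lam":
        _, x, arg_type, body = e
        body_type = type_of({**ctx, x: arg_type}, body)    # Γ, x:A ⊢ e : B
        return ("->", arg_type, body_type)
    if tag == "app":
        fun_type = type_of(ctx, e[1])
        arg_type = type_of(ctx, e[2])
        if not (isinstance(fun_type, tuple) and fun_type[0] == "->"
                and fun_type[1] == arg_type):
            raise TypeError("application of a non-function or argument mismatch")
        return fun_type[2]
    if tag == "add":
        if type_of(ctx, e[1]) != "Nat" or type_of(ctx, e[2]) != "Nat":
            raise TypeError("addition expects two Nats")
        return "Nat"
    raise TypeError(f"unknown term: {e!r}")

identity_nat = ("lam", "x", "Nat", ("var", "x"))
```

For example, `type_of({}, ("app", identity_nat, 0))` yields `"Nat"`, while `("add", True, 1)` is rejected. Note that `identity_nat` only works at `Nat`; a Bool identity needs its own annotation.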

But we are faced with an issue: there are some programs that won't work:

eval((λ x.x) 0) = ???

Note that this won't work, because we can't statically find out a type for `x`.

So we need to write a bunch of specialized typed identity functions for each type. Rude.

Let's see what happens when we don't require annotations. Let's try to build a derivation tree for:

$$\frac{\dfrac{x : \mathit{Nat} \vdash x : \mathit{Nat}}{\vdash \lambda x.x : {?} \to {?}} \qquad \dfrac{}{\vdash 0 : \mathit{Nat}}}{\vdash (\lambda x.x)\ 0}$$

The unification process by which we "infer" the types of functions can become undecidable (or just computationally very expensive) as our language grows richer.

Since we want to talk about compilers, we want decidable type systems, so the annotations need to stay.

We want a Theorem:

· ⊢ e: A implies eval(e) = v

That means all programs are defined, and that they terminate. But for the most part, we can't get that. Instead we can probably get:

· ⊢ e: A => eval(e) = v or eval(e) = Ω (non-termination) or eval(e) = Error

And this would be type safety.

## 4 Lecture 5, CPSC 539 Formal Reasoning about Compilers, 11 January 2019

### 4.1 Overview

- Assignment Details

### 4.2 Back to Type Safety

Type safety theorem from last time:

· |- e : A => eval(e) = v

This is super strong; usually we'd include non-termination (divergence) or errors.

· |- e: A => eval(e) = v | eval(e) = Ω (non-termination) | eval(e) = Error

How do we prove this?

How do we prove, for instance, that `1 + 1 = 2`? We build a derivation tree that proves the fact using the axioms we've defined.

These are built using only implication.

$$\frac{P \Rightarrow Q \qquad P}{Q}$$

So how do we prove type safety, that `∀ e ∈ expr: eval(e) = v`?

For this we need induction, defined:

Let J be some inductively defined judgement, defined by rules R_0 ... R_n. To prove that some property P holds for every derivation of J, it suffices to prove one case for every rule:

- Case 0: If P(subderivations of R_0) then P(R_0)
- ...
- Case n: If P(subderivations of R_n) then P(R_n)

Let us show that `∀ n: Nat, m: Nat: eval(n + m) = eval(m + n)`, by induction on `n ∈ Nat`:

Case 0: We must show eval(0 + m) = eval(m + 0). We know that 0 + m -> m. But we need to show that eval(m + 0) = m. We'd have to prove a lemma via induction to prove this fact.

Case 1: We are operating over the second rule for e ∈ Nat (the inductive one), with n = (add1 n'):

$$\frac{n' : \mathit{Nat}}{(\mathsf{add1}\ n') : \mathit{Nat}}$$

We must show: if ∀ m: eval(n' + m) = eval(m + n'), then eval((add1 n') + m) = eval(m + (add1 n')).

- Step from eval((add1 n') + m) to eval(n' + (add1 m)).
- Instantiate the induction hypothesis with (add1 m) for m: eval(n' + (add1 m)) = eval((add1 m) + n').
- So now we have eval((add1 m) + n'). We reduce again, and we have eval(m + (add1 n')). QED

Note that this is slightly tedious and not particularly enlightening.

Oh, good point, that proof is flawed because `eval` is defined as a relation between expr and *values*.

We need an equivalence relation between *expressions*, $\boxed{e \equiv e'}$:

$$\frac{e \to^* e'' \qquad e' \to^* e''}{e \equiv e'}$$

Now that we have equivalence defined, we can clean up the above proof.

So for a type safety proof:

Thm: If · ⊢ e : A then eval(e) = v

The proof goes by induction on the typing derivation, so we'll have cases for all the typing rules in our language.

Let's do the natural number case: prove that `eval(n) = v`.

If · ⊢ n: Nat then eval(n) = v. First, appeal to the definition of eval:

$$\frac{n \to^* v}{\mathrm{eval}(n) = v}$$

We have conversion rules for n, and thus (by reflexivity) we can show that n ->* n.

Now for the if case:

If P holds for the subderivations of if (of which there are 3), that is, if

1. · ⊢ e: Bool implies eval(e) = v, and
2. · ⊢ e_1: B implies eval(e_1) = v_1, and
3. · ⊢ e_2: B implies eval(e_2) = v_2,

then · ⊢ if e then e_1 else e_2: B implies eval(if e then e_1 else e_2) = v'.

We need a lemma that appeals to canonical forms, namely, that since e: Bool, eval(e) = true | false. Let us expand our eval; based on cases, we now have two derivation trees. In each of our derivation trees, we have to show that the whole if expression ->* v'. Using reduction for if, we get e_1 or e_2 depending on the case, and by reduction and conversion rules we can then appeal to the induction hypotheses and show that the if expression ->* v', which satisfies eval(if e then e_1 else e_2) = v'. Sorry that was sloppy.

### 4.3 Getting type safety in "untyped" languages

Instead of forbidding `eval` on some programs with a type system, just add more reductions to $\boxed{e \to e}$ so `eval` is more defined:

$$\mathsf{true} + \mathsf{false} \to \mathsf{Error}(\text{"Cannot add booleans"})$$

$$(\mathsf{add1}\ n)\ e \to \mathsf{Error}(\text{"Cannot apply a number as a function"})$$

$$\ldots$$
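As a rough illustration of dynamic checking, the extra reductions can be written as additional cases of a step function on redexes whose operands are already values. The encoding, the function name `step_redex`, and the message for applying a boolean are assumptions for this sketch; only the two error messages above come from the lecture.

```python
def step_redex(e):
    """Reduce ("add", v1, v2) or ("app", v1, v2) where v1 and v2 are values.
    Ill-typed redexes step to an ("Error", message) term instead of getting stuck.
    Returns None for redexes this sketch does not cover (e.g. applying a lambda)."""
    tag = e[0]
    if tag == "add":
        _, v1, v2 = e
        if isinstance(v1, bool) or isinstance(v2, bool):
            return ("Error", "Cannot add booleans")
        if isinstance(v1, int) and isinstance(v2, int):
            return v1 + v2
    if tag == "app":
        _, fun, _arg = e
        if isinstance(fun, bool):
            # Hypothetical message: the lecture only lists the number case.
            return ("Error", "Cannot apply a boolean as a function")
        if isinstance(fun, int):
            return ("Error", "Cannot apply a number as a function")
    return None
```

With these cases, `step_redex(("add", True, False))` produces an error term rather than a stuck expression, so eval is defined on more programs without any static type system.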

Takeaway: there are **no** real untyped languages, there are some
dynamically-checked languages. And really, even "statically" typed
languages often have some runtime errors.

## 5 Lecture 6, CPSC 539 Formal Reasoning about Compilers, 14 January 2019

### 5.1 Review

- HW Questions
- Recap last lecture

### 5.2 Type Safety

e.g. for If

$$\frac{\vdash e : \mathit{Bool} \qquad \vdash e_1 : B \qquad \vdash e_2 : B}{\vdash \mathsf{if}\ e\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 : B}$$

It must be the case that if some expression e meets the type judgement, then `∃ v.eval(e) = v`.

e.g. for Lambda:

$$\frac{x : A \vdash e : B}{\vdash \lambda x{:}A.e : A \to B}$$

But when faced with

∃ v.eval(λ x:A.e) = v

We get stuck because we can't reduce a function straightaway. And now we have to make a distinction between observable values and irreducible expressions.

$$v ::= n \mid b \qquad\qquad i ::= v \mid \lambda x{:}A.e$$

So we need to strengthen our type safety theorem to include a premise based on the notion of irreducible expressions.

So now we need Progress and Preservation lemmas.

Lem(Progress): If Γ ⊢ e:A then either e is a value (irreducible expression) or e -> e'

Lem(Preservation): If Γ ⊢ e:A and e -> e' then Γ ⊢ e': A

So with these lemmas we can proceed:

Thm(Type Safety): · ⊢ e:A => eval(e) = v

Proof: Since e:A, either e is irreducible or e -> e' (by Progress).

Case e = i: But we get stuck, because i may be a function rather than an observable value. So we need to redefine our Theorem to only operate over a set of expressions that we consider well-typed and that produce valid top-level observations. We define the relation `⊢ e`:

$$\frac{\vdash e : \mathit{Nat}}{\vdash e} \qquad \frac{\vdash e : \mathit{Bool}}{\vdash e}$$

This judgement excludes functions from our Type Safety proof, which now reads:

⊢ e => eval(e) = v

Back to our previous part of the proof:

- If e = i, then by canonical forms either e = n or e = b. Then e ->* e, and so eval(e) = e.
- Otherwise e -> e'. We must show eval(e) = v. By Preservation, we know that e' is well-typed. By Progress, either e' = v or e' -> e'' (and may do so forever).

So our theorem now reads:

⊢ e => eval(e) = v | eval(e) = Ω

Note that if we have a Lemma for Strong Normalization, we know that all well typed `e ->* i`.

Now we need to prove Progress and Preservation.

### 5.3 Progress

⊢ e:A => e = i | e -> e'

Proof by induction on Γ ⊢ e:A. By cases on the inversion of Γ ⊢ e:A:

Case Var: (oh, we need to add vars to the list of irreducibles) Proved by definition.

Case Bool:

$$\frac{}{\vdash b : \mathit{Bool}}$$

Proved by definition.

Case Add:

$$\frac{\cdot \vdash e_1 : \mathit{Nat} \qquad \cdot \vdash e_2 : \mathit{Nat}}{\cdot \vdash e_1 + e_2 : \mathit{Nat}}$$

If we had conversion instead of reduction, the inductive proof would be trivial (which is easy, but usually in a bad way), but we don't want that because conversion contains rules where we don't actually progress. So we need to split apart the congruence rules from the reflexivity and transitivity rules.

So in our conversion relation, we retain reflexivity and transitivity.

Our "compatible" or "congruence" closure, our new `->`, contains the rest of the congruence rules, PLUS

$$\frac{e \text{ reduces to } e'}{e \to e'}$$

So this relation now only has judgements where progress happens.

The reduction relation gets a new symbol because we stole the arrow.

So with these new relations, we can prove the case for Add appealing to inversion on terms `e_1 + e_2`, albeit with two cases:

- Case 1: e_1 is irreducible (e_1 = i), and the sum steps to i + e_2'.
- Case 2: e_1 and e_2 both step to some e_1' and e_2'; by the inductive hypothesis, e_1 + e_2 -> e_1' + e_2'.

Case Lambda:

$$\frac{x : A \vdash e : B}{\vdash \lambda x{:}A.e : A \to B}$$

Lambdas are irreducible, so true by definition.

Case App:

$$\frac{\vdash e_1 : A \to B \qquad \vdash e_2 : A}{\vdash e_1\ e_2 : B}$$

Either (1) e_1 is irreducible or (2) e_1 -> e_1'.

- Case 1: e_1 = λ x:A.e. Either (a) e_2 is irreducible or (b) e_2 -> e_2'.
  - Case a: (λ x:A.e) e_2 -> e[e_2/x]
  - Case b: (λ x:A.e) e_2 -> (λ x:A.e) e_2'
- Case 2: e_1 e_2 -> e_1' e_2

## 6 Lecture 7, CPSC 539 Formal Reasoning about Compilers, 16 January 2019

### 6.1 Review

- Today is add/drop deadline

### 6.2 Discussing Properties of Programs

- Monads: monads allow us to re-order monadic operations according to the monadic laws, like associativity, etc.
- Things like the Java Interface. If e_1 and e_2 implement the same Interface, then we shouldn't be able to tell them apart.
- Private fields

Note that all these things are judgements over programs. But how to reason about them?

#### 6.2.1 Introducing Logical Relations

$$\boxed{e\ P\ \text{holds}} \qquad\qquad \frac{\ldots}{e\ P\ \text{holds}}$$

What we would like to say is that if e is well typed, then e P holds. This is the

Fundamental Property Lemma: Γ ⊢ e:A => e P holds

We can talk about Lemmas where if e P holds => some desired Property holds.

We can also talk about an equivalence relation where e =~ e:A.

But we will use a proof of Strong Normalization to illustrate this concept.

$$\boxed{e\ SN(A)}$$

Lem: Γ ⊢ e:A => e SN(A)

Lem: If e SN(A) then e ->* i, where i is irreducible

Note that with Γ, we're dealing with open terms, where there may be free variables. This is opposed to last week, where type safety was defined over expressions that are well-typed over the empty environment.

We wish to talk about program components, so we need to add Γ back:

$$\boxed{\Gamma \vdash e}$$

Note that we have components here, not just whole programs

Let's talk about linking. We define this by substitution, syntactically:

$$\gamma ::= \cdot \mid \gamma[x \mapsto e]$$

$$\boxed{\gamma(e) = e'}$$

$$\frac{}{\cdot(e) = e} \qquad \frac{\gamma(e_2[e_1/x]) = e'}{\gamma[x \mapsto e_1](e_2) = e'}$$

But we have types, right? So

$$\boxed{\Gamma \vdash \gamma}$$

$$\frac{}{\cdot \vdash \cdot} \qquad \frac{\Gamma \vdash \gamma \qquad \cdot \vdash e : A}{\Gamma, x : A \vdash \gamma[x \mapsto e]}$$

Lemma: if Γ ⊢ e and Γ ⊢ γ => ⊢ γ(e)

Basically, that well-typedness is closed under linking.

#### 6.2.2 How to Formalize Logical Relations

First we define them over closed terms:

$$\boxed{e\ SN(A)}$$

Where A is a syntactic type. We consider each kind of type in the language.

Next, we define the relation in multiple parts. For a language, we need relations for each form in the language.

$$\boxed{i\ SN(A)}$$

What does it mean for a value to have this relation at type A?

Note that we have three types in our language: Bools, Nats, and functions (λs).

$$\frac{}{b\ SN(\mathit{Bool})} \qquad \frac{}{n\ SN(\mathit{Nat})}$$

For both bools and nats, we need no preconditions.

But how do we define this for lambda?

$$\frac{\forall\, e'\ SN(A').\ \ e[e'/x] \to^* i \qquad i\ SN(B)}{\lambda x{:}A'.e\ \ SN(A' \to B)}$$

Back to our expression relation:

$$\frac{e \to^* i \qquad i\ SN(A)}{e\ SN(A)}$$

But now we can rewrite our λ value relation:

$$\frac{\forall\, e'\ SN(A').\ \ e[e'/x]\ SN(B)}{\lambda x{:}A'.e\ \ SN(A' \to B)}$$

So far, we've defined the things we need to prove to show strong normalization for closed terms. But what about open terms?

Let us say we have an open term:

Γ \vdash e:A

How do we claim that e SN(A)?

Let us say we had a well-typed closing substitution?

Γ \vdash e:A and Γ \vdash γ => γ(e) SN(A)

But how do we know if the linking substitution is part of the logical relation SN?

$$\boxed{\gamma\ SN(\Gamma)}$$

$$\frac{}{\cdot\ SN(\cdot)} \qquad \frac{\gamma\ SN(\Gamma) \qquad e\ SN(A)}{\gamma[x \mapsto e]\ SN(\Gamma, x : A)}$$

Now we can rewrite our fundamental property:

Γ ⊢ e:A and Γ ⊢ γ and γ SN(Γ) => γ(e) SN(A)

We will prove this by induction over Γ ⊢ e:A.

This Lemma is sufficient to prove Strong Normalization:

Thm(Strong Normalization): If · ⊢ e:A then e ->* i

#### 6.2.3 COMING NEXT: COMPILATION

## 7 Lecture 8, CPSC 539 Formal Reasoning about Compilers, 18 January 2019

### 7.1 What is a language

Collection of expressions: syntax, BNF grammars (judgements)

Operations on these expressions: Reduction, Congruence, and Conversion, relations, which gives us an Evaluation function

Shared Properties, like Type Safety

We've spent the previous time talking about the Simply Typed Lambda Calculus

### 7.2 But what do we want, really? What is a COMPILER?

A translation between languages. What is a language? See above.

We usually think about compilers between nice languages and into languages just above bits (assembly).

What do we think about when we say assembly?

- Registers
- Instructions
- Accumulator
- Jumps

So how do we start formalizing our assembly language?

#### 7.2.1 The Syntax of Assembly

- Integers (32-bit int) i ::= ... | |[ i_1 + i_2 ]|

(We place "well known" operations inside semantic double brackets)

- Registers r
- Addresses a
- Labels l
- Instructions I ::= mov d, s | add1 d | sub1 d | jump l
- Destinations d ::= r | a | [ a + i ]
- Source s ::= i | r | a | [ a + i ]
- Sequencing S ::= I; S | ret
- Blocks B ::= l : S
- Program P ::= B | P; B

So we want an eval(P) = w (word size value)

Let's write a simple program:

main: mov r1, 0; add1 r1; ret;

Convention 1: start with first instruction of main:

$$\boxed{\mathrm{eval}(P) = w}$$

$$\frac{P = B; \ldots; \mathsf{main}: S; B \ldots}{\mathrm{eval}(P) = w}$$

So let's think about the first instruction, "mov r1, 0". It will put 0 in register 1 and move to the next instruction.

So our reduction should be of the form

$$\boxed{(R, S) \twoheadrightarrow (R, S)}$$

Now we need to add some things to our syntax:

R ::= ⋅ | R[ r |-> w ] w ::= i | o | l

Let's define a reduction for mov:

$$(R,\ (\mathsf{mov}\ r, i); S) \twoheadrightarrow (R[r \mapsto i],\ S)$$

And one for add1:

$$\frac{[r \mapsto w] \in R}{(R,\ (\mathsf{add1}\ r); S) \twoheadrightarrow (R[r \mapsto [\![ w + 1 ]\!]],\ S)}$$

Note that the semantic brackets mean we know how to do the addition already.

What do we do in the end?

Convention 2: The final answer is the value of r1 when ret is reached.

Let's modify our judgement for eval:

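$$\frac{P = B; \ldots; \mathsf{main}: S; B \ldots \qquad (\cdot, S) \twoheadrightarrow^* (R, \mathsf{ret}) \qquad [r1 \mapsto w] \in R}{\mathrm{eval}(P) = w}$$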

Let's see another example:

main: mov a1, 0; mov r1, a1; add1 r1; ret

So! We have a memory now: let's add to the syntax:

H ::= ⋅ | H [ a |-> w ]

And our reduction relation now reads:

$$\boxed{(R, H, S) \twoheadrightarrow (R, H, S)}$$

And now, a reduction relation for mov:

$$(R,\ H,\ (\mathsf{mov}\ a, i); S) \twoheadrightarrow (R,\ H[a \mapsto i],\ S)$$

…similarly for the move from memory TO a register…

And now, another program:

l: add1 r1; ret;

main: mov r1, 0; jump l

How do we **find** label l when we jump to it? We may need to keep track of the whole program. Let's update our reduction relation:

$$\boxed{(R, H, P, S) \twoheadrightarrow (R, H, P, S)}$$

$$\frac{P = B_0; \ldots; l: S'; \ldots; B_n}{(R,\ H,\ P,\ (\mathsf{jump}\ l); S) \twoheadrightarrow (R,\ H,\ P,\ S')}$$
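The machine state `(R, H, P, S)` and its reductions can be sketched as a small loop. This is a hedged sketch, not the course's reference machine: the program is assumed to be a dict from labels to instruction lists, operands are told apart by their first letter (`r` for registers, `a` for addresses), the `[a + i]` operand form is omitted, 32-bit wraparound is an assumption, and `max_steps` is a hypothetical guard against non-terminating programs.

```python
WORD = 2 ** 32  # assumption: arithmetic wraps at 32 bits

def read(regs, heap, source):
    """Read a source operand: an integer literal, a register, or an address."""
    if isinstance(source, int):
        return source
    return regs[source] if source.startswith("r") else heap[source]

def run(program, max_steps=10_000):
    """eval(P) = w. `program` maps each label to its instruction sequence."""
    regs, heap = {}, {}
    seq = program["main"]                    # Convention 1: start at main
    for _ in range(max_steps):
        instr, rest = seq[0], seq[1:]
        op = instr[0]
        if op == "ret":
            return regs["r1"]                # Convention 2: the answer is in r1
        if op == "mov":
            _, dest, source = instr
            store = regs if dest.startswith("r") else heap
            store[dest] = read(regs, heap, source)
            seq = rest
        elif op in ("add1", "sub1"):
            dest = instr[1]
            store = regs if dest.startswith("r") else heap
            delta = 1 if op == "add1" else -1
            store[dest] = (store[dest] + delta) % WORD
            seq = rest
        elif op == "jump":
            seq = program[instr[1]]          # look the label up in P, drop the rest of S
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    raise RuntimeError("step budget exhausted")

jump_example = {
    "l": [("add1", "r1"), ("ret",)],
    "main": [("mov", "r1", 0), ("jump", "l")],
}
memory_example = {
    "main": [("mov", "a1", 0), ("mov", "r1", "a1"), ("add1", "r1"), ("ret",)],
}
```

Both example programs from the text leave 1 in `r1`. Keeping the whole program `P` available to `run` is what lets `jump` find its target, mirroring the extra component added to the reduction relation.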

So, now we have an assembly language! But how do we get from the simply typed lambda calculus to our Asm?

In our source language, we have first class functions, but we don't have this in our target language.

Let's look at function application. We can imagine:

(λ x.e) e'

What would that look like? Would we need a label for e'? But this leads us to a bigger issue: we have a compositional syntax in our STLC, but in order to figure out "where to go next", we need continuation passing style. This will be our first compiler pass.

### 7.3 BRIEF OVERVIEW OF COMPILATION

1. STLC -> CPS IR: continuation passing style, making returns/control flow explicit
2. CPS IR -> C-like IR: closure conversion, making free variables explicit
3. C-like IR -> Heap Alloc IR: heap allocation
4. Heap Alloc IR -> ASM: code generation, making registers explicit, etc.

## 8 Lecture 9, CPSC 539 Formal Reasoning about Compilers, 21 January 2019

We're finally talking about compilers!

#### 8.0.1 Coming up: a fast overview of compilation passes

- STLC
- STLC with explicit jumps (via CPS translation)

For example:

(λ x.x) (1 + 1) -> (λ x.x) 2 -> x[2/x] = 2

Compiles to:

l: ret; main: mov r1 1; add1 r1; jump l

How do we abstract this?

We can say that main is in some way: do 1 + 1; jump l

So this is where we would like to get to today.

What's going on here?

(1 + 1) ->* 2

(λ x.x) (1 + 1) ->* ?

We would like to compile this to:

do (1 + 1); jump

(λ x.x) (1 + 1) ->* ?

Let's label the lambda term, "k"

do (1 + 1); jump

k:(λ x.x) (1 + 1) ->* ?

Let's create a continuation:

(λ k.(let x=1 + 1 in (k x))) (λ x.x)

Let's say we wanted to continue the computation, with "+ 2". Let's create a new continuation:

λ k_{2}.(λ k.(let x=1 + 1 in (k x)) (λ x.(k_{2} x))) (λ x.x+2)

How do we define CPS translation?

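$$\boxed{[\![ e ]\!] = e}$$

$$[\![ v ]\!] = \lambda k.\ k\ [\![ v ]\!]$$

How do we translate values?

$$\boxed{[\![ v ]\!] = v}$$

$$[\![ n ]\!] = n \qquad [\![ b ]\!] = b \qquad [\![ \lambda x.e ]\!] = \lambda x.\ [\![ e ]\!]$$

Let's go back to $\boxed{[\![ e ]\!] = e}$:

$$[\![ e_1 + e_2 ]\!] = \lambda k.\ [\![ e_1 ]\!]\ (\lambda x.\ [\![ e_2 ]\!]\ (\lambda y.\ \mathsf{let}\ z = x + y\ \mathsf{in}\ k\ z))$$

$$[\![ e_1\ e_2 ]\!] = \lambda k.\ [\![ e_1 ]\!]\ (\lambda f.\ [\![ e_2 ]\!]\ (\lambda x.\ f\ x\ k))$$

What does eval look like on the right-hand side of the relation $\boxed{[\![ e ]\!] = e}$?

$$\boxed{\mathrm{eval}(e) = o}$$

$$\frac{e\ (\lambda x.x) \to^* o}{\mathrm{eval}(e) = o}$$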

Let's call the identity the halt continuation, or the return continuation.

What's in the syntax of our language?

v ::= x | b | n | λ x.e
e ::= v_{1} v_{2} | let x = v_{1} + v_{2} in v_{3} x

Now

$$\boxed{e \to e}$$

(λ x.e) v -> e[v/x]

let x=v_{1} + v_{2} in v_{3} x -> v_{3} (v_{1} + v_{2})

Let's go back to translating $\boxed{[\![ e ]\!] = e}$ for if expressions:

$$[\![ \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 ]\!] = \lambda k.\ [\![ e_1 ]\!]\ (\lambda x.\ \mathsf{if}\ x\ \mathsf{then}\ [\![ e_2 ]\!]\ k\ \mathsf{else}\ [\![ e_3 ]\!]\ k)$$
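The translation rules above can be sketched as a function from source terms to target terms, together with a tiny evaluator for the target that supplies the halt continuation. Everything here is an illustrative assumption rather than the course's compiler: the tuple encodings, the `fresh` name generator (which assumes source variables never contain `%`), the `("let", z, rhs, body)` form, and the environment-based `run` used in place of substitution.

```python
import itertools

_ids = itertools.count()

def fresh(base):
    """Hypothetical fresh-name generator for continuation and temporary names."""
    return f"{base}%{next(_ids)}"

def is_value(e):
    return not isinstance(e, tuple) or e[0] in ("var", "lam")

def cps_value(v):
    """[[n]] = n, [[b]] = b, [[x]] = x, [[λ x.e]] = λ x.[[e]]"""
    if isinstance(v, tuple) and v[0] == "lam":
        return ("lam", v[1], cps(v[2]))
    return v

def cps(e):
    """Translate a source expression into a term that expects a continuation k."""
    k = fresh("k")
    kv = ("var", k)
    if is_value(e):
        return ("lam", k, ("app", kv, cps_value(e)))
    tag = e[0]
    if tag == "add":
        x, y, z = fresh("x"), fresh("y"), fresh("z")
        body = ("let", z, ("add", ("var", x), ("var", y)), ("app", kv, ("var", z)))
        return ("lam", k, ("app", cps(e[1]),
                           ("lam", x, ("app", cps(e[2]), ("lam", y, body)))))
    if tag == "app":
        f, x = fresh("f"), fresh("x")
        call = ("app", ("app", ("var", f), ("var", x)), kv)        # f x k
        return ("lam", k, ("app", cps(e[1]),
                           ("lam", f, ("app", cps(e[2]), ("lam", x, call)))))
    if tag == "if":
        x = fresh("x")
        branch = ("if", ("var", x), ("app", cps(e[2]), kv), ("app", cps(e[3]), kv))
        return ("lam", k, ("app", cps(e[1]), ("lam", x, branch)))
    raise ValueError(f"unknown term: {e!r}")

def run(e, env):
    """Evaluate a target term using closures and environments."""
    if not isinstance(e, tuple):
        return e
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "lam":
        return ("closure", e[1], e[2], env)
    if tag == "app":
        _, param, body, closure_env = run(e[1], env)
        return run(body, {**closure_env, param: run(e[2], env)})
    if tag == "add":
        return run(e[1], env) + run(e[2], env)
    if tag == "let":
        return run(e[3], {**env, e[1]: run(e[2], env)})
    if tag == "if":
        return run(e[2], env) if run(e[1], env) else run(e[3], env)
    raise ValueError(f"unknown target term: {e!r}")

def evaluate(e):
    """eval of the translated program: apply [[e]] to the halt continuation λ x.x."""
    halt = ("lam", "x", ("var", "x"))
    return run(("app", cps(e), halt), {})
```

For the running example, `evaluate(("app", ("lam", "x", ("var", "x")), ("add", 1, 1)))` computes `1 + 1` first and then passes the result along explicitly, which is the "do 1 + 1; jump" shape sketched earlier in this lecture.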

How would we do a proof of program correctness for this compiler?

Thm. Whole Program Correctness: If eval(e) = o then eval(|[ e ]|) = |[ o ]|

Remember though, we want to consider linking:

Thm. Partial Program Correctness: If Γ ⊢ e, Γ ⊢ γ, and eval(γ(e)) = o, then eval(|[ γ ]|(|[ e ]|)) = |[ o ]|