What even is compiler correctness?

:: research

In this post I precisely define common compiler correctness properties. Compiler correctness properties are often referred to by vague terms such as “correctness”, “compositional correctness”, “separate compilation”, “secure compilation”, and others. I make these definitions precise and discuss the key differences. I give examples of research papers and projects that develop compilers that satisfy each of these properties.

What is a Language

Our goal is to give a generic definition to compiler correctness properties without respect to a particular compiler, language, or class of languages. We first give a generic definition of a Language over which a generic Compiler can be defined.

A Language \mathcal{L} is defined as follows. \newcommand{\peqvsym}{\overset{P}{\simeq}} \newcommand{\ceqvsym}{\overset{C}{\simeq}} \newcommand{\leqvsym}{\overset{\gamma}{\simeq}} \newcommand{\ctxeqvsym}{\overset{ctx}{\simeq}} \newcommand{\neweqv}[3]{#2 \mathrel{#1} #3} \newcommand{\peqv}{\neweqv{\peqvsym}} \newcommand{\ceqv}{\neweqv{\ceqvsym}} \newcommand{\leqv}{\neweqv{\leqvsym}} \newcommand{\ctxeqv}{\neweqv{\ctxeqvsym}} \begin{array}{llcl} \text{Programs} & P \\ \text{Components} & C \\ \text{Linking Contexts} & \gamma \\ \text{Link Operation} & γ(C) & : & P \\ \text{Program Equivalence} & \peqvsym & : & P \to P \to Prop \\ \text{Linking Equivalence} & \leqvsym & : & \gamma \to \gamma \to Prop \\ \text{Component Equivalence} & \ceqvsym & : & C \to C \to Prop \\ \text{Observational Equivalence} & \ctxeqvsym & : & C \to C \to Prop \\ \end{array} where \ctxeqvsym is the greatest compatible and adequate equivalence on Components.

A Language \mathcal{L} has a notion of Programs P . Programs can be evaluated to produce observations. Program Equivalence \peqv{P_1}{P_2} defines when two Programs produce the same observations. A Language also has a notion of Components C . Unlike Programs, Components cannot be evaluated, although they do have a notion of equivalence. However, we can produce a Program from a Component by linking. We Link by applying a Linking Context \gamma to a Component C , written \gamma(C) . Linking Contexts can also be compared for equivalence using Linking Equivalence \leqv{\gamma_1}{\gamma_2} . Observational Equivalence is a “best” notion of when two Components are related. Note, however, that a Language’s Observational Equivalence is completely determined by other aspects of the language. We are not free to pick this relation.
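
Since Coq appears later in this post as an example Language, it may help to see this interface transcribed as a Coq record. This is only a sketch: the record and field names (prog, comp, ctx, link, and so on) are my own and do not come from any existing development. Observational Equivalence is deliberately not a field, since it is determined by the other pieces rather than chosen.

Record Language := {
  prog       : Type;                       (* Programs P *)
  comp       : Type;                       (* Components C *)
  ctx        : Type;                       (* Linking Contexts gamma *)
  link       : ctx -> comp -> prog;        (* the Link Operation gamma(C) *)
  prog_equiv : prog -> prog -> Prop;       (* Program Equivalence *)
  link_equiv : ctx -> ctx -> Prop;         (* Linking Equivalence *)
  comp_equiv : comp -> comp -> Prop        (* Component Equivalence *)
}.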

C is a Language; its definition is as follows. Let P be any well-defined whole C program that defines a function main; such a program would produce a valid executable when compiled. Let C be any well-defined C program that defines a function main but requires external libraries to be linked, either dynamically or statically. Such a component would produce a valid object file when compiled, but would not run without first being linked. Let \gamma be directed graphs of C libraries with a C header file. Define the Link Operation as statically linking the libraries at the C level. Define two Programs to be Program Equivalent when the programs both diverge, both raise the same error, or both terminate leaving the machine in the same state. Define two Linking Contexts to be Linking Equivalent when they are exactly the same. Define two Components to be Component Equivalent when both are Program Equivalent after Linking with Linking Equivalent Linking Contexts.

Coq (or, CIC) is a Language; its definition is as follows. Let P be any closed, well-typed Coq expression. Let C be any open, well-typed Coq expression. Let \gamma be maps from free variables to Programs of the right type. Define the Link Operation as substitution. Define Component Equivalence and Program Equivalence as definitional equality. Define Linking Equivalence by applying Program Equivalence pointwise to the co-domain of the maps.
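
To see the record above inhabited, here is a deliberately tiny instance, not meant to model C or Coq: Programs are natural-number results, Components await one linked-in value, and Linking Contexts supply that value.

(* An illustrative toy instance of the Language record. *)
Definition toy : Language := {|
  prog := nat;                                  (* a program is just its result *)
  comp := nat -> nat;                           (* a component awaits one value *)
  ctx := nat;                                   (* a context supplies that value *)
  link := fun gamma C => C gamma;               (* linking is application *)
  prog_equiv := @eq nat;
  link_equiv := @eq nat;
  comp_equiv := fun C1 C2 => forall n, C1 n = C2 n
|}.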

What is a Compiler

Using our generic definition of Language, we define a generic Compiler as follows.

\newcommand{\newsteqvsym}[1]{_S\!\!#1_T} \newcommand{\psteqvsym}{\newsteqvsym{\peqvsym}} \newcommand{\lsteqvsym}{\newsteqvsym{\leqvsym}} \newcommand{\csteqvsym}{\newsteqvsym{\ceqvsym}} \newcommand{\psteqv}{\neweqv{\psteqvsym}} \newcommand{\csteqv}{\neweqv{\csteqvsym}} \newcommand{\lsteqv}{\neweqv{\lsteqvsym}} \begin{array}{llcl} \text{Source Language} & \mathcal{L}_S \\ \text{Target Language} & \mathcal{L}_T \\ \text{Program Translation} & \leadsto & : & P_S \to P_T \\ \text{Component Translation} & \leadsto & : & C_S \to C_T \\ \text{Cross-Language (S/T) Program Equivalence} & \psteqvsym \\ \text{S/T Linking Equivalence} & \lsteqvsym \\ \text{S/T Component Equivalence} & \csteqvsym \\ \end{array}

Every Compiler has a source Language \mathcal{L}_S and target Language \mathcal{L}_T . We use the subscript _S when referring to definitions from \mathcal{L}_S and _T when referring to definitions from \mathcal{L}_T . Every Compiler defines a translation from \mathcal{L}_S Programs to \mathcal{L}_T Programs, and similarly a translation on Components. A Compiler also defines cross-language relations on Programs, Components, and Linking Contexts.
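
Continuing the hypothetical Coq sketch from earlier, a Compiler packages its Source and Target Languages together with the two translations and the three cross-language relations. Writing the translations as relations rather than functions is my own choice, made to match the \leadsto arrow and to accommodate partial or nondeterministic compilers.

Record Compiler := {
  src : Language;                                   (* Source Language L_S *)
  tgt : Language;                                   (* Target Language L_T *)
  prog_translate : prog src -> prog tgt -> Prop;    (* Program Translation *)
  comp_translate : comp src -> comp tgt -> Prop;    (* Component Translation *)
  st_prog_equiv  : prog src -> prog tgt -> Prop;    (* S/T Program Equivalence *)
  st_link_equiv  : ctx src -> ctx tgt -> Prop;      (* S/T Linking Equivalence *)
  st_comp_equiv  : comp src -> comp tgt -> Prop     (* S/T Component Equivalence *)
}.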

We can define a Compiler from C to x86 as follows. Let \mathcal{L}_S be the Language for C defined earlier. Define a Language for x86 similarly. Let gcc be both the Program and Component Translation. Define S/T Program Equivalence as compiling the Source Language Program to x86, and comparing the machine states after running the x86 programs. Define S/T Linking Equivalence similarly to the definition given for the C Language. Define S/T Component Equivalence by linking with S/T Equivalent Linking Contexts and referring to S/T Program Equivalence.

We can define a Compiler from Coq to ML as follows. Let \mathcal{L}_S be the Language for Coq defined earlier. Define a Language for ML similarly. Let the Coq-to-ML extractor be both the Program and Component Translation. Define S/T Program Equivalence via a closed cross-language logical relation indexed by source types. Define S/T Component Equivalence by picking related substitutions, closing the Components, and referring to the S/T Program Equivalence. Define S/T Linking Equivalence by applying S/T Program Equivalence pointwise to the co-domains of the maps.

What Even is Compiler Correctness

Type Preservation

The simplest definition of compiler correctness is that we compile Programs to Programs, i.e., our compiler never produces garbage. A slightly less trivial definition is that we compile Components to Components. In the literature, these theorems are called “Type Preservation”. Typically, Type Preservation also connotes that the target language has a non-trivial type system and the compiler is obviously non-trivial.

Type Preservation (Programs)
P_S \leadsto P_T

Type Preservation (Components)
C_S \leadsto C_T
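
In the Coq sketch, where membership in the prog and comp types already means “well-formed”, both flavors of Type Preservation amount to totality of the translation relations. The definition names below are mine.

(* Type Preservation: every source Program (or Component) has some image
   under the translation relation. *)
Definition type_preserving_prog (c : Compiler) : Prop :=
  forall PS : prog (src c), exists PT : prog (tgt c), prog_translate c PS PT.

Definition type_preserving_comp (c : Compiler) : Prop :=
  forall CS : comp (src c), exists CT : comp (tgt c), comp_translate c CS CT.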

Type Preservation is only interesting when the source and target languages provide sound type systems that enforce sophisticated high-level abstractions. Even then, it still requires other properties or tests to ensure the compiler is non-trivial. For the user to understand the guarantees of Type Preservation, it is still necessary to understand the target language’s type system and the compiler.

A compiler that compiles every Program to 42 is type-preserving, in the trivial sense. By definition, the source Programs are Programs and 42 is a valid Program in many Languages. However, if you were to call such a compiler “Type Preserving”, the academic community may laugh at you.

A C-to-x86 compiler is type-preserving, in the trivial sense. Neither C nor x86 provide static guarantees worth mentioning. If you were to call such a compiler “Type Preserving”, the academic community may laugh at you.

The CompCert C-to-Mach compiler is type-preserving, in a weak but non-trivial sense. CompCert enforces a particular memory model and notion of memory safety, and preserves this specification through the compiler to a low-level, machine-independent language called Mach. The assembler is type-preserving in a trivial sense, since x86 provides no static guarantees to speak of.

The Coq-to-ML extractor is type-preserving, in a pretty trivial sense. As ML has a less expressive type system than Coq, and the extractor often makes use of casts, Type Preservation provides few guarantees for Components. For example, it is possible to cause a segfault by linking an extracted ML program with a stateful ML program.

The System F-to-TAL compiler is type-preserving in a strong sense. System F provides strong data hiding and security guarantees via parametric polymorphism. TAL provides parametric polymorphism and memory safety, allowing all of System F’s types to be preserved. Even so, type preservation could hold if we compiled everything to a trivial TAL program, such as halt[int]. However, a quick look at the definition of the compiler or a small test suite is sufficient to convince us that the compiler is non-trivial, and thus Type Preservation is meaningful in this context.

Whole-Program Correctness

The next definition is what I would intuitively expect of all compilers (that are bug free). A source Program should be compiled to a “related” target Program. In the literature, this theorem is referred to as “Whole-Program Correctness” or “Semantics Preservation”. Note that any Whole-Program Correct or Semantics Preserving Compiler is also trivially Type Preserving. Such a Compiler may also be Type Preserving in a non-trivial sense.

Whole-Program Correctness
If P_S \leadsto P_T then \psteqv{P_S}{P_T}
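
Stated against the records sketched earlier, Whole-Program Correctness transcribes directly; as before, the definition name is my own.

(* Whole-Program Correctness: compiled Programs are related to their
   source by the compiler's S/T Program Equivalence. *)
Definition whole_program_correct (c : Compiler) : Prop :=
  forall (PS : prog (src c)) (PT : prog (tgt c)),
    prog_translate c PS PT ->
    st_prog_equiv c PS PT.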

A whole-program compiler provides no guarantees if we attempt to compile a Component and then link. Since many, arguably all, Programs are actually Components, Whole-Program Correctness is of limited use. A notable exception is the domain of embedded systems, where writing a whole source program may be practical.

The CompCert C-to-Asm compiler is proven correct with respect to Whole-Program Correctness, with machine-checked proofs. CompCert refers to this guarantee as “semantics preservation”. Prior versions of CompCert pointed out that, while it is possible to Link after compilation, “the formal guarantees of semantic preservation apply only to whole programs that have been compiled as a whole by CompCert C.” More recent versions lift this restriction, as we discuss shortly.

The CakeML CakeML-to-Asm compiler is proven correct with respect to Whole-Program Correctness, with machine-checked proofs. CakeML is “a substantial subset of SML”. Asm is one of several machine languages: ARMv6, ARMv8, x86-64, MIPS-64, and RISC-V. The assemblers here, unlike in CompCert, are proven correct.

Compositional Correctness

Intuitively, Compositional Correctness is the next step from Whole-Program Correctness. Compositional Correctness should give us guarantees when we can compile a Component, then Link with a valid Linking Context in the target Language.

Compositional Correctness
If C_S \leadsto C_T and \lsteqv{\gamma_S}{\gamma_T} then \psteqv{\gamma_S(C_S)}{\gamma_T(C_T)}
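
In the sketch, Compositional Correctness reads as follows. Note that the statement itself cannot see whether st_link_equiv is defined independently of the compiler; that distinction, discussed next, is a side condition on how the relation is defined rather than part of the formula.

(* Compositional Correctness: compiling a Component and then Linking with
   S/T-related Linking Contexts yields S/T-related Programs. *)
Definition compositionally_correct (c : Compiler) : Prop :=
  forall (CS : comp (src c)) (CT : comp (tgt c))
         (gammaS : ctx (src c)) (gammaT : ctx (tgt c)),
    comp_translate c CS CT ->
    st_link_equiv c gammaS gammaT ->
    st_prog_equiv c (link (src c) gammaS CS) (link (tgt c) gammaT CT).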

To understand the guarantees of this theorem, it is necessary to understand how Linking Contexts are related between the source and target Languages. For instance, some compilers may allow linking with arbitrary target Linking Contexts. Some compilers may restrict linking to only Linking Contexts produced by the compiler.

The phrase “Compositional Correctness” usually connotes that the relation \lsteqvsym is defined independently of the compiler—that is, it is used to mean there is a specification separate from the compiler for cross-language equivalent Linking Contexts. This supports more interoperability, since linking is permitted even with Linking Contexts produced from other compilers, or handwritten in the target language, as long as they can be related to source Linking Contexts.

Compositional Compiler Correctness
If C_S \leadsto C_T and \lsteqv{\gamma_S}{\gamma_T} then \psteqv{\gamma_S(C_S)}{\gamma_T(C_T)} (where \lsteqvsym is independent of \leadsto )

The phrase “Separate Compilation” usually connotes that linking is only defined with Linking Contexts produced by the same compiler. That is, when \lsteqv{\gamma_S}{\gamma_T} \iff \gamma_S \leadsto \gamma_T .

Correctness of Separate Compilation
If C_S \leadsto C_T and \gamma_S \leadsto \gamma_T then \psteqv{\gamma_S(C_S)}{\gamma_T(C_T)}
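
In the sketch, the Separate Compilation variant needs a translation on Linking Contexts, which the Compiler record above does not include; I pass one in as an extra, hypothetical parameter.

(* Correctness of Separate Compilation: the only target Linking Contexts we
   may link with are those the compiler itself produced. *)
Definition separately_correct
    (c : Compiler)
    (ctx_translate : ctx (src c) -> ctx (tgt c) -> Prop) : Prop :=
  forall (CS : comp (src c)) (CT : comp (tgt c))
         (gammaS : ctx (src c)) (gammaT : ctx (tgt c)),
    comp_translate c CS CT ->
    ctx_translate gammaS gammaT ->
    st_prog_equiv c (link (src c) gammaS CS) (link (tgt c) gammaT CT).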

Some papers present a variant of “Semantics Preservation” stated on Components instead of Programs. Usually, this theorem implies Compositional Correctness. It requires a cross-language equivalence on Components, which is usually defined in terms of linking with S/T Equivalent Linking Contexts and observing S/T Equivalent Programs.

Semantics Preservation
If C_S \leadsto C_T then \csteqv{C_S}{C_T}
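
In the sketch, this version of Semantics Preservation is stated directly against the compiler’s S/T Component Equivalence; when that relation is defined by Linking with S/T Equivalent Linking Contexts and comparing the resulting Programs, the statement unfolds to Compositional Correctness.

(* Semantics Preservation stated on Components. *)
Definition semantics_preserving (c : Compiler) : Prop :=
  forall (CS : comp (src c)) (CT : comp (tgt c)),
    comp_translate c CS CT ->
    st_comp_equiv c CS CT.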

Some papers define interoperability between source Components and target Linking Contexts, and between target Components and source Linking Contexts. This supports a broader notion of linking and thus a more widely applicable Compositional Correctness guarantee. However, it requires understanding the source/target interoperability semantics to understand the guarantee. There is no way to relate the resulting behaviors back to the source language, in general.

Open Compiler Correctness
  1. If C_S \leadsto C_T , then for all \gamma_T , \psteqv{\gamma_T(C_S)}{\gamma_T(C_T)}
  2. If C_S \leadsto C_T , then for all \gamma_S , \psteqv{\gamma_S(C_S)}{\gamma_S(C_T)}

Note that to satisfy Open Compiler Correctness, two new definitions of linking must be defined: one that links target Linking Contexts with source Components, and one that links source Linking Contexts with target Components.
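
In the sketch, I take those two extra link operations as parameters (link_ts and link_st below are hypothetical names). Giving gamma_T(C_S) the type of a source Program and gamma_S(C_T) the type of a target Program is purely a modeling choice on my part, made so that both clauses can reuse the S/T Program Equivalence, as in the statements above.

(* Open Compiler Correctness, parameterized by the two cross-language
   link operations described above. *)
Definition open_correct
    (c : Compiler)
    (link_ts : ctx (tgt c) -> comp (src c) -> prog (src c))  (* gamma_T(C_S) *)
    (link_st : ctx (src c) -> comp (tgt c) -> prog (tgt c))  (* gamma_S(C_T) *)
    : Prop :=
  (forall (CS : comp (src c)) (CT : comp (tgt c)) (gammaT : ctx (tgt c)),
      comp_translate c CS CT ->
      st_prog_equiv c (link_ts gammaT CS) (link (tgt c) gammaT CT)) /\
  (forall (CS : comp (src c)) (CT : comp (tgt c)) (gammaS : ctx (src c)),
      comp_translate c CS CT ->
      st_prog_equiv c (link (src c) gammaS CS) (link_st gammaS CT)).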

Most compilers that generate machine code aim to be Compositional Compilers, since we should be able to link the output with any assembly, even assembly produced by another compiler. Compilers that target the .NET VM, the JVM, and LLVM are similar.

Some languages, like Coq and Racket, target special purpose VMs and aim only to be Separate Compilers.

Languages with FFIs can be thought to aim for a limited form of Open Compiler Correctness. For instance, we can link Java and C code for certain limited definitions of Linking Contexts. The full spirit of the theorem is limited to research projects, for now.

The Compositional CompCert compiler extends CompCert and its correctness proofs to guarantee Compositional Compiler Correctness. Linking is defined for any target Linking Context whose interaction semantics are related to a source Linking Context. The paper’s Corollary 2 titled “Compositional Compiler Correctness” is a generalized version of our theorem by the same name. They allow for compiling an arbitrary number of Components, then linking those with a Linking Context.

The SepCompCert compiler extends CompCert and its correctness proofs to guarantee Correctness of Separate Compilation. This work notes that Correctness of Separate Compilation is significantly easier to prove, increasing the proof size by 2%, compared to the 200% increase required for Compositional CompCert. This work was merged into CompCert as of version 2.7.

The Pilsner compiler is an MLish-to-Assemblyish compiler that guarantees Compositional Compiler Correctness. Linking is defined for any target Linking Context that is related to a MLish Component by a PILS.

Perconti and Ahmed develop a compiler from System F to a low-level typed IR that guarantees Open Compiler Correctness. In every Language, Linking is defined for any Linking Context from the source, intermediate, or target languages that has a compatible type. The paper defines a multi-language semantics in which all of the languages can interoperate.

Full Abstraction/Secure Compilation

Some properties of a program cannot be stated in terms of a single run of the program. We require more sophisticated compiler correctness theorems to show that these properties are preserved. For instance, security properties, such as indistinguishability of a ciphertext from a random string, are relational properties. These can only be stated as a property between two Programs in the same Language.

Fully abstract compilers seek to prove that compilation preserves these relational properties. Since security properties are often relational, such compilers are sometimes called “Secure Compilers”. Often, we also want to reflect equivalence, which usually follows from Compositional Correctness. Full abstraction refers specifically to preserving and reflecting Observational Equivalence. Papers on this topic often focus on equivalence preservation, since equivalence reflection by itself usually follows from compiler correctness, and preservation is the direction of interest for stating security properties.

Compilers that guarantee these properties are limited to research projects, as there are many open problems to be solved. The key difficulty lies in the proofs of Equivalence Preservation, which essentially requires “decompiling” a target Program into a source Program.

Equivalence Preservation
If \ceqv{C_S}{C'_S} and C_S \leadsto C_T and C'_S \leadsto C'_T then \ceqv{C_T}{C'_T}

Equivalence Reflection
If \ceqv{C_T}{C'_T} and C_S \leadsto C_T and C'_S \leadsto C'_T then \ceqv{C_S}{C'_S}

Full Abstraction
Let C_S \leadsto C_T and C'_S \leadsto C'_T . \ctxeqv{C_S}{C'_S} iff \ctxeqv{C_T}{C'_T}
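
In the sketch, Equivalence Preservation and Reflection use each Language’s own Component Equivalence, while Full Abstraction is stated against observational equivalences obs_S and obs_T, which I pass in as parameters because the records above do not construct them.

(* Equivalence Preservation: related source Components compile to related
   target Components. *)
Definition equiv_preserving (c : Compiler) : Prop :=
  forall (CS CS' : comp (src c)) (CT CT' : comp (tgt c)),
    comp_equiv (src c) CS CS' ->
    comp_translate c CS CT ->
    comp_translate c CS' CT' ->
    comp_equiv (tgt c) CT CT'.

(* Equivalence Reflection: the converse direction. *)
Definition equiv_reflecting (c : Compiler) : Prop :=
  forall (CS CS' : comp (src c)) (CT CT' : comp (tgt c)),
    comp_equiv (tgt c) CT CT' ->
    comp_translate c CS CT ->
    comp_translate c CS' CT' ->
    comp_equiv (src c) CS CS'.

(* Full Abstraction: preservation and reflection of the observational
   equivalences, here supplied as obs_S and obs_T. *)
Definition fully_abstract
    (c : Compiler)
    (obs_S : comp (src c) -> comp (src c) -> Prop)
    (obs_T : comp (tgt c) -> comp (tgt c) -> Prop) : Prop :=
  forall (CS CS' : comp (src c)) (CT CT' : comp (tgt c)),
    comp_translate c CS CT ->
    comp_translate c CS' CT' ->
    (obs_S CS CS' <-> obs_T CT CT').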

Bowman and Ahmed develop an Equivalence Preserving and Reflecting compiler from The Core Calculus of Dependency (DCC) to System F. DCC guarantees certain security properties, which are preserved by encoding using parametric polymorphism. This compiler also satisfies Compositional Compiler Correctness, using a cross-language logical relation to define relatedness of Components between languages. This compiler is not Fully Abstract, as it does not define Contextual Equivalence. Instead, the compiler Preserves and Reflects the security property of interest.

Fournet et al. develop a Fully Abstract compiler from a language similar to a monomorphic subset of ML with exceptions to JavaScript. This paper demonstrates a key difficulty in Fully Abstract compilers. Often, the source language must be artificially constrained (in this case, by eliminating polymorphism and adding exceptions) in order to support back-translation.

Devriese et al. develop a Fully Abstract compiler from STLC to the untyped lambda calculus. This paper developed a key innovation in back-translation techniques, allowing a more expressive, less typed target language to be back-translated into a less expressive, more typed source language.