In What Sense is WebAssembly Memory Safe?

2023-05-18 :: research

I’ve been trying to understand the semantics of memory in WebAssembly, and realized that “memory safety” in WebAssembly doesn’t mean what I expected.

What is memory safety?

Here are some definitions.

Memory safety is a feature of programming languages that prevents certain types of memory-access bugs, such as out-of-bounds reads and writes, and use-after-free bugs. In an app that manages a list of to-do items, for example, an out-of-bounds read could involve accessing the nonexistent sixth item in a list of five, while a use-after-free bug could involve accessing one of the items on an already deleted to-do list.

https://spectrum.ieee.org/memory-safe-programming-languages

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences.

https://en.wikipedia.org/wiki/Memory_safety

Memory (un)safety in Wasm

WebAssembly (Wasm) is a language that guarantees “type safety … [preventing] invalid calls or illegal accesses to locals, … memory safety, and … inaccessibility of code addresses or the call stack”.

(Technically, the Wasm paper describes Wasm as a binary code format, that happens to be presented as a language.)

Formally, a whole Wasm program that type checks is guaranteed to either be a well-typed value, or take an evaluation step to a well-typed program, or evaluate to the well-known dynamic error “trap”.

This is in contrast to an unsafe language like C. A well-typed C program might take a step to a well-typed program, or it might evaluate to a value of arbitrary type or no type. For example, a well-typed program of type char that reads from a buffer might evaluate to a well-typed char, or it might evaluate to an arbitrary integer that does not correspond to any character because you were reading uninitialized memory.

For example, consider the following C program.

// unsafe.c
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv) {
  char* buf = malloc(0);            // zero-byte allocation
  memcpy(buf, "Hello world\n", 12); // deliberate out-of-bounds write: undefined behaviour
  write(1, buf, 12);
  return 0;
}

(compiled with clang -o unsafe.exe unsafe.c; run with ./unsafe.exe)

This program creates a buffer of size 0, writes “Hello world\n” to it, and tries to print that to standard out. The program printed “Hello world” when I ran it, but it’s undefined behaviour, so anything could happen. I tried writing a loop that malloc’d lots of memory and wrote arbitrary numbers, but never managed to crash the program. Still, it’s not memory safe.

The equivalent Wasm program is below.

;; safe.wat
(module
 (import "wasi_unstable" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

 (memory 0)
 ;;(memory 1)
 (export "memory" (memory 0))

 (data (i32.const 0) "Hello World\n")

 (func $main (export "_start")
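       ;; iovs at address 12: buf = 0, buf_len = 12
       (i32.store (i32.const 12) (i32.const 0))
       (i32.store (i32.const 16) (i32.const 12))
       ;; fd_write(fd = 1 (stdout), iovs = 12, iovs_len = 1, nwritten = 20)
       (call $fd_write (i32.const 1) (i32.const 12) (i32.const 1) (i32.const 20))
       drop))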

(run with wasmtime safe.wat)

In this example, we create a string “Hello World\n” at address 0 in the module’s memory. We then create (encode) a new iovs just after it, starting at address 12, with a pointer to address 0 and length 12. Then we call fd_write, from the WASI API.

Unfortunately, we declared the memory size to be 0 pages, so copying this string into memory fails when the module is instantiated: it traps safely, and the process exits with an error message.
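That trap comes from the bounds check every Wasm memory access goes through. As a rough model (a sketch in C, not how wasmtime is actually implemented; the names LinearMemory, mem_new, mem_init_data, and mem_store_i32 are made up for illustration), a linear memory is a byte array with a known size, and every access is checked against that size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WASM_PAGE_SIZE 65536u /* Wasm memories are sized in 64 KiB pages */

/* Hypothetical model of one linear memory: a byte array plus its size. */
typedef struct {
  uint8_t *bytes;
  uint64_t size; /* in bytes */
} LinearMemory;

/* Allocate a zero-filled memory of `pages` pages, like (memory pages). */
static LinearMemory mem_new(uint32_t pages) {
  LinearMemory m;
  m.size = (uint64_t)pages * WASM_PAGE_SIZE;
  m.bytes = calloc(m.size ? m.size : 1, 1); /* never request 0 bytes */
  return m;
}

/* True if the range [addr, addr + len) lies inside the memory. */
static bool mem_in_bounds(const LinearMemory *m, uint32_t addr, uint32_t len) {
  return (uint64_t)addr + len <= m->size;
}

/* Model of an active data segment: copy the bytes, or report a trap (false). */
static bool mem_init_data(LinearMemory *m, uint32_t addr,
                          const void *src, uint32_t len) {
  if (!mem_in_bounds(m, addr, len)) return false; /* trap */
  memcpy(m->bytes + addr, src, len);
  return true;
}

/* Model of i32.store: 4 bytes, little-endian, bounds-checked. */
static bool mem_store_i32(LinearMemory *m, uint32_t addr, uint32_t value) {
  if (!mem_in_bounds(m, addr, 4)) return false; /* trap */
  for (int i = 0; i < 4; i++) m->bytes[addr + i] = (uint8_t)(value >> (8 * i));
  return true;
}
```

With mem_new(0), mem_init_data(&m, 0, "Hello World\n", 12) returns false, which is the modelled trap; with mem_new(1), the same call succeeds.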

So Wasm is memory safe, right?

Well, sort of, but there’s a pretty key distinction here.

In C, we are creating a new pointer with malloc. We are allocating a new data structure, then using it (unsafely).

In Wasm, there is exactly one memory for the entire module. Inside that memory, we encode 2 data structures: our string, and the iovs structure used by fd_write. All accesses to the global memory are safe. But not all accesses to the encoded data structures are.

Most applications will create data structures within the memory. That’s what we did for our call to fd_write: the two stores create an iovs structure in the global memory. We have no guarantees, within Wasm, about that data structure.

For example, here’s our Hello World program in Wasm which uses the memory safely and correctly, but creates an iovs whose length is claimed to be 100, larger than the actual string.

;; unsafe.wat
(module
 (import "wasi_unstable" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

 (memory 1)
 (export "memory" (memory 0))

 (data (i32.const 0) "Hello World\n")

 (func $main (export "_start")
       (i32.store (i32.const 12) (i32.const 0))
       (i32.store (i32.const 16) (i32.const 100))
       (call $fd_write (i32.const 1) (i32.const 12) (i32.const 1) (i32.const 20))
       drop))

(run with wasmtime unsafe.wat)

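When I run this, I get “Hello World\nd” printed to stdout, and it didn’t crash. At first I had no idea where that trailing d came from. As far as I can tell, it is the iovs itself: the length 100 is stored at address 16 as the byte 0x64, which is ASCII d, and the zero bytes around it don’t show up in the terminal. So fd_write read past the end of the string, into the neighbouring data structure and then into zero-initialized memory.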
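To check which bytes fd_write is handed, here is a sketch in C that models the module’s single 64 KiB page as a zero-filled array (Wasm memories start out zeroed), replays the data segment and the two stores from unsafe.wat, and then walks the 100 bytes the iovs describes. The helper names are made up, and this models only the memory contents, not WASI itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WASM_PAGE_SIZE 65536u

/* Model of i32.store: write `value` little-endian at `addr`. No bounds check,
   because every address used below is far inside the page. */
static void store_i32(uint8_t *mem, uint32_t addr, uint32_t value) {
  for (int i = 0; i < 4; i++) mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Model of i32.load: read a little-endian 32-bit value at `addr`. */
static uint32_t load_i32(const uint8_t *mem, uint32_t addr) {
  uint32_t v = 0;
  for (int i = 0; i < 4; i++) v |= (uint32_t)mem[addr + i] << (8 * i);
  return v;
}

/* Rebuild the memory of unsafe.wat as it is when fd_write is called.
   `mem` must point to at least WASM_PAGE_SIZE bytes. */
static void build_unsafe_memory(uint8_t *mem) {
  memset(mem, 0, WASM_PAGE_SIZE);
  memcpy(mem, "Hello World\n", 12); /* (data (i32.const 0) "Hello World\n") */
  store_i32(mem, 12, 0);            /* iovs.buf     = 0   */
  store_i32(mem, 16, 100);          /* iovs.buf_len = 100 */
}

/* Print the bytes the iovs at `iovs_addr` points to, escaping anything
   that is not printable ASCII. */
static void dump_iov(const uint8_t *mem, uint32_t iovs_addr) {
  uint32_t buf = load_i32(mem, iovs_addr);
  uint32_t len = load_i32(mem, iovs_addr + 4);
  for (uint32_t i = 0; i < len; i++) {
    uint8_t b = mem[buf + i];
    if (b >= 0x20 && b < 0x7f) putchar(b);
    else printf("\\x%02x", (unsigned)b);
  }
  putchar('\n');
}
```

Calling build_unsafe_memory(mem) on a WASM_PAGE_SIZE-byte array and then dump_iov(mem, 12) shows the string followed by the bytes of the iovs. In this model the only printable byte after the string is the one at address 16, which holds 100 (0x64, ASCII d).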

Arguably, this is cheating: Wasm does not and cannot make claims about external system functions, and WASI is unstable. But IMO the root of the error isn’t really about WASI.

Really, the root cause of this error is memory unsafety, but of a data structure encoded within a Wasm module. In a truly memory-safe language, if I try to access the 100th element of a 12-character long string, I get an error:

> racket
Welcome to Racket v8.9 [cs].
> (string-ref "Hello world\n" 100)
; string-ref: index is out of range
;   index: 100
;   valid range: [0, 11]
;   string: "Hello world\n"
; [,bt for context]

But that doesn’t happen in Wasm.

Wasm memory safety doesn’t apply to data structures implemented (encoded) within the memory. It only applies to the module’s memory, which is protected from other modules, even those running in the same process’s virtual address space.

This means Wasm modules are protected from each other, and so this kind of memory unsafety probably isn’t a security risk, only a cause of logic bugs.

In Wasm, data structures have to be encoded anyway, since Wasm doesn’t provide any kind of structured data primitives; you only have integers, and some integers are interpreted as addresses into memory. But when you encode such data structures in the memory and use them incorrectly, you have no guarantees about what happens. You could read some arbitrary data (from your own module), or read some uninitialized memory (from your own module). I.e., you get out-of-bounds reads and writes.
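If you want the Racket-style error for a structure encoded in memory, the check has to be written by the program (or generated by whatever compiler targets Wasm). A minimal sketch in C, assuming a made-up length-prefixed string layout (a 4-byte little-endian length followed by the bytes) and hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout at `addr`: u32 length (little-endian), then the bytes. */
static uint32_t str_len(const uint8_t *mem, uint32_t addr) {
  uint32_t v = 0;
  for (int i = 0; i < 4; i++) v |= (uint32_t)mem[addr + i] << (8 * i);
  return v;
}

/* Write a length-prefixed string at `addr`; the caller ensures it fits. */
static void str_put(uint8_t *mem, uint32_t addr, const char *s) {
  uint32_t len = (uint32_t)strlen(s);
  for (int i = 0; i < 4; i++) mem[addr + i] = (uint8_t)(len >> (8 * i));
  memcpy(mem + addr + 4, s, len);
}

/* Unchecked access: only the size of the whole memory limits this read,
   so an index past the string silently returns a neighbouring byte. */
static uint8_t str_ref_unchecked(const uint8_t *mem, uint32_t addr,
                                 uint32_t index) {
  return mem[addr + 4 + index];
}

/* Checked access: returns false when `index` is outside the string,
   otherwise stores the byte in `*out` and returns true. */
static bool str_ref_checked(const uint8_t *mem, uint32_t addr,
                            uint32_t index, uint8_t *out) {
  if (index >= str_len(mem, addr)) return false;
  *out = mem[addr + 4 + index];
  return true;
}
```

For a 12-byte string, str_ref_checked rejects index 100, while str_ref_unchecked returns whatever byte happens to sit 100 bytes past the start of the string. Both are valid accesses as far as the surrounding memory is concerned.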

In another view of this, memory is the only data structure in Wasm, and it is memory safe. That’s all the language can be responsible for; if you go about encoding weird things inside that data structure, errors are likely. But this doesn’t seem like what people would expect when they hear “memory safe”. At least, it’s not what I expected at first.