
Opcodes

Every WebAssembly opcode group the interpreter handles, plus what’s intentionally out of scope.

The interpreter implements every WebAssembly Core opcode plus these proposals:

  • Sign-extension.
  • Bulk-memory (the full proposal).
  • Non-trapping float-to-int (trunc_sat_*).
  • Reference types: funcref, externref, ref.null / ref.is_null / ref.func, table.get / table.set / table.size / table.grow / table.fill, typed select t*.
  • Multi-memory: every memory opcode now carries a memidx; modules may declare more than one linear memory, with a parallel HostFuncMulti surface for host functions that need to reach beyond memidx 0.
  • SIMD (the full proposal): V128 value type plumbed end-to-end; all ~236 opcodes under the 0xFD prefix, covering v128.const, the 13 loads + 1 store that produce or consume a whole v128 plus the 8 load*_lane / store*_lane ops, lane access, integer + float arithmetic, shifts, min/max, bitwise + reductions, comparisons, narrow / extend / extadd_pairwise / extmul, float ↔ int conv, demote / promote, and i32x4.dot_i16x8_s.

That’s enough to run real wasm32-wasip1 binaries produced by rustc end-to-end, and to host the full sysl standard-library test suite as sysl’s wasm32-WASI backend.

Numeric (all four scalar types)

Every i32 / i64 / f32 / f64 opcode:

  • Constants: const for each type.
  • Comparisons: signed and unsigned for ints (lt_s, lt_u, le_s, …); ordered for floats (lt, le, gt, ge, eq, ne).
  • Arithmetic: add, sub, mul, div_s / div_u (ints), div (floats), rem_s / rem_u.
  • Bitwise: and, or, xor (ints).
  • Shifts: shl, shr_s, shr_u, rotl, rotr (ints).
  • Conversion: every cross-type cast in the Core spec (i32.wrap_i64, i64.extend_i32_s/_u, i32.trunc_f32_s/_u/…, f32.convert_i32_s/_u/…, f32.demote_f64, f64.promote_f32).
  • Reinterpretation: i32.reinterpret_f32, f64.reinterpret_i64, etc. (bit-level recasts that don’t change the value’s bits).

IEEE-754 results are deterministic across JVM, Scala.js, and Scala Native — including NaN bit patterns, signed-zero, and subnormal edges.

Sign-extension proposal

i32.extend8_s, i32.extend16_s, i64.extend8_s, i64.extend16_s, i64.extend32_s. Lifts a narrow signed value into the full operand-stack width. Required by rustc-built binaries.
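
As a rough host-side illustration of what these five ops compute (a sketch only, not the interpreter's own code; the function names are made up here), sign extension amounts to narrowing to the small signed type and widening back:

```scala
// Illustrative helpers; names are hypothetical, not part of the interpreter's API.
// Narrowing keeps the low bits; widening back replicates the narrow sign bit.
def i32Extend8S(x: Int): Int    = x.toByte.toInt
def i32Extend16S(x: Int): Int   = x.toShort.toInt
def i64Extend8S(x: Long): Long  = x.toByte.toLong
def i64Extend16S(x: Long): Long = x.toShort.toLong
def i64Extend32S(x: Long): Long = x.toInt.toLong
```

Bits above the narrow width are ignored, so 0x17F extends from its low byte 0x7F rather than from the full value.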

Non-trapping float-to-int

The eight trunc_sat_* sub-opcodes under the 0xFC prefix (sub-opcodes 0..7):

  • i32.trunc_sat_f32_s / _u, i32.trunc_sat_f64_s / _u
  • i64.trunc_sat_f32_s / _u, i64.trunc_sat_f64_s / _u

Where trunc_f32_s of NaN or out-of-range traps, trunc_sat_f32_s returns 0 for NaN and saturates at the type’s MIN_VALUE / MAX_VALUE for out-of-range. Required by rustc binaries built with -C target-feature=+nontrapping-fptoint (now the default on stable).
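
The saturating rule described above can be sketched for the two f32 → i32 forms as follows. This is a minimal illustration under the stated semantics, with hypothetical function names; the unsigned result is returned as the raw 32 bits in a Scala Int.

```scala
// Hypothetical helper names; a sketch of the saturating rule, not the interpreter's code.
def truncSatF32S(x: Float): Int =
  if x.isNaN then 0
  else if x >= 2147483648.0f then Int.MaxValue   // at or above 2^31
  else if x <= -2147483648.0f then Int.MinValue  // at or below -2^31
  else x.toInt                                   // in range: truncate toward zero

// Unsigned result carried as raw bits: 0xFFFFFFFF shows up as -1.
def truncSatF32U(x: Float): Int =
  if x.isNaN || x <= 0.0f then 0
  else if x >= 4294967296.0f then -1             // at or above 2^32
  else x.toLong.toInt                            // in range: keep the low 32 bits
```

Reading an unsigned result back on the host side is a matter of Integer.toUnsignedLong(result).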

Variables

local.get, local.set, local.tee, global.get, global.set. Mutable and immutable globals are both supported; global.set on an immutable global is caught by the validator.

Control flow

block, loop, if / else / end, br, br_if, br_table, return, call, call_indirect, unreachable, nop.

Multi-value blocks, loops, and ifs are supported — block parameters get re-fed on br to a loop, br_if carries multi-result values, etc.

Tail calls

The tail-call proposal is supported: return_call funcidx (0x12) and return_call_indirect typeidx tableidx (0x13) replace the current call frame instead of growing the call stack. The callee’s results must equal the current function’s results; the validator enforces this. Frame reuse is observable: deeply recursive tail calls (the test suite exercises 100k iterations) run in constant frame memory.

Exception handling

Both forms of the exception-handling proposal are supported end-to-end: the legacy “phase 3” form (try / catch / catch_all / delegate / rethrow, what wasmtime + V8 + SpiderMonkey + wat2wasm’s --enable-exceptions emit today) and the modern try_table form (0x1F plus an exnref value type and throw_ref, the phase-4 redesign that’s standardising now).

Legacy form

  • try blocktype (0x06) — open a block-shaped region whose body can be guarded by one or more catch clauses, a single catch_all, or a single delegate.
  • catch tagidx (0x07) — handle an exception whose tag matches tagidx; the tag’s payload params are pushed onto the operand stack at handler entry.
  • catch_all (0x19) — handle any exception regardless of tag; no payload is pushed.
  • delegate labelidx (0x18) — terminator that replaces end; on a throw out of the try body, re-fire the exception at the named outer label (must be a try or the function frame).
  • throw tagidx (0x08) — pop the tag’s payload params off the operand stack and raise the matching exception.
  • rethrow labelidx (0x09) — re-raise the exception caught by the named outer catch handler. Only valid inside a catch / catch_all clause.

A catch / catch_all arrived at by normal fall-through (i.e. the try body completed without throwing) is dead code; control jumps past the entire try/catch chain.

Modern try_table form

A single new opcode replaces the try / catch / delegate / rethrow cluster. The handler vector is parsed up front as an immediate, then the body runs as a regular block.

  • try_table blocktype vec(catch-clause) (0x1F) — open a block-shaped region. Each catch clause selects a tagidx (or wildcard) and a labelidx to branch to when a matching throw escapes the body.
  • throw_ref (0x0A) — pop an exnref and re-raise the carried exception. Replaces rethrow.
  • exnref valtype (wire byte 0x69) — carries a caught exception. Bound by catch_ref / catch_all_ref handler clauses; consumed by throw_ref. Locals, params, results, and blocktypes may all be exnref.

The four catch-clause shapes (encoded as a byte before each clause’s immediates):

| Byte | Clause | Branch arity at target |
|---|---|---|
| 0x00 | catch tagidx labelidx | tag’s payload params |
| 0x01 | catch_ref tagidx labelidx | tag’s payload params + exnref |
| 0x02 | catch_all labelidx | (empty) |
| 0x03 | catch_all_ref labelidx | exnref |

labelidx is counted with the try_table frame on the control stack: labelidx 0 names the try_table itself, 1 the next outer block, etc. On a matching throw delivery, the runtime trims the operand stack to the try_table’s entry height, pushes the handler’s declared payload, then performs the equivalent of br labelidx. Clauses are scanned in declared order; the first match wins.
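
The “declared order, first match wins” scan can be pictured with a small model. The enum and function below are hypothetical names introduced for this sketch; they are not the interpreter's data structures.

```scala
// Hypothetical model of the four clause shapes; names are illustrative only.
enum CatchClause:
  case Catch(tag: Int, label: Int)
  case CatchRef(tag: Int, label: Int)
  case CatchAll(label: Int)
  case CatchAllRef(label: Int)

import CatchClause.*

// Scan clauses in declared order and return the first one that accepts the thrown tag.
def firstMatch(clauses: Seq[CatchClause], thrownTag: Int): Option[CatchClause] =
  clauses.find {
    case Catch(tag, _)                => tag == thrownTag
    case CatchRef(tag, _)             => tag == thrownTag
    case CatchAll(_) | CatchAllRef(_) => true // wildcard: matches any tag
  }
```

One consequence of this rule is that a catch_all placed before a tagged clause shadows it, so clause order in the module matters.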

Tags

Tags are declared in Section 13 or imported from a host module under import-kind 0x04. Each tag references a functype in section 1 whose results must be empty — the params are the tag’s payload shape. A throw whose tag has params (i32, i64) pops two values (top of stack = last param). The validator rejects tagidxs out of range, rethrows outside any catch frame, delegates whose target is not an enclosing try or the function frame, tag functypes with non-empty results, and try_table catch clauses whose payload doesn’t match the target label’s branch arity.

An uncaught exception that propagates past the outermost _start/invoke call surfaces through the public API as Left(WasmError.UncaughtException(tagIdx, args)). The host can pattern-match against the tagidx and re-throw as a native exception.

Memory

  • Load/store: every width variant: i32.load, i32.load8_s/_u, i32.load16_s/_u, i64.load, i64.load8_s/…/load32_s/_u, f32.load, f64.load, plus all matching stores.

  • Sizing: memory.size and memory.grow. The optional max from section 5 is honoured: grow past it returns -1 rather than expanding.

  • Bulk-memory: the full proposal, all seven ops under the 0xFC prefix:

    • memory.copy (sub 0x0A), memory.fill (sub 0x0B).
    • memory.init (sub 0x08), data.drop (sub 0x09).
    • table.init (sub 0x0C), elem.drop (sub 0x0D), table.copy (sub 0x0E).

    memory.init / table.init copy from passive data / element segments; data.drop / elem.drop mark a segment as zero-length (idempotent). Active segments are still initialised at instantiation and then marked dropped automatically — subsequent *.init with n > 0 traps, matching wasmtime / V8 / wabt semantics.

Passive vs active data + element segments

Section 11 (data) and section 9 (element) carry sealed-trait segment kinds. Active segments behave as before (copied at instantiation). Passive segments stay addressable by dataidx / elemidx until the matching .drop. Declarative element segments pre-declare funcrefs for ref.func. Element-expression-bearing forms (flags 4..7) parse ref.null and ref.func as their constant expressions; segments may carry either funcref or externref payloads.

Reference types

Funcref + externref ride on a small set of new opcodes:

  • ref.null (0xD0 + reftype byte) — typed null reference. ref.null func produces a null funcref; ref.null extern a null externref.
  • ref.is_null (0xD1) — pop a reference, push 1 if it’s a ref.null, else 0.
  • ref.func (0xD2 + funcidx LEB) — produce a non-null funcref pointing at the named function. The validator enforces that the funcidx is declared — i.e. it appears in an export, in start, or in any element segment. Body-only references would be circular and so don’t count.
  • table.get (0x25 + tableidx LEB), table.set (0x26 + tableidx LEB) — read / write a table slot. Operand type is the table’s reftype.
  • table.size (0xFC sub 16 + tableidx LEB), table.grow (0xFC sub 15 + tableidx LEB), table.fill (0xFC sub 17 + tableidx LEB) — runtime-side table resize + range fill, with a typed fill value.

Externref slots carry an opaque host AnyRef. Wasm code can only move them around (table.{get,set}, local.{get,set}, global.{get,set}, ref.is_null); inspection happens host-side via the public API. The host hands them in as Value.RefExtern(yourObject) and pulls them back as the same identity.

call_indirect is now spec-restricted to funcref tables (an externref table can’t carry callable funcrefs); the validator rejects mismatches before code runs.

Tables + functions

Section 4 funcref + externref tables. call_indirect does a signature check at the call site against the operand-stack types and the target function’s declared type; a mismatch traps with Left(InvalidModule("call_indirect type mismatch")).

Stack

drop, select. Two select forms:

  • Untyped select (0x1B) — operand types are inferred. Spec-restricted to numeric value types when reference types are present; a reftype operand is rejected at validation with a “use select t*” diagnostic.
  • Typed select t* (0x1C) — explicit operand type, encoded as 0x1C u32:count valtype[count] with count == 1 (multi-value select isn’t enabled by any shipped proposal). Required for funcref / externref operands; also accepts the four numeric scalars.

SIMD

The WebAssembly SIMD proposal adds ~236 opcodes under the 0xFD prefix and a new V128 value type (16 raw bytes, lane interpretation chosen per-opcode). The full surface is implemented.

The V128 value type (host side)

final case class V128(bits: Array[Byte]) extends Value

A SIMD value is a raw 16-byte buffer; the lane shape (i8x16, i16x8, i32x4, i64x2, f32x4, f64x2) is not carried on the value — it’s chosen per-opcode at use time. Same 16 bytes, six possible interpretations. The interpreter enforces bits.length == 16 on every constructed value.

Byte order is little-endian per the spec: lane 0 of any shape starts at byte 0, the low byte of each lane comes first, and v128.load reads bytes in memory order into the same positions. So to build a V128 from four i32 lanes:

def i32x4(a: Int, b: Int, c: Int, d: Int): V128 =
  val buf = new Array[Byte](16)
  val bb  = java.nio.ByteBuffer.wrap(buf).order(java.nio.ByteOrder.LITTLE_ENDIAN)
  bb.putInt(a).putInt(b).putInt(c).putInt(d)
  V128(buf)

inst.invoke("dot", Seq(i32x4(1, 2, 3, 4), i32x4(5, 6, 7, 8)))

To inspect a V128 result from invoke:

inst.invoke("compute") match
  case Right(Seq(V128(bs))) =>
    val bb    = java.nio.ByteBuffer.wrap(bs).order(java.nio.ByteOrder.LITTLE_ENDIAN)
    val lanes = Array(bb.getInt, bb.getInt, bb.getInt, bb.getInt)
    println(lanes.mkString("[", ", ", "]"))
  case _ => ???

The test suite’s TestSupport.simd object has helpers (fromI8, fromI16, fromI32, fromI64, fromF32, fromF64) for each lane shape — they’re test-scope but easy to copy if your host code needs the same builders.

V128 equality: the case class derives equals from Array[Byte] reference equality (Scala’s Array doesn’t define structural equals). So V128(a) == V128(b) is true only if a eq b. Compare the bytes directly if you want value equality.
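
A minimal sketch of a byte-wise comparison, assuming a stand-in V128 that mirrors the case class shown earlier (the real one extends Value; sameBits is a hypothetical helper name):

```scala
// Stand-in for the library's V128 so the sketch is self-contained.
final case class V128(bits: Array[Byte])

// Structural comparison of the 16 payload bytes.
def sameBits(a: V128, b: V128): Boolean =
  java.util.Arrays.equals(a.bits, b.bits)
```

With two separately allocated but identical buffers, V128(x) == V128(y) is false while sameBits returns true. For the same reason, V128 values are unreliable as keys in a hash map or set.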

Foundations

  • V128 value type (wire byte 0x7B). First-class in function params, results, locals, globals, and blocktypes. Locals zero-init to 16 zero bytes.
  • v128.const (0xFD 0x0C + 16 raw little-endian bytes). The wat-side annotations (i32x4 1 2 3 4, i16x8 ..., etc.) are text-form only; the binary just sees 16 opaque bytes.
  • 0xFD prefix dispatch — the SIMD sub-opcode is LEB-encoded.

Tests cover raw byte round-trips, parameter / local / block-result plumbing, zero-init, and the wat-form lane-annotation equivalence (v128.const i8x16 and v128.const i16x8 of the same byte payload produce identical V128 values).

Loads + stores

Every load is [i32 addr] → [v128]; the store is [i32 addr, v128 value] → []. All ops carry a multi-memory-shaped memarg (the alignment LEB’s bit 6 flags an optional memidx LEB; offset follows).

| Opcode | Sub | What it does |
|---|---|---|
| v128.load | 0x00 | Full 16-byte little-endian load. |
| v128.load8x8_s / _u | 0x01 / 0x02 | Read 8 source bytes, widen each (sign / zero) into 8 i16 lanes. |
| v128.load16x4_s / _u | 0x03 / 0x04 | Read 4 source i16s, widen each into 4 i32 lanes. |
| v128.load32x2_s / _u | 0x05 / 0x06 | Read 2 source i32s, widen each into 2 i64 lanes. |
| v128.load8_splat | 0x07 | Read 1 byte, broadcast across all 16 lanes. |
| v128.load16_splat | 0x08 | Read 2 bytes, broadcast across all 8 i16 lanes. |
| v128.load32_splat | 0x09 | Read 4 bytes, broadcast across all 4 i32 lanes. |
| v128.load64_splat | 0x0A | Read 8 bytes, broadcast across all 2 i64 lanes. |
| v128.store | 0x0B | Full 16-byte little-endian store. |
| v128.load32_zero | 0x5C | Read 4 bytes into lane 0; zero the remaining 12. |
| v128.load64_zero | 0x5D | Read 8 bytes into lane 0; zero the remaining 8. |

Out-of-bounds (addr + offset + width past memory end) traps with MemoryOutOfBounds, same shape as the scalar memory ops.

Lane access

Every “build / inspect / rearrange a v128 lane-by-lane” surface lives here. Lane shapes — i8x16, i16x8, i32x4, i64x2, f32x4, f64x2 — pick the lane width (1/2/4/8 bytes) and the count (16/8/4/2 lanes). Lane immediates are validated < lane_count at compile time.

| Opcode | Sub | What it does |
|---|---|---|
| i8x16.shuffle | 0x0D | 16-byte laneidx immediate (each < 32); each result lane is a[c] if c<16 else b[c-16]. |
| i8x16.swizzle | 0x0E | Dynamic shuffle. s (top) is the index vector, v (below) is the source. Result lane i = v[s[i]] if s[i] < 16 else 0. |
| i8x16.splat | 0x0F | Broadcast the low 8 bits of an i32 to 16 lanes. |
| i16x8.splat | 0x10 | Broadcast the low 16 bits LE to 8 lanes. |
| i32x4.splat | 0x11 | Broadcast 4 LE bytes to 4 lanes. |
| i64x2.splat | 0x12 | Broadcast 8 LE bytes to 2 lanes. |
| f32x4.splat | 0x13 | Broadcast the IEEE-754 bit pattern of an f32 to 4 lanes. |
| f64x2.splat | 0x14 | Broadcast the IEEE-754 bit pattern of an f64 to 2 lanes. |
| i8x16.extract_lane_s / _u | 0x15 / 0x16 | Read 1 byte at lane (signed / zero extended to i32). |
| i8x16.replace_lane | 0x17 | Write the low byte of an i32 at the lane. |
| i16x8.extract_lane_s / _u | 0x18 / 0x19 | Read 2 LE bytes at lane (signed / zero extended to i32). |
| i16x8.replace_lane | 0x1A | Write the low 16 bits LE at the lane. |
| i32x4.extract_lane | 0x1B | Read 4 LE bytes at lane → i32. |
| i32x4.replace_lane | 0x1C | Write 4 LE bytes at lane. |
| i64x2.extract_lane | 0x1D | Read 8 LE bytes at lane → i64. |
| i64x2.replace_lane | 0x1E | Write 8 LE bytes at lane. |
| f32x4.extract_lane | 0x1F | Read 4 LE bytes at lane → f32 (raw IEEE-754 bits, no NaN canonicalisation). |
| f32x4.replace_lane | 0x20 | Write 4 LE bytes at lane. |
| f64x2.extract_lane | 0x21 | Read 8 LE bytes at lane → f64. |
| f64x2.replace_lane | 0x22 | Write 8 LE bytes at lane. |

extract_lane / replace_lane carry a 1-byte lane immediate after the sub-opcode; i8x16.shuffle carries a 16-byte laneidx vector. splat and swizzle have no immediate beyond the sub-opcode.
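
The two rearranging ops differ only in where the indices come from. A minimal sketch over plain 16-byte arrays (hypothetical helper names; lane indices are assumed to be already validated for shuffle):

```scala
// Dynamic form: indices come from the operand stack (vector s), read as unsigned bytes.
def swizzle(v: Array[Byte], s: Array[Byte]): Array[Byte] =
  Array.tabulate(16) { i =>
    val idx = s(i) & 0xFF
    if idx < 16 then v(idx) else 0.toByte // out-of-range index selects zero
  }

// Static form: indices come from the 16-byte immediate, each assumed < 32.
def shuffle(a: Array[Byte], b: Array[Byte], lanes: Array[Int]): Array[Byte] =
  Array.tabulate(16) { i =>
    val c = lanes(i)
    if c < 16 then a(c) else b(c - 16)
  }
```

An index byte of 0xFF is 255 unsigned, so swizzle treats it as out of range and produces 0 for that lane.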

Integer arithmetic

Lane-wise integer arithmetic across every integer shape. Plain add / sub / mul wrap modulo 2^lane_width; _sat_s / _sat_u clamp at the signed / unsigned bounds; avgr_u is the rounding unsigned average (a + b + 1) / 2. No i8x16.mul in the spec, no saturating variants past i16x8, no avgr_u past i16x8. All ops have no immediate.
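
For a single i8 lane, the saturating and rounding-average rules above can be sketched like this (hypothetical helper names; lanes are modelled as Scala Byte, so unsigned values 128..255 appear as negative bytes):

```scala
// Signed saturating add: compute in Int, clamp to [-128, 127].
def addSatS8(a: Byte, b: Byte): Byte =
  math.max(-128, math.min(127, a + b)).toByte

// Unsigned saturating add: read both lanes as 0..255, clamp the sum at 255.
def addSatU8(a: Byte, b: Byte): Byte =
  math.min(255, (a & 0xFF) + (b & 0xFF)).toByte

// Rounding unsigned average: (a + b + 1) / 2 on the unsigned readings.
def avgrU8(a: Byte, b: Byte): Byte =
  (((a & 0xFF) + (b & 0xFF) + 1) >> 1).toByte
```

Doing the arithmetic at Int width first is what keeps the intermediate sum from wrapping before the clamp is applied.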

| Opcode | Sub | What it does |
|---|---|---|
| i8x16.abs | 0x60 | abs(MinValue) wraps to MinValue (overflow mod 2^8). |
| i8x16.neg | 0x61 | neg(MinValue) wraps likewise. |
| i8x16.add | 0x6E | Wraps mod 256 per lane. |
| i8x16.add_sat_s / _u | 0x6F / 0x70 | Clamps to [-128, 127] / [0, 255]. |
| i8x16.sub | 0x71 | Wraps mod 256. |
| i8x16.sub_sat_s / _u | 0x72 / 0x73 | Clamps to the signed / unsigned lane bounds. |
| i8x16.avgr_u | 0x7B | (a + b + 1) / 2 per lane (rounds up). |
| i16x8.abs / neg | 0x80 / 0x81 | Same shape as i8x16. |
| i16x8.add / add_sat_s / add_sat_u | 0x8E / 0x8F / 0x90 | Wrap / clamp to [-32768, 32767] / [0, 65535]. |
| i16x8.sub / sub_sat_s / sub_sat_u | 0x91 / 0x92 / 0x93 | Wrap / clamp. |
| i16x8.mul | 0x95 | Low 16 bits of the full-width product. |
| i16x8.avgr_u | 0x9B | (a + b + 1) / 2 per lane unsigned. |
| i32x4.abs / neg | 0xA0 / 0xA1 | abs(Int.MinValue) wraps. |
| i32x4.add / sub / mul | 0xAE / 0xB1 / 0xB5 | Wrap mod 2^32; mul is the low 32 bits. |
| i64x2.abs / neg | 0xC0 / 0xC1 | abs(Long.MinValue) wraps. |
| i64x2.add / sub / mul | 0xCE / 0xD1 / 0xD5 | Wrap mod 2^64. |

Sub-opcodes ≥ 0x80 encode as 2-byte LEBs in the binary; wat2wasm emits the right shape, and the dispatch’s LEB decoder handles either width transparently.
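
To make the one-byte versus two-byte point concrete, here is a minimal unsigned LEB128 reader. It is a sketch with a hypothetical name and signature, and it omits the bounds and over-length checks a real decoder needs.

```scala
// Returns (decoded value, position just past the last byte consumed).
def readLebU32(bytes: Array[Byte], start: Int): (Int, Int) =
  var result = 0
  var shift  = 0
  var pos    = start
  var more   = true
  while more do
    val b = bytes(pos) & 0xFF
    result |= (b & 0x7F) << shift // low 7 bits carry payload
    shift += 7
    pos += 1
    more = (b & 0x80) != 0        // high bit set means another byte follows
  (result, pos)
```

Under this encoding, sub-opcode 0x0C (v128.const) is the single byte 0x0C, while 0x80 (i16x8.abs) is the two bytes 0x80 0x01.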

Shifts + min/max

Lane-wise shifts (shl, shr_s, shr_u) on all four integer shapes, plus per-lane signed and unsigned min/max on i8x16, i16x8, i32x4 (the spec excludes i64x2.min/max). Shifts pop the i32 count from the operand stack — it’s not an immediate — and the spec takes count mod lane_width, so e.g. i8x16.shl(_, 8) is the identity.

| Opcode | Sub | What it does |
|---|---|---|
| i8x16.shl | 0x6B | Shift left; count mod 8. |
| i8x16.shr_s / _u | 0x6C / 0x6D | Arithmetic / logical right shift; count mod 8. |
| i8x16.min_s / _u | 0x76 / 0x77 | Per-lane signed / unsigned minimum. |
| i8x16.max_s / _u | 0x78 / 0x79 | Per-lane signed / unsigned maximum. |
| i16x8.shl | 0x8B | Shift left; count mod 16. |
| i16x8.shr_s / _u | 0x8C / 0x8D | Arithmetic / logical right shift; count mod 16. |
| i16x8.min_s / _u | 0x96 / 0x97 | Per-lane signed / unsigned minimum. |
| i16x8.max_s / _u | 0x98 / 0x99 | Per-lane signed / unsigned maximum. |
| i32x4.shl | 0xAB | Shift left; count mod 32. |
| i32x4.shr_s / _u | 0xAC / 0xAD | Arithmetic / logical right shift; count mod 32. |
| i32x4.min_s / _u | 0xB6 / 0xB7 | Per-lane signed / unsigned minimum. |
| i32x4.max_s / _u | 0xB8 / 0xB9 | Per-lane signed / unsigned maximum. |
| i64x2.shl | 0xCB | Shift left; count mod 64. |
| i64x2.shr_s / _u | 0xCC / 0xCD | Arithmetic / logical right shift; count mod 64. |

The signed vs unsigned distinction matters at the lane width: byte 0xFF is -1 signed but 255 unsigned, so i8x16.shr_s of it by 1 stays -1 (0xFF) while i8x16.shr_u of it by 1 becomes 0x7F; compared against a lane holding 0, i8x16.min_s picks -1 as the minimum but i8x16.min_u picks 0.
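
The shift behaviour for the i8x16 shape can be sketched over a plain byte array (hypothetical helper names; the count is reduced mod 8 as described above):

```scala
def i8x16Shl(v: Array[Byte], count: Int): Array[Byte] =
  v.map(b => (b << (count & 7)).toByte)

// Arithmetic shift: the Byte is sign-extended to Int before shifting.
def i8x16ShrS(v: Array[Byte], count: Int): Array[Byte] =
  v.map(b => (b >> (count & 7)).toByte)

// Logical shift: mask to the unsigned reading 0..255 first.
def i8x16ShrU(v: Array[Byte], count: Int): Array[Byte] =
  v.map(b => ((b & 0xFF) >>> (count & 7)).toByte)
```

On a vector of 0xFF bytes, a shift by 1 leaves every lane at 0xFF for the signed form and yields 0x7F for the unsigned form; a count of 8 leaves the input unchanged in all three.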

Float arithmetic

Lane-wise IEEE-754 arithmetic across f32x4 and f64x2. Rounding (ceil, floor, trunc, nearest) and abs / neg / sqrt are unary; add / sub / mul / div and min / max / pmin / pmax are binary. All ops have no immediate.

abs / neg are bit-level (clear / flip the sign bit) and preserve NaN payloads exactly, which is useful for round-tripping signaling NaNs. Arithmetic ops produce a NaN when any operand is NaN; the bit pattern follows the same implementation-defined rule as scalar f32.add / f64.add (in practice the JVM’s canonical 0x7FC00000 / 0x7FF8000000000000).

min and max use IEEE-754 semantics: NaN-involving inputs produce NaN, and -0 orders below +0. pmin(a,b) and pmax(a,b) follow the spec’s compare-then-pick formula — pmin = if b<a then b else a, pmax = if a<b then b else a — which means a NaN-involving compare always returns false, so a is picked. That gives pmin / pmax a different NaN behaviour from min / max: a non-NaN a paired with a NaN b yields a (no NaN propagation), but a NaN a always propagates.
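
The compare-then-pick formula is short enough to write out per lane. The sketch below uses hypothetical helper names and contrasts them with scala.math.min, which delegates to java.lang.Math.min and propagates NaN:

```scala
// Per-lane pseudo-minimum / pseudo-maximum, exactly as the formula reads.
def pmin(a: Float, b: Float): Float = if b < a then b else a
def pmax(a: Float, b: Float): Float = if a < b then b else a

// NaN-propagating minimum for comparison.
def propagatingMin(a: Float, b: Float): Float = math.min(a, b)
```

Here pmin(1f, NaN) is 1f because the comparison is false and a is kept, pmin(NaN, 1f) is NaN for the same reason, and propagatingMin returns NaN in both argument orders.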

| Opcode | Sub | What it does |
|---|---|---|
| f32x4.ceil | 0x67 | Round each lane toward +Inf. |
| f32x4.floor | 0x68 | Round each lane toward -Inf. |
| f32x4.trunc | 0x69 | Round each lane toward zero (preserves signed zero). |
| f32x4.nearest | 0x6A | Round half to even per lane. |
| f64x2.ceil / floor | 0x74 / 0x75 | Same shape as f32x4, double precision. |
| f64x2.trunc | 0x7A | |
| f64x2.nearest | 0x94 | |
| f32x4.abs / neg | 0xE0 / 0xE1 | Bit-twiddle the sign bit; NaN payloads preserved. |
| f32x4.sqrt | 0xE3 | sqrt(x) = NaN for x < 0; sqrt(-0) = -0. |
| f32x4.add / sub / mul / div | 0xE4 / 0xE5 / 0xE6 / 0xE7 | IEEE-754 per lane. 0/0 = NaN, 1/0 = +Inf. |
| f32x4.min / max | 0xE8 / 0xE9 | NaN → NaN; min(-0,+0) = -0. |
| f32x4.pmin / pmax | 0xEA / 0xEB | if b<a then b else a / if a<b then b else a; NaN-compare picks a. |
| f64x2.abs / neg | 0xEC / 0xED | Bit-level sign manipulation. |
| f64x2.sqrt | 0xEF | |
| f64x2.add / sub / mul / div | 0xF0 / 0xF1 / 0xF2 / 0xF3 | |
| f64x2.min / max | 0xF4 / 0xF5 | |
| f64x2.pmin / pmax | 0xF6 / 0xF7 | |

Bitwise + reductions

Six bitwise ops that ignore lane shape (the v128 is just 16 raw bytes), plus nine v128 → i32 reductions. v128.bitselect is the only SIMD ternary op — it takes three v128 operands (a, b, c) and returns (a AND c) OR (b AND NOT c), where c is the selector mask. All 15 ops have no immediate past the sub-opcode.

any_true is shape-agnostic — any byte non-zero returns 1, otherwise 0. *.all_true is lane-shape-aware: a v128 whose bytes are [0, 1, 0, 1, 0, 1, ...] is i8x16.all_true = 0 (every other byte is zero) but i16x8.all_true = 1 (every 16-bit lane is non-zero). *.bitmask packs the MSB of each lane into the i32 result at the lane-indexed bit position (lane 0 → bit 0).
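
The lane-shape dependence of all_true, and the bit layout of bitmask, can be sketched over a raw 16-byte array (hypothetical helper names, little-endian lanes as described earlier):

```scala
def i8x16AllTrue(v: Array[Byte]): Int =
  if v.forall(_ != 0) then 1 else 0

// A 16-bit lane is non-zero when either of its two bytes is non-zero.
def i16x8AllTrue(v: Array[Byte]): Int =
  if v.grouped(2).forall(lane => lane(0) != 0 || lane(1) != 0) then 1 else 0

// Lane i's top bit lands in result bit i; a negative Byte has its top bit set.
def i8x16Bitmask(v: Array[Byte]): Int =
  (0 until 16).foldLeft(0)((m, i) => if v(i) < 0 then m | (1 << i) else m)
```

For the alternating [0, 1, 0, 1, ...] pattern from the text, i8x16AllTrue returns 0 and i16x8AllTrue returns 1.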

| Opcode | Sub | What it does |
|---|---|---|
| v128.not | 0x4D | Bitwise complement of all 16 bytes. |
| v128.and | 0x4E | Bitwise AND. |
| v128.andnot | 0x4F | a AND (NOT b); note the asymmetry. |
| v128.or | 0x50 | Bitwise OR. |
| v128.xor | 0x51 | Bitwise XOR. |
| v128.bitselect | 0x52 | (a AND c) OR (b AND NOT c); ternary. |
| v128.any_true | 0x53 | 1 if any bit is set, else 0. |
| i8x16.all_true | 0x63 | 1 iff every byte is non-zero. |
| i16x8.all_true | 0x83 | 1 iff every 16-bit lane is non-zero. |
| i32x4.all_true | 0xA3 | 1 iff every i32 lane is non-zero. |
| i64x2.all_true | 0xC3 | 1 iff both i64 lanes are non-zero. |
| i8x16.bitmask | 0x64 | Top bit of each byte → 16-bit mask in i32. |
| i16x8.bitmask | 0x84 | Top bit of each i16 lane → 8-bit mask. |
| i32x4.bitmask | 0xA4 | Top bit of each i32 lane → 4-bit mask. |
| i64x2.bitmask | 0xC4 | Top bit of each i64 lane → 2-bit mask. |

Comparisons

Lane-wise compare ops produce a result lane that is all-1s on true (0xFF… — the bitmask shape v128.bitselect consumes natively) and all-0s on false, in the same lane width as the inputs. Every op is v128 × v128 → v128, no immediate past the sub-opcode. 48 ops in total — three full integer shapes (i8x16 / i16x8 / i32x4) get the full eq, ne, lt_s, lt_u, gt_s, gt_u, le_s, le_u, ge_s, ge_u set; i64x2 gets the six signed forms only (the spec defines no _u variants for i64); f32x4 and f64x2 each get eq, ne, lt, gt, le, ge (no signedness — floats are inherently signed).

IEEE-754 NaN: every f32/f64 compare returns false when either operand is NaN, except ne which returns true. IEEE-754 signed zero: -0.0 == +0.0 is true and -0.0 < +0.0 is false.

| Opcode | Sub | Notes |
|---|---|---|
| i8x16.eq / ne | 0x23 / 0x24 | Bit-pattern equality; sign-form doesn’t matter. |
| i8x16.lt_s / lt_u | 0x25 / 0x26 | -1 < 0 is true signed, false unsigned (0xFF > 0). |
| i8x16.gt_s / gt_u | 0x27 / 0x28 | |
| i8x16.le_s / le_u | 0x29 / 0x2A | |
| i8x16.ge_s / ge_u | 0x2B / 0x2C | |
| i16x8.eq / ne | 0x2D / 0x2E | |
| i16x8.lt_s / lt_u | 0x2F / 0x30 | Sign-form chooses between two reads of the 16-bit lane. |
| i16x8.gt_s / gt_u | 0x31 / 0x32 | |
| i16x8.le_s / le_u | 0x33 / 0x34 | |
| i16x8.ge_s / ge_u | 0x35 / 0x36 | |
| i32x4.eq / ne | 0x37 / 0x38 | |
| i32x4.lt_s / lt_u | 0x39 / 0x3A | _u uses Integer.compareUnsigned per lane. |
| i32x4.gt_s / gt_u | 0x3B / 0x3C | |
| i32x4.le_s / le_u | 0x3D / 0x3E | |
| i32x4.ge_s / ge_u | 0x3F / 0x40 | |
| i64x2.eq / ne | 0xD6 / 0xD7 | |
| i64x2.lt_s / gt_s | 0xD8 / 0xD9 | i64x2 has signed-only compares per spec. |
| i64x2.le_s / ge_s | 0xDA / 0xDB | |
| f32x4.eq / ne | 0x41 / 0x42 | NaN-involving: every op false except ne. |
| f32x4.lt / gt | 0x43 / 0x44 | |
| f32x4.le / ge | 0x45 / 0x46 | |
| f64x2.eq / ne | 0x47 / 0x48 | |
| f64x2.lt / gt | 0x49 / 0x4A | |
| f64x2.le / ge | 0x4B / 0x4C | |

Narrow / extend / extadd_pairwise / extmul + float-int conv + demote / promote

42 ops covering everything that changes lane width or moves between integer and float lanes. Mechanically: narrow takes two source v128s and packs them into one with saturating clamps; extend (the spec’s name for “widen”) pulls half the source lanes and sign- or zero-extends each into the wider lane width; extadd_pairwise pairs adjacent narrower lanes and sums each pair (with extension) into one wider lane; extmul fuses extend + multiply at the wider lane width so the product fits exactly. The 8 float↔int conversions follow the scalar trunc_sat_* / convert_* rules per lane (NaN → 0, ±overflow saturates). The _zero suffix on the f64x2 / i32x4 form means “result has 4 lanes but only the first 2 carry data, the rest are 0”; _low on the inverse direction means “read only lanes 0..1 of the source”.
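
As an illustration of the narrow mechanics described above, here is a minimal sketch of i8x16.narrow_i16x8_s / _u over typed lane arrays. The helper names are hypothetical; note that both variants read the source lanes as signed i16 and differ only in the clamp range.

```scala
// Clamp one signed i16 lane into the signed / unsigned byte range.
def narrowS(x: Short): Byte = math.max(-128, math.min(127, x.toInt)).toByte
def narrowU(x: Short): Byte = math.max(0, math.min(255, x.toInt)).toByte

// Result lanes 0..7 come from a, lanes 8..15 from b.
def i8x16NarrowI16x8S(a: Array[Short], b: Array[Short]): Array[Byte] =
  (a ++ b).map(narrowS)

def i8x16NarrowI16x8U(a: Array[Short], b: Array[Short]): Array[Byte] =
  (a ++ b).map(narrowU)
```

A source lane of -1 therefore narrows to -1 in the signed form but to 0 in the unsigned form, since -1 lies below the unsigned range.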

| Opcode | Sub | What it does |
|---|---|---|
| i8x16.narrow_i16x8_s / _u | 0x65 / 0x66 | Pack 16 Short lanes into 16 saturated Byte lanes (signed [-128,127] / unsigned [0,255]). |
| i16x8.narrow_i32x4_s / _u | 0x85 / 0x86 | Pack 8 Int lanes into 8 saturated Short lanes. |
| i16x8.extend_low/high_i8x16_s / _u | 0x87..0x8A | Read 8 bytes (low or high half), sign- or zero-extend each to i16. |
| i32x4.extend_low/high_i16x8_s / _u | 0xA7..0xAA | Read 4 i16 lanes (half), extend to i32. |
| i64x2.extend_low/high_i32x4_s / _u | 0xC7..0xCA | Read 2 i32 lanes (half), extend to i64. |
| i16x8.extadd_pairwise_i8x16_s / _u | 0x7C / 0x7D | Pair adjacent bytes, sum with extension into 8 i16 lanes. |
| i32x4.extadd_pairwise_i16x8_s / _u | 0x7E / 0x7F | Pair adjacent i16 lanes, sum with extension into 4 i32 lanes. |
| i16x8.extmul_low/high_i8x16_s / _u | 0x9C..0x9F | Multiply extended low/high bytes at i16 precision (full product). |
| i32x4.extmul_low/high_i16x8_s / _u | 0xBC..0xBF | Multiply extended i16 lanes at i32 precision. |
| i64x2.extmul_low/high_i32x4_s / _u | 0xDC..0xDF | Multiply extended i32 lanes at i64 precision. |
| f32x4.demote_f64x2_zero | 0x5E | Round 2 f64 lanes to f32 lanes 0..1; lanes 2 + 3 zero-filled. |
| f64x2.promote_low_f32x4 | 0x5F | Widen f32 lanes 0..1 to f64. |
| i32x4.trunc_sat_f32x4_s / _u | 0xF8 / 0xF9 | Per-lane scalar trunc_sat: NaN → 0, ±overflow → INT_MIN/MAX (_s) or 0/0xFFFFFFFF (_u). |
| f32x4.convert_i32x4_s / _u | 0xFA / 0xFB | Per-lane int → f32; _u treats the signed lane as UInt32 first. |
| i32x4.trunc_sat_f64x2_s_zero / _u_zero | 0xFC / 0xFD | 2 f64 → i32 lanes 0..1; lanes 2 + 3 zero-filled. |
| f64x2.convert_low_i32x4_s / _u | 0xFE / 0xFF | Read i32 lanes 0..1 of source, widen to f64. |

Dot product + load_lane / store_lane

Nine ops: one pairwise multiply-add at i32 precision, and eight partial memory accesses that touch a single lane.

i32x4.dot_i16x8_s (the only “wider lane multiply-add” op in the spec) reads two i16x8 vectors, pairs up adjacent lanes (a[2k] * b[2k] + a[2k+1] * b[2k+1]), and produces an i32x4. The i16 lanes are sign-extended to i32 before multiplying, so each product fits exactly in i32; the pair-sum can overflow only at (-32768)² + (-32768)² = 2³¹, which wraps to Int.MinValue per the spec’s two’s-complement rule. No immediate past the sub-opcode.
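
The pairing formula translates almost directly into code. This is a sketch over typed lane arrays with a hypothetical function name, not the interpreter's byte-buffer implementation:

```scala
// Short * Short is computed at Int width in Scala, which matches the
// sign-extend-then-multiply rule; Int addition wraps on overflow.
def dotI16x8S(a: Array[Short], b: Array[Short]): Array[Int] =
  Array.tabulate(4)(k => a(2 * k) * b(2 * k) + a(2 * k + 1) * b(2 * k + 1))
```

With every lane of both inputs set to -32768, each result lane is 2³⁰ + 2³⁰, which wraps to Int.MinValue as described above.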

v128.load*_lane / v128.store*_lane are the only SIMD ops that carry BOTH a memarg AND a 1-byte lane immediate (after the sub-opcode: memarg LEBs, then a single byte for the lane index). Each load_lane reads N bytes from memory and places them at the named lane of the v128 operand (preserving every other lane); each store_lane writes N bytes from the named lane to memory. Operand stack: [i32 addr, v128 src] → [v128] for load, [i32 addr, v128 src] → [] for store. Lane index is validated < 16/8/4/2 depending on access width. Out-of-bounds (addr + offset + width > mem.size) traps with MemoryOutOfBounds.

| Opcode | Sub | What it does |
|---|---|---|
| i32x4.dot_i16x8_s | 0xBA | Pairwise multiply-then-add: lane k = a[2k]*b[2k] + a[2k+1]*b[2k+1] with i16 sign-extension to i32. Pair-sum wraps two’s-complement on overflow. |
| v128.load8_lane | 0x54 | Read 1 byte at addr, write into lane (lane idx < 16). Other lanes preserved. |
| v128.load16_lane | 0x55 | Read 2 LE bytes, write into i16 lane (< 8). |
| v128.load32_lane | 0x56 | Read 4 LE bytes, write into i32 lane (< 4). |
| v128.load64_lane | 0x57 | Read 8 LE bytes, write into i64 lane (< 2). |
| v128.store8_lane | 0x58 | Write 1 byte of lane (idx < 16) to memory. |
| v128.store16_lane | 0x59 | Write 2 LE bytes of lane (< 8) to memory. |
| v128.store32_lane | 0x5A | Write 4 LE bytes of lane (< 4) to memory. |
| v128.store64_lane | 0x5B | Write 8 LE bytes of lane (< 2) to memory. |

Relaxed SIMD

The relaxed-SIMD proposal adds 20 sub-opcodes (0x100..0x113) under the existing 0xFD SIMD prefix. The “relaxed” name reflects that the spec lets each op pick between two or more valid implementations per edge case (NaN handling, out-of-range conversion, sign-extension of partly-used operands); this interpreter pins one deterministic choice each, documented in simd_dispatch.scala and the SimdRelaxedTests fixtures.

| Sub-opcode | Op | Shape | Pinned semantics |
|---|---|---|---|
| 0x100 | i8x16.relaxed_swizzle | v128, v128 → v128 | Identical to non-relaxed i8x16.swizzle. |
| 0x101..0x104 | i32x4.relaxed_trunc_* | v128 → v128 | Identical to i32x4.trunc_sat_* (NaN → 0, overflow saturates). |
| 0x105..0x108 | f*x*.relaxed_(n)madd | v128, v128, v128 → v128 | Unfused: (±a*b) + c per lane. Portable across JVM / Scala.js / Native. |
| 0x109..0x10C | *.relaxed_laneselect | v128, v128, v128 → v128 | Per lane: high bit of mask’s lane picks a (set) or b (clear). |
| 0x10D..0x110 | f*x*.relaxed_min / _max | v128, v128 → v128 | java.lang.Math.min / max per lane; NaN propagates either way. |
| 0x111 | i16x8.relaxed_q15mulr_s | v128, v128 → v128 | Saturating signed Q15 multiply: sat_i16((a*b + 0x4000) >> 15). |
| 0x112 | i16x8.relaxed_dot_i8x16_i7x16_s | v128, v128 → v128 | Pair-sum of (signed-a × unsigned-b) byte products per i16 lane. |
| 0x113 | i32x4.relaxed_dot_i8x16_i7x16_add_s | v128, v128, v128 → v128 | 4-byte (signed × unsigned) sums per i32 lane plus an i32 accumulator. |
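
As a worked instance of one pinned choice, the Q15 formula from the table can be written per lane like this (a sketch with a hypothetical function name):

```scala
// sat_i16((a*b + 0x4000) >> 15): product at Int width, round, shift, clamp.
def q15mulrSatS(a: Short, b: Short): Short =
  val rounded = (a * b + 0x4000) >> 15
  math.max(Short.MinValue.toInt, math.min(Short.MaxValue.toInt, rounded)).toShort
```

The only input pair that reaches the clamp is -32768 × -32768, whose rounded result 32768 saturates to 32767.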

Threads + atomics

The threads proposal adds the 0xFE opcode prefix for atomic memory operations and a shared flag on memory limits. This interpreter is single-threaded, but the structural guarantees the proposal makes — alignment-checked load/store, single-step read-modify-write, compare-and-swap — all hold by construction. What’s covered:

| Sub-opcode | Op | Shape | Notes |
|---|---|---|---|
| 0x00 | memory.atomic.notify | i32 addr, i32 count → i32 | Always returns 0 (no peer threads). |
| 0x01 | memory.atomic.wait32 | i32 addr, i32 expected, i64 timeout → i32 | Trap if memory unshared; trap “would-block” if value matches expected; else return 1 (not-equal). |
| 0x02 | memory.atomic.wait64 | i32 addr, i64 expected, i64 timeout → i32 | Same as wait32 at 64-bit. |
| 0x03 | atomic.fence | | No-op (single-threaded host). |
| 0x10..0x16 | i32/i64.atomic.load[8_u/16_u/32_u] | i32 → i32 or i32 → i64 | Plain load + alignment check. |
| 0x17..0x1D | i32/i64.atomic.store[8/16/32] | … → | Plain store + alignment check. |
| 0x1E..0x47 | i32/i64.atomic.rmw{,8,16,32}.{add,sub,and,or,xor,xchg}[_u] | i32 addr, T v → T old | Returns OLD value, leaves op(old, v) in memory. |
| 0x48..0x4E | i32/i64.atomic.rmw{,8,16,32}.cmpxchg[_u] | i32 addr, T expected, T replacement → T old | Writes replacement iff old == expected. |

Limits-flag encoding:

  • flag & 0x01 — has-max (unchanged).
  • flag & 0x02 — shared memory. Validator enforces shared ⇒ has-max. Other bits are rejected with a clear diagnostic.

Validator rules unique to atomics:

  • The memarg’s align immediate must equal log2(accessWidth). Unlike regular load/store (where alignment is advisory), atomic ops require strict natural alignment — and the validator surfaces a mismatch as InvalidModule at instantiation, not at run time.
  • Unknown 0xFE sub-opcodes surface as UnknownOpcode(0xFE).

Runtime traps unique to atomics:

  • UnalignedAtomicAccess: the effective address (base + offset) was not naturally aligned to the access width.
  • ExpectedSharedMemory: memory.atomic.wait{32,64} ran against a non-shared memory.
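
The run-time alignment condition behind UnalignedAtomicAccess can be sketched as a small predicate. The name and signature are hypothetical; accessWidth is assumed to be a power of two (1, 2, 4, or 8), and base and offset are treated as unsigned 32-bit values.

```scala
def isNaturallyAligned(base: Int, offset: Int, accessWidth: Int): Boolean =
  // Add at Long width so the unsigned sum cannot wrap.
  val effective = Integer.toUnsignedLong(base) + Integer.toUnsignedLong(offset)
  (effective & (accessWidth - 1)) == 0
```

A 4-byte access with base 8 and offset 4 passes; the same access with offset 2 would trap.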

The “would-block” case for wait* (the operand value matches expected, so a real implementation would suspend the thread) traps with InvalidModule("…would block forever on a single-threaded host") rather than spinning. The not-equal early-return path is the only observable non-trap result on this interpreter.

Multi-memory

Modules may declare any number of linear memories. Each memory opcode threads a memidx through its immediate:

  • Load/store memarg — the multi-memory encoding repurposes bit 6 of the alignment LEB as a “memidx-present” flag. When set, a memidx LEB follows; alignment is the LEB with that bit cleared. Single-memory modules emit the original shape (no flag, memidx = 0 implicit).
  • memory.size / memory.grow / memory.fill — the byte that was a must-be-zero reserved slot becomes a memidx LEB.
  • memory.copy — two memidx LEBs (dst, src), allowing memory-to-memory copies between distinct memories.
  • memory.init — second immediate is a memidx LEB (was reserved).
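
The load/store memarg shape from the first bullet can be sketched as a small decoder. MemArg, lebU32, and readMemArg are hypothetical names for this illustration, and the LEB reader skips the bounds and over-length checks a real parser needs.

```scala
final case class MemArg(align: Int, memIdx: Int, offset: Int)

// Minimal unsigned LEB128 reader: returns (value, next position).
def lebU32(bytes: Array[Byte], start: Int): (Int, Int) =
  var result = 0
  var shift  = 0
  var pos    = start
  var more   = true
  while more do
    val b = bytes(pos) & 0xFF
    result |= (b & 0x7F) << shift
    shift += 7
    pos += 1
    more = (b & 0x80) != 0
  (result, pos)

def readMemArg(bytes: Array[Byte], start: Int): (MemArg, Int) =
  val (raw, p1)    = lebU32(bytes, start)
  val hasMemIdx    = (raw & 0x40) != 0                            // bit 6: memidx present
  val (memIdx, p2) = if hasMemIdx then lebU32(bytes, p1) else (0, p1)
  val (offset, p3) = lebU32(bytes, p2)
  (MemArg(raw & ~0x40, memIdx, offset), p3)                       // alignment with bit 6 cleared
```

Under this reading, the bytes 0x02 0x10 decode to alignment 2, memidx 0, offset 16, while 0x42 0x01 0x10 decode to the same access against memory 1.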

ModuleInstance.memories: Array[Memory] exposes the full vector; .memory keeps backwards compat returning memory 0. .exportedMemory(name) resolves an exported memory by name.

What isn’t implemented yet

| Group | Sub-opcodes | Status |
|---|---|---|
| GC proposal | struct.*, array.*, ref.cast, etc. | not planned |
| Component model | the packaging proposal | out of scope |

Each missing group is independently scoped — adding any one of them is a self-contained piece of work that doesn’t touch the others. See the project roadmap on GitHub for the active Phase-8 plan.

Validation pass

Every imported module runs through a separate validator before any code executes. See Concepts → Validation.

Binary sections

| Section | ID | What it carries |
|---|---|---|
| Type | 1 | Function signatures |
| Import | 2 | Functions, memories, globals, tables imported from the host |
| Function | 3 | Function-index → type-index mapping |
| Table | 4 | Funcref + externref tables |
| Memory | 5 | Linear-memory definitions |
| Global | 6 | Module-level globals (scalar + reftype) |
| Export | 7 | Names exposed to the host |
| Start | 8 | Function index run at instantiate time |
| Element | 9 | Table initializers (funcidx + elemexpr forms, funcref + externref) |
| Code | 10 | Function bodies |
| Data | 11 | Linear-memory initializers (active + passive) |
| DataCount | 12 | u32 = number of data segments; required when a function uses memory.init or data.drop |
| Tag | 13 | Exception tag declarations (attribute byte + typeidx) |

Custom sections are skipped harmlessly.
