2022 september 6
ian henderson <ian@ianhenderson.org>
wmc is an experiment in self-describing file format design.
wmc.img is a simple image format built using wmc. try it out using the web-based decoder!
the initial decoding process begins with three preinstalled functions.
the function preinstalled at index 0 reads one byte (the length) to memory address 0, then continues in the function at index 1.
the function preinstalled at index 1 reads 1 + length bytes (a one-byte index plus a length-byte function) to memory address 1, then continues in the function at index 2.
the function preinstalled at index 2 compiles the function data at memory address 2 and installs it at the index read to memory address 1, potentially replacing one of these three preinstalled functions. it then reads one byte (the length) to memory address 0 and continues in the function at index 1.
execution starts in the function at index 0.
practically, this means a wmc file begins with a list of functions encoded as a one-byte length, a one-byte index, and length bytes of function data. each function is compiled and installed at the given index. after installing a function at index 1, the preinstalled function at index 2 will read one more byte, then continue in the just-installed function, ending the list.
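a minimal python sketch of that list structure, assuming the common case described above where the list ends once a function is installed at index 1. the name split_function_list is hypothetical. the sketch only splits the bytes: it does not compile anything, and it does not model a file that replaces the functions at index 0 or 2.

```python
def split_function_list(data: bytes):
    """split the leading function list of a wmc file into (index, function data) pairs.

    returns the pairs plus the input offset just past the extra byte that the
    preinstalled function at index 2 reads after the final install.
    """
    functions = []
    pos = 0
    while True:
        if pos + 2 > len(data):
            raise ValueError("input ended inside the function list")
        length = data[pos]        # one-byte length
        index = data[pos + 1]     # one-byte index
        end = pos + 2 + length
        if end > len(data):
            raise ValueError("function data is truncated")
        functions.append((index, data[pos + 2:end]))
        pos = end
        if index == 1:
            # assumption: the newly installed function 1 takes over from here,
            # after one more byte has been read to memory address 0
            return functions, pos + 1
```

everything from the returned offset onward is then handled by whatever function the file installed at index 1.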
function data is encoded according to the webassembly specification—specifically, the func production in the binary encoding of the code section. that is, it's a vector of local variable declarations followed by a sequence of instructions ending with the byte 0x0B.
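in the webassembly binary format, the immediate of i32.const (opcode 0x41) is a signed LEB128 integer, which is why a small constant can take more than one byte. a short python sketch of that decoding follows; read_sleb128 is a hypothetical helper name, not part of wmc.

```python
def read_sleb128(data: bytes, pos: int = 0):
    """decode one signed LEB128 integer starting at pos.

    returns (value, position of the next byte).
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low seven bits carry the payload
        shift += 7
        if byte & 0x80 == 0:               # high bit clear: last byte
            if byte & 0x40:                # sign bit of the final group
                result -= 1 << shift
            return result, pos
```

for instance, the bytes DE 00 decode to 94 and the bytes 89 01 decode to 137.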
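functions must take no parameters and return five i32 values, in order: the index of the function to continue in, the number of bytes to read from the input, the memory address to read those bytes to, the number of output elements to produce, and the memory address of the output elements.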
for example, returning the values 2, 10, 32, 0, and 0 will output nothing, copy 10 bytes from the input into bytes 32-41 of the webassembly memory, then continue in the function at index 2.
the length of each output element depends on its type. for example, in an array of u16 values, each output will be two bytes long; producing ten outputs will read twenty bytes from webassembly memory. see the output element encoding section for a table of types.
either the number of bytes to read from the input or the number of output elements to produce must be greater than zero. if both values are less than or equal to zero, decoding will stop. together with the restrictions on valid instructions, this guarantees decoding will always make progress. if both values are greater than zero, output is produced before input is read (so memory used for input and output can overlap).
as a special case, an output address of -1 will read elements directly from the input instead of from webassembly memory.
if a function attempts to read more bytes than are left in the file, only the bytes remaining in the file will be read (the rest of the memory will be left alone). if there are no bytes left in the file, and no output elements are produced, then decoding will stop.
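the rules above can be sketched as a single python step function. this is an illustration under assumptions, not the reference decoder: the order of the five return values is taken from the example above (next function, bytes to read, read address, output count, output address), the names Step and apply_step are hypothetical, the element size is passed in by the caller, and the special output address of -1 is not handled.

```python
from typing import NamedTuple


class Step(NamedTuple):
    next_func: int   # index of the function to continue in
    read_len: int    # number of bytes to read from the input
    read_addr: int   # memory address the input bytes are read to
    out_count: int   # number of output elements to produce
    out_addr: int    # memory address of the output elements


def apply_step(step, memory, data, pos, elem_size):
    """apply one function's return values to memory and the input.

    returns (output bytes, new input position, whether decoding continues).
    """
    if step.read_len <= 0 and step.out_count <= 0:
        return b"", pos, False                      # nothing to do: stop
    out = b""
    if step.out_count > 0:
        # output is produced before input is read, so the ranges may overlap
        n = step.out_count * elem_size
        out = bytes(memory[step.out_addr:step.out_addr + n])
    if step.read_len > 0:
        chunk = data[pos:pos + step.read_len]       # short read near end of file
        memory[step.read_addr:step.read_addr + len(chunk)] = chunk
        pos += len(chunk)
        if not chunk and step.out_count <= 0:
            return out, pos, False                  # no input left, no output
    return out, pos, True
```

with memory as a bytearray, Step(2, 10, 32, 0, 0) copies ten input bytes to addresses 32-41 and produces no output, matching the example above.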
the compile function, invoked from webassembly as call 0, compiles and installs function data from webassembly memory. compile has no return value and takes three i32 parameters:
function data must contain no calls to anything other than the compile function, no call_indirect instructions, and no loop instructions. function data containing these instructions will fail to compile and stop the decoding process.
data is always read from the input as bytes, but output can be produced in a variety of types. values are encoded using up to 32 bytes, depending on their type:
tag | type | description | encoding (byte offsets within the element)
---|---|---|---
0 | s8 | signed (two's-complement) 8-bit integer | byte 0: n
1 | u8 | unsigned 8-bit integer | byte 0: n
2 | s16 | signed 16-bit (little-endian) integer | bytes 0-1: n
3 | u16 | unsigned 16-bit integer | bytes 0-1: n
4 | s32 | signed 32-bit integer | bytes 0-3: n
5 | u32 | unsigned 32-bit integer | bytes 0-3: n
6 | s64 | signed 64-bit integer | bytes 0-7: n
7 | u64 | unsigned 64-bit integer | bytes 0-7: n
8 | f32 | 32-bit float | bytes 0-3: n
9 | f64 | 64-bit float | bytes 0-7: n
10 | bool | 1 or 0 | byte 0: b
11 | string | memory address of zero-terminated string | address (starting at byte 0)
12 | array | see [1] | bytes 0-7: number of elements; bytes 8-15: byte offset in input data; bytes 16-17: func; bytes 18-19: mlen; bytes 20-23: type tag
13 | pair | name-value pair; see [2] | bytes 0-3: type tag; bytes 4-7: name; bytes 8-31: value
[1] producing an array copies the first mlen bytes of webassembly memory into a new memory object, then creates a new decoding process with this memory and the same installed functions as the current process. the new decoding process skips ahead in the input data by the given byte offset and starts decoding in the function at index func. the resulting array is made from the elements produced by this decoding process, all of which have the type indicated by the type tag. the decoding process stops as soon as the given number of elements has been produced.
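a 24-byte array descriptor can be unpacked with python's struct module. the sketch below is hedged: the field widths (8, 8, 2, 2 and 4 bytes, little-endian) are read off the annotated example file rather than stated as a rule, treating every field as unsigned is an assumption, and the names ArrayDescriptor and parse_array_descriptor are hypothetical.

```python
import struct
from typing import NamedTuple


class ArrayDescriptor(NamedTuple):
    count: int      # number of elements
    offset: int     # byte offset in input data
    func: int       # index of the function the new process starts in
    mlen: int       # number of memory bytes copied into the new process
    type_tag: int   # type tag of every element in the array


def parse_array_descriptor(buf, addr=0):
    # "<QQHHI" is 8 + 8 + 2 + 2 + 4 = 24 bytes, little-endian, no padding
    return ArrayDescriptor(*struct.unpack_from("<QQHHI", buf, addr))
```

applied to the first 24 bytes of memory in the example file, this yields 3 elements, offset 0, function 4, mlen 137 and type tag 13.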
[2] the name in a name-value pair is the address of a zero-terminated string in webassembly memory. the value is a value of any other type, indicated by the type tag, except for another name-value pair (they can't be nested). the encoding of the value begins at byte 8; bytes beyond the length of the value's encoding are ignored. an array of name-value pairs is similar to a json object.
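a name-value pair can be picked apart in python as follows. this is a sketch under assumptions: the 4-byte little-endian widths of the type tag and the name address come from the annotated example file, ascii names are assumed, and the helper names read_c_string and parse_pair are hypothetical. the value is returned as its raw 24 bytes; interpreting it depends on the type tag.

```python
import struct


def read_c_string(memory, addr):
    """read a zero-terminated ascii string starting at addr."""
    end = memory.index(0, addr)   # raises ValueError if no terminator is found
    return bytes(memory[addr:end]).decode("ascii")


def parse_pair(memory, addr):
    """return (type tag, name, raw 24-byte value) for the pair at addr."""
    type_tag, name_addr = struct.unpack_from("<II", memory, addr)
    value = bytes(memory[addr + 8:addr + 32])   # the value begins at byte 8
    return type_tag, read_c_string(memory, name_addr), value
```

for a u32 value (type tag 5), the number is the first four bytes of the raw value, for example struct.unpack_from("<I", value)[0].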
decoding begins as if in an array with a single element of type tag 12 (array). so the first (and only) element produced by the initial decoding process must be a 24-byte array descriptor. the array described therein will be considered the root array of the final output.
if a function produces multiple output elements, those elements are read from webassembly memory without gaps. that is: producing 10 booleans will read 10 bytes; producing 10 name-value pairs will read 320 bytes. name-value pairs are always 32 bytes long, no matter what the value type is.
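the arithmetic is a per-type element size multiplied by the element count. a small python sketch, with hypothetical names ELEMENT_SIZE and bytes_read; tag 11 (string) is left out because the width of its address is not spelled out here.

```python
# bytes read from webassembly memory per output element, keyed by type tag
ELEMENT_SIZE = {
    0: 1, 1: 1,     # s8, u8
    2: 2, 3: 2,     # s16, u16
    4: 4, 5: 4,     # s32, u32
    6: 8, 7: 8,     # s64, u64
    8: 4, 9: 8,     # f32, f64
    10: 1,          # bool
    12: 24,         # array descriptor
    13: 32,         # name-value pair, regardless of the value's type
}


def bytes_read(type_tag: int, count: int) -> int:
    """bytes read from memory when producing count elements without gaps."""
    return ELEMENT_SIZE[type_tag] * count
```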
this is the simple-pattern.wmc.img file, which you can view using the wmc.img decoder. the hexadecimal numbers on the left are bytes as you'd see them in a hex editor. webassembly instruction names and comments appear on the right.
bytes | meaning
---|---
0C 03 | compile the following 12-byte function and install it at index 3
00 | this function has no local variables
4100 | i32.const 0
4100 | i32.const 0
4100 | i32.const 0
4101 | i32.const 1
4100 | i32.const 0
0B | end the function, returning the values [0, 0, 0, 1, 0]. that is, produce one output, encoded at memory address 0. since the root array only has one element, the decoding process stops
0C 04 | compile the following 12-byte function and install it at index 4
00 | this function has no local variables
4100 | i32.const 0
4100 | i32.const 0
4100 | i32.const 0
4103 | i32.const 3
4118 | i32.const 24
0B | end the function, returning the values [0, 0, 0, 3, 24]. that is, produce three outputs, encoded starting at memory address 24. all three elements of the array are produced, and the decoding process stops
1F 05 | compile the following 31-byte function and install it at index 5
00 | this function has no local variables
41DE00 | i32.const 94
41DE00 | i32.const 94
2F0000 | i32.load16_u align:0 offset:0
413C | i32.const 60
6A | i32.add
4105 | i32.const 5
6C | i32.mul
3B0000 | i32.store16 align:0 offset:0
4105 | i32.const 5
4100 | i32.const 0
4100 | i32.const 0
413F | i32.const 63
41DE00 | i32.const 94
0B | end the function, returning the values [5, 0, 0, 63, 94]. that is, produce 63 outputs, encoded starting at memory address 94, and continue in function 5. this writes a bunch of bytes that happen to be in memory as output while modifying the bytes each time (add 60 and multiply by 5)
0D 01 | compile the following 13-byte function and install it at index 1
00 | this function has no local variables
4103 | i32.const 3
418901 | i32.const 137
4101 | i32.const 1
4100 | i32.const 0
4100 | i32.const 0
0B | end the function, returning the values [3, 137, 1, 0, 0]. that is, read 137 bytes, write those bytes into memory starting at address 1, and continue in function 3

the following 138 bytes are read into memory by the function at index 1 and the preinstalled function at index 2, then output by the function at index 3:

bytes | meaning
---|---
0300000000000000 | this encodes an array with three elements
0000000000000000 | whose data begins after an offset of 0 bytes
0400 | the decoding process begins in function 4
8900 | with a copy of the first 137 bytes of memory
0D000000 | and the array contains name-value pairs (type tag 13)
05000000 | this encodes a name-value pair with a u32 value (type tag 5)
78000000 | whose name is the string at address 120 ("width")
E8030000 | and whose value is 1000
0000000000000000 0000000000000000 00000000 | (the value is padded out to 24 bytes)
05000000 | this encodes a name-value pair with a u32 value (type tag 5)
7E000000 | whose name is the string at address 126 ("height")
64000000 | and whose value is 100
0000000000000000 0000000000000000 00000000 | (the value is padded out to 24 bytes)
0C000000 | this encodes a name-value pair with an array value (type tag 12)
85000000 | whose name is the string at address 133 ("rgba")
801A060000000000 | and whose value is an array with 400000 elements
0000000000000000 | the data of which begins after an offset of 0 bytes
0500 | the array's decoding process begins in function 5
8900 | with a copy of the first 137 bytes of memory
01000000 | and the array contains values of type u8 (type tag 1)
776964746800 | the string "width" in ascii
68656967687400 | the string "height"
7267626100 | the string "rgba"
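to see how function 5 fills the 400000-element "rgba" array, its effect can be emulated in python. this is a rough sketch under assumptions, not the real decoder: memory past the copied 137 bytes is assumed to be zero, a single 64 KiB page is assumed, only the bytes function 5 can reach (addresses 88-136) are filled in, and run_function_5 is a hypothetical name.

```python
import struct


def run_function_5(memory: bytearray, total: int) -> bytes:
    """emulate function 5 until total u8 elements have been produced."""
    out = bytearray()
    while len(out) < total:
        # i32.load16_u at 94, add 60, multiply by 5, i32.store16 back to 94
        (v,) = struct.unpack_from("<H", memory, 94)
        struct.pack_into("<H", memory, 94, ((v + 60) * 5) & 0xFFFF)
        # then produce 63 u8 outputs starting at address 94
        out += memory[94:94 + 63]
    return bytes(out[:total])   # the process stops once enough elements exist


# the third name-value pair (addresses 88-119) followed by the three names
memory = bytearray(65536)
memory[88:120] = bytes.fromhex(
    "0C000000 85000000 801A060000000000 0000000000000000 0500 8900 01000000"
)
memory[120:137] = b"width\x00height\x00rgba"
pixels = run_function_5(memory, 400000)
```

each pass changes only the two bytes at addresses 94 and 95, so the output is the same 63-byte window of memory repeated with a changing two-byte prefix.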