4 bits ought to be enough for anyone

February 20, 2020

It is time for our 4KB audio visual experiment with Rust & WASM to get a shader on the screen.

To do that while staying within the strict byte budget, we are going to go back to the future by making a virtual machine with a 4-bit instruction set.

void main() { vec2 a = gl_FragCoord.xy/32. - 0.5; float t = time*.1; gl_FragColor=vec4(fract(a.x/3.+t),fract(a.x/5.+t),fract(a.x/7.+t),1.)*(1.-2.*abs(a.y)); }

(Please keep in mind while reading that this is all just for fun and interest, and probably has no practical use. Ok?)

In the previous part (see Part 2), we looked at a way to dynamically find JavaScript functions from our Rust program compiled to WebAssembly.

The program had reached 637 bytes out of our 4096 byte budget.

If you tried compiling it yourself and adding some more javascript calls, you might have noticed something a bit worrying.

If just one extra call such as “js.push(1)” is added, the overall binary size increases by 4 bytes. It doesn’t sound like much, but at that rate of growth, we would only be able to make 864 such function calls before our budget is all eaten up. Since we haven’t even started finding, let alone calling any WebGL functions yet, it seems unlikely that would be enough to make this work (Spoiler: I tried. It’s not).

We need to be on a trajectory that will allow us to make the thousands of function calls that will be needed, and at the same time it would be nice to have finer control over how all our bytes are being allocated rather than leaving it all up to our compiler.

Welcome to the golden 4-bit era

To achieve that, we are going to define our own VM (virtual machine - a simple simulated CPU) with a 4-bit instruction set - meaning up to two function calls for every byte, potentially allowing us to make around 6000 more function calls!
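
For the curious, here is the back-of-the-envelope arithmetic behind those two estimates as a small sketch. The names (BUDGET_BYTES, USED_BYTES, calls_in_budget) are made up for illustration and are not part of the project, and it optimistically assumes the whole remaining budget goes to calls, ignoring the size of the VM itself:

```rust
/// Total size budget for the demo, in bytes.
const BUDGET_BYTES: u32 = 4096;
/// Size of the binary at the end of the previous part, in bytes.
const USED_BYTES: u32 = 637;

/// How many calls fit in the remaining budget if each call costs
/// `nibbles_per_call` 4-bit units (8 nibbles = 4 bytes).
fn calls_in_budget(nibbles_per_call: u32) -> u32 {
    let free_nibbles = (BUDGET_BYTES - USED_BYTES) * 2;
    free_nibbles / nibbles_per_call
}
```

With 8 nibbles (4 bytes) per call this gives the 864 calls mentioned earlier. In the best case of one nibble per call it gives 6918, which is roughly 6000 more.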

Our VM will be written in Rust. We are also going to use Rust to create a simple assembler to produce the binary programs for the VM.

Let’s get straight down to business. We start by significantly rewriting our src/lib.rs to implement the VM. It’s actually not very long - just 90 lines of Rust:

#![no_std]
#![allow(unused)]  
include!("vm.rs");
include!(concat!(env!("OUT_DIR"), "/program.rs"));

#[link(wasm_import_module = "i")]
extern {
    fn r(which:u32, a:u32) -> u32;
}

fn js_fn(which:u32, a:u32) -> u32 {
    unsafe {
        r(which, a)
    }
}

fn getop(pc:u32) -> u32 {
    let ca = (pc>>1) as usize;
    match pc&1 {
        0 => (PROGRAM[ca]&0xF) as u32,
        _ => (PROGRAM[ca]>>4) as u32,
    }
}

fn get_val(pc: &mut u32, count:u32) -> u32 {
    let mut val = 0;
    let mut done = 0;
    while done < count {
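        *pc = (*pc).wrapping_add(1); // pre-increment; wraps u32::MAX back to 0
        val |= getop(*pc)<<(done*4);
        done += 1;
    }
    val
}

fn vm(which:u32) -> u32 {
    let mut ftable_lookup = (which<<2).wrapping_sub(1); // one nibble before the table entry (wraps for which == 0)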
    let mut pc = get_val(&mut ftable_lookup, 4);
    let end = (PROGRAM.len()<<1) as u32;
    let mut whichsize = 1;
    let mut argsize = 3;
    let mut repeats = 0;
    while pc < end {
        let op = getop(pc);
        match op {
            VM_CALL => {
                let which = get_val(&mut pc, whichsize);
                loop {
                    js_fn(which, get_val(&mut pc, argsize));
                    if repeats == 0 { break; }
                    repeats -= 1;
                }
            }
            VM_PARAM => {
                let which = get_val(&mut pc, 1);
                let arg = get_val(&mut pc, 3);
                match which {
                    PARAM_WHICHSIZE => { whichsize = arg; },
                    PARAM_ARGSIZE => { argsize = arg; },
                    _ => { repeats = arg; },
                }
            }
            VM_DUP => {
                js_fn(FN_DUP, get_val(&mut pc, 2));
            }
            VM_PUSH4 => {
                js_fn(FN_PUSH, get_val(&mut pc, 1));
            }
            VM_RET => {
                break;
            }
            _ => {
                break;
            }
        }
        pc += 1;
    }
    0
}

#[no_mangle]
pub extern fn d(which:u32) -> u32 { // entry point
    vm(which)
}

use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

(Apologies in advance if you are an experienced Rust developer - I’m not, so feel free to let me know on twitter where this could be improved…)

How does it work?

Well, first it needs a program to run. This is stored as an array of bytes (u8) in the constant value PROGRAM, defined in the separate file program.rs. There are also some constants defined in vm.rs.

The program is organised as a set of instruction streams (procedures), with a jump table at the beginning giving the index where each one starts.

The entry point from javascript (d) takes the index of the procedure, and calls the vm function with that.

The vm function uses the table to jump to the correct procedure instruction stream, and then runs the instructions it finds until it hits a VM_RET instruction.
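
To make the jump table lookup a bit more concrete, here is a standalone sketch of the same idea. The function names (nibble, procedure_start) are hypothetical and the program is passed in as a slice rather than read from the PROGRAM constant, so treat it as an illustration of the layout rather than the real VM code:

```rust
/// Read the nibble at index `pc`: even indices are the low half of a byte,
/// odd indices the high half (the same order the VM's getop uses).
fn nibble(program: &[u8], pc: usize) -> u32 {
    let byte = program[pc >> 1];
    if pc & 1 == 0 {
        (byte & 0xF) as u32
    } else {
        (byte >> 4) as u32
    }
}

/// Look up where procedure `which` starts. Each jump table entry is four
/// nibbles (16 bits), least significant nibble first, and holds the nibble
/// index of the procedure's first instruction.
fn procedure_start(program: &[u8], which: usize) -> u32 {
    (0..4).fold(0, |acc, i| acc | (nibble(program, which * 4 + i) << (i * 4)))
}
```

For example, in the six byte program [0x08, 0x00, 0x0A, 0x00, 0x0F, 0x0F] the table says procedure 0 starts at nibble 8 and procedure 1 at nibble 10, and both consist of a single VM_RET (0xF).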

For now, there are only 5 instructions (including VM_RET). They all start with a 4-bit identifier, which is then followed by additional variable length arguments.

  • VM_CALL calls a Javascript function by index one or more times with a new unsigned integer argument each time. It is the most general instruction. By default it uses 5x4 bits=2.5 bytes for a single call, but by modifying parameters it can grow or shrink.
  • VM_PARAM is used to modify the parameters such as the size of the VM_CALL arguments, and the number of times to repeat the call with different argument values.
  • VM_DUP is an optimisation that duplicates an object on the Javascript stack using 12 bits (1.5 bytes)
  • VM_PUSH4 is another optimisation that puts a 4 bit unsigned value (0-15) on the Javascript stack using 8 bits (1 byte)
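
The sizes quoted in the list above can be checked with a small sketch. The Instr enum and size_in_bits function are invented for this illustration (the real VM just matches on the opcode nibble). Each instruction is one opcode nibble plus its operand nibbles:

```rust
/// The four non-terminating instructions. Only VM_CALL has operand widths
/// (in nibbles) that can be changed at run time via VM_PARAM.
enum Instr {
    Call { whichsize: u32, argsize: u32 },
    Param,
    Dup,
    Push4,
}

/// Encoded size of a single instruction in bits.
fn size_in_bits(instr: &Instr) -> u32 {
    let nibbles = match instr {
        // opcode + function index + one argument
        Instr::Call { whichsize, argsize } => 1 + whichsize + argsize,
        // opcode + parameter id + 3-nibble value
        Instr::Param => 1 + 1 + 3,
        // opcode + 2-nibble stack index
        Instr::Dup => 1 + 2,
        // opcode + 1-nibble value
        Instr::Push4 => 1 + 1,
    };
    nibbles * 4
}
```

With the default whichsize of 1 and argsize of 3, a VM_CALL comes out at 20 bits (2.5 bytes), VM_DUP at 12 bits and VM_PUSH4 at 8 bits.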

That’s it. I expect you’re thinking it looks quite weird, and wondering how this could possibly work.

0x4 is the magic number

Let’s get vm.rs out of the way. These are just a few constants shared between the VM implementation and the assembler:

const VM_CALL:u32 = 0x0;
const VM_PARAM:u32 = 0x1;
const VM_DUP:u32 = 0x2;
const VM_PUSH4:u32 = 0x3;
const VM_RET:u32 = 0xF;

const PARAM_WHICHSIZE:u32 = 0;
const PARAM_ARGSIZE:u32 = 1;
const PARAM_REPEATS:u32 = 2;

const FN_CALL:u32 = 0;
const FN_PUSH:u32 = FN_CALL+1;
const FN_POP:u32 = FN_PUSH+1;
const FN_SPLICE:u32 = FN_POP+1;
const FN_APPLY:u32 = FN_SPLICE+1;
const FN_DUP:u32 = FN_APPLY+1;
const FN_SET:u32 = FN_DUP+1;
const FN_LOOKUP:u32 = FN_SET+1;
const FN_GETNAME:u32 = FN_LOOKUP+1;
const FN_GETOBJ:u32 = FN_GETNAME+1;
const FN_NEW:u32 = FN_GETOBJ+1;
const FN_SETTER:u32 = FN_NEW+1;
const OBJ_NAMESPACE:u32 = FN_SETTER+1;
const OBJ_MODVAL:u32 = OBJ_NAMESPACE+1;
const OBJ_WASM:u32 = OBJ_MODVAL+1;

They cover the instruction identifiers, parameter identifiers, and the positions of the initial javascript functions in the bootstrap javascript header.

It’s Rust all the way up

We want to generate our program at compile time. Cargo (the Rust build system) has a standard way to handle that kind of thing - the build.rs file in the root of the project.

Here we go. This is the whole thing (584 lines), including the assembler, the javascript bootstrap which we moved out from lib.rs, and our initial program which finds all the needed WebGL functions, compiles a shader, and gets it drawn to the screen - all with a single output binary 2006 bytes long:

#![allow(unused)]  
 
use std::env;
use std::fs;
use std::io::prelude::*;
use std::format;
use std::path::Path;

include!("src/vm.rs");

const JS : &str = "-->\
<script>\
s=[R,U,O,P]=[\
  (a,b)=>s[a](b),\
  a=>s.push(a),\
  _=>s.pop(),\
  a=>s.splice(-a,a),\
  a=>U(s[a].apply(O(),P(O()))),\
  a=>U(s[a]),\
  a=>s[a]=O(),\
  a=>U(s[a][O()]),\
  h=>{\
    t=Reflect.ownKeys(s[12]);\
    U(t[\
        t.map(\
            a=>a.toString().split('').map(\
                a=>a.charCodeAt()\
            ).reduce(\
                (a,b)=>((a<<1)+b*s[13]+(a>>11))&4095\
            )\
        ).findIndex(e=>e==h)\
       ]\
     )\
    },\
  h=>{s[8](h);s[7](12)},\
  a=>new s[a](O()),\
  (a,b,c)=>a[b]=c,\
  window,\
  409,\
];\
fetch('').then(\
    r=>r.arrayBuffer().then(\
        b=>WebAssembly.instantiate(\
            b.slice(4),{\
                i:{r:R}\
            }).then(\
                o=>{U(o.instance.exports);s[14].d(0);}\
            )\
        )\
    )\
</script>";

struct NibbleStream {
    bytes: Vec<u8>,
    half: bool,
    next: u8,
    modval: u32, 
    whichsize: u32,
    argsize: u32,
}

const OBJ_WINDOW:u32 = OBJ_WASM+1;
const OBJ_ARRAY:u32 = OBJ_WINDOW+1;
const OBJ_STRING:u32 = OBJ_ARRAY+1;
const OBJ_FUNCTION:u32 = OBJ_STRING+1;
const OBJ_NODE:u32 = OBJ_FUNCTION+1;
const OBJ_DOCUMENT_CLASS:u32 = OBJ_NODE+1;
const OBJ_DOCUMENT:u32 = OBJ_DOCUMENT_CLASS+1;
const OBJ_FLOAT32ARRAY:u32 = OBJ_DOCUMENT+1;
const OBJ_HTMLELEMENT:u32 = OBJ_FLOAT32ARRAY+1;
const OBJ_HTMLCANVASELEMENT:u32 = OBJ_HTMLELEMENT+1;
const OBJ_WEBGLRENDERINGCONTEXT:u32 = OBJ_HTMLCANVASELEMENT+1;
const STR_PROTOTYPE:u32 = OBJ_WEBGLRENDERINGCONTEXT+1;
const FN_ARRAY_MAP:u32 = STR_PROTOTYPE+1;
const FN_ARRAY_JOIN:u32 = FN_ARRAY_MAP+1;
const FN_STRING_FROMCHARCODE:u32 = FN_ARRAY_JOIN+1;
const FN_FUNCTION_BIND:u32 = FN_STRING_FROMCHARCODE+1;
const FN_NODE_APPENDCHILD:u32 = FN_FUNCTION_BIND+1;
const FN_DOCUMENT_CREATEELEMENT:u32 = FN_NODE_APPENDCHILD+1;
const OBJ_BODY:u32 = FN_DOCUMENT_CREATEELEMENT+1;
const FN_MAP_CHARCODE:u32 = OBJ_BODY+1;
const OBJ_EMPTY_STRING:u32 = FN_MAP_CHARCODE+1;
const OBJ_CANVAS:u32 = OBJ_EMPTY_STRING+1;
const STR_STYLE:u32 = OBJ_CANVAS+1;
const OBJ_BODYSTYLE:u32 = STR_STYLE+1;
const OBJ_CANVASSTYLE:u32 = OBJ_BODYSTYLE+1;
const STR_100PCT:u32 = OBJ_CANVASSTYLE+1;
const FN_CANVAS_GETCONTEXT:u32 = STR_100PCT+1;
const OBJ_GLCTX:u32 = FN_CANVAS_GETCONTEXT+1;
const VAL_VERTEX_SHADER:u32 = OBJ_GLCTX+1;
const VAL_FRAGMENT_SHADER:u32 = VAL_VERTEX_SHADER+1;
const FN_GL_CREATESHADER:u32 = VAL_FRAGMENT_SHADER+1;
const FN_GL_SHADERSOURCE:u32 = FN_GL_CREATESHADER+1;
const FN_GL_COMPILESHADER:u32 = FN_GL_SHADERSOURCE+1;
const FN_GL_GETSHADERINFOLOG:u32 = FN_GL_COMPILESHADER+1;
const FN_GL_CREATEPROGRAM:u32 = FN_GL_GETSHADERINFOLOG+1;
const FN_GL_ATTACHSHADER:u32 = FN_GL_CREATEPROGRAM+1;
const FN_GL_USEPROGRAM:u32 = FN_GL_ATTACHSHADER+1;
const FN_GL_GETATTRIBLOCATION:u32 = FN_GL_USEPROGRAM+1;
const FN_GL_CREATEBUFFER:u32 = FN_GL_GETATTRIBLOCATION+1;
const VAL_GL_ARRAY_BUFFER:u32 = FN_GL_CREATEBUFFER+1;
const FN_GL_BINDBUFFER:u32 = VAL_GL_ARRAY_BUFFER+1;
const VAL_GL_STATIC_DRAW:u32 = FN_GL_BINDBUFFER+1;
const FN_GL_BUFFERDATA:u32 = VAL_GL_STATIC_DRAW+1;
const FN_GL_ENABLEVERTEXATTRIBARRAY:u32 = FN_GL_BUFFERDATA+1;
const VAL_GL_FLOAT:u32 = FN_GL_ENABLEVERTEXATTRIBARRAY+1;
const FN_GL_VERTEXATTRIBPOINTER:u32 = VAL_GL_FLOAT+1;
const FN_GL_DRAWARRAYS:u32 = FN_GL_VERTEXATTRIBPOINTER+1;
const VAL_GL_TRIANGLES:u32 = FN_GL_DRAWARRAYS+1;
const FN_GL_LINKPROGRAM:u32 = VAL_GL_TRIANGLES+1;
const FN_GL_GETPROGRAMINFOLOG:u32 = FN_GL_LINKPROGRAM+1;
const OBJ_VERTEX_SHADER:u32 = FN_GL_GETPROGRAMINFOLOG+1;
const OBJ_FRAGMENT_SHADER:u32 = OBJ_VERTEX_SHADER+1;
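const OBJ_SHADER_PROGRAM:u32 = OBJ_FRAGMENT_SHADER+3; // skips the two shader info logs left on the stack
const VAL_ATTRIB_LOCATION:u32 = OBJ_SHADER_PROGRAM+2; // skips the program info log left on the stack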
const OBJ_VERTEX_BUFFER:u32 = VAL_ATTRIB_LOCATION+1;
const OBJ_VERTEX_ARRAY:u32 = OBJ_VERTEX_BUFFER+1;


impl NibbleStream {
    fn close(&mut self) {
        if self.half {
            self.bytes.push(self.next);
        }
    }
    fn push(&mut self, val:u32, count:u32) {
        assert!(val < 1<<(4*count));
        let mut c = count;
        let mut v = val;
        while c > 0 {
            if self.half {
                self.next += ((v & 0xF)<<4) as u8;
                self.bytes.push(self.next);
                self.next = 0;
            } else {
                self.next += (v & 0xF) as u8;
            }
            v >>= 4;
            self.half = !self.half;
            c -= 1;
        }
    }
    fn push_bytes(&mut self, bytes:&[u8]) {
        let restore = match self.argsize {
            2 => false,
            _ => true
        };
        let oldargsize = self.argsize;
        if restore {
            self.param(PARAM_ARGSIZE, 2);
        }
        let count:u32 = bytes.len() as u32;
        self.callrep(count-1, FN_PUSH);
        for b in bytes.iter() {
            self.push(*b as u32, 2);
        }
        if restore {
            self.param(PARAM_ARGSIZE, oldargsize);
        }
        self.call(FN_PUSH,count);
        self.call(FN_PUSH,1);
        self.call(FN_PUSH,0);
        self.call(FN_APPLY, FN_SPLICE);
    }
    fn push_str(&mut self, s:&str) {
        // Cannot call this until after needed functions init'ed
        self.dup(OBJ_EMPTY_STRING);
        self.push4(1);
        self.dup(FN_MAP_CHARCODE);
        self.push4(1);
        self.push_bytes(s.as_bytes());
        self.call(FN_APPLY, FN_ARRAY_MAP);
        self.call(FN_APPLY, FN_ARRAY_JOIN); 
    }
    fn param(&mut self, param:u32, paramval:u32) {
        self.push(VM_PARAM, 1);
        self.push(param, 1);
        self.push(paramval, 3);
        match param {
            PARAM_WHICHSIZE => { self.whichsize = paramval; },
            PARAM_ARGSIZE => { self.argsize = paramval; },
            _ => ()
        }
    }
    fn call(&mut self, which:u32, arg:u32) {
        // Generic VM function 
        self.push(VM_CALL, 1);
        self.push(which, self.whichsize);
        self.push(arg, self.argsize);
    }
    fn callrep(&mut self, repeats:u32, which:u32) {
        // Generic repeating VM function 
        self.param(PARAM_REPEATS, repeats);
        self.push(VM_CALL, 1);
        self.push(which, self.whichsize);
    }
    fn dup(&mut self, arg:u32) {
        self.push(VM_DUP, 1);
        self.push(arg, 2);
    }
    fn push4(&mut self, arg:u32) {
        assert!(arg < 16);
        self.push(VM_PUSH4, 1);
        self.push(arg, 1);
    }
    fn hash(&self, name:&str) -> u32 {
        let n = name.as_bytes();
        let mut c:u32 = n[0] as u32;
        for a in n[1..].iter() {
            let a32 = *a as u32;
            c = ((c<<1)+(a32*self.modval)+(c>>11))&4095;
        }
        return c; 
    }
}

fn new_stream() -> NibbleStream {
    NibbleStream { bytes: Vec::<u8>::new(), half:false, next:0, modval:409, whichsize:1, argsize:3, }
}
fn init() -> NibbleStream {
    let mut s = new_stream();
    s.dup(OBJ_NAMESPACE); // -> OBJ_WINDOW
    s.callrep(9, FN_GETOBJ);
    s.push(s.hash("Array"), 3);
    s.push(s.hash("String"), 3);
    s.push(s.hash("Function"), 3);
    s.push(s.hash("Node"), 3);
    s.push(s.hash("Document"), 3);
    s.push(s.hash("document"), 3);
    s.push(s.hash("Float32Array"), 3);
    s.push(s.hash("HTMLElement"), 3);
    s.push(s.hash("HTMLCanvasElement"), 3);
    s.push(s.hash("WebGLRenderingContext"), 3);

    s.dup(OBJ_ARRAY);
    s.call(FN_SET, OBJ_NAMESPACE);
    s.call(FN_GETNAME, s.hash("prototype")); // -> STR_PROTOTYPE
    s.dup(STR_PROTOTYPE); 
    s.call(FN_LOOKUP, OBJ_NAMESPACE); // Array.prototype
    s.call(FN_SET, OBJ_NAMESPACE);
    s.callrep(1, FN_GETOBJ);
    s.push(s.hash("map"), 3);
    s.push(s.hash("join"), 3);

    s.dup(OBJ_STRING); 
    s.call(FN_SET, OBJ_NAMESPACE);
    s.call(FN_GETOBJ, s.hash("fromCharCode"));

    s.dup(STR_PROTOTYPE); 
    s.call(FN_LOOKUP, OBJ_FUNCTION); // Function.prototype
    s.call(FN_SET, OBJ_NAMESPACE);
    s.call(FN_GETOBJ, s.hash("bind"));

    s.dup(STR_PROTOTYPE); 
    s.call(FN_LOOKUP, OBJ_NODE); // Node.prototype
    s.call(FN_SET, OBJ_NAMESPACE);
    s.call(FN_GETOBJ, s.hash("appendChild"));

    s.dup(STR_PROTOTYPE); 
    s.call(FN_LOOKUP, OBJ_DOCUMENT_CLASS); // Document.prototype
    s.call(FN_SET, OBJ_NAMESPACE);
    s.call(FN_GETOBJ, s.hash("createElement"));
    s.call(FN_GETNAME, s.hash("body"));
    s.call(FN_LOOKUP, OBJ_DOCUMENT); // document.body

    s.push4(0);
    s.call(FN_PUSH, FN_STRING_FROMCHARCODE);
    s.push4(2);
    s.dup(FN_CALL);
    s.call(FN_APPLY, FN_FUNCTION_BIND); // Make fromCharCode work in [].map()

    s.push4(0);
    s.push4(0);
    s.call(FN_APPLY, OBJ_STRING); // -> empty string ""
   
    s.push_str("canvas");

    s.push4(1);
    s.dup(OBJ_DOCUMENT);
    s.call(FN_APPLY, FN_DOCUMENT_CREATEELEMENT); // -> Canvas element

    s.dup(OBJ_CANVAS);
    s.push4(1);
    s.dup(OBJ_BODY);
    s.call(FN_APPLY, FN_NODE_APPENDCHILD); // Add canvas to document
    s.call(FN_POP, 0);

    s.dup(STR_PROTOTYPE); 
    s.call(FN_LOOKUP, OBJ_HTMLELEMENT); // HTMLElement.prototype
    s.call(FN_SET, OBJ_NAMESPACE);
    s.call(FN_GETNAME, s.hash("style"));

    s.dup(STR_STYLE);
    s.call(FN_LOOKUP, OBJ_BODY);
    s.dup(STR_STYLE);
    s.call(FN_LOOKUP, OBJ_CANVAS);

    s.dup(OBJ_BODYSTYLE); // Set body margin to 0
    s.push_str("margin");
    s.call(FN_PUSH,0);
    s.call(FN_PUSH,3);
    s.call(FN_PUSH,0);
    s.call(FN_APPLY, FN_SETTER);
    s.call(FN_POP,0);

    s.push_str("100%");
    s.dup(OBJ_CANVASSTYLE); // Set canvas to full width/height
    s.push_str("width");
    s.dup(STR_100PCT);
    s.call(FN_PUSH,3);
    s.call(FN_PUSH,0);
    s.call(FN_APPLY, FN_SETTER);
    s.call(FN_POP,0);
    s.dup(OBJ_CANVASSTYLE);
    s.push_str("height");
    s.dup(STR_100PCT);
    s.call(FN_PUSH,3);
    s.call(FN_PUSH,0);
    s.call(FN_APPLY, FN_SETTER);
    s.call(FN_POP,0);

    s.dup(STR_PROTOTYPE); 
    s.call(FN_LOOKUP, OBJ_HTMLCANVASELEMENT); // HTMLCanvasElement.prototype
    s.call(FN_SET, OBJ_NAMESPACE);
    s.call(FN_GETOBJ, s.hash("getContext"));

    s.push_str("webgl");

    s.push4(1);
    
    s.dup(OBJ_CANVAS);
    s.call(FN_APPLY, FN_CANVAS_GETCONTEXT);

    s.dup(STR_PROTOTYPE); 
    s.call(FN_LOOKUP, OBJ_WEBGLRENDERINGCONTEXT); // WebGLRenderingContext.prototype
    s.call(FN_SET, OBJ_NAMESPACE);
    s.callrep(19, FN_GETOBJ);
    s.push(s.hash("VERTEX_SHADER"), 3);
    s.push(s.hash("FRAGMENT_SHADER"), 3);
    s.push(s.hash("createShader"), 3);
    s.push(s.hash("shaderSource"), 3);
    s.push(s.hash("compileShader"), 3);
    s.push(s.hash("getShaderInfoLog"), 3);
    s.push(s.hash("createProgram"), 3);
    s.push(s.hash("attachShader"), 3);
    s.push(s.hash("useProgram"), 3);
    s.push(s.hash("getAttribLocation"), 3);
    s.push(s.hash("createBuffer"), 3);
    s.push(s.hash("ARRAY_BUFFER"), 3);
    s.push(s.hash("bindBuffer"), 3);
    s.push(s.hash("STATIC_DRAW"), 3);
    s.push(s.hash("bufferData"), 3);
    s.push(s.hash("enableVertexAttribArray"), 3);
    s.push(s.hash("FLOAT"), 3);
    s.push(s.hash("vertexAttribPointer"), 3);
    s.push(s.hash("drawArrays"), 3);
    s.push(s.hash("TRIANGLES"), 3);
  
    // Need to use an alternate hash mod value due to collision
    s.call(FN_PUSH, 408);
    s.call(FN_SET, OBJ_MODVAL);
    s.modval = 408;
    s.call(FN_GETOBJ, s.hash("linkProgram"));
    s.call(FN_GETOBJ, s.hash("getProgramInfoLog"));
    s.call(FN_PUSH, 409); // Restore the original mod
    s.call(FN_SET, OBJ_MODVAL);
    s.modval = 409;

    s.dup(VAL_VERTEX_SHADER);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_CREATESHADER); // Create vertex shader

    s.dup(VAL_FRAGMENT_SHADER);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_CREATESHADER); // Create fragment shader

    s.dup(OBJ_VERTEX_SHADER);
    s.push_str("\
attribute vec2 V;\
varying vec4 C;\
\
void main(){\
    C=gl_Position=vec4(V-1.,0.,1.);\
}");
    s.push4(2);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_SHADERSOURCE);
    s.call(FN_POP, 0); // Do not need the result from shaderSource

    s.dup(OBJ_VERTEX_SHADER);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_COMPILESHADER);
    s.call(FN_POP, 0);

    s.dup(OBJ_VERTEX_SHADER);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_GETSHADERINFOLOG);

    s.dup(OBJ_FRAGMENT_SHADER);
    s.push_str("\
precision highp float;\
varying vec4 C;\
\
void main(){\
    float l=min(1.,step(abs(C.x+C.y+.7),.1)*step(abs(C.x-C.y+.4),.3)+\
            step(abs(C.x+C.y-.2),1.)*step(abs(C.x-C.y+.0),.1) );\
    gl_FragColor=vec4(l,.5+.5*C.xy,1.);\
}");
    s.push4(2);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_SHADERSOURCE);
    s.call(FN_POP, 0);

    s.dup(OBJ_FRAGMENT_SHADER);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_COMPILESHADER);
    s.call(FN_POP, 0);

    s.dup(OBJ_FRAGMENT_SHADER);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_GETSHADERINFOLOG);

    s.push4(0);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_CREATEPROGRAM);

    s.dup(OBJ_SHADER_PROGRAM);
    s.dup(OBJ_VERTEX_SHADER);
    s.push4(2);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_ATTACHSHADER);
    s.call(FN_POP, 0);
    
    s.dup(OBJ_SHADER_PROGRAM);
    s.dup(OBJ_FRAGMENT_SHADER);
    s.push4(2);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_ATTACHSHADER);
    s.call(FN_POP, 0);

    s.dup(OBJ_SHADER_PROGRAM);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_LINKPROGRAM);
    s.call(FN_POP, 0);

    s.dup(OBJ_SHADER_PROGRAM);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_GETPROGRAMINFOLOG);

    s.dup(OBJ_SHADER_PROGRAM);
    s.push4(1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_USEPROGRAM);
    s.call(FN_POP, 0);

    s.dup(OBJ_SHADER_PROGRAM);
    s.push_str("V");
    s.push4(2);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_GETATTRIBLOCATION);

    s.push4(0);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_CREATEBUFFER);

    s.dup(VAL_GL_ARRAY_BUFFER);
    s.dup(OBJ_VERTEX_BUFFER);
    s.push4(2);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_BINDBUFFER);
    s.call(FN_POP, 0);

    s.param(PARAM_ARGSIZE, 1);
    s.callrep(11, FN_PUSH);
    s.push(0,1);
    s.push(0,1);
    s.push(0,1);
    s.push(2,1);
    s.push(2,1);
    s.push(0,1);
    s.push(0,1);
    s.push(2,1);
    s.push(2,1);
    s.push(0,1);
    s.push(2,1);
    s.push(2,1);
    s.param(PARAM_ARGSIZE, 3);
    s.call(FN_PUSH,12);
    s.call(FN_PUSH,1);
    s.call(FN_PUSH,0);
    s.call(FN_APPLY, FN_SPLICE);

    s.call(FN_PUSH, OBJ_FLOAT32ARRAY);
    s.call(FN_PUSH,1);
    s.call(FN_PUSH,0);
    s.call(FN_APPLY, FN_NEW);

    s.dup(VAL_GL_ARRAY_BUFFER);
    s.dup(OBJ_VERTEX_ARRAY);
    s.dup(VAL_GL_STATIC_DRAW);
    s.call(FN_PUSH,3);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_BUFFERDATA);
    s.call(FN_POP, 0);

    s.dup(VAL_ATTRIB_LOCATION);
    s.push4(2);
    s.dup(VAL_GL_FLOAT);
    s.push4(0);
    s.push4(0);
    s.push4(0);
    s.push4(6);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_VERTEXATTRIBPOINTER);
    s.call(FN_POP, 0);

    s.dup(VAL_ATTRIB_LOCATION);
    s.call(FN_PUSH,1);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_ENABLEVERTEXATTRIBARRAY);
    s.call(FN_POP, 0);
    
    s.dup(VAL_GL_TRIANGLES);
    s.push4(0);
    s.push4(6);
    s.push4(3);
    s.dup(OBJ_GLCTX);
    s.call(FN_APPLY, FN_GL_DRAWARRAYS);
    s.call(FN_POP, 0);


    s.push(VM_RET, 1);
    s.close();
    s
}

fn compile(target:&mut Vec<u8>) {
    let mut functions = Vec::<NibbleStream>::new(); // Keep code for all functions here

    functions.push(init());

    // Init the jump table with the functions
    let mut ftable = new_stream();
    let mut offset:u32 = (functions.len() as u32)*4;
    for f in functions.iter() {
        ftable.push(offset, 4);
        offset += (f.bytes.len() as u32)*2;
    }
    ftable.close();
    target.extend(ftable.bytes);
    for f in functions.iter() {
        target.extend(&f.bytes);
    }
}

fn main() {
    let out_dir = env::var_os("OUT_DIR").unwrap();
    let dest_path = Path::new(&out_dir).join("program.rs");
    let mut program_file = fs::File::create(dest_path).unwrap();

    let mut prog = Vec::<u8>::new();

    compile(&mut prog);

    // Finally append the JS bootstrap so it comes at end of binary
    prog.extend_from_slice(JS.as_bytes());

    // Write out accumulated code to PROGRAM array
    program_file.write_all(format!("pub const PROGRAM:[u8;{}] = [\n", prog.len()).as_bytes()).unwrap();
    for v in prog.iter() {
        program_file.write_all(format!("\t{:#x},\n", v).as_bytes()).unwrap();
    }
    program_file.write_all(b"];\n").unwrap();

    println!("cargo:rerun-if-changed=build.rs");
}

When you compile everything and run it in your browser, the entire page should be filled with this:

void main() { vec2 C = 2.*gl_FragCoord.xy/256.-1.; float l=min(1.,step(abs(C.x+C.y+.7),.1)*step(abs(C.x-C.y+.4),.3)+ step(abs(C.x+C.y-.2),1.)*step(abs(C.x-C.y+.0),.1) ); gl_FragColor=vec4(l,.5+.5*C.xy,1.); }

Try it out live in your browser (as long as it is Chrome…) here.

Note: This is likely to be Chrome-only at this point. It can be made more portable with a little extra effort (as I did with the original 4KB demo, which was also verified on Safari & Firefox). The main issue is that each browser has a slightly different namespace, causing different hash collisions, which require some tweaking (alternating the mod value used).

Nibbling away at the onion

The main job of build.rs is to generate - at compile time - the program.rs file which is then compiled with the VM into WebAssembly to provide the program for the VM to run.
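
The code generation step itself is tiny. As a sketch, here it is pulled out into a function that returns the generated source as a String instead of writing it to a file. The name render_program is hypothetical (build.rs above does this inline in main):

```rust
/// Render a byte slice as the Rust source for the PROGRAM constant,
/// in the same shape that build.rs writes to program.rs.
fn render_program(bytes: &[u8]) -> String {
    let mut src = format!("pub const PROGRAM:[u8;{}] = [\n", bytes.len());
    for b in bytes {
        // One byte per line, formatted as a hex literal such as 0x12
        src.push_str(&format!("\t{:#x},\n", b));
    }
    src.push_str("];\n");
    src
}
```

Building the text in memory first makes it easy to inspect or test what will end up in program.rs before anything is written to OUT_DIR.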

At the moment, it defines a program with just one procedure (0), which when called, creates a canvas element on the page, and uses WebGL to render a single static image - it does not yet animate anything.

To assemble the program, a struct called “NibbleStream” offers functions that allow 4-bit values (and multiples of 4 bits) to be added to an array of bytes representing one procedure.
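
The packing order is the part that has to agree with the VM. Here is a minimal sketch of it using two free functions with made-up names (to_nibbles, pack_nibbles) instead of the stateful struct. It is meant only to show the bit layout, not to replace NibbleStream:

```rust
/// Split a value into `count` nibbles, least significant nibble first.
fn to_nibbles(val: u32, count: u32) -> Vec<u8> {
    (0..count).map(|i| ((val >> (4 * i)) & 0xF) as u8).collect()
}

/// Pack 4-bit values two per byte. The first nibble of each pair goes in
/// the low half of the byte, the second in the high half. A trailing odd
/// nibble gets a zero high half.
fn pack_nibbles(nibbles: &[u8]) -> Vec<u8> {
    nibbles
        .chunks(2)
        .map(|pair| {
            let low = pair[0] & 0xF;
            let high = pair.get(1).copied().unwrap_or(0) & 0xF;
            low | (high << 4)
        })
        .collect()
}
```

So a 12-bit value like 0xABC becomes the nibbles C, B, A, which pack into the bytes 0xBC and 0x0A.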

The meat of the action is in the init function. This is where the program is defined using a sequence of calls to NibbleStream functions.

Each call appends one or more VM instructions to the stream. There are no conditionals or loops; everything is executed in order, modifying the JavaScript-side stack as we go.

To start out, the only object we have in the stack is the (global) window, which is where all other functions can be found.

Initially, we use our function finder on the Javascript side (FN_GETOBJ) to get some needed objects and namespaces from the window.

The first thing to note here is that we no longer have any strange hash values in the code - because this runs before we compile the final WASM code, we can use Rust to calculate the hashes from the strings we are interested in. Each hash turns into a 12-bit (3x4-bit) sequence in the program.
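
Here is the hash on its own, as a free function rather than a NibbleStream method. The name name_hash is made up for this sketch, and it assumes a non-empty ASCII name (it would panic on an empty string, just like the method in build.rs):

```rust
/// 12-bit name hash matching the reduce() in the JavaScript bootstrap.
/// `modval` is the multiplier stored in the OBJ_MODVAL slot (409 by default).
fn name_hash(name: &str, modval: u32) -> u32 {
    let bytes = name.as_bytes();
    // The first character seeds the hash, as reduce() does without an initial value
    let mut c = bytes[0] as u32;
    for &b in &bytes[1..] {
        c = ((c << 1) + (b as u32) * modval + (c >> 11)) & 4095;
    }
    c
}
```

As a tiny worked example, "ab" with the default multiplier gives (97*2 + 98*409) & 4095 = 3412. Changing the multiplier to 408 gives 3314 instead, which is why swapping it is enough to dodge a collision.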

Because our VM’s generic VM_CALL instruction can repeat, we can keep the overhead of function lookup low in case we need many names from the same source. We use the repeat parameter to call the FN_GETOBJ as many times as needed, with each new name adding only 12-bits, amortizing down the bit-cost of the instruction.
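
To put numbers on that amortization, here is a sketch comparing the two encodings. The function names are hypothetical, and the costs are counted in nibbles using the instruction layouts described earlier (VM_PARAM is 5 nibbles, VM_CALL is 1 opcode nibble plus the function index plus the argument):

```rust
/// Size of a VM_PARAM instruction in nibbles: opcode + parameter id + 3-nibble value.
const PARAM_NIBBLES: u32 = 5;

/// Cost of `n` independent VM_CALL instructions.
fn separate_calls_nibbles(n: u32, whichsize: u32, argsize: u32) -> u32 {
    n * (1 + whichsize + argsize)
}

/// Cost of one repeated VM_CALL: a VM_PARAM to set the repeat count,
/// one opcode and function index, then just the `n` arguments.
fn repeated_call_nibbles(n: u32, whichsize: u32, argsize: u32) -> u32 {
    PARAM_NIBBLES + 1 + whichsize + n * argsize
}
```

For the 10 names looked up from the window this is 37 nibbles instead of 50, and for the 20 WebGL names it is 67 instead of 100. For a single call the repeat form is a loss (10 nibbles instead of 5), so it only pays off for batches.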

Once we have the objects from the window, we need to go through those one at a time getting the key functions we will need to call, such as String.fromCharCode, Function.bind, and Document.createElement.

An important thing here is that we need to be able to create arbitrary strings of text on the JavaScript side, for parameters and our shader programs.

Instead of hard-coding support for this in the bootstrap Javascript code, we put it together from more basic found JS-functions. The way it works is we build a string by pushing a sequence of numbers to the JS stack, then collecting those into an array with splice, and calling the equivalent of map(a=>String.fromCharCode(a)).join("") on the result. Once the functions have been found, the NibbleStream.push_str() function can be used to simplify this into a single assembler function call.
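
If the stack juggling is hard to picture, here is roughly the same thing simulated in Rust. This is only an analogy: the stack is modelled as a Vec<u32> of character codes, the function name build_string is made up, and it assumes ASCII text (the assembler pushes UTF-8 bytes, which only line up with fromCharCode for ASCII):

```rust
/// Take the top `count` character codes off the stack (like s.splice(-count, count)),
/// turn each one into a character (like map(String.fromCharCode)),
/// and concatenate them (like join("")).
fn build_string(stack: &mut Vec<u32>, count: usize) -> String {
    let codes = stack.split_off(stack.len() - count);
    codes.into_iter().filter_map(char::from_u32).collect()
}
```

Pushing 99, 97, 110, 118, 97, 115 and then collecting the top 6 values yields "canvas", leaving whatever was below them on the stack untouched.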

After a bit more function plumbing, we have enough of them to create our canvas, add it to the page, and tweak the styles to fill the view (surprisingly tedious!).

With the canvas in place, we can make our WebGL context and move on to the real business. Again we need to look up a list of needed functions and constants for WebGL. Then we can create our vertex and fragment shaders (defined inline), compile and link them, create a vertex buffer for a quad (using only the 4-bit unsigned integers 0 and 2), set up all the needed bindings, and finally draw it once with drawArrays. Phew! The tick/check mark means we are good and everything worked as expected.
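
The 0-and-2 trick works because the vertex shader subtracts 1 from every coordinate. As a sketch (QUAD_NIBBLES and clip_space_vertices are names invented here, and the subtraction really happens on the GPU, not in Rust), this is what the 12 values turn into:

```rust
/// The 12 values pushed for the vertex buffer, in order: six (x, y) pairs.
const QUAD_NIBBLES: [u8; 12] = [0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 2, 2];

/// Apply the vertex shader's `V - 1.` to each pair, mapping 0 and 2
/// to the clip-space extremes -1 and +1.
fn clip_space_vertices(values: &[u8]) -> Vec<(f32, f32)> {
    values
        .chunks_exact(2)
        .map(|p| (p[0] as f32 - 1.0, p[1] as f32 - 1.0))
        .collect()
}
```

The result is two triangles that share the diagonal from (-1, 1) to (1, -1) and together cover the whole viewport.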

void main() { vec2 a = gl_FragCoord.xy/32. - 0.5; float t = time*.1; gl_FragColor=vec4(fract(a.x/3.-t),fract(a.x/5.-t),fract(a.x/7.+t),1.)*(1.-2.*abs(a.y)); }