Nov 13, 2019 - Github Link
This post provides a very simple overview of the assembly programming language in VLAHB and of assembly programming principles in general. Machine code and opcodes are beyond the scope of this blog post.
A virtual machine ("vm") acts like a real computer, but it exists only in software and not in hardware. This means that a virtual machine is not made of physical pieces of etched silicon, printed circuit boards, transistors, capacitors or anything like that. Instead it is "virtual", which means that we program this computer ourselves to do what we want: to read and execute binary files according to a blueprint we give it.
Check out the file called vm.c in VLAHB. See the switch statement and all the case lines? This is code that tells the virtual machine that if it sees an instruction 0x0001, do X. If it sees 0x0002, do Y, and so on and so forth.
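To make the "if it sees 0x0001, do X" idea concrete, here is a minimal sketch of opcode dispatch in Python. It is not the code from vm.c: the handler names do_x and do_y and the state dictionary are made up for illustration, and a dictionary lookup stands in for the C switch statement.

```python
# Hypothetical handlers: the real behaviour of each opcode lives in vm.c.
def do_x(state):
    state["log"].append("X")


def do_y(state):
    state["log"].append("Y")


# Maps an instruction code to the function that carries it out,
# playing the role of the switch/case lines.
HANDLERS = {0x0001: do_x, 0x0002: do_y}


def run(opcodes):
    """Execute each opcode in order and return the log of what was done."""
    state = {"log": []}
    for op in opcodes:
        handler = HANDLERS.get(op)
        if handler is None:
            raise ValueError(f"unknown opcode {op:#06x}")
        handler(state)
    return state["log"]
```

Calling run([0x0001, 0x0002]) runs do_x and then do_y, which is all a dispatch loop really does.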
VLAHB is a vm together with an assembly language - let's call it VASM - and an assembler (assembler.py) which converts assembly files (file.asm) to raw machine code (file.bin) that our vm can read and execute.
The process of turning file.asm to file.bin is called assembling.
Our vm features a special integer called a program counter ("pc") which keeps track of which line the vm is currently reading. When you tell the vm to run the program myFile.bin, the pc is assigned the line number of the program at which the vm will begin "reading" (think Turing machine). The pc is set to this value and then...
Nothing is inherently special about these cryptic codes. 1002 0003 0000 fde8 has no inherent meaning but the vm reads and knows to load the literal 65000 into ram at index 4098.
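The arithmetic behind that example can be checked in a few lines of Python. This is only a sketch: it assumes the first 16-bit word is the ram index and the last two words together form the literal, which is what the numbers in the sentence above suggest. The meaning of the second word (0003) is left alone here.

```python
# The four hexadecimal words from the example above.
words = "1002 0003 0000 fde8".split()

# Assumption: word 0 is the ram index, words 2 and 3 form a 32-bit literal.
slot = int(words[0], 16)
literal = int(words[2] + words[3], 16)

print(slot, literal)
```

Under that assumption 0x1002 is 4098 and 0x0000fde8 is 65000, matching the description.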
There is a list called ram stored in our virtual machine vm.c. It contains 65535 0's. Each slot of ram holds an integer from 0 to 4294967295. By changing the values in our ram slots we can get super mario to run across the screen, create a paint application, or solve world peace.
// Empty brackets [ ] represent 0
ram := [ ][ ][ ][ ][ ][ ][ ][ ]...
        ^  ^  ^  ^  ^  ^  ^  ^
        0  1  2  3  4  5  6  7
[-----------] [-------] [--] [---------] [---------------------------]
0        4095 4096 4099 4100 4101  27140 27141                   65535
|slots in ram|what they do|
|---|---|
|4096-4099|4 pointers (U, V, Y, Z resp.)|
|4100|return slot for function outputs|
The simplest operation to perform on ram is a direct load. This loads a value into a slot of ram.
LD R[3] 2 // load 2 into the slot of ram at index 3
ram := [ ][ ][ ][2][ ][ ][ ][ ]...
        ^  ^  ^  ^  ^  ^  ^  ^
        0  1  2  3  4  5  6  7
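A direct load is easy to model outside the vm. The sketch below uses a plain Python list for ram. The size of 65536 (indices 0 to 65535) and the 32-bit mask are assumptions based on the description of ram above, and ld is a made-up helper name, not a function from vm.c.

```python
RAM_SIZE = 65536  # assumption: indices 0..65535
ram = [0] * RAM_SIZE


def ld(slot, value):
    """Model of a direct load: store value in ram[slot], kept within 32 bits."""
    ram[slot] = value & 0xFFFFFFFF


ld(3, 2)  # same effect as loading 2 into the slot of ram at index 3
print(ram[:8])
```

Only index 3 changes; every other slot keeps its initial 0.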
There are other ways to manipulate ram. VASM handles all the basic operations: +, -, ×, ÷.
ADD R 8 // ram = ram + 8
SUB R R // ram = ram-ram
MUL R 5 // ram = ram * 5

LD R 2
MUL R 2
LD R 3
MUL R R
EXIT
Our assembly language supports a primitive form of functions called labels. When assembly code is assembled down to machine instructions for the virtual machine, labels don't appear in the output. Instead they are treated as markers in the code that you can loop to, jump to, etc. It is therefore better to write GOTO MY_FIRST_LABEL than something like GOTO 1729.
Q. Why are we storing stuff in R?
MATH_ADD_TWO_NUMBERS:
LD R R
ADD R R // store the sum of R and R into R
RETURN
To jump the pc to a label, call it with the CALL opcode. This pushes the current pc onto the stack (a stack in the CPU) and sets the pc to the line where the label is.
RETURN: this pops the pc from the stack and sets the pc to that popped value. If we write something like the code snippet below then ram will not be touched.
CALL WHAT_IS_LIFE
EXIT
// we never go here
LD R 55
LD R 51
LD R 44
WHAT_IS_LIFE:
RETURN
But why? Once we hit CALL WHAT_IS_LIFE, the pc is pushed onto the stack and the pc jumps to WHAT_IS_LIFE. The stack and pc now look like this:
stack =  pc = 0
In the next line we hit a RETURN, which means we pop from the stack and assign our pc to that value.
stack = [ ] pc = 0
Now we advance a line to the second line of our program and hit EXIT. The vm exits.
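The walk-through above can be replayed with a toy interpreter. This is a sketch under stated assumptions, not the logic of vm.c: instructions are Python tuples, the pc counts assembly lines rather than machine-code words, the label is resolved to the index of the instruction that follows it, and the ram slots 0, 1 and 2 used by the LD lines are placeholders.

```python
def run(program, labels, ram_size=8):
    """Run a tiny CALL/RETURN/LD/EXIT program and return (ram, stack, trace)."""
    ram = [0] * ram_size
    stack = []
    pc = 0
    trace = []  # (pc, opcode) for every line the vm actually executes
    while True:
        op, *args = program[pc]
        trace.append((pc, op))
        if op == "EXIT":
            break
        if op == "CALL":
            stack.append(pc)      # push the current pc
            pc = labels[args[0]]  # jump to the label
            continue
        if op == "RETURN":
            pc = stack.pop()      # back to the line that made the call
        elif op == "LD":
            slot, value = args
            ram[slot] = value
        pc += 1                   # advance to the next line
    return ram, stack, trace


program = [
    ("CALL", "WHAT_IS_LIFE"),  # line 0
    ("EXIT",),                 # line 1
    ("LD", 0, 55),             # lines 2-4 are never reached
    ("LD", 1, 51),
    ("LD", 2, 44),
    ("RETURN",),               # line 5, where WHAT_IS_LIFE points
]
ram, stack, trace = run(program, {"WHAT_IS_LIFE": 5})
print(trace)
```

The trace visits line 0, line 5 and line 1 only, so the three LD lines never run and ram stays untouched.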
LD R 3
LD R 4
CALL DO_SOMETHING
EXIT
DO_SOMETHING:
LD R R
ADD R R
RETURN
Let's say you want to be able to programmatically place values in ram with assembly. Let's say for instance you want to put 7 in R[3], 7 in R[6], and 7 in R[9].
Notice that our index of ram is a multiple of 3 each time: 3, 6, 9. We can write
LD R[3] 7
LD R[6] 7
LD R[9] 7
but with a pointer:
LD R[4096] 3 // pointer U
LD R[U] 7
ADD R[4096] 3 // R[4096] -> 6
LD R[U] 7
ADD R[4096] 3 // R[4096] -> 9
LD R[U] 7
How does this work?
A pointer refers to a slot of ram that the vm treats in a _special_ way: the vm interprets the value stored there as an index of ram. VLAHB has 4 hard-coded slots for pointers, located at ram slots R[4096], R[4097], R[4098] and R[4099], with the designated letters U, V, Y, Z respectively.
Look at the first two lines of asm code above. We first load 3 into R[4096]. The second line is LD R[U] 7. This tells the program to load a 7 into ram[ram[4096]].
Since ram[4096] = 3, we are loading 7 into ram[3].
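The same indirection can be written out in Python to check where the 7s land. This is a sketch only: ld, ld_ptr and add are made-up helper names standing in for the direct load, the pointer load and ADD, and the ram size of 65536 is an assumption.

```python
U = 4096  # ram slot that holds pointer U, per the table of reserved slots
ram = [0] * 65536  # assumption: indices 0..65535


def ld(slot, value):
    """Direct load: write value into ram[slot]."""
    ram[slot] = value


def ld_ptr(pointer_slot, value):
    """Pointer load: treat ram[pointer_slot] as an index and write there."""
    ram[ram[pointer_slot]] = value


def add(slot, value):
    """Add value to whatever is stored in ram[slot]."""
    ram[slot] += value


ld(U, 3)       # pointer U -> 3
ld_ptr(U, 7)   # ram[3] = 7
add(U, 3)      # pointer U -> 6
ld_ptr(U, 7)   # ram[6] = 7
add(U, 3)      # pointer U -> 9
ld_ptr(U, 7)   # ram[9] = 7
print(ram[:10])
```

The pointer version pays off once the three repeated steps are placed in a loop, since the target index no longer has to be spelled out on every line.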
NB There are more opcodes built into VLAHB that use pointers. Check out vm.c to see them all.
ram := ... ^ ^ ^ ^ ^ ^ ^ ^ 0 1 2 3 4 5 6 7
LD R 7 LD R 7 LD R 7 LD R 7 LD R 7 LD R 7 LD R 7 LD R 7 LD R 7 LD R 7 LD R 7
A subset of opcodes are called conditional. These opcodes compare one value with another and, depending on the result, can make the pc skip the next line of assembly code (that's +2 lines in machine code, since each valid assembly line maps to exactly 2 lines of machine code once it's assembled).
For example, CMP R 2 checks whether the value in the given ram slot is equal to 2. If it is, the vm skips the next line. Otherwise it does nothing special and carries on. Read the example below carefully.
LD R 4
LD R 0
CMP R 8 // if R == 8, skip next line
LD R 1
EXIT // R=1 at end of program
|opcode|condition|
|---|---|
|CMP|is equal to|
|LT|less than, <|
|LTE|less than or equal, <=|
|GT|greater than, >|
|GTE|greater than or equal, >=|
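The skip-the-next-line rule for the opcodes in the table can be sketched as a small Python interpreter. Several things here are assumptions rather than facts about vm.c: the comparison is taken to be ram[slot] against the literal in that order, the pc counts assembly lines, instructions are tuples, and slots 0 and 1 are placeholders.

```python
import operator

# Each conditional opcode paired with the comparison it performs.
CONDITIONS = {
    "CMP": operator.eq,
    "LT": operator.lt,
    "LTE": operator.le,
    "GT": operator.gt,
    "GTE": operator.ge,
}


def run(program, ram_size=8):
    """Run LD and conditional opcodes until EXIT, then return ram."""
    ram = [0] * ram_size
    pc = 0
    while program[pc][0] != "EXIT":
        op, slot, value = program[pc]
        if op == "LD":
            ram[slot] = value
        elif op in CONDITIONS and CONDITIONS[op](ram[slot], value):
            pc += 1  # condition holds: skip the next line
        pc += 1
    return ram


# Slot 0 holds 4, which is not 8, so the LD after CMP still runs.
not_skipped = run([("LD", 0, 4), ("LD", 1, 0), ("CMP", 0, 8), ("LD", 1, 1), ("EXIT",)])
# Slot 0 holds 4 and 4 < 8, so the LD after LT is skipped.
skipped = run([("LD", 0, 4), ("LD", 1, 0), ("LT", 0, 8), ("LD", 1, 1), ("EXIT",)])
print(not_skipped[1], skipped[1])
```

Note that a skip is just one extra increment of the pc, which is why a conditional opcode can only ever jump over a single line.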
Exercise 4 What is loaded into slot R when the program hits EXIT?
LD R 3
LD R 0
CMP R 2
ADD R 1
GTE R 2
ADD R 1
ADD R 2
EXIT
A subset of our ram slots is dedicated to pixels for the 160×144 px display. This happens to be the same resolution as the original Game Boy. The choice of vram slots is arbitrary, so we'll pick slots from ram[4101] to ram[27140] inclusive (160 × 144 = 23040 slots). These slots map onto the screen from top to bottom, left to right, starting with the top-left pixel.
An int stored in vram is interpreted as an rgba color. It is easier to load a hexadecimal integer, since that makes it easier to read which color you are loading.
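To see why hexadecimal is the readable choice, here is a short Python sketch that splits a 32-bit value into its four channels. It assumes the byte order is red, green, blue, alpha from the most significant byte down, and unpack_rgba is a made-up helper name.

```python
def unpack_rgba(color):
    """Split a 32-bit 0xRRGGBBAA integer into (r, g, b, a), each 0..255."""
    r = (color >> 24) & 0xFF
    g = (color >> 16) & 0xFF
    b = (color >> 8) & 0xFF
    a = color & 0xFF
    return r, g, b, a


print(unpack_rgba(0xFF0000FF))  # opaque red
```

Each pair of hex digits is exactly one channel, so 0xFF0000FF reads as full red, no green, no blue, full alpha. The same number in decimal, 4278190335, tells you nothing at a glance.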
Ex. 1 red pixel in top left corner
// let's place a red pixel at the top-left corner
LD R 0XFF0000FF // rgba(255,0,0,255)
BLIT // this draws the screen
INFINITE_LOOP:
INPUT R
SHT R R 6 // END
SHT R R 7 // ESC
// exit vm if press ESC or END
CMP R 0
EXIT
CMP R 0
EXIT
GOTO INFINITE_LOOP // loop forever so we can see the red dot
Ex. 2 red, green, blue pixels in top left corner
LD R 0XFF0000FF // red
LD R 0X00FF00FF // green
LD R 0X0000FFFF // blue
BLIT
INFINITE_LOOP_2:
INPUT R
SHT R R 6 // END
SHT R R 7 // ESC
// exit vm if press ESC or END
CMP R 0
EXIT
CMP R 0
EXIT
GOTO INFINITE_LOOP_2 // loop forever
Run the following in your terminal to get a 4 byte wide hexdump of the binary file vlahb generated above.
$ xxd -c 4 bin/file.bin
You can see debug messages as you run vm.c by changing this line to #define DEBUG 1. Warning: it is very slow.
Thank you for taking the time to read. As always, feedback is highly appreciated. This project was really fun to write and I am very excited to see how other people respond to it.
Massive thanks to @glouw for the idea of VLAHB and his support throughout the project. Happy coding! 💻