Assembly in 7 Minutes

Nov 13, 2019 - Github Link


Scope of Blog

To provide a very simple overview of the assembly programming language in VLAHB and of assembly programming principles in general. Machine code and opcodes are beyond the scope of this blog post.

The Brains of the Operation

A virtual machine ("vm") acts like a real computer but it only exists within software and not hardware. This means that a virtual machine is not made of physical pieces of etched silicon, printed circuit boards, transistors or capacitors or anything like that. Instead it is "virtual", which means that we program this computer ourselves to do what we want: to read and execute binary files according to a blueprint we give it.

Check out the file called vm.c in VLAHB. See the switch statement and all the case lines? This is code that tells the virtual machine that if it sees an instruction 0x0001, do X. If it sees 0x0002, do Y, and so on.
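To make the "if you see this opcode, do that" idea concrete, here is a minimal Python sketch of a dispatch loop. The opcode numbers and what they do (0x0001 stores a value, 0x0002 adds one, 0x0000 stops) are made up for this illustration and are not VLAHB's real opcode table. For the real thing, read vm.c.

```python
# Toy dispatch loop: NOT VLAHB's real opcodes, just the shape of the idea.
def run(program, ram):
    """Execute a list of (opcode, slot, value) tuples against ram."""
    pc = 0  # program counter: index of the instruction being read
    while pc < len(program):
        opcode, slot, value = program[pc]
        if opcode == 0x0001:      # hypothetical: store value in ram[slot]
            ram[slot] = value
        elif opcode == 0x0002:    # hypothetical: add value to ram[slot]
            ram[slot] += value
        elif opcode == 0x0000:    # hypothetical: stop the machine
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#06x}")
        pc += 1  # move on to the next instruction
    return ram


ram = run([(0x0001, 3, 2), (0x0002, 3, 5), (0x0000, 0, 0)], [0] * 8)
print(ram)  # [0, 0, 0, 7, 0, 0, 0, 0]
```

A C switch statement does the same job as the if/elif chain here: one branch per opcode.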

Assembly to Binary

VLAHB is a vm together with an assembly language - let's call it VASM - and an assembler (assembler.py) which converts assembly files (file.asm) to raw machine code (file.bin) that our vm can read and execute.

The process of turning file.asm to file.bin is called assembling.

Our vm features a special integer called a program counter ("pc") which keeps track of what line the vm is reading at a time. When you tell the vm to run the program myFile.bin, the pc value is assigned to the line number of the program that the vm will begin "reading" (think Turing Machine). The pc is set to this value and then...

Nothing is inherently special about these cryptic codes. 1002 0003 0000 fde8 has no inherent meaning, but the vm reads it and knows to load the literal 65000 into ram at index 4098.
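Here is a small Python sketch that pulls those 8 bytes apart. The field layout (2 bytes of ram index, 2 bytes of opcode, 4 bytes of literal, big-endian) is an assumption inferred from this single example, not something taken from vm.c, and decode is a name made up for the sketch.

```python
import struct

# Assumed layout (inferred from the one example above, not from vm.c):
# 2 bytes ram index, 2 bytes opcode, 4 bytes literal, all big-endian.
def decode(instruction: bytes):
    index, opcode, literal = struct.unpack(">HHI", instruction)
    return index, opcode, literal


raw = bytes.fromhex("1002 0003 0000 fde8")
print(decode(raw))  # (4098, 3, 65000)
```

Under that assumption 0x1002 is the slot 4098, 0x0003 is the opcode, and 0x0000fde8 is the literal 65000.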

RAM

There is a list called ram stored in our virtual machine vm.c. It contains 65536 0's, one for each slot from 0 to 65535. Each slot of ram holds an integer from 0 to 4294967295 (an unsigned 32-bit value). By changing the values in our ram slots we can get Super Mario to run across the screen, create a paint application, or solve world peace.

// Empty brackets [ ] represent 0
ram := [ ][ ][ ][ ][ ][ ][ ][ ]...
        ^  ^  ^  ^  ^  ^  ^  ^ 
        0  1  2  3  4  5  6  7 

Ram Slot Dedication: this represents how our ram slots are organized
[-----------] [-------] [--] [---------] [---------------------------]
0       4095  4096 4099 4100 4101  27140 27141                   65535 


slots in ram    what they do
0-4095          function inputs
4096-4099       4 pointers (U, V, Y, Z resp.)
4100            return slot for function outputs
4101-27140      vram
27141-65535     free space
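If you want the same map in a form you can query, here is a short Python sketch. The ranges are copied from the table above; REGIONS and region_of are names invented for this example.

```python
# Slot ranges copied from the table above (both ends inclusive).
REGIONS = [
    (0, 4095, "function inputs"),
    (4096, 4099, "pointers U, V, Y, Z"),
    (4100, 4100, "return slot"),
    (4101, 27140, "vram"),
    (27141, 65535, "free space"),
]


def region_of(slot: int) -> str:
    """Return the name of the region a ram slot belongs to."""
    for first, last, name in REGIONS:
        if first <= slot <= last:
            return name
    raise IndexError(f"slot {slot} is outside ram")


print(region_of(4098))   # pointers U, V, Y, Z
print(region_of(30000))  # free space
```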

Simple Operations

The simplest operation to perform on ram is a direct load. This loads a value into a slot of ram.



Ex. 1
LD R[3] 2  // load 2 into the slot of ram at index 3

ram := [ ][ ][ ][2][ ][ ][ ][ ]...
        ^  ^  ^  ^  ^  ^  ^  ^ 
        0  1  2  3  4  5  6  7 
    


There are other ways to manipulate ram. VASM handles all the basic operations: +, -, ×, ÷.



Ex. 2
ADD R[1] 8     // ram[1] = ram[1] + 8
SUB R[2] R[1]  // ram[2] = ram[2]-ram[1]
MUL R[2] 5     // ram[2] = ram[2] * 5
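The same kind of arithmetic can be mirrored in Python, with each line annotated by the VASM instruction it stands for. The starting values are picked for this illustration, and the 32-bit wraparound mask is an assumption based on the slot range (0 to 4294967295), not something confirmed from vm.c.

```python
MASK = 0xFFFFFFFF  # assumption: slots wrap like unsigned 32-bit integers

ram = [0] * 8

ram[1] = 10                        # LD  R[1] 10
ram[2] = 4                         # LD  R[2] 4
ram[1] = (ram[1] + 8) & MASK       # ADD R[1] 8     -> 18
ram[1] = (ram[1] - ram[2]) & MASK  # SUB R[1] R[2]  -> 14
ram[2] = (ram[2] * 5) & MASK       # MUL R[2] 5     -> 20

print(ram)  # [0, 14, 20, 0, 0, 0, 0, 0]
```

Note that the first operand is both an input and the destination: the result always lands back in the first slot.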



Exercise 1 What happens to ram after assembling this code and running it in the vm?
(assume ram is initialized as an array of 0s)
LD R[2] 2
MUL R[2] 2
LD R[69] 3
MUL R[2] R[3]
EXIT

Labels

Our assembly language supports a primitive form of functions called labels. When assembly code is assembled into machine instructions for the virtual machine, labels don't appear in the output. Instead they are treated as markers in the code that you can loop to, jump to, etc. It is therefore better to write GOTO MY_FIRST_LABEL than something like GOTO 1729.

MATH_ADD_TWO_NUMBERS:
LD R[4100] R[0]
ADD R[4100] R[1]  // store the sum of R[0] and R[1] into R[4100]
RETURN
Q. Why are we storing stuff in R[4100]?

A. Our function output is by convention stored at R[4100]. We could have picked R[1729].
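How can a label exist in the source but not in the output? One common approach, sketched below in Python, is for the assembler to remember which instruction each label sits in front of and then drop the label itself. This is only an illustration of the idea, not how assembler.py is actually written, and resolve_labels is a hypothetical name.

```python
def resolve_labels(lines):
    """Return (instructions, labels); labels map a name to an instruction index."""
    instructions, labels = [], {}
    for line in lines:
        line = line.split("//")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            # A label emits nothing; it names the next instruction's index.
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line)
    return instructions, labels


source = """
LD R[0] 1
MY_FIRST_LABEL:
ADD R[0] 1        // keep counting
GOTO MY_FIRST_LABEL
""".splitlines()

instructions, labels = resolve_labels(source)
print(instructions)  # ['LD R[0] 1', 'ADD R[0] 1', 'GOTO MY_FIRST_LABEL']
print(labels)        # {'MY_FIRST_LABEL': 1}
```

With the table in hand, an assembler can swap GOTO MY_FIRST_LABEL for the numeric target so you never have to count lines by hand.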

To jump the pc to a label, call it with the CALL opcode. This pushes the current pc onto the stack (a stack in the CPU) and sets the pc to the line where the label is.

RETURN: this pops the pc from the stack and sets the pc to that popped value. If we write something like the code snippet below then ram will not be touched.

CALL WHAT_IS_LIFE
EXIT

// we never go here
LD R[0] 55
LD R[1] 51
LD R[2] 44

WHAT_IS_LIFE:
    RETURN

But why? Once we hit CALL WHAT_IS_LIFE, the pc will be pushed, and the pc jumps to WHAT_IS_LIFE. The stack and pc now look like this:

    stack = [0]
    pc = (line of WHAT_IS_LIFE)
    

In the next line we hit a RETURN, which means we pop from the stack and assign our pc to that value.

    stack = [ ]
    pc = 0
    

Now we advance a line to the second line of our program and hit EXIT. The vm exits.
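The walk-through above can be replayed with a small Python model of CALL, RETURN, LD and EXIT. This is a sketch under simplifying assumptions: the pc counts assembly lines rather than machine code lines, the label table is written by hand, and run is a made-up helper, not code from vm.c.

```python
def run(program, labels):
    """Run a tiny CALL/RETURN/EXIT/LD subset; return (ram, pc trace)."""
    ram, stack, trace = [0] * 8, [], []
    pc = 0
    while True:
        trace.append(pc)
        op, *args = program[pc]
        if op == "CALL":
            stack.append(pc)        # remember where we came from
            pc = labels[args[0]]    # jump to the label
            continue
        if op == "RETURN":
            pc = stack.pop()        # back to the CALL line...
        elif op == "LD":
            ram[args[0]] = args[1]
        elif op == "EXIT":
            return ram, trace
        pc += 1                     # ...then advance to the next line


program = [
    ("CALL", "WHAT_IS_LIFE"),  # line 0
    ("EXIT",),                 # line 1
    ("LD", 0, 55),             # line 2, never reached
    ("LD", 1, 51),             # line 3, never reached
    ("LD", 2, 44),             # line 4, never reached
    ("RETURN",),               # line 5, WHAT_IS_LIFE
]
ram, trace = run(program, {"WHAT_IS_LIFE": 5})
print(trace)  # [0, 5, 1]
print(ram)    # [0, 0, 0, 0, 0, 0, 0, 0]
```

The trace shows the pc visiting line 0, the label, and then line 1. The three LD lines are never visited, which is why ram stays untouched.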




Exercise 2 Describe in your own words what the program below is doing to ram.
LD R[0] 3
LD R[1] 4
CALL DO_SOMETHING
EXIT

DO_SOMETHING:
    LD R[4100] R[0]
    ADD R[4100] R[1]
    RETURN

Pointers

Let's say you want to be able to programmatically place values in ram with assembly. Let's say for instance you want to put 7 in R[3], 7 in R[6], and 7 in R[9].

Notice that our index of ram is a multiple of 3 each time: 3, 6, 9. We can write

LD R[3] 7
LD R[6] 7
LD R[9] 7
but with a pointer:
LD R[4096] 3  // pointer U
LD R[U] 7

ADD R[4096] 3  // R[4096] -> 6
LD R[U] 7

ADD R[4096] 3  // R[4096] -> 9
LD R[U] 7
How does this work?

A pointer refers to a slot of ram that the vm treats in a _special_ way. The word _special_ refers to the vm interpreting that slot's value as an index of ram. VLAHB has 4 hard-coded slots for pointers, located at ram slots R[4096], R[4097], R[4098] and R[4099], with the designated letters U, V, Y, Z respectively.



letter ram slot
U R[4096]
V R[4097]
Y R[4098]
Z R[4099]

Look at the first two lines of asm code above. We first load 3 into R[4096]. The second line is LD R[U] 7. This tells the program to load a 7 into ram[ram[4096]].

Since ram[4096] = 3, we are loading 7 into ram[3].

ram := [0][0][0][7][0][0][0][0]...
        ^  ^  ^  ^  ^  ^  ^  ^ 
        0  1  2  3  4  5  6  7 
NB There are more opcodes built into VLAHB that use pointers. Check out vm.c to see them all.
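In Python terms, the pointer version of the 3, 6, 9 example comes down to a double index. The sketch below models only LD R[U] with a literal value; ld_u is a helper name invented for this illustration.

```python
U = 4096  # ram slot of pointer U, from the table above

ram = [0] * 65536  # slots 0..65535


def ld_u(value):
    """Model of LD R[U] value: store into the slot that ram[U] points at."""
    ram[ram[U]] = value


ram[U] = 3      # LD  R[4096] 3
ld_u(7)         # LD  R[U] 7   -> ram[3] = 7
ram[U] += 3     # ADD R[4096] 3
ld_u(7)         # LD  R[U] 7   -> ram[6] = 7
ram[U] += 3     # ADD R[4096] 3
ld_u(7)         # LD  R[U] 7   -> ram[9] = 7

print(ram[:10])  # [0, 0, 0, 7, 0, 0, 7, 0, 0, 7]
```

The instruction LD R[U] 7 never changes; only the number stored in slot 4096 does, and that is what makes pointers useful inside loops.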


Exercise 3 The code snippet below loads the integer 7 into slots 120 to 130 inclusive. Rewrite the code below using the opcode LD R[U] R[V].
LD R[120] 7
LD R[121] 7
LD R[122] 7
LD R[123] 7
LD R[124] 7
LD R[125] 7
LD R[126] 7
LD R[127] 7
LD R[128] 7
LD R[129] 7
LD R[130] 7

Conditional Opcodes

A subset of opcodes are called conditional because, depending on some condition, they can make the pc skip the next line of assembly code (that's +2 lines in machine code, since each valid assembly line maps to exactly 2 lines of machine code after it's assembled). These opcodes compare one value with another.

For example, CMP R[0] 2 checks if ram[0] is equal to 2. If it is, skip the next line. Else, do nothing. Read the example below carefully.

LD R[0] 4
LD R[1] 0

CMP R[0] 8  // if R[0] == 8, skip next line
LD R[1] 1
EXIT  // R[1]=1 at end of program


opcode    meaning
CMP       is equal to
LT        less than, <
LTE       less than or equal, <=
GT        greater than, >
GTE       greater than or equal, >=
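The skip rule can be written as a tiny Python function. Only CMP's behaviour (skip when the comparison is true) is spelled out above; the sketch assumes the other four opcodes follow the same rule, and next_pc is a hypothetical helper that counts in assembly lines.

```python
import operator

# Opcode -> comparison, following the table above.
CONDITIONS = {
    "CMP": operator.eq,
    "LT": operator.lt,
    "LTE": operator.le,
    "GT": operator.gt,
    "GTE": operator.ge,
}


def next_pc(pc, opcode, left, right):
    """Return the next assembly line: skip one line when the test is true."""
    skip = CONDITIONS[opcode](left, right)
    return pc + 2 if skip else pc + 1


print(next_pc(2, "CMP", 4, 8))  # 3: 4 != 8, so the next line runs
print(next_pc(2, "CMP", 8, 8))  # 4: equal, so the next line is skipped
```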



Exercise 4 What is loaded into slot R[33] when the program hits EXIT?

LD R[0] 3
LD R[33] 0

CMP R[0] 2
ADD R[33] 1
GTE R[0] 2
ADD R[33] 1
ADD R[33] 2

EXIT

VRAM

A subset of our ram slots are dedicated to pixels for the 160x144 px display. This happens to be the same resolution as the original Game Boy. The choice of vram slots is arbitrary so we'll pick slots from ram[4101] to ram[27140] inclusive. These slots map onto the screen from top to bottom, left to right, starting with the top-left pixel.

An int stored in vram is interpreted as an rgba color. Writing the value as a hexadecimal integer makes it easier to see which color you are loading, since each pair of hex digits is one channel.
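Two small Python helpers show the arithmetic involved. The channel order (red in the top byte, alpha in the bottom) matches the rgba(255,0,0,255) comment in the example that follows. The row-major layout in vram_slot (left to right, then down one row) is an assumption, and both function names are invented for this sketch.

```python
VRAM_START = 4101
WIDTH, HEIGHT = 160, 144


def rgba(r, g, b, a=255):
    """Pack four 0-255 channels into one integer, red in the top byte."""
    return (r << 24) | (g << 16) | (b << 8) | a


def vram_slot(x, y):
    """Assumed row-major layout: slots run left to right, then down a row."""
    return VRAM_START + y * WIDTH + x


print(hex(rgba(255, 0, 0)))   # 0xff0000ff
print(vram_slot(0, 0))        # 4101
print(vram_slot(159, 143))    # 27140
```

160 x 144 is 23040 pixels, which is exactly the number of slots from 4101 to 27140 inclusive.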

Ex. 1 red pixel in top left corner

// let's place a red pixel at the top-left corner

LD R[4101] 0XFF0000FF // rgba(255,0,0,255)
BLIT  // this draws the screen

INFINITE_LOOP:
    INPUT R[0]
    SHT R[0] R[29000] 6  // END
    SHT R[0] R[29001] 7  // ESC

    // exit vm if press ESC or END
    CMP R[29000] 0
        EXIT
    CMP R[29001] 0
        EXIT

    GOTO INFINITE_LOOP  // loop forever so we can see the red dot

Ex. 2 red, green, blue pixels in top left corner

LD R[4101] 0XFF0000FF // red
LD R[4102] 0X00FF00FF // green
LD R[4103] 0X0000FFFF // blue
BLIT

INFINITE_LOOP_2:
    INPUT R[0]
    SHT R[0] R[29000] 6  // END
    SHT R[0] R[29001] 7  // ESC

    // exit vm if press ESC or END
    CMP R[29000] 0
        EXIT
    CMP R[29001] 0
        EXIT

    GOTO INFINITE_LOOP_2  // loop forever

Write an Assembly Game

Tips

  1. See pong.asm for a reference on how to code user input into a game. The INPUT and SHT opcodes are necessary for this.
  2. Use C syntax highlighting for your text editor for asm files.
  3. Check out sprite.asm in the repo. It contains 5x5 px sprites for numbers 0-9 and letters A-Z.
  4. Slots 27141-65535 have no special designation and are free to use for anything you want. You can store variables, perform arithmetic, etc.

Debugging

Run the following in your terminal to get a 4-byte-wide hexdump of the binary file that VLAHB generated above.


$ xxd -c 4 bin/file.bin
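If xxd isn't available, a few lines of Python give a similar view. This is a rough stand-in and makes no attempt to match xxd's output exactly (there is no ASCII column, for one); dump is a name made up for the sketch.

```python
def dump(data: bytes, width: int = 4):
    """Return one line per `width` bytes: offset, then bytes in 2-byte groups."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width].hex()
        groups = " ".join(chunk[i:i + 4] for i in range(0, len(chunk), 4))
        lines.append(f"{offset:08x}: {groups}")
    return lines


sample = bytes.fromhex("100200030000fde8")
for line in dump(sample):
    print(line)
# 00000000: 1002 0003
# 00000004: 0000 fde8
```

To dump a real file, read it in binary mode first, for example open("bin/file.bin", "rb").read(), and pass the bytes to dump.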
    

You can see debug messages as you run vm.c by changing this line to #define DEBUG 1. Warning: it is very slow.



Thank You!

Thank you for taking the time to read. As always, feedback is highly appreciated. This project was really fun to write and I am very excited to see how other people respond.

Massive thanks to @glouw for the idea of VLAHB and his support throughout the project.

Happy coding! 💻