About Archive Tags RSS Feed

 

Entries posted in January 2019

I decided it was time to write a compiler

31 January 2019 15:01

I've spent some time in the recent past working with interpreters, and writing a BASIC interpreter, but one thing I'd not done is write a compiler.

Once upon a time I worked for a compiler-company, but I wasn't involved with the actual coding at that time. Instead I worked on other projects, and did a minor amount of system-administration.

There are enough toy-languages that it didn't seem worthwhile to write a compiler for yet another one. At the same time writing a compiler for a full-language would get bogged down in a lot of noise.

So I decided to simplify things: I would write a compiler for "maths". Something that would take an expression and output assembly-language, which could then be compiled.

The end result is this simple compiler:

Initially I wrote something that would parse expressions such as 3 + 4 * 5 and output an abstract-syntax-tree. I walked the tree and started writing logic to pick registers, and similar. It all seemed like more of a grind than a useful exercise - and considering how ludicrous compiling simple expressions to assembly language already was it seemed particularly silly.

So once again I simplified, deciding to accept only a simple "reverse-polish-like" expression, and outputing the assembly for that almost directly.

Assume you want to calculate "((3 * 5) +2)" you'd feed my compiler:

  3 5 * 2 +

To compile that we first load the initial state 3, then we walk the rest of the program always applying an operation with an operand:

  • Store 3
  • 5 * -> multiply by 5.
  • 2 + -> add 2.
  • ..

This approach is trivial to parse, and trivial to output the assembly-language for: Pick a register and load your starting value, then just make sure all your operations apply to that particular register. (In the case of Intel assembly it made sense to store the starting value in EAX, and work with that).

A simple program would then produce a correspondingly simple output. Given 1 1 + we'd expect this output:

  .intel_syntax noprefix
  .global main

  .data
    result: .asciz "Result %d\n"

  main:
    mov rax, 1
    add rax, 1

    lea rdi,result
    mov rsi, rax
    xor rax, rax
    call printf
    xor rax, rax
    ret

With that output you can assemble the program, and run it:

 $ gcc -static -o program program.s
 $ ./program
 Result 2

I wrote some logic to allow calculating powers too, so you can output 2 ^ 8, etc. That's just implemented the naive-way, where you have a small loop and multiply the contents of EAX by itself the appropriate number of times. Modulus is similarly simple to calculate.

Adding support for named variables, and other things, wouldn't be too hard. But it would involve register-allocation and similar complexity. Perhaps that's something I need to flirt with, to make the learning process complete, but to be honest I can't be bothered.

Anyway check it out, if you like super-fast maths. My benchmark?

$  time perl -e 'print 2 ** 8 . "\n"'
256
real    0m0.006s
user    0m0.005s
sys     0m0.000s

vs.

$ math-compiler -compile '2 8 ^'
$ time ./a.out
Result 256

real   0m0.001s
user   0m0.001s
sys    0m0.000s

Wow. Many wow. Much speed. All your base-two are belong to us.

| No comments