I've spent some time in the recent past working with interpreters, and writing a BASIC interpreter, but one thing I'd not done is write a compiler.
Once upon a time I worked for a compiler-company, but I wasn't involved with the actual coding at that time. Instead I worked on other projects, and did a minor amount of system-administration.
There are enough toy-languages that it didn't seem worthwhile to write a compiler for yet another one. At the same time writing a compiler for a full-language would get bogged down in a lot of noise.
So I decided to simplify things: I would write a compiler for "maths". Something that would take an expression and output assembly-language, which could then be compiled.
The end result is this simple compiler:
Initially I wrote something that would parse expressions such as 3 + 4 * 5
and output an abstract-syntax-tree. I walked the tree and started writing logic to pick registers, and similar. It all seemed like more of a grind than a useful exercise - and considering how ludicrous compiling simple expressions to assembly language already was it seemed particularly silly.
So once again I simplified, deciding to accept only a simple "reverse-polish-like" expression, and outputing the assembly for that almost directly.
Assume you want to calculate "((3 * 5) +2)
" you'd feed my compiler:
3 5 * 2 +
To compile that we first load the initial state 3
, then we walk the rest of the program always applying an operation with an operand:
- Store
3
5 *
-> multiply by 5.2 +
-> add 2.- ..
This approach is trivial to parse, and trivial to output the assembly-language for: Pick a register and load your starting value, then just make sure all your operations apply to that particular register. (In the case of Intel assembly it made sense to store the starting value in EAX
, and work with that).
A simple program would then produce a correspondingly simple output. Given 1 1 +
we'd expect this output:
.intel_syntax noprefix
.global main
.data
result: .asciz "Result %d\n"
main:
mov rax, 1
add rax, 1
lea rdi,result
mov rsi, rax
xor rax, rax
call printf
xor rax, rax
ret
With that output you can assemble the program, and run it:
$ gcc -static -o program program.s
$ ./program
Result 2
I wrote some logic to allow calculating powers too, so you can output 2 ^ 8
, etc. That's just implemented the naive-way, where you have a small loop and multiply the contents of EAX by itself the appropriate number of times. Modulus is similarly simple to calculate.
Adding support for named variables, and other things, wouldn't be too hard. But it would involve register-allocation and similar complexity. Perhaps that's something I need to flirt with, to make the learning process complete, but to be honest I can't be bothered.
Anyway check it out, if you like super-fast maths. My benchmark?
$ time perl -e 'print 2 ** 8 . "\n"'
256
real 0m0.006s
user 0m0.005s
sys 0m0.000s
vs.
$ math-compiler -compile '2 8 ^'
$ time ./a.out
Result 256
real 0m0.001s
user 0m0.001s
sys 0m0.000s
Wow. Many wow. Much speed. All your base-two are belong to us.