Recently I've been writing a couple of simple compilers, which take input in a particular format and generate assembly language output. This output can then be piped through gcc
to generate a native executable.
Public examples include this trivial math compiler and my brainfuck compiler.
Of course there's always the nagging thought that relying upon gcc
(or nasm
) is a bit of a cheat. So I wondered how hard is it to write an assembler? Something that would take assembly-language program and generate a native (ELF) binary?
And the answer is "It isn't hard, it is just tedious".
I found some code to generate an ELF binary, and after that assembling simple instructions was pretty simple. I remember from my assembly-language days that the encoding of instructions can be pretty much handled by tables, but I've not yet gone into that.
(Specifically there are instructions like "add rax, rcx
", and the encoding specifies the source/destination registers - with different forms for various sized immediates.)
Anyway I hacked up a simple assembler, it can compile a.out
from this input:
.hello DB "Hello, world\n"
.goodbye DB "Goodbye, world\n"
mov rdx, 13 ;; write this many characters
mov rcx, hello ;; starting at the string
mov rbx, 1 ;; output is STDOUT
mov rax, 4 ;; sys_write
int 0x80 ;; syscall
mov rdx, 15 ;; write this many characters
mov rcx, goodbye ;; starting at the string
mov rax, 4 ;; sys_write
mov rbx, 1 ;; output is STDOUT
int 0x80 ;; syscall
xor rbx, rbx ;; exit-code is 0
xor rax, rax ;; syscall will be 1 - so set to xero, then increase
inc rax ;;
int 0x80 ;; syscall
The obvious omission is support for "JMP", "JMP_NZ", etc. That's painful because jumps are encoded with relative offsets. For the moment if you want to jump:
push foo ; "jmp foo" - indirectly.
ret
:bar
nop ; Nothing happens
mov rbx,33 ; first syscall argument: exit code
mov rax,1 ; system call number (sys_exit)
int 0x80 ; call kernel
:foo
push bar ; "jmp bar" - indirectly.
ret
I'll update to add some more instructions, and see if I can use it to handle the output I generate from a couple of other tools. If so that's a win, if not then it was a fun learning experience:
https://github.com/akkartik/mu
I think you can make your compiler even simpler by rethinking the need for Assembly. Easier-to-transform IRs are possible. Here's mine.