Recently I've been writing a couple of simple compilers, which take input in a particular format and generate assembly-language output. This output can then be piped through gcc to generate a native executable.
Public examples include this trivial math compiler and my brainfuck compiler.
Of course there's always the nagging thought that relying upon external tools such as gcc or nasm is a bit of a cheat. So I wondered: how hard is it to write an assembler? Something that would take an assembly-language program and generate a native (ELF) binary?
And the answer is "It isn't hard, it is just tedious".
I found some code to generate an ELF binary, and after that assembling simple instructions was fairly straightforward. I remember from my assembly-language days that instruction encoding can mostly be handled with tables, but I've not yet gone into that.
(Specifically, there are instructions like "add rax, rcx", and the encoding specifies the source and destination registers, with different forms for immediates of various sizes.)
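The table-driven idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the assembler's actual code: REGS, REG_REG, IMM_GROUP and the encode_* helpers are hypothetical names, and only the eight legacy 64-bit registers are covered (r8 to r15 need extra REX bits). "add rax, rcx" becomes the REX.W prefix (0x48), opcode 0x01, and a ModRM byte naming both registers; for immediates the sketch picks the short 0x83 form when the value fits in a signed byte and the 0x81 form otherwise.

```python
# Register numbers as used in the ModRM byte (legacy registers only).
REGS = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

# Opcodes for the "op r/m64, r64" forms.
REG_REG = {"add": 0x01, "sub": 0x29, "xor": 0x31, "mov": 0x89}

# The /digit stored in ModRM.reg for the 0x83 (imm8) and 0x81 (imm32) groups.
# "mov" is absent: its immediate forms use different opcodes (C7 /0, B8+r).
IMM_GROUP = {"add": 0, "sub": 5, "xor": 6}

REX_W = 0x48  # REX prefix selecting 64-bit operand size


def modrm(reg: int, rm: int) -> int:
    """Register-direct ModRM byte: mod=11, then the reg field, then rm."""
    return 0xC0 | (reg << 3) | rm


def encode_reg_reg(op: str, dst: str, src: str) -> bytes:
    # In "op r/m64, r64" the destination goes in rm and the source in reg.
    return bytes([REX_W, REG_REG[op], modrm(REGS[src], REGS[dst])])


def encode_reg_imm(op: str, dst: str, imm: int) -> bytes:
    digit = IMM_GROUP[op]
    if -128 <= imm <= 127:
        # Short form: REX.W 83 /digit ib (imm8 is sign-extended to 64 bits).
        return bytes([REX_W, 0x83, modrm(digit, REGS[dst])]) + imm.to_bytes(1, "little", signed=True)
    # Long form: REX.W 81 /digit id (imm32 is sign-extended to 64 bits).
    # int.to_bytes raises OverflowError if the value doesn't fit in 32 bits.
    return bytes([REX_W, 0x81, modrm(digit, REGS[dst])]) + imm.to_bytes(4, "little", signed=True)
```

With these tables, encode_reg_reg("add", "rax", "rcx") yields 48 01 C8, and adding another instruction of the same shape is one more table entry rather than new code.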
Anyway, I hacked up a simple assembler; it can generate a.out from this input:
    .hello   DB "Hello, world\n"
    .goodbye DB "Goodbye, world\n"

    mov rdx, 13        ;; write this many characters
    mov rcx, hello     ;; starting at the string
    mov rbx, 1         ;; output is STDOUT
    mov rax, 4         ;; sys_write
    int 0x80           ;; syscall

    mov rdx, 15        ;; write this many characters
    mov rcx, goodbye   ;; starting at the string
    mov rax, 4         ;; sys_write
    mov rbx, 1         ;; output is STDOUT
    int 0x80           ;; syscall

    xor rbx, rbx       ;; exit-code is 0
    xor rax, rax       ;; syscall will be 1, so set to zero, then increase
    inc rax            ;;
    int 0x80           ;; syscall
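For the a.out part, a minimal ELF64 writer might look like the following Python sketch. It assumes a single PT_LOAD segment holding the headers, then the code, then the data, loaded at the conventional non-PIE address 0x400000; build_elf and BASE are hypothetical names, and this layout is an assumption rather than how the assembler above arranges things. Under this layout a data label such as hello would resolve to BASE + 120 + len(code) plus its offset within the data.

```python
import struct

BASE = 0x400000            # assumed load address (conventional for non-PIE x86-64)
EHDR_SIZE, PHDR_SIZE = 64, 56


def build_elf(code: bytes, data: bytes = b"") -> bytes:
    """Wrap code (followed by data) in a minimal static ELF64 executable."""
    header_len = EHDR_SIZE + PHDR_SIZE
    entry = BASE + header_len               # code starts right after the headers
    file_size = header_len + len(code) + len(data)

    # Magic, ELFCLASS64, little-endian, version 1, System V ABI, then padding.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,                  # e_type: ET_EXEC
        0x3E,               # e_machine: EM_X86_64
        1,                  # e_version
        entry,              # e_entry
        EHDR_SIZE,          # e_phoff: program header follows the ELF header
        0,                  # e_shoff: no section headers
        0,                  # e_flags
        EHDR_SIZE,          # e_ehsize
        PHDR_SIZE,          # e_phentsize
        1,                  # e_phnum
        0, 0, 0,            # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,                  # p_type: PT_LOAD
        5,                  # p_flags: PF_R | PF_X (the data is only ever read)
        0,                  # p_offset: map the whole file, headers included
        BASE, BASE,         # p_vaddr, p_paddr
        file_size,          # p_filesz
        file_size,          # p_memsz
        0x1000,             # p_align
    )
    return ehdr + phdr + code + data
```

For example, the last four instructions of the program above (xor rbx, rbx; xor rax, rax; inc rax; int 0x80) encode as 48 31 DB 48 31 C0 48 FF C0 CD 80, so writing build_elf(bytes.fromhex("4831db4831c048ffc0cd80")) to a file and marking it executable should give a program that exits with status 0, on kernels that still permit the 32-bit "int 0x80" interface.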
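The obvious omission is support for "JMP", "JMP_NZ", etc. That's painful because jumps are encoded as offsets relative to the next instruction, so a forward jump can't be encoded until the assembler knows where its target label ends up. For the moment, if you want to jump, you can push the target address and "ret" to it: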
        push foo        ; "jmp foo" - indirectly.
        ret

    :bar
        nop             ; Nothing happens
        mov rbx,33      ; first syscall argument: exit code
        mov rax,1       ; system call number (sys_exit)
        int 0x80        ; call kernel

    :foo
        push bar        ; "jmp bar" - indirectly.
        ret
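Real jumps can be handled with back-patching: emit the opcode with a zeroed 32-bit placeholder, remember where it is, and fill it in once every label's position is known. The Python sketch below is hypothetical (the Assembler class and its methods are illustrative names) and assumes the rel32 forms only (E9 for JMP, 0F 85 for JNZ), which sidesteps the harder problem of choosing between short 8-bit and 32-bit jumps. By contrast, the push/ret workaround above needs no offsets at all: "push imm32" (opcode 0x68) takes an absolute address, sign-extended to 64 bits, which works as long as the labels sit below 2 GiB.

```python
class Assembler:
    """Hypothetical mini-assembler fragment: labels, jumps and back-patching."""

    def __init__(self) -> None:
        self.code = bytearray()
        self.labels: dict[str, int] = {}          # label name -> offset in code
        self.fixups: list[tuple[int, str]] = []   # (offset of rel32 field, label)

    def label(self, name: str) -> None:
        self.labels[name] = len(self.code)

    def emit(self, raw: bytes) -> None:
        self.code += raw

    def _jump(self, opcode: bytes, target: str) -> None:
        self.code += opcode
        self.fixups.append((len(self.code), target))
        self.code += bytes(4)                     # placeholder, patched in resolve()

    def jmp(self, target: str) -> None:
        self._jump(b"\xe9", target)               # JMP rel32

    def jnz(self, target: str) -> None:
        self._jump(b"\x0f\x85", target)           # JNZ rel32

    def resolve(self) -> bytes:
        for field, name in self.fixups:
            # rel32 counts from the end of the 4-byte field, i.e. the next
            # instruction. An undefined label raises KeyError here.
            rel = self.labels[name] - (field + 4)
            self.code[field:field + 4] = rel.to_bytes(4, "little", signed=True)
        return bytes(self.code)
```

For instance, a forward "jmp foo", then a one-byte nop labelled bar, then "jnz bar" at foo, resolves to E9 01 00 00 00, 90, 0F 85 F9 FF FF FF: the forward jump skips one byte, and the backward one travels seven bytes back from the end of the JNZ.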
I'll update it to add some more instructions, and see if I can use it to handle the output I generate from a couple of other tools. If so, that's a win; if not, it was a fun learning experience.