About Archive Tags RSS Feed

 

Writing a brainfuck compiler.

14 June 2020 19:00

So last night I had the idea that it might be fun to write a Brainfuck compiler, to convert BF programs into assembly language, where they'd run nice and quickly.

I figured I could allocate a day to do the work, and it would be a pleasant distraction on a Sunday afternoon. As it happened it only took me three hours from start to finish.

There are only a few instructions involved in brainfuck:

  • >
    • increment the data pointer (to point to the next cell to the right).
  • <
    • decrement the data pointer (to point to the next cell to the left).
  • +
    • increment (increase by one) the byte at the data pointer.
  • -
    • decrement (decrease by one) the byte at the data pointer.
  • .
    • output the byte at the data pointer.
  • ,
    • accept one byte of input, storing its value in the byte at the data pointer.
  • [
    • if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command.
  • ]
    • if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command.

The Wikipedia link early shows how you can convert cleanly to a C implementation, so my first version just did that:

  • Read a BF program.
  • Convert to a temporary C-source file.
  • Compile with gcc.
  • End result you have an executable which you can run.

The second step was just as simple:

  • Read a BF program.
  • Convert to a temporary assembly language file.
  • Compile with nasm, link with ld.
  • End result you have an executable which you can run.

The following program produces the string "Hello World!" to the console:

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

My C-compiler converted that to the program:

extern int putchar(int);
extern char getchar();

char array[30000];
int idx = 0;

int main (int arc, char *argv[]) {
  array[idx]++;
  array[idx]++;
  array[idx]++;
  array[idx]++;
  array[idx]++;
  array[idx]++;
  array[idx]++;
  array[idx]++;

  while (array[idx]) {
    idx++;
    array[idx]++;
    array[idx]++;
    array[idx]++;
    array[idx]++;

    while (array[idx]) {
      idx++;
      array[idx]++;
      array[idx]++;
      idx++;
      array[idx]++;
      array[idx]++;
      array[idx]++;
    ..
    ..

The assembly language version is even longer:


global _start
section .text

_start:
  mov r8, stack
  add byte [r8], 8
label_loop_8:
  cmp byte [r8], 0
  je close_loop_8
  add r8, 1
  add byte [r8], 4
label_loop_14:
  cmp byte [r8], 0
  je close_loop_14
  add r8, 1
  add byte [r8], 2
  add r8, 1
  add byte [r8], 3
  add r8, 1
  add byte [r8], 3
  add r8, 1
  add byte [r8], 1
  sub r8, 4
  sub byte [r8], 1
  jmp label_loop_14
close_loop_14:
  add r8, 1
  add byte [r8], 1
  ..

  mov rax, 60
  mov rdi, 0
  syscall
section .bss
stack: resb 300000

Annoyingly the assembly language version ran slower than the C-version, which I was sneakily compiling with "gcc -O3 .." to ensure it was optimized.

The first thing that I did was to convert it to fold adjacent instructions. Instead of generating separate increment instructions for ">>>" I instead started to generate "add xx, 3". That didn't help as much as I'd hoped, but it was still a big win.

After that I made a minor tweak to the way that loops are handled to compare at the end of the loop as well as the start, and that shaved off a fraction of a second.

As things stand I think I'm "done". It might be nice to convert the generated assembly language to something gcc can handle, to drop the dependency on nasm, but I don't feel a pressing need for that yet.

Was a fun learning experience, and I think I'll come back to optimization later.

| No comments