I've recently been spending some time working with a simple scripting language, and today I spent part of the day trying to make it faster. As a broad overview we'll be considering this example script:
if ( 1 + 2 * 3 == 7 ) { return true; }
return false;
This gets compiled into a simple set of bytecode instructions, which can refer to a small collection of constants, each identified by its offset/ID:
000000 OpConstant 0 // load constant: &{1}
000003 OpConstant 1 // load constant: &{2}
000006 OpConstant 2 // load constant: &{3}
000009 OpMul
000010 OpAdd
000011 OpConstant 3 // load constant: &{7}
000014 OpEqual
000015 OpJumpIfFalse 20
000018 OpTrue
000019 OpReturn
000020 OpFalse
000021 OpReturn
Constants:
000000 Type:INTEGER Value:1
000001 Type:INTEGER Value:2
000002 Type:INTEGER Value:3
000003 Type:INTEGER Value:7
The OpConstant instruction means to load the value with the given ID from the constant pool and place it on top of the stack. The multiplication and addition operations each pop two values from the stack, apply the appropriate operation, and push the result back. All standard stuff.
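To make that concrete, here is a minimal sketch of such a stack machine in Go. The opcode values, the integer-only stack, and the two-byte big-endian operand are assumptions made for illustration, not the project's actual implementation:

package main

import (
    "encoding/binary"
    "fmt"
)

const (
    OpConstant byte = iota // load constants[arg] onto the stack
    OpAdd                  // pop two values, push their sum
    OpMul                  // pop two values, push their product
)

// run executes the bytecode against a constant pool and returns the
// value left on top of the stack.
func run(code []byte, constants []int) int {
    var stack []int
    for ip := 0; ip < len(code); {
        switch code[ip] {
        case OpConstant:
            arg := binary.BigEndian.Uint16(code[ip+1:]) // two-byte pool index
            stack = append(stack, constants[arg])
            ip += 3
        case OpAdd:
            a, b := stack[len(stack)-2], stack[len(stack)-1]
            stack = append(stack[:len(stack)-2], a+b)
            ip++
        case OpMul:
            a, b := stack[len(stack)-2], stack[len(stack)-1]
            stack = append(stack[:len(stack)-2], a*b)
            ip++
        default:
            panic("unknown opcode")
        }
    }
    return stack[len(stack)-1]
}

func main() {
    // The arithmetic part of the example: 1 + 2 * 3
    code := []byte{
        OpConstant, 0, 0, // load constant 0 (1)
        OpConstant, 0, 1, // load constant 1 (2)
        OpConstant, 0, 2, // load constant 2 (3)
        OpMul,
        OpAdd,
    }
    fmt.Println(run(code, []int{1, 2, 3})) // prints 7
}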
Of course these constants are constant, so it seemed obvious to treat integers as a special case. Rather than storing them in the constant-pool, where the booleans, strings, etc. live, we just store them inline in the instruction itself. That meant our program would look like this:
000000 OpPush 1
000003 OpPush 2
000006 OpPush 3
000009 OpMul
000010 OpAdd
...
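Since the integer now travels inline, OpPush presumably keeps the same three-byte layout the offsets above imply: one opcode byte followed by a two-byte operand, except the operand is the value itself rather than a pool index. A small sketch of emitting that form (the opcode value, the big-endian encoding, and the emitPush helper are all assumptions):

package main

import (
    "encoding/binary"
    "fmt"
)

const OpPush byte = 0x10 // illustrative opcode value

// emitPush appends "OpPush n" to the bytecode stream as three bytes:
// the opcode plus a two-byte big-endian operand holding n itself.
func emitPush(code []byte, n uint16) []byte {
    op := make([]byte, 3)
    op[0] = OpPush
    binary.BigEndian.PutUint16(op[1:], n)
    return append(code, op...)
}

func main() {
    var code []byte
    for _, n := range []uint16{1, 2, 3} {
        code = emitPush(code, n)
    }
    fmt.Printf("% x\n", code) // 10 00 01 10 00 02 10 00 03
}

One obvious trade-off of this encoding is that only integers which fit in the operand (here 16 bits) can be inlined; anything larger would still have to live in the constant pool.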
At this point the magic begins: we can scan the program from start to finish. Every time we find "OpPush", "OpPush", followed by a maths instruction (OpMul, OpAdd, etc.) we can rewrite the program to contain the computed result already. So this concrete fragment:
OpPush 2
OpPush 3
OpMul
is now replaced by:
OpPush 6
OpNop ; previous OpPush
OpNop ; previous arg1
OpNop ; previous arg2
OpNop ; previous OpMul
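Here is a minimal sketch of such a folding pass in Go. It assumes the layout used above (OpPush carrying a two-byte big-endian operand, every other opcode a single byte) and uses illustrative opcode values rather than the real ones; the important detail is that the freed bytes become OpNop, so later offsets, and therefore any jump targets, are untouched:

package main

import (
    "encoding/binary"
    "fmt"
)

const (
    OpNop byte = iota
    OpPush // two-byte operand: the integer itself
    OpAdd
    OpMul
)

// skipNops returns the offset of the next non-NOP byte at or after i.
func skipNops(code []byte, i int) int {
    for i < len(code) && code[i] == OpNop {
        i++
    }
    return i
}

// foldConstants rewrites "OpPush a, OpPush b, OpAdd/OpMul" into a single
// OpPush holding the result, padding the freed bytes with OpNop.
func foldConstants(code []byte) []byte {
    for i := 0; ; {
        i = skipNops(code, i)
        if i >= len(code) {
            return code
        }
        if code[i] != OpPush {
            i++ // single-byte instruction; step over it
            continue
        }
        j := skipNops(code, i+3) // expect a second OpPush
        k := skipNops(code, j+3) // expect an arithmetic opcode
        if j+3 > len(code) || k >= len(code) ||
            code[j] != OpPush || (code[k] != OpAdd && code[k] != OpMul) {
            i += 3
            continue
        }
        a := binary.BigEndian.Uint16(code[i+1:])
        b := binary.BigEndian.Uint16(code[j+1:])
        result := a + b
        if code[k] == OpMul {
            result = a * b
        }
        binary.BigEndian.PutUint16(code[i+1:], result) // reuse the first OpPush
        for _, n := range []int{j, j + 1, j + 2, k} {
            code[n] = OpNop // blank the second OpPush and the operator
        }
        i = 0 // restart: the folded OpPush may now combine with an earlier one
    }
}

func main() {
    // OpPush 1, OpPush 2, OpPush 3, OpMul, OpAdd  (i.e. 1 + 2 * 3)
    code := []byte{OpPush, 0, 1, OpPush, 0, 2, OpPush, 0, 3, OpMul, OpAdd}
    fmt.Println(foldConstants(code)) // a single OpPush 7 followed by OpNops
}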
Repeating the process, and applying the same transformation to our comparison operation, we now have the updated bytecode of:
000000 OpNop
000001 OpNop
000002 OpNop
000003 OpNop
000004 OpNop
000005 OpNop
000006 OpNop
000007 OpNop
000008 OpNop
000009 OpNop
000010 OpNop
000011 OpNop
000012 OpNop
000013 OpNop
000014 OpTrue
000015 OpJumpIfFalse 20
000018 OpTrue
000019 OpReturn
000020 OpFalse
000021 OpReturn
The OpTrue instruction pushes a "true" value onto the stack, while OpJumpIfFalse only jumps to the given program offset if the value popped off the stack is not true. Since the value on the stack here is always true, the jump can never be taken, so we can remove both instructions.
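That rule is easy to express as one more peephole rewrite. A minimal sketch, again with illustrative opcode values and a two-byte jump target:

package main

import "fmt"

const (
    OpNop byte = iota
    OpTrue
    OpJumpIfFalse // two-byte operand: the jump target
)

// width returns the byte length of the instruction starting with op.
func width(op byte) int {
    if op == OpJumpIfFalse {
        return 3
    }
    return 1
}

// dropDeadJumps blanks out "OpTrue, OpJumpIfFalse n": the value popped by
// the jump is known to be true, so the branch can never be taken.
func dropDeadJumps(code []byte) []byte {
    for i := 0; i < len(code); i += width(code[i]) {
        if code[i] == OpTrue && i+3 < len(code) && code[i+1] == OpJumpIfFalse {
            copy(code[i:i+4], []byte{OpNop, OpNop, OpNop, OpNop})
        }
    }
    return code
}

func main() {
    // OpTrue, OpJumpIfFalse 20, OpTrue -- as at offsets 14-18 above
    code := []byte{OpTrue, OpJumpIfFalse, 0, 20, OpTrue}
    fmt.Println(dropDeadJumps(code)) // [0 0 0 0 1]: only the second OpTrue survives
}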
Our complete program is now:
000000 OpNop
000001 OpNop
000002 OpNop
000003 OpNop
000004 OpNop
000005 OpNop
000006 OpNop
000007 OpNop
000008 OpNop
000009 OpNop
000010 OpNop
000011 OpNop
000012 OpNop
000013 OpNop
000014 OpNop
000015 OpNop
000016 OpNop
000017 OpNop
000018 OpTrue
000019 OpReturn
000020 OpFalse
000021 OpReturn
There are two remaining obvious steps:
- Remove all the OpNop instructions.
- Recognize the case where there are zero jumps in the generated bytecode; in that case we can stop processing once we hit the first OpReturn (see the sketch after this list).
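A minimal sketch of those two clean-up steps follows; it checks for jumps itself and leaves the bytecode untouched if any remain, because removing bytes would invalidate every absolute jump target. Opcode names and values are again illustrative:

package main

import "fmt"

const (
    OpNop byte = iota
    OpPush // two-byte operand
    OpTrue
    OpFalse
    OpReturn
    OpJumpIfFalse // two-byte operand
)

// width returns the byte length of the instruction starting with op.
func width(op byte) int {
    switch op {
    case OpPush, OpJumpIfFalse:
        return 3
    default:
        return 1
    }
}

// compact removes OpNop padding and truncates at the first OpReturn,
// but only when the program contains no jumps at all.
func compact(code []byte) []byte {
    for i := 0; i < len(code); i += width(code[i]) {
        if code[i] == OpJumpIfFalse {
            return code // shifting offsets would break this jump
        }
    }
    var out []byte
    for i := 0; i < len(code); i += width(code[i]) {
        if code[i] == OpNop {
            continue
        }
        out = append(out, code[i:i+width(code[i])]...)
        if code[i] == OpReturn {
            break // nothing after the first return can ever run
        }
    }
    return out
}

func main() {
    // 18 OpNops (offsets 0-17), then OpTrue/OpReturn, then the unreachable false branch.
    code := append(make([]byte, 18), OpTrue, OpReturn, OpFalse, OpReturn)
    fmt.Println(compact(code)) // [2 4], i.e. OpTrue then OpReturn
}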
With those changes made, our input program:
if ( 1 + 2 * 3 == 7 ) { return true; } return false;
Now becomes this compiled bytecode:
000000 OpTrue
000001 OpReturn
This now runs a lot faster than it did before. Of course the example is completely artificial, but it was a fun process to work through regardless.
Tags: bytecode, evalfilter, golang, optimization