# Entries tagged `optimization`

## Adventures optimizing a bytecode based scripting language

18 November 2019 19:45

I've recently spent some time working with a simple scripting language. Today I spent parts of the day trying to make it faster. As a broad overview we'll be considering this example script:

``````  if ( 1 + 2 * 3 == 7 ) { return true; }

return false;
``````

This gets compiled into a simple set of bytecode instructions, which can refer to a small collection of constants - which are identified by offset/ID:

``````  000000	    OpConstant	0	// load constant: &{1}
000003	    OpConstant	1	// load constant: &{2}
000006	    OpConstant	2	// load constant: &{3}
000009	         OpMul
000010	         OpAdd
000011	    OpConstant	3	// load constant: &{7}
000014	       OpEqual
000015	 OpJumpIfFalse	20
000018	        OpTrue
000019	      OpReturn
000020	       OpFalse
000021	      OpReturn

Constants:
000000 Type:INTEGER Value:1
000001 Type:INTEGER Value:2
000002 Type:INTEGER Value:3
000003 Type:INTEGER Value:7
``````

The `OpConstant` instruction means to load the value with the given ID from the constant pool and place it onto the top of the stack. The multiplication and addition operations both pop values from the stack, apply the appropriate operation and push the result back. All standard stuff.

Of course these constants are constant so it seemed obvious to handle the case of integers as a special case. Rather than storing them in the constant-pool, where booleans, strings, etc live, we just store them inline. That meant our program would look like this:

``````  000000	        OpPush	1
000003	        OpPush	2
000006	        OpPush	3
000009	         OpMul
000010	         OpAdd
...
``````

At this point the magic begins: we can scan the program from start to finish. Every time we find "OpPush", "OpPush", then "Maths" we can rewrite the program to contain the appropriate result already. So this concrete fragment:

``````OpPush 2
OpPush 3
OpMul
``````

is now replaced by:

``````OpPush	6
OpNop     ; previous opconstant
OpNop     ; previous arg1
OpNop     ; previous arg2
OpNop     ; previous OpMul
``````

Repeating the process, and applying the same transformation to our comparision operation we now have the updated bytecode of:

``````  000000	         OpNop
000001	         OpNop
000002	         OpNop
000003	         OpNop
000004	         OpNop
000005	         OpNop
000006	         OpNop
000007	         OpNop
000008	         OpNop
000009	         OpNop
000010	         OpNop
000011	         OpNop
000012	         OpNop
000013	         OpNop
000014	        OpTrue
000015	 OpJumpIfFalse	20
000018	        OpTrue
000019	      OpReturn
000020	       OpFalse
000021	      OpReturn
``````

The `OpTrue` instruction pushes a "true" value to the stack, while the `OpJumpIfFalse` only jumps to the given program offset if the value popped off the stack is non-true. So we can remove those two instructions.

Our complete program is now:

``````  000000	         OpNop
000001	         OpNop
000002	         OpNop
000003	         OpNop
000004	         OpNop
000005	         OpNop
000006	         OpNop
000007	         OpNop
000008	         OpNop
000009	         OpNop
000010	         OpNop
000011	         OpNop
000012	         OpNop
000013	         OpNop
000014	         OpNop
000015	         OpNop
000018	        OpTrue
000019	      OpReturn
000020	       OpFalse
000021	      OpReturn
``````

There are two remaining obvious steps:

• Remove all the OpNop instructions.
• Recognize the case that there are zero jumps in the generated bytecode.
• In that case we can stop processing code once we hit the first OpReturn

With that change made our input program:

``````if ( 1 + 2 * 3 == 7 ) { return true; } return false;
``````

Now becomes this compiled bytecode:

``````  000000	        OpTrue
000001	      OpReturn
``````

Which now runs a lot faster than it did in the past. Of course this is completely artificial, but it was a fun process to work through regardless.

| No comments