Parsing PHP for fun and profit

Sunday, 7 April 2019

Recently I've been dealing with a lot of PHP code, and coders. I'm still not a huge fan of the language, but at the same time modern PHP is a world apart from legacy PHP which I dismissed 10ish years ago.

I've noticed that a lot of the coders have a good habit of documenting their code, but consistently fail to keep the class-names in those comments up to date. For example this code:


 /**
  * Class Bar
  * Comments go here ..
  */
 class Foo

The rest of the file? Almost certainly correct, but that initial header contained a reference to the class Bar even though the implementation presented Foo.

I found a bunch of PHP linters which handle formatting, and coding-style checks, but nothing to address this specific problem. So I wrote a quick hack:

  • Parse PHP files.
    • Look for "/*".
    • Look for "*/".
    • Look for "class".
    • Treat everything else as a token, except for whitespace which we just silently discard.
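The real code isn't public, so what follows is only a minimal sketch in Go of a lexer along the lines of the list above. The names Token, Tokenize and the token kinds are hypothetical, and matching "class" case-insensitively is my assumption (PHP keywords are case-insensitive, and the comment in the example writes "Class").

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token type for this sketch.
type Token struct {
	Kind  string // "COMMENT_START", "COMMENT_END", "CLASS" or "WORD"
	Value string
}

// isSpace reports whether b is ASCII whitespace, which the lexer discards.
func isSpace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\n' || b == '\r'
}

// Tokenize splits PHP source into the four token kinds listed above.
func Tokenize(src string) []Token {
	var tokens []Token
	for i := 0; i < len(src); {
		rest := src[i:]
		switch {
		case isSpace(src[i]):
			i++
		case strings.HasPrefix(rest, "/*"):
			tokens = append(tokens, Token{"COMMENT_START", "/*"})
			i += 2
		case strings.HasPrefix(rest, "*/"):
			tokens = append(tokens, Token{"COMMENT_END", "*/"})
			i += 2
		default:
			// Consume up to the next whitespace or comment delimiter.
			start := i
			for i < len(src) && !isSpace(src[i]) &&
				!strings.HasPrefix(src[i:], "/*") &&
				!strings.HasPrefix(src[i:], "*/") {
				i++
			}
			word := src[start:i]
			kind := "WORD"
			if strings.EqualFold(word, "class") {
				kind = "CLASS"
			}
			tokens = append(tokens, Token{kind, word})
		}
	}
	return tokens
}

func main() {
	for _, t := range Tokenize("/* Class Bar */ class Foo") {
		fmt.Printf("%s:%s\n", t.Kind, t.Value)
	}
}
```

Note that this sketch knows nothing about string literals, which is exactly the kind of naivety that produces false-positives on real code.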

Once you have a stream of such tokens you can detect this:

  • Found the start of a comment?
    • Clear the contents of any previously saved comment, in lastComment.
    • Append each subsequent token to "lastComment" until you hit EOF, or the end of a comment token.
  • Found a class token?
    • Look at the contents of the lastComment variable and see if it contains "class"; after all, the comment might not mention any class at all.
    • If there is "class xxx" mentioned check it matches the current class name.
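Again as a sketch rather than the actual code, the detection steps above might look like this in Go. The splitter here is a deliberately crude stand-in for a proper lexer, and the names Mismatch, Check and documentedClass are all made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Mismatch records a class whose preceding comment names a different class.
type Mismatch struct {
	Documented string
	Actual     string
}

// tokens is a crude stand-in lexer: it pads the comment delimiters with
// spaces and then splits on whitespace.
func tokens(src string) []string {
	r := strings.NewReplacer("/*", " /* ", "*/", " */ ")
	return strings.Fields(r.Replace(src))
}

// documentedClass returns the word following "class" in a comment, if any.
func documentedClass(comment []string) (string, bool) {
	for i := 0; i+1 < len(comment); i++ {
		if strings.EqualFold(comment[i], "class") {
			return comment[i+1], true
		}
	}
	return "", false
}

// Check walks the token stream, remembering the most recent comment and
// comparing it against each class declaration it meets.
func Check(src string) []Mismatch {
	var found []Mismatch
	var lastComment []string
	inComment := false
	toks := tokens(src)
	for i, tok := range toks {
		switch {
		case tok == "/*":
			inComment = true
			lastComment = nil // forget any previously saved comment
		case tok == "*/":
			inComment = false
		case inComment:
			lastComment = append(lastComment, tok)
		case tok == "class" && i+1 < len(toks):
			actual := toks[i+1]
			if doc, ok := documentedClass(lastComment); ok && doc != actual {
				found = append(found, Mismatch{doc, actual})
			}
			lastComment = nil // one comment documents at most one class
		}
	}
	return found
}

func main() {
	src := "/**\n * Class Bar\n * Comments go here ..\n */\nclass Foo\n{\n}\n"
	for _, m := range Check(src) {
		fmt.Printf("comment says %q but the class is %q\n", m.Documented, m.Actual)
	}
}
```

Clearing lastComment after each class is a choice made for this sketch, so that a later undocumented class is not compared against an earlier comment.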

There were some initial false-positives when I had to handle cases like this:

throw new \Exception("class not found");

(Here my naive handling would decide we'd found a class called not.)

Anyway the end result was stable and detected about 150 offenses in the 2000 file codebase I'm looking at.

Good result. Next step was integrating that into the CI system.

And that concludes my recent PHP adventures, using go to help ;)

(Code isn't public; I suspect you could rewrite it in an hour. I also suspect I was over-engineering and a perl script using regexp would do the job just as well..)

| No comments


Raising a bilingual child

Tuesday, 5 March 2019

The last time I talked about parenting it was in the context of a childcare timetable, where my wife and I divide the day explicitly hour by hour so that one of us is "in charge" at all times.

For example, I might take care of Oiva from 7AM-12PM on Saturdays, then she takes over until 5PM, and I take 5-7PM (bed-time). We alternate who gives him a bath and sits/reads with him until he's asleep.

Even if all three of us are together there is always one person who is in-charge, and will handle nappies, food, and complaints. The system works well, and has done since he was a few weeks old. The big benefit is that both of us can take time off, avoiding burnout and frustration.

Anyway that's all stable, although my wife's overnight shifts sometimes play havoc with the equality, and I think we're all happy with it. The child himself seems to recognize who is in charge, and usually screams for the appropriate parent as required.

Today's post is more interesting, because it covers bilingual children, which our child is:

  • His mother is Finnish.
    • She speaks Finnish to him, exclusively.
  • I'm from the UK.
    • I speak English to him, exclusively.

Between ourselves we speak English 95% of the time and Finnish 2% of the time. The rest of our communication involves grunting, pointing, and eye-contact.

He's of an age now where he's getting really good at learning new words, and you can usually see who he learned them from. For example he's obsessed with (toy) cars. One of his earlier words was "auto", but these days he sometimes says "car" to me. He's been saying "ei" for months now, which is Finnish for "no". But now he's also started to say "no" in English.

We took care of a neighbour's dog over the weekend, and when the dog tried to sniff one of his cars he pointed a finger at it, and said "No!". That was adorable.

Anyway his communication is almost exclusively single-words so far. If he's hungry he might say:

  • leipä! leipä! leipä!
    • Bread! Bread! Bread!
  • muesli! muesli! muesli!
    • muesli! muesli! muesli!

He understands complex ideas, commands, instructions, and sentences in both English and Finnish ("We're going to the shop", "Would you like to play in the park?", and many many more). But he's only really starting to understand that he can say the same thing in multiple languages - as per the example above of "ei" vs "no", or "car" vs "auto".

Usually he uses the word in the language he heard it in first. For example he'll say goodbye to people by saying "moi moi", but greet them with "hello". There are fun words though. For example 99% of the time a dog is a "woof woof", but sometimes recently he's been describing them as "hauva". A train is a "choo choo", as is a tram, and a rabbit is a "pupu".

He's started saying "kissa" for cat, but when watching cartoons or reading books he's more likely to identify them as dogs.

No real conclusion here, but it's adorable when he says isä/isi for Daddy, and äiti for Mummy. Or when he's finished at the dining table and sometimes he says "pois" and other times says "away".

Sometimes you can see confusion when we both refer to something with different words, but he seems pretty adept at understanding. I'm looking forward to seeing him flip words between languages more often - using each one within a couple of minutes. He has done that sometimes, but it's a rare thing. He'll sometimes say "daddy car" and "äiti auto", but more often than not the association seems random. He's just as likely to say "more kala" as "more fish".



Experimenting with github actions

Tuesday, 26 February 2019

Recently I heard that Travis-CI had been bought out, and later that they'd started to fire their staff.

I've used Travis-CI for a few years now, via github, to automatically build binaries for releases, and to run tests.

Since I was recently invited to try the Github Actions beta I figured it was time to experiment.

Github actions allow you to trigger "stuff" on "actions". Actions are things like commits being pushed to your repository, new releases appearing, and so on. "Stuff" is basically "launch a specific docker container".

The specified docker container has a copy of your project repository cloned into it, and you can operate upon it pretty freely.

I created two actions (which basically means I authored two Dockerfiles), and set up the meta-information, so that now I can do what I used to do with travis easily:

  • github-action-tester
    • Allows tests to be run whenever a new commit is pushed to your repository.
    • Or whenever a pull-request is submitted, or updated.
  • github-action-publish-binaries
    • If you create a new release in the github UI your project is built, and the specified binaries are attached to the release.

Configuring these in the repository is very simple: you define a workflow at .github/main.workflow, and my projects tend to look very similar:

  # pushes trigger the testsuite
  workflow "Push Event" {
    on = "push"
    resolves = ["Test"]
  }

  # pull-requests trigger the testsuite
  workflow "Pull Request" {
    on = "pull_request"
    resolves = ["Test"]
  }

  # releases trigger new binary artifacts
  workflow "Handle Release" {
    on = "release"
    resolves = ["Upload"]
  }

  ## The actions

  ## Run the test-cases, via .github/
  action "Test" {
    uses = "skx/github-action-tester@master"
  }

  ## Build the binaries, via .github/build, then upload them.
  action "Upload" {
    uses = "skx/github-action-publish-binaries@master"
    args = "math-compiler-*"
    secrets = ["GITHUB_TOKEN"]
  }

In order to make the actions generic they both execute a shell-script inside your repository. For example the action to run the tests just executes

  • .github/

That way you can write the tests that make sense. For example a golang application would probably run go test ..., but a C-based system might run make test.

Similarly the release-making action runs .github/build, and assumes that will produce your binaries, which are then uploaded.

The upload-action requires the use of a secret, but it seems to be handled by magic - I didn't create one. I suspect GITHUB_TOKEN is a magic-secret which is generated on-demand.

Anyway I updated a few projects, and you can see their configuration by looking at .github within the repository:

All in all it was worth the few hours I spent on it, and now I no longer use Travis-CI. The cost? I guess now I'm tied to github some more...



Updated my compiler, and bought a watch.

Saturday, 16 February 2019

The simple math-compiler I introduced in my previous post has had a bit of an overhaul, so that now it is fully RPN-based.

Originally the input was RPN-like, now it is RPN for real. It handles error-detection at run-time, and generates a cleaner assembly-language output:

In other news I bought a new watch, which was a fun way to spend some time.

I love mechanical watches, clocks, and devices such as steam-engines. While watches are full of tiny and intricate parts I like the pretence that you can see how they work, and understand them. Steam engines are seductive because their operation is similar; you can almost look at them and understand how they work.

I've got a small collection of watches at the moment, ranging from €100-€2000 in price; these are universally skeleton-watches, or open-heart watches.

My recent purchase is something different. I was looking at used Rolexes, and found some from the 1970s. That made me suddenly wonder what had been made the same year as I was born. So I started searching for vintage watches, which had been manufactured in 1976. In the end I found a nice Soviet Union piece, made by Raketa. I can't prove that this specific model was actually manufactured that year, but I'll keep up the pretence. If it is +/- 10 years that's probably close enough.

My personal dream-watch is the Rolex Oyster (I like to avoid complications). The Oyster is beautiful, and I can afford it. But even with insurance I'd feel too paranoid leaving the house with that much money on my wrist. No doubt I'll find a used one, for half that price, sometime. I'm not in a hurry.

(In a horological-sense a "complication" is something above/beyond the regular display of time. So showing the day, the date, or phase of the moon would each be complications.)



I decided it was time to write a compiler

Thursday, 31 January 2019

I've spent some time in the recent past working with interpreters, and writing a BASIC interpreter, but one thing I'd not done is write a compiler.

Once upon a time I worked for a compiler-company, but I wasn't involved with the actual coding at that time. Instead I worked on other projects, and did a minor amount of system-administration.

There are enough toy-languages that it didn't seem worthwhile to write a compiler for yet another one. At the same time writing a compiler for a full-language would get bogged down in a lot of noise.

So I decided to simplify things: I would write a compiler for "maths". Something that would take an expression and output assembly-language, which could then be compiled.

The end result is this simple compiler:

Initially I wrote something that would parse expressions such as 3 + 4 * 5 and output an abstract-syntax-tree. I walked the tree and started writing logic to pick registers, and similar. It all seemed like more of a grind than a useful exercise - and considering how ludicrous compiling simple expressions to assembly language already was it seemed particularly silly.

So once again I simplified, deciding to accept only a simple "reverse-polish-like" expression, and outputting the assembly for that almost directly.

To calculate "((3 * 5) + 2)" you'd feed my compiler:

  3 5 * 2 +

To compile that we first load the initial state 3, then we walk the rest of the program always applying an operation with an operand:

  • Store 3
  • 5 * -> multiply by 5.
  • 2 + -> add 2.
  • ..

This approach is trivial to parse, and trivial to output the assembly-language for: Pick a register and load your starting value, then just make sure all your operations apply to that particular register. (In the case of Intel assembly it made sense to store the starting value in EAX, and work with that).
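To show how little is involved, here is a minimal sketch of that approach in Go. It is not the code from the compiler itself: the function name Compile is hypothetical, only +, - and * are handled, and the choice of instructions (including the three-operand form of imul) is my own assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Compile turns an RPN-like expression such as "3 5 * 2 +" into
// Intel-syntax instructions that all operate on a single register (rax).
func Compile(expr string) ([]string, error) {
	fields := strings.Fields(expr)
	// A start value followed by (operand, operator) pairs is always odd.
	if len(fields)%2 == 0 {
		return nil, fmt.Errorf("expected a start value then operand/operator pairs")
	}
	if _, err := strconv.Atoi(fields[0]); err != nil {
		return nil, fmt.Errorf("bad start value %q", fields[0])
	}

	// Load the starting value into the register.
	out := []string{"mov rax, " + fields[0]}

	// Every following pair applies one operation to that register.
	for i := 1; i < len(fields); i += 2 {
		operand, op := fields[i], fields[i+1]
		if _, err := strconv.Atoi(operand); err != nil {
			return nil, fmt.Errorf("bad operand %q", operand)
		}
		switch op {
		case "+":
			out = append(out, "add rax, "+operand)
		case "-":
			out = append(out, "sub rax, "+operand)
		case "*":
			out = append(out, "imul rax, rax, "+operand)
		default:
			return nil, fmt.Errorf("unknown operator %q", op)
		}
	}
	return out, nil
}

func main() {
	lines, err := Compile("3 5 * 2 +")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println("    " + l)
	}
}
```

The sketch only emits the arithmetic; the prelude and the call to printf would be wrapped around it.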

A simple program would then produce a correspondingly simple output. Given 1 1 + we'd expect this output:

  .intel_syntax noprefix
  .global main

  .data
    result: .asciz "Result %d\n"

  .text
  main:
    mov rax, 1
    add rax, 1

    lea rdi,result
    mov rsi, rax
    xor rax, rax
    call printf
    xor rax, rax
    ret

With that output you can assemble the program, and run it:

 $ gcc -static -o program program.s
 $ ./program
 Result 2

I wrote some logic to allow calculating powers too, so you can output 2 ^ 8, etc. That's just implemented the naive-way, where you have a small loop and multiply the contents of EAX by itself the appropriate number of times. Modulus is similarly simple to calculate.
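The naive loop is easier to read in Go than in assembly. This is only an illustration of the idea, with a hypothetical function name, and it assumes a non-negative exponent:

```go
package main

import "fmt"

// Pow computes base^exp the naive way: start an accumulator at 1 and
// multiply it by the base exp times. Negative exponents are not handled.
func Pow(base, exp int) int {
	result := 1
	for i := 0; i < exp; i++ {
		result *= base
	}
	return result
}

func main() {
	fmt.Println(Pow(2, 8))
}
```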

Adding support for named variables, and other things, wouldn't be too hard. But it would involve register-allocation and similar complexity. Perhaps that's something I need to flirt with, to make the learning process complete, but to be honest I can't be bothered.

Anyway check it out, if you like super-fast maths. My benchmark?

$  time perl -e 'print 2 ** 8 . "\n"'
real    0m0.006s
user    0m0.005s
sys     0m0.000s


$ math-compiler -compile '2 8 ^'
$ time ./a.out
Result 256

real   0m0.001s
user   0m0.001s
sys    0m0.000s

Wow. Many wow. Much speed. All your base-two are belong to us.



This is mostly how I make bread

Sunday, 28 October 2018

I've talked about making bread in the past on this blog, here's my typical recipe - this recipe requires only two bowls, a spatula, a dutch-oven, oven-gloves, and some scales.


  • 1kg flour
  • 950g water.
    • If you want to keep it simple you could use 1kg of water instead. It simplifies the measuring if you're using a balance-scale
  • 1.5 teaspoon salt.
  • 1 teaspoon instant/dried yeast.

Here is the flour, water, & yeast - not pictured salt!

Mix all ingredients together. It will be sticky, and your hands will become messy. Embrace it. (ProTip: Take off your watch and wedding-ring(s) if applicable.) Expect it to take 2-5 minutes to do a decent job. Ensure you scoop your hand right into the bottom of the mixture, to make sure there is no flour clumped together at the bottom of the bowl which is not fully mixed in.

The end result is a sticky mess which will look something like this, perhaps your bowl will be cleaner and you'll have done a better job at mixing all the flour!

Cover the bowl with cling-film, and stick in the fridge overnight. (I tend to mix stuff at 6PM, then come back to it around 9AM the following morning, which means the bowl sits in the fridge for 15 hours or so.)

Take the bowl out of the fridge and you should see it has "grown", and it will have a lot of bubbles on the top, as the growing yeast has emitted CO2.

You'll also see that it is significantly more gloopy, as chains of gluten have formed.

Anyway now your bowl is on the counter, out of the fridge, you want to turn on your oven and set it to 250°C, with the dutch oven inside it. While you're waiting for the oven to heat up, transfer the sticky mess to a new bowl, lined with baking-paper. This will make it easier to add to the pot when we're ready to actually cook it.

As per the previous video the mixture will be very sticky, but you should be able to manage it. Don't worry too much about the shape; it'll become a "loaf-shape" when it cooks. The only reason we're moving it is that it is much easier to lift the mixture into the pot by holding the paper than to scrape it from your cold bowl into your very hot dutch-oven. Anyway once you've moved it to a new bowl you'll have something like this:

When your oven has reached the right temperature carefully transfer the mixture, in its paper, to the dutch oven which you'll then return to the oven.

  • Cook for 40 minutes at 250°C
  • Cook for an additional 20 minutes at 200°C
    • Just turn down the temperature-dial.
  • Finally open the oven, remove the lid from the pot, and cook for a further 15 minutes (still at 200°C)

The end result will be something similar to this:



  • You can see vestiges of the paper-wrapper in the final result.
  • I like my bread dark.
  • Let it cool down before you eat it, something like 45-60 minutes once you've removed from the oven.



A visual basic server

Tuesday, 23 October 2018

So my previous post described a BASIC interpreter I'd written.

Before the previous release I decided to ensure that it was easy to embed, and that it was possible to extend the BASIC environment such that it could call functions implemented in golang.

One of the first things that came to mind was to allow a BASIC script to plot pixels in a PNG. So I made that possible by adding "PLOT x,y" and "SAVE" primitives.

Taking that step further I then wrote an HTTP-server which would allow you to enter a BASIC program and view the image it created. It's a little cute at least.

Install it from source, or fetch a binary if you prefer, via:

$ go get -u

Then launch it and point your browser at http://localhost:8080, and you'll be presented with something like this:

Fun times.



So I wrote a basic BASIC

Saturday, 20 October 2018

So back in June I challenged myself to write a BASIC interpreter in a weekend. The next time I mentioned it was to admit defeat. I didn't really explain in any detail, because I thought I'd wait a few days and try again and I was distracted at the time I wrote my post.

As it happened that was over four months ago, so clearly it didn't work out. The reason was that I was getting too bogged down in the wrong kind of details. I'd got my heart set on doing this the "modern" way:

  • Write a lexer to split the input into tokens
    • LINE-NUMBER:10, PRINT, "Hello, World"
  • Then I'd take those tokens and form an abstract syntax tree.
  • Finally I'd walk the tree evaluating as I went.

The problem is that almost immediately I ran into problems: my naive approach didn't have a good solution for identifying line-numbers. So I was too paralysed to proceed much further.

I sidestepped the initial problem and figured maybe I should just have a series of tokens, somehow, which would be keyed off line-number. Obviously when you're interpreting "traditional" BASIC you need to care about lines, and treat them as important because you need to handle fun-things like this:

20 GOTO 10

Anyway I'd parse each line. Assuming only a single statement per line (ha!) you can divide it into:

  • Number - i.e. line-number.
  • Statement.
  • Newline to terminate.

Then you could have:

code{blah} ..
code[10] = "PRINT STEVE ROCKS"
code[20] = "GOTO 10"

Obviously you spot the problem there, if you think it through. Anyway. I've been thinking about it off and on since then, and the end result is that for the past two evenings I've been mostly writing a BASIC interpreter, in golang, in 20-30 minute chunks.

The way it works is as you'd expect (don't make me laugh, bitterly):

  • Parse the input into tokens.
  • Store those as an array.
  • Interpret each token.
    • No AST
    • No complicated structures.
    • Your program is literally an array of tokens.

I cheated, horribly, in parsing line-numbers which turned out to be exactly the right thing to do. The output of my naive lexer was:

INT:10, PRINT, STRING:"Hello World", NEWLINE, INT:20, GOTO, INT:10

Guess what? If you (secretly) prefix a newline to the program you're given you can identify line-numbers just by keeping track of your previous token in the lexer. A line-number is any number that follows a newline. You don't even have to care if they are sequential. (Hrm. Bug-report?)
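A minimal sketch of that trick in Go, assuming a much smaller token set than the real interpreter has. The names Token and Lex and the token types are made up for the example, and the synthetic leading newline is dropped from the result, which is my choice rather than anything the real lexer is known to do.

```go
package main

import "fmt"

// Token is a hypothetical token type; Type is one of
// "LINENO", "INT", "STRING", "NEWLINE" or "IDENT".
type Token struct {
	Type  string
	Value string
}

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// isBreak reports whether b ends an identifier.
func isBreak(b byte) bool {
	return b == ' ' || b == '\t' || b == '\r' || b == '\n' || b == '"'
}

// Lex tokenizes a BASIC program. A number that directly follows a
// newline is reported as a line-number rather than a plain integer.
func Lex(src string) []Token {
	src = "\n" + src // the secret prefix: line one now follows a newline too
	var out []Token
	for i := 0; i < len(src); {
		c := src[i]
		switch {
		case c == '\n':
			out = append(out, Token{"NEWLINE", "\n"})
			i++
		case c == ' ' || c == '\t' || c == '\r':
			i++
		case c == '"':
			end := i + 1
			for end < len(src) && src[end] != '"' {
				end++
			}
			out = append(out, Token{"STRING", src[i+1 : end]})
			i = end + 1
		case isDigit(c):
			start := i
			for i < len(src) && isDigit(src[i]) {
				i++
			}
			kind := "INT"
			// out is never empty here: the first token is always a NEWLINE.
			if out[len(out)-1].Type == "NEWLINE" {
				kind = "LINENO"
			}
			out = append(out, Token{kind, src[start:i]})
		default:
			start := i
			for i < len(src) && !isBreak(src[i]) {
				i++
			}
			out = append(out, Token{"IDENT", src[start:i]})
		}
	}
	return out[1:] // drop the synthetic leading NEWLINE
}

func main() {
	for _, t := range Lex("10 PRINT \"Hello World\"\n20 GOTO 10") {
		fmt.Printf("%s:%q\n", t.Type, t.Value)
	}
}
```

Because whitespace is skipped before the previous token is consulted, indented line-numbers are recognised as well.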

Once you have an array of tokens it becomes almost insanely easy to process the stream and run your interpreter:

 program[] = { LINE_NUMBER:10, PRINT, "Hello", NEWLINE, LINE_NUMBER:20 ..}

 let offset := 0
 for( offset < len(program) ) {
    token = program[offset]

    if ( token == GOTO ) { handle_goto() ; }
    if ( token == PRINT ) { handle_print() ; }
    .. handlers for every other statement

    offset++
 }

Make offset a global. And suddenly GOTO 10 becomes:

  • Scan the array, again, looking for "LINE_NUMBER:10".
  • Set offset to that index.

Magically it all just works. Add a stack, and GOSUB/RETURN are handled with ease too by pushing/popping the offset to it.
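For the curious, here is a runnable Go sketch of that loop, covering only PRINT, GOTO, GOSUB, RETURN and END. It is an illustration written for this post rather than the interpreter's real code: the Token and Interp types are hypothetical, PRINT only accepts a single argument, and the maxSteps guard exists purely so that a program like "20 GOTO 10" can't spin forever in the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token: Type is "LINENO", "INT", "STRING",
// "NEWLINE", or a keyword such as "PRINT", "GOTO", "GOSUB", "RETURN", "END".
type Token struct {
	Type  string
	Value string
}

// Interp holds the program as a flat token array plus an offset into it.
type Interp struct {
	program []Token
	offset  int
	stack   []int // return offsets pushed by GOSUB
	out     strings.Builder
}

// jump scans the array for the line-number named by the token after the
// current keyword, and sets offset to that index.
func (it *Interp) jump() error {
	if it.offset+1 >= len(it.program) {
		return fmt.Errorf("missing jump target")
	}
	want := it.program[it.offset+1].Value
	for i, t := range it.program {
		if t.Type == "LINENO" && t.Value == want {
			it.offset = i
			return nil
		}
	}
	return fmt.Errorf("line %s not found", want)
}

// Run interprets tokens until END, the end of the array, or until
// maxSteps tokens have been processed.
func (it *Interp) Run(maxSteps int) error {
	for steps := 0; it.offset < len(it.program) && steps < maxSteps; steps++ {
		switch it.program[it.offset].Type {
		case "PRINT":
			if it.offset+1 >= len(it.program) {
				return fmt.Errorf("PRINT without an argument")
			}
			it.out.WriteString(it.program[it.offset+1].Value)
			it.offset += 2
		case "GOTO":
			if err := it.jump(); err != nil {
				return err
			}
		case "GOSUB":
			it.stack = append(it.stack, it.offset+2) // resume after the target
			if err := it.jump(); err != nil {
				return err
			}
		case "RETURN":
			if len(it.stack) == 0 {
				return fmt.Errorf("RETURN without GOSUB")
			}
			it.offset = it.stack[len(it.stack)-1]
			it.stack = it.stack[:len(it.stack)-1]
		case "END":
			return nil
		default:
			it.offset++ // LINENO, NEWLINE and anything unhandled
		}
	}
	return nil
}

func main() {
	it := &Interp{program: []Token{
		{"LINENO", "10"}, {"GOSUB", ""}, {"INT", "100"}, {"NEWLINE", ""},
		{"LINENO", "20"}, {"PRINT", ""}, {"STRING", "back in main\n"}, {"NEWLINE", ""},
		{"LINENO", "30"}, {"END", ""}, {"NEWLINE", ""},
		{"LINENO", "100"}, {"PRINT", ""}, {"STRING", "in subroutine\n"}, {"NEWLINE", ""},
		{"LINENO", "110"}, {"RETURN", ""},
	}}
	if err := it.Run(1000); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(it.out.String())
}
```

Here offset lives in the struct rather than in a global, which amounts to the same thing for a single interpreter.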

In fact even the FOR-loop is handled in only a few lines of code - most of the magic happening in the handler for the "NEXT" statement (because that's the part that needs to decide if it needs to jump-back to the body of the loop, or continue running).

OK this is a basic-BASIC as it is missing primitives (CHR(), LEN, etc) and it only cares about integers. But the code is wonderfully simple to understand, and the test-case coverage is pretty high.

I'll leave with an example:

10 REM This is a program
00 REM
 01 REM This program should produce 126 * 126 * 10
 02 REM  = 158760
 03 REM
 05 GOSUB 100
 10 FOR i = 0 TO 126
 20  FOR j = 0 TO 126 STEP 1
 30   FOR k = 0 TO 10
 40    LET a = i * j * k
 50   NEXT k
 60  NEXT j
 70 NEXT i
 75 PRINT a, "\n"
 80 END
100 PRINT "Hello, I'm multiplying your integers"
110 RETURN

Loops indented for clarity. Tokens in upper-case only for retro-nostalgia.

Find it here, if you care:

I had fun. Worth it.

I even "wrote" a "game":



Pulling back

Saturday, 29 September 2018

I've updated my fork of the monkey programming language to allow object-based method calls.

That's allowed me to move some of my "standard-library" code into Monkey, and out of Go which is neat. This is a simple example:

 // Reverse a string,
 function string.reverse() {
   let r= "";
   let l = len(self);

   for( l > 0 ) {
      r += self[l-1];
      l--;
   }
   return r;
 }

Usage is the obvious:

  puts( "Steve".reverse() );


  let s = "Input";
  s = s.reverse();
  puts( s + "\n" );

Most of the pain here was updating the parser to recognize that "." meant a method-call was happening; once that was done it was otherwise only a matter of passing the implicit self object to the appropriate functions.

This work was done in a couple of 30-60 minute chunks. I find that I'm only really able to commit to that amount of work these days, so I've started to pull back from other projects.

Oiva is now 21 months old and he sucks up all my time & energy. I can't complain, but equally I can't really start concentrating on longer-projects even when he's gone to sleep.

And that concludes my news for the day.

Goodnight dune..



PAM HaveIBeenPwned module

Monday, 17 September 2018

So the PAM module which I pondered about in my previous post now exists:

I did mention "sponsorship" in my post which led to a couple of emails, and the end result of that was that a couple of folk donated to charity in my/its name. Good enough.

Perhaps in the future I'll explore patreon/similar, but I don't feel very in-demand so I'll avoid it for the moment.

Anyway I guess it should be Debian-packaged for neatness, but I'll resist for the moment.


