
 


Parsing PHP for fun and profit

7 April 2019 12:01

Recently I've been dealing with a lot of PHP code and PHP coders. I'm still not a huge fan of the language, but at the same time modern PHP is a world apart from the legacy PHP I dismissed 10ish years ago.

I've noticed that a lot of the coders have a good habit of documenting their code, but consistently fail to keep the class names in those comments up to date. For example, this code:

 <?php

 /**
  * Class Bar
  *
  * Comments go here ..
  */
 class Foo
 {
    ..

The rest of the file? Almost certainly correct, but that initial header contains a reference to the class Bar even though the implementation defines Foo.

I found a bunch of PHP linters that handle formatting and coding-style checks, but nothing to address this specific problem. So I wrote a quick hack:

  • Parse PHP files into a stream of tokens:
    • Look for "/*" (the start of a comment).
    • Look for "*/" (the end of a comment).
    • Look for "class".
    • Treat everything else as a token, except for whitespace, which we just silently discard.
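The original code isn't public, so the following is only a minimal sketch of what a tokenizer along those lines could look like in Go. The name tokenize and the exact splitting rules are assumptions. In particular, "class" is not given a token type of its own here; it just comes out as an ordinary word.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a hypothetical helper, not the original code. It emits "/*"
// and "*/" as tokens of their own and splits everything else on whitespace,
// which is discarded. A "/**" opener therefore yields "/*" followed by "*".
func tokenize(src string) []string {
	var tokens []string
	var word strings.Builder

	// flush emits the word collected so far, if there is one.
	flush := func() {
		if word.Len() > 0 {
			tokens = append(tokens, word.String())
			word.Reset()
		}
	}

	for i := 0; i < len(src); i++ {
		rest := src[i:]
		switch {
		case strings.HasPrefix(rest, "/*"), strings.HasPrefix(rest, "*/"):
			flush()
			tokens = append(tokens, rest[:2])
			i++ // step over the second byte of the marker
		case strings.ContainsRune(" \t\r\n", rune(src[i])):
			flush()
		default:
			word.WriteByte(src[i])
		}
	}
	flush()
	return tokens
}

func main() {
	src := "<?php\n/**\n * Class Bar\n */\nclass Foo\n{\n"
	for _, tok := range tokenize(src) {
		fmt.Printf("%q\n", tok)
	}
}
```

Splitting on whitespace alone is crude (for instance "class Foo{" would glue the brace onto the name), but it is enough to feed the comment check.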

Once you have a stream of such tokens you can detect this:

  • Found the start of a comment?
    • Clear the contents of any previously saved comment, held in lastComment.
    • Append each subsequent token to lastComment until you hit the end-of-comment token, or EOF.
  • Found a class token?
    • Look at the contents of lastComment and see if it contains "class". It might not; the documentation doesn't have to refer to any class at all.
    • If "class xxx" is mentioned, check that xxx matches the current class name.
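Put together, the steps above could be sketched in Go roughly as follows. This is an illustration under assumptions rather than the real tool: mismatches and documentedClass are made-up names, the tokenizer is a compact stand-in that pads the comment markers and splits on whitespace, and the comparison of "class" inside the comment is case-insensitive so that a header reading "Class Bar" is picked up.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a compact stand-in for a real tokenizer: it pads the comment
// markers with spaces and then splits on whitespace.
func tokenize(src string) []string {
	padded := strings.NewReplacer("/*", " /* ", "*/", " */ ").Replace(src)
	return strings.Fields(padded)
}

// documentedClass returns the word following "class" in the saved comment,
// or "" if the comment does not mention a class at all.
func documentedClass(comment []string) string {
	for i := 0; i+1 < len(comment); i++ {
		if strings.EqualFold(comment[i], "class") {
			return comment[i+1]
		}
	}
	return ""
}

// mismatches returns one message per class whose most recent comment names
// a different class.
func mismatches(src string) []string {
	var problems []string
	var lastComment []string
	inComment := false

	tokens := tokenize(src)
	for i, tok := range tokens {
		switch {
		case tok == "/*":
			// Start of a comment: forget whatever was saved before.
			lastComment = nil
			inComment = true
		case tok == "*/":
			inComment = false
		case inComment:
			lastComment = append(lastComment, tok)
		case tok == "class" && i+1 < len(tokens):
			declared := strings.TrimSuffix(tokens[i+1], "{")
			documented := documentedClass(lastComment)
			if documented != "" && documented != declared {
				problems = append(problems,
					fmt.Sprintf("comment says %q, code declares %q", documented, declared))
			}
		}
	}
	return problems
}

func main() {
	src := "<?php\n/**\n * Class Bar\n */\nclass Foo\n{\n}\n"
	for _, p := range mismatches(src) {
		fmt.Println(p)
	}
}
```

A comment that uses the word in ordinary prose ("This class handles ...") would be misread as naming a class called "handles", so a real checker would probably want a stricter rule for what counts as a documented name.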

There were some initial false positives, and I had to handle cases like this:

throw new \Exception("class not found");

(Here my naive handling would decide we'd found a class called not.)
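How that case was actually handled isn't described here. One plausible approach, sketched below in Go, is to have the tokenizer keep a quoted string literal together as a single token, so the words inside it never show up as a bare "class" followed by a name. The function name and the quoting rules (single and double quotes, backslash escapes) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a hypothetical whitespace tokenizer that emits a quoted string
// literal as one token. Quotes inside /* ... */ comments are left alone,
// because prose such as "don't" would otherwise start a bogus string.
func tokenize(src string) []string {
	var tokens []string
	var word strings.Builder
	inComment := false

	// flush emits the word collected so far, if there is one.
	flush := func() {
		if word.Len() > 0 {
			tokens = append(tokens, word.String())
			word.Reset()
		}
	}

	for i := 0; i < len(src); i++ {
		c := src[i]
		switch {
		case !inComment && strings.HasPrefix(src[i:], "/*"):
			flush()
			tokens = append(tokens, "/*")
			inComment = true
			i++
		case inComment && strings.HasPrefix(src[i:], "*/"):
			flush()
			tokens = append(tokens, "*/")
			inComment = false
			i++
		case !inComment && (c == '"' || c == '\''):
			flush()
			end := i + 1
			for end < len(src) && src[end] != c {
				if src[end] == '\\' {
					end++ // skip the escaped character
				}
				end++
			}
			if end >= len(src) {
				end = len(src) - 1 // unterminated string: take the rest
			}
			tokens = append(tokens, src[i:end+1])
			i = end
		case c == ' ' || c == '\t' || c == '\r' || c == '\n':
			flush()
		default:
			word.WriteByte(c)
		}
	}
	flush()
	return tokens
}

func main() {
	for _, tok := range tokenize(`throw new \Exception("class not found");`) {
		fmt.Println(tok)
	}
}
```

This sketch still ignores "//" and "#" line comments and heredocs, any of which could contain a stray quote or the word "class".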

Anyway, the end result was stable and detected about 150 offenses in the 2000-file codebase I'm looking at.

Good result. Next step was integrating that into the CI system.

And that concludes my recent PHP adventures, using Go to help ;)

(Code isn't public; I suspect you could rewrite it in an hour. I also suspect I was over-engineering, and a Perl script using a regexp would do the job just as well..)
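To give an idea of what the regexp route might look like, here is a rough sketch, written in Go rather than Perl to stay with the language used for the tool. Both patterns and the function name are assumptions. It only considers a /** ... */ block that sits directly in front of a class declaration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// A /** ... */ block followed directly by "class Name". The comment body
	// is written so that it cannot run past the first closing "*/".
	docThenClass = regexp.MustCompile(
		`/\*\*((?:[^*]|\*+[^*/])*)\*+/\s*(?:(?:abstract|final)\s+)*class\s+(\w+)`)

	// "Class Name" somewhere inside the comment text, in any letter case.
	docName = regexp.MustCompile(`(?i)\bclass\s+(\w+)`)
)

// mismatches returns one message per documented class whose comment names a
// different class than the declaration that follows it.
func mismatches(src string) []string {
	var problems []string
	for _, m := range docThenClass.FindAllStringSubmatch(src, -1) {
		body, declared := m[1], m[2]
		if d := docName.FindStringSubmatch(body); d != nil && d[1] != declared {
			problems = append(problems,
				fmt.Sprintf("comment says %q, code declares %q", d[1], declared))
		}
	}
	return problems
}

func main() {
	src := "<?php\n/**\n * Class Bar\n */\nclass Foo\n{\n}\n"
	for _, p := range mismatches(src) {
		fmt.Println(p)
	}
}
```

Because the class keyword has to follow a doc comment immediately, a string such as "class not found" never matches. The price is that a class with something else between its comment and its declaration is silently skipped.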
