About Archive Tags RSS Feed


Writing a brainfuck compiler.

14 June 2020 19:00

So last night I had the idea that it might be fun to write a Brainfuck compiler, to convert BF programs into assembly language, where they'd run nice and quickly.

I figured I could allocate a day to do the work, and it would be a pleasant distraction on a Sunday afternoon. As it happened it only took me three hours from start to finish.

There are only a few instructions involved in brainfuck:

  • >
    • increment the data pointer (to point to the next cell to the right).
  • <
    • decrement the data pointer (to point to the next cell to the left).
  • +
    • increment (increase by one) the byte at the data pointer.
  • -
    • decrement (decrease by one) the byte at the data pointer.
  • .
    • output the byte at the data pointer.
  • ,
    • accept one byte of input, storing its value in the byte at the data pointer.
  • [
    • if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command.
  • ]
    • if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command.

The Wikipedia link early shows how you can convert cleanly to a C implementation, so my first version just did that:

  • Read a BF program.
  • Convert to a temporary C-source file.
  • Compile with gcc.
  • End result you have an executable which you can run.

The second step was just as simple:

  • Read a BF program.
  • Convert to a temporary assembly language file.
  • Compile with nasm, link with ld.
  • End result you have an executable which you can run.

The following program produces the string "Hello World!" to the console:


My C-compiler converted that to the program:

extern int putchar(int);
extern char getchar();

char array[30000];
int idx = 0;

int main (int arc, char *argv[]) {

  while (array[idx]) {

    while (array[idx]) {

The assembly language version is even longer:

global _start
section .text

  mov r8, stack
  add byte [r8], 8
  cmp byte [r8], 0
  je close_loop_8
  add r8, 1
  add byte [r8], 4
  cmp byte [r8], 0
  je close_loop_14
  add r8, 1
  add byte [r8], 2
  add r8, 1
  add byte [r8], 3
  add r8, 1
  add byte [r8], 3
  add r8, 1
  add byte [r8], 1
  sub r8, 4
  sub byte [r8], 1
  jmp label_loop_14
  add r8, 1
  add byte [r8], 1

  mov rax, 60
  mov rdi, 0
section .bss
stack: resb 300000

Annoyingly the assembly language version ran slower than the C-version, which I was sneakily compiling with "gcc -O3 .." to ensure it was optimized.

The first thing that I did was to convert it to fold adjacent instructions. Instead of generating separate increment instructions for ">>>" I instead started to generate "add xx, 3". That didn't help as much as I'd hoped, but it was still a big win.

After that I made a minor tweak to the way that loops are handled to compare at the end of the loop as well as the start, and that shaved off a fraction of a second.

As things stand I think I'm "done". It might be nice to convert the generated assembly language to something gcc can handle, to drop the dependency on nasm, but I don't feel a pressing need for that yet.

Was a fun learning experience, and I think I'll come back to optimization later.

| No comments


Updated my linux-security-modules for the Linux kernel

22 May 2020 09:00

Almost three years ago I wrote my first linux-security-module, inspired by a comment I read on LWN

I did a little more learning/experimentation and actually produced a somewhat useful LSM, which allows you to restrict command-execution via the use of a user-space helper:

  • Whenever a user tries to run a command the LSM-hook receives the request.
  • Then it executes a userspace binary to decide whether to allow that or not (!)

Because the detection is done in userspace writing your own custom rules is both safe and easy. No need to touch the kernel any further!

Yesterday I rebased all the modules so that they work against the latest stable kernel 5.4.22 in #7.

The last time I'd touched them they were built against 5.1, which was itself a big jump forwards from the 4.16.7 version I'd initially used.

Finally I updated the can-exec module to make it gated, which means you can turn it on, but not turn it off without a reboot. That was an obvious omission from the initial implementation #11.

Anyway updated code is available here:

I'd kinda like to write more similar things, but I lack inspiration.

| No comments


Some brief sysbox highlights

17 May 2020 15:00

I started work on sysbox again recently, adding a couple of simple utilities. (The whole project is a small collection of utilities, distributed as a single binary to ease installation.)

Imagine you want to run a command for every line of STDIN, here's a good example:

 $ cat input | sysbox exec-stdin "youtube-dl {}"

Here you see for every (non-empty) line of input read from STDIN the command "youtube-dl" has been executed. "{}" gets expanded to the complete line read. You can also access individual fields, kinda like awk.

(Yes youtube-dl can read a list of URLs from a file, this is an example!)

Another example, run groups for every local user:

$ cat /etc/passwd | sysbox exec-stdin --split=: groups {1}

Here you see we have split the input-lines read from STDIN by the : character, instead of by whitespace, and we've accessed the first field via "{1}". This is certainly easier for scripting than using a bash loop.

On the topic of bash; command-completion for each subcommand, and their arguments, is now present:

$ source <(sysbox bash-completion)

And I've added a text-based UI for selecting files. You can also execute a command, against the selected file:

$ sysbox choose-file -exec "xine {}" /srv/tv

This is what that looks like:


You'll see:

  • A text-box for filtering the list.
  • A list which can be scrolled up/down/etc.
  • A brief bit of help information in the footer.

As well as choosing files, you can also select from lines read via STDIN, and you can filter the command in the same way as before. (i.e. "{}" is the selected item.)

Other commands received updates, so the calculator now allows storing results in variables:

$ sysbox calc
calc> let a = 3
calc> a / 9 * 3
calc> 1 + 2 * a
calc> 1.2 + 3.4



I'm a contractor, available for work

29 April 2020 14:00

For the foreseeable future I'm working for myself, as a contractor, here in sunny Helsinki, Finland.

My existing contract only requires me to work 1.5-2.0 days a week, meaning my week looks something like this:

  • Monday & Tuesday
    • I work for the client.
  • Wednesday - Friday
    • I act as a stay-at-home dad.

It does mean that I'm available for work Wednesday-Friday though, in the event I can find projects to work upon, or companies who would be willing to accept my invoices.

I think of myself as a sysadmin, but I know all about pipelines, automation, system administration, and coding in C, C++, Perl, Ruby, Golang, Lua, etc.

On the off-chance anybody reading this has a need for small projects, services, daemons, or APIs to be implemented then don't hesitate to get in touch.

I did manage to fill a few days over the past few weeks completing an online course from Helsinki Open University, Devops with Docker, it is possible I'll find some more courses to complete in the future. (There is an upcoming course Devops with Kubernetes which I'll definitely complete.)

| No comments


A collection of simple golang tools

14 April 2020 09:00

In a previous job I wrote some simple utilities which were helpful for continuous-integration pipelines. Most of these were very very simple, for example:

  • Find all *.json files, and validate them.
  • Find all *.yaml files, and validate them.
  • Run all the tests in a given directory, execute them one by one.
    • But stop if any single test failed.
    • run-parts does this on Debian GNU/Linux systems, but on CentOS there is no -exit-on-error flag.
    • Which is the sole reason I had to re-implement that tool. Sigh.
  • Parse PHP-files and look for comments that were bogus.

Each of these simple tools were run of the mill shell-scripts, or simple binaries. I've been putting together a simple deployable tool which lets you run them easily, so here it is:

The idea is that you install the tool, then run the commands by specify the appropriate sub-command:

sysbox validate-yaml [path/to/find/files/beneath]

Of course if you symlink validate-yaml to sysbox you don't need to prefix with the name - so this is just like busybox in this regard.

Might be interesting to some.

| No comments


A busy few days

7 April 2020 09:00

Over the past few weeks things have been pretty hectic. Since I'm not working at the moment I'm mostly doing childcare instead. I need a break, now and again, so I've been sending our child to päiväkoti two days a week with him home the rest of the time.

I love taking care of the child, because he's seriously awesome, but it's a hell of a lot of work when most of our usual escapes are unavailable. For example we can't go to the (awesome) Helsinki Central Library as that is closed.

Instead of doing things outdoors we've been baking bread together, painting, listening to music and similar. He's a big fan of any music with drums and shouting, so we've been listening to Rammstein, The Prodigy, and as much Queen as I can slip in without him complaining ("more bang bang!").

I've also signed up for some courses at the Helsinki open university, including Devops with Docker so perhaps I have a future career working with computers? I'm hazy.

Finally I saw a fun post the other day on reddit asking about the creation of a DSL for server-setup. I wrote a reply which basically said two things:

  • First of all you need to define the minimum set of primitives you can execute.
    • (Creating a file, fetching a package, reloading services when a configuration file changes, etc.)
  • Then you need to define a syntax for expressing those rules.
    • Not using YAML. Because Ansible fucked up bigtime with that.
    • It needs to be easy to explain, it needs to be consistent, and you need to decide before you begin if you want "toy syntax" or "programming syntax".
    • Because adding on conditionals, loops, and similar, will ruin everything if you add it once you've started with the wrong syntax. Again, see Ansible.

Anyway I had an idea of just expressing things in a simple fashion, borrowing Puppet syntax (which I guess is just Ruby hash literals). So a module to do stuff with files would just look like this:

file { name   => "This is my rule",
       target => "/tmp/blah",
       ensure => "absent" }

The next thing to do is to allow that to notify another rule, when it results in a change. So you add in:

notify => "Name of rule"

# or
notify => [ "Name of rule", "Name of another rule" ]

You could also express dependencies the other way round:

shell { name => "Do stuff",
        command => "wc -l /etc/passwd > /tmp/foo",
        requires => [ "Rule 1", "Rule 2"] }

Anyway the end result is a simple syntax which allows you to do things; I wrote a file to allow me to take a clean system and configure it to run a simple golang application in an hour or so.

The downside? Well the obvious one is that there's no support for setting up cron jobs, setting up docker images, MySQL usernames/passwords, etc. Just a core set of primitives.

Adding new things is easy, but also an endless job. So I added the ability to run external/binary plugins stored outside the project. To support that is simple with the syntax we have:

  • We pass the parameters, as JSON, to STDIN of the binary.
  • We read the result from STDOUT
    • Did the rule result in a change to the system?
    • Or was it a NOP?

All good. People can write modules, if they like, and they can do that in any language they like.

Fun times.

We'll call it marionette since it's all puppet-inspired:

And that concludes this irregular update.



Goodbye twitter

11 March 2020 10:30

I was slow to start using twitter, but found it a lot of fun. Often it would be useful at times when websites were slow; I'd hop to the website and search "edinburgh network", "github down", or "helsinki outage" and find live updates as people disclosed problems before the appropriate companies updated their status pages.

I've found a lot of useful information, in near real-time, over the past few years. For example I remember hearing a loud explosion a few years back and had no idea what it was. Turns out it was an electrical substation catching fire nearby.

Anyway recently I've been getting a lot of fake notifications, things that aren't real:

  • In case you missed XXX's tweet.

You can't disable these notifications, the only thing you can do is click "see less often". For a couple of days I did that every time I saw them, to no avail.

So I'm done. I've removed references to my account anywhere I could spot them, and I've signed out for good.

(I read twitter on my desktop 99% of the time, though I did use my mobile phone to make posts, especially when images/pictures were involved.)

I've not deleted my account, but I'd uninstalled the application and deleted the entry from my password-store. No doubt in a few years they'll delete my account, though they seem to have recently backtracked on their attempts to nuke inactive accounts.



Tracking rental-income via org-mode

7 February 2020 21:20

I've been enjoying the process of exploring org-mode recently as I keep running into references to it. Often these references are people questioning me: why don't you just use 'org-mode' instead of this crazy-thing you're managing yourself?

As one concrete example, no pun intended, I look after some flats, and every month I update my records to keep track of things. Until now I've used a set of markdown files, one for each property, and each has details of tenants, income, repairs, & etc. I've now ported these to org-mode-files. The first thing I did was create a table for each year so this year's might look like this:

#+NAME: 2020
| Month     | Tenant |    Rent | Mortgage | Housing Company |
| January   | Bob    |     250 |   150.00 |           75.00 |
| February  | Bob    |     250 |   150.00 |           75.00 |
| March     |        |         |          |                 |
| April     |        |         |          |                 |
| May       |        |         |          |                 |
| ..        |    ... |     ... |    ....  |           ...   |
| Totals    |        |  500.00 |   300.00 |          150.00 |
#+TBLFM: @>$3=vsum(@I..@II);%0.2f::@>$4=vsum(@I..@II);%0.2f::@>$5=vsum(@I..@II);%0.2f

Here you see the obvious:

  • I've declared a table with a name 2020.
  • I've got totals of the various columns at the bottom of the table.
  • I only populate each row on the day when I check to see whether rent has been paid by the lovely tenant.
    • So all the rows past the current-month are empty.

This is probably all very familiar to org-mode users, let us go a little further, we have a table named 2020, let us create a new table called 2020-totals to show more useful figures:

#+NAME: 2020-totals
| Year |  Income | Expenses | Profit |
| 2020 |  500.00 |   450.00 |  50.00 |
#+TBLFM: @2$2=remote(2020,@>$3);%.2f::@2$3=remote(2020,@>$4)+remote(2020,@>$5);%.2f::@>$4=@>$2-@$3;%.2f

Again nothing amazing here:

  • We reference "remote"-values (i.e values from a different table).
  • We look for things like "row:@>", "column:$3" which means "row:last column:3rd".
    • The bottom line should be split by :: to make sense, like so:
      • @2$2=remote(2020,@>$3);%.2f
      • %.2f is a format-string, to control how many decimal places to show.
      • @2$3=remote(2020,@>$4) + remote(2020,@>$5);%.2f
      • Expenses are "Mortgage" + "Housing Company"
      • i.e. The contents of the fourth and fifth column.
      • @>$4=@>$2-@$3;%.2f
  • The end result is that we have sum of income, sum of expenses, and the difference between them is the profit (or loss).

Of course I've got records going back a while, so what we really want is to have a complete/global table of income/expenses and profit (or loss if that's a negative figure). We'll assume there are multiple tables in our document, a pair for each year "2019", "2019-totals", etc. To generate our global income/expenses/property we just have to sum the columns for each of the tables which have names matching the pattern "*-totals". Here we go:

#+NAME: income-expenses-profit
| Income   | €10000.00 |
| Expenses | € 8000.00 |
| Profit   | € 2000.00 |
#+TBLFM: @1$2='(car (sum-field-in-tables "-totals$" 1));€%.2f::@2$2='(car (sum-field-in-tables "-totals$" 2));€%.2f::@3$2='(car (sum-field-in-tables "-totals$" 3));€%.2f

Once again there are three values here, and splitting by :: makes them more readable:

  • @1$2='(car (sum-field-in-tables "-totals$" 1));€%.2f
  • @2$2='(car (sum-field-in-tables "-totals$" 2));€%.2f
  • @3$2='(car (sum-field-in-tables "-totals$" 3));€%.2f

In short we set the value row:X, column:2 to be the value of evaluating the Emacs lisp expression (car (sum-field-in-tables .., rather than using the built-in table support from org-mode. I did have to write that function myself to do the necessary table-iteration and field summing, but with the addition of naive filtering-support that turned out to be pretty useful as we'll see later:

(defun sum-field-in-tables (pattern field &optional filter)
  "For every table in the current document which has a name matching the
supplied pattern perform a sum of the specified column.

If the optional filter is present then the summing will ignore any rows
which do not match the given filter-pattern.

The return value is a list containing the sum, and a count of those
rows which were summed."

Using this function I managed to achieve what I wanted, and also as a bonus I could do clever things like show the total income/payments from a given tenant. If you refer back to the 2020-table you'll see there is a column for the tenant's name. So I can calculate the total income from that tenant, and the number of payments by summing:

  • All tables with a name "^:digit:$"
    • column 3 (i.e. rent)
    • where rows match the filter "Bob"

The end result is another table like so:

#+NAME: tenants-paid
| Tenant    | Rent Paid | Rented Months |
| [[Alice]] | €1500.00  |             6 |
| [[Bob]]   | €1500.00  |             6 |
| [[Chris]] | €1500.00  |             6 |
#+TBLFM: $2='(car (sum-field-in-tables "^[0-9]*$" 2 $1));€%0.2f::$3='(cdr (sum-field-in-tables "^[0-9]*$" 2 $1))

This works because the table-sum function returns two things, the actual sum of the rows, and the number of rows that were summed:

  • So (car (sum-field-in-tables .. returns the actual sum. The rent the person has paid, total.
  • And (cdr (sum-field-in-tables .. returns the number of payments that have been made by the tenant with the given name.

The only thing I had to do explicitly was to add the rows for Alice, Bob, and Chris to this table.

Anyway this is all rather cool, and I'm pretty happy, though if I could have avoided writing lisp I'd have been a little happier. Now I guess I need to choose between one of two approaches:

  • Do I put the lisp function in the report-file itself?
    • Which then needs to be evaluated when the file is loaded - simple enough.
  • Or do I drop it inside ~/.emacs/init.md - which is good for me, but means the file isn't self-contained for others?

I'll probably continue to play with this a little more over the next few days/weeks. Exporting to HTML and PDF worked like a charm, once I configured some minor things and setup a couple of sections of my documents with a :noexport: tag.

| No comments


Initial server migration complete..

28 January 2020 12:20

So recently I talked about how I was moving my email to a paid GSuite account, that process has now completed.

To recap I've been paying approximately €65/month for a dedicated host from Hetzner:

  • 2 x 2Tb drives.
  • 32Gb RAM.
  • 8-core CPU.

To be honest the server itself has been fine, but the invoice is a little horrific regardless:

  • SB31 - €26.05
  • Additional subnet /27 - €26.89

I'm actually paying more for the IP addresses than for the server! Anyway I was running a bunch of virtual machines on this host:

  • mail
    • Exim4 + Dovecot + SSH
    • I'd SSH to this host, daily, to read mail with my console-based mail-client, etc.
  • www
    • Hosted websites.
    • Each different host would run an instance of lighttpd, serving on localhost:XXX running under a dedicated UID.
    • Then Apache would proxy to the right one, and handle SSL.
  • master
    • Puppet server, and VPN-host.
  • git
  • ..
    • Bunch more servers, nine total.

My plan is to basically cut down and kill 99% of these servers, and now I've made the initial pass:

I've now bought three virtual machines, and juggled stuff around upon them. I now have:

  • debian - €3.00/month
  • dns - €3.00/month
    • This hosts my commercial DNS thing
    • Admin overhead is essentially zero.
    • Profit is essentially non-zero :)
  • shell - €6.00/month
    • The few dynamic sites I maintain were moved here, all running as www-data behind Apache. Meh.
    • This is where I run cron-jobs to invoke rss2email, my google mail filtering hack.
    • This is also a VPN-provider, providing a secure link to my home desktop, and the other servers.

The end result is that my hosting bill has gone down from being around €50/month to about €20/month (€6/month for gsuite hosting), and I have far fewer hosts to maintain, update, manage, and otherwise care about.

Since I'm all cloudy-now I have backups via the provider, as well as those maintained by rsync.net. I'll need to rebuild the shell host over the next few weeks as I mostly shuffled stuff around in-place in an adhoc fashion, but the two other boxes were deployed entirely via Ansible, and Deployr. I made the decision early on that these hosts should be trivial to relocate and they have been!

All static-sites such as my blog, my vanity site and similar have been moved to netlify. I lose the ability to view access-logs, but I'd already removed analytics because I just don't care,. I've also lost the ability to have custom 404-pages, etc. But the fact that I don't have to maintain a host just to serve static pages is great. I was considering using AWS to host these sites (i.e. S3) but chose against it in the end as it is a bit complex if you want to use cloudfront/cloudflare to avoid bandwidth-based billing surprises.

I dropped MX records from a bunch of domains, so now I only receive email at steve.fi, steve.org.uk, and to a lesser extent dns-api.com. That goes to Google. Migrating to GSuite was pretty painless although there was a surprise: I figured I'd setup a single user, then use aliases to handle the mail such that:

  • debian@example -> steve
  • facebook@example -> steve
  • webmaster@example -> steve

All told I have about 90 distinct local-parts configured in my old Exim setup. Turns out that Gsuite has a limit of like 20 aliases per-user. Happily you can achieve the same effect with address maps. If you add an address map you can have about 4000 distinct local-parts, and reject anything else. (I can't think of anything worse than having wildcard handling; I've been hit by too many bounce-attacks in the past!)

Oh, and I guess for completeness I should say I also have a single off-site box hosted by Scaleway for €5/month. This runs monitoring via overseer and notification via purppura. Monitoring includes testing that websites are up, that responses contain a specific piece of text, DNS records resolve to expected values, SSL certificates haven't expired, & etc.

Monitoring is worth paying for. I'd be tempted to charge people to use it, but I suspect nobody would pay. It's a cute setup and very flexible and reliable. I've been pondering adding a scripting language to the notification - since at the moment it alerts me via Pushover, Email, and SMS-messages. Perhaps I should just settle on one! Having a scripting language would allow me to use different mechanisms for different services, and severities.

Then again maybe I should just pay for pingdom, or similar? I have about 250 tests which run every two minutes. That usually exceeds most services free/cheap offerings..



procmail for gmail?

24 January 2020 12:20

After 10+ years I'm in the process of retiring my mail-host. In the future I'll no longer be running exim4/dovecot/similar, and handling my own mail. Instead it'll all go to a (paid) Google account.

It feels like the end of an era, as it means a lot of my daily life will not be spent inside a single host no longer will I run:

ssh steve@mail.steve.org.uk

I'm still within my Gsuite trial, but I've mostly finished importing my vast mail archive, via mbsync.

The only outstanding thing I need is some scripting for the mail. Since my mail has been self-hosted I've evolved a large and complex procmail configuration file which sorted incoming messages into Maildir folders.

Having a quick look around last night I couldn't find anything similar for the brave new world of Google Mail. So I hacked up a quick script which will automatically add labels to new messages that don't have any.

Finding messages which are new/unread and which don't have labels is a matter of searching for:

is:unread -has:userlabels

From there adding labels is pretty simple, if you decide what you want. For the moment I'm keeping it simple:

  • If a message comes from "Bob Smith" <bob.smith@example.com>
    • I add the label "bob.smith".
    • I add the label "example.com".

Both labels will be created if they don't already exist, and the actual coding part was pretty simple. To be more complex/flexible I would probably need to integrate a scripting language (oh, I have one of those), and let the user decide what to do for each message.

The biggest annoyance is setting up the Google project, and all the OAUTH magic. I've documented briefly what I did but I don't actually know if anybody else could run the damn thing - there's just too much "magic" involved in these APIs.

Anyway procmail-lite for gmail. Job done.