
 

Announce: github2mr

17 January 2020 19:19

myrepos is an excellent tool for applying git operations to multiple repositories, and I use it extensively.

I've written several scripts to dump remote repository-lists into a suitable configuration format, and hopefully I've done that for the last time.

github2mr correctly handles:

  • Exporting projects from Github.com
  • Exporting projects from (self-hosted installations of) Github Enterprise.
  • Exporting projects from (self-hosted installations of) Gitbucket.

If it can handle Gogs, Gitea, etc, then I'd love to know, otherwise patches are equally welcome!

| No comments

 

Exporting github repositories to myrepos

16 January 2020 19:19

myrepos is an excellent tool for applying git operations to multiple repositories, and I use it extensively.

Given a configuration file like this:

..

[github.com/skx/asql]
checkout = git clone git@github.com:skx/asql.git

[github.com/skx/bookmarks.public]
checkout = git clone git@github.com:skx/bookmarks.public.git

[github.com/skx/Buffalo-220-NAS]
checkout = git clone git@github.com:skx/Buffalo-220-NAS.git

[github.com/skx/calibre-plugins]
checkout = git clone git@github.com:skx/calibre-plugins.git

...

You can clone all the repositories with one command:

mr -j5 --config .mrconfig.github checkout

Then pull/update them easily:

mr -j5 --config .mrconfig.github update

It works with git repositories, mercurial, and more. (The -j5 argument means to run five jobs in parallel. Much speed, many fast. Big wow.)

I wrote a simple golang utility to use the github API to generate a suitable configuration including:

  • All your personal repositories.
  • All the repositories which belong to organizations you're a member of.
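To make the output format concrete, here is a minimal sketch of the rendering step only. It is not the code from the real tool: the `Repo` type and `MrConfig` function are names invented for this example, and fetching the repository list from the API is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// Repo is a hypothetical, minimal description of one remote repository.
type Repo struct {
	Host  string // e.g. "github.com"
	Owner string // user or organization
	Name  string // repository name
}

// MrConfig renders one myrepos section per repository, in input order,
// using the same layout as the sample configuration shown earlier.
func MrConfig(repos []Repo) string {
	var sb strings.Builder
	for _, r := range repos {
		fmt.Fprintf(&sb, "[%s/%s/%s]\n", r.Host, r.Owner, r.Name)
		fmt.Fprintf(&sb, "checkout = git clone git@%s:%s/%s.git\n\n", r.Host, r.Owner, r.Name)
	}
	return sb.String()
}

func main() {
	fmt.Print(MrConfig([]Repo{
		{Host: "github.com", Owner: "skx", Name: "asql"},
		{Host: "github.com", Owner: "skx", Name: "bookmarks.public"},
	}))
}
```

Redirecting that output to a file such as .mrconfig.github gives something mr can consume via --config.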

Currently it only supports github, but I'll update to include self-hosted and API-compatible services such as gitbucket. Is there any interest in such a tool? Or have you all written your own already?

(I have the feeling I've written this tool in Perl, Ruby, and even using curl a time or two already. This time I'll do it properly and publish it to save effort next time!)

| 2 comments

 

I won't write another email client

8 January 2020 19:19

Once upon a time I wrote an email client, in a combination of C++ and Lua.

Later I realized it was flawed, and because I hadn't realized that writing email clients is hard I decided to write it anew (again in C++ and Lua).

Nowadays I do realize how hard writing email clients is, so I'm not going to do that again. But still .. but still ..

I was doing some mail-searching recently and realized I wanted to write something that processed all the messages in a Maildir folder. Imagine I wanted to run:

 message-dump ~/Maildir/people-foo/ ~/Maildir/people-bar/  \
     --format '${flags} ${filename} ${subject}'

As this required access to (arbitrary) headers I had to read, parse, and process each message. It was slow, but it wasn't that slow. The second time I ran it, even after adjusting the format-string, it was nice and fast because buffer-caches rock.
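The format-string idea is easy to sketch in Go, because os.Expand already understands the ${name} syntax. This is an illustration rather than the real implementation: `MaildirFlags` and `Render` are names made up for the example, and the field values are supplied by hand instead of being parsed from a message.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// MaildirFlags returns the flag letters stored after ":2," in a Maildir
// filename, or "" when the name carries no flags.
func MaildirFlags(filename string) string {
	if i := strings.LastIndex(filename, ":2,"); i >= 0 {
		return filename[i+3:]
	}
	return ""
}

// Render expands ${name} placeholders in format from the fields map.
// Unknown names expand to the empty string.
func Render(format string, fields map[string]string) string {
	return os.Expand(format, func(key string) string { return fields[key] })
}

func main() {
	name := "1578500000.M1P2.host:2,RS"
	fmt.Println(Render("${flags} ${filename} ${subject}", map[string]string{
		"flags":    MaildirFlags(name),
		"filename": name,
		"subject":  "Hello",
	}))
	// prints: RS 1578500000.M1P2.host:2,RS Hello
}
```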

Anyway after that I wanted to write a script to dump the list of folders (because I store them recursively so ls -1 ~/Maildir wasn't enough):

 maildir-dump --format '${unread}/${total} ${path}'

I guess you can see where this is going now! If you have the following three primitives, you have a mail-client (albeit read-only):

  • List "folders"
  • List "messages"
  • List a single message.

So I hacked up a simple client that would have a sub-command for each one of these tasks. I figured somebody else could actually use that, be a little retro, be a little cool, pretend they were using MH. Of course I'd have to write something horrid as a bash-script to prove it worked - probably using dialog to drive it.

And then I got interested. The end result is a single golang binary that will either:

  • List maildirs, with a cute format string.
  • List messages, with a cute format string.
  • List a single message, decoding the RFC2047 headers, showing text/plain, etc.
  • AND ALSO USE ITSELF TO PROVIDE A GUI
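Decoding RFC2047 headers is one of the parts that sounds worse than it is, at least in Go, because the standard library's mime.WordDecoder does the work. A small sketch follows; the `DecodeHeader` wrapper is my own naming for this example, not something taken from the client.

```go
package main

import (
	"fmt"
	"mime"
)

// DecodeHeader decodes any RFC 2047 encoded-words in raw. If decoding
// fails (for example an unsupported charset) the raw value is returned.
func DecodeHeader(raw string) string {
	dec := new(mime.WordDecoder)
	out, err := dec.DecodeHeader(raw)
	if err != nil {
		return raw
	}
	return out
}

func main() {
	fmt.Println(DecodeHeader("=?UTF-8?Q?Caf=C3=A9?= menu"))
	fmt.Println(DecodeHeader("=?UTF-8?B?SGVsbG8=?="))
	fmt.Println(DecodeHeader("Plain subject"))
}
```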

And now I wonder, am I crazy? Is writing an email client hard? I can't remember.

Probably best to forget the GUI exists. Probably best to keep it a couple of standalone sub-commands for "scripting email stuff".

But still .. but still ..

| 4 comments

 

Adventures optimizing a bytecode based scripting language

18 November 2019 19:45

I've recently spent some time working with a simple scripting language. Today I spent parts of the day trying to make it faster. As a broad overview we'll be considering this example script:

  if ( 1 + 2 * 3 == 7 ) { return true; }

  return false;

This gets compiled into a simple set of bytecode instructions, which can refer to a small collection of constants - which are identified by offset/ID:

  000000	    OpConstant	0	// load constant: &{1}
  000003	    OpConstant	1	// load constant: &{2}
  000006	    OpConstant	2	// load constant: &{3}
  000009	         OpMul
  000010	         OpAdd
  000011	    OpConstant	3	// load constant: &{7}
  000014	       OpEqual
  000015	 OpJumpIfFalse	20
  000018	        OpTrue
  000019	      OpReturn
  000020	       OpFalse
  000021	      OpReturn


Constants:
  000000 Type:INTEGER Value:1
  000001 Type:INTEGER Value:2
  000002 Type:INTEGER Value:3
  000003 Type:INTEGER Value:7

The OpConstant instruction means to load the value with the given ID from the constant pool and place it onto the top of the stack. The multiplication and addition operations both pop values from the stack, apply the appropriate operation and push the result back. All standard stuff.
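For readers who haven't met a stack machine before, the listing above can be run by a loop like the following. This is a sketch and not the interpreter's real code: the opcode numbering, the 16-bit big-endian operands, and the `Run` function are assumptions made for the example, and it trusts the bytecode to be well-formed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Opcode numbering is an assumption made for this sketch.
const (
	OpConstant    byte = iota // operand: 16-bit index into the constant pool
	OpAdd
	OpMul
	OpEqual
	OpJumpIfFalse // operand: 16-bit absolute offset
	OpTrue
	OpFalse
	OpReturn
)

// Run executes code and returns the value popped by OpReturn. It assumes
// well-formed input: no stack underflow and no truncated operands.
func Run(code []byte, constants []interface{}) (interface{}, error) {
	var stack []interface{}
	push := func(v interface{}) { stack = append(stack, v) }
	pop := func() interface{} {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	for ip := 0; ip < len(code); {
		op := code[ip]
		ip++
		switch op {
		case OpConstant:
			push(constants[binary.BigEndian.Uint16(code[ip:])])
			ip += 2
		case OpAdd:
			b, a := pop().(int), pop().(int)
			push(a + b)
		case OpMul:
			b, a := pop().(int), pop().(int)
			push(a * b)
		case OpEqual:
			b, a := pop(), pop()
			push(a == b)
		case OpJumpIfFalse:
			target := int(binary.BigEndian.Uint16(code[ip:]))
			ip += 2
			if v, ok := pop().(bool); !ok || !v {
				ip = target
			}
		case OpTrue:
			push(true)
		case OpFalse:
			push(false)
		case OpReturn:
			return pop(), nil
		default:
			return nil, fmt.Errorf("unknown opcode %d at offset %d", op, ip-1)
		}
	}
	return nil, errors.New("ran off the end without OpReturn")
}

// Program and Constants mirror the listing above, offset for offset.
var Program = []byte{
	OpConstant, 0, 0, OpConstant, 0, 1, OpConstant, 0, 2,
	OpMul, OpAdd, OpConstant, 0, 3, OpEqual,
	OpJumpIfFalse, 0, 20, OpTrue, OpReturn, OpFalse, OpReturn,
}
var Constants = []interface{}{1, 2, 3, 7}

func main() {
	fmt.Println(Run(Program, Constants)) // prints: true <nil>
}
```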

Of course these constants are constant so it seemed obvious to handle the case of integers as a special case. Rather than storing them in the constant-pool, where booleans, strings, etc live, we just store them inline. That meant our program would look like this:

  000000	        OpPush	1
  000003	        OpPush	2
  000006	        OpPush	3
  000009	         OpMul
  000010	         OpAdd
  ...

At this point the magic begins: we can scan the program from start to finish. Every time we find "OpPush", "OpPush", then "Maths" we can rewrite the program to contain the appropriate result already. So this concrete fragment:

OpPush 2
OpPush 3
OpMul

is now replaced by:

OpPush	6
OpNop     ; previous opconstant
OpNop     ; previous arg1
OpNop     ; previous arg2
OpNop     ; previous OpMul
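A rough sketch of that rewriting pass is below. It is not the real optimizer: the opcode values and the `Fold` name are invented for the example, only OpAdd and OpMul are handled, every instruction other than OpPush is assumed to be a single byte, and it ignores the possibility that a jump lands in the middle of a folded sequence. Skipping over OpNop while matching is what lets a second pass fold the result of the first.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Opcode numbering is an assumption made for this sketch.
const (
	OpNop  byte = iota
	OpPush      // operand: 16-bit big-endian integer stored inline
	OpAdd
	OpMul
)

// skipNops returns the offset of the first non-NOP byte at or after i.
func skipNops(code []byte, i int) int {
	for i < len(code) && code[i] == OpNop {
		i++
	}
	return i
}

// foldOnce makes one pass over code, rewriting "OpPush a, OpPush b, op"
// in place as "OpPush (a op b)" padded with OpNop. It reports whether
// anything changed.
func foldOnce(code []byte) bool {
	changed := false
	i := skipNops(code, 0)
	for i < len(code) {
		if code[i] != OpPush {
			i = skipNops(code, i+1) // all other opcodes here are one byte
			continue
		}
		if i+3 > len(code) {
			break // truncated operand
		}
		j := skipNops(code, i+3)
		if j+3 <= len(code) && code[j] == OpPush {
			k := skipNops(code, j+3)
			if k < len(code) && (code[k] == OpAdd || code[k] == OpMul) {
				a := int(binary.BigEndian.Uint16(code[i+1:]))
				b := int(binary.BigEndian.Uint16(code[j+1:]))
				r := a + b
				if code[k] == OpMul {
					r = a * b
				}
				if r <= 0xFFFF { // only fold results that fit the operand
					binary.BigEndian.PutUint16(code[i+1:], uint16(r))
					for n := j; n <= k; n++ {
						code[n] = OpNop
					}
					changed = true
				}
			}
		}
		i = skipNops(code, i+3)
	}
	return changed
}

// Fold repeats foldOnce until a pass changes nothing.
func Fold(code []byte) {
	for foldOnce(code) {
	}
}

func main() {
	code := []byte{OpPush, 0, 1, OpPush, 0, 2, OpPush, 0, 3, OpMul, OpAdd}
	Fold(code)
	fmt.Println(code) // prints: [1 0 7 0 0 0 0 0 0 0 0]
}
```

The first pass turns 2 * 3 into 6, the second turns 1 + 6 into 7, and the third finds nothing left to do.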

Repeating the process, and applying the same transformation to our comparison operation we now have the updated bytecode of:

  000000	         OpNop
  000001	         OpNop
  000002	         OpNop
  000003	         OpNop
  000004	         OpNop
  000005	         OpNop
  000006	         OpNop
  000007	         OpNop
  000008	         OpNop
  000009	         OpNop
  000010	         OpNop
  000011	         OpNop
  000012	         OpNop
  000013	         OpNop
  000014	        OpTrue
  000015	 OpJumpIfFalse	20
  000018	        OpTrue
  000019	      OpReturn
  000020	       OpFalse
  000021	      OpReturn

The OpTrue instruction pushes a "true" value to the stack, while the OpJumpIfFalse only jumps to the given program offset if the value popped off the stack is non-true. So we can remove those two instructions.

Our complete program is now:

  000000	         OpNop
  000001	         OpNop
  000002	         OpNop
  000003	         OpNop
  000004	         OpNop
  000005	         OpNop
  000006	         OpNop
  000007	         OpNop
  000008	         OpNop
  000009	         OpNop
  000010	         OpNop
  000011	         OpNop
  000012	         OpNop
  000013	         OpNop
  000014	         OpNop
  000015	         OpNop
  000016	         OpNop
  000017	         OpNop
  000018	        OpTrue
  000019	      OpReturn
  000020	       OpFalse
  000021	      OpReturn

There are two remaining obvious steps:

  • Remove all the OpNop instructions.
  • Recognize the case that there are zero jumps in the generated bytecode.
    • In that case we can stop processing code once we hit the first OpReturn
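Those two steps can be sketched together. Again this is an illustration under assumptions, not the real code: the opcode values, the `width` helper and the `Strip` function are invented here. To stay safe it leaves any program containing a jump untouched, because deleting bytes would otherwise invalidate the jump targets.

```go
package main

import "fmt"

// Opcode numbering is an assumption made for this sketch.
const (
	OpNop         byte = iota
	OpPush             // operand: 16-bit integer
	OpJumpIfFalse      // operand: 16-bit absolute offset
	OpTrue
	OpFalse
	OpReturn
)

// width returns the size of an instruction in bytes, operand included.
func width(op byte) int {
	if op == OpPush || op == OpJumpIfFalse {
		return 3
	}
	return 1
}

// Strip removes OpNop instructions and everything after the first
// OpReturn, but only when the program contains no jumps. Programs with
// jumps are returned unchanged.
func Strip(code []byte) []byte {
	for i := 0; i < len(code); i += width(code[i]) {
		if code[i] == OpJumpIfFalse {
			return code
		}
	}
	var out []byte
	for i := 0; i < len(code); i += width(code[i]) {
		op := code[i]
		if op == OpNop {
			continue
		}
		end := i + width(op)
		if end > len(code) {
			end = len(code) // tolerate a truncated final operand
		}
		out = append(out, code[i:end]...)
		if op == OpReturn {
			break // nothing after this can ever run
		}
	}
	return out
}

func main() {
	// Eighteen NOPs followed by the tail of the program above.
	code := append(make([]byte, 18), OpTrue, OpReturn, OpFalse, OpReturn)
	fmt.Println(Strip(code)) // prints: [3 5]
}
```

Walking instruction by instruction, rather than byte by byte, matters here: an operand byte of zero must not be mistaken for an OpNop.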

With that change made our input program:

if ( 1 + 2 * 3 == 7 ) { return true; } return false;

Now becomes this compiled bytecode:

  000000	        OpTrue
  000001	      OpReturn

Which now runs a lot faster than it did in the past. Of course this is completely artificial, but it was a fun process to work through regardless.

| No comments

 

Keeping a simple markdown work-log, via emacs

1 November 2019 16:00

For the past few years I've been keeping a work-log of everything I do. I don't often share these, though it is sometimes interesting to be able to paste into a chat-channel "Oh on the 17th March I changed that .."

I've had a couple of different approaches but for the past few years I've mostly settled upon emacs ~/Work.md. I just create a heading for the date and I'm done:

 # 10-03-2019

 * Did a thing.
   * See this link
 * Did another thing.

 ## Misc.

 Happy Birthday to me.

As I said I've been doing this for years, but it was only last week that I decided to start making it more efficient. Since I open this file often I should bind it to a key:

(defun worklog()
  "Open the work-log."
  (interactive)
  (find-file "~/Work.md"))

(global-set-key (kbd "C-x w") 'worklog)

This allows me to open the log by just pressing C-x w. The next step was to automate the headers. So I came up with a function which will search for today's date, adding it if missing:

(defun worklog-today()
  "Move to today's date, if it isn't found then append it"
  (interactive "*")
  (beginning-of-buffer)
  (if (not (search-forward (format-time-string "# %d-%m-%Y") nil t 1))
      (progn
        (end-of-buffer)
        (insert (format-time-string "\n\n# %d-%m-%Y\n")))))

Now we use some magic to make this function run every time I open ~/Work.md:

(require 'cl-lib)

(defun worklog_hook ()
  "Jump to today's entry whenever the work-log is opened."
  (when (cl-equalp (file-name-nondirectory (buffer-file-name)) "work.md")
    (worklog-today)))

(add-hook 'find-file-hook 'worklog_hook)

Finally there is a useful package imenu-list which allows you to create an inline sidebar for files. Binding that to a key allows it to be toggled easily:

    (add-hook 'markdown-mode-hook
     (lambda ()
      (local-set-key (kbd "M-'") 'imenu-list-smart-toggle)))

The end result is a screen that looks something like this:

If you have an interest in such things I store my emacs configuration on github, in a dotfile-repository. My init file is written in markdown, which makes it easy to read.

| 4 comments

 

/usr/bin/timedatectl

23 October 2019 10:00

Today I was looking over a system to see what it was doing, checking all the running processes, etc, and I spotted that it was running openntpd.

This post is a reminder to myself that systemd now contains an NTP-client, and I should go round and purge the ntpd/openntpd packages from my systems.

You can check on the date/time via:

$ timedatectl 
                      Local time: Wed 2019-10-23 09:17:08 EEST
                  Universal time: Wed 2019-10-23 06:17:08 UTC
                        RTC time: Wed 2019-10-23 06:17:08
                       Time zone: Europe/Helsinki (EEST, +0300)
       System clock synchronized: yes
systemd-timesyncd.service active: yes
                 RTC in local TZ: no

If the system is not set up to sync it can be enabled via:

$ sudo timedatectl set-ntp true

Finally logs can be checked as you would expect:

$ journalctl -u systemd-timesyncd.service

| 2 comments

 

A blog overhaul

8 October 2019 18:00

When this post becomes public I'll have successfully redeployed my blog!

My blog originally started in 2005 as a WordPress installation. At some point I used Mephisto, and then I wrote my own solution.

My project was pretty cool; I'd parse a directory of text-files, one file for each post, and insert them into an SQLite database. From there I'd initiate a series of plugins, each one to generate something specific:

  • One plugin would output an archive page.
  • Another would generate a tag cloud.
  • Yet another would generate the actual search-results for a particular month/year, or tag-name.

All in all the solution was flexible and it wasn't too slow because finding posts via the SQLite database was pretty good.

Anyway I've come to realize that freedom and architecture were overkill. I don't need to do fancy presentation, I don't need a loosely-coupled set of plugins.

So now I have a simpler solution which uses my existing template, uses my existing posts - with only a few cleanups - and generates the site from scratch, including all the comments, in less than 2 seconds.

After running make clean a complete rebuild via make upload (which deploys the generated site to the remote host via rsync) takes 6 seconds.

I've lost the ability to be flexible in some areas, but I've gained all the speed. The old project took somewhere between 20-60 seconds to build, depending on what had changed.

In terms of simplifying my life I've dropped the remote installation of a site-search which means I can now host this site on a static host with only a single handler to receive any post-comments. (I was 50/50 on keeping comments. I didn't want to lose those I'd already received, and I do often find valuable and interesting contributions from readers, but being 100% static had its appeal too. I guess they stay for the next few years!)

| 5 comments

 

A slack hack

17 September 2019 21:50

So recently I've been on-call, expected to react to events around the clock. Of course to make it more of a challenge alerts are usually raised via messages to a specific channel in slack which come from a variety of sources. Let's pretend I'm all retro/hip and I'm using IRC instead.

Knowing what I'm like I knew there was essentially zero chance a single beep on my phone, from the slack/irc app, would wake me up. So I spent a couple of hours writing a simple bot:

  • Connect to the server.
  • Listen for messages.
  • When an alert is posted in the channel:
    • Trigger a voice-call via the twilio API.

That actually worked out really, really, really well. Twilio would initiate a call to my mobile which absolutely would, could, and did wake me up. I did discover a problem pretty quickly though; too many phone-calls!

Imagine something is broken. Imagine a notice goes to your channel, and then people start replying to it:

  Some Bot: Help! Stuff is broken!  I'm on Fire!!  :fire: :hot: :boom:
  Colleague Bob: Is this real?
  Colleague Ann: Can you poke Chris?
  Colleague Chris: Oh dears, woe is me.

The first night I was on call I got a phone call. Then another. Then another. Even I replied to the thread/chat to say "Yeah I'm on it". So the next step was to refine my alerting:

  • If there is a message in the channel
    • Which is not from Bob
    • Which is not from Steve
    • Which is not from Ann
    • Which is not from Chris
    • Which doesn't contain the text "common false-positive"
    • Which doesn't contain the text "backup completed"
  • Then make a phone-call.

Of course the next problem was predictable enough, so the rules got refined:

  • If the time is between 7PM and 7AM raise the alert.
  • Unless it is the weekend in which case we alert regardless of the time of day.
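Written out as code, the hard-wired version of those rules looks roughly like this. It is a reconstruction for illustration, not the bot's actual source: `ShouldCall`, the ignore lists, and the sample times are all invented here, and the Twilio call itself is left out.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Message mirrors the fields the rules need.
type Message struct {
	Author  string
	Channel string
	Message string
}

// Hypothetical ignore lists, taken from the bullet points above.
var ignoredAuthors = map[string]bool{"Bob": true, "Steve": true, "Ann": true, "Chris": true}
var ignoredPhrases = []string{"common false-positive", "backup completed"}

// Sample times: 17 September 2019 was a Tuesday, the 21st a Saturday.
var (
	tuesdayNight = time.Date(2019, time.September, 17, 3, 0, 0, 0, time.UTC)
	tuesdayNoon  = time.Date(2019, time.September, 17, 12, 0, 0, 0, time.UTC)
	saturdayNoon = time.Date(2019, time.September, 21, 12, 0, 0, 0, time.UTC)
)

// ShouldCall reports whether a message should trigger a phone-call.
func ShouldCall(m Message, now time.Time) bool {
	if ignoredAuthors[m.Author] {
		return false
	}
	for _, p := range ignoredPhrases {
		if strings.Contains(m.Message, p) {
			return false
		}
	}
	// Weekends alert regardless of the time of day.
	if d := now.Weekday(); d == time.Saturday || d == time.Sunday {
		return true
	}
	// Otherwise only alert between 7PM and 7AM.
	h := now.Hour()
	return h >= 19 || h < 7
}

func main() {
	alert := Message{Author: "Some Bot", Channel: "ops_alerts", Message: "Help! Stuff is broken!"}
	fmt.Println(ShouldCall(alert, tuesdayNight)) // true
	fmt.Println(ShouldCall(alert, tuesdayNoon))  // false
	fmt.Println(ShouldCall(alert, saturdayNoon)) // true
}
```

Every new exception means editing and recompiling something like this, which is the itch the rest of the post scratches.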

So I had a growing set of rules, all encoded in my golang notification application. I moved some of them to JSON (specifically a list of users/messages to ignore) but things like the time of day were harder to move.

I figured I shouldn't be hardwiring these things. So last night I put together a simple filter-library, an evaluation engine, in golang to handle them. Now I can load a script and filter things out much more dynamically. For example assume I have the following struct:

type Message struct {
    Author  string
    Channel string
    Message string
    ..
}

And an instance of that struct named message, I can run a user-written script against that object:

 // Create a new eval-filter
 eval, err := evalfilter.New( "script goes here ..." )

 // Run it against the "message" object
 out, err := eval.Run( message )

The logic of reacting now goes inside that script, which is hopefully easy to read - but more importantly can be edited without recompiling the application:

//
// This is a filter script:
//
//   return false means "do nothing".
//   return true means initiate a phone-call.
//

//
// Ignore messages on channels that we don't care about
//
if ( Channel !~ "_alerts" ) { return false; }

//
// Ignore messages from humans who might otherwise write in our channels
// of interest.
//
if ( Sender == "USER1" ) { return false; }   // Steve
if ( Sender == "USER2" ) { return false; }   // Ann
if ( Sender == "USER3" ) { return false; }   // Bob


//
// Is it a weekend? Always alert.
//
if ( IsWeekend() ) { return true ; }

//
// OK so it is not a weekend.
//
// We only alert if 7pm-7am
//
// The WorkingHours() function returns `true` during working hours.
//
if ( WorkingHours() ) { return false ; }

//
// OK by this point we should raise a call:
//
// * The message was NOT from a colleague we've filtered out.
// * The message is upon a channel with an `_alerts` suffix.
// * It is not currently during working hours.
//   * And we already handled weekends by raising calls above.
//
return true ;

If the script returns true I initiate a phone-call. If the script returns false we ignore the message/event.

The alerting script itself is trivial, and probably non-portable, but the filtering engine is pretty neat. I can see a few more uses for it, even without it having nested blocks and a real grammar. So take a look, if you like:

| No comments

 

That time I didn't find a kernel bug, or did I?

14 August 2019 13:01

Recently I saw a post to the linux kernel mailing-list containing a simple fix for a use-after-free bug. The code in question originally read:

    hdr->pkcs7_msg = pkcs7_parse_message(buf + buf_len, sig_len);
    if (IS_ERR(hdr->pkcs7_msg)) {
        kfree(hdr);
        return PTR_ERR(hdr->pkcs7_msg);
    }

Here the bug is obvious once it has been pointed out:

  • A structure is freed.
    • But then it is dereferenced, to provide a return value.

This is the kind of bug that would probably have been obvious to me if I'd happened to read the code myself. However patch submitted so job done? I did have some free time so I figured I'd scan for similar bugs. Writing a trivial perl script to look for similar things didn't take too long, though it is a bit shoddy:

  • Open each file.
  • If we find a line containing "free(.*)" record the line and the thing that was freed.
  • The next time we find a return look to see if the return value uses the thing that was free'd.
    • If so that's a possible bug. Report it.
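My original was Perl, and I haven't reproduced it here, but the same heuristic fits in a few lines of Go. Treat this as an equally shoddy sketch: the regular expressions and the `Suspects` function are written for this post, it only understands free-style calls with a single plain identifier as the argument, and it will happily report false positives.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Matches kfree(x), v4l2_ctrl_handler_free(x) and friends.
	freeCall = regexp.MustCompile(`free\(\s*(\w+)\s*\)`)
	// Captures the expression of a return statement.
	returnStmt = regexp.MustCompile(`\breturn\b(.*);`)
)

// Suspects returns the 1-based line numbers of return statements whose
// expression mentions the variable most recently passed to a free call.
func Suspects(src string) []int {
	var hits []int
	freed := ""
	for n, line := range strings.Split(src, "\n") {
		if m := freeCall.FindStringSubmatch(line); m != nil {
			freed = m[1]
			continue
		}
		m := returnStmt.FindStringSubmatch(line)
		if m == nil || freed == "" {
			continue
		}
		used := regexp.MustCompile(`\b` + regexp.QuoteMeta(freed) + `\b`)
		if used.MatchString(m[1]) {
			hits = append(hits, n+1)
		}
		freed = "" // only the first return after a free is checked
	}
	return hits
}

func main() {
	src := "hdr->pkcs7_msg = pkcs7_parse_message(buf + buf_len, sig_len);\n" +
		"if (IS_ERR(hdr->pkcs7_msg)) {\n" +
		"\tkfree(hdr);\n" +
		"\treturn PTR_ERR(hdr->pkcs7_msg);\n" +
		"}\n"
	fmt.Println(Suspects(src)) // prints: [4]
}
```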

Of course my code is nasty, but it looked like it immediately paid off. I found this snippet of code in linux-5.2.8/drivers/media/pci/tw68/tw68-video.c:

    if (hdl->error) {
        v4l2_ctrl_handler_free(hdl);
        return hdl->error;
    }

That looks promising:

  • The structure hdl is freed, via a dedicated freeing-function.
  • But then we return the member error from it.

Chasing down the code I found that linux-5.2.8/drivers/media/v4l2-core/v4l2-ctrls.c contains the code for the v4l2_ctrl_handler_free call and while it doesn't actually free the structure - just some members - it does reset the contents of hdl->error to zero.

Ahah! The code I've found looks for an error, and if it was found returns zero, meaning the error is lost. I can fix it, by changing to this:

    if (hdl->error) {
        int err = hdl->error;
        v4l2_ctrl_handler_free(hdl);
        return err;
    }

I did that. Then I looked more closely to see if I was missing something. The code I've found lives in the function tw68_video_init1; that function is called only once, and the return value is ignored!

So, that's the story of how I scanned the Linux kernel for use-after-free bugs and contributed nothing to anybody.

Still fun though.

I'll go over my list more carefully later, but nothing else jumped out as being immediately bad.

There is a weird case I spotted in ./drivers/media/platform/s3c-camif/camif-capture.c with a similar pattern. In that case the function involved is s3c_camif_create_subdev which is invoked by ./drivers/media/platform/s3c-camif/camif-core.c:

        ret = s3c_camif_create_subdev(camif);
        if (ret < 0)
                goto err_sd;

So I suspect there is something odd there:

  • If there's an error in s3c_camif_create_subdev
    • Then handler->error will be reset to zero.
    • Which means that return handler->error will return 0.
    • Which means that the s3c_camif_create_subdev call should have returned an error, but won't be recognized as having done so.
    • i.e. "0 < 0" is false.

Of course the error-value is only set if this code is hit:

    hdl->buckets = kvmalloc_array(hdl->nr_of_buckets,
                      sizeof(hdl->buckets[0]),
                      GFP_KERNEL | __GFP_ZERO);
    hdl->error = hdl->buckets ? 0 : -ENOMEM;

Which means that the registration of the sub-device fails if there is no memory, and at that point what can you even do?

It's a bug, but it isn't a security bug.

| 2 comments

 

Building a computer - part 3

1 August 2019 13:01

This is part three in my slow journey towards creating a home-brew Z80-based computer. My previous post demonstrated writing some simple code, and getting it running under an emulator. It also described my planned approach:

  • Hookup a Z80 processor to an Arduino Mega.
  • Run code on the Arduino to emulate RAM reads/writes and I/O.
  • Profit, via the learning process.

I expect I'll have to get my hands dirty with a breadboard and naked chips in the near future, but for the moment I decided to start with the least effort. Erturk Kocalar has a website where he sells "shields" (read: expansion-boards) which contain a Z80, and which are designed to plug into an Arduino Mega with no fuss. This is a simple design, and I've seen a bunch of people demonstrate how to wire it up by hand, for example this post.

Anyway I figured I'd order one of those, and get started on the easy-part, the software. There was some sample code available from Erturk, but it wasn't ideal from my point of view because it mixed driving the Z80 with doing "other stuff". So I abstracted the core code required to interface with the Z80 and packaged it as a simple library.

The end result is that I have a z80 retroshield library which uses an Arduino Mega to drive a Z80 with something as simple as this:

#include <z80retroshield.h>


//
// Our program, as hex.
//
unsigned char rom[32] =
{
    0x3e, 0x48, 0xd3, 0x01, 0x3e, 0x65, 0xd3, 0x01, 0x3e, 0x6c, 0xd3, 0x01,
    0xd3, 0x01, 0x3e, 0x6f, 0xd3, 0x01, 0x3e, 0x0a, 0xd3, 0x01, 0xc3, 0x16,
    0x00
};


//
// Our helper-object
//
Z80RetroShield cpu;


//
// RAM I/O function handler.
//
char ram_read(int address)
{
    return (rom[address]) ;
}


// I/O function handler.
void io_write(int address, char byte)
{
    if (address == 1)
        Serial.write(byte);
}


// Setup routine: Called once.
void setup()
{
    Serial.begin(115200);


    //
    // Setup callbacks.
    //
    // We have to setup a RAM-read callback, otherwise the program
    // won't be fetched from RAM and executed.
    //
    cpu.set_ram_read(ram_read);

    //
    // Then we setup a callback to be executed every time an "out (x),y"
    // instruction is encountered.
    //
    cpu.set_io_write(io_write);

    //
    // Configured.
    //
    Serial.println("Z80 configured; launching program.");
}


//
// Loop function: Called forever.
//
void loop()
{
    // Step the CPU.
    cpu.Tick();
}

All the logic of the program is contained in the Arduino-sketch, and all the use of pins/ram/IO is hidden away. As a recap the Z80 will make requests for memory-contents, to fetch the instructions it wants to execute. For general purpose input/output there are two instructions that are used:

IN A, (1)   ; Read a character from STDIN, store in A-register.
OUT (1), A  ; Write the character in A-register to STDOUT

Here 1 is the I/O address, and this is an 8 bit number. At the moment I've just configured the callback such that any write to I/O address 1 is dumped to the serial console.
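If you're curious what the bytes in the rom array actually do, you don't need the hardware to find out. The sketch below is not part of the library: it is a toy tracer, written in Go purely for illustration, that understands only the three opcodes the program uses (0x3e is LD A,n; 0xd3 is OUT (n),A; 0xc3 is JP nn) and stops when it meets a jump to itself. The `Trace` and `Output` names are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// The same program as the Arduino sketch; unused bytes are zero.
var rom = [32]byte{
	0x3e, 0x48, 0xd3, 0x01, 0x3e, 0x65, 0xd3, 0x01, 0x3e, 0x6c, 0xd3, 0x01,
	0xd3, 0x01, 0x3e, 0x6f, 0xd3, 0x01, 0x3e, 0x0a, 0xd3, 0x01, 0xc3, 0x16,
	0x00,
}

// Trace interprets mem from address zero, calling ioWrite for each OUT.
// It returns nil when the program parks itself in a jump-to-self loop.
func Trace(mem []byte, ioWrite func(port, value byte)) error {
	var a byte // the A register
	pc := 0
	for steps := 0; steps < 1000; steps++ {
		if pc+1 >= len(mem) {
			return errors.New("program counter ran past the end of memory")
		}
		switch mem[pc] {
		case 0x3e: // LD A, n
			a = mem[pc+1]
			pc += 2
		case 0xd3: // OUT (n), A
			ioWrite(mem[pc+1], a)
			pc += 2
		case 0xc3: // JP nn, with the address stored little-endian
			if pc+2 >= len(mem) {
				return errors.New("truncated jump")
			}
			target := int(mem[pc+1]) | int(mem[pc+2])<<8
			if target == pc {
				return nil // idle loop: the program has finished
			}
			pc = target
		default:
			return fmt.Errorf("unsupported opcode 0x%02x at 0x%04x", mem[pc], pc)
		}
	}
	return errors.New("step limit reached")
}

// Output collects everything the program writes to I/O address 1.
func Output() (string, error) {
	var sb strings.Builder
	err := Trace(rom[:], func(port, value byte) {
		if port == 1 {
			sb.WriteByte(value)
		}
	})
	return sb.String(), err
}

func main() {
	out, err := Output()
	fmt.Printf("%q %v\n", out, err) // prints: "Hello\n" <nil>
}
```

The final three bytes are a jump to address 0x0016, which is the address of the jump itself, so the real Z80 simply spins there once the greeting has been written.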

Anyway I put together a couple of examples of increasing complexity, allowing me to prove that RAM read/writes work, and that I/O reads and writes work.

I guess the next part is where I jump in complexity:

  • I need to wire a physical Z80 to a board.
  • I need to wire a PROM to it.
    • This will contain the program to be executed - hardcoded.
  • I need to provide power, and a clock to make the processor tick.

With a bunch of LEDs I'll have a Z80-system running, but it'll be isolated and hard to program, since I'll need to reflash the RAM/ROM-chip.

The next step would be getting it hooked up to a serial-console of some sort. And at that point I'll have a genuinely programmable standalone Z80 system.

| No comments