If your code accepts URIs as input..

Monday, 12 September 2016

There are many online sites that accept reading input from remote locations. For example a site might try to extract all the text from a webpage, or show you the HTTP-headers a given server sends back in response to a request.

If you run such a site you must make sure you validate the schema you're given - also remembering to do that if you're sent any HTTP-redirects.

Really the issue here is a confusion between URL & URI.

The only time I ever communicated with Aaron Swartz was unfortunately after his death, because I didn't make the connection. I randomly stumbled upon the html2text software he put together, which had an online demo containing a form for entering a location. I tried the obvious input:

file:///etc/passwd

The software was vulnerable, read the file, and showed it to me.

The site gives errors on all inputs now, so it cannot be used to demonstrate the problem, but on Friday I saw another site on Hacker News with the very same input-issue, and it reminded me that there's a very real class of security problems here.

The site in question was http://fuckyeahmarkdown.com/ and allows you to enter a URL to convert to markdown - I found this via the hacker news submission.

The following link shows the contents of /etc/hosts, and demonstrates the problem:

http://fuckyeahmarkdown.example.com/go/?u=file:///etc/hosts&read=1&preview=1&showframe=0&submit=go

The output looked like this:

..
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
127.0.0.1 stage
127.0.0.1 files
127.0.0.1 brettt..
..

In the actual output of '/etc/passwd' all newlines had been stripped. (Which I now recognize as being an artifact of the markdown processing.)

UPDATE: The problem is fixed now.

| 9 comments.

 

Using the compiler to help you debug segfaults

Friday, 5 August 2016

Recently somebody reported that my console-based mail-client was segfaulting when opening an IMAP folder, and then when they tried with a local Maildir-hierarchy the same fault was observed.

I couldn't reproduce the problem at all, as neither my development host (read "my personal desktop"), nor my mail-host had been crashing at all, both being in use to read my email for several months.

Debugging crashes with no backtrace, or real hint of where to start, is a challenge. Even when downloading the same Maildir samples I couldn't see a problem. It was only when I decided to see if I could add some more diagnostics to my code that I came across a solution.

My intention was to make it easier to receive a backtrace, by adding more compiler options:

  -fsanitize=address -fno-omit-frame-pointer

I added those options and my mail-client immediately started to segfault on my own machine(s), almost as soon as it started. Ultimately I found three pieces of code where I was allocating C++ objects and passing them to the Lua stack, a pretty fundamental part of the code, which were buggy. Once I'd tracked down the areas of code that were broken and fixed them the user was happy, and I was happy too.

Its interesting that I've been running for over a year with these bogus things in place, which "just happened" to not crash for me or anybody else. In the future I'll be adding these options to more of my C-based projects, as there seems to be virtually no downside.

In related news my console editor has now achieved almost everything I want it to, having gained:

  • Syntax highlighting via Lua + LPEG
  • Support for TAB completion of Lua-code and filenames.
  • Bookmark support.
  • Support for setting the mark and copying/cutting regions.

The only outstanding feature, which is a biggy, is support for Undo which I need to add.

Happily no segfaults here, so far..

| 2 comments.

 

A final post about the lua-editor.

Saturday, 23 July 2016

I recently mentioned that I'd forked Antirez's editor and added lua to it.

I've been working on it, on and off, for the past week or two now. It's finally reached a point where I'm content:

  • The undo-support is improved.
  • It has buffers, such that you can open multiple files and switch between them.
    • This allows this to work "kilua *.txt", for example.
  • The syntax-highlighting is improved.
    • We can now change the size of TAB-characters.
    • We can now enable/disable highlighting of trailing whitespace.
  • The default configuration-file is now embedded in the body of the editor, so you can run it portably.
  • The keyboard input is better, allowing multi-character bindings.
    • The following are possible, for example ^C, M-!, ^X^C, etc.

Most of the obvious things I use in Emacs are present, such as the ability to customize the status-bar (right now it shows the cursor position, the number of characters, the number of words, etc, etc).

Anyway I'll stop talking about it now :)

| No comments

 

Adding lua to all the things!

Thursday, 14 July 2016

Recently Antirez made a post documenting a simple editor in 1k of pure C, the post was interesting in itself, and the editor is a cute toy because it doesn't use curses - instead using escape sequences.

The github project became very popular and much interesting discussion took place on hacker news.

My interest was piqued because I've obviously spent a few months working on my own console based program, and so I had to read the code, see what I could learn, and generally have some fun.

As expected Salvatore's code is refreshingly simple, neat in some areas, terse in others, but always a pleasure to read.

Also, as expected, a number of forks appeared adding various features. I figured I could do the same, so I did the obvious thing in adding Lua scripting support to the project. In my fork the core of the editor is mostly left alone, instead code was moved out of it into an external lua script.

The highlight of my lua code is this magic:

  --
  -- Keymap of bound keys
  --
  local keymap = {}

  --
  --  Default bindings
  --
  keymap['^A']        = sol
  keymap['^D']        = function() insert( os.date() ) end
  keymap['^E']        = eol
  keymap['^H']        = delete
  keymap['^L']        = eval
  keymap['^M']        = function() insert("\n") end

I wrote a function invoked on every key-press, and use that to lookup key-bindings. By adding a bunch of primitives to export/manipulate the core of the editor from Lua I simplified the editor's core logic, and allowed interesting facilities:

  • Interactive evaluation of lua.
  • The ability to remap keys on the fly.
  • The ability to insert command output into the buffer.
  • The implementation of copy/past entirely in Lua_.

All in all I had fun, and I continue to think a Lua-scripted editor would be a neat project - I'm just not sure there's a "market" for another editor.

View my fork here, and see the sample kilo.lua config file.

| No comments

 

I've been moving and updating websites.

Friday, 8 July 2016

I've spent the past days updating several of my websites to be "responsive". Mostly that means I open the site in firefox then press Ctrl-alt-m to switch to mobile-view. Once I have the mobile-view I then fix the site to look good in small small space.

Because my general design skills are poor I've been fixing most sites by moving to bootstrap, and ensuring that I don't use headers/footers that are fixed-position.

Beyond the fixes to appearances I've also started rationalizing the domains, migrating content across to new homes. I've got a provisional theme setup at steve.fi, and I've moved my blog over there too.

The plan for blog-migration went well:

  • Setup a redirect to from https://blog.steve.org.uk to https://blog.steve.fi/
  • Replace the old feed with a CGI script which outputs one post a day, telling visitors to update their feed.
    • This just generates one post, but the UUID of the post has the current date in it. That means it will always be fresh, and always be visible.
  • Updated the template/layout on the new site to use bootstrap.

The plan was originally to setup a HTTP-redirect, but I realized that this would mean I'd need to keep the redirect in-place forever, as visitors would have no incentive to fix their links, or update their feeds.

By adding the fake-RSS-feed, pointing to the new location, I am able to assume that eventually people will update, and I can drop the dns record for blog.steve.org.uk entirely - Already google seems to have updated its spidering and searching shows the new domain already.

| 2 comments.

 

So I've been busy.

Thursday, 30 June 2016

The past few days I've been working on my mail client which has resulted in a lot of improvements to drawing, display and correctness.

Since then I've been working on adding GPG-support. My naive attempt was to extract the signature, and the appropriate body-part from the message. Write them both to disk then I could validate via:

gpg --verify msg.sig msg

However that failed, and it took me a long to work out why. I downloaded the source to mutt, which can correctly verify an attached-signature, then hacked lib.c to neuter the mutt_unlink function. That left me with a bunch of files inside $TEMPFILE one of which provided the epiphany.

A message which is to be validated is indeed written out to disk, just as I would have done, as is the signature. Ignoring the signature the message is interesting:

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Mon, 27 Jun 2016 08:08:14 +0200

...

--=20
Bob Smith

The reason I'd failed to validate my message-body was because I'd already decoded the text of the MIME-part, and I'd also lost the prefixed two lines "Content-type:.." and Content-Transfer:.... I'm currently trying to work out if it is possible to get access to the RAW MIME-part-text in GMIME.

Anyway that learning aside I've made a sleazy hack which just shells out to mimegpg, and this allows me to validate GPG signatures! That's not the solution I'd prefer, but that said it does work, and it works with inline-signed messages as well as messages with application/pgp-signature MIME-parts.

Changing the subject now. I wonder how many people read to the end anyway?

I've been in Finland for almost a year now. Recently I was looking over websites and I saw that the domain steve.fi was going to expire in a few weeks. So I started obsessively watching it. Today I claimed it.

So I'll be slowly moving things from beneath steve.org.uk to use the new home steve.fi.

I also setup a mini-portfolio/reference site at http://steve.kemp.fi/ - which was a domain I registered while I was unsure if I could get steve.fi.

Finally now is a good time to share more interesting news:

  • I've been reinstated as a Debian developer.
  • We're having a baby.
    • Interesting times.

| 7 comments.

 

So I should document the purple server a little more

Wednesday, 15 June 2016

I should probably document the purple server I hacked together in Perl and mentioned in my last post. In short it allows you to centralise notifications. Send "alerts" to it, and when they are triggered they will be routed from that central location. There is only a primitive notifier included, which sends data to the console, but there are sample stubs for sending by email/pushover, and escalation.

In brief you create alerts by sending a JSON object via HTTP-POST. These objects contain a bunch of fields, but the two most important are:

  • id
    • A human-name for the alert. e.g. "disk-space", "heartbeat", or "unread-mail".
  • raise
    • When to raise the alert. e.g. "now", "+5m", "1466006086".

When an update is received any existing alert has its values updated, which makes heartbeat alerts trivial. Send a message with:

{ "id": "heartbeat", "raise": "+5m", .. }

The existing alert will be updated each time such a new event is submitted, which means that the time at which that alert will raise will be pushed back by five minutes. If you send this every 60 seconds then you'll get informed of an outage five minutes after your server explodes (because the "+5m" will have been turned into an absolute time, and that time will eventually become in the past - triggering a notification).

Alerts are keyed on the source IP which sent the submission and the id field, meaning you can send the same update from multiple hosts without causing any problems.

Notifications can be viewed in a reasonably pretty Web UI, so you can clear raised-alerts, see the pending ones, and suppress further notifications on something that has been raised. (By default notifications are issued every sixty seconds, until the alert is cleared. There is support for only raising an alert once, which is useful for services you might deliver events via, such as pushover which will repeat themselves.)

Anyway this is a fun project, which is a significantly simplified and less scalable version of a project which is open-sourced already and used at Bytemark.

| 4 comments.

 

A mixed weekend

Monday, 30 May 2016

This past seven days have been a little mixed:

  • I updated documentation on my simple object store.
  • I created a simplified alerting system.
    • Heavily inspired by something we use at work.
    • My version is much much simpler, but still useful enough to alert me of outages (via hearbeats) and unread email. (Both of which are sent via pushover notifications.)
  • I bought a pair of cheap USB "game controllers"
    • And have spend several hours playing SNES games such as Bomberman 2, and Super Mario Brothers 3.
    • I'm using mednafan, as it supports cheats, fullscreen, sound, and is pretty easy to drive.

Finally I spent the tail end of the weekend being a little red, sore, and itchy. . I figured this was a surprising outbreak of Dyshidrosis on my hands, and eczema on my body. Instead I received a diagnosis of Scarlet Fever. So now I feel somewhat Dickensian!

Apparently this infection is on the rise!

| 2 comments.

 

Accidental data-store .. is go!

Thursday, 19 May 2016

A couple of days ago I wrote::

The code is perl-based, because Perl is good, and available here on github:

..

TODO: Rewrite the thing in #golang to be cool.

I might not be cool, but I did indeed rewrite it in golang. It was quite simple, and a simple benchmark of uploading two million files, balanced across 4 nodes worked perfectly.

https://github.com/skx/sos/

| 2 comments.

 

Accidental data-store ..

Wednesday, 18 May 2016

A few months back I was looking over a lot of different object-storage systems, giving them mini-reviews, and trying them out in turn.

While many were overly complex, some were simple. Simplicity is always appealing, providing it works.

My review of camlistore was generally positive, because I like the design. Unfortunately it also highlighted a lack of documentation about how to use it to scale, replicate, and rebalance.

How hard could it be to write something similar, but also paying attention to keep it as simple as possible? Well perhaps it was too easy.

Blob-Storage

First of all we write a blob-storage system. We allow three operations to be carried out:

  • Retrieve a chunk of data, given an ID.
  • Store the given chunk of data, with the specified ID.
  • Return a list of all known IDs.

 

API Server

We write a second server that consumers actually use, though it is implemented in terms of the blob-storage server listed previously.

The public API is trivial:

  • Upload a new file, returning the ID which it was stored under.
  • Retrieve a previous upload, by ID.

 

Replication Support

The previous two services are sufficient to write an object storage system, but they don't necessarily provide replication. You could add immediate replication; an upload of a file could involve writing that data to N blob-servers, but in a perfect world servers don't crash, so why not replicate in the background? You save time if you only save uploaded-content to one blob-server.

Replication can be implemented purely in terms of the blob-servers:

  • For each blob server, get the list of objects stored on it.
  • Look for that object on each of the other servers. If it is found on N of them we're good.
  • If there are fewer copies than we like, then download the data, and upload to another server.
  • Repeat until each object is stored on sufficient number of blob-servers.

 

My code is reliable, the implementation is almost painfully simple, and the only difference in my design is that rather than having an API-server which allows both "uploads" and "downloads" I split it into two - that means you can leave your "download" server open to the world, so that it can be useful, and your upload-server can be firewalled to only allow a few hosts to access it.

The code is perl-based, because Perl is good, and available here on github:

TODO: Rewrite the thing in #golang to be cool.

| 3 comments.