Entries posted in March 2014

Anti-social coding

3 March 2014 21:50

I get all excited when I load up Github's front-page and see something like:

"robyn has forked skx/xxx to robyn/xxx"

I wonder what they will do, what changes do they have in mind?

Days pass, and no commits happen.

Anti-social coding: Cloning the code, I guess in case I delete my repository, but not intending to make any changes.

Tags: github | 9 comments

Replacing ugly things would save the world many hours

5 March 2014 21:50

There are some tools that we use daily, whether we realize it or not, that are unduly ugly. Over time you learn to use them and you forget just how hard they are to learn, and you take it for granted.

Today I had to guide somebody through using procmail, and I'd forgotten how annoying it is.

In brief I use procmail in three ways, each of which I had to document:

Run a command, given a new email, and replace the original email with the output of that command.
Run a command, silently. Just for fun.
Match a regular expression on a header-field, and file accordingly.
- Later extended to matching regexps on multiple headers. ("AND" + "OR" )

There are some projects that are too entrenched to ever be replaced ("make", I'm looking at you), but procmail? I reckon there's a chance a replacement would be useful, quickly.

Then again, maybe I'm biased.

Tags: make, procmail | 11 comments

So I bought some new hardware, for audio purposes.

6 March 2014 21:50

This week I received a logitech squeezebox radio, which is basically an expensive toy that allows you to listen to either "internet radio", or music streamed from your own PC via a portable device that accesses the network wirelessly.

The main goal of this purchase was to allow us to listen to media stored on a local computer in the bedroom, or living-room.

The hardware scans your network looking for a media server, so the first step is to install that:

The slimdevices wiki has decent information.
- Or you can find the code on github.com.

The media-server has a couple of open ports; one for streaming the media, and one for a user-browsable HTML interface. Interestingly the radio-device shows up in the web-interface, so you can mess around with the currently loaded playlist from your office, while your wife is casually listening to music in the bedroom. (I'm not sure if that's a feature or not yet ;)

Although I didn't find any alternative server-implementations I did find a software-client which you can use to play music from the central server - slimp3slave - and again you can push playlists, media, etc, to this.

My impressions are pretty positive; the device was too expensive, certainly I wouldn't buy two, but it is functional. The user-interface is decent, and the software being available and open is a big win.

Downsides? No remote-control for the player, because paying an additional £70 is never going to happen, but otherwise I can't think of anything.

(Shame the squeezebox product line seems to have been cancelled (?))

Procmail Alternatives?

Although I did start hacking a C & Lua alternative, it looks like there are enough implementations out there that I don't feel so strongly any more.

I'm working in a different way to most people, rather than sort mails at delivery time I'm going to write a trivial daemon that will just watch ~/Maildir/.Incoming, and move mails out of there. That means that no errors will cause mail to be lost at SMTP/delivery time.

I'm going to base my work on Email::Filter since it offers 90% of the primitives I want. The only missing thing is the ability to filter mails via external commands which has now been reported as a bug/omission.

Tags: audio, github, perl, procmail, random, squeezebox, streaming | 10 comments

Time to get back to my roots: Perl

7 March 2014 21:50

Today I wrote a perl Test::RemoteServer module:

#!/usr/bin/perl -w -I.

use strict;
use warnings;

use Test::More tests => 4;
use Test::RemoteServer;

#
#  Ping Tests
#
ping_ok( "192.168.0.1",       "Website host is up: IPv4" );
ping6_ok( "www.steve.org.uk", "Website host is up: IPv6" );

#
#  Socket tests
#
socket_open( "ipv4.steve.org.uk", "2222", "OpenSSH is running" );
socket_closed( "ipv4.steve.org.uk", "22", "OpenSSH is not available on :22" );

I can see a lot of value in defining tests that are carried out against remote hosts - even if they're more basic than the kind of comprehensive testing you'd get via Custodian, Nagios, etc.

Being able to run "make test" and remotely probe services is cool.

Unfortunately I suspect the new-hotness is to couple the testing with your Chef, Puppet, CFengine, Slaughter, Ansible, etc, policies. That way you have two things:

A consistent way to define system-state.
A consistent way to test that the damn thing worked.

Coming to CPAN in the near future anyway, I can throw it up on Github in advance if there is any interest..

Tags: perl, servers | 4 comments

Discovering back-end servers automatically?

11 March 2014 21:50

Recently I've been pondering how to do service discovery.

Pretend you have a load balancer which accepts traffic and routes incoming requests to different back-ends. The loadbalancer might be pound, varnish, haproxy, nginx, or similar. The back-ends might be node applications, apache, or similar.

The typical configuration of the load-balancer will read:

# forward

# backends
backend web1  { .host = "10.0.0.4"; }
backend web2  { .host = "10.0.0.6"; }
backend web3  { .host = "10.0.0.5"; }

#  afterword

I've seen this same setup in many situations, and while it can easily be imagined that there might be "random HTTP servers" on your (V)LAN which shouldn't receive connections it seems like a pain to keep updating the backends.

Using UDP/multicast broadcasts it is trivial to announce "Hey I'm a HTTP-server with the name 'foo'", and it seems to me that this should allow seamless HTTP load-balancing.

To be more explicit - this is normal:

The load-balancer listens for HTTP requests, and forwards them to back-ends.
When back-ends go away they stop receiving traffic.

What I'd like to propose is another step:

When a new back-end advertises itself with the tag "foo" it should be automatically added and start to receive traffic.

i.e. This allows backends to be removed from service when they go offline but also to be added when they come online. Without the load-balancer needing its configuration to be updated.

This means you'd not give a static list of back-ends to your load-balancer, instead you'd say "Route traffic to any service that adfvertises itself with the tag 'foo'.".

VLANS, firewalls, multicast, udp, all come into play, but in theory this strikes me as being useful, obvious, and simple.

(Failure cases? Well if the "announcer" dies then the backend won't get traffic routed to it. Just like if the backend were offline. And clearly if a backend is announced, but not receiving HTTP-requests it would be dropped as normal.)

If I get the time this evening I'll sit down and look at some load-balancer source code to see if any are written in such a way that I could add this "broadcast discovery" as a plugin/minor change.

Tags: discovery, http, load-balancing | 14 comments

So load-balancers are awesome

14 March 2014 21:50

When I was recently talking about load-balancers, and automatically adding back-ends, not just removing bad ones that go offline, I obviously spent a while looking over some.

There are several dedicated load-balancers packaged for Debian GNU/Linux, including:

In addition to actual dedicated load-balancers there are things that can be coerced into running in that way: apache2, varnish, squid, nginx, & etc.

Of the load-balancers I was immediately drawn to both pen and pound, because they have command line tools ("penctl" and "poundctl" respectively) for adding/removing/updating the running configuration.

Pen I've been using for a couple of days now, and although it suffers from some security issues I'm confident they will be resolved in the near future. (#741370)

My only outstanding task is to juggle some hosts around and stress-test the pair of them a little more before deciding on a winner.

In other news I kinda regret the whole blogspam.net API. I'd have had a far simpler life if I'd just ran the damn thing as a DNSBL in the first place. (That's essentially how it operates on the whole anyway. Submit spammy comments for long enough and you're just blacklisted, thereafter.)

Tags: blogspam, load-balancing, pen, pound | 2 comments

Sending commands to your running mail-client

17 March 2014 21:50

So I wrote a mail client, and this morning I added the ability for it to receive input from a Unix domain socket.

In one terminal I have my email client open. In another I run:

lumailctl /tmp/foo.sock "open('/home/skx/Maildir/.livejournal.2014/');"

That opens the unix domain socket, and pipes the following command to it:

open('/home/skx/Maildir/.livejournal.2014/');

The mail client has already got the socket open, and the end result is that my mail client suddenly opens the specified mail folder, and redraws itself.

Neat.

The "open" function is obviously a lua function, which builds upon the lua primitives the client understands:

function open( folder )
   clear_selected_folders()
   set_selected_folder(folder)
   index()
end

Obviously this would be woefully insecure if it were released like this. Later I'll wire up some lua function to establish the socket, such that the user specifies where the socket is created (their home directory, ideally), and it doesn't run by default.

Tags: lua, lumail | 2 comments

Time to mix things up again

20 March 2014 21:50

I'm currently a contractor, working for/with Dyn, until April the 11th.

I need to decide what I'm doing next, if anything. In the meantime here are some diversions:

Some trivial security issues

I noticed and reported two more temporary-file issues insecure temporary file usage in apt-extracttemplates (apt), and libreadline6: Insecure use of temporary files - in _rl_trace.

Neither of those are particularly serious, but looking for them took a little time. I recently started re-auditing code, and decided to do three things:

Download the source code to every package installed upon this system.
Download the source code to all packages matching the pattern ^libpam-, and ^libruby-*.

I've not yet finished slogging through the code, but my expectation will be a few more issues. I'll guess 5-10, given my cynical nature.

NFS-work

I've been tasked with the job of setting up a small cluster running from a shared and writeable NFS-root.

This is a fun project which I've done before, PXE-booting a machine and telling it to mount a root filesystem over NFS is pretty straight-forward. The hard part is making that system writeable, such that you can boot and run "apt-get install XX". I've done it in the past using magic filesystems, or tmpfs. Either will work here, so I'm not going to dwell on it.

Another year

I had another birthday, so that was nice.

My wife took me to a water-park where we swam like fisheseses, and that tied in nicely with a recent visit to Deep Sea World, where we got to walk through a glass tunnel, beneath a pool FULL OF SHARKS, and other beasties.

Beyond that I received another Global Knife, which has now been bloodied, since I managed to slice my finger open chopping mushrooms on Friday. Oops. Currently I'm in that annoying state where I'm slowly getting used to typing with a plaster around the tip of my finger, but knowing that it'll have to come off again and I'll get confused again.

Linux Distribution

I absolutely did not start working on a "linux distribution", because that would be crazy. Do I look like a crazy-person?

All I did was play around with GNU Stow, and ponder the idea of using a minimal LibC and GNU Stow to organize things.

It went well, but the devil is always in the details.

I like the idea of a master-distribution which installs pam, ssh, etc, but then has derivitives for "This is a webserver", "This is a Ruby server", and "This is a database server".

Consider it like task-selection, but with higher ambition.

There's probably more I could say; a new kitchen sink (literally) and a new tap have made our kitchen nicer, I've made it past six months of regular gym-based workouts, and I didn't die when I went to the beach in the dark the other night, so that was nice.

Umm? Stuff?

Have a nice day. Thanks.

Tags: misc, nfs, security | 2 comments

That is it, I'm going to do it

22 March 2014 21:50

That's it, I'm going to do it: I have now committed myself to writing a scalable, caching, reverse HTTP proxy.

The biggest question right now is implementation language; obviously "threading" of some kind is required so it is a choice between Perl's anyevent, Python's twisted, Rubys event machine, or node.js.

I'm absolutely, definitely, not going to use C, or C++.

Writing a a reverse proxy in node.js is almost trivial, the hard part will be working out which language to express the caching behaviour, on a per type, and per-resource basis.

I will ponder.

Tags: caching, reverse proxy | 12 comments

So I failed at writing some clustered code in Perl

24 March 2014 21:50

Until this time next month I'll be posting code-based discussions only.

Recently I've been wanting to explore creating clustered services, because clusters are definitely things I use professionally.

My initial attempt was to write an auto-clustering version of memcached, because that's a useful tool. Writing the core of the service took an hour or so:

Simple KeyVal.pm implementation.
Give it the obvious methods get, set, delete.
Make it more interesting by creating a read-only append-log.
The logfile will be replayed for clustering.

At the point I was done the following code worked:

use KeyVal;

# Create an object, and set some values
my $obj = KeyVal->new( logfile => "/tmp/foo.log" );
$obj->incr( "steve" );
$obj->incr( "steve" );

print $obj->get( "steve" ) # prints 2.

# Now replay the append-only log
my $replay = KeyVal->new( logfile => "/tmp/foo.log" );
$replay->replay();

print $replay->get( "steve" ) # prints 2.

In the first case we used the primitives to increment a value twice, and then fetch it. In the second case we used the logfile the first object created to replay all prior transactions, then output the value.

Neat. The next step was to make it work over a network. Trivial.

Finally I wanted to autodetect peers, and deploy replication. Each host would send out regular messages along the lines of "Do you have updates made since $time?". Any that did would replay the logfile from the given unixtime offset.

However here I ran into problems. Peer discovery was supposed to be basic, and I figured I'd write something that did leader election by magic. Unfortunately Perls threading code is .. unpleasant:

I wanted to store all known-peers in a singleton.
Then I wanted to create threads that would announce and receive updates.

This failed. Majorly. Because you cannot launch the implementation of a class-method as a thread. Equally you cannot make a variable which is "complex" shared across threads.

I wrote some demo code which works without packages and a shared singleton:

https://github.com/skx/peer-discovery

The Ruby version, by contrast, is much more OO and neater. Meh.

I've now shelved the project.

My next, big, task was to make the network service utterly memcached compatible. That would have been fiddly, but not impossible. Right now I just use a simple line-based network protocol.

I suspect I could have got what I wanted using EventMachine, or similar, but that's a path I've not yet explored, and I'm happy enough with that decision.

Tags: clustering, perl, ruby, software | 2 comments

New GPG-key

24 March 2014 21:50

I've now generated a new GPG-key for myself:

$ gpg --fingerprint 229A4066
pub   4096R/0C626242 2014-03-24
      Key fingerprint = D516 C42B 1D0E 3F85 4CAB  9723 1909 D408 0C62 6242
uid                  Steve Kemp (Edinburgh, Scotland) <[email protected]>
sub   4096R/229A4066 2014-03-24

The key can be found online via mit.edu : 0x1909D4080C626242

This has been signed with my old key:

pub   1024D/CD4C0D9D 2002-05-29
      Key fingerprint = DB1F F3FB 1D08 FC01 ED22  2243 C0CF C6B3 CD4C 0D9D
uid                  Steve Kemp <[email protected]>
sub   2048g/AC995563 2002-05-29

If there is anybody who has signed my old key who wishes to sign my new one then please feel free to get in touch to arrange it.

Tags: gpg, security | No comments

A diversion on off-site storage

26 March 2014 21:50

Yesterday I took a diversion from thinking about my upcoming cache project, largely because I took some pictures inside my house, and realized my offsite backup was getting full.

I have three levels of backups:

Home stuff on my desktop is replicated to my wifes desktop, and vice-versa.
A simple server running rsync (content-free http://rsync.io/).
A "peering" arrangement of a small group of friends. Each of us makes available a small amount of space and we copy to-from each others shares, via rsync / scp as appropriate.

Unfortunately my rsync-based personal server is getting a little too full, and will certainly be full by next year. S3 is pricy, and I don't trust the "unlimited" storage people (backblaze,etc) to be sustainable and reliable long-term.

The pricing on Google-drive seems appealing, but I guess I'm loathe to share more data with Google. Perhaps I could dedicated a single "[email protected]" login to that, separate from all-else.

So the diversion came along when I looked for Amazon S3-comptible, self-hosted, servers. There are a few, most of them are PHP-based, or similarly icky.

So far cloudfoundry's vlob looks the most interesting, but the main project seems stalled/dead. Sadly using s3cmd to upload files failed, but certainly the `curl` based API works as expected.

I looked at Gluster, CEPH, and similar, but didn't yet come up with a decent plan for handling offsite storage, but I know I have only six months or so before the need becomes pressing. I imagine the plan has to be using N-small servers with local storage, rather than 1-Large server, purely because pricing is going to be better that way.

Decisions decisions.

Tags: s3 | 13 comments

Some things on DNS and caching

29 March 2014 21:50

Although there wasn't too many comments on my what would you pay for? post I did get some mails.

I was reminded about this via Mario Langs post, which echoed a couple of private mails I received.

Despite being something that I take for granted, perhaps because my hosting comes from the Bytemark, people do seem willing to pay money for DNS hosting.

Which is odd. I mean you could do it very very very cheaply if you had just four virtual machines. You can get complex and be geo-fancy, and you could use anycast on a small AS, but really? You could just deploy four virtual machines0 to provide a.ns, b.ns, c.ns, d.ns, and be better than 90% of DNS hosters out there.

The thing that many people mentioned was Git-backed, or Git-based DNS. Which would be trivial if you used tinydns, and no much harder if you used bind.

I suspect I'm "not allowed" to do DNS-things for a while, due to my contract with Dyn, but it might be worth checking...

ObRandom: Beat me to it. Register gitdns.io, or similar, and configure hooks from github to compile tinydns records.

In other news I started documenting early thoughts about my caching reverse proxy, which has now got a name stockpile.

I wrote some stub code using node.js, and although it was functional it soon became callback hell:

Is this resource cachable?
Does this thing exist in the cache already?
Should we return the server's response to the client, archive to memcached, or do both?

Expressing the rules neatly is also a challenge. I want the server core to be simple and the configuration to be something like:

is_cachable ( vhost, source, request, backened )
{
    /**
     * If the file is static, then it is cachable.
     */
    if ( request.url.match( /\.(jpg|png|txt|html?|gif)$/i ) ) {
        return true;
    }

    /**
     * If there is a cookie then the answer is false.
     */
    if ( request.has_cookie? ) { return false ; }

    /**
     * If the server is alive we'll now pass the remainder through it
     * if not then we'll serve from the cache.
     */
    if ( backend.alive? ) {
        return false;
    }
    else {
        return true;
    }
}

I can see there is value in judging the cachability based on the server response, but I plan to ignore that except for "Expires:", "Etag", etc ,etc)

Anyway callback hell does make me want to reexamine the existing C/C++ libraries out there. Because I think I could do better.

Tags: caching, dns | 3 comments

Entries posted in March 2014

Recent Posts