Entries tagged lazyweb

Related tags: accesslogs, acl, apache, apache2, apt, apt-caching, blog, books, chronicle, codecs, cvs, cvsrepository, debian packages, entrys that should be articles, exaile, git, gnome, homework, image galleries, linkti.me, logfiles, mercurial, metacity, migrations, mp3, multicast, music, mysql, ogg, paging, perl, permissions, php-syslog-ng, picshare, playlists, pushing my luck, questions, searching, security holes, sql, todo, unix, vain.my.flat, videos, woe is me, xen, xen-hosting, xen-tools, xmms, xmms2, youtube.

Dynamically discovering settings for a cluster?

Friday, 6 September 2013

Pretend I run a cluster, for hosting a site. Pretend that I have three to six web-nodes, and each one needs to know which database host to contact.

How do I control that?

Right now I have a /etc/settings.conf file, more or less, deployed by Slaughter. That works. Another common pattern is to use a hostname - for example pmaster.example.org.

However, neither approach handles failover. If I wanted to point at a secondary database I'd need to either:

  • Add code to retry the second host on failure.
    • Worry about divergence if some hosts used DB1, then DB2, then DB1 came back online.
    • Failover is easy. Fail-back is probably best avoided.
  • Worry about DNS caches and TTL.

In short I'm imagining there are several situations where you want to abstract away the configuration in a cluster-wide manner. (A real solution is obviously floating per-service IPs, via HAProxy, Keepalived, ucarp, etc. People do that quite often for databases specifically, but not for redis servers, etc.)

So I'm pondering what is essentially a multicast-accessible key-value storage system.

Have a daemon on the VLAN which will respond to multicast questions like "get db", or "get cache", with a hostname/IP/result.

Suddenly your code would read:

  • Send mcast question ("which db?").
  • Get mcast reply ("db1").
  • Connect to db1.
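A minimal sketch of that question/answer exchange, assuming plain UDP multicast on a single VLAN. The group address 239.255.42.99, port 5007, the "get <key>" text protocol and the SETTINGS values are all hypothetical choices for illustration, not an existing tool.

```python
import socket
from typing import Dict, Optional

# Hypothetical values: any administratively-scoped group (239.0.0.0/8) and free port will do.
GROUP = "239.255.42.99"
PORT = 5007

# Hypothetical answers the daemon hands out; failover means editing this in one place.
SETTINGS: Dict[str, str] = {"db": "db1.example.org", "cache": "redis1.example.org"}


def handle_query(message: bytes, settings: Dict[str, str]) -> Optional[bytes]:
    """Turn a 'get <key>' question into the reply payload, or None if unknown."""
    parts = message.decode("utf-8", "replace").split()
    if len(parts) == 2 and parts[0] == "get" and parts[1] in settings:
        return settings[parts[1]].encode("utf-8")
    return None


def serve(settings: Dict[str, str]) -> None:
    """Daemon side: join the multicast group and answer questions forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # struct ip_mreq: group address followed by local interface (0.0.0.0 = any).
    mreq = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, sender = sock.recvfrom(1024)
        reply = handle_query(data, settings)
        if reply is not None:
            sock.sendto(reply, sender)  # unicast the answer straight back


def ask(key: str, timeout: float = 1.0) -> Optional[str]:
    """Client side: multicast 'get <key>' and return the first answer, if any."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on this VLAN
    sock.settimeout(timeout)
    try:
        sock.sendto(("get " + key).encode("utf-8"), (GROUP, PORT))
        data, _ = sock.recvfrom(1024)
        return data.decode("utf-8")
    except socket.timeout:
        return None
    finally:
        sock.close()
```

The daemon would run serve(SETTINGS); a web node would call ask("db") and connect to whatever comes back, falling back to a local default on None. If two daemons disagree, the client simply takes whichever answer arrives first, which is one of the new problems this trades in.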

To me that seems like it should be genuinely useful. But I'm unsure if I'm trading one set of problems for another.

I can't find any examples of existing tools/daemons in this area, which means either I'm being novel, innovative, and interesting, or I'm overthinking...



Is there an ACL system for "all" revision control systems?

Sunday, 16 December 2012

Once upon a time a company started using distributed version control, and set up several project repositories using darcs.

Over time people became more sane and new projects were created in mercurial.

Later still Git became available, and was used by a few of the brave.

Sadly each of these projects is hosted on the same host, and in the home directory of the same user. This means these two commands work:

hg clone ssh://projects@dev.host/foo

git clone ssh://projects@dev.host/bar

I now want to set up per-repository ACLs and have hit a problem...

There are several git-wrappers such as gitolite and gitosis. There is also the excellent hg-gateway and mercurial-server for dealing with mercurial.

However I've yet to find a wrapper which will handle both git & mercurial repositories, under the same UID. (+ Darcs too, of course).

So my question - is there such a beast out there, or do we need to write it? I expect such a thing would be useful for many people, so I'm surprised I've not yet found it.
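A rough sketch of the missing wrapper, assuming the usual sshd forced-command mechanism: each key in the projects account's authorized_keys runs the script with a user name argument, and the script inspects SSH_ORIGINAL_COMMAND before handing over to git or hg. The ACL table, user names and script are hypothetical; darcs is left out because I'm not certain of the exact command lines its clients send.

```python
import os
import shlex
import sys
from typing import Dict, Optional, Set, Tuple

# Hypothetical ACL table: repository name -> users allowed to use it.
ACL: Dict[str, Set[str]] = {
    "foo": {"steve", "alice"},
    "bar": {"steve"},
}


def parse_command(command: str) -> Optional[Tuple[str, str]]:
    """Recognise the command a git or hg client asks sshd to run; return (vcs, repo)."""
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar garbage
        return None
    # git over ssh runs: git-upload-pack 'repo' (fetch) or git-receive-pack 'repo' (push)
    if len(argv) == 2 and argv[0] in ("git-upload-pack", "git-receive-pack"):
        return ("git", argv[1].strip("/"))
    # hg over ssh runs: hg -R repo serve --stdio
    if len(argv) == 5 and argv[:2] == ["hg", "-R"] and argv[3:] == ["serve", "--stdio"]:
        return ("hg", argv[2].strip("/"))
    return None


def allowed(user: str, repo: str) -> bool:
    """Only repositories listed in ACL are reachable, which also blocks '../' tricks."""
    return user in ACL.get(repo, set())


def main() -> int:
    """Entry point for an authorized_keys command="..." wrapper; argv[1] is the user."""
    user = sys.argv[1] if len(sys.argv) > 1 else ""
    command = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    request = parse_command(command)
    if request is None or not allowed(user, request[1]):
        sys.stderr.write("access denied\n")
        return 1
    vcs, repo = request
    if vcs == "git":
        program = shlex.split(command)[0]  # git-upload-pack or git-receive-pack
        os.execvp(program, [program, repo])
    else:
        os.execvp("hg", ["hg", "-R", repo, "serve", "--stdio"])
    return 0  # not reached: exec replaces this process
```

An authorized_keys entry might then look like command="/usr/local/bin/vcs-gate steve",no-pty ssh-rsa AAAA..., where vcs-gate is a hypothetical script ending in sys.exit(main()). Stripping slashes lets git's '/bar' (from ssh://projects@dev.host/bar) and hg's relative 'foo' both resolve to names under the account's home directory, which is where sshd runs forced commands.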



Images transitioned, and MySQL solicitations.

Monday, 16 May 2011

Images Moved

So I've retired my old picture hosting sub-domain, and moved all the files which were hosted by the dynamic system into a large web-root.

This means no more uploads are possible, but every existing link continues to work.

Happily the system generated "random" links, and it was just a matter of moving each uploaded file into that static location, then removing the CGI application.

The code for the new site has now been made public, although I suspect there will need to be some email-pong if anybody wishes to use it. Comments welcome.

MySQL replacements?

Let's pretend I work for a company which has dealings with many MySQL users.

Let's pretend that, even though it is true, so that I don't have to get into specifics.

Let us pretend that we have many, many hundreds of users who are very happy with MySQL, but that we have a few users who have "issues". That might be:

  • mysqld segfaulting every few months, with no real idea why.
    • Transactions are involved. So are stored procedures.
    • MySQL paid support might have been perfect, or it might have led to "yup, its a bug. good luck rebuilding with this patch. let us know how it turns out kthxbai."
    • Alternatively it might not have been reproducible.
  • Master-Master and Master-Slave setups being "unreliable" such that data inconsistencies arise despite MySQL regarding them as being in sync.
    • Good luck resolving that when you have two almost-identical "mysqldump" outputs which are 6GB each and which cause "diff" to exit with "out of memory" even on a 64GB host.
    • Is it possible to view differences in table-data, via the binary records? That'd be a fun project .. for a masochist.
  • Poor usage of resources.
  • Heavy concurrency caused by poorly developed applications in a load-balanced environment, leading to stalling queries. (WordPress + poor WordPress plugins, I'm looking at you; you're next on my killfile.)
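On the out-of-memory "diff" above: one way round it is to never hold either dump in memory. A sketch, assuming both dumps were made with mysqldump --skip-extended-insert (so each row is its own INSERT line) and then sorted with LC_ALL=C sort, which orders UTF-8 text the same way Python compares strings and spills to temporary files for inputs larger than RAM. It is essentially what comm -3 from coreutils does, written out so the comparison could be customised.

```python
from typing import Iterable, Iterator, Optional, Tuple


def diff_sorted(left: Iterable[str], right: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Merge-walk two *sorted* line streams and yield the lines unique to each side.

    '<' marks a line only in `left`, '>' a line only in `right`. Only one line
    from each input is held at a time, so memory use does not grow with file size.
    """
    left_it, right_it = iter(left), iter(right)
    a: Optional[str] = next(left_it, None)
    b: Optional[str] = next(right_it, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a < b):
            yield ("<", a)
            a = next(left_it, None)
        elif a is None or b < a:
            yield (">", b)
            b = next(right_it, None)
        else:  # same line on both sides: nothing to report
            a = next(left_it, None)
            b = next(right_it, None)
```

Usage would be along the lines of iterating over diff_sorted(open("master.sorted"), open("slave.sorted")) and printing each (side, line) pair; the file names are placeholders.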

To compound this problem some of these installations may or may not be running Etch. Let us pretend they are not, just to simplify things. (They mostly aren't these days, but I'm sure I could think of one or two if I tried.)

So, in this hypothetical situation what would you recommend?

I know there are new forks aplenty of MySQL. Drizzle et al. I suspect most of the forks will be short-lived - lots of this stuff is hard and non-sexy. I suspect the long-lived forks are probably concentrating on edge-cases we've not hit (yet), or on sexy exciting things like new storage engines and going nosql like all the cool kids.

Realistically, going down the PostgreSQL road is liable to lead to a wholly different set of problems, and a significant re-engineering of several sites, applications, and tools with no proof of stability.

Without wanting to jump ship entirely, what, if any, are our options?

PS. MySQL I still mostly love you, but my two most recent applications were written to use redis instead. Just a coincidence... I swear. No, put down that axe. Please can't we just talk about it?

ObQuote: "I've calculated your chance of survival, but I don't think you'll like it." - Hitchhiker's film.



That friend promises his undying friendship if you would do him a small favour.

Thursday, 17 June 2010

Perl & Apache?

Once upon a time, within the past year, I saw mention of a simpler version of mod_perl - an Apache module which let you write code to run within the context of a persistent Perl process.

However my DuckDuckGofu is weak, and I'm struggling to find this project.

Did I dream it, or could somebody tell me where it lives?

Dynamic Picture Frames

So I've been taking pictures recently. Lots of pictures.

Many images have been printed and hung upon my walls, and the price of frames is starting to become onerous.

I'd love to see some kind of "dynamic" picture wall - but the two alternatives I considered fail:

Metal & Magnets

Place a huge sheet of metal upon your wall. Then put wee magnets inside your frames.


Imagine a full wall that was paneled with what is essentially a large notice-board.

Both of these would look ugly; the metal one perhaps less so.

But the idea of having a wall which could have pictures mounted upon it, without having big nail holes if you rearranged and which could cope with dynamic repositioning and sizes is nice ..

Invent it for me? I'll buy one. Probably even two...

ObFilm: The Godfather



I go down with one helluva bang.

Saturday, 20 June 2009

Right now I have a lot of music, and I primarily interact with it via playlists.

I have a cronjob that generates, and populates, ~/Playlists/ every night. I generate playlists on multiple criteria:

  • ~/Playlists/Artist/
  • ~/Playlists/Albums/
  • ~/Playlists/Titles/
  • ~/Playlists/Keywords/

Playlists for specific artists & albums are probably self-explanatory, but the others might be interesting.

For every unique song title I have a playlist. In most cases that means there is a playlist called "Song Title" having one entry. But, as an explicit example, I have a playlist called "Under The Bridge" with two entries:

All Saints/Under The Bridge.mp3
Red Hot Chili Peppers/Under The Bridge.mp3

Similarly I break each song title into words, and generate one playlist for each distinct word discovered.

As a matter of randomness I have:


(e.g. Songs containing "girl" in their title: "Madonna:Material Girl", "Amy Winehouse:Hey Little Rich Girl", "Garbage:Stupid Girl"...)
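The title and keyword playlists described above can be sketched in a few functions. This assumes titles are taken from file names laid out as "Artist/Song Title.mp3", as in the Under The Bridge example; reading ID3 tags would be the more robust choice, and the function names are illustrative.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def _title(path: str) -> str:
    """Assumed layout: 'Artist/Song Title.mp3' -> 'Song Title'."""
    return os.path.splitext(os.path.basename(path))[0]


def group_by_title(paths: Iterable[str]) -> Dict[str, List[str]]:
    """One group per distinct song title."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[_title(path)].append(path)
    return dict(groups)


def group_by_word(paths: Iterable[str]) -> Dict[str, List[str]]:
    """One group per distinct lower-cased word appearing in any title."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        for word in set(re.findall(r"[a-z0-9']+", _title(path).lower())):
            groups[word].append(path)
    return dict(groups)


def write_playlists(groups: Dict[str, List[str]], directory: str) -> None:
    """Write each group as an .m3u file, which is simply one path per line."""
    os.makedirs(directory, exist_ok=True)
    for name, members in groups.items():
        filename = name.replace("/", "_") + ".m3u"
        with open(os.path.join(directory, filename), "w", encoding="utf-8") as fh:
            fh.write("\n".join(sorted(members)) + "\n")
```

Running write_playlists(group_by_word(paths), os.path.expanduser("~/Playlists/Keywords")) would then produce the per-word playlists.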

There are times when I want something specific and my playlist approach doesn't work. For example "All songs which are 2 minutes long, and happy". I guess the problem is working out which meta-data is worth searching/storing, and then working out how to jump from that data to a playlist.

Today, whilst walking into town to buy some new pies, I wondered "How many songs do I have that end in a chuckle, or laughter?"

If I wanted an "ends in laughter" playlist right now I'm screwed. Yet no system I've ever seen allows you to add that level of detail. (To be honest I'd probably give up even entering it.)

In conclusion, my music collection is vast and various, and dealing with it is sometimes harder than I'd like.

How do you handle the music on your computer(s)? (When it comes to mobile music I just use an iPod, telling it to play everything randomly. If a song comes on that I don't like I just skip it.)

ObFilm: Lolita



You think we just work at a comic book store for our folks, huh?

Saturday, 19 July 2008

I'm only a minimal MySQL user, but I've got a problem with a large table full of data and I'm hoping for tips on how to improve it.

Right now I have a table which looks like this:

CREATE TABLE `books` (
  `id` int(11) NOT NULL auto_increment,
  `owner` int(11) NOT NULL,
  `title` varchar(200) NOT NULL,
  PRIMARY KEY (`id`),
  KEY (`owner`)
);

This allows me to look up all the BOOKS a USER has - because the user table has an ID and the books table has an owner attribute.

However I've got hundreds of users, and thousands of books, so I want an efficient way to page through the list of books a given user has.

Initially I thought I could use a view:

CREATE VIEW view_steve  AS select * FROM books WHERE owner=73

But that suffers from a problem - the view has discontinuous IDs, inherited from the books table, and I'd love to be able to work with them in steps of 1. (Also having to create a view for each user is an overhead I could live without. Perhaps some stored procedure magic is what I need?)

Is there a simple way that I can create a view/subtable which would allow me to return something like:

|id|book_id|owner | title      |....|
|0 | 17    | Steve| Pies       | ..|
|1 | 32    | Steve| Fly Fishing| ..|
|2 | 21    | Steve| Smiles     | ..|
|3 | 24    | Steve| Debian     | ..|

Where the "id" is a consecutive, incrementing number, such that "paging" becomes trivial?

ObQuote: The Lost Boys

Update: without going into details the requirement for known, static, and ideally consecutive identifiers is related to doing correct paging.
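Rather than numbering rows inside a view, the usual approach to paging is to ORDER BY a stable key and use LIMIT/OFFSET, deriving the consecutive position from the offset. A sketch using Python's built-in sqlite3 purely so it is self-contained; MySQL accepts the same LIMIT ... OFFSET syntax, though MySQL drivers such as MySQLdb use %s rather than ? placeholders. The function name is illustrative.

```python
import sqlite3
from typing import List, Tuple


def page_of_books(conn: sqlite3.Connection, owner: int, page: int, per_page: int) -> List[Tuple[int, int, str]]:
    """Return (position, book_id, title) rows for one page of a user's books.

    Rows are ordered by book id; `position` counts 0, 1, 2, ... across pages,
    no matter how many gaps the auto_increment ids have.
    """
    offset = page * per_page
    rows = conn.execute(
        "SELECT id, title FROM books WHERE owner = ? ORDER BY id LIMIT ? OFFSET ?",
        (owner, per_page, offset),
    ).fetchall()
    return [(offset + i, book_id, title) for i, (book_id, title) in enumerate(rows)]
```

The positions are consecutive but not static: deleting a book shifts every later position down by one, so no numbering can be both gap-free and permanently fixed once rows are deleted.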



I got the poison

Wednesday, 20 February 2008

I've two video-related queries, which I'd be grateful if people could help me out with:

Mass Video Uploading

Is there any tool, or service, which will allow me to upload a random movie to multiple video-download sites? Specifically I'm curious to learn whether there is a facility to transcode as necessary a given input file and then upload to YouTube, Google Video, and other sites as a one-step operation.

Mass Video Searching

Relating to that, is there a service which will allow me to search for videos with given titles/tags/keywords across multiple video-hosting networks?

Regarding the searching I see that YouTube has support for "OpenSearch", but Google's video hosting has neither that nor a sitemap.xml file: Irony Is ...



No, no, no, no.

Wednesday, 20 February 2008

I'm going to admit up front here that I'm pushing my luck, and that I anticipate the chances of success are minimal. But that aside .. There are a lot of people who read my entries, because of syndication, and I'm optimistic that somebody here in the UK will have a copy of the following three books they could send me:

  • Flash Gordon vol 3: Crisis on Citadel II
  • Flash Gordon vol 5: Citadels under attack
  • Flash Gordon vol 6: Citadels on Earth

(All three are cheap paperback pulp fiction novels from the 1980s written by Alex Raymond.)

If you have a copy of any of those three books, and are willing to part with them, then I'd love to hear from you. Either as a comment or via email.

I'd certainly expect to pay for them, up to around £5 per volume.

Backstory: I read the first when I was 10-12, then mostly forgot about it.

A while back I remembered enjoying it and bought volumes 1, 2, 3, & 4 from an online store. I got screwed and volume 3 hasn't arrived, but possibly that will be rectified soon.

Here in the UK the last two volumes are either extremely rare or extremely in demand. Typically they seem to sell for £15-30 - I'm frustrated to not have the conclusion, but not desperate to spend so much money upon them, (been there, done that).

So if anybody has some or all of these books and can bear to part with them please do let me know.




It's a lot like life

Thursday, 3 January 2008

Assume for a moment that you have 148 hosts logging, via syslog-ng, to a central host. That host is recording all log entries into a MySQL database. Assume that these machines together produce a total of 4698816 lines per day.

(Crazy random numbers pulled from thin air, obviously.)

Now the question: How do you process, read, or pay attention to those logs?

Here is what we've done so far:


All the syslog-ng client machines are logging to a central machine, which inserts the records into a database.

This database may be queried using the php-syslog-ng script. Unfortunately this search is relatively slow, and the user-interface is appallingly bad: it allows only searches, with no view of the most recent logs, no auto-refreshing via AJAX, etc.

RSS feeds

To remedy the slowness, and poor usability of the PHP front-end to the database I wrote a quick hack which produces RSS feeds via queries, against that same database, accessed via URIs such as:

  • http://example.com/feeds/DriveReady
  • http://example.com/feeds/host/host1

The first query returns an RSS feed of log entries containing the given term. The second shows all recent entries from the machine host1.

That works nicely for a fixed set of patterns, but the problem with this approach, and that of php-syslog-ng in general, is that it will only show you things that you look for - it won't volunteer trends, patterns, or news.

The fundamental problem is a lack of notion in either system of "recent messages worth reading" (on a global or per-machine basis).

To put that into perspective given a logfile from one host containing, say, 3740 lines there are only approximately 814 unique lines if you ignore the date + timestamp.

Reducing log entries by that amount (a 78% decrease) is a significant saving, but even so you wouldn't want to read 22% of our original 4698816 lines of logs, as that is still over a million log entries.
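Counting distinct lines once the date + timestamp is ignored takes only a few lines. This sketch assumes classic syslog text ("Jan  3 12:34:56 host ...") rather than the database rows php-syslog-ng stores; with the figures above, 1 - 814/3740 is roughly 0.78, matching the quoted reduction.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed input: classic syslog text such as "Jan  3 12:34:56 host1 sshd[123]: ..."
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")


def unique_messages(lines: Iterable[str]) -> Counter:
    """Count log lines after removing the leading date + time stamp."""
    return Counter(TIMESTAMP.sub("", line.rstrip("\n"), count=1) for line in lines)
```

len(unique_messages(open(path))) gives the distinct-line count. Varying PIDs (sshd[28427] vs sshd[28428]) still make otherwise identical lines distinct, so the real reduction could be larger.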

I guess we could trim the results down further via a pipe through logcheck or similar, but I can't help thinking that still isn't going to give us enough interesting things to view.

To reiterate I would like to see:

  • per-machine anomalies.
  • global anomalies.

To that end I've been working on something, but I'm not too sure yet if it will go anywhere... In brief you take the logfiles and tokenize, then you record the token frequencies as groups within a given host's prior records. Unique pairings == logs you want to see.

(i.e. token frequency analysis on things like "<auth.info> yuling.example.com sshd[28427]: Did not receive identification string from".)
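One possible reading of the tokenising idea, as a sketch: keep per-host token counts from earlier lines and flag any line containing tokens that host has (almost) never produced. It simplifies the description by scoring individual tokens rather than groups or pairings, and collapses digit runs so PIDs and ports don't look novel; the class and its threshold are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(message: str) -> List[str]:
    """Whitespace tokens with digit runs collapsed, so sshd[28427] and sshd[31337] match."""
    return [re.sub(r"\d+", "N", token) for token in message.split()]


class TokenHistory:
    """Per-host token counts learned from that host's earlier log lines."""

    def __init__(self, threshold: int = 1) -> None:
        self.threshold = threshold
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, host: str, message: str) -> List[str]:
        """Return tokens seen fewer than `threshold` times before on `host`, then learn them."""
        tokens = tokenize(message)
        seen = self.counts[host]
        rare = [t for t in tokens if seen[t] < self.threshold]
        seen.update(tokens)
        return rare
```

Feeding each line through observe() with its real host name gives per-machine anomalies; feeding it again under a shared key such as "*" gives a crude global view. Lines for which observe() returns an empty list are the ones you can skip.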

What do other people do? There must be a huge market for this? Even amongst people who don't have more than 20 machines!



I just don't understand

Sunday, 30 December 2007

Whilst I'm very pleased with my new segmented network setup, and the new machine, I'm extremely annoyed that I cannot get a couple of graphical Xen desktop guests up and running.

The initial idea was that I would set up a 64-bit installation of Etch and then communicate with it via VNC - xen-tools will do the necessary magic if you create your guest with "--role=gdm". Unfortunately it doesn't work.

When vncserver attempts to start upon an AMD64 host it dies with a segfault - meaning that I cannot create a scratch desktop environment to play with.

All of this works perfectly with a 32-bit guest, and that actually is pretty neat. It lets me create a fully virtualised, restorable, environment for working with flash/java/etc.

The bug was filed over three years ago as #276948, but there doesn't appear to be a solution.

Also, only on the amd64 guest, I'm seeing errors when I try to start X which mention things like "no such file or directory /dev/tty0". I've no idea what's going on there - though it could be a vt (virtual terminal) thing?

The upshot of all this is that I currently have fewer guests than I was expecting:

skx@gold:~/blog/data$ xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     3114     2 r-----   1180.6
cfmaster.services.xen                      1      256     1 -b----      1.0
etch32.desktop.xen                         2      256     1 -b----      1.4
etch32.security-build.xen                  3      128     1 -b----      1.4
etch64.security-build.xen                  4      128     1 -b----      1.4
sarge32.security-build.xen                 5      128     1 -b----      1.0



You're making me live

Monday, 26 November 2007

Is there an existing system which will allow me to query Apache logfiles via an SQL string? (Without importing into a database first).

I've found the perl library SQL::YASL - but that has a couple of omissions which mean it isn't ideal for my task:

  • It doesn't understand DISTINCT
  • It doesn't understand COUNT
  • It doesn't understand SUM

Still it did allow me to write a simple shell which works nicely for simple cases:

SQL>LOAD /home/skx/hg/engaging/logs/access.log;
SQL>select path,size from requests where size > 10000;
path size 
/css/default.css 13813 
/js/prototype.js 71261 
/js/effects.js 37872 
/js/dragdrop.js 30645 
/js/controls.js 28980 
/js/slider.js 10403 
/view/messages 15447 
/view/messages 15447 
/recent/messages 25378 

It does mandate the use of a "WHERE" clause, but that was easily fixed with "WHERE 1=1". If I could just have support for COUNT I could do interesting things in near-realtime...

Then again maybe I should just log directly and not worry about it. I certainly don't want to create my own SQL engine .. it just seems that Perl doesn't have a suitable library already made which is a bit of a shocker!
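An alternative that sidesteps writing an SQL engine: parse the access log into a throwaway in-memory SQLite table and let SQLite supply DISTINCT, COUNT and SUM. Strictly this is importing into a database, but nothing touches disk and the table vanishes when the process exits. The sketch uses Python's standard sqlite3 module (Perl's DBD::SQLite could play the same role) and assumes Common or Combined Log Format lines.

```python
import re
import sqlite3
from typing import Iterable

# Common/Combined Log Format prefix: host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)')


def load_requests(lines: Iterable[str]) -> sqlite3.Connection:
    """Parse access-log lines into an in-memory SQLite table called 'requests'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests (host TEXT, time TEXT, method TEXT,"
        " path TEXT, status INTEGER, size INTEGER)"
    )
    rows = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip malformed lines rather than failing
        host, when, method, path, status, size = m.groups()
        rows.append((host, when, method, path, int(status), 0 if size == "-" else int(size)))
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn
```

For example, load_requests(open("access.log")).execute("SELECT path, COUNT(*) FROM requests GROUP BY path").fetchall() gives per-path hit counts, with no WHERE clause required.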



Open your eyes, look up to the skies and see

Sunday, 25 November 2007

If you have a (public) revision controlled ~/bin/, or bash/shell scripts I'd love to see them. Feel free to post links to your repositories as comments.

I'm certain there are some great tools and utilities out there which I could be using. Right now the only external thing I'm using is Martin Krafft's pub script. I don't use it often, but it is very neat and handy when I do want it. (Something that I'd never have considered writing myself, which suggests there are many more gems I'm missing!)

In other news my migration to mercurial is going extremely well, with only minimal downtime. Downtime for services really comes about because I have several websites which are powered entirely by CVS checkouts of remote repositories, so the process looks a little like this:

  • Convert CVS repository to hg.
  • Archive "live" CVS checkout from the server.
  • Move the local CVS checkout somewhere temporary.
  • Checkout from the new mercurial repository.
  • Fix any broken symlinks.
  • Do a recursive diff to make sure there are no unexpected changes.
  • Remove the previously archived local CVS checkout
  • Done!
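The "recursive diff" step can ignore the version-control metadata that naturally differs between the two trees. A sketch using Python's filecmp.dircmp, told to skip CVS/ and .hg/ directories; like filecmp.cmp it compares shallowly, trusting identical os.stat() signatures (type, size, mtime) and reading contents only when those differ. The function name is illustrative.

```python
import filecmp
import os
from typing import List


def differences(old: str, new: str) -> List[str]:
    """Recursively list paths that differ between two checkouts, ignoring VCS metadata."""
    found: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        found.extend(os.path.join(prefix, n) + " (only in old)" for n in cmp.left_only)
        found.extend(os.path.join(prefix, n) + " (only in new)" for n in cmp.right_only)
        found.extend(os.path.join(prefix, n) + " (changed)" for n in cmp.diff_files)
        for name, sub in cmp.subdirs.items():  # ignore list is inherited by subdirs
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(old, new, ignore=["CVS", ".hg"]), "")
    return sorted(found)
```

An empty list means the new mercurial checkout matches the archived CVS one, apart from metadata.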



No, I don't want your number

Friday, 23 November 2007

I'm still in the middle of a quandary with regards to revision control.

90% of my open code is hosted via CVS at a central site.

I wish to migrate away from CVS in the very near future, and having ummed and ahhed for a while I've picked mercurial as my system of choice. There is extensive documentation, and it does everything I believe I need.

The runner-up was git, but on balance I've decided to choose mercurial as it wins in a few respects.

Now the plan. I have two options:

  • Leave each project in one central site.
  • Migrate project $foo to its own location.

e.g. My xen-tools could be hosted at mercurial.xen-tools.org, my blog compiler could live at mercurial.steve.org.uk.

Alternatively I could just leave the one site in place, ignoring the fact that the domain name is now inappropriate.

The problem? I can't decide which approach to go for. Both have plusses and minuses.

Suggestions or rationales welcome - but no holy wars on why any particular revision control system is best...

I guess ultimately it matters little, and short of mass-editing links it's 50/50.



I love this hive employee

Tuesday, 13 November 2007

Russell Coker wants something to save and restore file permissions en masse.

That exists already:

apt-get install acl

Once installed you can dump the filesystem permissions of, for example, /etc/ recursively with this:

 getfacl -R /etc > orig.perms

Want to see what is different? First change something:

steve@steve:~$ sudo chmod 0 /etc/motd

Now see what would be restored:

setfacl --test -R --restore=./orig.perms /etc | grep -v "\*,\*"
etc/motd : u::rw-,g::r--,o::r--,*

Finally let's make it do the restoration:

steve:/# setfacl -R --restore=./orig.perms /etc

Job done.
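Where the acl package isn't available, the same save/test/restore cycle can be approximated for plain permission bits. A sketch in Python, assuming only mode bits matter: unlike getfacl/setfacl it ignores ownership and extended ACL entries, and it skips symlinked files because chmod would follow them. Function names are illustrative.

```python
import os
import stat
from typing import Dict


def snapshot(root: str) -> Dict[str, int]:
    """Record the permission bits of root and everything beneath it."""
    modes: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        modes[dirpath] = stat.S_IMODE(os.stat(dirpath).st_mode)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # chmod on a link would change its target
                modes[path] = stat.S_IMODE(os.stat(path).st_mode)
    return modes


def changed(saved: Dict[str, int]) -> Dict[str, int]:
    """Like `setfacl --test`: paths whose current mode differs from the snapshot."""
    return {
        path: mode
        for path, mode in saved.items()
        if os.path.exists(path) and stat.S_IMODE(os.stat(path).st_mode) != mode
    }


def restore(saved: Dict[str, int]) -> None:
    """Put every changed path back to its recorded mode."""
    for path, mode in changed(saved).items():
        os.chmod(path, mode)
```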



For you the sun will be shining

Saturday, 6 October 2007

Thanks to the people who commented on my post about a decent apt cacher, it was good to see that I'm not alone.

Thanks to RobertH for recommending the new tool acng. I've not used it yet; instead I gave it a quick look and reported a potentially serious bug. Hopefully that'll be fixed in the next release.

In the meantime apt-cacher actually appears to be holding up quite nicely and the nice HTML report it generates is cute!

Now onto the next challenge...

I would like some kind of tool to convert a random hierarchy of images (jpg) into a small gallery. (Utterly non-dynamic - but ideally with tagging support and RSS feeds).

There seem to be a plethora of options to the problem, surprisingly many of them involving Python ..

If anybody has any pointers I'd appreciate a link.

For reference my current galleries tend to look like this - warning: fluffy animals!

Using "apt-cache search static gallery" I find three programs:

bins - Very heavyweight. Unattractive.

photon - Pretty. Requires GIMP for creating thumbnails - unsuitable for my lightweight webhost.

jigl - Looks great. Does 90% of what I want - specifically misses tags & rss.
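Since jigl only misses tags and RSS, those two pieces might be bolted on separately. A sketch in Python using only xml.etree.ElementTree, assuming each image's tags are already known (for instance from a hypothetical sidecar file); thumbnailing is deliberately left out. It inverts image-to-tags into tag-to-images for static per-tag pages, and renders a minimal RSS 2.0 feed with one item per image.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

Image = Tuple[str, List[str]]  # (relative path, tags): hypothetical input shape


def by_tag(images: List[Image]) -> Dict[str, List[str]]:
    """Invert image -> tags into tag -> images, one static index page per tag."""
    pages: Dict[str, List[str]] = defaultdict(list)
    for path, tags in images:
        for tag in tags:
            pages[tag].append(path)
    return dict(pages)


def rss_feed(images: List[Image], title: str, base_url: str) -> str:
    """Render a minimal RSS 2.0 document with one <item> per image."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = title
    for path, tags in images:
        item = ET.SubElement(channel, "item")
        url = base_url.rstrip("/") + "/" + path
        ET.SubElement(item, "title").text = path
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url
        for tag in tags:
            ET.SubElement(item, "category").text = tag
    return ET.tostring(rss, encoding="unicode")
```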



Are you talking to me?

Saturday, 11 August 2007

My GNOME desktop is broken upon my primary machine, and it has taken me too long to get it sorted out.

Short version: metacity will not run:

skx@vain:~$ metacity
metacity: symbol lookup error: /usr/lib/libgthread-2.0.so.0: undefined symbol: g_thread_gettime

The .so file referenced is a symlink to libgthread-2.0.so.0.1200.13, and using nm I can see there are no symbols listed:

skx@vain:~$ nm /usr/lib/libgthread-2.0.so.0.1200.13
nm: /usr/lib/libgthread-2.0.so.0.1200.13: no symbols

That seems weird to me, but libraries are mysterious beasts, so perhaps I should expect this behaviour?
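Plain nm reads the regular symbol table, which packaged shared libraries normally have stripped, hence "no symbols"; nm -D lists the dynamic symbols the loader actually uses. The same question can be put to the dynamic loader directly. A small Python sketch using ctypes; the function name is illustrative.

```python
import ctypes
from typing import Optional


def exports_symbol(library: Optional[str], symbol: str) -> bool:
    """Ask the dynamic loader whether `symbol` resolves through `library`.

    None means the running process itself. ctypes.CDLL raises OSError if the
    loader cannot load the library at all; an unresolvable name raises
    AttributeError, which is reported here as False.
    """
    lib = ctypes.CDLL(library)
    try:
        getattr(lib, symbol)
    except AttributeError:
        return False
    return True
```

For example, exports_symbol("libglib-2.0.so.0", "g_thread_gettime") asks whether the libglib the loader picks up provides the missing symbol, and ldd /usr/bin/metacity shows which copy of each library that is.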

Anyway dpkg claims this file is installed by libglib2.0-0, and the package hasn't had an upload since July 17th, so I can't believe this is the reason for the recent breakage. (Even given that I don't log out often..)

Reinstalling both packages (metacity + libglib2.0-0) has failed to fix the problem so I'm lost.

Right now I'm running GNOME with a different window manager, icewm, via a ~/.gnome2/session file:

gnome-wm --default-wm /usr/bin/icewm-gnome --sm-client-id default0

This works almost perfectly - it is better than metacity in the sense that new windows don't overlap existing ones if there is spare screen space, but worse in that alt-TAB shows two windows "Top extended Edge Panel" and "Bottom Extended Edge Panel" - which I don't need/want to see.

I'd be happy to stay with IceWM if I could fix those two problems, but I'd love to know why metacity is broken, and how I can fix it. I can't see any obvious bug reports - and I'm not 100% certain that the gthread package is the source of the error...

Any suggestions welcome.

ii  metacity       1:2.18.5-1     A lightweight GTK2 based Window Manager
ii  libglib2.0-0   2.12.13-1      The GLib library of C routines



Now some men like the fishing

Friday, 3 August 2007

Xen Migration

This afternoon I mostly migrated Xen guests from their old host to their new one. (This was part of an upgrade of facilities; upgrading in place would have been much fiddlier and more annoying!)

The migration took almost three hours, which was longer than anticipated but shorter than I'd feared. In the future I'll know to do it differently, but I managed to script it fairly well after the first couple were done manually.

Everything appears to be working correctly so I will soon nip out for some high quality beer.

Xen Help?

One thing that I wanted to do with the new host was track bandwidth usage upon a per-guest basis.

This should be possible with something like vnstat - however solutions counting traffic by interface name don't mesh well with Xen, since by default a guest will have an interface with a name like 'vif20.0' - and no means of mapping that to a specific guest.

Each of my guests has been allocated three IPs which are defined like this in the Xen configuration file:

vif = [ 'ip=' ]

This works perfectly.

This also works:

vif = [ 'ip=,vifname=foo' ]

Unfortunately anything else I've tried to give each IP a static interface name fails. I've seen reports of this online but no solutions.

Given a configuration file like this the Xen guest doesn't receive any traffic upon the second + third address:

vif = [ 'ip=,vifname=foo1',
        'ip=,vifname=foo3' ]

Any suggestions welcome.
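On the mapping question: as far as I know, Xen's default backend names follow the pattern vif<domain ID>.<device number>, so 'vif20.0' is the first interface of domain 20, and xm list gives the domain ID for each guest name. A sketch of per-guest accounting built on that, in Python; it assumes Linux sysfs counters and the xm list layout shown earlier on this page, and the function names are illustrative.

```python
import re
from typing import Dict, Optional


def parse_xm_list(output: str) -> Dict[int, str]:
    """Map domain ID -> guest name from `xm list` output (header line skipped)."""
    domains: Dict[int, str] = {}
    for line in output.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            domains[int(fields[1])] = fields[0]
    return domains


def guest_for_vif(ifname: str, domains: Dict[int, str]) -> Optional[str]:
    """Resolve a default backend interface name such as 'vif2.0' to its guest."""
    m = re.fullmatch(r"vif(\d+)\.(\d+)", ifname)
    return domains.get(int(m.group(1))) if m else None


def traffic_bytes(ifname: str) -> Dict[str, int]:
    """Read the kernel's byte counters for one interface from sysfs."""
    counts: Dict[str, int] = {}
    for counter in ("rx_bytes", "tx_bytes"):
        with open(f"/sys/class/net/{ifname}/statistics/{counter}") as fh:
            counts[counter] = int(fh.read())
    return counts
```

Domain IDs change whenever a guest is restarted, so the mapping has to be refreshed before each sample. From dom0's side the counters are reversed: rx_bytes on a vif is traffic the guest transmitted.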



She said she'd teach me 'bout voodoo

Tuesday, 10 July 2007

So I've been very happy with exaile - the media player - for the past week or so.

I think I'm going to switch to it full time.

The "random play" is surprisingly random. Despite listening to music 24x7 I'm finding myself hearing new music. I can only conclude that xmms and xmms2 have poor random functionality ..

The bigger issue is the handling of plugins. How do plugins get loaded? Via an external website.

You do the pointy-clicky dance with the user-interface, and the system downloads arbitrary code from exaile.org, installs it into ~/.exaile/plugins and executes it.

Double-plus ungood.

download_url = "http://www.exaile.org/plugins/plugins.py?version=%s&plugin=%s" \
    % (self.app.get_plugin_location(), file)
xlmisc.log('Downloading %s from %s' % (file, download_url))

Let us hope they never lose control of that domain, (and never implement automatic plugin updates) otherwise all current users will hit the site, be persuaded there are newer plugins available and be compromised en masse...
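One possible mitigation, sketched below: only accept downloaded plugin code whose SHA-256 matches a digest you reviewed and pinned locally. This is not exaile's API; PINNED, the plugin name and verify_plugin are hypothetical, and the digest is computed inline only so the example runs.

```python
import hashlib
from typing import Dict

# Hypothetical allow-list: plugin file name -> SHA-256 of the reviewed source.
PINNED: Dict[str, str] = {
    "example_plugin.py": hashlib.sha256(b"print('hello')\n").hexdigest(),
}


def verify_plugin(name: str, payload: bytes, pinned: Dict[str, str] = PINNED) -> bool:
    """Accept downloaded plugin code only if its SHA-256 matches the pinned digest."""
    expected = pinned.get(name)
    return expected is not None and hashlib.sha256(payload).hexdigest() == expected
```

With a check like this, whoever controls the domain can no longer push new code silently; an update only installs after its digest has been reviewed and added to the list.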

In other news, even with my planet-searching script, I cannot find the blog entry I wanted to refer people to. It involved people looking pretty and acting miserable. Possibly on buses?



Other things just make you swear and curse

Tuesday, 26 June 2007

I find myself in need of a simple "blogging system" for a small non-dynamic site I'm putting together.

In brief I want to be able to put simple text files into "blog/", and have static HTML files built from them, with the most recent N being included in an index - and each one individually linked to.

At a push I could just read "entries/*.blog", then write a perl script to extract a date + title and code it myself - but I'm sure such a thing must already exist? I vaguely remember people using debian/changelog files as blogs a while back - that seems similar?
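For comparison, the do-it-yourself option is not much code. A sketch in Python, assuming a hypothetical entry format of "Title:" and "Date: YYYY-MM-DD" header lines, a blank line, then the body; it writes one HTML page per entry and an index linking the newest N. The function names and file layout are illustrative.

```python
import html
import os
from datetime import date
from typing import List, NamedTuple


class Entry(NamedTuple):
    slug: str
    title: str
    day: date
    body: str


def parse_entry(slug: str, text: str) -> Entry:
    """Parse 'Title:' and 'Date: YYYY-MM-DD' header lines, a blank line, then the body."""
    header, _, body = text.partition("\n\n")
    fields = dict(line.split(":", 1) for line in header.splitlines() if ":" in line)
    return Entry(slug, fields["Title"].strip(), date.fromisoformat(fields["Date"].strip()), body.strip())


def build(src: str, dest: str, recent: int = 10) -> List[Entry]:
    """Render every src/*.blog file to dest/<slug>.html plus an index of the newest entries."""
    entries = []
    for name in sorted(os.listdir(src)):
        if name.endswith(".blog"):
            with open(os.path.join(src, name), encoding="utf-8") as fh:
                entries.append(parse_entry(name[: -len(".blog")], fh.read()))
    entries.sort(key=lambda e: e.day, reverse=True)  # newest first
    os.makedirs(dest, exist_ok=True)
    for e in entries:
        with open(os.path.join(dest, e.slug + ".html"), "w", encoding="utf-8") as fh:
            fh.write(f"<h1>{html.escape(e.title)}</h1>\n<p>{e.day}</p>\n<pre>{html.escape(e.body)}</pre>\n")
    links = "\n".join(
        f'<li><a href="{e.slug}.html">{html.escape(e.title)}</a> ({e.day})</li>' for e in entries[:recent]
    )
    with open(os.path.join(dest, "index.html"), "w", encoding="utf-8") as fh:
        fh.write(f"<ul>\n{links}\n</ul>\n")
    return entries
```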

Update: NanoBlogger it is.


