About Archive Tags RSS Feed


Entries tagged tools

Tell you, might not believe it, but

25 January 2007 21:50

I would like to have a simple way of mirroring a webpage, including any referenced .css, .js, and images.

owever to complicated matters I wish to mandate that the file will be saved as "index.html" - regardless of what it was originally called.

This appears to rule wget out, as the -output=index.html option trumps the -page-requisites flag (which is used to download images, etc which are referenced.)

Is there a simple tool which will download a single webpage, save it to a user-defined local filename and also download referenced images/css files/javascript files? (Rewriting the file to make them work too)

Using Perl I could pull down the page, and I guess I could parse the HTML manually - but that seems non-trivial - but I'd imagine there is a tool out there to do the job.

So far I've looked at curl, httrack, and wget.

If I'm missing the obvious solution please point me at it ..

(Yes, this is so that I can take "snapshots" of links added to my bookmark server.)

| No comments


We've Been Out All Night And We Havn't Been Home,

21 June 2007 21:50

The source-searching system I was talking about previously is progressing slowly.

So far I've synced the source to Etch to my local machine, total size 29Gb, and this evening I've started unpacking all the source.

I'm still in the "a" section at the moment, but thanks to caching I should be able to re-sync the source archive and unpack newer revisions pretty speedily.

The big problem at the moment is that the unpacking of all the archives is incredibly slow. Still I do have one new bug to report aatv: Buffer overflow in handling environmental variables..

That was found with:

rgrep getenv /mnt/mirror/unpacked | grep sprintf

(A very very very slow pair of greps. Hopefully once the unpacking has finished it will become faster. ha!)

The only issue I see at the moment is that I might not have the disk space to store an unpacked tree. I've got 100Gb allocated, with 29Gb comprised of the source. I'll just have to hope that the source is less than 70Gb unpacked or do this in stages.)

I've been working on a list of patterns and processes to run, I think pscan, rats, and its should be the first tools to run on the archive. Then after that some directed use of grep.

If anybody else with more disk space and connectivity than myself be interested I can post the script(s) I'm using to sync and unpack .. Failing that I'll shut up now.

| No comments


don't go breaking my heart

4 December 2007 21:50

If you're interested in working upon your CV/Resume, as Otavio Salvador was recently, then I'd highly recommend the xml-resume-library.

It allows you to write your address, previous jobs, and skills as XML then generate PDF, HTML, and plain text format documents via a simple Makefile.

It won't help with clueless agencies that mandate the use of Microsoft Word Documents for submission, so they can butcher your submission and "earn" their fee(s), but otherwise it rocks.

| No comments


I'll be the only person in the nursin' home flirting

5 December 2007 21:50

After mentioning the xml-resume-library package I was reminded that the English translation has been out of date for over a year.

With permission from the maintainer I've made a new upload which fixes this, and a couple of other bugs.

On a different topic it seems that many Debian-related websites are having their designs tweaked.

I'm not redesigning mine, but I'd love other people to have a go.

Here's hoping.

| No comments


I've got a sick friend. I need her help.

30 September 2009 21:50

There was a recent post by Martin Meredith asking about dotfile management.

This inspired me to put together a simple hack which allows several operations to be carried out:

dotfile-manager update [directory]

Update the contents of the named directory to the most recent version, via "hg pull" or HTTP fetch.

This could be trivially updated to allow git/subversion/CVS to be used instead.

(directory defaults to ~/.dotfiles/ if not specified.)

dotfile-manager link [directory]

For each file in the named directory link _foo to ~/.foo.

(directory defaults to ~/.dotfiles/ if not specified.)

e.g. directory/_screenrc will be linked to from ~/.screenrc. But hostnames count too! So you can create directory/_screenrc.gold and that will be the target of ~/.screenrc on the host gold.my.flat

dotfile-manager tidy

This removes any dangling ~/.* symlinks.

dotfile-manager report

Report on any file ~/.* which isn't a symlink - those files might be added in the future.

Right now that lets me update my own dotfiles via:

dotfile-manager update ~/.dotfiles
dotfile-manager update ~/.dotfiles-private

dotfile-manager link ~/.dotfiles
dotfile-manager link ~/.dotfiles-private

It could be updated a little more, but it already supports profiles - if you assume "profile" means "input directory".

To be honest it probably needs to be perlified, rather than being hoky shell script. But otherwise I can see it being useful - much more so than my existing solution which is ~/.dotfiles/fixup.sh inside my dotfiles repository.

ObFilm: Forever Knight



Sanity testing drives

12 August 2010 21:50

Recently I came across a situation where moving a lot of data around on a machine with a 3Ware RAID card ultimately killed the machine.

To test the hardware in advance for this requires a test of both:

  • The individual drives, which make up the RAID array
  • The filesystem which is layered upon the top of it.

The former can be done with badblocks, etc. The latter requires a simple tool to create a bunch of huge files with "random" contents, then later verify they have the contents you expected.

With that in mind:

dt --files=1000  --size=100M [--no-delete|--delete]


  • Creates, in turn, 1000 files.
  • Each created file will be 100Mb long.
  • Each created file will have random contents written to it, and be closed.
  • Once closed the file will be re-opened and the MD5sum computed
    • Both in my code and by calling /usr/bin/md5sum.
    • If these sums mis-match, indicating a data-error, we abort.
  • Otherwise we delete the file and move on.

Adding "--no-delete" and "--files=100000" allows you to continue testing until your drive is full and you've tested every part of the filesystem.

Trivial toy, or possibly useful to sanity-check a filesystem? You decide. Or just:

hg clone http://dt.repository.steve.org.uk/

(dt == disk test)

ObQuote: "Stand back boy! This calls for divine intervention! " - "Brain Dead"



You are in a maze of twisty little passages, all alike

19 August 2010 21:50

Debian package building scripts consist of several inter-related tools, which are documented well with manpages, but not in the code itself.

Should you have some free time I urge you to take a peak at the source code of dpkg-buildpackage, debsign, and debuild. They work. They work well, but they're not shining examples of clear code are they?

Given that they work, and that they are complex I'm not sure I'm interested in improving things, but I do have a personal desire to run something like:

debuild -sa --output=/tmp/results/

What would that do? Instead of placing the resulting package in the parent directory, which is what we have by default, it would place them in the named directory.

However looking over the code I see that there are too many places where "../$file" is hardwired. e.g. Take a look at dpkg-buildpackage and you see:

my $chg = "../$pva.changes";
open OUT, '>', $chg or syserr(_g('write changes file'));

If you change that to the simple code below it works - but suddenly debsign fails to find the file:

my $dir = $ENV{'OUTPUT'} || "..";
my $chg = $dir . "/" . $pva . ".changes";


I guess I can come up with some hack to index files in the parent, run a build process, and move the only new files into place. But that's sleazy at best, and still doesn't solve the problem that we have tools which we rely upon which are .. unfriendly to changes.

I thought I could handle this via:

debuild \
 --post-dpkg-buildpackage-hook="move-changes-contents /tmp/output/ ../%p_%v_`dpkg-architecture -qDEB_HOST_ARCH`.changes" \
 --no-tgz-check -sa -us -uc

(Here "move-changes-contents" would read the given .changes file and move both it and the contents listed in it to the specified directory.)

Unfortunately %v corresponds to the package version - which isn't the same as the version in the .changes filename. (i.e. epochs are stripped out.)

ObQuote: I am the servant of the power behind the Nothing - The Neverending Story.