About Archive Tags RSS Feed


Entries tagged rsync

The plans you refer to will soon be back in our hands.

22 August 2009 21:50

Many of us use rsync to shuffle data around, either to maintain off-site backups, or to perform random tasks (e.g. uploading a static copy of your generated blog).

I use rsync in many ways myself, but the main thing I use it for is to copy backups across a number of hosts. (Either actual backups, or stores of Maildirs, or similar.)

Imagine you backup your MySQL database to a local system, and you keep five days of history in case of accidental error and deletion. Chances are that you'll have something like this:


(Here I guess it is obvious that you backup to /mysql/0, after rotating the contents of 0->1, 1->2, 2->3, & 3->4)

Now consider what happens when that rotation happens and you rsync to an off-site location afterward: You're copying way more data around than you need to because each directory will have different content every day.

To solve this I moved to storing my backups in directories such as this:


This probably simplifies the backup process a little too: just backup to $(date +%d-%m-%Y) after removing any directory older than four days.

Imagine you rsync now? The contents of previous days won't change at all, so you'll end up moving significantly less data around.

This is a deliberately contrived and simple example, but it also applies to common everyday logfiles such as /var/log/syslog, syslog.1, syslog.2.gz etc.

For example on my systems qpsmtpd.log is huge, and my apache access.log files are also very large.

Perhaps food for thought? One of those things that is obvious when you think about it, but doesn't jump out at you unless you schedule rsync to run very frequently and notice that it doesn't work as well as it "should".

ObFilm: Star Wars. The Family Guy version ;)



The pain of a new IP address

13 October 2010 21:50

Tonight I was having some connectivity issues, so after much diagnostic time and pain I decided to reboot my router. At the moment my home router came back my (external) IP address changed, and suddenly I found I could no longer login to my main site.

Happily however I have serial console access, and I updated things such that my new IP address was included in the hosts.allow file. [*]

The next step was to push that change round my other boxes, and happily I have my own tool slaughter which allows me to make such global changes in a client-pulled fashion. 60 minutes later cron did its magic and I was back.

This reminds me that I let the slaughter tool stagnate. Mostly because I only use it to cover my three remote boxes and my desktop, and although I received one bug report (+fix!) I never heard of anybody else using it.

I continue to use and like CFEngine at work. Puppet & Chef have been well argued against elsewhere, and I'm still to investigate BFG2 + FAI.

Mostly I'm happy with slaughter. My policies are simple, readable, and intuitive. Learn perl? Learn the "CopyFile" and you're done. For example.

By contrast the notion of state machines, functional operations, and similar seem over-engineered in other tools. Perhaps thats my bug, perhaps that's just the way things are - but the rants linked to above makes sense to me and I find myself agreeing 100%.

Anyway; slaughter? What I want to do is rework it such that all policies are served via rsync and not via HTTP. Other changes, such as the addition of new primitives, don't actually seem necessary. But serving content via rsync just seems like the right way to go. (The main benefit is recursive copies of files become trivial.)

I'd also add the ability to mandate GPG-signatures on policies, but that's possible even now. The only step backwards I see is that currently I can serve content over SSL, but that should be fixable even if via stunnel.


My /etc/hosts.allow file contains this:

ALL: /etc/hosts.allow.trusted
ALL: /etc/hosts.allow.trusted.apache

Then hosts.allow.trusted contains:

# www.steve.org.uk

# www.debian-administration.org

# my home.

I've never seen anybody describe something similar, though to be fair it is documented. To me it just seems clean to limit the IPs in a single place.

To conclude hosts.allow.trusted.apache is owned by root.www-data, and can be updated via a simple CGI script - which allows me to add a single IP address on the fly for the next 60 minutes. Neat.

ObQuote: Tony is a little boy that lives in my mouth. - The Shining

| 1 comment


So about that off-site encrypted backup idea ..

28 September 2012 21:50

I'm just back from having spent a week in Helsinki. Despite some minor irritations (the light-switches were always too damn low) it was a lovely trip.

There is a lot to be said for a place, and a culture, where shrugging and grunting counts as communication.

Now I'm back, catching up on things, and mostly plotting and planning how to handle my backups going forward.

Filesystem backups I generally take using backup2l, creating local incremental backup archives then shipping them offsite using rsync. For my personal stuff I have a bunch of space on a number of hosts and I just use rsync to literally copy my ~/Images, ~/Videos, etc..

In the near future I'm going to have access to a backup server which will run rsync, and pretty much nothing else. I want to decide how to archive my content to that - securely.

The biggest issue is that my images (.CR2 + .JPG) will want to be encrypted remotely, but not locally. So I guess if I re-encrypt transient copies and rsync them I'll end up having to send "full" changes each time I rsync. Clearly that will waste bandwidth.

So my alternatives are to use incrementals, as I do elsewhere, then GPG-encrypt the tar files that are produced - simple to do with backup2l - and us rsync. That seems like the best plan, but requires that I have more space available locally since :

  • I need the local .tar files.
  • I then need to .tar.gz.asc/.tar.gz.gpg files too.

I guess I will ponder. It isn't horrific to require local duplication, but it strikes me as something I'd rather avoid - especially given that we're talking about rsync from a home-broadband which will take weeks at best for the initial copy.



Updating my backups

6 November 2013 21:50

So I recently mentioned that I stopped myself from registering all the domains, but I did aquire rsync.io.

Today I'll be moving my last few backups over to using it.

In terms of backups I'm pretty well covered; I have dumps of databases and I have filesystem dumps too. The only niggles are the "random" directories I want to backup from my home desktop.

My home desktop is too large to backup completely, but I do want to archive some of it:

  • ~/Images/ - My archive of photos.
  • ~/Books/ - My calibre library.
  • ~/.bigv/ - My BigV details.
  • etc.

In short I have a random assortment of directories I want to backup, with pretty much no rhyme or reason to it.

I've scripted the backup right now, using SSH keys + rsync. But it feels like a mess.

PS. Proving my inability to concentrate I couldn't even keep up with a daily blog for more than two days. Oops.



Secure your rsync shares, please.

13 February 2014 21:50

Recently I started doing a internet-wide scan for rsync servers, thinking it might be fun to write a toy search-engine/indexer.

Even the basics such as searching against the names of exported shares would be interesting, I thought.

Today I abandoned that after exploring some of the results, (created with zmap), because there's just too much private data out there, wide open

IP redacted for obvious reason:

shelob ~ $ rsync  rsync://xx.xx.xx.xx/
ginevra        	Ginevra backup
krsna          	Alberto Laptop Backup
franziska      	Franz Laptop Backup
genoveffa      	Franz Laptop Backup 2

Some nice shares there. Lets see if they're as open as they appear to be:

shelob ~ $ rsync  rsync://xx.xx.xx.xx/ginevra/home/
drwxrwsr-x        4096 2013/10/30 13:42:29 .
drwxr-sr-x        4096 2009/02/03 10:32:27 abl
drwxr-s---       12288 2014/02/12 20:05:22 alberto
drwxr-xr-x        4096 2011/12/13 17:12:46 alessandra
drwxr-sr-x       20480 2014/02/12 22:55:01 backup
drwxr-xr-x        4096 2008/10/03 14:51:29 bertacci

Yup. Backups of /home, /etc/, and more.

I found numerous examples of this, along with a significant number of hosts that exported "www" + "sql", as a pair, and a large number of hosts that just exported "squid/". I assume they must be some cpanel-like system, because I can't understand why thousands of people would export the same shares with the same comments otherwise.

I still would like to run the indexer, but with so much easy content to steal, well I think the liability would kill me.

I considered not posting this, but I suspect "bad people" already know..,