Sometimes it is surprising how stable systems are

Monday, 5 May 2014

Yesterday I received an automated alert from my kvm-hosting host-machine, informing me that one of the drives in the RAID-pair had failed.

This particular machine has been up and running since 2009, and according to my outage log this is the first downtime in three years. (The uptime was over 1000 days, which seems to confirm that pretty nicely.)

I like reliable systems, and sometimes it's worth remembering just how well they can work.

In other news I'm currently continuing to chase a new job. The companies I've approached, or which have approached me, are being a little slow in replying, which is a shame, but I'm not hugely concerned... yet.

I'm going to give things another week, or so, and then add a banner to the Debian-Administration website, and see if that results in anything interesting.

In the meantime I've got some wood, and a new mitre saw, and I will be spending the remainder of today working on my new desk. Doing physical things is always fun, and right now especially.

9 comments.

 

Comments On This Entry

[gravitar] someone

Submitted at 09:30:33 on 5 may 2014

Surely you rebooted it after one of the Linux kernel updates?

[author] Steve Kemp

Submitted at 09:34:44 on 5 may 2014

For that particular system I've been happy to leave as-is, running squeeze, as it will be retired at the end of the year.

I keep on top of kernel updates for general-access hosts, where shell access could be used for malicious actions. For this system there are only a couple of trusted users with access to a minimal "shell", so local attacks aren't likely. (Largely on the grounds of limited access, but also on the basis of the users being trusted.)

(I log commands, via snoopy, on the off-chance, but really the only concerns I have for this machine are stability and potential remote attacks, and no remotely exploitable issues have come out recently.)

[gravitar] Andreas Schamanek

Submitted at 11:56:41 on 5 may 2014

Please do not forget that even "trusted users" sometimes lose equipment or have passwords stolen, etc. There's no trusted user. Or, at least, that's what I get reminded of every 3-4 years. Last time was May 1 :/

[author] Steve Kemp

Submitted at 12:00:01 on 5 may 2014

I guess that is always a risk. Although I do mandate key-based authentication, it is true that a stolen device could result in a compromise.

Though, as I said, the remote access only allows a small number of things to be done, via a custom login-shell. There's no possibility of arbitrary command execution, barring bugs in my code, and I'm reasonably confident in that code.
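To illustrate the general idea of such a restricted login shell, here is a minimal sketch in Python. It is not the code running on this host, which isn't shown here; the command names (`status`, `help`) and the function names `dispatch` and `repl` are hypothetical. The point is only that input is matched against a fixed allow-list and never passed to a system shell.

```python
import shlex

# Hypothetical handlers: these command names are illustrative assumptions.
def cmd_status(args):
    return "status: ok"

def cmd_help(args):
    return "commands: " + ", ".join(sorted(COMMANDS))

# The allow-list: only names present here can ever be run.
COMMANDS = {"status": cmd_status, "help": cmd_help}

def dispatch(line):
    """Return the response for one input line.

    The line is tokenised and the first word is looked up in the
    allow-list. Nothing is handed to /bin/sh, so shell metacharacters
    such as ';' or '|' have no special meaning.
    """
    try:
        words = shlex.split(line)
    except ValueError:
        # For example, an unterminated quote.
        return "error: could not parse input"
    if not words:
        return ""
    handler = COMMANDS.get(words[0])
    if handler is None:
        return "error: unknown command"
    return handler(words[1:])

def repl():
    """Read commands until EOF or 'exit'; intended as the login shell's main loop."""
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if line.strip() in ("exit", "quit"):
            break
        print(dispatch(line))
```

With this sketch, an attempt to chain commands such as `status; rm -rf /` is rejected as an unknown command, because the first token `status;` is not in the allow-list. Any real bugs would have to be in the handlers themselves, which is why keeping them few and small matters.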

[gravitar] Mr Fibbles

Submitted at 13:53:27 on 5 may 2014

Have you tried Skyscanner?
I know a guy working there and they do seem to be a very good company to work for.

[author] Steve Kemp

Submitted at 14:18:52 on 5 may 2014

I've heard of them a few times over the years, but they never seem to want a sysadmin, just random developers.

[gravitar] Charles Darke

Submitted at 16:44:55 on 5 may 2014

It wasn't possible to 'hot-replace' the drive?

[author] Steve Kemp

Submitted at 17:11:02 on 5 may 2014

No, my low-end boxes all use internal drives, not hot-swappable ones.

It's an annoyance, but at the same time they're not valuable enough that being offline for the occasional failure needs to be protected against at increased cost.

[gravitar] Reader

Submitted at 00:11:11 on 6 may 2014

Have you tried Percona? They have an opening for a DevOps Engineer.

 

Comments are closed on posts which are more than ten days old.
