Retiring the site

Thursday, 21 September 2017

So previously I've documented the setup of the Debian-Administration website, and now I'm going to retire it I'm planning how that will work.

There are currently 12 servers powering the site:

  • web1
  • web2
  • web3
  • web4
    • These perform the obvious role, serving content over HTTPS.
  • public
    • This is a HAProxy host which routes traffic to one of the four back-ends.
  • database
    • This stores the site-content.
  • events
    • There was a simple UDP-based protocol which sent notices here, from various parts of the code.
    • e.g. "Failed login for bob from".
  • mailer
    • Sends out emails. ("You have a new reply", "You forgot your password..", etc)
  • redis
    • This stored session-data, and short-term cached content.
  • backup
    • This contains backups of each host, via Obnam.
  • beta
    • A test-install of the codebase
  • planet
    • The blog-aggregation site

I've made a bunch of commits recently to drop the event-sending, since no more dynamic actions will be possible. So events can be retired immediately. redis will go when I turn off logins, as there will be no need for sessions/cookies. beta is only used for development, so I'll kill that too. Once logins are gone, and anonymous content is disabled there will be no need to send out emails, so mailer can be shutdown.

That leaves a bunch of hosts left:

  • database
    • I'll export the database and kill this host.
    • I will install mariadb on each web-node, and each host will be configured to talk to localhost only
    • I don't need to worry about four database receiving diverging content as updates will be disabled.
  • backup
  • planet
    • This will become orphaned, so I think I'll just move the content to the web-nodes.

All in all I think we'll just have five hosts left:

  • public to do the routing
  • web1-web4 to do the serving.

I think that's sane for the moment. I'm still pondering whether to export the code to static HTML, there's a lot of appeal as the load would drop a log, but equally I have a hell of a lot of mod_rewrite redirections in place, and reworking all of them would be a pain. Suspect this is something that will be done in the future, maybe next year.

| No comments is closing down

Monday, 11 September 2017

After 13 years the Debian-Administration website will be closing down towards the end of the year.

The site will go read-only at the end of the month, and will slowly be stripped back from that point towards the end of the year - leaving only a static copy of the articles, and content.

This is largely happening due to lack of content. There were only two articles posted last year, and every time I consider writing more content I lose my enthusiasm.

There was a time when people contributed articles, but these days they tend to post such things on their own blogs, on medium, on Reddit, etc. So it seems like a good time to retire things.

An official notice has been posted on the site-proper.



Interesting times debugging puppet

Saturday, 26 August 2017

I recently upgraded a bunch of systems from Jessie to Stretch, and as a result of that one of my hosts has started showing me a lot of noise in an hourly cron-email:

Command line is not complete. Try option "help"

I've been ignoring these emails for the past while, but today I sat down to track down the source. It was obviously coming from facter, the system that puppet uses to gather information about hosts.

Running facter -debug made that apparent:

 root@smaug ~ # facter --debug
 Found no suitable resolves of 1 for ec2_metadata
 value for ec2_metadata is still nil
 value for netmask_git is still nil
 value for ipaddress6_lo is still nil
 value for macaddress_lo is still nil
 value for ipaddress_master is still nil
 value for ipaddress6_master is still nil
 Command line is not complete. Try option "help"
 value for netmask_master is still nil
 value for ipaddress_skx_mail is still nil

There we see the issue, and it is obviously relating to our master interface.

To cut a long-story short /usr/lib/ruby/vendor_ruby/facter/util/ip.rb contains some code which eventually runs this:

 ip link show $interface

That works on all other interfaces I have:

  $ ip link show git
  6: git: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000

But not on master:

  $ ip link show master
  Command line is not complete. Try option "help"

I ninja-edited the code from this:

  ethbond = regex.match(%x{/sbin/ip link show '#{interface}'})


  ethbond = regex.match(%x{/sbin/ip link show dev '#{interface}'})

And suddenly puppet-runs without any errors. I'm not 100% sure if this is a bug bug, but it is something of a surprise anyway.

This host runs KVM guests, one of the guests is a puppet-master, with a local name master. Hence the name of the interface. Similarly the interface git is associated with the KVM guest behind



A day in the life of Steve

Sunday, 13 August 2017

I used to think I was a programmer who did "sysadmin-stuff". Nowadays I interact with too many real programmers to believe that.

Or rather I can code/program/develop, but I'm not often as good as I could be. These days I'm getting more consistent with writing tests, and I like it when things are thoroughly planned and developed. But too often if I'm busy, or distracted, I think to myself "Hrm .. compiles? Probably done. Oops. Bug, you say?"

I was going to write about working with golang today. The go language is minimal and quite neat. I like the toolset:

  • go fmt
    • Making everything consistent.
  • go test

Instead I think today I'm going to write about something else. Since having a child a lot of my life is different. Routine becomes something that is essential, as is planning and scheduling.

So an average week-day goes something like this:

  • 6:00AM
    • Wake up (naturally).
  • 7:00AM
    • Wake up Oiva and play with him for 45 minutes.
  • 7:45AM
    • Prepare breakfast for my wife, and wake her up, then play with Oiva for another 15 minutes while she eats.
  • 8:00AM
    • Take tram to office.
  • 8:30AM
    • Make coffee, make a rough plan for the day.
  • 9:00AM
    • Work, until lunchtime which might be 1pm, 2pm, or even 3pm.
  • 5:00PM
    • Leave work, and take bus home.
    • Yes I go to work via tram, but come back via bus. There are reasons.
  • 5:40PM
    • Arrive home, and relax in peace for 20 minutes.
  • 6:00PM-7:00PM
    • Take Oiva for a walk, stop en route to relax in a hammock for 30 minutes reading a book.
  • 7:00-7:20PM
    • Feed Oiva his evening meal.
  • 7:30PM
    • Give Oiva his bath, then pass him over to my wife to put him to bed.
  • 7:30PM - 8:00pm
    • Relax
  • 8:00PM - 10:00PM
    • Deal with Oiva waking up, making noises, or being unsettled.
    • Try to spend quality time with my wife, watch TV, read a book, do some coding, etc.
  • 10:00PM ~ 11:30PM
    • Go to bed.

In short I'm responsible for Oiva from 6ish-8ish in the morning, then from 6PM-10PM (with a little break while he's put to bed.) There are some exceptions to this routine - for example I work from home on Monday/Friday afternoons, and Monday evenings he goes to his swimming classes. But most working-days are the same.

Weekends are a bit different. There I tend to take him 6AM-8AM, then 1PM-10PM with a few breaks for tea, and bed. At the moment we're starting to reach the peak-party time of year, which means weekends often involve negotiation(s) about which parent is having a party, and which parent is either leaving early, or not going out at all.

Today I have him all day, and it's awesome. He's just learned to say "Daddy" which makes any stress, angst or unpleasantness utterly worthwhile.

| 1 comment.


So I did a thing, then another thing.

Thursday, 3 August 2017

So I did start a project, to write a puppet-dashboard, it is functionally complete, but the next step is to allow me to raise alerts based on failing runs of puppet - in real-time.

(i.e. Now that I have a dashboard I wish to not use it. I want to be alerted to failures, without having to remember to go look for them. Something puppet-dashboard can't do ..)

In other news a while back I slipped in a casual note about having a brain scan done, here in sunny Helsinki.

One of the cool things about that experience, in addition to being told I wasn't going to drop dead that particular day, was that the radiologist told me that I could pay €25 to get a copy of my brain data in DICOM format.

I've not yet played with this very much, but I couldn't resist a brief animation:

  • See my brain.
    • Not the best quality, or the best detail, but damn. It is my brain.
    • I shall do better with more experimentation I think.
    • After I posted it my wife, a doctor, corrected me: That wasn't a gif of my brain, instead it was a gif of my skull. D'oh!

| No comments


So I'm considering a new project

Saturday, 29 July 2017

In the past there used to be a puppet-labs project called puppet-dashboard, which would let you see the state of your managed-nodes. Having even a very basic and simple "report user-interface" is pretty neat when you're pushing out a change, and you want to see it be applied across your fleet of hosts.

There are some other neat features, such as allowing you to identify failures easily, and see nodes that haven't reported-in recently.

This was spun out into a community-supported project which is largely stale:

Having a dashboard is nice, but the current state of the software is less good. It turns out that the implementation is pretty simple though:

  • Puppet runs on a node.
  • The node reports back to the puppet-master what happened.
  • The puppet-master can optionally HTTP-post that report to the reporting node.

The reporting node can thus receive real-time updates, and do what it wants with them. You can even sidestep the extra server if you wish:

  • The puppet-master can archive the reports locally.

For example on my puppet-master I have this:

  root@master /var/lib/puppet/reports # ls | tail -n4

Inside each directory is a bunch of YAML files which describe the state of the host, and the recipes that were applied. Parsing those is pretty simple, the hardest part would be making a useful/attractive GUI. But happily we have the existing one to "inspire" us.

I think I just need to write down a list of assumptions and see if they make sense. After all the existing installation(s) won't break, it's just a matter of deciding whether it is useful/worthwhile way to spend some time.

  • Assume you have 100+ hosts running puppet 4.x
  • Assume you want a broad overview:
    • All the nodes you're managing.
    • Whether their last run triggered a change, resulted in an error, or logged anything.
    • If so what changed/failed/was output?
  • For each individual run you want to see:
    • Rough overview.
  • Assume you don't want to keep history indefinitely, just the last 50 runs or so of each host.

Beyond that you might want to export data about the managed-nodes themselves. For example you might want a list of all the hosts which have "bash" installed on them. Or "All nodes with local user "steve"." I've written that stuff already, as it is very useful for auditing & etc.

The hard part about that is that to get the extra data you'll need to include a puppet module to collect it. I suspect a new dashboard would be broadly interesting/useful but unless you have that extra detail it might not be so useful. You can't point to a slightly more modern installation and say "Yes this is worth migrating to". But if you have extra meta-data you can say:

  • Give me a list of all hosts running wheezy.
  • Give me a list of all hosts running exim4 version 4.84.2-2+deb8u4.

And that facility is very useful when you have shellshock, or similar knocking at your door.

Anyway as a hacky start I wrote some code to parse reports, avoiding the magic object-fu that the YAML would usually invoke. The end result is this:

 root@master ~# dump-run
    Puppet Version: 4.8.2
    Runtime: 2.16
    Time:2017-07-29 18:13:04 +0000
            total -> 176
            skipped -> 2
            failed -> 0
            changed -> 3
            out_of_sync -> 3
            scheduled -> 0
            corrective_change -> 3
    Changed Resources
            Ssh_authorized_key[skx@shelob-s-fi] /etc/puppet/code/environments/production/modules/ssh_keys/manifests/init.pp:17
            Ssh_authorized_key[skx@deagol-s-fi] /etc/puppet/code/environments/production/modules/ssh_keys/manifests/init.pp:22
            Ssh_authorized_key[] /etc/puppet/code/environments/production/modules/ssh_keys/manifests/init.pp:27
    Skipped Resources
            Exec[clone sysadmin utils]
            Exec[update sysadmin utils]



bind9 considered harmful

Tuesday, 11 July 2017

Recently there was another bind9 security update released by the Debian Security Team. I thought that was odd, so I've scanned my mailbox:

  • 11 January 2017
    • DSA-3758 - bind9
  • 26 February 2017
    • DSA-3795-1 - bind9
  • 14 May 2017
    • DSA-3854-1 - bind9
  • 8 July 2017
    • DSA-3904-1 - bind9

So in the year to date there have been 7 months, in 3 of them nothing happened, but in 4 of them we had bind9 updates. If these trends continue we'll have another 2.5 updates before the end of the year.

I don't run a nameserver. The only reason I have bind-packages on my system is for the dig utility.

Rewriting a compatible version of dig in Perl should be trivial, thanks to the Net::DNS::Resolver module:

These are about the only commands I ever run:

dig -t a +short
dig -t aaaa +short
dig -t a @

I should do that then. Yes.



I've never been more proud

Wednesday, 5 July 2017

This morning I remembered I had a beefy virtual-server setup to run some kernel builds upon (when I was playing with Linux security moduels), and I figured before I shut it down I should use the power to run some fuzzing.

As I was writing some code in Emacs at the time I figured "why not fuzz emacs?"

After a few hours this was the result:

 deagol ~ $ perl -e 'print "`" x ( 1024 * 1024  * 12);' > t.el
 deagol ~ $ /usr/bin/emacs --batch --script ./t.el
 Segmentation fault (core dumped)

Yup, evaluating an lisp file caused a segfault, due to a stack-overflow (no security implications). I've never been more proud, even though I have contributed code to GNU Emacs in the past.

| No comments


Yet more linux security module craziness ..

Thursday, 29 June 2017

I've recently been looking at linux security modules. My first two experiments helped me learn:

My First module - whitelist_lsm.c

This looked for the presence of an xattr, and if present allowed execution of binaries.

I learned about the Kernel build-system, and how to write a simple LSM.

My second module - hashcheck_lsm.c

This looked for the presence of a "known-good" SHA1 hash xattr, and if it matched the actual hash of the file on-disk allowed execution.

I learned how to hash the contents of a file, from kernel-space.

Both allowed me to learn things, but both were a little pointless. They were not fine-grained enough to allow different things to be done by different users. (i.e. If you allowed "alice" to run "wget" you'd also allow www-data to do the same.)

So, assuming you wanted to do your security job more neatly what would you want? You'd want to allow/deny execution of commands based upon:

  • The user who was invoking them.
  • The path of the binary itself.

So your local users could run "bad" commands, but "www-data" (post-compromise) couldn't.

Obviously you don't want to have to recompile your kernel to change the rules of who can execute what. So you think to yourself "I'll write those rules down in a file". But of course reading a file from kernel-space is tricky. And parsing any list of rules, in a file, from kernel-space would prone to buffer-related problems.

So I had a crazy idea:

  • When a user attempts to execute a program.
  • Call back to user-space to see if that should be permitted.
    • Give the user-space binary the UID of the invoker, and the path to the command they're trying to execute.

Calling userspace? Every time a command is to be executed? Crazy. But it just might work.

One problem I had with this approach is that userspace might not even be available, when you're booting. So I setup a flag to enable this stuff:

# echo 1 >/proc/sys/kernel/can-exec/enabled

Now the kernel will invoke the following on every command:

/sbin/can-exec $UID $PATH

Because the kernel waits for this command to complete - as it reads the exit-code - you cannot execute any child-processes from it as you'd end up in recursive hell, but you can certainly read files, write to syslog, etc. My initial implementionation was as basic as this:

int main( int argc, char *argv[] )

  // Get the UID + Program
  int uid = atoi( argv[1] );
  char *prg = argv[2];

  // syslog
  openlog ("can-exec", LOG_CONS | LOG_PID | LOG_NDELAY, LOG_LOCAL1);
  syslog (LOG_NOTICE, "UID:%d CMD:%s", uid, prg );

  // root can do all.
  if ( uid == 0 )
    return 0;

  // nobody
  if ( uid == 65534 ) {
    if ( ( strcmp( prg , "/bin/sh" ) == 0 ) ||
         ( strcmp( prg , "/usr/bin/id" ) == 0 ) ) {
      fprintf(stderr, "Allowing 'nobody' access to shell/id\n" );
      return 0;

  fprintf(stderr, "Denied\n" );
  return -1;

Although the UIDs are hard-code it actually worked! Yay!

I updated the code to convert the UID to a username, then check executables via the file /etc/can-exec/$USERNAME.conf, and this also worked.

I don't expect anybody to actually use this code, but I do think I've reached a point where I can pretend I've written a useful (or non-pointless) LSM at last. That means I can stop.

| 1 comment.


Linux security modules, round two.

Sunday, 25 June 2017

So recently I wrote a Linux Security Module (LSM) which would deny execution of commands, unless an extended attribute existed upon the filesystem belonging to the executables.

The whitelist-LSM worked well, but it soon became apparent that it was a little pointless. Most security changes are pointless unless you define what you're defending against - your "threat model".

In my case it was written largely as a learning experience, but also because I figured it seemed like it could be useful. However it wasn't actually as useful because you soon realize that you have to whitelist too much:

  • The redis-server binary must be executable, to the redis-user, otherwise it won't run.
  • /usr/bin/git must be executable to the git user.

In short there comes a point where user alice must run executable blah. If alice can run it, then so can mallory. At which point you realize the exercise is not so useful.

Taking a step back I realized that what I wanted to to prevent was the execution of unknown/unexpected, and malicious binaries How do you identify known-good binaries? Well hashes & checksums are good. So for my second attempt I figured I'd not look for a mere "flag" on a binary, instead look for a valid hash.

Now my second LSM is invoked for every binary that is executed by a user:

  • When a binary is executed the sha1 hash is calculated of the files contents.
  • If that matches the value stored in an extended attribute the execution is permitted.
    • If the extended-attribute is missing, or the checksum doesn't match, then the execution is denied.

In practice this is the same behaviour as the previous LSM - a binary is either executable, because there is a good hash, or it is not, because it is missing or bogus. If somebody deploys a binary rootkit this will definitely stop it from executing, but of course there is a huge hole - scripting-languages:

  • If /usr/bin/perl is whitelisted then /usr/bin/perl /tmp/ will succeed.
  • If /usr/bin/python is whitelisted then the same applies.

Despite that the project was worthwhile, I can clearly describe what it is designed to achieve ("Deny the execution of unknown binaries", and "Deny binaries that have been modified"), and I learned how to hash a file from kernel-space - which was surprisingly simple.

(Yes I know about IMA and EVM - this was a simple project for learning purposes. Public-key signatures will be something I'll look at next/soon/later. :)

Perhaps the only other thing to explore is the complexity in allowing/denying actions based on the user - in a human-readable fashion, not via UIDs. So www-data can execute some programs, alice can run a different set of binaries, and git can only run /usr/bin/git.

Of course down that path lies apparmour, selinux, and madness..



Recent Posts

Recent Tags