Entries tagged thttpd

Related tags: apache, apache2, docker, graphite, libapache2-mod-rpaf, lighttpd, lumail, migration, nginx, node-reverse-proxy, nodejs, random, robots, roomba.

Changing my stack ..

Saturday, 22 February 2014

For the past few years I've hosted all my websites in a "special" way:

  • Each website runs under its own UID.
  • Each website runs a local thttpd / webserver.
  • Each server binds to localhost, on a high-port.
    • My recipe is that the port of the webserver for user "foo" is "$(id -u foo)".
  • On the front-end I have a proxy to route connections to the appropriate back-end, based on the Host header.

The webserver I chose initially was thttpd, which gained points because it was small, auditable, and simple to launch. Something like this was my recipe:

#!/bin/sh
exec thttpd -D -C /srv/steve.org.uk/thttpd.conf

Unfortunately thttpd suffers from a few omissions, most notably it doesn't support either "Keep-Alive", or "Compression" (i.e. gzip/deflate), so it would always be slower than I wanted.

On the plus side it was simple to use, supported CGI scripts, and served me well once I'd patched it to support X-Forwarded-For for IPv6 connections.

Recently I setup a server optimization site and was a little disappointed that the site itself scored poorly on Google's page-speed test. So I removed thttpd for that site, and replacing it with nginx. The end result was that the site scored 98/100 on Google's page-speed test. Progress. Unfortunately I couldn't do that globally because nginx doesn't support old-school plain CGI scripts.

So last night I removed both nginx and thttpd, and now every site on my box is hosted using lighttpd.

There weren't too many differences in the setup, though I had to add some rules to add caching for *.css, etc, and some of my code needed updating.

Beyond that today I've setup a dedicated docker host - which allows me to easily spin up containers. Currently I've got graphite monitoring for my random hosts, and a wordpress guest for plugin development/testing.

Now to go back to reading Off to be the wizard .. - not as good as Rick Cook's wizardry series (which got less good as time went on, but started off strongly), but still entertaining.

| 2 comments.

 

Migrations and movements

Saturday, 8 June 2013

Recently I wanted to cleanup my "main" remote machine. It is a system I've had for many years, which started off as a i386 KVM-guest and was later migrated in place to an AMD64 installation.

These days the host runs:

  • Mail for many domains, via QPSMTPD and ms-lite.
  • Website hosting for many domains. Each site running under a dedicated per-UID thttpd instance, behind a node reverse proxy.
  • IRC for kirsi.

The plan was originally to move the "mail stuff" to a new (wheezy) guest. I aborted that after discovering that the mutt-patched package has (IMHO) regressed.

So today I spun up a new virtual machine, and configured it to host websites.

Thus far I've migrated steve.org.uk, and lumail.org to the new host. Both simple sites that are built via my static-site-generator. Migration mostly involved configuring the proxy and the thttpd instances - then using rsync to migrate the content.

I've renamed my current host, which was previously www.steve.org.uk and is now ssh.steve.org.uk, and pushed DNS changes.

If all is smooth and happy I'll slowly migrate the rest of the sites. Fingers crossed this will be painless and I'll have a clean split between "login + mail" and "websites".

| 3 comments.

 

IPv6 and thttpd

Friday, 8 April 2011

thttpd is a simple, small, portable, fast, and secure HTTP server which supports both IPv4 & IPv6.

However one noticable omission in the handling of requests for thttpd is support for the X-Forwarded-For header - which is even noted upo nthe thttpd wikipedia entry.

There is a simple patch floating around which claims to fix this; but as I belatedly noticed tonight it only works for IPv4.

If you look at libhttpd.h of the thttpd source you'll see this:

typedef union {
    struct sockaddr sa;
    struct sockaddr_in sa_in;
#ifdef USE_IPV6
    struct sockaddr_in6 sa_in6;
    struct sockaddr_storage sa_stor;
#endif /* USE_IPV6 */
    } httpd_sockaddr;

As a quick hack I updated this structure to add the following member:

    char real_ip[200];

Now I could update that member when a client connects and later update it as a result of any X-Forwarded-For: headers which might be present in incoming requests. Finally I updated the logging to use this field rather than anything else and the job was complete.

Without this work if you're running as a proxy and you receive an IPv6 connection you'll see it reported as 127.0.0.1.

I'm sure my approach isn't as clean as it could be - due to the extra member- but it will suffice for now.

ObQuote: "This gun you're holding belonged to your father; he could conduct a symphony orchestra with it. " - Wanted

| 3 comments.

 

New software always causes surprises

Monday, 21 March 2011

I recently deployed my node.js proxy server, removing all traces of Apache2 from my main server. During the course of this transition I discovered:

Bugs in my code

Not unexpected, in all honesty.

There were two main issues; the first was relating to how I handled the 304 response, the second was relating to how I performed rewrites for my mercurial repository vhost.

Bugs in node.js

Given how new node.js is there wasn't a huge surprise here either, although I thought I'd been good testing against 0.2.x. As it turned out I needed to run the more recent 0.4.x to avoid a couple of issues:

In short I have a backported node.js package for Squeeze which is almost worthless. I'll update it in the near future. For the moment:

cd node-v0.4.3/
./configure --without-ssl --prefix=/opt/node-0.4.3 && make && make install
ln -fs /opt/node-0.4.3  /opt/node
Oddities in thttpd
thttpd is what actually runs my websites and I discovered during some extended debugging sessions that it just does not like HTTP requests starting with a doubled "/" character.

For example this works fine:

wget http://www.acme.com/software/thttpd/

But this fails:

wget http://www.acme.com//software/thttpd/

Previously it seems that Apache was (silently) fixing this up before it proxied requests. Now I have to do it myself, no big thing, but still a surprise.

All in all it was worth it to be able to run:

dpkg --purge libapache2-mod-rpaf \
             apache2.2-common \
             apache2.2-bin \
             apache2-utils \
             apache2-mpm-prefork \
             apache2 \
             libaprutil1-dbd-sqlite3 \
             libaprutil1-ldap \
             libaprutil1 \
             libapr1

ObQuote: "Bad news. The fog's getting thicker." - Airplane!

| 4 comments.

 

Proxies and Robots

Sunday, 29 August 2010

I don't like repeating myself, but I'm very tempted to past my mini-review of the Roomba Vacuum Cleaner robot into this blog.

Instead I will practise restraint and summerise:

  • It works. It works well.
  • It is a little noisy, but despite this it is great fun to watch.
  • It takes a long time to clean a few rooms, due to the "random walk" it performs. Despite this it is still fun to watch and actually useful.
  • Have I mentioned I grin like a child when it doesn't crash into things, and hums away past me on the floor?

£250. Worth. Every. Penny.

In more Debian-friendly news I've been fighting HTTP proxies today. I've noticed a lot of visitors to the various websites I host are logged as 127.0.0.1 - which is an irritation. My personal machine looks like this:

Internet -> Apache listening on *:80 -> thttpd on 127.0.0.1:xxxx

(This has been documented previously - primarily it is a security restriction. It means I can run per-UID web-servers.)

I had previous added a patch to thttpd to honour the X-Forwarded-For: header - so that it would receive the correct remote address passed on from Apache. However the fact that so many visitors are logged as coming from 127.0.0.1 meant it wasn't working 100% correctly, and I wanted to understand why.

Today I used ngrep to capture the incoming headers and the source of the problem became apparent:

skx:~# ngrep  -d lo  X-For ' port 1007'
..
T 127.0.0.1:41886 -> 127.0.0.1:1007 [AP]
  GET /about/ HTTP/1.1..Host: images.steve.org.uk..If-Modified-Since: Mon, 07
   Jun 2010 15:24:33 GMT..User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-U
  S; rv:1.9.1.10) Gecko/20100701 Iceweasel/3.5.10 (like Firefox/3.5.10)..Acce
  pt: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept
  -Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: I
  SO-8859-1,utf-8;q=0.7,*;q=0.7..Referer: http://images.steve.org.uk/2009/11/
  20/img_0471.html..X-Forwarded-For: 127.0.0.1, 11.22.33.123..Cache-Control:
  max-age=0..X-Forwarded-Host: images.steve.org.uk..X-Forwarded-Server: image
  s.steve.org.uk..Connection: Keep-Alive....

I bolded the important input; just in case that didn't jump out it was:

X-Forwarded-For: 127.0.0.1, 11.22.33.123

My patch to thttpd was making it read the first address, rather than the second - which meant that requests were being logged as coming from 127.0.0.1 and avoiding my efforts to track sources.

Now I understand the problem - The X-Forwarded-Host header is being tweaked by a proxy server, such as Squid, upstream of my server.

For the moment I've updated the thttpd patch to read:

        else if ( strncasecmp( buf, "X-Forwarded-For:", 16 ) == 0 )
          { char *tmp = NULL;

            /* Jump to the header-value  */
            cp = &buf[16];
            cp += strspn( cp, " \t" );

            /*
             * If the first change is a 127.0.0.1, then we'll
             * jump over it.  Cope with Squid, et al.
             */
            if (  ( tmp = strstr( cp, "127.0.0.1, " ) ) != NULL )
              cp = tmp + strlen( "127.0.0.1, " );

            /* Parse the IP */
           inet_aton( cp, &(hc->client_addr.sa_in.sin_addr) );
        }

That's not perfect, but the alternative would be:

  • Install a patched version of libapache2-mod-rpaf to add a X-HONEST-REMOTE-IP
  • Update thttpd to use that header.

Or something equally hacky and security-by-obscurity-alike.

Really I just want a simple way of always getting the correct remote IP. Shouldn't be so hard, should it? *pout*.

ObQuote: "You don't mess with fate, Peanut. People die when they are meant to die. There's no discussion. There's no negotiation. When life's done, it's done." - Dead Like Me.

| 4 comments.

 

This is my land. All that pass through pay me tribute.

Monday, 1 March 2010

As previously mentioned I've switched my webserving over to a mixture of apache2 & thttpd.

I chose thttpd as it is simple to configure for my needs, and supports the execution of CGI scripts. Some of the other simple webservers available to Debian's current stable release (such as nginx) don't support CGI so they were ruled out.

Of course prior to choosing thttpd I looked at the state of the Debian package. Distressingly the package has no current maintainer and has several bugs open, including some that have been open for several years without comment.

I've just made my second upload fixing a couple of bugs, including ones that I could see affecting myself, but now I'm done with it.

In conclusion:

  • I've fixed a few bugs.
  • I suspect that many of the open bugs are 100% unreproducable and should be closed after checking with the submitter.
  • The package could do with a volunteer to maintain it.

On the one hand it is "just another webserver", on the other hand it is genuinely small, simple to configure, and has a couple of compelling features (CGI + throttling).

So. Go. Adopt. Maintain.

Pretty please...

ObFilm: Red Sonia

| 4 comments.

 

If you were a comic book character, what character would you be?

Friday, 19 February 2010

I've been overhauling the way that I am host a number of virtual websites upon my main box. Partly to increase security, and partly for a cleaner separation or roles, ownership, and control. (In general everything on my box is "mine", but some things are "ours"...)

After a fair amount of experimentation I decided that I wasn't willing or able to rewrite all my Apache mod_rewrite rules just yet. So my interim plan was to update each existing virtual host:

  • Add a dedicated user & group to run it under.
  • Launch it via a minimal server listening upon the loopback adapter.
  • Have Apache 2.x proxy through to it.
    • Expanding any mod_rewrite rules prior to the proxying.

To make it clear what the users were for I decided that every hosting-user would have an "s-" prefix. So the virtual host "static.steve.org.uk" was initially going to be served by the s-static user.

The thttpd configuration file would look like this, and would be located in /etc/thttpd/sites/static.steve.org.uk:

host=127.0.0.1
port=1008
dir=/home/www/static.steve.org.uk/htdocs/
chroot
user=s-static
throttles=/etc/thttpd/throttle.conf
logfile=/home/www/static.steve.org.uk/logs/thttpd.log
pidfile=/home/www/static.steve.org.uk/pid/file

(I wrote a trivial script to stop/start all the sites en mass, and removed the default thttpd init script, logrotation job, and similar things.)

How did I decide which port to run this instance under? By taking the UID of the user:

steve@skx:~$ id s-static
uid=1008(s-static) gid=1009(s-static) groups=1009(s-static)

With this in place I could then update the Apache configuration file from serving the site directly to merely proxying to the back-end server:

<VirtualHost *>
    ServerName  static.steve.org.uk

    # Proxy ACL
    <Proxy *>
        Order allow,deny
        Allow from all
    </Proxy>

    # Proxy directives
    ProxyPass          /   http://localhost:1008/
    ProxyPassReverse   /   http://localhost:1008/
    ProxyPreserveHost on
</VirtualHost>

So was that all there is to it? Sadly not. There were a couple of minor issues, some of which were:

cronjobs

I have various cron-jobs in my main steve account which previously updated blog indexes, etc. (I use namazu2 to make my blog searchable.)

I had to change the ownership of the existing indexes, the scripts themselves, and move the cronjob to the new s-blog user.

cross-user dependencies

I run a couple of sites which pull in content from other locations. For example a couple of list summaries, and archives. These are generally fed from a ~/.procmail snippet under my primary login.

Since my primary login no longer owns the web-tree it is no longer able to update things directly. Instead I had to duplicate a couple of subscriptions and move this work under the UID of the site-owner.

I'm no longer running apache

For a day or two I'd forgotten I was using the apache facility to include snippets in my site; such as links to my wishlist.

Since I'm not using Apache in the back-end server-parsed files no longer work. Happily I'm using a simple template-based setup for my main sites, so I updated the template-parser to understand "##include ./path/to/file". For example this source file produces my donation page.

The upshot is my "static" site is even more static, which is a good thing.

uploads are harder

Several of my domains host entirely static content which is generated on my main desktop machine, and then uploaded via rsync post-build.

I had to add some more accounts and configure SSH keys, then update the uploading routines/Makefiles appropriately. Not a major annoyance, but suddenly my sshd_config file has gone from "PermitUser steve,backup" to including many additional accounts.

The single biggest pain was handling my my mercurial repositories - overhauling that took a bit of creativity to ensure that nothing was broken for existing or new checkouts. I wish that a backport of mercurial-server was trivial because I'd love to be using that.

In general though watching the thttpd logs has been sufficient to spot problems. I had to tweak things a little to generate statistics properly, but otherwise all is good.

Why thttpd? Well small, lightweight, and the ability to run CGI scripts. Something missing from nginx for example.

I'm still aiming to remove apache2 from the front-end - it is mostly just a dumb proxy, but it does perform some ACL operations and expand mod_rewrite rules. I could port those to another engine .. but not today.

The most likely candidates are nginx, perlbal, or lighttpd - each of these should be capable of doing simple ACL checks, and performing mod_rewrite-like rules.

ObFilm: Mallrats

| 5 comments.

 

Recent Posts

Recent Tags