About Archive Tags RSS Feed


Entries tagged robots

Proxies and Robots

29 August 2010 21:50

I don't like repeating myself, but I'm very tempted to past my mini-review of the Roomba Vacuum Cleaner robot into this blog.

Instead I will practise restraint and summerise:

  • It works. It works well.
  • It is a little noisy, but despite this it is great fun to watch.
  • It takes a long time to clean a few rooms, due to the "random walk" it performs. Despite this it is still fun to watch and actually useful.
  • Have I mentioned I grin like a child when it doesn't crash into things, and hums away past me on the floor?

£250. Worth. Every. Penny.

In more Debian-friendly news I've been fighting HTTP proxies today. I've noticed a lot of visitors to the various websites I host are logged as - which is an irritation. My personal machine looks like this:

Internet -> Apache listening on *:80 -> thttpd on

(This has been documented previously - primarily it is a security restriction. It means I can run per-UID web-servers.)

I had previous added a patch to thttpd to honour the X-Forwarded-For: header - so that it would receive the correct remote address passed on from Apache. However the fact that so many visitors are logged as coming from meant it wasn't working 100% correctly, and I wanted to understand why.

Today I used ngrep to capture the incoming headers and the source of the problem became apparent:

skx:~# ngrep  -d lo  X-For ' port 1007'
T -> [AP]
  GET /about/ HTTP/1.1..Host: images.steve.org.uk..If-Modified-Since: Mon, 07
   Jun 2010 15:24:33 GMT..User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-U
  S; rv: Gecko/20100701 Iceweasel/3.5.10 (like Firefox/3.5.10)..Acce
  pt: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept
  -Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: I
  SO-8859-1,utf-8;q=0.7,*;q=0.7..Referer: http://images.steve.org.uk/2009/11/
  max-age=0..X-Forwarded-Host: images.steve.org.uk..X-Forwarded-Server: image
  s.steve.org.uk..Connection: Keep-Alive....

I bolded the important input; just in case that didn't jump out it was:


My patch to thttpd was making it read the first address, rather than the second - which meant that requests were being logged as coming from and avoiding my efforts to track sources.

Now I understand the problem - The X-Forwarded-Host header is being tweaked by a proxy server, such as Squid, upstream of my server.

For the moment I've updated the thttpd patch to read:

        else if ( strncasecmp( buf, "X-Forwarded-For:", 16 ) == 0 )
          { char *tmp = NULL;

            /* Jump to the header-value  */
            cp = &buf[16];
            cp += strspn( cp, " \t" );

             * If the first change is a, then we'll
             * jump over it.  Cope with Squid, et al.
            if (  ( tmp = strstr( cp, ", " ) ) != NULL )
              cp = tmp + strlen( ", " );

            /* Parse the IP */
           inet_aton( cp, &(hc->client_addr.sa_in.sin_addr) );

That's not perfect, but the alternative would be:

  • Install a patched version of libapache2-mod-rpaf to add a X-HONEST-REMOTE-IP
  • Update thttpd to use that header.

Or something equally hacky and security-by-obscurity-alike.

Really I just want a simple way of always getting the correct remote IP. Shouldn't be so hard, should it? *pout*.

ObQuote: "You don't mess with fate, Peanut. People die when they are meant to die. There's no discussion. There's no negotiation. When life's done, it's done." - Dead Like Me.