Proxies and Robots

29 August 2010 21:50

I don't like repeating myself, but I'm very tempted to paste my mini-review of the Roomba Vacuum Cleaner robot into this blog.

Instead I will practise restraint and summarise:

  • It works. It works well.
  • It is a little noisy, but despite this it is great fun to watch.
  • It takes a long time to clean a few rooms, due to the "random walk" it performs. Despite this it is still fun to watch and actually useful.
  • Have I mentioned I grin like a child when it doesn't crash into things, and hums away past me on the floor?

£250. Worth. Every. Penny.

In more Debian-friendly news I've been fighting HTTP proxies today. I've noticed a lot of visitors to the various websites I host are logged as - which is an irritation. My personal machine looks like this:

Internet -> Apache listening on *:80 -> thttpd on

(This has been documented previously - primarily it is a security restriction. It means I can run per-UID web-servers.)

I had previously added a patch to thttpd to honour the X-Forwarded-For: header - so that it would receive the correct remote address passed on from Apache. However the fact that so many visitors are logged as coming from meant it wasn't working 100% correctly, and I wanted to understand why.

Today I used ngrep to capture the incoming headers and the source of the problem became apparent:

skx:~# ngrep  -d lo  X-For ' port 1007'
T -> [AP]
  GET /about/ HTTP/1.1..Host: images.steve.org.uk..If-Modified-Since: Mon, 07
   Jun 2010 15:24:33 GMT..User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-U
  S; rv: Gecko/20100701 Iceweasel/3.5.10 (like Firefox/3.5.10)..Acce
  pt: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept
  -Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: I
  SO-8859-1,utf-8;q=0.7,*;q=0.7..Referer: http://images.steve.org.uk/2009/11/
  max-age=0..X-Forwarded-Host: images.steve.org.uk..X-Forwarded-Server: image
  s.steve.org.uk..Connection: Keep-Alive....

I bolded the important input; just in case that didn't jump out it was:


My patch to thttpd was making it read the first address, rather than the second - which meant that requests were being logged as coming from and avoiding my efforts to track sources.

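Now I understand the problem - the X-Forwarded-For header is being tweaked by a proxy server, such as Squid, upstream of my server, so by the time Apache has added its own entry the header holds a list of addresses rather than a single one.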

For the moment I've updated the thttpd patch to read:

        else if ( strncasecmp( buf, "X-Forwarded-For:", 16 ) == 0 )
          { char *tmp = NULL;

            /* Jump to the header-value  */
            cp = &buf[16];
            cp += strspn( cp, " \t" );

            /*
             * If the value holds more than one address ("a, b") then
             * we'll jump over the first.  Cope with Squid, et al.
             */
            if (  ( tmp = strstr( cp, ", " ) ) != NULL )
              cp = tmp + strlen( ", " );

            /* Parse the IP */
            inet_aton( cp, &(hc->client_addr.sa_in.sin_addr) );
          }

That's not perfect, but the alternative would be:

  • Install a patched version of libapache2-mod-rpaf to add an X-HONEST-REMOTE-IP header.
  • Update thttpd to use that header.

Or something equally hacky and security-by-obscurity-alike.

Really I just want a simple way of always getting the correct remote IP. Shouldn't be so hard, should it? *pout*.
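
For what it's worth, here is a rough standalone sketch of a slightly more defensive take on the same parsing. It is not the actual thttpd patch. It assumes the proxy nearest to thttpd (Apache in my setup) appends the address it saw to the end of the list, so it takes the last entry rather than the second, and it rejects anything that isn't a plain IPv4 address (Squid can send "unknown", for instance). The function name last_forwarded_addr is made up for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Parse the last comma-separated entry of an X-Forwarded-For value.
 * Returns 1 and fills *out on success, 0 if that entry is missing
 * or is not a valid dotted-quad IPv4 address.
 */
int last_forwarded_addr( const char *value, struct in_addr *out )
{
    char buf[INET_ADDRSTRLEN];
    const char *start = strrchr( value, ',' );
    size_t len;

    /* Start after the final comma, or at the beginning if there is none. */
    start = ( start != NULL ) ? start + 1 : value;
    start += strspn( start, " \t" );

    /* Stop at trailing whitespace or the end of the header line. */
    len = strcspn( start, " \t\r\n" );
    if ( len == 0 || len >= sizeof( buf ) )
        return 0;

    memcpy( buf, start, len );
    buf[len] = '\0';

    /* inet_pton() is strict: dotted-quad only, no trailing junk. */
    return inet_pton( AF_INET, buf, out ) == 1;
}
```

Called with a value such as "192.0.2.1, 198.51.100.7" (addresses from the documentation ranges, not real visitors) it fills in 198.51.100.7. When it returns 0 the caller can simply keep the socket's own peer address. None of this makes the header trustworthy; it only picks the entry written by the proxy I control.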

ObQuote: "You don't mess with fate, Peanut. People die when they are meant to die. There's no discussion. There's no negotiation. When life's done, it's done." - Dead Like Me.



Comments on this entry

Adam at 18:52 on 29 August 2010

Nice ngrep option: -Wbyline


Alex at 22:04 on 29 August 2010

> I don't like repeating myself, but I'm very tempted to past
> my mini-review of the Roomba Vacuum Cleaner robot into this blog.
If you already did a review, could you post a link?



Steve Kemp at 22:12 on 29 August 2010

Adam: Thanks for the tip.

Alex: The review isn't public, it's on a local site for local people. So sadly I'd have to copy & paste in an unpleasant fashion.

Gernot Hassenpflug at 07:05 on 3 September 2010

I had a similar issue and discovered that X-Forwarded-For can be a list of as many values as proxy servers were in use for that connection, and there is no way to tell if the values are reliable, but the last value of the list is supposed to be the originating client IP address if the values are true.