Entries tagged http

Related tags: c, discovery, doh, experiments, json, load-balancing, mod_rewrite, proxy, reinventing the wheel.

Discovering back-end servers automatically?

Tuesday, 11 March 2014

Recently I've been pondering how to do service discovery.

Pretend you have a load balancer which accepts traffic and routes incoming requests to different back-ends. The loadbalancer might be pound, varnish, haproxy, nginx, or similar. The back-ends might be node applications, apache, or similar.

The typical configuration of the load-balancer will read:

# forward

# backends
backend web1  { .host = ""; }
backend web2  { .host = ""; }
backend web3  { .host = ""; }

#  afterword

I've seen this same setup in many situations, and while it can easily be imagined that there might be "random HTTP servers" on your (V)LAN which shouldn't receive connections it seems like a pain to keep updating the backends.

Using UDP/multicast broadcasts it is trivial to announce "Hey I'm a HTTP-server with the name 'foo'", and it seems to me that this should allow seamless HTTP load-balancing.

To be more explicit - this is normal:

  • The load-balancer listens for HTTP requests, and forwards them to back-ends.
  • When back-ends go away they stop receiving traffic.

What I'd like to propose is another step:

  • When a new back-end advertises itself with the tag "foo" it should be automatically added and start to receive traffic.

i.e. This allows backends to be removed from service when they go offline but also to be added when they come online. Without the load-balancer needing its configuration to be updated.

This means you'd not give a static list of back-ends to your load-balancer, instead you'd say "Route traffic to any service that adfvertises itself with the tag 'foo'.".

VLANS, firewalls, multicast, udp, all come into play, but in theory this strikes me as being useful, obvious, and simple.

(Failure cases? Well if the "announcer" dies then the backend won't get traffic routed to it. Just like if the backend were offline. And clearly if a backend is announced, but not receiving HTTP-requests it would be dropped as normal.)

If I get the time this evening I'll sit down and look at some load-balancer source code to see if any are written in such a way that I could add this "broadcast discovery" as a plugin/minor change.



Minor success has been achieved in reimplimenting mod_rewrite

Tuesday, 28 December 2010

So yesterday I mentioned the mod_rewrite-compatible proxy server. Today I've spent several hours getting to grips with that.

I've progressed far enough along that the trivial cases are handled, as the following test case shows:

TestParamteredMatch (CuTest * tc)
     * We pretend this came in via the network.
    char *request = "GET /login/steve.kemp/secret#password HTTP/1.1\n\n";

     * This is the mod_rewrite rule we're going to test.
    char *rule    = "RewriteRule

    int res = 0;

    /* Parse the HTTP request */
    struct http_request *req = http_request_new (request);
    CuAssertPtrNotNull (tc, req);

    /* Ensure it looks sane. */
    CuAssertStrEquals(tc, "/login/steve.kemp/secret#password", req->path );

    /* Create the rewrite rule */
    struct rewrite_rule *r = rewrite_rule_new (rule);
    CuAssertPtrNotNull (tc, r);

    /* Assert it contains what we think it should. */
    CuAssertStrEquals(tc, "^/login/(.*)/(.*)/*$", r->pattern );

    /* Apply - expect success (==1) */
    res = rewrite_rule_apply( r, req );
    CuAssertIntEquals (tc, 1, res );

    /* Ensure path is updated. */
    CuAssertStrEquals(tc, "/cgi-bin/index.cgi?mode=login;lname=steve.kemp;lpass=secret#password", req->path );

    free_http_request (req);
    free_rewrite_rule (r);

So all is good? Sadly not.

I was expecting to handle a linked list of simple rules, but I've now realised that this isn't sufficient. Consider the following two (real) examples:

#  If the path is /robots.txt and the hostname isn't repository.steve.org.uk
# then redirect to the master one.
RewriteCond %{http_host} !^repository\.steve\.org\.uk
RewriteRule /robots.txt$  http://repository.steve.org.uk/robots.txt [R=permanent,L]

#  Request for :  http://foo.repository.steve.org.uk 
#  becomes:  http://repository.steve.org.uk/cgi-bin/hgwebdir.cgi/foo/file/tip
RewriteCond %{http_host} .
RewriteCond %{http_host} !^repository.steve.org.uk [NC]
RewriteCond %{http_host} ^([^.]+)\.repository.steve.org.uk [NC]
RewriteRule ^/$ http://repository.steve.org.uk/cgi-bin/hgwebdir.cgi/%1/file/tip [QSA,L]

So rather than having a simple linked list of rules for each domain I need to have a list of rules - each of which might in turn contain sub-rules. In terms of parsing this is harder than I'd like because it means I need to maintain state to marry up the RewriteCond & RewriteRules.

Still the problem isn't insurmountable and I'm pleased with the progress I've made. Currently I can implement enough of mod_rewrite that I could handle all of my existing sites except the single site I have with the complex rule demonstrated above.

(In all honesty I guess I could simplify my setup by dropping the wildcard hostname handling for the repository.steve.org.uk name, but I do kinda like it, and it makes for simple canonical mercurial repositories.)

ObQuote: - 300

| No comments


This weekend I have mostly been parsing HTTP

Monday, 27 December 2010

A few weeks ago I was lamenting the lack of of a reverse proxy that understood Apache's mod_rewrite syntax.

On my server I have a bunch of thttpd processes, each running under their own UID, each listening on local ports. For example I have this running:

thttpd -C /etc/thttpd/sites.enabled/blog.steve.org.uk

This is running under the Unix-user "s-blog", with UID 1015, and thus is listening on

steve@steve:~$ id s-blog
uid=1015(s-blog) gid=1016(s-blog) groups=1016(s-blog)

steve@steve:~$ lsof -i :1015
thttpd  26293 s-blog    0u  IPv4 1072632       TCP localhost:1015 (LISTEN)

Anyway in front of these thttpd processes I have Apache. It does only two things:

  • Expands mod_rewrite rules.
  • Serves as a proxy to the back-end thttpd processes.

Yes other webservers could be run in front of these processes, and yes other webservers have their own rewrite-rule-syntax. I'm going to say "Lala lala can't hear you". Why? Because mod_rewrite is the defacto standard. It is assumed, documented, and included with a gazillion projects from wikipedia to wordpress to ...

So this weekend I decided I'd see what I needed to do to put together a simple proxy that only worked for reverse HTTP, and understood Apache's mod_rewrite rules.

I figure there are three main parts to such a beast:

Be a network server

Parse configuration file, accept connections, use fork(), libevevent(), or similar such that you ultimately receive HTTP requests..

Process HTTP Requests, rewriting as required

Once you have a parsed HTTP-request you need to test against each rule for the appropriate destination domain. Rewriting the request as appopriate.


Send your (potentially modified) request to the back-end, and then send the response back to the client.

Thus far I've written code which takes this:

GET /etc/passwd?root=1 HTTP/1.0
Host: foo.example.com:80
Accept-Language: en-us,en-gb?q=0.5
Refer: http://foo.bar.com/
Keep-Alive: 300
Connection: keep-alive

Then turns it into this:

struct http_request
   * Path being requested. "/index.html", etc.
  char *path;

   * Query string
  char *qs;

   * Method being requested "GET/POST/PUT/DELETE/HEAD/etc".
  char *method;

   * A linked list of headers such as "Referer: foo",
   * "Connection: close"
  struct http_headers *headers;

There are methods for turning that back to a string, (so that you can send it on to the back-end), finding headers such as "Referer:", and so on.

The parser is pretty minimal C and takes only a "char *buffer" to operate on. It has survived a lot of malicious input, as proved by a whole bunch of test cases.

My next job is to code up the mod_rewrite rule-processor to apply a bunch of rules to one of these objects - updating the struct as we go. Assuming that I can get that part written cleanly over the next week or two then I'll be happy and can proceed to write the networking parts of the code - both the initial accepting, and the proxying to the back-end.

In terms of configuration I'm going to assume something like:

/etc/proxy/global.conf                    : Global directives.

/etc/proxy/steve.org.uk/back-end          : Will contain
/etc/proxy/steve.org.uk/rewrite.conf      : Will contain the domain-wide rules

/etc/proxy/blog.steve.org.uk/back-end     : Will contain
/etc/proxy/blog.steve.org.uk/rewrite.conf : Will contain the domain-wide rules


That seems both sane & logical to me.

ObQuote: "I shall control the fate of the world... " - Day Watch.

ObRandom: Coding in C is pleasant again.

| No comments


Fire and wind come from the sky, from the gods of the sky.

Sunday, 28 February 2010

Recently I was flirting with the idea of creating an online game, but I got distracted by wondering how to make the back-end more flexible.

To communicate the state of the game to N connected clients I figured I needed some kind of server which would accept "join"/"quit" requests and then make changes available.

To that end I came up with the idea that a client would make requests via HTTP such as:


This would regard the originating client as part of a new chess game, for example, and return a UID identifying the "game channel".


This will retrieve a list of all events which had occurred in the game which had not already been sent.

(Here 1-2-3-4 is obviously the UID previously allocated.)


This would submit the move "move" to the server.

After mulling this over for a while it seemed like a great reusable solution, I'd make an initial "join" request, then repeated polling with the allocated UID would allow game moves to be observed. All using JSON over HTTP as the transport.

It was only this morning that I realised I'd have saved a lot of time if I'd just proxied requests to a private IRC server, as the functionality is essentially the same.

Still I'm sure this pattern of "join"/"poll"/"quit" could be useful for a lot of dynamic websites, even in the non-gaming world. So although the idea was mostly shelved it was an interesting thing to have experimented with.


ObFilm: Conan The Barbarian



Recent Posts

Recent Tags