Minor success has been achieved in reimplementing mod_rewrite

28 December 2010 21:50

So yesterday I mentioned the mod_rewrite-compatible proxy server. Today I've spent several hours getting to grips with that.

I've progressed far enough along that the trivial cases are handled, as the following test case shows:

void
TestParamteredMatch (CuTest * tc)
{
    /*
     * We pretend this came in via the network.
     */
    char *request = "GET /login/steve.kemp/secret#password HTTP/1.1\n\n";

    /*
     * This is the mod_rewrite rule we're going to test.
     */
    char *rule    = "RewriteRule

    int res = 0;

    /* Parse the HTTP request */
    struct http_request *req = http_request_new (request);
    CuAssertPtrNotNull (tc, req);

    /* Ensure it looks sane. */
    CuAssertStrEquals(tc, "/login/steve.kemp/secret#password", req->path );

    /* Create the rewrite rule */
    struct rewrite_rule *r = rewrite_rule_new (rule);
    CuAssertPtrNotNull (tc, r);

    /* Assert it contains what we think it should. */
    CuAssertStrEquals(tc, "^/login/(.*)/(.*)/*$", r->pattern );

    /* Apply - expect success (==1) */
    res = rewrite_rule_apply( r, req );
    CuAssertIntEquals (tc, 1, res );

    /* Ensure path is updated. */
    CuAssertStrEquals(tc, "/cgi-bin/index.cgi?mode=login;lname=steve.kemp;lpass=secret#password", req->path );

    free_http_request (req);
    free_rewrite_rule (r);
}

So all is good? Sadly not.

I was expecting to handle a linked list of simple rules, but I've now realised that this isn't sufficient. Consider the following two (real) examples:

#  If the path is /robots.txt and the hostname isn't repository.steve.org.uk
# then redirect to the master one.
RewriteCond %{http_host} !^repository\.steve\.org\.uk
RewriteRule /robots.txt$  http://repository.steve.org.uk/robots.txt [R=permanent,L]

#  Request for :  http://foo.repository.steve.org.uk 
#  becomes:  http://repository.steve.org.uk/cgi-bin/hgwebdir.cgi/foo/file/tip
RewriteCond %{http_host} .
RewriteCond %{http_host} !^repository.steve.org.uk [NC]
RewriteCond %{http_host} ^([^.]+)\.repository.steve.org.uk [NC]
RewriteRule ^/$ http://repository.steve.org.uk/cgi-bin/hgwebdir.cgi/%1/file/tip [QSA,L]
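
To make the second example concrete, here is a rough sketch of what evaluating it involves: every RewriteCond has to pass, and the %1 in the target is the first capture group of the last condition that matched. It assumes POSIX regex.h, which may well not be the regex library the proxy ends up using, and rewrite_repo_host and host_matches are made-up names rather than anything from the real code. The [QSA] flag (append the original query string) is ignored here.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if `host` matches `pattern` (case-insensitively,
 * as [NC] asks), 0 if it does not, -1 if the pattern fails to compile. */
static int host_matches(const char *pattern, const char *host,
                        size_t nmatch, regmatch_t *m)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    int rc = regexec(&re, host, nmatch, m, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical: apply the three conditions and the rule from the example.
 * Returns 1 and fills `out` on a rewrite, 0 if the rule does not apply,
 * -1 on error (bad pattern or `out` too small). */
int rewrite_repo_host(const char *host, const char *path,
                      char *out, size_t outlen)
{
    regmatch_t m[2];
    int rc;

    assert(host != NULL && path != NULL && out != NULL);

    /* RewriteRule ^/$ : only the bare root path is rewritten. */
    if (strcmp(path, "/") != 0)
        return 0;

    /* RewriteCond %{http_host} . : the host must be non-empty. */
    if (*host == '\0')
        return 0;

    /* RewriteCond %{http_host} !^repository.steve.org.uk [NC] */
    rc = host_matches("^repository.steve.org.uk", host, 0, NULL);
    if (rc != 0)
        return rc < 0 ? -1 : 0;   /* negated: a match means "fail" */

    /* RewriteCond %{http_host} ^([^.]+)\.repository.steve.org.uk [NC] */
    rc = host_matches("^([^.]+)\\.repository.steve.org.uk", host, 2, m);
    if (rc != 1)
        return rc;

    /* %1 is the first capture group of the last matching condition. */
    int len = (int)(m[1].rm_eo - m[1].rm_so);
    int n = snprintf(out, outlen,
                     "http://repository.steve.org.uk/cgi-bin/hgwebdir.cgi/%.*s/file/tip",
                     len, host + m[1].rm_so);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 1;
}
```

Calling rewrite_repo_host with the host foo.repository.steve.org.uk and the path / should fill the buffer with the "becomes" URL from the comment above, while the bare repository.steve.org.uk host is left alone.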

So rather than having a simple linked list of rules for each domain, I need a list of rules, each of which might in turn carry its own sub-rules (the RewriteCond lines that guard it). In terms of parsing this is harder than I'd like, because I need to maintain state to marry up each group of RewriteCond lines with the RewriteRule that follows them.
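
One way the state-keeping might look, as a minimal sketch rather than the structures the proxy actually uses: conditions are collected on a pending list and handed over to the next RewriteRule that turns up. The types (struct cond, struct rule, struct rule_set) and the function names are all invented for illustration, the directive text is stored unparsed, and each line is assumed to have had its trailing newline stripped already.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types, not those of the real proxy. */
struct cond { char *text; struct cond *next; };
struct rule { char *text; struct cond *conds; struct rule *next; };

struct rule_set {
    struct rule *head, *tail;             /* completed rules, in file order */
    struct cond *pending, *pending_tail;  /* conds waiting for their rule */
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Feed one config line. Returns 0 on success, -1 if allocation fails. */
int rule_set_add_line(struct rule_set *rs, const char *line)
{
    assert(rs != NULL && line != NULL);
    while (*line == ' ' || *line == '\t') line++;

    if (strncmp(line, "RewriteCond ", 12) == 0) {
        struct cond *c = calloc(1, sizeof *c);
        if (!c) return -1;
        if (!(c->text = dup_str(line + 12))) { free(c); return -1; }
        if (rs->pending_tail) rs->pending_tail->next = c;
        else rs->pending = c;
        rs->pending_tail = c;
    } else if (strncmp(line, "RewriteRule ", 12) == 0) {
        struct rule *r = calloc(1, sizeof *r);
        if (!r) return -1;
        if (!(r->text = dup_str(line + 12))) { free(r); return -1; }
        r->conds = rs->pending;           /* the rule takes the conds */
        rs->pending = rs->pending_tail = NULL;
        if (rs->tail) rs->tail->next = r;
        else rs->head = r;
        rs->tail = r;
    }
    /* Comments, blank lines and other directives are ignored. */
    return 0;
}

size_t rule_cond_count(const struct rule *r)
{
    size_t n = 0;
    for (const struct cond *c = r->conds; c; c = c->next) n++;
    return n;
}

static void free_conds(struct cond *c)
{
    while (c) {
        struct cond *next = c->next;
        free(c->text);
        free(c);
        c = next;
    }
}

void rule_set_free(struct rule_set *rs)
{
    struct rule *r = rs->head;
    while (r) {
        struct rule *next = r->next;
        free_conds(r->conds);
        free(r->text);
        free(r);
        r = next;
    }
    free_conds(rs->pending);
    memset(rs, 0, sizeof *rs);
}
```

Fed the two examples above line by line, starting from a zeroed struct rule_set, this should end up with two rules: the robots.txt one carrying a single condition and the wildcard-hostname one carrying three. Applying a rule then means walking its conds list first and only trying the rule's own pattern if every condition passes.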

Still, the problem isn't insurmountable, and I'm pleased with the progress I've made. Currently I can implement enough of mod_rewrite to handle all of my existing sites except the single site with the complex rules demonstrated above.

(In all honesty I guess I could simplify my setup by dropping the wildcard hostname handling for the repository.steve.org.uk name, but I do kinda like it, and it makes for simple canonical Mercurial repositories.)

