Entries tagged nodejs

Related tags: bbcode, birthday, cidr, forks, html, ms-lite, node-reverse-proxy, qpsmptd, random, redisfs, software, thttpd.

Putting the finishing touches to a nodejs library

Friday, 11 April 2014

For the past few years I've been running a simple service to block blog/comment-spam, which is (currently) implemented as a simple JSON API over HTTP, with a minimal core and all the logic in a series of plugins.

One obvious thing I wasn't doing until today was paying attention to the anchor-text used in hyperlinks, for example:

  <a href="http://fdsf.example.com/">buy viagra</a>

Blocking on the anchor-text is less prone to false positives than blocking on keywords in the comment/message bodies.

Unfortunately there seem to exist no simple nodejs modules for extracting all the links, and associated anchors, from a random Javascript string. So I had to write such a module, but .. given how small it is there seems little point in sharing it. So I guess this is one of the reasons why there often large gaps in the module ecosystem.

(Equally some modules are essentially applications; great that the authors shared, but virtually unusable, unless you 100% match their problem domain.)

I've written about this before when I had to construct, and publish, my own cidr-matching module.

Anyway expect an upload soon, currently I "parse" HTML and BBCode. Possibly markdown to follow, since I have an interest in markdown.

| No comments


A couple of small tweaks

Sunday, 3 April 2011

Today I made a couple of small tweaks to my systems. First of all I updated my email handling to allow further restrictions to be applied to incoming mail.

Generally the way I handle incoming mail is to first of all test the recipient, then apply spam checks. To allow mail for a new localpart to be received I'll run this:

cd /srv/example.com/users/valid
touch localpart

The result of the file /srv/example.com/users/valid/localpart existing is that mail is accepted for localpart@example.com.

As of today that still works - any file beneath users/valid allows the appropriate localpart to receive email. But now a directory indicates an ACL. So I can create:

rm    /srv/example.com/users/valid/root
mkdir /srv/example.com/users/valid/root/
touch /srv/example.com/users/valid/root/
touch /srv/example.com/users/valid/root/

Now only the named IP addresses may mail root@example.com.

Finally I updated the proxy which is handling incoming access to my websites, so that I actually take advantage of the regular expression support.

    '(xen-hosting.net|www.xen-hosting.net)': {
        'rules': {
            '^/': 'http://kvm-hosting.org/'

    '(xen-hosting.org|www.xen-hosting.org)': {
        'rules': {
            '^/': 'http://kvm-hosting.org/'

Using regular expressions meant that I didn't have seperate matches for xen-hosting.org and www.xen-hosting.org. (Obviously I could have combined all four hosts into one regexp, but it looks cleaner this way.)

ObQuote: "Nobody likes a perky goth" - Blood Ties.

| No comments


New software always causes surprises

Monday, 21 March 2011

I recently deployed my node.js proxy server, removing all traces of Apache2 from my main server. During the course of this transition I discovered:

Bugs in my code

Not unexpected, in all honesty.

There were two main issues; the first was relating to how I handled the 304 response, the second was relating to how I performed rewrites for my mercurial repository vhost.

Bugs in node.js

Given how new node.js is there wasn't a huge surprise here either, although I thought I'd been good testing against 0.2.x. As it turned out I needed to run the more recent 0.4.x to avoid a couple of issues:

In short I have a backported node.js package for Squeeze which is almost worthless. I'll update it in the near future. For the moment:

cd node-v0.4.3/
./configure --without-ssl --prefix=/opt/node-0.4.3 && make && make install
ln -fs /opt/node-0.4.3  /opt/node
Oddities in thttpd
thttpd is what actually runs my websites and I discovered during some extended debugging sessions that it just does not like HTTP requests starting with a doubled "/" character.

For example this works fine:

wget http://www.acme.com/software/thttpd/

But this fails:

wget http://www.acme.com//software/thttpd/

Previously it seems that Apache was (silently) fixing this up before it proxied requests. Now I have to do it myself, no big thing, but still a surprise.

All in all it was worth it to be able to run:

dpkg --purge libapache2-mod-rpaf \
             apache2.2-common \
             apache2.2-bin \
             apache2-utils \
             apache2-mpm-prefork \
             apache2 \
             libaprutil1-dbd-sqlite3 \
             libaprutil1-ldap \
             libaprutil1 \

ObQuote: "Bad news. The fog's getting thicker." - Airplane!



My node-reverse-proxy is both stable and public

Sunday, 20 March 2011

I posted a brief snippet of code on Friday which was my initial stab at a reverse HTTP proxy in Javascript (using node.js).

Over the past couple of days I've tidied it up, added a command line parser, and made it flexible enough that it works for me.

My node reverse HTTP proxy is now both documented ( a little ) and available for further eyeballs.

Usage is pretty much:

$ node ./node-reverse-proxy.js --config ./path/to/config.file.js

The configuration file defines lists of virtual hosts along with the destination back-ends to proxy to - which is usually going to be a server running upon a high port on the loopback adapter, but might not be.

In addition to that we can perform rewrites such as:

  * Handler for wildcard host: *.repository.steve.org.uk
         * Rewrites for static files - these will be handled via a
         * separate virtual host.
        'rules': {
            '^/robots.txt':  'http://repository.steve.org.uk/robots.txt',
            '^/favicon.ico': 'http://repository.steve.org.uk/favicon.ico',

That says requests for http://chronicle.repository.steve.org.uk/robots.txt will be redirected to http://repository.steve.org.uk/robots.txt.

Alternatively we can invoke javascript for each request matching a pattern:

     * static.steve.org.uk will mostly proxy to
     * but files beneath /private/ have an IP-based ACL.
        host: 'localhost',
        port: '1008',

        'functions': {
            '/private': (function(orig_host, vhost,req,res) {
                var remote = req.connection.remoteAddress;;

                if ( ( remote != "" ) &&
                     ( remote != "" ) &&
                     ( remote != "") &&
                     ( remote != "" ) )
                    res.write( "Denied access to " + req.url  + " from " + remote );

Fun stuff. It was live for my server, replacing apache, for a few hours today. I need to add some trivial HTTP Basic-Auth handling then it will go back.

Otherwise I hope it is vaguely useful to others, and that the provided examples explain things neatly.

ObQuote: "Only one thing alive with less than four legs can hear this frequency" - Superman.



nodejs is fun

Friday, 18 March 2011

A while back I was working on a mod_rewrite compatible proxy server written in C. The reason for this is that my current webhost uses Apache2 in front ofa number of thttpd processes, and I'd like to remove apache and use something smaller/faster/neater.

Dividing things up I'm running about ten domains, and only around half of them use mod_rewrite rules - small enough perhaps to port the rules, large enough to make it annoying.

Upon reflection I think the thing to do is to replace apache with javascript - via node.js. Writing proxies with node.js is almost ridiculously simple - in fact doing anything HTTP-like is very very simple thanks to the bundled libraries.

Time will tell whether this is a waste of time or not, but I'm confident I could listen upon *:80 and route requests to localhost:N for a few domains.

As a trivial toy I wrote a simple transforming proxy last night which performed simple rewrites as it passed traffic about.

Another fun, and possibly useful, thing I put together was a node.js port of the simple httpstat.us website. That server would have been easy to write in any number of languages, but in node it was almost too easy.

Update - here is my rewriting node.js-based reverse-proxy.

ObQuote: "I just need you to stop being nice to me unless you're gonna marry me. " - He's just not that into you.

| No comments


A few forks.

Sunday, 13 March 2011

Although I promised in my previous entry that I'd made my last mention of the redisfs - replication friendly redis-based filesystem I have to disappoint.

Ben Sykes informed me that he'd made a fork of redisfs, called shredisfs. I like forks. Forks are good, and this particular one was very welcome - from the README:

Steve's original managed around 250k per second for writes on my test machine, this version does about 15MB a second writes.


Read speed on the test machine here now is around 180MB/sec..

Needless to say I "stole" the improvements and rolled them into my original release. Thanks all round to those of you that submitted bug reports, suggestions, and codez.

I also had a release out briefly which used zLib to compress the values of keys stored in memory. Unfortunately that lead to a net-slowdown for files which were bigger than a single block - due to the overhead of file-system appends which translated to: "fetch", "compress", "append", and "decompress".

Finally on the subject of qpsmtpd, my favourite SMTP-server - the exceptionally talented Matt Sergeant released a proof of concept rewrite in Javascript. Using nodejs this new SMTP server is heavily asynchronous and you can find it online under the name Haraka.

Although my javascript-fu is weak I'm very impressed by the codebase out there so far, and I'd expect good things of it in the future. Even if it does use Javascript and not my beloved Perl.

Finally I'm a year older. But birthdays aren't important.

ObQuote: "Do you wanna know what makes all my candy taste so special? " - Epic Movie.

| 1 comment.


Recent Posts

Recent Tags