For the past few years I've been running a service to block blog/comment spam, which is (currently) implemented as a simple JSON API over HTTP, with a minimal core and all the logic in a series of plugins.
One obvious thing I wasn't doing until today was paying attention to the anchor-text used in hyperlinks, for example:
<a href="http://fdsf.example.com/">buy viagra</a>
Blocking on the anchor-text is less prone to false positives than blocking on keywords in the comment/message bodies.
Unfortunately there don't seem to be any simple Node.js modules for extracting all the links, and their associated anchor text, from an arbitrary string. So I had to write such a module, but given how small it is there seems little point in sharing it. I guess this is one of the reasons why there are often large gaps in the module ecosystem.
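To give a feel for how small such a module can be, here is a minimal sketch of the idea. It is not the module described in this post: the name extractHtmlLinks is made up for illustration, and it assumes a regular expression is good enough, which only holds for simple, reasonably well-formed markup with quoted href values. A real HTML parser would be more robust.

```javascript
// Illustrative sketch only: extractHtmlLinks is a hypothetical name,
// not a published module, and this is regex matching, not real HTML parsing.
function extractHtmlLinks(input) {
  const links = [];
  // Match <a ... href="..."> ... </a>; the href value may use single or
  // double quotes. Unquoted href values are deliberately not handled.
  const re = /<a\s+(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a\s*>/gi;
  let match;
  while ((match = re.exec(input)) !== null) {
    links.push({
      href: match[2],
      // Drop any nested tags and collapse runs of whitespace in the anchor text.
      text: match[3].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim(),
    });
  }
  return links;
}

// Each result is an object of the form { href, text }.
console.log(extractHtmlLinks('<a href="http://fdsf.example.com/">buy viagra</a>'));
```

Returning plain { href, text } pairs keeps the extraction separate from whatever decides that a given anchor text is spam.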
(Equally, some modules are essentially applications: great that the authors shared them, but virtually unusable unless you 100% match their problem domain.)
I've written about this before when I had to construct, and publish, my own cidr-matching module.
Anyway, expect an upload soon. Currently I "parse" HTML and BBCode. Markdown may follow, since I have an interest in it.
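For the BBCode side, the same kind of "parsing" can be sketched as below. Again this is only an illustration under assumptions, not the code I'll be uploading: extractBbcodeLinks is a hypothetical name, and it only understands the two common forms [url=TARGET]TEXT[/url] and [url]TARGET[/url].

```javascript
// Illustrative sketch only: extractBbcodeLinks is a hypothetical name.
// Handles [url=TARGET]TEXT[/url] and [url]TARGET[/url]; nothing else.
function extractBbcodeLinks(input) {
  const links = [];
  // The target after "=" may optionally be wrapped in quotes.
  const re = /\[url(?:=(["']?)([^\]]*?)\1)?\]([\s\S]*?)\[\/url\]/gi;
  let match;
  while ((match = re.exec(input)) !== null) {
    const body = match[3].trim();
    // With no "=TARGET" part, the tag body doubles as the link target.
    const href = match[2] !== undefined ? match[2] : body;
    links.push({ href, text: body });
  }
  return links;
}

// Each result is an object of the form { href, text }.
console.log(extractBbcodeLinks("[url=http://fdsf.example.com/]buy viagra[/url]"));
```

The output has the same { href, text } shape regardless of the input markup, so a plugin checking anchor text would not need to care which format the comment was written in.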