Entries tagged markdown

Related tags: backups, chronicle, docker, pastebin, perl, random, redis, security, sqlite.

A simple Perl alternative to storing data in Redis

Friday, 16 December 2016

I continue to be a big user of Perl, and for many of my sites I avoid the use of MySQL which means that I largely store data in flat files, SQLite databases, or in memory via Redis.

One of my servers was recently struggling with RAM, and the suprising cause was "too much data" in Redis. (Surprising because I'd not been paying attention and seen how popular it was, and also because ASCII text compresses pretty well).

Read/Write speed isn't a real concern, so I figured I'd move the data into an SQLite database, but that would require rewriting the application.

The client library for Perl is pretty awesome, and simple usage looks like this:

# Connect to localhost.
my $r = Redis->new()

# simple storage
$r->set( "key", "value" );

# Work with sets
$r->sadd( "fruits", "orange" );
$r->sadd( "fruits", "apple" );
$r->sadd( "fruits", "blueberry" );
$r->sadd( "fruits", "banannanananananarama" );

# Show the set-count
print "There are " . $r->scard( "fruits" ) . " known fruits";

# Pick a random one
print "Here is a random one " . $r->srandmember( "fruits" ) . "\n";

I figured, if I ignored the Lua support and the other more complex operations, creating a compatible API implementation wouldn't be too hard. So rather than porting my application to using SQLite directly I could juse use a different client-library.

In short I change this:

use Redis;
my $r = Redis->new();

To this:

use Redis::SQLite;
my $r = Redis::SQLite->new();

And everything continues to work. I've implemented all the set-related functions except one, and a random smattering of the other simple operations.

The appropriate test-cases in the Redis client library (i.e. removing all references to things I didn't implement) pass, and my own new tests also make me confident.

It's obviously not a hard job, but it was a quick solution to a real problem and might be useful to others.

My image hosting site, and my markdown sharing site now both use this wrapper and seem to be performing well - but with more free RAM.

No doubt I'll add more of the simple primitives as time goes on, but so far I've done enough to be useful.

| No comments

 

If your code accepts URIs as input..

Monday, 12 September 2016

There are many online sites that accept reading input from remote locations. For example a site might try to extract all the text from a webpage, or show you the HTTP-headers a given server sends back in response to a request.

If you run such a site you must make sure you validate the schema you're given - also remembering to do that if you're sent any HTTP-redirects.

Really the issue here is a confusion between URL & URI.

The only time I ever communicated with Aaron Swartz was unfortunately after his death, because I didn't make the connection. I randomly stumbled upon the html2text software he put together, which had an online demo containing a form for entering a location. I tried the obvious input:

file:///etc/passwd

The software was vulnerable, read the file, and showed it to me.

The site gives errors on all inputs now, so it cannot be used to demonstrate the problem, but on Friday I saw another site on Hacker News with the very same input-issue, and it reminded me that there's a very real class of security problems here.

The site in question was http://fuckyeahmarkdown.com/ and allows you to enter a URL to convert to markdown - I found this via the hacker news submission.

The following link shows the contents of /etc/hosts, and demonstrates the problem:

http://fuckyeahmarkdown.example.com/go/?u=file:///etc/hosts&read=1&preview=1&showframe=0&submit=go

The output looked like this:

..
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
127.0.0.1 stage
127.0.0.1 files
127.0.0.1 brettt..
..

In the actual output of '/etc/passwd' all newlines had been stripped. (Which I now recognize as being an artifact of the markdown processing.)

UPDATE: The problem is fixed now.

| 9 comments.

 

Switched to using attic for backups

Friday, 19 December 2014

Even though seeing the word attic reminds me too much of leaking roofs and CVS, I've switched to using the attic backup tool.

I want a simple system which will take incremental backups, perform duplication-elimination (to avoid taking too much space), support encryption, and be fast.

I stopped using backup2l because the .tar.gz files were too annoying, and it was too slow. I started using obnam because I respect Lars and his exceptionally thorough testing-regime, but had to stop using it when things started getting "too slow".

I'll document the usage/installation in the future. For the moment the only annoyance is that it is contained in the Jessie archive, not the Wheezy one. Right now only 2/19 of my hosts are Jessie.

| 6 comments.

 

If this goes well I have a new blog engine

Wednesday, 17 September 2014

Assuming this post shows up then I'll have successfully migrated from Chronicle to a temporary replacement.

Chronicle is awesome, and despite a lack of activity recently it is not dead. (No activity because it continued to do everything I needed for my blog.)

Unfortunately though there is a problem with chronicle, it suffers from a bit of a performance problem which has gradually become more and more vexing as the nubmer of entries I have has grown.

When chronicle runs it :

  • It reads each post into a complex data-structure.
  • Then it walks this multiple times.
  • Finally it outputs a whole bunch of posts.

In the general case you rebuild a blog because you've made a entry, or received a new comment. There is some code which tries to use memcached for caching, but in general chronicle just isn't fast and it is certainly memory-bound if you have a couple of thousand entries.

Currently my test data-set contains 2000 entries and to rebuild that from a clean start takes around 4 minutes, which is pretty horrific.

So what is the alternative? What if you could parse each post once, add it to an SQLite database, and then use that for writing your output pages? Instead of the complex data-structure in-RAM and the need to parse a zillion files you'd have a standard/simple SQL structure you could use to build a tag-cloud, an archive, & etc. If you store the contents of the parsed-blog, along with the mtime of the source file you can update it if the entry is changed in the future, as I sometimes make typos which I only spot once Ive run make steve on my blog sources.

Not surprisingly the newer code is significantly faster if you have 2000+ posts. If you've imported the posts into SQLite the most recent entries are updated in 3 seconds. If you're starting cold, parsing each entry, inserting it into SQLite, and then generating the blog from scratch the build time is still less than 10 seconds.

The downside is that I've removed features, obviously nothing that I use myself. Most notably the calendar view is gone, as is the ability to use date-based URLs. Less seriously there is only a single theme, which is what is used upon this site.

In conclusion I've written something last night which is a stepping stone between the current chronicle and chronicle2 which will appear in due course.

PS. This entry was written in markdown, just because I wanted to be sure it worked.

| 9 comments.

 

My pastebin will now run under docker.

Monday, 17 February 2014

I've updated my markdown-pastebin site, to be a little cleaner, and to avoid spidering issues.

Previously every piece of uploaded text received an incrementing integer to describe it - which meant it was trivially easy for others to see how many pieces of text had been uploaded, and to spider all past uploads (unless the user deleted them).

Now each fresh paste receives a random UUID to describe it, and this means spidering is no longer feasible.

I've also posted the source code to Gitub so folk can report bugs, fork, etc:

That source code now includes a Dockerfile which allows you to quickly and easily build your own container running this wonderful service, and launch it without worrying about trashing your server ;)

Anyway other than the user-interface overhaul it is still as functional, or not, as it used to be!

| No comments

 

Pastebin site with markdown support

Sunday, 16 February 2014

Today I setup a new website:

Something I want, something I'll use, and something that might be useful to others?

| 4 comments.

 

Recent Posts

Recent Tags