
A final update on redisfs

4 March 2011 21:50

I think I'm done with the redis filesystem for the moment. It does everything I need it to do. I am curious to see how much faster it could go if it were to use non-blocking writes, but that isn't a major concern.

There is only one missing feature I'm planning to play with: snapshots.

As a refresher, redis is a key+value store which mostly uses system memory. I've built a simple FUSE filesystem on top of that (now with symlink support!) and so all the file contents, meta-data, and similar are stored in memory.

Implementing snapshots could be done by two different routes:

Copying all keys and their values, under a new name.

For example right now, by default, all the filesystem entries in the root directory are stored beneath the key "SKX:/" - where "SKX" is the key prefix.

If I copy each existing key, and the associated value(s), giving them a new prefix such as "SKX2:", I can mount the filesystem against that prefix - and we've got a point-in-time snapshot.

Serialising all keys & values

Redis has a primitive, the KEYS command, which allows you to determine the names of keys at runtime. Given that all my filesystem keys have a prefix ("SKX:" by default) it wouldn't be difficult to find them, and serialise them.

This would require more effort to re-import and re-mount, but it should be portable across hosts.
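
The two routes above can be sketched in a few lines of Python. This is only an illustration, not the redisfs implementation: a plain dict stands in for the Redis keyspace so it runs without a server, the function names are made up for the example, and values are assumed to be text (binary file contents would need something like base64 before going into JSON). Against a real server the prefix scan would be done with KEYS or SCAN.

```python
import json

def snapshot_by_copy(store, old_prefix, new_prefix):
    """Route 1: duplicate every key under old_prefix beneath new_prefix."""
    copied = {}
    for key, value in store.items():
        if key.startswith(old_prefix):
            new_key = new_prefix + key[len(old_prefix):]
            # Copy sets so the snapshot does not share state with the original.
            copied[new_key] = set(value) if isinstance(value, set) else value
    store.update(copied)
    return len(copied)

def export_snapshot(store, prefix):
    """Route 2: serialise every key under prefix to a JSON string."""
    dump = {}
    for key, value in store.items():
        if key.startswith(prefix):
            if isinstance(value, set):
                dump[key] = {"type": "set", "value": sorted(value)}
            else:
                dump[key] = {"type": "string", "value": value}
    return json.dumps(dump, sort_keys=True)

def import_snapshot(store, text):
    """Load a JSON dump produced by export_snapshot into another store."""
    for key, entry in json.loads(text).items():
        if entry["type"] == "set":
            store[key] = set(entry["value"])
        else:
            store[key] = entry["value"]

# A tiny stand-in filesystem: one file in the root directory.
store = {
    "SKX:/": {"1"},
    "SKX:INODE:1:NAME": "hello.txt",
    "SKX:INODE:1:DATA": "hi",
}
copied = snapshot_by_copy(store, "SKX:", "SKX2:")
store["SKX:INODE:1:DATA"] = "changed"  # a later write to the live filesystem

# Move the snapshot to "another host" via the serialised form.
other_host = {}
import_snapshot(other_host, export_snapshot(store, "SKX2:"))
```

The copy is cheap to mount but stays on the same server; the serialised form costs an import step but can be carried anywhere.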

Anyway, assuming I get this right, we'll have a filesystem which is replication-friendly and snapshot-able. A fun combination.

ObQuote: "We had been everywhere. We had really seen nothing." - Lolita


Comments on this entry

Andrew at 02:03 on 6 March 2011
http://profiles.google.com/andmalc

Thanks - seems to work fine. One thing I don't get: how or where is file content saved?

Steve Kemp at 03:02 on 6 March 2011
http://www.steve.org.uk/

For each new object you create, be it a file, a directory, or a symlink, a new number will be allocated. (See the source code function "get_next_inode()".)

The new entry will have a lot of keys created for it, in Redis, to hold different pieces of information. These keys will hold data such as the owner, the group ID, the access time, the creation time, and so on.

The very first object you create, once you mount the filesystem, will be given the number 1. The second will be 2, and so on. This number will be part of the keys stored in redis. So, for example, you might see keys like this:

  • SKX:INODE:1:GID - The group ID of the file, for file number 1.
  • SKX:INODE:6:ATIME - The access-time of the file, for file number 6.

Some keys will be present for all directory entries. Others are type-specific. (You'll see "SKX:INODE:4:TYPE", for example, which will hold one of "file", "dir", or "link".)

For a file? The contents of that file are stored in the key "SKX:INODE:??:DATA" with "SKX:INODE:??:SIZE" holding the size of that data.
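
As a rough illustration of that key layout, here is a small Python sketch. It is not taken from the redisfs source: a dict stands in for Redis (each lookup would be a GET against a real server), the helper names are invented for the example, and the "file" type value is the one quoted above.

```python
def inode_key(prefix, inode, field):
    """Build a key name such as 'SKX:INODE:1:GID'."""
    return f"{prefix}:INODE:{inode}:{field}"

def read_file(store, prefix, inode):
    """Return (data, size) for a file inode, or None for anything else."""
    if store.get(inode_key(prefix, inode, "TYPE")) != "file":
        return None
    # An empty file may have no DATA key at all, so fall back to "".
    data = store.get(inode_key(prefix, inode, "DATA"), "")
    size = int(store.get(inode_key(prefix, inode, "SIZE"), 0))
    return data, size

# Stand-in keyspace: a file with content, a directory, and an empty file.
store = {
    "SKX:INODE:1:TYPE": "file",
    "SKX:INODE:1:DATA": "hello",
    "SKX:INODE:1:SIZE": "5",
    "SKX:INODE:2:TYPE": "dir",
    "SKX:INODE:3:TYPE": "file",
    "SKX:INODE:3:SIZE": "0",
}
first = read_file(store, "SKX", 1)
directory = read_file(store, "SKX", 2)
empty = read_file(store, "SKX", 3)
```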

If you get bored you can play around with:

$ redis-cli keys \*

That'll show you the names of each key stored in redis at that moment in time. Then:

$ redis-cli get SKX:INODE:1:NAME

To see the name of the first file, and so on.

Hope that helps.


yaarg at 15:49 on 6 March 2011

Nice. Out of interest, what are you using this for?

Steve Kemp at 08:59 on 7 March 2011
http://www.steve.org.uk/

At the moment I'm using this to store tracker information in, for a global distributed tracking client/server.

The tracker allows file lookups, but requires a shared storage area to function.

Steve Kemp at 09:06 on 7 March 2011

As I've already described there are lots of keys for file entries. Those same keys are used for subdirectories.

The only difference is that I also use a "SET" of directory members. If there are three files in the "/" directory I'll have a set called:


skx:/

That set will contain entries "2", "4", "1". Which will allow for all the lookups on the keys I've mentioned previously. e.g. "skx:INODE:2:NAME".
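
A directory listing built on that set could look roughly like the Python below. Again this is a sketch under assumptions: a dict plays the part of Redis (the set read would be SMEMBERS and each name a GET), the function name and the file names are invented for the example.

```python
def list_directory(store, prefix, path):
    """Return the sorted names of the entries in one directory."""
    # The directory's set holds inode numbers, not names.
    members = store.get(f"{prefix}:{path}", set())
    return sorted(store[f"{prefix}:INODE:{inode}:NAME"] for inode in members)

# Stand-in keyspace matching the example: three entries in "/".
store = {
    "skx:/": {"2", "4", "1"},
    "skx:INODE:1:NAME": "notes.txt",
    "skx:INODE:2:NAME": "bin",
    "skx:INODE:4:NAME": "todo",
}
names = list_directory(store, "skx", "/")
missing = list_directory(store, "skx", "/nowhere")
```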


Andrew at 16:09 on 7 March 2011

@Steve: thanks for your detailed answer above. I'd mounted redisfs and used touch to create some test files. Of course, no file data = no *DATA key. Apparently I need more coffee.

I ran redisfs in debug mode and saw fs_create and fs_write called when adding a file to the filesystem. I'd be interested in creating files using Python, so I suppose the way to do this is to create a Python extension to call these functions, is that correct?

Steve Kemp at 17:54 on 7 March 2011

I know almost nothing about Python, but if you wanted to create files and "inject" them into the filesystem I'm sure you could use a redis client library & python code to do so.

Perhaps the saner approach might be to combine a Python redis client and a Python FUSE client, in the same way that I did with the C code.
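
For the first idea, injecting a file by writing keys directly, a minimal Python sketch might look like this. Several things here are assumptions and not taken from redisfs: the counter key name "GLOBAL:INODE" is hypothetical (the real one lives in get_next_inode()), the function name is invented, and a real mount may well expect more keys than the handful written here (owner, group, modification time, and so on). A dict stands in for Redis; with a real client the counter bump would be INCR, the writes SET, and the directory update SADD.

```python
import time

def inject_file(store, prefix, directory, name, data):
    """Create the keys for a new file and register it in its directory."""
    counter_key = f"{prefix}:GLOBAL:INODE"  # hypothetical counter key name
    inode = int(store.get(counter_key, 0)) + 1
    store[counter_key] = str(inode)

    base = f"{prefix}:INODE:{inode}"
    store[f"{base}:NAME"] = name
    store[f"{base}:TYPE"] = "file"
    store[f"{base}:DATA"] = data
    # Size in bytes, which can differ from the character count.
    store[f"{base}:SIZE"] = str(len(data.encode("utf-8")))
    store[f"{base}:ATIME"] = str(int(time.time()))

    # Add the inode number to the set of members for the directory.
    store.setdefault(f"{prefix}:{directory}", set()).add(str(inode))
    return inode

store = {}
first = inject_file(store, "SKX", "/", "a.txt", "hello")
second = inject_file(store, "SKX", "/", "b.txt", "")
```

Writing behind the back of a mounted filesystem like this skips whatever checks the FUSE layer does, which is part of why pairing a redis client with a FUSE binding is the safer route.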