A final update on redisfs

Friday, 4 March 2011

I think I'm done with the redis filesystem for the moment. It does everything I need it to do. I am curious to see how much faster it could go if it were to use non-blocking writes, but that isn't a major concern.

There is only one missing feature I'm planning to play with - and that is the ability to implement snapshots.

As a refresher, redis is a key+value store which mostly uses system memory. I've built a simple FUSE filesystem on top of that (now with symlink support!), so all the file contents, meta-data, and similar are stored in memory.

Implementing snapshots could be done by two different routes:

Copying all keys and their values, under a new name.

For example right now, by default, all the filesystem entries in the root directory are stored beneath the key "SKX:/" - where "SKX" is the key prefix.

Assume I copy each existing key, and the associated value(s), giving them a new prefix such as "SKX2:" I can mount the filesystem against that prefix - and we've got a point-in-time snapshot.
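As a rough illustration of the copy-under-a-new-prefix idea, here is a minimal sketch. It models the Redis keyspace as a plain Python dict (key name to string or set) so that it runs without a server; the function name snapshot_prefix and the sample keys are made up for the example, and this is not how redisfs itself does anything.

```python
# Model of a Redis keyspace: key name -> string value, or a set of members.
# A real version would read and write through a Redis client instead.

def snapshot_prefix(store, old_prefix, new_prefix):
    """Copy every key beginning with old_prefix + ':' to new_prefix + ':'."""
    marker = old_prefix + ":"
    copied = 0
    # list() so new keys can be added to the dict while we walk the old ones.
    for key in list(store):
        if not key.startswith(marker):
            continue
        value = store[key]
        # Copy sets so the snapshot does not share state with the original.
        if isinstance(value, set):
            value = set(value)
        store[new_prefix + ":" + key[len(marker):]] = value
        copied += 1
    return copied


store = {
    "SKX:/": {"1"},
    "SKX:INODE:1:NAME": "hello.txt",
    "SKX:INODE:1:DATA": "hi",
}
snapshot_prefix(store, "SKX", "SKX2")
store["SKX:INODE:1:DATA"] = "changed"  # a later write to the live filesystem
print(store["SKX2:INODE:1:DATA"])      # the snapshot still holds the old data
```

Matching on the prefix plus the colon keeps "SKX2:" keys from being mistaken for "SKX:" keys on a second run. Against a live server the copy would not be atomic unless writes were paused while it ran.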

Serialising all keys & values

Redis has a primitive which allows you to determine the names of keys at runtime. Given that all my filesystem keys have a prefix ("SKX:" by default) it wouldn't be difficult to find them, and serialise them.

This would require more effort to re-import and re-mount, but it should be portable across hosts.
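A sketch of what the serialise and re-import steps might look like, again using a plain dict as a stand-in for Redis. The JSON layout and the names dump_keys and load_keys are assumptions for illustration only; on a real server the key names would come from something like the KEYS command with a "SKX:*" pattern.

```python
import json


def dump_keys(store, prefix):
    """Serialise every key under prefix + ':' to a JSON string."""
    marker = prefix + ":"
    out = {}
    for key, value in store.items():
        if not key.startswith(marker):
            continue
        if isinstance(value, set):
            # JSON has no set type, so record the type and store a list.
            out[key] = {"type": "set", "value": sorted(value)}
        else:
            out[key] = {"type": "string", "value": value}
    return json.dumps(out, sort_keys=True)


def load_keys(store, text):
    """Re-create the keys from a dump_keys() string in another store."""
    for key, entry in json.loads(text).items():
        if entry["type"] == "set":
            store[key] = set(entry["value"])
        else:
            store[key] = entry["value"]


source = {
    "SKX:/": {"1", "2"},
    "SKX:INODE:1:NAME": "a.txt",
    "OTHER:key": "ignored",
}
text = dump_keys(source, "SKX")

target = {}  # imagine this is the Redis instance on another host
load_keys(target, text)
print(sorted(target))
```

File contents that are not valid text would need encoding (base64, say) before going into JSON; the sketch ignores that.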

Anyway assuming I get this right we'll have a filesystem which is replication-friendly and snapshot-able. A fun combination.

ObQuote: "We had been everywhere. We had really seen nothing." - Lolita


 

Comments On This Entry

[gravitar] Andrew

Submitted at 02:03:28 on 6 march 2011

Thanks - seems to work fine. One thing I don't get: how or where is file content saved?

[author] Steve Kemp

Submitted at 03:02:32 on 6 march 2011

For each new object you create, be it a file, a directory, or a symlink, a new number will be allocated. (See the source code function "get_next_inode()".)

The new entry will have a lot of keys created for it, in Redis, to hold different pieces of information. These keys will hold data such as the owner, the group ID, the access time, the creation time, and so on.

The very first number you create, once you mount the filesystem, will be 1. The second will be 2, and so on. And this number will be part of the keys stored in redis. So for example you might see you have keys like this:

  • SKX:INODE:1:GID - The file owner, for file number 1.
  • SKX:INODE:6:ATIME - The access-time of the file, for file number 6.

Some keys will be present for all directory entries. Others are type-specific. (You'll see "SKX:INODE:4:TYPE", for example, which will hold one of "file", "dir", or "link".)

For a file, the contents are stored in the key "SKX:INODE:??:DATA", with "SKX:INODE:??:SIZE" holding the size of that data.
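To make the key layout concrete, here is a small sketch that builds the key names and reads a file's contents back. It uses a dict in place of Redis, and the helper names inode_key and read_file are invented for the example; only the key patterns themselves come from the description above.

```python
def inode_key(prefix, inode, field):
    """Build a key name such as 'SKX:INODE:1:GID'."""
    return f"{prefix}:INODE:{inode}:{field}"


def read_file(store, prefix, inode):
    """Return the contents of a file inode, refusing other entry types."""
    kind = store.get(inode_key(prefix, inode, "TYPE"))
    if kind != "file":
        raise ValueError(f"inode {inode} is {kind!r}, not a file")
    # An empty file may have no DATA key at all, so default to "".
    return store.get(inode_key(prefix, inode, "DATA"), "")


store = {
    "SKX:INODE:4:TYPE": "file",
    "SKX:INODE:4:NAME": "notes.txt",
    "SKX:INODE:4:DATA": "hello",
    "SKX:INODE:4:SIZE": "5",
    "SKX:INODE:5:TYPE": "dir",
}
print(inode_key("SKX", 6, "ATIME"))
print(read_file(store, "SKX", 4))
```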

If you get bored you can play around with:

$ redis-cli keys \*

That'll show you the names of each key stored in redis at that moment in time. Then:

$ redis-cli get SKX:INODE:1:NAME

To see the name of the first file, and so on.

Hope that helps.


[gravitar] yaarg

Submitted at 15:49:05 on 6 march 2011

Nice. Out of interest, what are you using this for?

[author] Steve Kemp

Submitted at 08:59:21 on 7 march 2011

At the moment I'm using this to store tracker information in, for a global distributed tracking client/server.

The tracker allows file lookups, but requires a shared storage area to function.

[author] Steve Kemp

Submitted at 09:06:47 on 7 march 2011

As I've already described there are lots of keys for file entries. Those same keys are used for subdirectories.

The only difference is that I also use a "SET" of directory members. If there are three files in the "/" directory I'll have a set called:


skx:/

That set will contain the entries "2", "4", and "1", which allow for all the lookups on the keys I've mentioned previously, e.g. "skx:INODE:2:NAME".
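The lookup described here, from directory set to member numbers to names, could be sketched like this. A dict stands in for Redis, and list_directory and the file names are made up; with a real server the two steps would be SMEMBERS on the directory key followed by a GET per member.

```python
def list_directory(store, prefix, path):
    """Return the sorted names of the entries in a directory."""
    # The directory key holds a set of inode numbers, e.g. {"2", "4", "1"}.
    members = store.get(f"{prefix}:{path}", set())
    # Each member number leads to that entry's NAME key.
    return sorted(store[f"{prefix}:INODE:{inode}:NAME"] for inode in members)


store = {
    "skx:/": {"2", "4", "1"},
    "skx:INODE:1:NAME": "passwd",
    "skx:INODE:2:NAME": "hosts",
    "skx:INODE:4:NAME": "fstab",
}
print(list_directory(store, "skx", "/"))
```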


[gravitar] Andrew

Submitted at 16:09:28 on 7 march 2011

@Steve: thanks for your detailed answer above. I'd mounted redisfs and used touch to create some test files. Of course, no file data = no *DATA key. Apparently I need more coffee.

I ran redisfs in debug mode and saw fs_create and fs_write called when adding a file to the filesystem. I'd be interested in creating files using Python, so I suppose the way to do this is to create a Python extension to call these functions, is that correct?

[author] Steve Kemp

Submitted at 17:54:02 on 7 march 2011

I know almost nothing about Python, but if you were wanting to create files and "inject" them into the filesystem I'm sure you could use a redis client library & Python code to do so.

Perhaps the saner approach might be to take a Python redis client and a Python FUSE client and combine the two, in the same way that I did with the C code.
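For what the "inject" route might look like, here is a hedged sketch. It only calls incr, set and sadd on whatever client it is given, which are methods a Python redis client such as redis-py provides, and it ships a tiny in-memory FakeRedis so it runs without a server. The counter key name is hypothetical (the real one lives in get_next_inode() in the source), and a real mount will very likely expect more keys than this writes (owner, group, times and so on), so treat it as a starting point rather than a working importer.

```python
import time


class FakeRedis:
    """In-memory stand-in offering the three client methods used below."""

    def __init__(self):
        self.data = {}

    def incr(self, name):
        self.data[name] = int(self.data.get(name, 0)) + 1
        return self.data[name]

    def set(self, name, value):
        self.data[name] = value
        return True

    def sadd(self, name, *values):
        members = self.data.setdefault(name, set())
        before = len(members)
        members.update(str(v) for v in values)
        return len(members) - before


def inject_file(client, prefix, directory, name, data,
                counter_key="SKX:EXAMPLE_INODE_COUNTER"):
    """Create the keys for one new file and add it to its directory set."""
    # HYPOTHETICAL counter key: check get_next_inode() for the real name.
    inode = client.incr(counter_key)
    base = f"{prefix}:INODE:{inode}"
    client.set(f"{base}:NAME", name)
    client.set(f"{base}:TYPE", "file")
    client.set(f"{base}:DATA", data)
    client.set(f"{base}:SIZE", len(data))
    client.set(f"{base}:ATIME", int(time.time()))
    # Register the new number in the parent directory's set of members.
    client.sadd(f"{prefix}:{directory}", inode)
    return inode


demo = FakeRedis()
number = inject_file(demo, "SKX", "/", "hello.txt", "hello world")
print(number, sorted(demo.data))
```

With a live server you would pass a real client object in place of FakeRedis. The writes are not atomic, so injecting while the filesystem is mounted and busy could race with it.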

 

