Entries posted in May 2015

A brief examination of tahoe-lafs

Thursday, 28 May 2015

Continuing the theme from the last post I made, I've recently started working my way down the list of existing object-storage implementations.

tahoe-LAFS is a well-established project which looked like a good fit for my needs:

  • Simple API.
  • Good handling of redundancy.

Getting the system up and running, on four nodes, was very simple. Setup a single/simple "introducer" which is a well-known node that all hosts can use to find each other, and then setup four deamons for storage.

When files are uploaded they are split into chunks, and these chunks are then distributed amongst the various nodes. There are some configuration settings which determine how many chunks files are split into (10 by default), how many chunks are required to rebuild the file (3 by default) and how many copies of the chunks will be created.

The biggest problem I have with tahoe is that there is no rebalancing support: Setup four nodes, and the space becomes full? You can add more nodes, new uploads go to the new nodes, while old ones stay on the old. Similarly if you change your replication-counts because you're suddenly more/less paranoid this doesn't affect existing nodes.

In my perfect world you'd distribute blocks around pretty optimistically, and I'd probably run more services:

  • An introducer - To allow adding/removing storage-nodes on the fly.
  • An indexer - to store the list of "uploads", meta-data, and the corresponding block-pointers.
  • The storage-nodes - to actually store the damn data.

The storage nodes would have the primitives "List all blocks", "Get block", "Put block", and using that you could ensure that each node had sent its data to at least N other nodes. This could be done in the background.

The indexer would be responsible for keeping track of which blocks live where, and which blocks are needed to reassemble upload N. There's probably more that it could do.

| 2 comments.

 

On de-duplicating uploaded file-content.

Thursday, 7 May 2015

This evening I've been mostly playing with removing duplicate content. I've had this idea for the past few days about object-storage, and obviously in that context if you can handle duplicate content cleanly that's a big win.

The naive implementation of object-storage involves splitting uploaded files into chunks, storing them separately, and writing database-entries such that you can reassemble the appropriate chunks when the object is retrieved.

If you store chunks on-disk, by the hash of their contents, then things are nice and simple.

The end result is that you might upload the file /etc/passwd, split that into four-byte chunks, and then hash each chunk using SHA256.

This leaves you with some database-entries, and a bunch of files on-disk:

/tmp/hashed/ef267892ee080862c96a8d2d05de62f48e20f0875f27379e7d58c73ea4455bf1
/tmp/hashed/a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921
..
/tmp/hashed/3805b0245bc8375be7125ae228eef711552ac082ffb9bf8756e2964a2393a9de

In my toy-code I wrote out the data in 4-byte chunks, which is grossly ineffeciant. But the value of using such small pieces is that there is liable to be a lot of collisions, and that means we save-space. It is a trade-off.

So the main thing I was experimenting with was the size of the chunks. If you make them too small you lose I/O due to the overhead of writing out so many small files, but you gain because collisions are common.

The rough testing I did involved using chunks of 16, 32, 128, 255, 512, 1024, 2048, and 4096 bytes. As sizes went up the overhead shrank, but also so did the collisions.

Unless you could handle the case of users uploading a lot of files like /bin/ls which are going to collide 100% of the time with prior uploads using larger chunks just didn't win as much as I thought they would.

I wrote a toy server using Sinatra & Ruby, which handles the splitting/hashing/and stored block-IDs in SQLite. It's not so novel given that it took only an hour or so to write.

The downside of my approach is also immediately apparent. All the data must live on a single machine - so that reassmbly works in the simple fashion. That's possible, even with lots of content if you use GlusterFS, or similar, but it's probably not a great approach in general. If you have large capacity storage avilable locally then this might would well enough for storing backups, etc, but .. yeah.

| 7 comments.

 

A weekend of migrations

Monday, 4 May 2015

This weekend has been all about migrations:

Host Migrations

I've migrated several more systems to the Jessie release of Debian GNU/Linux. No major surprises, and now I'm in a good state.

I have 18 hosts, and now 16 of them are running Jessie. One of them I won't touch for a while, and the other is a KVM-host which runs about 8 guests - so I won't upgraded that for a while (because I want to schedule the shutdown of the guests for the host-reboot).

Password Migrations

I've started migrating my passwords to pass, which is a simple shell wrapper around GPG. I generated a new password-managing key, and started migrating the passwords.

I dislike that account-names are stored in plaintext, but that seems known and unlikely to be fixed.

I've "solved" the problem by dividing all my accounts into "Those that I wish to disclose post-death" (i.e. "banking", "amazon", "facebook", etc, etc), and those that are "never to be shared". The former are migrating, the latter are not.

(Yeah I'm thinking about estates at the moment, near-death things have that effect!)

| No comments

 

Recent Posts

Recent Tags