Now that I've got a citizen-ID, a pair of Finnish bank accounts, and have enrolled in a Finnish language-course (due to start next month), I guess I can go back to looking at object stores and replicated filesystems.
To recap, my current favourite, despite the lack of documentation, is the Camlistore project, which is written in Go.
Looking around, there are lots of interesting projects being written in Go, and the next one I'm looking at is among them: seaweedfs, which despite its name is not a filesystem at all, but a store which is accessed via HTTP.
Installation is simple, if you have a working go-lang environment:
go get github.com/chrislusf/seaweedfs/go/weed
Once that completes you'll find you have the executable bin/weed placed beneath your $GOPATH. This single binary is used for everything, though it is worth noting that there are distinct roles:
- A key concept in weed is "volumes". Volumes are areas to which files are written. Volumes may be replicated, and this replication is decided on a per-volume basis, rather than a per-upload one.
- Clients talk to a master. The master notices when volumes spring into existence, or go away. For high-availability you can run multiple masters, and they elect the real master (via Raft).
In our demo we'll have three hosts: one, the master, and two and three, which are storage nodes. First of all we start the master:
root@one:~# mkdir /node.info
root@one:~# weed master -mdir /node.info -defaultReplication=001
Then on the storage nodes we start them up:
root@two:~# mkdir /data;
root@two:~# weed volume -dir=/data -max=1 -mserver=one.our.domain:9333
Then the second storage-node:
root@three:~# mkdir /data;
root@three:~# weed volume -dir=/data -max=1 -mserver=one.our.domain:9333
At this point we have a master to which we'll talk (on port :9333), and a pair of storage-nodes which will accept commands over :8080. We've configured replication such that all uploads will go to both volumes. (The -max=1 configuration ensures that each volume-store will only create one volume. This is in the interest of simplicity.)
Uploading content works in two phases:
- First tell the master you wish to upload something, to gain an ID in response.
- Then using the upload-ID actually upload the object.
We'll do that like so:
client ~ $ curl -X POST http://one.our.domain:9333/dir/assign
client ~ $ curl -X PUT -F file=@/etc/passwd http://192.168.1.101:8080/1,06c3add5c3
In the first command we call /dir/assign, and receive a JSON response which contains the IPs/ports of the storage-nodes, along with a "file ID", or fid. In the second command we pick one of the returned hosts at random (they are the IPs of our storage nodes) and make the upload using the given ID.
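The two requests above are easy to wrap in one call. What follows is only a rough sketch: the function names are made up, and I'm assuming the assign reply is a flat JSON object with string fields called "fid" and "url" (the latter being a host:port pair). It deliberately avoids depending on jq, so the JSON handling is crude.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the "fid"/"url" field
# names are assumptions about the /dir/assign reply described above.
MASTER=${MASTER:-one.our.domain:9333}

# json_field JSON NAME: print one string field from a flat JSON object.
# Good enough for a one-line reply; this is not a general JSON parser.
json_field() {
    printf '%s\n' "$1" |
        sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# upload_url JSON: build the storage-node URL from an assign reply.
upload_url() {
    printf 'http://%s/%s\n' "$(json_field "$1" url)" "$(json_field "$1" fid)"
}

# upload FILE: claim an ID from the master, upload FILE to the storage
# node it names, and print the file ID on success.
upload() {
    reply=$(curl -s -X POST "http://$MASTER/dir/assign") || return 1
    fid=$(json_field "$reply" fid)
    [ -n "$fid" ] || return 1
    curl -sf -X PUT -F "file=@$1" "$(upload_url "$reply")" >/dev/null || return 1
    printf '%s\n' "$fid"
}
```

With those functions sourced, something like `fid=$(upload /etc/passwd)` should leave the new ID in $fid, or fail if either request does.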
If the upload succeeds it will be written to both volumes, which we can see directly by running strings on the files beneath /data on the two nodes.
The next part is retrieving a file by ID, and we can do that by asking the master server where that ID lives:
client ~ $ curl 'http://one.our.domain:9333/dir/lookup?volumeId=1,06c3add5c3'
Or, if we prefer we could just fetch via the master - it will issue a redirect to one of the volumes that contains the file:
client ~ $ curl http://one.our.domain:9333/1,06c3add5c3
<a href="http://192.168.1.100:8080/1,06c3add5c3">Moved Permanently</a>
If you follow redirections then it'll download, as you'd expect:
client ~ $ curl -L http://one.our.domain:9333/1,06c3add5c3
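For completeness, here is a small sketch of the retrieval side. The helper names are invented, and it assumes the part of the ID before the comma is the volume number, which is how the "1,06c3add5c3" example reads to me.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.
MASTER=${MASTER:-one.our.domain:9333}

# volume_of FID: print the volume number, assumed to be everything
# before the first comma in the file ID.
volume_of() {
    printf '%s\n' "${1%%,*}"
}

# lookup_url FID: the master URL to ask where that volume lives.
lookup_url() {
    printf 'http://%s/dir/lookup?volumeId=%s\n' "$MASTER" "$(volume_of "$1")"
}

# fetch FID OUTFILE: download via the master, following its redirect
# to whichever storage node holds the file.
fetch() {
    curl -sfL -o "$2" "http://$MASTER/$1"
}
```

So `fetch 1,06c3add5c3 /tmp/passwd.copy` would save the object locally, and `curl "$(lookup_url 1,06c3add5c3)"` would show which nodes hold volume 1.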
That's about all you need to know to decide if this is for you. In short, uploads require two requests: one to claim an identifier, and one to use it. Downloads require that your storage-volumes be publicly accessible, and will probably require a proxy of some kind to make them visible on :80, or :443.
A single "weed volume .." process, which runs as a volume-server, can support multiple volumes, which are created on-demand, but I've explicitly preferred to limit them here. I'm not 100% sure yet whether it's a good idea to allow creation of multiple volumes or not. There are space implications, and you need to read about replication before you go too far down the rabbit-hole. There is the notion of "data centres" and "racks", such that you can pretend different IPs are different locations and ensure that data is replicated across them, or only within them, but these choices will depend on your needs.
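As I understand the seaweedfs replication documentation, the three digits of a setting such as the -defaultReplication=001 used earlier count the extra copies kept in other data centres, on other racks, and on other servers in the same rack, in that order. Treat that reading as an assumption and check it against the project's own docs. Under it, the total number of copies is easy to work out; the function name below is made up for illustration.

```shell
#!/bin/sh
# replication_copies CODE: print the total number of copies implied by
# a three-digit replication code. Assumes each digit is a count of
# extra copies (other data centres, other racks, same rack), so the
# total is one original plus the sum of the digits.
replication_copies() {
    code=$1
    case $code in
        [0-9][0-9][0-9]) ;;
        *) echo "expected three digits, got '$code'" >&2; return 1 ;;
    esac
    dc=${code%??}        # first digit: other data centres
    rest=${code#?}
    rack=${rest%?}       # second digit: other racks
    server=${code#??}    # third digit: other servers, same rack
    echo $((1 + dc + rack + server))
}
```

For the demo's 001 this gives two copies, which matches the upload landing on both storage nodes.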
Writing a thin middleware/shim to allow uploads to be atomic seems simple enough, and there are options to allow exporting the data from the volumes as .tar files, so I have no undue worries about data-storage.
This system seems reliable, and it seems well designed, but people keep saying "I'm not using it in production because .. nobody else is" which is an unfortunate problem to have.
Anyway, I like it. The biggest omission is really authentication. All files are public if you know their IDs, but at least they're not sequential ..