Entries tagged go

Related tags: file-hosting, github, golang, object-storage, sinatra.

All about sharing files easily

Sunday, 13 September 2015

Although I've been writing a bit recently about file-storage, this post is about something much more simple: Just making a random file or two available on an ad-hoc basis.

In the past I used to have my email and website(s) hosted on the same machine, and that machine was well connected. Making a file visible just involved running ~/bin/publish, which used scp to write a file beneath an apache document-root.

These days I use "my computer", "my work computer", and "my work laptop", amongst other hosts. The SSH-keys required to access my personal boxes are not necessarily available on all of these hosts. Add in firewall constraints and suddenly there isn't an obvious way for me to say "Publish this file online, and show me the root".

I asked on twitter but nothing useful jumped out. So I ended up writing a simple server, via sinatra which would allow:

  • Login via the site, and a browser. The login-form looks sexy via bootstrap.
  • Upload via a web-form, once logged in. The upload-form looks sexy via bootstrap.
  • Or, entirely seperately, with HTTP-basic-auth and a HTTP POST (i.e. curl)

This worked, and was even secure-enough, given that I run SSL if you import my CA file.

But using basic auth felt like cheating, and I've been learning more Go recently, and I figured I should start taking it more seriously, so I created a small repository of learning-programs. The learning programs started out simply, but I did wire up a simple TOTP authenticator.

Having TOTP available made me rethink things - suddenly even if you're not using SSL having an eavesdropper doesn't compromise future uploads.

I'd also spent a few hours working out how to make extensible commands in go, the kind of thing that lets you run:

cmd sub-command1 arg1 arg2
cmd sub-command2 arg1 .. argN

The solution I came up with wasn't perfect, but did work, and allow the seperation of different sub-command logic.

So suddenly I have the ability to run "subcommands", and the ability to authenticate against a time-based secret. What is next? Well the hard part with golang is that there are so many things to choose from - I went with gorilla/mux as my HTTP-router, then I spend several hours filling in the blanks.

The upshot is now that I have a TOTP-protected file upload site:

publishr init    - Generates the secret
publishr secret  - Shows you the secret for import to your authenticator
publishr serve   - Starts the HTTP daemon

Other than a lack of comments, and test-cases, it is complete. And stand-alone. Uploads get dropped into ./public, and short-links are generated for free.

If you want to take a peak the code is here:

The only annoyance is the handling of dependencies - which need to be "go got ..". I guess I need to look at godep or similar, for my next learning project.

I guess there's a minor gain in making this service available via golang. I've gained protection against replay attacks, assuming non-SSL environment, and I've simplified deployment. The downside is I can no longer login over the web, and I must use curl, or similar, to upload. Acceptible tradeoff.

| 2 comments.

 

A brief look at the weed file store

Monday, 10 August 2015

Now that I've got a citizen-ID, a pair of Finnish bank accounts, and have enrolled in a Finnish language-course (due to start next month) I guess I can go back to looking at object stores, and replicated filesystems.

To recap my current favourite, despite the lack of documentation, is the Camlistore project which is written in Go.

Looking around there are lots of interesting projects being written in Go, and so is my next one the seaweedfs, which despite its name is not a filesystem at all, but a store which is accessed via HTTP.

Installation is simple, if you have a working go-lang environment:

go get github.com/chrislusf/seaweedfs/go/weed

Once that completes you'll find you have the executable bin/weed placed beneath your $GOPATH. This single binary is used for everything though it is worth noting that there are distinct roles:

  • A key concept in weed is "volumes". Volumes are areas to which files are written. Volumes may be replicated, and this replication is decided on a per-volume basis, rather than a per-upload one.
  • Clients talk to a master. The master notices when volumes spring into existance, or go away. For high-availability you can run multiple masters, and they elect the real master (via RAFT).

In our demo we'll have three hosts one, the master, two and three which are storage nodes. First of all we start the master:

root@one:~# mkdir /node.info
root@one:~# weed master -mdir /node.info -defaultReplication=001

Then on the storage nodes we start them up:

root@two:~# mkdir /data;
root@two:~# weed volume -dir=/data -max=1  -mserver=one.our.domain:9333

Then the second storage-node:

root@three:~# mkdir /data;
root@three:~# weed volume -dir=/data -max=1 -mserver=one.our.domain:9333

At this point we have a master to which we'll talk (on port :9333), and a pair of storage-nodes which will accept commands over :8080. We've configured replication such that all uploads will go to both volumes. (The -max=1 configuration ensures that each volume-store will only create one volume each. This is in the interest of simplicity.)

Uploading content works in two phases:

  • First tell the master you wish to upload something, to gain an ID in response.
  • Then using the upload-ID actually upload the object.

We'll do that like so:

laptop ~ $ curl -X POST http://one.our.domain:9333/dir/assign
{"fid":"1,06c3add5c3","url":"192.168.1.100:8080","publicUrl":"192.168.1.101:8080","count":1}

client ~ $ curl -X PUT -F file=@/etc/passwd  http://192.168.1.101:8080/1,06c3add5c3
{"name":"passwd","size":2137}

In the first command we call /dir/assign, and receive a JSON response which contains the IPs/ports of the storage-nodes, along with a "file ID", or fid. In the second command we pick one of the hosts at random (which are the IPs of our storage nodes) and make the upload using the given ID.

If the upload succeeds it will be written to both volumes, which we can see directly by running strings on the files beneath /data on the two nodes.

The next part is retrieving a file by ID, and we can do that by asking the master server where that ID lives:

client ~ $ curl http://one.our.domain:9333/dir/lookup?volumeId=1,06c3add5c3
{"volumeId":"1","locations":[
 {"url":"192.168.1.100:8080","publicUrl":"192.168.1.100:8080"},
 {"url":"192.168.1.101:8080","publicUrl":"192.168.1.101:8080"}
]}

Or, if we prefer we could just fetch via the master - it will issue a redirect to one of the volumes that contains the file:

client ~$ curl http://one.our.domain:9333/1,06c3add5c3
<a href="http://192.168.1.100:8080/1,06c3add5c3">Moved Permanently</a>

If you follow redirections then it'll download, as you'd expect:

client ~ $ curl -L http://one.our.domain:9333/1,06c3add5c3
root:x:0:0:root:/root:/bin/bash
..

That's about all you need to know to decide if this is for you - in short uploads require two requests, one to claim an identifier, and one to use it. Downloads require that your storage-volumes be publicly accessible, and will probably require a proxy of some kind to make them visible on :80, or :443.

A single "weed volume .." process, which runs as a volume-server can support multiple volumes, which are created on-demand, but I've explicitly preferred to limit them here. I'm not 100% sure yet whether it's a good idea to allow creation of multiple volumes or not. There are space implications, and you need to read about replication before you go too far down the rabbit-hole. There is the notion of "data centres", and "racks", such that you can pretend different IPs are different locations and ensure that data is replicated across them, or only within-them, but these choices will depend on your needs.

Writing a thin middleware/shim to allow uploads to be atomic seems simple enough, and there are options to allow exporting the data from the volumes as .tar files, so I have no undue worries about data-storage.

This system seems reliable, and it seems well designed, but people keep saying "I'm not using it in production because .. nobody else is" which is an unfortunate problem to have.

Anyway, I like it. The biggest omission is really authentication. All files are public if you know their IDs, but at least they're not sequential ..

| No comments

 

Recent Posts

Recent Tags