10 August 2015 21:50
Now that I've got a citizen-ID, a pair of Finnish bank accounts, and have enrolled in a Finnish language-course (due to start next month), I guess I can go back to looking at object stores and replicated filesystems.
To recap, my current favourite, despite the lack of documentation, is the Camlistore project, which is written in Go.
Looking around, there are lots of interesting projects being written in Go, and so is my next one: seaweedfs, which despite its name is not a filesystem at all, but a store which is accessed via HTTP.
Installation is simple, if you have a working go-lang environment:
go get github.com/chrislusf/seaweedfs/go/weed
Once that completes you'll find the executable bin/weed beneath your $GOPATH. This single binary is used for everything, though it is worth noting that there are distinct roles:
- A key concept in weed is "volumes". Volumes are areas to which files are written. Volumes may be replicated, and this replication is decided on a per-volume basis, rather than a per-upload one.
- Clients talk to a master. The master notices when volumes spring into existence, or go away. For high-availability you can run multiple masters, and they elect the real master (via Raft).
In our demo we'll have three hosts: one, the master, and two and three, which are storage nodes. First of all we start the master:
root@one:~# mkdir /node.info
root@one:~# weed master -mdir /node.info -defaultReplication=001
Then we start the first storage node:
root@two:~# mkdir /data;
root@two:~# weed volume -dir=/data -max=1 -mserver=one.our.domain:9333
Then the second storage-node:
root@three:~# mkdir /data;
root@three:~# weed volume -dir=/data -max=1 -mserver=one.our.domain:9333
At this point we have a master to which we'll talk (on port :9333), and a pair of storage-nodes which will accept commands over :8080. We've configured replication such that all uploads will go to both volumes. (The -max=1 configuration ensures that each volume-server will only create a single volume. This is in the interest of simplicity.)
Uploading content works in two phases:
- First tell the master you wish to upload something, to gain an ID in response.
- Then using the upload-ID actually upload the object.
We'll do that like so:
client ~ $ curl -X POST http://one.our.domain:9333/dir/assign
client ~ $ curl -X PUT -F file=@/etc/passwd http://192.168.1.101:8080/1,06c3add5c3
In the first command we call /dir/assign and receive a JSON response which contains the IPs/ports of the storage-nodes, along with a "file ID", or fid. In the second command we pick one of the returned hosts at random (these are the IPs of our storage nodes) and make the upload using the given ID.
If the upload succeeds it will be written to both volumes, which we can see directly by running strings on the files beneath /data on the two nodes.
The next part is retrieving a file by ID, and we can do that by asking the master server where that ID lives:
client ~ $ curl http://one.our.domain:9333/dir/lookup?volumeId=1,06c3add5c3
Or, if we prefer we could just fetch via the master - it will issue a redirect to one of the volumes that contains the file:
client ~ $ curl http://one.our.domain:9333/1,06c3add5c3
<a href="http://192.168.1.100:8080/1,06c3add5c3">Moved Permanently</a>
If you follow redirections then it'll download, as you'd expect:
client ~ $ curl -L http://one.our.domain:9333/1,06c3add5c3
That's about all you need to know to decide if this is for you - in short, uploads require two requests, one to claim an identifier and one to use it. Downloads require that your storage-volumes be publicly accessible, and will probably require a proxy of some kind to make them visible on :80 or :443.
A single "weed volume .." process, which runs as a volume-server, can support multiple volumes, which are created on-demand, but I've explicitly limited them here. I'm not 100% sure yet whether it's a good idea to allow creation of multiple volumes or not. There are space implications, and you need to read about replication before you go too far down the rabbit-hole. There is the notion of "data centres" and "racks", such that you can pretend different IPs are different locations and ensure that data is replicated across them, or only within them, but these choices will depend on your needs.
Writing a thin middleware/shim to allow uploads to be atomic seems simple enough, and there are options to allow exporting the data from the volumes as .tar files, so I have no undue worries about data-storage.
This system seems reliable, and it seems well designed, but people keep saying "I'm not using it in production because .. nobody else is" which is an unfortunate problem to have.
Anyway, I like it. The biggest omission is really authentication. All files are public if you know their IDs, but at least they're not sequential ..
Tags: go, object-storage
13 September 2015 21:50
Although I've been writing a bit recently about file-storage, this post is about something much simpler: just making a random file or two available on an ad-hoc basis.
In the past I used to have my email and website(s) hosted on the same machine, and that machine was well connected. Making a file visible just involved running ~/bin/publish, which used scp to write a file beneath an apache document-root.
These days I use "my computer", "my work computer", and "my work laptop", amongst other hosts. The SSH-keys required to access my personal boxes are not necessarily available on all of these hosts. Add in firewall constraints and suddenly there isn't an obvious way for me to say "Publish this file online, and show me the root".
I asked on Twitter, but nothing useful jumped out, so I ended up writing a simple server, using Sinatra, which would allow:
- Login via the site, and a browser. The login-form looks sexy via bootstrap.
- Upload via a web-form, once logged in. The upload-form looks sexy via bootstrap.
- Or, entirely separately, with HTTP-basic-auth and an HTTP POST (i.e. curl)
This worked, and was even secure-enough, given that I run SSL if you import my CA file.
But using basic auth felt like cheating, and I've been learning more Go recently, and I figured I should start taking it more seriously, so I created a small repository of learning-programs. The learning programs started out simply, but I did wire up a simple TOTP authenticator.
Having TOTP available made me rethink things - suddenly, even if you're not using SSL, an eavesdropper can't compromise future uploads.
I'd also spent a few hours working out how to make extensible commands in Go, the kind of thing that lets you run:
cmd sub-command1 arg1 arg2
cmd sub-command2 arg1 .. argN
The solution I came up with wasn't perfect, but it did work, and it allowed the separation of different sub-command logic.
So suddenly I have the ability to run "subcommands", and the ability to authenticate against a time-based secret. What is next? Well, the hard part with golang is that there are so many things to choose from - I went with gorilla/mux as my HTTP-router, then I spent several hours filling in the blanks.
The upshot is that I now have a TOTP-protected file upload site:
publishr init - Generates the secret
publishr secret - Shows you the secret for import to your authenticator
publishr serve - Starts the HTTP daemon
Other than a lack of comments, and test-cases, it is complete. And stand-alone. Uploads get dropped into ./public, and short-links are generated for free.
If you want to take a peek, the code is here:
The only annoyance is the handling of dependencies, which need to be "go got ..". I guess I need to look at godep or similar for my next learning project.
I guess there's a minor gain in making this service available via golang. I've gained protection against replay attacks, assuming a non-SSL environment, and I've simplified deployment. The downside is that I can no longer log in over the web, and I must use curl, or similar, to upload. Acceptable trade-off.
Tags: file-hosting, github, go, golang, sinatra
16 September 2020 21:00
Four years ago somebody posted a comment-thread describing how you could start writing a little reverse-polish calculator, in C, and slowly improve it until you had written a minimal FORTH-like system:
At the time I read that comment I'd just hacked up a simple FORTH REPL of my own, in Perl, and I said "thanks for posting". I was recently reminded of this discussion, and decided to work through the process.
Using only minimal outside resources the recipe worked as expected!
The end-result is I have a working FORTH-lite, or FORTH-like, interpreter written in around 2000 lines of golang! Features include:
- Reverse-Polish mathematical operations.
- Comments between ( and ) are ignored, as expected.
- Single-line comments, from \ to the end of the line, are also supported.
- Support for floating-point numbers (anything that will fit inside a float64).
- Support for printing the top-most stack element (.).
- Support for outputting ASCII characters (emit).
- Support for outputting strings (." Hello, World ").
- Support for basic stack operations (e.g. dup, drop, over).
- Support for loops, via do/loop.
- Support for conditional-execution, via if/else/then.
- Load any files specified on the command-line
- If no arguments are included run the REPL
- A standard library is loaded, from the present directory, if it is present.
To give a flavour, here we define a word called star, which just outputs a single star character:
: star 42 emit ;
Now we can call that (NOTE: We didn't add a newline here, so the REPL prompt follows it, that's expected):
To make it more useful we define the word "stars", which shows N stars:
> : stars dup 0 > if 0 do star loop else drop then ;
> 0 stars
> 1 stars
*> 2 stars
**> 10 stars
This example uses both if, to test that the parameter on the stack was greater than zero, and do/loop to handle the repetition.
Finally we use that to draw a box:
> : squares 0 do over stars cr loop ;
> 4 squares
> 10 squares
For fun we allow decompiling the words too:
> #words 0 do dup dump loop
0: store 1.000000
0: store 0.000000
4: [cond-jmp 7.000000]
Anyway, if that is at all interesting, feel free to take a peek. There's a bit of hackery there to avoid the use of return-stacks, etc. Compared to gforth this is actually more featureful in some areas:
- I allow you to use conditionals in the REPL - outside a word-definition.
- I allow you to use loops in the REPL - outside a word-definition.
Find the code here:
Tags: forth, github, go, golang, hackernews
22 September 2020 13:00
So my previous post was all about implementing a simple FORTH-like language. Of course the obvious question is then "What do you do with it"?
So I present one possible use - turtle-graphics:
\ Draw a square of the given length/width
dup dup dup dup
4 0 do
\ pen down
\ move to the given pixel
100 100 move
\ draw a square of width 50 pixels
\ save the result (png + gif)
Tags: forth, github, go, golang, turtle
3 October 2020 13:00
Recently I've been writing a couple of simple compilers, which take input in a particular format and generate assembly language output. This output can then be piped through gcc to generate a native executable.
Public examples include this trivial math compiler and my brainfuck compiler.
Of course there's always the nagging thought that relying upon external tools (gcc, or nasm) is a bit of a cheat. So I wondered: how hard is it to write an assembler, something that would take an assembly-language program and generate a native (ELF) binary?
And the answer is "It isn't hard, it is just tedious".
I found some code to generate an ELF binary, and after that assembling simple instructions was straightforward. I remember from my assembly-language days that the encoding of instructions can largely be handled by tables, but I've not yet gone into that.
(Specifically, there are instructions like "add rax, rcx", and the encoding specifies the source/destination registers, with different forms for various sized immediates.)
Anyway, I hacked up a simple assembler; it can compile a.out from this input:
.hello DB "Hello, world\n"
.goodbye DB "Goodbye, world\n"
mov rdx, 13 ;; write this many characters
mov rcx, hello ;; starting at the string
mov rbx, 1 ;; output is STDOUT
mov rax, 4 ;; sys_write
int 0x80 ;; syscall
mov rdx, 15 ;; write this many characters
mov rcx, goodbye ;; starting at the string
mov rax, 4 ;; sys_write
mov rbx, 1 ;; output is STDOUT
int 0x80 ;; syscall
xor rbx, rbx ;; exit-code is 0
xor rax, rax ;; syscall will be 1 - so set to zero, then increase
inc rax ;;
int 0x80 ;; syscall
The obvious omission is support for "JMP", "JMP_NZ", etc. That's painful because jumps are encoded with relative offsets. For the moment if you want to jump:
push foo ; "jmp foo" - indirectly.
nop ; Nothing happens
mov rbx,33 ; first syscall argument: exit code
mov rax,1 ; system call number (sys_exit)
int 0x80 ; call kernel
push bar ; "jmp bar" - indirectly.
I'll update it to add some more instructions, and see if I can use it to handle the output I generate from a couple of other tools. If so, that's a win; if not, it was a fun learning experience:
Tags: asm, assembly, github, go, golang
20 October 2020 13:00
For the past few years I've had a bunch of virtual machines hosting websites, services, and servers. Of course I want them to be available - especially since I charge people money to access some of them (for example my dns-hosting service) - and that means I want to know when they're not.
The way I've gone about this is to have a bunch of machines running stuff, and then dedicate an entirely separate machine solely for monitoring and alerting. Sure you can run local monitoring, testing that services are available, the root-disk isn't full, and that kind of thing. But only by testing externally can you see if the machine is actually available to end-users, customers, or friends.
A local-agent might decide "I'm fine", but if the hosting-company goes dark due to a fibre cut you're screwed.
I've been hosting my services with Hetzner (cloud) recently, and their service is generally pretty good. Unfortunately I've started to see an increasing number of false-alarms. I'd have a server in Germany, with the monitoring machine in Helsinki (coincidentally where I live!). For the past month I've been getting pinged with a failure every three or four days on average: "service down - dns failed", or "service down - timeout". When a notice woke me up I'd go and check, and everything would be fine; it was a very transient failure.
To be honest, the reason for this is that my monitoring is just too damn aggressive: I like to be alerted immediately in case something is wrong. That means I get an alert if a single test fails, rather than only after something more reasonable, like three or more consecutive failures.
I'm experimenting with monitoring in a less aggressive fashion, from my home desktop. Since my monitoring tool is a single self-contained golang binary, and it is already packaged as a docker-based container, deployment was trivial. I did a little work writing an agent to receive failure-notices and ping me via telegram, instead of the previous approach where I had an online status-page which I could view via my mobile, and alerts via pushover.
So far it looks good. I've tweaked the monitoring to use a timeout of 15 seconds, instead of 5, and I've configured it to only alert me if there is an outage which lasts for two or more consecutive failures. I guess the TL;DR is that I now do offsite monitoring .. from my house, rather than from a different region.
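The rule "only alert after N consecutive failures" is easy to express in code. The sketch below shows one way to do it. It is not overseer's actual implementation, and the `alerter` type and `threshold` field are hypothetical names. A success resets the streak, and the alert fires once per outage, at the moment the streak reaches the threshold.

```go
package main

import "fmt"

// alerter suppresses transient failures: it fires only when a target has
// failed `threshold` consecutive tests, and only once per outage.
type alerter struct {
	threshold int
	failures  map[string]int
}

func newAlerter(threshold int) *alerter {
	return &alerter{threshold: threshold, failures: make(map[string]int)}
}

// record notes one test result and reports whether to send an alert now.
func (a *alerter) record(target string, ok bool) bool {
	if ok {
		delete(a.failures, target) // a success resets the streak
		return false
	}
	a.failures[target]++
	return a.failures[target] == a.threshold
}

func main() {
	a := newAlerter(2)
	results := []bool{false, true, false, false, false, true}
	for i, ok := range results {
		if a.record("dns-hosting", ok) {
			fmt.Printf("test %d: alert after %d consecutive failures\n", i, a.threshold)
		}
	}
}
```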
The real reason to write this post was mostly to say that the process of writing a trivial "notify me" gateway to interface with telegram was nice and straightforward, and to remind myself that transient failures are way more common than we expect.
I'll leave things alone for a moment, but it was a fun experiment. I'll keep the two systems in parallel for a while, but I guess I can already predict the outcome:
- The desktop monitoring will report transient outages now and again, because home broadband isn't 100% available.
- The Hetzner-based monitoring, in a different region, will report transient problems, because even hosting companies are not 100% available.
- Especially at the cheap prices I'm paying.
- The way to avoid being woken up by transient outages/errors is to be less aggressive.
- I think my paying users will be OK if I find out a service is offline after 5 minutes, rather than after 30 seconds.
- If they're not we'll have to talk about budgets ..
Tags: go, monitoring, overseer, telegram