The past couple of days I've been reworking a few of my existing projects, and converting them from Perl into Golang.
Bytemark had a great alerting system for routing alerts to different enginners, via email, SMS, and chat-messages. The system is called mauvealert
and is available here on github.
The system is built around the notion of alerts which have different states (such as "pending", "raised", or "acknowledged"). Each alert is submitted via a UDP packet getting sent to the server with a bunch of fields:
- Source IP of the submitter (this is implicit).
- A human-readable ID such as "
heartbeat
", "disk-space-/
", "disk-space-/root
", etc. - A raise-field.
- More fields here ..
Each incoming submission is stored in a database, and events are considered unique based upon the source+ID pair, such that if you see a second submission from the same IP, with the same ID, then any existing details are updated. This update-on-receive behaviour is pretty crucial to the way things work, especially when coupled with the "raise
"-field.
A raise field might have values such as:
+5m
- This alert will be raised in 5 minutes.
now
- This alert will be raised immediately.
clear
- This alert will be cleared immediately.
One simple way the system is used is to maintain heartbeat-alerts. Imagine a system sends the following message, every minute:
id:heartbeat raise:+5m [source:1.2.3.4]
- The first time this is received by the server it will be recorded in the database.
- The next time this is received the existing event will be updated, and crucially the time to raise an alert will be bumped (i.e. it will become current-time + 5m).
- The next time the update is received the raise-time will also be bumped
- ..
At some point the submitting system crashes, and five minutes after the last submission the alert moves from "pending" to "raised" - which will make it visible in the web-based user-interface, and also notify an engineer.
With this system you could easily write trivial and stateless ad-hoc monitoring scripts like so which would raise/clear :
curl https://example.com && send-alert --id http-example.com --raise clear --detail "site ok" || \
send-alert --id http-example.com --raise now --detail "site down"
In short mauvealert allows aggregation of events, and centralises how/when engineers are notified. There's the flexibility to look at events, and send them to different people at different times of the day, decide some are urgent and must trigger SMSs, and some are ignorable and just generate emails .
(In mauvealert
this routing is done by having a configuration file containing ruby, this attempts to match events so you could do things like say "If the event-id contains "failed-disc" then notify a DC-person, or if the event was raised from $important-system then notify everybody.)
I thought the design was pretty cool, and wanted something similar for myself. My version, which I setup a couple of years ago, was based around HTTP+JSON, rather than UDP-messages, and written in perl:
The advantage of using HTTP+JSON is that writing clients to submit events to the central system could easily and cheaply be done in multiple environments for multiple platforms. I didn't see the need for the efficiency of using binary UDP-based messages for submission, given that I have ~20 servers at the most.
Anyway the point of this blog post is that I've now rewritten my simplified personal-clone as a golang project, which makes deployment much simpler. Events are stored in an SQLite database and when raised they get sent to me via pushover:
The main difference is that I don't allow you to route events to different people, or notify via different mechanisms. Every raised alert gets sent to me, and only me, regardless of time of day. (Albeit via an pluggable external process such that you could add your own local logic.)
I've written too much already, getting sidetracked by explaining how neat mauvealert
and by extension purple
was, but also I rewrote the Perl DNS-lookup service at https://dns-api.org/ in golang too:
That had a couple of regressions which were soon reported and fixed by a kind contributor (lack of CORS headers, most obviously).
Tags: dns-api.org, golang, perl, purple, purppura, rewrite 2 comments
You know of Alertmanager (https://prometheus.io/docs/alerting/alertmanager/), right?