Here are some brief notes about metric-collection, for my own reference.
Collecting server and service metrics is a good thing because it lets you spot degrading performance, and see the effect of any improvements you've made.
Of course it is hard to know what metrics you might need in advance, so the common approach is to measure everything, and the most common way to do that is via collectd.
To collect/store metrics the most common approach is to use carbon and graphite-web. I tend to avoid that as being a little more heavyweight than I'd prefer. Instead I'm all about the modern alternatives:
- Collect metrics via go-carbon
  - This will listen on :2003 and write metrics beneath /srv/metrics
- Export the metrics via carbonapi
  - This will talk to the go-carbon instance and export the metrics in a compatible fashion to what carbon would have done.
- Finally you can view your metrics via grafana
  - This lets you make pretty graphs & dashboards.
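To make the first step concrete, here is a minimal sketch of feeding a value to the receiver on :2003. It assumes the receiver speaks the usual carbon plaintext protocol (one "path value timestamp" line per metric); the helper names and the example metric path are my own invention, not anything go-carbon ships.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Return one carbon plaintext line: 'path value timestamp' plus newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="127.0.0.1", port=2003):
    """Send pre-formatted lines to a carbon-compatible TCP receiver.

    The host and port defaults are assumptions matching the setup
    described in this post.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Something like send_metrics([format_metric("servers.web1.load", 0.42)]) should then produce a series that shows up beneath /srv/metrics.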
Configuring all this is pretty simple. Install go-carbon, and give it a path to write data to (/srv/metrics in my world). Enable the receiver on :2003. Enable the carbonserver and make it bind to 127.0.0.1:8888.
Now configure the carbonapi with the backend of the server above:
# Listen address, should always include hostname or ip address and a port.
listen: "localhost:8080"
# "http://host:port" array of instances of carbonserver stores
# This is the *ONLY* config element in this section that MUST be specified.
backends:
- "http://127.0.0.1:8888"
And finally you can add 127.0.0.1:8080 to grafana as a data-source, and graph away.
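Before pointing grafana at it, it can be handy to check that carbonapi answers queries. This is a rough sketch which assumes carbonapi exposes the Graphite-style /render endpoint with JSON output on the listen address configured above; the function names and the metric name are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(target, base="http://127.0.0.1:8080", since="-1h"):
    """Build a Graphite-style render URL asking for JSON output."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base}/render?{query}"


def parse_render(body):
    """Map each series name to (timestamp, value) pairs, skipping gaps.

    The JSON render format lists datapoints as [value, timestamp],
    with null where no value was recorded.
    """
    series = {}
    for entry in json.loads(body):
        series[entry["target"]] = [
            (ts, value) for value, ts in entry["datapoints"] if value is not None
        ]
    return series


def fetch(target, **kwargs):
    """Query the API and return the parsed series (needs a running carbonapi)."""
    with urlopen(render_url(target, **kwargs), timeout=5) as response:
        return parse_render(response.read())
```

If fetch("servers.web1.load") returns an empty dictionary then either nothing has been written under that name yet, or the backend address is wrong.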
The only part that I'm disliking at the moment is the sheer size of collectd. Getting metrics from your servers (uptime, I/O performance, etc.) is very useful, but it feels like installing 10MB of software to do that is a bit excessive.
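For a sense of how little is needed for the basics, here is a sketch of a tiny collector for load and uptime. It is Linux-only (it reads /proc/uptime), it is nowhere near a collectd replacement, and the metric names are made up for illustration.

```python
import os
import time


def read_uptime_seconds(path="/proc/uptime"):
    """Linux only: the first field of /proc/uptime is seconds since boot."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def host_metric_lines(prefix, load1, uptime, now):
    """Format two host readings as carbon plaintext lines."""
    return [
        f"{prefix}.load.shortterm {load1} {int(now)}\n",
        f"{prefix}.uptime {int(uptime)} {int(now)}\n",
    ]


def collect(prefix):
    """Gather the current one-minute load average and uptime for this host."""
    load1, _, _ = os.getloadavg()
    return host_metric_lines(prefix, load1, read_uptime_seconds(), time.time())
```

The resulting lines are in the plaintext format that a carbon-compatible receiver, such as the one on :2003, accepts.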
I'm sure there must be more lightweight systems out there for collecting "everything". On the other hand, I've added metrics exporting to my puppet-master and similar tools very easily, so I have lightweight support for that in the tools themselves.
I have had a good look at metricsd which is exactly the kind of tool I was looking for, but I've not searched too far afield for other alternatives and choices just yet.
I should write more about application-specific metrics in the future, because I've quizzed a few people recently:
- What's the average response-time of your application? What's the effectiveness of your (gzip) compression?
  - You don't know?
- What was the quietest time over the past 24 hours for your server?
  - You don't know?
- What proportion of your incoming HTTP-requests were for plain HTTP, rather than HTTPS?
  - Do you monitor HTTP-status-codes? Can you see how many times people were served redirects to the SSL version of your site? Will using HSTS save you bandwidth? If so, how much?
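As a rough illustration of how an application could start answering those questions, here is a sketch of some in-process tallies. RequestStats and quietest_point are hypothetical helpers of my own, not part of any library, and a real application would also need to flush these numbers to the metrics store periodically.

```python
from collections import Counter


class RequestStats:
    """In-process tallies of requests; a hypothetical helper for illustration."""

    def __init__(self):
        self.statuses = Counter()
        self.schemes = Counter()
        self.total_seconds = 0.0
        self.count = 0

    def record(self, scheme, status, seconds):
        """Note one finished request: its scheme, status code, and duration."""
        self.schemes[scheme] += 1
        self.statuses[status] += 1
        self.total_seconds += seconds
        self.count += 1

    def average_response_time(self):
        """Mean duration in seconds, or 0.0 before any request is recorded."""
        return self.total_seconds / self.count if self.count else 0.0

    def plain_http_fraction(self):
        """Share of requests that arrived over plain HTTP."""
        return self.schemes["http"] / self.count if self.count else 0.0


def quietest_point(points):
    """Given (timestamp, request_count) pairs, return the pair with the fewest requests."""
    return min(points, key=lambda point: point[1])
```

Counting the 301 responses in statuses alongside plain_http_fraction() gives a first estimate of how many requests are only there to be redirected.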
Fun times. (Terrible pun is terrible, but I was talking to a guy called Tim. So I could have written "Fun Tims".)
https://steve.fi/
Looks like telegraf might be a good contender :)