
 

It's a lot like life

3 January 2008 21:50

Assume for a moment that you have 148 hosts logging, via syslog-ng, to a central host. That host is recording all log entries into a MySQL database. Assume that, between them, these machines are producing a total of 4698816 lines per day.

(Crazy random numbers pulled from thin air, obviously.)

Now the question: How do you process, read, or pay attention to those logs?

Here is what we've done so far:

syslog-ng

All the syslog-ng client machines are logging to a central machine, which inserts the records into a database.
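
To give a concrete flavour of the setup, here's a rough sketch of what the insertion step can look like: syslog-ng writing formatted records to a FIFO, and a small script draining that into MySQL. The FIFO path, the record format, the table, and its columns are all invented for illustration; the real setup will differ.

  #!/usr/bin/env python
  # Rough sketch of the insertion step: drain a FIFO that syslog-ng writes
  # formatted lines to, and insert each one into MySQL.  The FIFO path,
  # expected line format, table name, and columns are all assumptions.
  import MySQLdb

  FIFO = "/var/log/syslog.pipe"   # a pipe() destination in syslog-ng.conf (assumed)

  db = MySQLdb.connect(host="localhost", user="syslog",
                       passwd="secret", db="syslog")
  cur = db.cursor()

  for line in open(FIFO):          # open() blocks until syslog-ng writes
      # Assume syslog-ng's template() formats each record as
      # "host<TAB>facility.priority<TAB>message".
      try:
          host, prio, msg = line.rstrip("\n").split("\t", 2)
      except ValueError:
          continue                 # skip anything that doesn't parse
      cur.execute(
          "INSERT INTO logs (host, priority, msg, received_at)"
          " VALUES (%s, %s, %s, NOW())",
          (host, prio, msg))
      db.commit()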

This database may be queried using the php-syslog-ng script. Unfortunately searching is relatively slow, and the user interface is appallingly bad: it allows only searches, with no view of the most recent logs, no auto-refreshing via AJAX, and so on.

rss feeds

To remedy the slowness and poor usability of the PHP front-end to the database, I wrote a quick hack which produces RSS feeds from queries against that same database, accessed via URIs such as:

  • http://example.com/feeds/DriveReady
  • http://example.com/feeds/host/host1

The first query returns an RSS feed of log entries containing the given term. The second shows all recent entries from the machine host1.
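
The feed scripts are nothing clever; in outline they boil down to something like the following. (The table layout, column names, and the PATH_INFO handling are invented for this sketch, not the real code.)

  #!/usr/bin/env python
  # Rough sketch of a "term" feed: search the central syslog database for a
  # term taken from the URL path and emit the most recent matches as RSS.
  # Table name, columns, and connection details are assumptions.
  import os
  from xml.sax.saxutils import escape

  import MySQLdb

  # e.g. a request for /feeds/DriveReady gives a PATH_INFO of "/DriveReady"
  term = os.environ.get("PATH_INFO", "/DriveReady").lstrip("/")

  db = MySQLdb.connect(host="localhost", user="syslog",
                       passwd="secret", db="syslog")
  cur = db.cursor()
  cur.execute(
      "SELECT host, msg, received_at FROM logs"
      " WHERE msg LIKE %s ORDER BY received_at DESC LIMIT 50",
      ("%" + term + "%",))

  print("Content-Type: application/rss+xml")
  print("")
  print('<?xml version="1.0"?>')
  print('<rss version="2.0"><channel>')
  print("<title>Log entries matching %s</title>" % escape(term))
  print("<link>http://example.com/feeds/%s</link>" % escape(term))
  print("<description>Recent syslog entries containing the term</description>")
  for host, msg, when in cur.fetchall():
      print("<item>")
      print("<title>%s: %s</title>" % (escape(host), escape(msg)))
      # A proper feed would format this as an RFC 822 date.
      print("<pubDate>%s</pubDate>" % when)
      print("</item>")
  print("</channel></rss>")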

That works nicely for a fixed set of patterns, but the problem with this approach, and that of php-syslog-ng in general, is that it will only show you things that you look for - it won't volunteer trends, patterns, or news.

The fundamental problem is that neither system has any notion of "recent messages worth reading" (on a global or per-machine basis).

To put that into perspective: given a logfile from one host containing, say, 3740 lines, there are only approximately 814 unique lines if you ignore the date and timestamp.

Reducing log entries by that amount (a 78% decrease) is a significant saving, but even so you wouldn't want to read the remaining 22% of our original 4698816 lines of logs, as that is still over a million log entries.
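
That reduction is nothing more than stripping the timestamp and counting distinct lines; a trivial sketch, assuming the standard "Mon DD HH:MM:SS" syslog prefix:

  #!/usr/bin/env python
  # Collapse a logfile to its unique lines, ignoring the leading syslog
  # timestamp ("Jan  3 21:50:01 ...").  Prints each distinct message once,
  # most frequent first, then a summary on stderr.
  import re
  import sys
  from collections import Counter

  # Standard syslog date prefix: "Mon DD HH:MM:SS ".  Adjust to taste.
  TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+")

  counts = Counter()
  for line in open(sys.argv[1]):
      counts[TIMESTAMP.sub("", line.rstrip("\n"))] += 1

  for msg, n in counts.most_common():
      print("%6d  %s" % (n, msg))

  print("%d lines, %d unique" % (sum(counts.values()), len(counts)),
        file=sys.stderr)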

I guess we could trim the results down further via a pipe through logcheck or similar, but I can't help thinking that still isn't going to give us enough interesting things to view.

To reiterate, I would like to see:

  • per-machine anomalies.
  • global anomalies.

To that end I've been working on something, but I'm not too sure yet whether it will go anywhere... In brief: you take the logfiles and tokenize them, then you record the token frequencies as groups within a given host's prior records. Unique pairings == logs you want to see.

(i.e. token frequency analysis on things like "<auth.info> yuling.example.com sshd[28427]: Did not receive identification string from 1.3.3.4".)
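
In sketch form the idea looks something like this. (Not the actual code: the tokenisation and the "rare" threshold are plucked out of the air.)

  #!/usr/bin/env python
  # Sketch of the token-frequency idea: build per-host token counts from a
  # host's prior logs, then flag today's lines whose tokens have rarely (or
  # never) been seen for that host.  Tokenisation and threshold are guesses.
  import re
  import sys
  from collections import Counter

  TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9._-]+")   # crude: words, daemons, hostnames
  RARE = 5                                          # "seen fewer than 5 times" == unusual

  def tokens(line):
      return TOKEN.findall(line.lower())

  # Baseline: the host's historical logs (e.g. last week's file).
  history = Counter()
  for line in open(sys.argv[1]):
      history.update(tokens(line))

  # Today's log for the same host: print anything containing a rare token.
  for line in open(sys.argv[2]):
      line = line.rstrip("\n")
      toks = tokens(line)
      if toks and min(history.get(t, 0) for t in toks) < RARE:
          print(line)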

What do other people do? There must be a huge market for this, even amongst people who don't have more than 20 machines!


 

Comments on this entry

Thom May at 21:53 on 3 January 2008
Sounds like you're almost describing splunk - www.splunk.org.
Adrian Bridgett at 22:00 on 3 January 2008
I use SEC (Simple Event Correlator) to watch log entries. Whilst I've had to have some ugly perl functions to do complex matching for java logs (would be best offloaded to log4j or one of the specialist java log analysers), most stuff can be done in SEC for free - such as "warn if you get more than 5 of these messages in a minute" or "do this if you see this message and you _don't_ see this other message within 30secs".
You might also want to have a look at splunk - I've not tried it myself mind: http://www.splunk.com/
Philipp Kern at 22:08 on 3 January 2008
Do you know Splunk? (http://www.splunk.com)
Steve Kemp at 22:15 on 3 January 2008

Thanks for the comments. I've heard of Splunk, but never used it.

Ideally I'd prefer not to pay ...

Warren Guy at 22:27 on 3 January 2008
Check out Anton Chuvakin's blog, he's a bit of a logging evangelist: http://chuvakin.blogspot.com/
James at 23:41 on 3 January 2008
There is a huge market - see Splunk, Zenoss, Hyperic HQ, OpenNMS just off the top of my head. There's a huge list of free and non-free systems at http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
Vincent Bernat at 23:42 on 3 January 2008
There is software like splunk that will match your requirements, but it is not free. Maybe someone will write a clone of it.
Wilfred at 23:54 on 3 January 2008
They buy a counterpane contract. :-)
Dale King at 02:49 on 4 January 2008
For some time I've considered implementing something like logbayes and NBS for syslog (http://www.ranum.com/security/computer_security/code/index.html)
on my home network, but like everything time gets in the way.
nico at 04:25 on 4 January 2008
Using something like splunk? (http://www.splunk.com)
Scott Lamb at 06:03 on 4 January 2008
Have you looked at Splunk?
Sam at 06:09 on 4 January 2008
If you're lazy enough to do a lazyweb post, maybe you can be lazy enough to let someone else do the work for you and try splunk. I keep seeing ads for it on slashdot, and I've even done a test install, but since my work environment is woefully light on syslog, I haven't had a chance to set it up.
It seems like it does what you want, and maybe more.
Carsten Aulbert at 06:38 on 4 January 2008
Hi Steve, we are currently going the logcheck way, but running logcheck on each box locally and only then transferring the remaining stuff to a central node - we will have close to 1400 servers soon and I don't think logging everything into a central DB makes much sense at that scale. An RSS feed looks really sexy, but of course there needs to be some kind of "tagging" mechanism which only lets you look at the "important" stuff - however that ends up being decided. If you have a good solution, please post it :)
santi at 13:41 on 4 January 2008
I use moodss, but right now it's uninstallable on Debian due to a dependency problem.
Anonymous at 06:53 on 5 January 2008
While I wouldn't suggest using it as a sole solution, some people use crm114 to do log analysis, classifying log entries by relevance.
Alex at 19:46 on 5 January 2008
... and now you see why it's not a simple problem to solve ;) I'd have happily deployed Splunk for our uses if it didn't cost an absolute fortune in licensing; we'd be looking at between £15,000 and £20,000 for our current volume I believe!
It's also got that "closed nature" feel about it; sure, it's very pretty and seems to work well, but you can't get elbow-deep in it and muck about :( If it doesn't suit your environment and you want a major-ish change, sod off.
Regarding your post about not getting regularly refreshed logs with AJAX, you *can* actually tail our logs quite nicely - go to the "Input lots of criteria" page and instead of clicking the 'Search' button, click the 'Tail' button. ;)
I'm convinced that inserting logs into MySQL or PostgreSQL works and is the most powerful solution. Anything is better than having >50GB of logfiles and using grep ;) As it stands we're able to filter easily and searching over ~24h of logs isn't too bad with response times <5s - from when we discussed this before I'm sure the issue pertained to both the database schema and user interface :'(
I'd be up for improving the DB schema, but I think rewriting it in Ruby on Rails would be a bad idea. We don't need 10 more processes on loghost eating 50MB of RAM ;) Perhaps the current code written in PHP would be worth tweaking?
Joao Carneiro at 20:44 on 8 January 2008
I use zabbix (www.zabbix.org), which does the monitoring part, but I guess not syslog - it uses SNMP or agents... It's quite good actually; it even produces great graphs and trends, synoptic charts et al...
have fun