
I didn't make a statement. I asked a question. Would you like me to ask it again?

18 July 2009 21:50

On my own personal machines I find the use of logfiles invaluable when installing new services, but otherwise I generally ignore the bulk of the data. (I read one mail from each machine containing a summary of the day's events.)

When you're looking after a large network, having access to logfiles is very useful. It lets you see interesting things - if you take the time to look, and if you have an idea of what you want to find out.

So, what do people do with their syslog data? How do you use it?

In many ways the problem of processing the data is that you have two conflicting goals:

  • You most likely want to have access to all logfiles recorded.
  • You want to pick out "unusual" and "important" data.

While you can easily pick out unique messages (given a history of all prior entries), it becomes a challenge to allow useful searches given the volume of data.

Consider a network of 100 machines. Syslog data for a single host can easily exceed 1,000,000 lines in a single day. (The total number of lines written beneath /var/log/ on the machine hosting www.debian-administration.org was 542,707 for the previous 24 hours. It is not a particularly busy machine, and the Apache logs were excluded.)

Right now I've got a couple of syslog-ng servers which simply accept all incoming messages from a large network and filter them briefly. The remaining messages are inserted into MySQL via a FIFO. This approach is not very scalable, and results in a table with millions of rows which is not pleasant to search.
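For what it's worth the consumer on the database side doesn't need to be anything clever; a minimal sketch of that kind of FIFO-to-MySQL bridge might look like this, with the DSN, credentials, table, and columns all invented for illustration:

#!/usr/bin/perl -w
#
#  Read syslog lines from a FIFO and insert them into MySQL.
#  A minimal sketch: the DSN, credentials, table and columns
#  ("logs" with host/program/message) are invented for illustration,
#  and the parsing is deliberately naive.
#
use strict;
use DBI;

my $fifo = "/tmp/sys.log";

my $dbh = DBI->connect( "dbi:mysql:database=syslog;host=localhost",
                        "loguser", "logpass",
                        { RaiseError => 1 } );

my $sth = $dbh->prepare(
    "INSERT INTO logs (host, program, message) VALUES (?,?,?)" );

open( my $pipe, "<", $fifo ) or die "Failed to open $fifo - $!";

while ( my $line = <$pipe> )
{
    chomp($line);

    #  Assume lines look like "host program[pid]: message".
    if ( $line =~ /^(\S+)\s+([^:\[\s]+)(?:\[\d+\])?:\s*(.*)$/ )
    {
        $sth->execute( $1, $2, $3 );
    }
}

close($pipe);
$dbh->disconnect();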

I'm in the process of coming up with a replacement system - but at the same time I suspect that any real solution will depend a lot on what is useful to pull out.

On the one hand, keeping only unique messages makes spotting new things easy. On the other hand, if you start filtering out too much you lose detail. e.g. If you took 10 lines like this and removed all but one you'd lose important details about the number of attacks you've had:

  • Refused connect from 2001:0:53aa:64c:c9a:xx:xx:xx

Obviously you could come up with a database schema that had something like "count,message", and other per-host tables which showed you where each message was seen. The point I'm trying to make is that naive folding can mean you miss the fact that user admin@1.2.3.4 tried to log in to host A, host B, and host C.
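To make that concrete, the kind of folding I mean is roughly this - keyed on both source host and message, so the per-host detail isn't thrown away. (The "host message ..." input format is assumed purely for illustration.)

#!/usr/bin/perl -w
#
#  Fold duplicate messages, but key on (host, message) rather than on
#  the message alone, so the per-host counts survive.  Reads lines of
#  the form "host message ..." on STDIN - purely illustrative.
#
use strict;

my %count;

while ( my $line = <STDIN> )
{
    chomp($line);
    my ( $host, $msg ) = split( /\s+/, $line, 2 );
    next unless ( defined $msg );
    $count{$host}{$msg} += 1;
}

foreach my $host ( sort keys %count )
{
    foreach my $msg ( sort keys %{ $count{$host} } )
    {
        printf( "%6d %s %s\n", $count{$host}{$msg}, $host, $msg );
    }
}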

I'm rambling now, but I guess the point I'm trying to make is that depending on what you care about your optimisations will differ - and until you've done it you probably don't know what you want or need to keep, or how to organise it.

I hacked up a simple syslog server which accepts messages on port 514 via UDP and writes them to a FIFO, /tmp/sys.log. A Perl client now reads messages from that pipe and writes them to local logfiles - so that I can see the kind of messages that I can filter, collapse, or ignore.
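The receiving side needn't be anything clever. A stripped-down sketch of the idea - not the exact script, and with no error handling - would be:

#!/usr/bin/perl -w
#
#  Minimal UDP syslog receiver: bind to port 514 and append each
#  datagram to a FIFO.  A sketch only - it needs to run as root to
#  bind to 514, and it assumes /tmp/sys.log has already been created
#  with mkfifo.
#
use strict;
use IO::Socket::INET;

my $sock = IO::Socket::INET->new( LocalPort => 514,
                                  Proto     => 'udp' )
  or die "Failed to bind to port 514 - $!";

open( my $fifo, ">>", "/tmp/sys.log" )
  or die "Failed to open the FIFO - $!";

#  Unbuffered writes so the reader sees messages immediately.
select( ( select($fifo), $| = 1 )[0] );

my $msg;
while ( defined( $sock->recv( $msg, 8192 ) ) )
{
    print $fifo $msg . "\n";
}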

It's interesting to see the spread of severities. Things like NOTICE, INFO, and DEBUG can probably be ignored and just never examined .. but maybe, just maybe, there is the odd daemon that writes interesting things at those levels..? Fun challenge.

Currently I'm producing files like this:

/var/log/skxlog/DEBUG.user.cron.log
/var/log/skxlog/ALERT.user.cron.log
/var/log/skxlog/INFO.authpriv.sshd.log

The intention is to get a reasonably good understanding of which facilities, priorities, and programs are the biggest loggers. After that I can actually decide how to proceed.
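Ranking those files by how much they grow is trivial; a throwaway script along these lines is enough:

#!/usr/bin/perl -w
#
#  Rank the per-severity/facility/program files by line count so the
#  noisiest loggers float to the top.  Throwaway sketch; nothing is
#  assumed beyond the /var/log/skxlog layout above.
#
use strict;

my %lines;

foreach my $file ( glob("/var/log/skxlog/*.log") )
{
    open( my $fh, "<", $file ) or next;
    $lines{$file}++ while (<$fh>);
    close($fh);
}

foreach my $file ( sort { $lines{$b} <=> $lines{$a} } keys %lines )
{
    printf( "%8d %s\n", $lines{$file}, $file );
}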

Remember I said you might just ignore INFO severities? If you do that you miss:

IP=127.0.0.1
Severity:INFO
Facility:authpriv
DATE:Jul 18 15:06:28
PROGRAM:sshd
PID:18770
MSG:Failed password for invalid user rootf from 127.0.0.1 port 53270 ssh2
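Which suggests filtering on content as well as severity; even a dumb pattern-match like this sketch would flag those lines no matter what level sshd logged them at:

#!/usr/bin/perl -w
#
#  Flag sshd authentication failures by content, regardless of the
#  severity they were logged with.  Reads syslog lines on STDIN;
#  the patterns are only a starting point.
#
use strict;

while ( my $line = <STDIN> )
{
    if ( $line =~ /sshd\[\d+\]:.*(Failed password|[Ii]nvalid user|[Rr]efused connect)/ )
    {
        print "ATTACK? $line";
    }
}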

ObFilm: From Dusk Till Dawn


Comments on this entry

Aigars Mahinovs at 15:07 on 18 July 2009

I like the concept. What do you think - maybe we (the system administrators) need a piece of software that would present that data to us in a usable way:
* web based with interactivity
* 'spam' filters that we can train and correct manually, which know what is normal 'white noise' from our systems (don't display it, but save it for later)
* use decision support system concepts to highlight where things are OK, where there are unclear things, and where there are bad things
* use data mining approaches to allow an administrator to query, drill down and examine logs in all kinds of ways after a bad event, like after an intrusion
* a way for sysadmins to share generalized recipes of normal and bad stuff with each other easily

On the other hand this sounds so logical, that someone must have already made something like this.

Aigars Mahinovs at 15:27 on 18 July 2009

I should have googled before submitting the comment. There are a ton of such tools, both free and commercial. Not sure if any of them is both simple enough and powerful enough to be worth the time one would need to spend to set it all up, however.

Steve Kemp at 15:29 on 18 July 2009

I think that there is a huge need for this, but the available options are probably things like phpsyslogng or purely home-made internal systems that float around.

In the commercial world it seems that Splunk is the definitive answer, if you can afford it.

Getting started is very simple. You can configure syslog-ng to log messages from clients in a scalable fashion:


destination d_ext {
    file("/var/log/clients/$HOST/$YEAR-$MONTH-$DAY/$PROGRAM.$FACILITY-$PRIORITY.log"
         create_dirs(yes));
};

log {
    source(s_all);
    destination(d_ext);
};

Or failing that use a FIFO to pass incoming messages from remote hosts to a local script.

Really the issue is one of data size. If you have too many hosts you really need to massage the incoming messages to filter/exclude/ignore some kinds of messages - otherwise you'll end up with a pile of data in a database which is too slow to access.

I'm leaning towards the idea of using static output pages showing the most recent messages from hosts/programs/levels. That might mean you have a little lag, but it might mean you don't need to actually search very often..
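i.e. A cron job that rebuilds a page from the tail of each file would be enough - a rough sketch, assuming the directory layout from the config above and an invented output location:

#!/usr/bin/perl -w
#
#  Rebuild a static page showing the tail of each logfile.  A rough
#  sketch: the output location and the "last 20 lines" figure are
#  arbitrary, slurping whole files is naive, and real output would
#  want HTML-escaping.
#
use strict;

open( my $out, ">", "/var/www/logs/index.html" )
  or die "Failed to open output - $!";

print $out "<html><body>\n";

foreach my $file ( glob("/var/log/clients/*/*/*.log") )
{
    open( my $fh, "<", $file ) or next;
    my @lines = <$fh>;
    close($fh);

    splice( @lines, 0, @lines - 20 ) if ( @lines > 20 );

    print $out "<h2>$file</h2>\n<pre>\n";
    print $out @lines;
    print $out "</pre>\n";
}

print $out "</body></html>\n";
close($out);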

Still, the presentation options are almost endless; the trick is coming up with something that is usable, scalable, and worth viewing/using.

I like your ideas of drilling down, and training, but you have to make sure that atypical messages such as "DriveReady SeekComplete - Drive Death /dev/sda" don't get lost - otherwise you've really broken things by being too clever!


Sadly I'm not familiar enough with data mining to know how well that space could map over onto logfiles ..

Jason Hedden at 16:17 on 18 July 2009

I've run into many problems searching through log data via a database. I've found what works best for me is to keep the files flat, and index the files using swish-e. A tiered search approach: first find the files that pertain to my search, and then dive into them. Example output: http://files.getdropbox.com/u/50142/syslogdbi.png

(long time silent troll, thanks for the blog!)

madduck at 16:31 on 18 July 2009
http://madduck.net

I really hope that out of this might one day come a logcheck replacement.

Aigars said "web-based"; I'd instead suggest a bottom-up approach, using tags.

So you have filters for messages that you don't want to see, and each filter is associated with one or more tags. Then you get to select the filters you deem applicable by specifying a combination of tags. In addition, a filter should only be installed by the software that generates the corresponding log messages, or the filter should know which package needs to be installed and otherwise just be ignored.

Finally, there really ought to be a daemon sending notifications immediately, instead of the logcheck cron approach.

http://wiki.logcheck.org/index.cgi/logfilter has some brain dumps about the concept.


Steve Kemp at 17:40 on 18 July 2009
http://www.steve.org.uk/

Jason - Given that I'm already archiving logs to flat files the idea of using a text indexing system is great!

That sidesteps the issues of database completely.

I've had a brief play with swish but I admit I'm struggling to make it behave well. It indexes things beneath /var/log/clients easily enough - but then doesn't let me show the results easily. Experimentation is probably in order..

Madduck, I like the idea of using tags like that; being able to give a tag such as "ssh-attack" to match multiple patterns would be great:

  • sshd*: refused connect from
  • sshd*: invalid user

That would be very neat for searching, but it might be hard to get the patterns and tags in place easily given the data size.
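The matching side of it needn't be complicated though; something like this sketch, with the two patterns above hard-wired just for illustration:

#!/usr/bin/perl -w
#
#  Map tags to lists of patterns, then label each matching line with
#  every tag that applies.  The two patterns are the examples above;
#  a real setup would load tag/pattern pairs from files rather than
#  hard-wiring them.
#
use strict;

my %tags = (
    "ssh-attack" => [ qr/sshd.*refused connect/i,
                      qr/sshd.*invalid user/i ],
);

while ( my $line = <STDIN> )
{
    chomp($line);

    my @hits;

    foreach my $tag ( sort keys %tags )
    {
        foreach my $pattern ( @{ $tags{$tag} } )
        {
            if ( $line =~ $pattern )
            {
                push( @hits, $tag );
                last;
            }
        }
    }

    print join( ",", @hits ) . ": $line\n" if (@hits);
}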

anon at 20:17 on 18 July 2009

logcheck does some of this, but not very well

Jeff Schroeder at 23:38 on 20 July 2009

Steve, take a look at sec.pl; once you get past the syntax learning curve it does exactly what you want. A crazy thing I've seen deployed using sec in prod:

- ssh login attacks are bad, and are logged to a file
- if > 100 login attacks are attempted in XXX minutes, shoot off an email alert to a team (roughly the logic sketched below)
- if > 50 login attacks are attempted followed by a successful login from the same place, run a script to wake up the on-call.
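That second rule, written out in plain Perl rather than sec's rule language just to show the shape of it (the window and the alert command here are made up):

#!/usr/bin/perl -w
#
#  Count failed ssh logins per source and alert when more than 100
#  arrive inside the window.  Not sec's syntax - just the idea.  The
#  600-second window and the alert command are made up.
#
use strict;

my %seen;    # ip => list of timestamps

while ( my $line = <STDIN> )
{
    next unless ( $line =~ /sshd\[\d+\]: Failed password .* from (\S+)/ );

    my $ip  = $1;
    my $now = time();

    push( @{ $seen{$ip} }, $now );

    #  Drop entries which have fallen out of the window.
    @{ $seen{$ip} } = grep { $_ > $now - 600 } @{ $seen{$ip} };

    if ( scalar( @{ $seen{$ip} } ) > 100 )
    {
        system( "alert-the-team", $ip );    # placeholder action
        delete $seen{$ip};
    }
}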

Sec is really cool like that for correlating things together. http://kodu.neti.ee/~risto/sec/

The mailinglist is low volume and very friendly.

daveg at 06:13 on 21 July 2009
http://daveg.outer-rim.org

At my site we've got over 350 boxes logging to two central rsyslog servers. If you haven't checked out rsyslog yet, I recommend it.

In our rsyslog configuration I've got it sending off email alerts for any syslog messages with a severity higher than INFO. OK, not perfect, but it's a start.

You can also make it perform actions on various log files based on regexes, which can be pretty cool.

It supports logging directly into databases (we're using MySQL, but it supports Oracle, PostgreSQL, etc.) and it has a web frontend in the form of phplogcon, which is like an ugly, open-source cousin of Splunk.

http://www.rsyslog.com
http://www.phplogcon.org