I think you should let this one go

7 April 2009 21:50

I work with log files a lot.

Most of the logfiles I work with are in a standard format of some kind, and most often they are rotated on a daily basis. (Examples include syslog, qpsmtpd, and Apache logfiles.)

I wish there were a general purpose way to say "grep time-range pattern logfile".

Right now, for example, I've just deployed some changes upon a cluster of hosts. Now I want to see only messages that refer to a particular area of the codebase, and only those that occurred after 23:00 - which is when I did the commit/push/pull dance.

I've written a quick hack - tgrep (time-grep) - which allows simple before/equal/after/range grepping:

# show matching lines after 23:00
tgrep \>23:00:00 -i subject /var/log/qpsmtpd/qpsmtpd.log

# show matching lines in the interval 23:00 to 23:15
tgrep 23:00:00-23:15:00 -i -r subject /var/log/qpsmtpd/
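For illustration only (this is not the tgrep source): a minimal Python sketch of the same idea, filtering each line by its own timestamp and then by a pattern. It assumes every interesting line carries an HH:MM:SS stamp and treats both ends of the range as inclusive; the names line_time and time_grep are made up for this example.

```python
import re
from datetime import time

# Assumption: the first HH:MM:SS in a line is its timestamp.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def line_time(line):
    """Return the first HH:MM:SS in the line as a datetime.time, or None."""
    m = TIME_RE.search(line)
    if not m:
        return None
    h, mi, s = (int(g) for g in m.groups())
    if h > 23 or mi > 59 or s > 59:
        return None
    return time(h, mi, s)


def time_grep(lines, pattern, start=None, end=None, ignore_case=False):
    """Yield lines matching pattern whose timestamp lies in [start, end].

    Either bound may be None, meaning "no limit on that side".
    Lines without a recognisable timestamp are skipped.
    """
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for line in lines:
        t = line_time(line)
        if t is None:
            continue
        if start is not None and t < start:
            continue
        if end is not None and t > end:
            continue
        if rx.search(line):
            yield line


if __name__ == "__main__":
    # Invented sample lines, loosely shaped like syslog output.
    sample = [
        "Apr  7 22:59:58 host qpsmtpd[1]: Subject: early",
        "Apr  7 23:05:10 host qpsmtpd[2]: subject: in range",
        "Apr  7 23:07:00 host qpsmtpd[3]: from: no match",
        "Apr  7 23:20:00 host qpsmtpd[4]: SUBJECT: late",
    ]
    for hit in time_grep(sample, "subject", time(23, 0, 0), time(23, 15, 0),
                         ignore_case=True):
        print(hit)
```

Because every line is compared on its own, the result does not depend on the lines being in strict time order.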

If there is a common way of doing this "properly" then I'd love to be educated; failing that, take it if it is useful (moreutils?)

ObFilm: Chasing Amy

Tags: random hacks, tgrep, utilities | 8 comments

Comments on this entry

Matt Sayler at 23:17 on 7 April 2009

http://dmalcolm.livejournal.com/1301.html
Is this helpful? I've not used it, but it looks like it should support time comparison.

Matt Simmons at 23:53 on 7 April 2009

Are you familiar with Logwatch? Sounds like it hits the mark on your requirements.
It would be nice if there was a site where you could go look up the current 'best application' or 'best solution'.

Graham Bleach at 06:44 on 8 April 2009

Great idea. I don't know of anything else to do that job; I usually either write long perl one liners, or a dedicated parser for that log format.

Steve Kemp at 08:18 on 8 April 2009

Matt Sayler

Yes, I'm familiar with that tool, I even commented on that entry as skx!

It had slipped my mind, and I had mostly filed it away as being useful for Apache, rather than more general purpose. It is probably the best thing out there at the moment.

Matt Simmons

I'm very familiar with Logwatch, but that's not the kind of examination I'm after at the moment - can you imagine getting a mail every day of your entire Apache logfile?

Mostly I'm making ad hoc searches for debug messages, or trying to collect statistics before the daily "make graphs", or "make summary" emails get fired off.

Graham

Thanks! I'm sure the tool will be useful pretty generally now I have it, and it definitely solved my immediate needs.

Baz at 08:37 on 8 April 2009

It's a bit heavier weight, but what about splunk? http://www.splunk.com/ It seems designed for exactly your problem - time-related log messages in disparate logs in a cluster. I like Matt Sayler's solution though, I need to go grab that now.

Justin Ellison at 14:13 on 8 April 2009

I've used sed's "print a range of lines" feature. Don't recall the syntax off the top of my head, check the manpage. But you can feed it a regex to start printing, and a regex to stop printing. If you omit either, it just prints from the beginning, or to the end.
May not be as quick as the other tools, but sed is everywhere.
Justin

Steve Kemp at 18:22 on 8 April 2009

Baz

I think that splunk solves a different problem from the one I'm interested in. I've certainly centralised logging in the past via syslog-ng, but generally that isn't useful to me.

Justin Ellison

Thanks for the reminder about the flip-flop operator. The following perl is almost as good as what I wrote:

 cat logfile.0 | perl -ne 'print if (/ 06:25:/ ... / 06:33:/); '

(That shows entries between 06:25 and 06:33)

Ben Zanin at 18:23 on 10 April 2009

Justin, beware using the sed or awk '/time1/,/time2/' pattern to extract a subsection of logs. It's quick, yes, but it can fail when multiple machines are logging to a single file and have slightly desynchronized clocks: you can end up extracting suspiciously tiny little spans instead of the entire range as desired.
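A small Python sketch of why a start/end range can miss lines. It emulates the sed-style range (switch on at the first start match, switch off at the next end match) and compares it with checking every line's timestamp individually. The log lines, the hostA/hostB names and both function names are invented for illustration, and the example only shows the simplest failure: one out-of-order line closing the range early.

```python
import re


def range_extract(lines, start_rx, end_rx):
    """Emulate sed's '/start/,/end/p': collect from a line matching
    start_rx through the next later line matching end_rx."""
    out, active = [], False
    for line in lines:
        if active:
            out.append(line)
            if re.search(end_rx, line):
                active = False
        elif re.search(start_rx, line):
            active = True
            out.append(line)
    return out


def window_extract(lines, lo, hi):
    """Keep every line whose HH:MM:SS stamp satisfies lo <= stamp < hi.
    Zero-padded times compare correctly as plain strings."""
    out = []
    for line in lines:
        m = re.search(r"\b\d{2}:\d{2}:\d{2}\b", line)
        if m and lo <= m.group(0) < hi:
            out.append(line)
    return out


log = [
    "06:24:59 hostA idle",
    "06:25:01 hostA request 1",
    "06:30:00 hostB request 2",  # hostB's clock runs a little fast
    "06:29:59 hostA request 3",
    "06:30:01 hostA request 4",
]

if __name__ == "__main__":
    print(range_extract(log, r"\b06:25:", r"\b06:30:"))
    print(window_extract(log, "06:25:00", "06:30:00"))
```

Here the range closes as soon as hostB's 06:30:00 line appears, so hostA's 06:29:59 line that follows it is dropped, while the per-line check still keeps it.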
Steve, for doing these kinds of extraction jobs, I've found that it's often a good idea to run a first pass through the data with a very simple awk script that grabs the byte (not the line) offsets of the first *and the last* instances of 15-minute intervals. These indices end up being very small, and they can be easily used with dd to examine only specific time intervals in the logs.
This tactic falls down when logs are gzipped, unfortunately (unless the --rsyncable option is used, or something like dictzip, zsync or whatever). bzip2 is easier to work with because of the independently compressed blocks, but an index is a bit harder to generate because those blocks are packed bitwise instead of bytewise, and I don't know of a simple tool to bit-shift an entire stream. There are a few more pointers over at http://perldition.org/articles/Random%20seeking%20on%20gzip%20streams.sbc , if you're interested.
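The byte-offset index described above might look roughly like this in Python rather than awk and dd. This is a sketch under assumptions: the log is uncompressed, each line carries an HH:MM:SS stamp, and build_index and read_bucket are hypothetical names. For every 15-minute bucket it records the offset of the first line and the end offset of the last line seen, so a later read can seek straight to that span.

```python
import io
import re

# Bytes pattern so offsets are true byte positions, not character counts.
TIME_RE = re.compile(rb"\b(\d{2}):(\d{2}):\d{2}\b")


def build_index(fh, bucket_minutes=15):
    """Scan a binary file object once and return
    {bucket_start_minute_of_day: [first_byte, end_byte]}.

    first_byte is where the first line of the bucket starts; end_byte is
    one past the last byte of the last line seen for that bucket.
    """
    index = {}
    offset = 0
    for raw in fh:  # binary iteration yields whole lines, newline included
        end = offset + len(raw)
        m = TIME_RE.search(raw)
        if m:
            minutes = int(m.group(1)) * 60 + int(m.group(2))
            bucket = minutes - minutes % bucket_minutes
            if bucket in index:
                index[bucket][1] = end  # extend to the latest instance
            else:
                index[bucket] = [offset, end]
        offset = end
    return index


def read_bucket(fh, index, bucket):
    """Seek to the bucket's span and return just those bytes."""
    start, end = index[bucket]
    fh.seek(start)
    return fh.read(end - start)


if __name__ == "__main__":
    # Invented sample data standing in for a real logfile.
    data = (b"Apr  8 06:14:59 a one\n"
            b"Apr  8 06:15:00 a two\n"
            b"Apr  8 06:29:59 a three\n"
            b"Apr  8 06:30:00 a four\n")
    fh = io.BytesIO(data)
    idx = build_index(fh)
    # 06:15 is minute 375 of the day.
    print(read_bucket(fh, idx, 6 * 60 + 15).decode())
```

Tracking the last instance as well as the first means a span still covers stragglers from a slightly slow clock, at the cost of possibly including a few lines from the neighbouring bucket.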
