Recipe 9.35 Combining Log Files
9.35.1 Problem
You want to merge a collection of log files
into a single, chronological log file.
9.35.2 Solution
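#!/bin/sh
# Glue each repetition message to the message before it with a null character,
# sort by month, day, and time, then turn the nulls back into newlines.
perl -ne \
'print $last, /last message repeated \d+ times$/ ? "\0" : "\n" if $last;
chomp($last = $_);
if (eof) {
    print;          # write out the final line of this file
    undef $last;    # forget it before the next file starts
}' "$@" | sort -s -k 1,1M -k 2,2n -k 3,3 | tr '\0' '\n'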
9.35.3 Discussion
The system logger automatically prepends a timestamp to each message,
like this:
Feb 21 12:34:56 buster kernel: device eth0 entered promiscuous mode
To merge log files, sort their combined lines by
timestamp, using the first three
fields (month, day of the month, and time) as keys.
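The three sort keys can be tried on their own before adding the rest of the recipe. The following is a minimal sketch: the function name sort_by_timestamp and two of the three sample messages are made up for illustration, and LC_ALL=C is our assumption, added so that sort recognizes the English month abbreviations regardless of the caller's locale.

```shell
#!/bin/sh
# Sort syslog-style lines read from standard input by their timestamps.
#   -k 1,1M  field 1 as a month name (Jan < Feb < ... < Dec)
#   -k 2,2n  field 2 as a number (day of the month)
#   -k 3,3   field 3 as text (hh:mm:ss sorts correctly as a string)
# LC_ALL=C is an assumption of this sketch, not part of the recipe.
sort_by_timestamp() {
    LC_ALL=C sort -k 1,1M -k 2,2n -k 3,3
}

# Made-up sample messages, deliberately out of order.
printf '%s\n' \
    'Mar  1 00:00:05 buster sshd[812]: session opened' \
    'Feb 21 12:34:56 buster kernel: device eth0 entered promiscuous mode' \
    'Feb  3 23:59:59 buster cron[77]: job started' |
sort_by_timestamp
```

The lines should come out in the order Feb 3, Feb 21, Mar 1. The numeric key matters for the day: the padded day " 3" is compared as the number 3 rather than as text.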
A complication arises because the system logger inserts
"repetition messages" to conserve
log file space:
Feb 21 12:48:16 buster last message repeated 7923 times
The timestamp on a repetition message is often later than that of the
message it refers to. It would be terribly misleading if possibly unrelated
messages from other log files were merged between the last message
and its associated repetition message.
To avoid this, our Perl script glues together the last message with a
subsequent repetition message (if present), inserting a null
character between them: this is reliable because the system logger
never writes null characters to log files. At the end of each file,
the script writes out the final line and then forgets it,
to avoid any possibility of confusion if the next file happens
to start with an unrelated repetition message.
The sort command sees these null-glued
combinations as single lines, and keeps them together as the files
are merged. The null characters are translated back to newlines after
the files are sorted, to split the combinations back into separate
lines.
We use sort -s to suppress the last-resort comparison of entire lines
when all of the keys are equal: this preserves the original order of
messages with the same timestamp, at least within each original log
file.
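The effect of -s is easiest to see with two messages that share a timestamp. In this sketch the helper name sample and the program names zebra and apple are invented for illustration, and LC_ALL=C is again our assumption.

```shell
#!/bin/sh
# Two made-up messages with identical timestamps, in their logged order.
sample() {
    printf '%s\n' \
        'Feb 21 12:34:56 buster zebra: logged first' \
        'Feb 21 12:34:56 buster apple: logged second'
}

echo 'With -s (input order kept for equal keys):'
sample | LC_ALL=C sort -s -k 1,1M -k 2,2n -k 3,3

echo 'Without -s (ties broken by comparing whole lines):'
sample | LC_ALL=C sort -k 1,1M -k 2,2n -k 3,3
```

With -s the zebra line should stay first, as logged. Without it, sort falls back to comparing the whole lines, and the apple line moves ahead.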
If you have configured the system logger to write messages to
multiple log files, then you may wish to remove duplicates as you
merge. This can be done by using sort -u instead
of -s, and adding an extra sort key, -k 4, to compare
the message contents. There is a drawback,
however: messages could be rearranged if they have the same
timestamp. All of the issues related to sort -s
and -u are consequences of the one-second
resolution of the timestamps used by the system logger.
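The duplicate-removing variant can be sketched the same way. The function name dedupe_merge and the sample messages are hypothetical, and LC_ALL=C is our assumption. For brevity the sketch covers only the sort stage, not the gluing of repetition messages.

```shell
#!/bin/sh
# Sort by timestamp, then by everything from field 4 onward, and drop
# lines whose keys all compare equal (-u).
dedupe_merge() {
    LC_ALL=C sort -u -k 1,1M -k 2,2n -k 3,3 -k 4
}

# Made-up input: the first message was written to two log files, so it
# appears twice; all three lines share one timestamp.
printf '%s\n' \
    'Feb 21 12:34:56 buster zebra: logged first' \
    'Feb 21 12:34:56 buster apple: logged second' \
    'Feb 21 12:34:56 buster zebra: logged first' |
dedupe_merge
```

Two lines should remain, and the apple line should now precede the zebra line even though it was logged later: this is the rearrangement described above.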
We'll note a few other pitfalls related to
timestamps. The system logger does not record the year, so if your
log files cross a year boundary, then you will need to merge the log
files for each year separately, and concatenate the results.
Similarly, the system logger writes timestamps using the local time
zone, so you should avoid merging log files that cross a daylight
saving time boundary, when the timestamps can go backward. Again,
split the log files on either side of the discontinuity, merge
separately, and then concatenate.
If your system logger is configured to receive messages from other
machines, note that the timestamps are generated on the machine where
the log files are stored. This allows consistent sorting of messages
even from machines in different time zones.
9.35.4 See Also
sort(1).