Here’s the science bit #2 – consistency & patterns

Essentially, you can think of logstash as a pipeline. A message arrives (input), is processed (filter), and is output to somewhere else (output).

Within the filter part of the pipeline we can match data using expressions, fiddle with it, do logic on it, replace bits of it if necessary, ignore it if it’s not what we need and much more besides.

What we had to do was:

  • take each line from the file – every line is a discrete “message” (input)
  • make sure we’re interested in them (filter)
  • work out what each one means (filter)
  • extract meaningful data from them (filter)
  • do some clever looking back in the existing data (filter)
  • pump them into the elasticsearch backend (output)
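
To make the shape of that list concrete, here is a minimal Python sketch of the same input, filter, output flow. It is not logstash and not our real configuration: the sample lines, the function names and the `msgid` field are all made up for illustration, and the "output" is just a list rather than elasticsearch.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical sample lines; the timestamps, IDs and addresses are invented.
SAMPLE_LINES = [
    "2013-05-01 10:00:00 [1234] 1UXabc-0001Ab-Cd <= alice@example.com",
    "2013-05-01 10:00:01 [1234] Start queue run: pid=1234",
    "2013-05-01 10:00:02 [1235] 1UXabc-0001Ab-Cd => bob@example.org",
]

# Same shape as an Exim message ID: 6-6-2 alphanumeric characters.
MSGID = re.compile(r"[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}")


def read_input(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Input stage: every line becomes one discrete event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def apply_filter(event: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Filter stage: drop lines we do not care about, extract data from the rest."""
    match = MSGID.search(event["message"])
    if match is None:
        return None  # not a per-message line, so ignore it
    event["msgid"] = match.group(0)
    return event


def run_pipeline(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Output stage: collect surviving events (a stand-in for elasticsearch)."""
    output = []
    for event in read_input(lines):
        filtered = apply_filter(event)
        if filtered is not None:
            output.append(filtered)
    return output


results = run_pipeline(SAMPLE_LINES)
```

Only the two lines carrying a message ID survive the filter; the queue-run line is dropped before it reaches the output stage.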

Firstly, it’s absolutely essential to ensure your data is predictable and conforms to a known format. Luckily Exim’s log format is very well defined, and we made sure all of the servers in our farm were using the same format to avoid any difficulty. For us this means adding the following to our Exim configuration:

## Logging
 syslog_facility = local5
 log_file_path = :syslog
 log_selector = +incoming_interface +subject +pid +queue_time +queue_time_overall

For those not familiar with Exim, that says “log to normal files but also to syslog with facility local5, and add these extra bits of metadata to the log lines”. The local rsyslog server is configured to simply push local5.* to the central logging server I’ve already mentioned, where the rsyslog receiver puts the data into the relevant farm-wide files. See the previous posts for why it doesn’t simply get stuffed into logstash directly.

Now, how do we do all that stuff in the logstash filter section to make this mass of data usable?

Patterns.

logstash uses a mixed regular expression engine which allows the definition and subsequent use of “fields” inside regex patterns. After a bit of work, we conjured up a set of patterns specific to Exim which also build on the patterns that ship with logstash (NOTSPACE, IP and QS in the lines below). As an example:

EXIM_MSGID [0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}
EXIM_FLAGS (<=|[-=>*]>|[*]{2}|==)
EXIM_REMOTE_HOST (H=(%{NOTSPACE:remote_hostname} )?(\(%{NOTSPACE:remote_heloname}\) )?\[%{IP:remote_host}\])
EXIM_SUBJECT (T=%{QS:exim_subject})

 

The full list of these can be found here (GitHub). It may grow.
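
If the `%{NAME:field}` syntax is new to you, it may help to see roughly what a few of the patterns above look like once expanded into a plain regex with named groups. The Python below is a hand-translated sketch, not what logstash does internally: NOTSPACE is written as `\S+`, the stock IP pattern is swapped for a simplified IPv4-only stand-in, and the sample line (including its layout and the `address` group) is invented for the example.

```python
import re

# Stand-ins for the stock patterns. IPV4 is a deliberate simplification of
# the real IP pattern, which is stricter and also handles IPv6.
NOTSPACE = r"\S+"
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"

# Hand-translated from the Exim patterns in this post.
EXIM_MSGID = r"[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}"
EXIM_FLAGS = r"(?:<=|[-=>*]>|[*]{2}|==)"
EXIM_REMOTE_HOST = (
    r"H=(?:(?P<remote_hostname>" + NOTSPACE + r") )?"
    r"(?:\((?P<remote_heloname>" + NOTSPACE + r")\) )?"
    r"\[(?P<remote_host>" + IPV4 + r")\]"
)

# Hypothetical combined pattern; the "address" group is an assumption
# added purely so the sample line below matches end to end.
LINE = re.compile(
    r"(?P<msgid>" + EXIM_MSGID + r") "
    r"(?P<flags>" + EXIM_FLAGS + r") "
    r"(?P<address>" + NOTSPACE + r") "
    + EXIM_REMOTE_HOST
)

# Made-up log fragment in the general shape of an Exim arrival line.
sample = (
    "1UXabc-0001Ab-Cd <= alice@example.com "
    "H=mail.example.com (helo.example.com) [192.0.2.10]"
)

match = LINE.search(sample)
fields = match.groupdict() if match else {}
```

Each named group ends up as a key in `fields`, which is the same idea as the named fields logstash attaches to an event when a pattern matches.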

The next question: how do these get applied?
