Logs are external interfaces to software systems, and while normal external integration layer APIs are given much thought and care, logs are typically generated just by dumping ad-hoc messages, making them unnecessarily hard to use. Following these few simple guidelines can make lives of both developers and support much easier in the long term.
Although agreeing on log files and formats should definitely not take as much time as designing functional APIs, thinking about logs up front can make a big difference once a system goes live.
The following questions are a good start:
- What groups of people will use the logs and for what?
- What kind of information will they need in order to do their job?
- How will the logs be used (will they be parsed, filtered, imported, read by a human, analysed by a report tool)?
- How long should the logs be kept?
Here are a few examples:
Administrators need to know about operational problems. They will typically be checking the files once a day, possibly getting updates by mail. Sometimes, urgent messages will need to be sent to administrators straight away, possibly to beepers, mobile phones or by e-mail. Format of log messages for administrators does not need to be specially fixed or stable, but has to contain enough information to alert appropriate people.
Support staff should be able to diagnose recent problems - for example, they need audit information, SQL commands, and errors. Support logs should typically be kept for a few days or weeks, since customers might report minor problems only if they persist for a few days. Logs will typically be used from a file viewer either on the target system, or copied from the client to the support workstation (sometimes sent by e-mail), so files have to be kept fairly small. They also have to be in a format that is easy to parse, as support staff will often write ad-hoc scripts in awk or a similar tool to extract relevant data.
Supports also needs another log, much more detailed, for fire-fighting situations. It should enable trouble-shooters to find out exactly what the system is doing at the moment. This log type is typically not kept at all, and support staff should be able to turn it on or off on demand. It will typically be used on-line with tail or some other tool, should be human readable and easy to filter (hide unwanted messages). Sometimes these messages are not even stored in files, but pushed out to an UDP/TCP port.
Customers are looking for marketing-related information such as business volume, origin of users etc. They will typically use logs to produce reports, and delete/recycle the logs after the reports are produced. These logs will typically be analysed on the host system, or copied to a report server and analysed there. Typically, they will be parsed using a custom log analysis tool, or imported into the database. Format of this file must be fairly stable, as clients or third-parties might develop scripts to perform the analysis.
Customers might also be looking for user action audit logs (depending on the nature of the system), which will be archived and kept for a longer period (sometimes years). These logs will be used by customers to search for security or money related problems, typically by extracting entire user sessions, so they will primarily be parsed with awk, grep or a similar tool. Format of this file must be fairly stable, as clients or third-parties might develop scripts to perform the analysis. It should also be human-readable.
Developers will use logs for trouble-shooting and debugging on development systems, typically reading them in a text viewer or with more/tail. For developers, logs have to be human readable and contain a lot of fine-grained tracing information, but will not be copied, analysed or parsed.
Log file anti-patterns
Here are a few common errors which make log files much harder to use:
Single message dump
Often, it is best to produce a separate log file for each user group, as they have different needs, and appropriate log files will have different ‘expiry periods’. Support logs might be kept for a week, marketing logs might be rotated every day, audit logs might be kept for years. Keeping all these messages in a single file makes it impossible to archive only audit logs, and will typically make automated parsing much harder.
Logging incomplete data
Common examples are logging exceptions without stack traces or SQL commands without parameter values. Logs must contain all the information that enables users to get their job done. Think whether an error entry in the support log contains everything relevant for troubleshooting.
DIY separators
Using multiple separators or non-standard separators makes automated parsing very hard. Common examples of this anti-pattern are using pipes, tildas, dashes, various brackets or combinations of those. A script for parsing such files will be much more complex and much harder to write than a file with a a single entry separator, consistently used. It’s best to use standard commas or tabs - this makes parsing much easier, and a lot of existing tools can provide data extraction and filtering.
Inconsistent format for messages in a single file
Mixing formats for different types of messages in a single file makes it very hard to parse. Files intended for automated filtering and parsing should have a consistent message format - if using the same format for two types of messages does not make sense, perhaps they belong in different files.
Multi-line entries
Single-line entries are much, much easier to parse, analyse, transform and load into a database. Most multi-line entries can be transformed to a single-line by replacing CRLFs with blanks. Multi-line entries could be allowed in urgent administrator messages or on-line trace log, but not for support logs which should be automatically filtered and parsed. If you absolutely have to include multi-line entries, ensure that there is a separator line after each entry, and separators must be easy to find and filter. An exception to the rule are audit logs which will only be used off-line to generate reports, but they must be structured properly (i.e. XML).
Logging a single contextual message as multiple entries/lines
A common example is logging a SQL command and then logging parameter values as separate messages. This might seem quite fine for a developer working in an isolated environment, but on a production server, due to several threads, such contextual groups might overlap, and parameters of one command might be logged after another command. Again, a single log entry should contain all relevant information.
Audit logging after the action
A common example is logging a SQL command after the execution. In case of an exception, the log entry might not be written, so the audit log will not provide relevant contextual information for the exception. Audit log entries should be written before an action - not after.
A good log
On the end, a log file is useless if it’s just a pile of messages, hard to read or parse. Having a poor log file is better than not having one at all, but a good log system can save a lot of money and effort by making troubleshooting and support easier. Here is a short list of guidelines to produce good logs:
- Think up-front about the log format for the files which customers will use or keep – these logs are an external API to the system so changing the format later might cause problems for customers or break some third-party tools.
- Split messages into different files depending on intended audience/way of using.
- Log actions before they happen.
- Mind the file size if logs should be copied or archived.
- Use single-line entries with fields separated by tabs if the file should be parsed or filtered online.
- Make sure that each entry is contextually complete – it should contain all relevant information.
- Make sure that a single contextual message is logged as a single entity.
- Use a consistent format for all messages in a single file.