daneharrigan.com

Log For Humans and Machines

Logging is one of the easiest ways for an application to communicate information to humans or other machines. How many requests has the application received or how quickly has it served responses? How slow are the slowest actions and how often do they occur? These can be easily communicated through logs.

We log everything at Heroku. To make logging consumable by humans and machines, we standardized around key-value formatting. With the use of common keys, humans and machines can understand what an application is experiencing everywhere.

Common Practices

Logs should have an app key with the value matching the application or project name. These key-values will look like app=api, app=foo-bot, or something similar. This key-value pair becomes essential for filtering when multiple log streams are consumed by a single source.

The at key is used to indicate the overall action or purpose of the code being executed at the time of logging. If the code’s purpose is to output application state, each log line should have an at=info key-value pair. The at value may be seed for logs generated when the application is seeding the database.

The fn key can be used to show what function or method is being executed when the log line is generated. This key-value is very useful for troubleshooting.

Logs can be “tagged” with additional key-values to indicate what actions have or have not occurred. A tag is a key with a value of true.

Application errors can be passed to logs with a key error and value as the message string.

Key-value pairs that are meant to be read by machines are prefixed with measure.. This allows to fast and easy scanning.

app=example at=seed [email protected] fn=create exists=true
app=example at=seed [email protected] fn=create
app=example at=seed [email protected] fn=verify error="No DOB"

The logs above show the application was seeding a database with two users, Jane and John. When the create function was called for Jane’s user, it checked to see if the user already existed. Because her user already existed, the log was tagged with exists=true. John’s user, however, did not exist and was created. When the verify function was called for his user an error occurred and the message was logged.

Machines Do It Better

Humans can only read and process a handful of logs at one time, but machines can process an endless supply. Whether logs are received at the rate of ten per minute or ten thousand per second, machines do it better.

Trends, historical views, alerting, and much more can be offered by machines consuming logs. These tasks, at scale, would be impossible for a human.

app=example at=write measure.items=1000 measure.elapsed=5
app=example at=write measure.items=1500 measure.elapsed=10
app=example at=write measure.items=1010 measure.elapsed=4

The above logs indicate an application has completed three write actions. The number of items written is shown with measure.items and the value is intended to be consumed by a machine. The value of measure.elapsed shows how long the action took to complete in milliseconds and is also meant for machine consumers.

To make log parsing easy and consistent across languages, a series of libraries have been created to do this for us. These libraries exist for Golang, Python, Erlang, Node.JS, Ruby, and Java.

Conclusion

Logging is necessary for every application. Machines consuming logs can only produce information as useful the log data they are given. Applications should take full advantage of how and when logs are generated. Log for humans and machines.