I’m a big proponent of Edward Tufte and his information design principles. This post covers a few key concepts from Tufte’s writings that I have found useful in developing my own visualization techniques and strategies.
Small multiples
Small multiples are a series of small plots repeated in succession to illustrate how conditions change over time. A series of small plots can be very powerful for representing time series views of large data sets in a very small space. This is probably the concept I use most, typically for visualizing transactional data sets consisting of millions of data points. We can usually squeeze tens of millions of transactions onto a single page, presenting a full day of transactions in a very limited space.
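As a rough sketch of the data preparation behind a page of small multiples, the snippet below splits one day of transactions into hourly panels, each holding per-minute counts. The function name, the one-hour panel size, the one-minute bins, and the assumption that timestamps are seconds since midnight are all illustrative choices, not something prescribed by Tufte or by any particular tool.

```python
from collections import Counter


def panel_counts(timestamps, panel_seconds=3600, bin_seconds=60, day_seconds=86400):
    """Return one list of per-bin transaction counts for each panel in the day.

    Assumes `timestamps` are seconds since midnight (hypothetical input format).
    """
    bins_per_panel = panel_seconds // bin_seconds
    n_panels = day_seconds // panel_seconds

    # Count transactions per bin across the whole day, ignoring out-of-range values.
    counts = Counter(
        int(t) // bin_seconds for t in timestamps if 0 <= t < day_seconds
    )

    # Slice the day-long series of bins into equally sized panels.
    return [
        [counts.get(p * bins_per_panel + b, 0) for b in range(bins_per_panel)]
        for p in range(n_panels)
    ]
```

Each inner list would feed one small plot. Drawing all of the panels on a shared vertical scale keeps them comparable at a glance.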
Data density
One challenge that I enjoy working on is making more efficient use of space when representing a data set. This involves looking at the data density, or the number of data points represented per square inch of a plot. The goal is to maximize the amount of information presented to the user, producing a rich visualization that efficiently summarizes large amounts of data.
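The definition above reduces to simple arithmetic. Here is a minimal sketch, where the function name and the choice of inches for the plot dimensions are assumptions made for illustration:

```python
def data_density(n_points, width_in, height_in):
    """Return the number of data points per square inch of plot area."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("plot dimensions must be positive")
    return n_points / (width_in * height_in)
```

For example, 10 million points spread over a full 8.5 x 11 inch page (an assumed page size, ignoring margins) works out to roughly 107,000 points per square inch.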
Sparklines
Sparklines are a concept created by Tufte: small, word-sized graphs that are embedded in text or used in small multiples, placing the data directly in your field of view as you read a document or parse data. The classic use case is to embed the sparkline directly in a paragraph describing the data. This allows the reader to view the data without having to move their eyes to a different part of the page, which is more descriptive and efficient than pointing to a separate figure.
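One lightweight way to experiment with the idea is a text approximation built from Unicode block characters. This is only a sketch: Tufte's sparklines are drawn as small line graphs, and the function name and eight-level scaling here are illustrative assumptions.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values):
    """Render a sequence of numbers as a one-line string of block characters."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no variation to show; draw it at the lowest level.
        return BARS[0] * len(values)
    scale = len(BARS) - 1
    # Map each value linearly onto the available block heights.
    return "".join(BARS[round((v - lo) / span * scale)] for v in values)
```

Calling `sparkline([1, 2, 3, 4, 5, 6, 7, 8])` returns `▁▂▃▄▅▆▇█`, a string short enough to drop into a sentence or a table cell next to the number it describes.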
Negative space
Sometimes we can say a lot by saying nothing. With data visualizations, we can show a lack of activity by simply showing white space. For OLTP-style workloads, for instance, this can be quite useful in illustrating server failures. A time series plot of transactional activity may show white gaps where no data was recorded. These gaps are a great indicator of a problem on the system, and large swaths of logs can be reviewed quickly by scanning time series plots for small white gaps.
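The same white gaps can be located programmatically before plotting. The sketch below assumes transaction timestamps in seconds and treats any silence of at least 60 seconds as a gap. Both the function name and the threshold are hypothetical and would need tuning for a real workload.

```python
def find_gaps(timestamps, min_gap_seconds=60):
    """Return (start, end) pairs where no transactions occurred for at least
    `min_gap_seconds` between two consecutive timestamps."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier >= min_gap_seconds
    ]
```

With the input `[0, 10, 20, 200, 210]`, the function returns `[(20, 200)]`, which is the stretch that would appear as white space on the plot.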