69  Degrees of Separation

We’ll encounter plenty of examples of explanatory plots with clear messages throughout the rest of this section. Before we dig in, I’d like you to begin thinking about data visualization as learning a new visual language for communicating your results. Just like written and oral language, our new visual language needs a solid grammar and vocabulary. The grammar and vocabulary we use change according to the needs of our audience so that we can best communicate our message to them.

In this regard, I find a good way of distinguishing between exploratory and explanatory data visualizations is to think about how many degrees of separation exist between a visualization and the raw data that it describes.

Every manipulation of the data adds one degree of separation.

Each degree is intended to aid the viewer’s understanding of the data. At the same time, it also distances the viewer from the raw data. As an example, consider the following paradigm:

Zero degrees of separation is our starting point: a table of raw data. We have full access to all the data, including exact values. The specialist will feel right at home, but almost everyone else will just be overwhelmed. Very few people are interested in dealing with this level of detail. This is a true exploratory view of the data.

The first degree of separation is often a meaningful descriptive statistic, such as the mean. This allows the viewer to understand datasets of any size. But be cautious! As we’ll see later on, we can already misrepresent our data at this point. Filtering is also a common first step.

The second degree of separation may be introduced when we choose a plot type to compare mean values. Which type of plot did we choose? Which values are intuitively compared to other values on the plot? Is this an appropriate arrangement? Is the data accurately represented?

Each degree of separation makes the data easier to understand, but it does so at the cost of information. The descriptive statistic removes information about the distribution of the data. A plot removes the precise value of the mean, since we now have to estimate it by reading it off an axis.
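To make the loss of information at the first degree of separation concrete, here is a minimal sketch in Python. The two samples, `group_a` and `group_b`, are made-up values chosen purely for illustration and are not data from this book. Both samples reduce to the same mean, even though their raw values are spread very differently.

```python
from statistics import mean, stdev

# Hypothetical raw data (zero degrees of separation): two made-up samples.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

# First degree of separation: reduce each sample to a single number, its mean.
mean_a = mean(group_a)
mean_b = mean(group_b)

# The standard deviation summarizes the spread that the mean alone discards.
spread_a = stdev(group_a)
spread_b = stdev(group_b)

print(f"mean A = {mean_a}, mean B = {mean_b}")
print(f"sd A = {spread_a:.2f}, sd B = {spread_b:.2f}")
```

A viewer who sees only the two means has no way of telling these samples apart. That difference lives in the raw data and was removed by the summary.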

Data visualizations typically result in a loss of precision.

But that’s perfectly fine! If you want the exact values, you can go to the raw data. We’re interested in trends and relationships here.

Exploratory data visualizations will have fewer degrees of separation than those intended for publication, since they are made for the deeply embedded specialist. Consider the following outline as a guide:

Figure 69.1: An example of the many degrees of separation between a visualization and its raw data.

There is no universal path. There are many steps that occur before you have data ready for plotting and there are many decisions to be made in the plotting process itself. All of these contribute to the plot’s readability.

Your goal is to minimize the degrees of separation while making your data visualization as easy to understand as possible for its intended purpose.

Lastly, realize that in all of this we’re marrying two very different fields: the soft concepts of design are combined with the hard understanding of statistics.

Figure 69.2: Data visualization is the marriage of design and statistics.

Now that we have an overview of the purpose of data visualization, let’s consider some design and statistical topics.