54  Table Look-up

Slow Forms of visual perception

William Cleveland popularized the “table look-up” type of slow visual perception in the 1980s. This kind of visual perception is typical of exploratory plots. It allows us to ask precise and detailed questions when we first begin to examine our data. Plots of this type may also appear as explanatory plots, but because they are time-consuming to read, they are less common in that role. Whether they work also depends on the context and audience. You may see plots that activate slow forms of visual perception in specialist, data-heavy scientific journals, where the audience has the time and interest to pore over the details. In contrast, it’s unlikely that a large audience at a short conference presentation will get any meaningful information from such a plot.

An example that Cleveland used to illustrate this concept is the Barley Yield data set. In this data set, the yields of 10 varieties of barley are reported for 6 farms in both 1931 and 1932. That’s only 120 data points (10 × 6 × 2), so the issue is not too much data. Rather, the issue is that we have 4 variables (variety, farm, year and yield) and 60 time series, one for each combination of variety and farm. The heat map presented in Figure 54.1 may be a first idea for a data visualization.1

  • 1 Heat maps can be a good choice as an explanatory plot if there is an immediate and clear message or only a few, very different categories. As an exploratory plot, it is not detailed enough. Basically, it’s as if we have used conditional formatting in an Excel spreadsheet.

    Figure 54.1: The Barley Yield data set as a heat map. All values are displayed but trends are difficult to observe.
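The shape of the data set described above can be checked with a few lines of Python. This is only a sketch of the counting argument: the variety and farm labels are placeholders (an assumption, since only the counts matter here), and no yield values are involved.

```python
from itertools import product

# Placeholder labels (assumption): only the counts come from the text.
varieties = [f"variety_{i}" for i in range(1, 11)]  # 10 barley varieties
farms = [f"farm_{i}" for i in range(1, 7)]          # 6 farms
years = [1931, 1932]                                # 2 years

# One yield measurement per (variety, farm, year) combination.
observations = list(product(varieties, farms, years))

# One two-point time series per (variety, farm) pair.
series = list(product(varieties, farms))

print(len(observations))  # 120
print(len(series))        # 60
```

Each row of `observations` would carry one yield value, which is why a small data set can still be awkward to display: every point is described by three categorical variables at once.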

    A much more detailed view would be Figure 54.2, a Cleveland-style dot plot. In this plot type, we make some unconventional choices. First, the independent variable is presented on the y axis and the dependent variable is on the x. It works, since the long labels of the Barley varieties are easy to read. Time is encoded by color, instead of taking its typical place on the x axis. This arrangement means that we can read the plot like a table, hence slow table look-up. We can ask very detailed questions and scan the plot (i.e. table) from left to right and from top to bottom to retrieve exactly the information we need.2

  • 2 Remember, we can ask detailed questions, but precision is a different matter. Most data visualization suffers from some degree of imprecision, unless precise labels are added.

    An example: Which variety had the worst yield in 1931 at the Waseca farm? All we have to do is move from top to bottom looking for the Waseca sub-plot. Then we move from left to right looking for the first blue dot (the smallest value in 1931). It turns out to be No 475, which had a yield of ca. 47 bushels/acre. Try answering that with the heat map! Unless the value is striking, you’ll have a hard time.

    Figure 54.2: A dot plot of the barley data set popularized by William Cleveland. Three variables representing 120 data points are plotted.
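The look-up in the example above has two steps: find the right sub-plot and year, then scan for the smallest value. A minimal sketch of the same two steps on a table follows. The records are invented toy values with placeholder names, not the real barley yields, and `lowest_yield` is a hypothetical helper written for this illustration.

```python
# Invented toy records, NOT the real barley data: (farm, variety, year, yield).
records = [
    ("Farm A", "Variety X", 1931, 30.0),
    ("Farm A", "Variety Y", 1931, 25.5),
    ("Farm A", "Variety Z", 1931, 41.2),
    ("Farm A", "Variety X", 1932, 28.1),
    ("Farm A", "Variety Y", 1932, 27.4),
    ("Farm A", "Variety Z", 1932, 35.0),
    ("Farm B", "Variety X", 1931, 22.3),
    ("Farm B", "Variety Y", 1931, 24.8),
    ("Farm B", "Variety Z", 1931, 19.9),
]


def lowest_yield(records, farm, year):
    """Return the (variety, yield) pair with the smallest yield for one farm and year."""
    # Step 1: locate the sub-plot (the farm) and the color (the year).
    panel = [(variety, y) for f, variety, yr, y in records if f == farm and yr == year]
    if not panel:
        raise ValueError(f"no records for {farm!r} in {year}")
    # Step 2: scan for the left-most dot, i.e. the minimum yield.
    return min(panel, key=lambda pair: pair[1])


print(lowest_yield(records, "Farm A", 1931))  # ('Variety Y', 25.5)
```

The dot plot lets a reader perform this filter-then-scan query by eye, which is what makes it behave like a table.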

    Figure 54.2 is the most data-heavy and time-consuming (to read) plot we could produce with this data set. But it’s not bad! It serves a specific purpose for an interested audience in the right context. Can you see some trends in the data set? Did you notice that the farms are arranged from low to high producers? That’s a useful feature. The sub-plots are not arranged alphabetically; their order carries further information! Also, notice that some farms have a low mean yield and variance, like Duluth, whereas others have a relatively large mean and variance, like Waseca. Did you also notice the anomaly in the data set? All farms suffered a decrease in yield from 1931 to 1932 except for Morris. The reason for this is a different, and somewhat contested, question. We’ll imagine that this is an interesting anomaly that we want to highlight.
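The features pointed out above (panel order, mean and variance per farm, and the one farm that moves against the trend) can all be computed before any plot is drawn. The sketch below uses invented toy values and placeholder farm names, not the real barley data, and it assumes that the mean yield is the ordering criterion, which the text does not specify.

```python
from statistics import mean, variance

# Invented toy yields, NOT the real barley data: farm -> year -> yields per variety.
yields = {
    "Farm A": {1931: [30.0, 34.0, 32.0], 1932: [26.0, 29.0, 27.0]},
    "Farm B": {1931: [50.0, 62.0, 56.0], 1932: [41.0, 52.0, 45.0]},
    "Farm C": {1931: [28.0, 31.0, 34.0], 1932: [38.0, 42.0, 46.0]},
}


def farm_summary(by_year):
    """Summarise one farm: overall mean, overall variance, change in yearly mean."""
    all_values = [v for values in by_year.values() for v in values]
    return {
        "mean": mean(all_values),
        "variance": variance(all_values),
        "change": mean(by_year[1932]) - mean(by_year[1931]),
    }


summaries = {farm: farm_summary(by_year) for farm, by_year in yields.items()}

# Order the sub-plots from low to high producers (assumption: by mean yield).
panel_order = sorted(summaries, key=lambda farm: summaries[farm]["mean"])

# Farms whose mean yield went up from 1931 to 1932, against the general trend.
increased = [farm for farm in panel_order if summaries[farm]["change"] > 0]

print(panel_order)  # ['Farm A', 'Farm C', 'Farm B']
print(increased)    # ['Farm C']
```

Ordering panels by a summary statistic rather than alphabetically is what lets the reader pick up the low-to-high pattern without any extra annotation.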

    Systematic shifts in the location, spread or direction of change are important results that we typically want to highlight. They are not readily apparent in Figure 54.2. We need a plot type that allows us to communicate these messages a bit faster. A line plot, Figure 54.3, could come in handy here. In this case the most logical, and typical, choice for the x axis is going to be time.

    Figure 54.3: Barley data set, Line Plot

    Figure 54.3 is pretty detailed since we see all 60 time series. It’s still manageable, but consider that we have 10 distinct colors. We’re kind of pushing the limit on how many colors the human eye can easily distinguish. Nonetheless, we can see the trends we expected as we move from left to right. In particular we can see more clearly that Morris behaves differently compared to the other farms. On top of that we do gain some extra insights. For example, although many varieties decrease, some of them actually increase, and some are worse off than others.

    Edward Tufte, whom we’ll encounter again later on in the workshop, developed a slope plot. For Tufte, an explanatory plot wasn’t complete until all non-data ink was removed. The slope plot in Figure 54.4 does away with the axes. In their place are the actual mean values. So although it looks like we have lost precision, this is actually the most precise plot of the series, since we know the exact value to two decimal places and we can see the values in a visual context. On top of that, Figure 54.4 communicates one very clear message by the clever use of color. Instead of coloring the lines according to farm, they are colored according to direction of change. Did the yield increase or decrease?

    There are two troubling things about this slope plot. First, there is no legend. Any visual element that encodes information should be defined somewhere on the plot. In this case we may make the argument that it is obvious and so goes without saying. That’s a dangerous perspective, but you may be able to get away with it. Second, the spread is not depicted, which is typical for slope plots. That should be a major cause of concern for scientists. You never want to show the location without some measure of spread. This plot is not suitable for a scientific publication, but it may work well for a lay audience or in a report for managers. It’s easy to read and communicates a clear message. Extra information like the standard deviation or the 95% confidence interval may already be information overload and just confuse the audience.

    Figure 54.4: Cleveland Barley data set slope plot.
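The ingredients of a slope plot, and the two things it leaves out, can be sketched as follows. The code prepares, for each farm, the two mean labels rounded to two decimal places and the direction of change used for coloring. It also computes the standard deviation that the plot omits and an explicit legend mapping. The yields are invented toy values, the farm names are placeholders, and the colour names in `legend` are hypothetical choices, not those of the original figure.

```python
from statistics import mean, stdev

# Invented toy yields, NOT the real barley data: farm -> year -> yields per variety.
yields = {
    "Farm A": {1931: [30.0, 34.0, 32.0], 1932: [26.0, 29.0, 27.0]},
    "Farm C": {1931: [28.0, 31.0, 34.0], 1932: [38.0, 42.0, 46.0]},
}

# Hypothetical legend: every encoded colour is defined explicitly.
legend = {"increase": "blue", "decrease": "red", "no change": "grey"}


def slope_row(by_year, start=1931, end=1932):
    """Prepare one line of a slope plot: end-point labels, direction, spread."""
    start_mean, end_mean = mean(by_year[start]), mean(by_year[end])
    if end_mean > start_mean:
        direction = "increase"
    elif end_mean < start_mean:
        direction = "decrease"
    else:
        direction = "no change"
    return {
        # Labels replace the axis, so they carry the exact values.
        "labels": (f"{start_mean:.2f}", f"{end_mean:.2f}"),
        "direction": direction,
        "color": legend[direction],
        # Spread per year: the information a bare slope plot does not show.
        "spread": (stdev(by_year[start]), stdev(by_year[end])),
    }


rows = {farm: slope_row(by_year) for farm, by_year in yields.items()}
print(rows["Farm A"]["labels"], rows["Farm A"]["direction"])
print(rows["Farm C"]["labels"], rows["Farm C"]["direction"])
```

Whether the `spread` values end up on the plot is the audience decision discussed above: a scientific readership would expect them, while a management report may be clearer without them.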