Encoding values

Avoid using colors to encode the values of continuous variables (see Figure fig-Cont-Encoding)

Many software applications allow continuous variables to be encoded as color. Sometimes this works really well, but it often leaves a lot to be desired. Several issues need to be considered.1

  • 1 Two notable instances of encoding a continuous variable as color are topography on maps and density in 2D (i.e., smoothed-scatter) plots.

  • First, the brain does not recognize color as being ordered in the way that size or numbers are. Although sequential wavelengths result in different colors, there is no logical ordering to the colors themselves, as distinct from the logical ordering of their wavelengths. We will return to this point when we discuss the different possibilities for visually encoding continuous variables.

    Second, in a rainbow, each “pure” color occupies a band of a different width. This means that we cannot use the rainbow as an evenly spaced gradient for a range of values, because it is inherently uneven. Consider the rainbow spectrum shown below:2

  • 2 The topographical color palette suffers from the same drawback.

  • Figure 1: The wavelength ranges for visible colors are not equal.

    Figure 2: Quantitative interpretation of color is inaccurate. In the checker/shadow optical illusion, the gray colors of boxes A and B appear different (left image) but are in reality identical (right image).
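The uneven band widths behind Figure 1 can be checked with a few lines of arithmetic. The sketch below is only illustrative: the band boundaries are commonly quoted round figures (an assumption, not values taken from the figure), and the names `BANDS_NM` and `band_widths` are made up for this example.

```python
# Approximate wavelength bands of the visible spectrum, in nanometres.
# These boundaries are conventional round figures (an assumption for
# illustration); the transitions between colors are gradual in reality.
BANDS_NM = {
    "violet": (380, 450),
    "blue": (450, 495),
    "green": (495, 570),
    "yellow": (570, 590),
    "orange": (590, 620),
    "red": (620, 750),
}


def band_widths(bands):
    """Return the width in nm of each named band."""
    return {name: hi - lo for name, (lo, hi) in bands.items()}


widths = band_widths(BANDS_NM)
total = sum(widths.values())

# Print each band's width and its share of the whole visible range.
for name, width in widths.items():
    print(f"{name:<7}{width:>4} nm  {width / total:6.1%}")
```

With these assumed boundaries, yellow covers a far narrower band than red or green, so equal steps in the data would not correspond to equal steps through the named colors.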

    Despite these shortcomings, color remains a popular choice for encoding continuous variables. Heat maps are a popular example in the life sciences. For the most part, heat maps are difficult to interpret for the reasons given above, so limit them to instances where there is a clear message. There are specific cases where color works really well. One famous example is shown in Figure fig-MeaslesHeat. Here the number of measles cases per US state is encoded with an uneven color scale. It works because there is a dramatic shift after the vaccine is introduced: we are less interested in the individual values than in the extreme ends of the distribution.
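An uneven color scale of the kind described for the measles heat map can be sketched as a lookup into unequally spaced bins. This is not the scale used in the original figure: the break points, the hex colors, and the names `BREAKS`, `COLORS`, and `color_for` are all hypothetical, chosen only to show the idea.

```python
from bisect import bisect_right

# Hypothetical, unevenly spaced break points (e.g. cases per 100,000).
# They are NOT taken from the measles figure.
BREAKS = [1, 10, 100, 500, 1000]

# One color per bin: len(BREAKS) + 1 entries, from "no cases" to "extreme".
# The hex values are arbitrary placeholders.
COLORS = ["#ffffff", "#e7f0fa", "#c9e2f6", "#7fcdbb", "#fdae61", "#d7191c"]


def color_for(value):
    """Map a value to the color of the bin it falls into."""
    # bisect_right counts how many break points are <= value,
    # which is exactly the index of the bin.
    return COLORS[bisect_right(BREAKS, value)]


for cases in (0, 5, 250, 5000):
    print(cases, color_for(cases))
```

Because the bins widen toward the top, small differences among low values stay visible while very large values collapse into a single "extreme" color, which suits a display where the extremes carry the message.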

    Although Figure fig-MeaslesHeat is a nice example of a heat map, heat maps can often be represented in a better way, especially if there is a temporal component. Consider Cleveland’s barley example earlier in this chapter. A reworking of the measles case study in Figure fig-MeaslesGAM uses a generalized additive model (GAM) to make the same dramatic statement. Notice the two uses of color here: the purple semi-transparent dots in the background show all the US states in one pool, and the bright orange line highlights the GAM trend line.