Jittering

In the previous chapter, we saw that jittering is useful when all points lie on a single axis, and we’ll see that again later in this chapter. The benefit of jitter is clear in that case, so it wasn’t really controversial. In contrast, the previous example used jittering in two dimensions, which gave the impression of more precision than the data originally had. That’s perfectly acceptable, provided that we are transparent about data transformations. We can make a convincing argument in favour of jittering: it reveals more data points than we would otherwise see, and all statistics are in any case calculated on the original, non-transformed data.
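To make the idea concrete, here is a minimal sketch of two-dimensional jittering in plain Python. The `jitter` helper, the noise width of 0.2 and the small data set are all assumptions made for illustration; they are not taken from the figures in this chapter. The point is only that the noise is added to the plotting coordinates, while summary statistics are computed on the untouched values.

```python
import random
from statistics import mean


def jitter(values, amount=0.2, seed=None):
    """Return a new list with uniform noise in [-amount, amount] added to each value."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical integer-valued data in which many points overlap exactly.
x = [1, 1, 1, 2, 2, 3, 3, 3, 3]
y = [4, 4, 4, 5, 5, 6, 6, 6, 6]

# Jitter both dimensions, for plotting only.
x_plot = jitter(x, amount=0.2, seed=1)
y_plot = jitter(y, amount=0.2, seed=2)

# Statistics are still calculated on the original, non-transformed data.
x_mean = mean(x)
y_mean = mean(y)
```

Because `jitter` returns new lists, the original `x` and `y` stay available for any model fitting or summaries.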

So what would be an alternative? Well, we could have mapped the size of each circle to the number of observations at that position.
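As a rough sketch of what that involves, we first need one row per distinct position together with a count, which can then be mapped onto size. The points below are made up for illustration, and `rows` is just a hypothetical name for the summarised table.

```python
from collections import Counter

# Hypothetical (x, y) observations with exact overlaps.
points = [(1, 4), (1, 4), (1, 4), (2, 5), (2, 5), (3, 6)]

# Count how many observations share each position.
counts = Counter(points)

# One row per distinct position: (x, y, n). The n column is what we would map to size.
rows = [(x, y, n) for (x, y), n in sorted(counts.items())]

for x, y, n in rows:
    print(x, y, n)
```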

Figure 1: Change the size parameter.

This is actually quite nice, even if size is not the most efficient encoding element. We do encounter a problem, however, when we try to include more species. When circles of the same size overlap, the data can be obscured, and I find that the large circles draw our attention. For example, if we had to draw the linear model for virginica by hand, we would probably make the slope steeper, since we may disregard the influence of the less frequent values at the edges of the distribution. Faceting may solve this problem, but the point of this visualisation is to compare the three models in one plot.

Figure 2: Change the size parameter.

Figure 3: The vocab data set - imprecision due to integer data.

Here is a clear case for using jittering in two dimensions.

Figure 4: The vocab data set - 2-dimensional jittering.

Previously, we mapped the total number of observations to size. We can do the same thing here…

Figure 5: Size according to total number of observations at each point.

But actually, that’s not really the story of this data set. What we want to know is the relationship between vocab score and education. A more interesting statistic to map onto size would be the proportion of each vocab score within each education group.
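The difference from a raw count is the denominator: each count is divided by the total for its education group, so the proportions within a group sum to one. Here is a minimal sketch, assuming each record is an (education, vocabulary) pair; the values are invented and are not drawn from the vocab data set.

```python
from collections import Counter

# Hypothetical (education, vocabulary) records.
records = [(12, 5), (12, 5), (12, 6), (12, 7), (16, 7), (16, 8)]

# Number of observations at each (education, vocabulary) combination.
pair_counts = Counter(records)

# Number of observations in each education group.
group_totals = Counter(education for education, _ in records)

# Proportion of each vocabulary score within its education group.
proportions = {
    (education, vocabulary): n / group_totals[education]
    for (education, vocabulary), n in pair_counts.items()
}

for (education, vocabulary), p in sorted(proportions.items()):
    print(education, vocabulary, p)
```

Mapping `proportions` rather than `pair_counts` onto size stops large education groups from dominating the plot simply because they contain more people.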

Figure 6: Size according to proportion of vocab score in each education level.

So we can see that there is a trend towards higher vocab scores with higher education levels. Let’s draw some linear models and see how that trend has changed over time.
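Fitting one linear model per year amounts to splitting the data by year and running an ordinary least-squares fit on each subset. The sketch below does this by hand for a simple straight line. The `fit_line` helper, the (year, education, vocabulary) record layout and the numbers themselves are assumptions for illustration only.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (year, education, vocabulary) records.
records = [
    (1974, 8, 4), (1974, 12, 6), (1974, 16, 8),
    (2004, 8, 5), (2004, 12, 6), (2004, 16, 7),
]

# Split the records by year.
by_year = defaultdict(list)
for year, education, vocabulary in records:
    by_year[year].append((education, vocabulary))

# Fit one line per year, using the original (non-jittered) values.
models = {}
for year, pairs in sorted(by_year.items()):
    xs = [education for education, _ in pairs]
    ys = [vocabulary for _, vocabulary in pairs]
    models[year] = fit_line(xs, ys)

for year, (slope, intercept) in models.items():
    print(year, slope, intercept)
```

Comparing the slopes across years is then a direct way to see whether the relationship between education and vocab score has become steeper or shallower.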

Figure 7: Linear model (lm) per year.