86  Best Practices

In the last sections we saw some great ways to avoid pie charts, in particular when you have many slices or when there are interesting trends in the data that pie charts don’t convey. Nonetheless, sometimes pie charts are really just perfect, in particular when you have a small number of categories and can arrange the pie charts in a sensible manner.

Here, I’ll use the mushroom data set, which is popular in machine learning exercises. The underlying research question is: given many input variables, can we predict whether a mushroom is going to be poisonous or not? Figure 86.1 is a very rough exploratory plot of two variables that influence edibility: the cap color and the stalk color.

Figure 86.1: An exploratory plot for the mushroom data set

Aside from the terrible color choice (see section XX), this plot is difficult to read because we can only roughly see the separation between the two categories. It’s essentially a scatter plot where the x and y axes are categorical (see scatter plots XX and Vocab example XXX for a thorough discussion of this plot type). Each observation is represented by one dot, which is drawn in the order in which it appears in the data set. To show the pitfalls here, I’ve ordered the data set so that all the edible mushrooms appear first. That means that this entire, interesting subset is obscured by the poisonous mushrooms drawn on top of it, although we think we have a good exploratory plot!
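To make the drawing-order pitfall concrete, here is a minimal sketch in Python. It is not the code behind Figure 86.1, and the handful of records and the helper names `visible_class` and `class_counts` are hypothetical, made up purely for illustration. It mimics overplotting by keeping only the class drawn last at each position and compares that with a simple count per position.

```python
from collections import Counter

# Hypothetical toy records: (cap_color, stalk_color, edibility).
# The values are invented for illustration; they are not the mushroom data.
# As in the text, all edible mushrooms come first.
rows = [
    ("brown", "white", "edible"),
    ("brown", "white", "edible"),
    ("brown", "white", "edible"),
    ("red", "white", "edible"),
    ("brown", "white", "poisonous"),
    ("red", "pink", "poisonous"),
]


def visible_class(records):
    """Return the class drawn last, and therefore on top, at each position."""
    top = {}
    for cap, stalk, label in records:
        top[(cap, stalk)] = label  # a later dot covers any earlier one
    return top


def class_counts(records):
    """Count every class at each position, independent of drawing order."""
    counts = {}
    for cap, stalk, label in records:
        counts.setdefault((cap, stalk), Counter())[label] += 1
    return counts


top = visible_class(rows)
counts = class_counts(rows)

for cell in sorted(counts):
    print(cell, "on top:", top[cell], "| actual counts:", dict(counts[cell]))
```

In this toy data the brown cap, white stalk position holds three edible mushrooms and one poisonous one, yet only the poisonous dot ends up on top. Counting per position does not depend on the row order, which is why a summary of proportions is more trustworthy here than the raw dots.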

Let’s simplify the matter. We’re just interested in the ratio of edible to poisonous mushrooms in each category. With only two groups, we can manage a grid of different observations, same variable pie charts, depicted in Figure 86.2.

Figure 86.2: An example of a different observations, same variable pie chart that works well.
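The idea behind such a grid can be sketched in a few lines of Python. This is an illustrative sketch, not the code that produced Figure 86.2: the records are hypothetical toy values, `edibility_shares` and `plot_pie_grid` are names invented here, and the plotting helper assumes that matplotlib is installed. The first function computes the share of edible and poisonous mushrooms for each combination of cap color and stalk color; the second draws one pie per combination.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (cap_color, stalk_color, edibility).
# The values are invented for illustration; they are not the mushroom data.
records = [
    ("brown", "white", "edible"),
    ("brown", "white", "edible"),
    ("brown", "white", "edible"),
    ("brown", "white", "poisonous"),
    ("red", "white", "edible"),
    ("red", "white", "poisonous"),
    ("red", "pink", "poisonous"),
]


def edibility_shares(records):
    """Return the share of each class per (cap_color, stalk_color) cell."""
    counts = defaultdict(Counter)
    for cap, stalk, label in records:
        counts[(cap, stalk)][label] += 1
    shares = {}
    for cell, cell_counts in counts.items():
        total = sum(cell_counts.values())
        shares[cell] = {label: n / total for label, n in cell_counts.items()}
    return shares


def plot_pie_grid(shares):
    """Draw one pie per cell: columns are cap colors, rows are stalk colors.

    Assumes matplotlib is available; it is imported here so that the
    share calculation above works without it.
    """
    import matplotlib.pyplot as plt

    caps = sorted({cap for cap, _ in shares})
    stalks = sorted({stalk for _, stalk in shares})
    fig, axes = plt.subplots(len(stalks), len(caps), squeeze=False)
    for i, stalk in enumerate(stalks):
        for j, cap in enumerate(caps):
            ax = axes[i][j]
            cell = shares.get((cap, stalk))
            if cell is None:
                ax.axis("off")  # no observations for this combination
                continue
            labels = sorted(cell)
            ax.pie([cell[label] for label in labels], labels=labels)
            ax.set_title(f"cap: {cap}, stalk: {stalk}")
    return fig


shares = edibility_shares(records)
```

Calling `plot_pie_grid(shares)` returns a figure with one two-slice pie per observed combination and a blank panel where a combination never occurs. Because every pie shows the same two-level variable, the slices can be compared across the grid at a glance.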