Pie Charts I

Showing parts-of-a-whole using Pie Charts and Stacked Bar Charts

We are often interested in understanding parts-of-a-whole, that is, the proportions of each group in a categorical variable. Molecular biologists will be familiar with a common parts-of-a-whole question concerning gene classifications: in a list of significantly over-expressed genes identified by a large screen, such as a microarray or mass spectrometry experiment, which gene families are more abundant than expected, and which are less abundant? Consider a single variable, hair color, in \(n = 592\) individuals.

Hair N
Black 108
Brown 286
Red 71
Blond 127
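
To make the parts-of-a-whole explicit, we can divide each count by the total. The following is a minimal Python sketch using only the counts in the table above; the variable names are our own and are not part of the original analysis.

```python
# Hair color counts copied from the table above.
hair_counts = {"Black": 108, "Brown": 286, "Red": 71, "Blond": 127}

# Total number of individuals.
n = sum(hair_counts.values())

# Proportion of the whole contributed by each group.
proportions = {hair: count / n for hair, count in hair_counts.items()}

for hair, p in proportions.items():
    print(f"{hair:<6} {hair_counts[hair]:>4} {p:6.1%}")
```

These proportions are the values that both the pie chart and the stacked bar chart encode.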

A common way to view this data would be a pie chart, which at this point is not too bad. There is only one variable, hair color, and there are only four groups to consider.

Figure 1: a pie chart and a stacked bar chart of the hair color of 592 individuals.

In this case a stacked bar chart, although easier to read, is not that much more informative than the pie chart. This changes quickly once we start adding variables, like eye color and sex. If we continue along these lines we’ll end up with a collection of pie charts and bar charts which fail to serve the purpose of

Males
Hair Eye N
Black Brown 32
Brown Brown 53
Red Brown 10
Blond Brown 3
Black Blue 11
Brown Blue 50
Red Blue 10
Blond Blue 30
Black Hazel 10
Brown Hazel 25
Red Hazel 7
Blond Hazel 5
Black Green 3
Brown Green 15
Red Green 7
Blond Green 8
Females
Hair Eye N
Black Brown 36
Brown Brown 66
Red Brown 16
Blond Brown 4
Black Blue 9
Brown Blue 34
Red Blue 7
Blond Blue 64
Black Hazel 5
Brown Hazel 29
Red Hazel 7
Blond Hazel 5
Black Green 2
Brown Green 14
Red Green 7
Blond Green 8
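
As a sanity check, the two tables above can be collapsed back to the single-variable totals. The sketch below is one way to do this in Python; the nested-dictionary layout and the names `counts`, `hair_totals`, `eye_totals` and `sex_totals` are our own choices for illustration.

```python
from collections import Counter

hair_levels = ["Black", "Brown", "Red", "Blond"]

# Counts copied from the two tables above. Each list holds one eye color,
# with entries ordered as in hair_levels.
counts = {
    "Male": {
        "Brown": [32, 53, 10, 3],
        "Blue": [11, 50, 10, 30],
        "Hazel": [10, 25, 7, 5],
        "Green": [3, 15, 7, 8],
    },
    "Female": {
        "Brown": [36, 66, 16, 4],
        "Blue": [9, 34, 7, 64],
        "Hazel": [5, 29, 7, 5],
        "Green": [2, 14, 7, 8],
    },
}

# Marginal totals for each of the three variables.
hair_totals, eye_totals, sex_totals = Counter(), Counter(), Counter()
n_cells = 0
for sex, by_eye in counts.items():
    for eye, row in by_eye.items():
        for hair, count in zip(hair_levels, row):
            hair_totals[hair] += count
            eye_totals[eye] += count
            sex_totals[sex] += count
            n_cells += 1

print("cells:", n_cells)
print("hair:", dict(hair_totals))
print("eye:", dict(eye_totals))
print("sex:", dict(sex_totals))
```

The hair totals recover the counts in the first table, which confirms that the three-variable data is a finer breakdown of the same 592 individuals.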

So the real question we are interested in is not just the proportions within a categorical variable, but whether any sub-groups are over- or under-represented. Answering it would require that we have some a priori expectation of those proportions.
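
To illustrate what over- or under-representation could mean in practice, the sketch below compares observed counts with the counts expected if hair and eye color were independent. Independence is our assumption here, chosen as one common baseline; the text does not specify which expectation to use. The hair-by-eye counts are the male and female tables above added together.

```python
hair_levels = ["Black", "Brown", "Red", "Blond"]

# Hair-by-eye counts, summed over males and females from the tables above.
# Each list holds one eye color, with entries ordered as in hair_levels.
observed = {
    "Brown": [68, 119, 26, 7],
    "Blue": [20, 84, 17, 94],
    "Hazel": [15, 54, 14, 10],
    "Green": [5, 29, 14, 16],
}

n = sum(sum(row) for row in observed.values())
eye_totals = {eye: sum(row) for eye, row in observed.items()}
hair_totals = {
    hair: sum(row[i] for row in observed.values())
    for i, hair in enumerate(hair_levels)
}

# Assumption: under independence, the expected count of a cell is
# (eye total * hair total) / n. A ratio above 1 suggests over-representation,
# a ratio below 1 suggests under-representation.
ratio = {}
for eye, row in observed.items():
    for hair, obs in zip(hair_levels, row):
        expected = eye_totals[eye] * hair_totals[hair] / n
        ratio[(hair, eye)] = obs / expected

for (hair, eye), value in sorted(ratio.items(), key=lambda kv: kv[1]):
    print(f"{hair:<6} hair, {eye:<6} eyes: observed/expected = {value:.2f}")
```

Under this assumed baseline, blond hair with blue eyes appears roughly twice as often as expected, while blond hair with brown eyes appears far less often than expected.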

As an example, we will consider the results for a survey of male and female hair and eye colour. We want to uncover and represent any biases in the data set which may reveal previously unappreciated associations between the three variables: sex, hair colour and eye colour.

As we have seen, bar charts are usually the first choice for nominal comparisons (fig. @ref(fig:bar-counts)). However, the major deficiency here is that we are presenting the absolute values, when we are really interested in the parts-of-a-whole distribution. Bar charts don’t reveal interesting trends, such as over- or under-representation, without a large time investment from the reader.

As an alternative to the bar chart, pie charts and stacked bar charts are commonly used for representing parts-of-a-whole. Unfortunately, they both have major drawbacks in visual perception and don’t excel at communicating an effective story. Here we will show how this data is presented and provide a solution using mosaic plots in the next sub-section.

Use pie charts and stacked bar charts with caution

  • Your readers will preferentially and intuitively compare different aspects of a pie chart. Regardless of their particular preference, imagine the difficulty in calculating the angle (\(\theta\), in radians), the area (\(\theta r^{2}/2\)), or the arc length (\(r\theta\)), which is what you are effectively asking your readers to do in their minds.

  • Pie charts are particularly appealing for representing parts-of-a-whole data sets since they intuitively tell the reader that all parts add up to 100%. In addition, the sample size can be encoded in the radius of the circle (Figure fig-bar-and-pie). The major disadvantage of pie charts is that they encode values using slice area, arc length or angle at the center, all of which are fairly inaccurate methods of encoding quantitative information (see Figure fig-cont-encoding).

    Perhaps the only instance where pie charts are suitable is for representing large quantitative differences in a small number of groups, which raises the question of whether a visualization is even necessary.

    Figure 2: dissecting

    Figure 3: Multiple pie charts, stacked bar plots and a filled bar chart compared. Each set contains information on 32 unique combinations of hair, eye and sex. The multiple pie charts have become unwieldy. Both bar chart variants are acceptable, each containing different information.

    Stacked bar charts, plotted on a relative scale, depict the relative proportions of each sub-group in a categorical variable (Figure fig-stacked-bar-plots). This provides a common scale of relative abundances. Similar to the radius of pie charts, we can encode the sample size in the width of each bar.

    Stacked bar charts are an improvement over the pie charts since at least some of the sub-groups are plotted on a common scale. However, since we have four categories, only the two at the bottom and top of the bar chart benefit from this feature. In Figure fig-stacked-bar-plots we have plotted the three variables as three pair-wise plots. Although all our data is visualized, these plots fail to really tell a story: we still don’t know which sub-groups are over- or under-represented. In addition, the relationship between hair and eye color is not displayed; a third plot would be required for that.
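
The two encodings discussed above can be put side by side with a short calculation. This minimal Python sketch uses the hair color counts from the first table and computes, for each group, the pie slice geometry (angle in radians, area and arc length) and the position of the corresponding segment in a stacked bar on a 0 to 1 scale. The radius `r = 1.0` and all names are hypothetical choices for illustration.

```python
import math

# Hair color counts copied from the first table.
hair_counts = {"Black": 108, "Brown": 286, "Red": 71, "Blond": 127}
n = sum(hair_counts.values())
r = 1.0  # hypothetical pie radius

segments = []
start = 0.0  # running position in the stacked bar, on a 0 to 1 scale
for hair, count in hair_counts.items():
    p = count / n
    theta = 2 * math.pi * p  # slice angle in radians
    segments.append({
        "hair": hair,
        "angle": theta,
        "area": theta * r**2 / 2,  # slice area
        "arc": r * theta,          # arc length
        "start": start,            # lower edge in the stacked bar
        "end": start + p,          # upper edge in the stacked bar
    })
    start += p

for s in segments:
    print(f"{s['hair']:<6} angle={s['angle']:.2f} area={s['area']:.2f} "
          f"arc={s['arc']:.2f} bar=[{s['start']:.3f}, {s['end']:.3f}]")
```

In the stacked bar, only the first segment starts at 0 and only the last ends at 1; the middle segments float between other segments, which is why they do not benefit from the common scale.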