Adding Variables

Plot multivariate data sets

So far, we have been working with a bivariate data set, visualising the relationship between two continuous variables. However, the only thing this plot shows us is that there is a strong positive correlation between the two variables. This is obvious even without a regression line, and it raises the question: what is the purpose of this visualisation? [1] Remember that versicolor and virginica, indistinguishable from each other, are distinct from setosa. To show this, the species need to be plotted together on a multivariate plot.

  • [1] If we only wanted to emphasise the magnitude of the correlation coefficient (Pearson’s r = 0.74), then reporting it in the text would have sufficed.

  • Four options for encoding categorical variables. First: shape alone does not provide enough distinction among groups. Second: shape and colour together are unnecessary, since they are redundant. Third: colour alone allows easy identification of each group. Fourth: colour and circle outlines allow group distinction and also relieve some problems of over-plotting not solved by jittering.

Adding more information to the plot means we need to decide which aesthetic attributes will best distinguish the different species. For categorical variables, the most distinguishable encoding elements are colour and shape, and we only need one of them. Using two encoding elements for a single variable only serves to confuse the reader: each piece of information should be encoded by only one element. Consider the four variations shown in the figure above. Filled shapes on their own are not easily distinguishable. Colour lets us see the three groups very easily, so adding shape on top of colour is redundant. Alpha-blending and open circles are useful complements to jittering.
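
The fourth variation (one colour per group, open circles, alpha-blending and jittering) could be sketched roughly as follows. This is a minimal illustration in Python with matplotlib, not the code behind the figure. The palette, the helper names (`colour_by_group`, `jitter`, `plot_groups`) and the jitter width are all assumptions made for this example.

```python
import random

# Assumed palette: three colour-blind-friendly hex codes (Okabe-Ito).
# This is an illustrative choice, not the palette used in the figure.
PALETTE = ["#E69F00", "#56B4E9", "#009E73"]


def colour_by_group(groups, palette=PALETTE):
    """Map each distinct group label to exactly one colour."""
    labels = sorted(set(groups))
    if len(labels) > len(palette):
        raise ValueError("palette has fewer colours than groups")
    return dict(zip(labels, palette))


def jitter(values, width, rng):
    """Add small uniform noise so that coincident points separate."""
    return [v + rng.uniform(-width, width) for v in values]


def plot_groups(x, y, groups, width=0.05, ax=None):
    """Scatter plot encoding the group by colour only.

    Open circles and alpha-blending complement the jittering
    to reduce over-plotting.
    """
    # Imported here so the helpers above can be used without matplotlib.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots()
    rng = random.Random(0)  # fixed seed so the jitter is reproducible
    for label, colour in colour_by_group(groups).items():
        xs = [xi for xi, g in zip(x, groups) if g == label]
        ys = [yi for yi, g in zip(y, groups) if g == label]
        ax.scatter(
            jitter(xs, width, rng),
            jitter(ys, width, rng),
            facecolors="none",   # open circles
            edgecolors=colour,   # colour is the only group encoding
            alpha=0.6,           # alpha-blending
            label=label,
        )
    ax.legend(title="Species")
    return ax
```

Calling `plot_groups(x, y, groups)` with two numeric sequences and a parallel sequence of species labels, followed by `matplotlib.pyplot.show()`, draws the plot. Shape is deliberately left at the default for every group, so each piece of information is carried by a single encoding element.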