Dotplots I

Comparing a Continuous and a Categorical Variable using Dot Plots

Scatter plots excel at comparing two continuous variables. To plot several continuous variables at once, grouped by a categorical variable, we can use a dot plot. 1

  • 1 Although you may be tempted to draw a 3-dimensional scatter plot for three variables, these are quite difficult for the reader to decode.

Thus far, we have considered five variables: Species, Sepal Length, Sepal Width, Petal Length and Petal Width. However, there is another way of looking at our data set, in which we have only three variables: Species and Item measured (both categorical), and Measurement (or “value”, a continuous variable). 2

  • 2 This is essentially how we understand data with bar plots (see page @ref(sub:Bar-charts)), but instead of using bars, we will use dots to represent each observation.
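The change of perspective from five variables to three is a reshape from "wide" to "long" form. The following is a minimal sketch in plain Python, not the code used to produce the figures in this chapter. It uses one observation per species from the iris data set; the key names (`species`, `item`, `value`) and the helper `to_long` are hypothetical names chosen for illustration.

```python
# One observation per species, taken from the iris data set (wide form):
# one column per item measured.
wide = [
    {"species": "setosa", "sepal_length": 5.1, "sepal_width": 3.5,
     "petal_length": 1.4, "petal_width": 0.2},
    {"species": "versicolor", "sepal_length": 7.0, "sepal_width": 3.2,
     "petal_length": 4.7, "petal_width": 1.4},
    {"species": "virginica", "sepal_length": 6.3, "sepal_width": 3.3,
     "petal_length": 6.0, "petal_width": 2.5},
]


def to_long(rows, id_key="species"):
    """Turn every measurement column into its own (species, item, value) record."""
    records = []
    for row in rows:
        for item, value in row.items():
            if item != id_key:
                records.append({"species": row[id_key], "item": item, "value": value})
    return records


long = to_long(wide)
print(len(long))  # 12 records: 3 observations x 4 items measured
for record in long[:4]:
    print(record)
```

In the long form, every record has the same three fields, so either categorical variable (`species` or `item`) can be mapped to the x-axis or to colour, while `value` is always plotted on the y-axis.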

Consider the two examples shown in figure @ref(fig:iris-dot-plots). In the first, the item measured is on the x-axis and colour encoding identifies the species. In the second, the encodings of the two variables are reversed. In both cases, the measurement is on the y-axis.

    Dot plots of the iris data set. Left: All data points visible, sorting by measurement (top) or species (bottom). Right: Mean and standard deviations of each sub-group only.
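The right-hand panels described in the caption reduce each sub-group to its mean and standard deviation. As a rough sketch of that summary step (an assumption about how such values could be computed, not the author's code), the snippet below uses only Python's `statistics` module and five observations per species from the iris data set. The names `ITEMS`, `sample` and `summarise` are hypothetical.

```python
from statistics import mean, stdev

# Column order of each observation tuple below.
ITEMS = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Five observations per species from the iris data set (an excerpt, not all 150 rows).
sample = {
    "setosa": [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
               (4.6, 3.1, 1.5, 0.2), (5.0, 3.6, 1.4, 0.2)],
    "versicolor": [(7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
                   (5.5, 2.3, 4.0, 1.3), (6.5, 2.8, 4.6, 1.5)],
    "virginica": [(6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1),
                  (6.3, 2.9, 5.6, 1.8), (6.5, 3.0, 5.8, 2.2)],
}


def summarise(data):
    """Return {(species, item): (mean, sample standard deviation)}."""
    summary = {}
    for species, rows in data.items():
        for i, item in enumerate(ITEMS):
            values = [row[i] for row in rows]
            summary[(species, item)] = (mean(values), stdev(values))
    return summary


summary = summarise(sample)
for (species, item), (m, s) in summary.items():
    print(f"{species:<10} {item:<12} mean={m:.2f} sd={s:.2f}")
```

Each of the twelve (species, item) pairs would then be drawn as a single dot at the mean with an error bar of one standard deviation. Because only an excerpt is used here, the values will differ from those of the full data set.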

Sorting according to species on the x-axis is more intuitive because we are interested in how the measured properties 3 (the dependent variables) vary according to species (the independent variable). We can see that sepal length and petal width are positively correlated, whereas sepal width and petal length are negatively correlated.

  • 3 Although the dot plot presents more data, it is more difficult to identify setosa as the linearly separable species.
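The direction of the two correlations mentioned above can be checked numerically. The sketch below computes Pearson's correlation coefficient by hand on a 15-row excerpt of the iris data set (five observations per species), so the exact coefficients will not match those of the full data set; only their signs are of interest here. The function name `pearson` is a hypothetical helper, not part of any library.

```python
from math import sqrt

# 15 observations from the iris data set: 5 setosa, 5 versicolor, 5 virginica.
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 6.3, 5.8, 7.1, 6.3, 6.5]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.2, 3.2, 3.1, 2.3, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0]
petal_length = [1.4, 1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 6.0, 5.1, 5.9, 5.6, 5.8]
petal_width = [0.2, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 2.5, 1.9, 2.1, 1.8, 2.2]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r_length_width = pearson(sepal_length, petal_width)  # expected to be positive
r_width_length = pearson(sepal_width, petal_length)  # expected to be negative
print(f"sepal length vs petal width: r = {r_length_width:.2f}")
print(f"sepal width vs petal length: r = {r_width_length:.2f}")
```

Note that these coefficients pool all three species together, which matches how the trend is read off the dot plot across species; correlations within a single species may behave differently.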