Parallel Plots

Comparing Position in Multiple Variables using Parallel Plots

When we made the dot plots of the iris data set (Fig. @ref(fig:iris-final)), we treated the data as having only three variables instead of five: species, item measured, and measurement (or “value”). The parallel plot extends the dot plot by connecting all the measurements that belong to the same observation with a line.
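The reshaping described above can be sketched in a few lines. This is a minimal Python sketch, assuming the data arrive as one row per flower. The rows are the first three of each species from the standard iris table; the full data set has 50 rows per species. `to_long` and `polylines` are hypothetical helpers for illustration, not functions from any plotting library. The key point is the `obs_id` column: a dot plot ignores it, while a parallel plot uses it to decide which points to join.

```python
# Minimal sketch: turn wide, one-row-per-flower data into long
# (obs_id, species, item, value) records, the layout behind a dot plot.
# A parallel plot then joins the records that share an obs_id with one line.
# Data: first three rows of each species from the standard iris table.

ITEMS = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]

ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def to_long(rows, items):
    """Hypothetical helper: one record per (flower, item) measurement."""
    records = []
    for obs_id, row in enumerate(rows):
        *values, species = row
        for item, value in zip(items, values):
            records.append((obs_id, species, item, value))
    return records


def polylines(records):
    """Hypothetical helper: regroup long records into one line per flower.

    Returns {obs_id: (species, [value for each item, in axis order])}.
    """
    lines = {}
    for obs_id, species, _item, value in records:
        lines.setdefault(obs_id, (species, []))[1].append(value)
    return lines


long_rows = to_long(ROWS, ITEMS)
print(len(long_rows))  # 9 flowers x 4 items = 36 records
print(long_rows[0])    # (0, 'setosa', 'Sepal Length', 5.1)
print(polylines(long_rows)[3])
```

Dropping `obs_id` recovers exactly the three-variable view (species, item, value) used for the dot plots.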

Parallel plots of the iris data set drawn on three different scale types. Top: the raw data. Middle: standardized values, (x - mean(x))/sd(x). Bottom: each variable rescaled so that its minimum is 0 and its maximum is 1.

Our focus is on how the species vary according to the item measured (i.e. the lengths and widths of both sepals and petals). The untransformed data are plotted in Figure @ref(fig:Iris-parallel-plots), top. Look at the two variables we plotted earlier as a scatter plot, Sepal Length and Sepal Width: on average, setosa flowers have the shortest but widest sepals, while versicolor and virginica have longer but narrower ones. In the scatter plot this inverse relationship shows up as two distinct groups (see Fig. @ref(fig:iris-final)). What would a scatter plot of Petal Length versus Petal Width look like?

Because the variables along the x-axis are nominal and have no natural order, we are free to arrange them in whatever order best suits our needs. Here the plotting function orders the variables by how sharply any one class separates from the rest, which is distinct from the overall variation between classes. To do this, an F-statistic is calculated on each variable for each class against the rest, and each variable is scored by the largest of these F-statistics. The axis variables are then plotted in decreasing order of this score, so that the most discriminating variable comes first.
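The ordering rule above can be sketched as follows. This is a minimal Python sketch under one stated assumption: a variable's score is the largest one-way ANOVA F-statistic obtained by comparing one class against all remaining rows pooled together. It is not the plotting function's own code, and `f_one_vs_rest` and `order_axes` are hypothetical names. The data are the same nine iris rows as before (first three of each species).

```python
# Minimal sketch of "any one class vs. the rest" axis ordering.
# Assumption: score(variable) = max over classes of the two-group
# one-way ANOVA F-statistic (class vs. all other rows pooled).
from statistics import fmean

ITEMS = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def sum_sq(xs):
    """Sum of squared deviations from the mean of xs."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs)


def f_one_vs_rest(values, labels, cls):
    """F-statistic for two groups: rows of class `cls` versus all others."""
    inside = [v for v, lab in zip(values, labels) if lab == cls]
    outside = [v for v, lab in zip(values, labels) if lab != cls]
    grand = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in (inside, outside))
    ss_within = sum_sq(inside) + sum_sq(outside)
    df_between, df_within = 1, len(values) - 2  # k = 2 groups
    if ss_within == 0:
        # No spread inside either group: infinite F if the means differ.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def order_axes(rows, items):
    """Return (item, score) pairs sorted by decreasing max one-vs-rest F."""
    labels = [row[-1] for row in rows]
    classes = sorted(set(labels))
    scores = []
    for j, item in enumerate(items):
        column = [row[j] for row in rows]
        score = max(f_one_vs_rest(column, labels, c) for c in classes)
        scores.append((item, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for item, score in order_axes(ROWS, ITEMS):
    print(f"{item:>12}: {score:.1f}")
```

Taking the maximum rather than the average over classes is what rewards a variable that isolates a single species (for example, one that sets setosa apart) even if it barely distinguishes the other two.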

Several y-axis transformations can be applied to a parallel plot. An example of a fixed-scale transformation is shown in the middle panel of Figure @ref(fig:Iris-parallel-plots): from each original value we subtract the variable's mean and divide the result by its standard deviation, (x - mean(x))/sd(x). This transformation makes the relationship between Sepal Length and Sepal Width more obvious. A free scale instead lets each variable be plotted on its own scale, with its minimum mapped to 0 and its maximum to 1 (Fig. @ref(fig:Iris-parallel-plots), bottom). Because every variable then spans the same range, we can get an impression of where each species sits within each variable's distribution.
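Both transformations are applied one variable (one axis) at a time, never across the whole table. The minimal Python sketch below illustrates this on the same nine iris rows. `standardize` and `min_max` are hypothetical helper names, and the handling of a constant column (mapping it to 0) is an assumption the text does not address.

```python
# Minimal sketch of the two y-axis transformations, applied per variable.
from statistics import fmean, stdev

ITEMS = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def standardize(column):
    """(x - mean(x)) / sd(x), using the sample standard deviation."""
    m, s = fmean(column), stdev(column)
    if s == 0:
        return [0.0 for _ in column]  # assumption: constant column maps to 0
    return [(x - m) / s for x in column]


def min_max(column):
    """Map the column's minimum to 0 and its maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: constant column maps to 0
    return [(x - lo) / (hi - lo) for x in column]


# Transpose the numeric part of each row into one list per variable (axis).
columns = {item: list(col) for item, col in zip(ITEMS, zip(*(r[:4] for r in ROWS)))}
standardized = {item: standardize(col) for item, col in columns.items()}
rescaled = {item: min_max(col) for item, col in columns.items()}

print([round(v, 2) for v in rescaled["Sepal Width"]])
```

After `min_max`, every axis runs from exactly 0 to 1, which is why the bottom panel shows position within each variable's range rather than absolute size.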