2D Density

Showing Distributions in Two Dimensions with a Smoothed Scatter Plot

A smoothed scatter plot is a form of two-dimensional density plot. Two approaches address the problem of plotting large data sets, where scatter plots become so dense that points overlap: drawing contour lines of the estimated density function, or shading regions by density to produce a "smoothed scatter". When contour lines are used, their colour can also encode density. Figure 1 plots the eruption duration and the waiting time between eruptions of the Old Faithful geyser in Yellowstone National Park, USA. The two variables are clearly positively correlated. A linear model would still not describe their relationship adequately, because both variables are bimodal, individually and jointly. This becomes evident when density lines or a smoothed scatter are used. There are two distinct clusters: short eruptions follow waits of less than an hour, while longer eruptions follow waits of approximately 80 minutes.
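Both contour lines and smoothed shading start from the same estimate: a density function evaluated on a grid. The following is a minimal Python sketch of that step. It assumes a product Gaussian kernel with a separate bandwidth per axis. The Old Faithful measurements are not reproduced here. `make_synthetic_clusters` generates hypothetical data with two clusters in roughly the same places, and it is not the real data set. All function names are illustrative.

```python
import math
import random


def gaussian_kde_2d(points, x, y, bw_x, bw_y):
    """Kernel density estimate at (x, y) from a list of (px, py) points,
    using a product Gaussian kernel with bandwidths bw_x and bw_y."""
    norm = 1.0 / (2.0 * math.pi * bw_x * bw_y * len(points))
    total = 0.0
    for px, py in points:
        dx = (x - px) / bw_x
        dy = (y - py) / bw_y
        total += math.exp(-0.5 * (dx * dx + dy * dy))
    return norm * total


def make_synthetic_clusters(n_per_cluster=200, seed=1):
    # HYPOTHETICAL data: two Gaussian clusters loosely placed like the
    # Old Faithful pattern (short eruptions after short waits, long
    # eruptions after long waits). These are NOT the real measurements.
    rng = random.Random(seed)
    short = [(rng.gauss(2.0, 0.3), rng.gauss(55.0, 6.0)) for _ in range(n_per_cluster)]
    long_ = [(rng.gauss(4.3, 0.4), rng.gauss(80.0, 6.0)) for _ in range(n_per_cluster)]
    return short + long_


def density_grid(points, xs, ys, bw_x, bw_y):
    # Evaluate the density on a grid: rows follow ys, columns follow xs.
    # A plotting library would draw contours or shading from this grid.
    return [[gaussian_kde_2d(points, x, y, bw_x, bw_y) for x in xs] for y in ys]


if __name__ == "__main__":
    pts = make_synthetic_clusters()
    # Density near each synthetic cluster centre vs. the gap between them.
    for label, (x, y) in [("short", (2.0, 55.0)), ("gap", (3.15, 67.5)), ("long", (4.3, 80.0))]:
        print(label, gaussian_kde_2d(pts, x, y, bw_x=0.3, bw_y=5.0))
```

The density in the gap is far lower than at either cluster centre. This is the valley that contour lines or shading make visible and that a scatter plot's single cloud of points can hide.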

(a) A scatter plot of the Old Faithful geyser data set draws attention to the positive correlation between eruption duration and waiting time.

(b) A two-dimensional density plot highlights the clusters.
Figure 1: Two-dimensional density plots are an alternative to scatter plots when clustered points are of interest.

Figure 2: This trend is clearer when the contour lines are coloured according to density, or when a colour gradient is used.

Figure 3: This trend is clearer when the contour lines are coloured according to density, or when a colour gradient is used.
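Colouring contour lines or shading by density comes down to mapping each density value onto a sequential colour gradient. The sketch below assumes a simple two-stop linear gradient. The pale-yellow and dark-red endpoints are arbitrary choices, and the evenly spaced contour levels are one of several possible conventions. The function names are hypothetical and do not belong to any particular plotting library.

```python
def lerp(a, b, t):
    # Linear interpolation between a and b for t in [0, 1].
    return a + (b - a) * t


def density_to_hex(d, d_max, low=(255, 255, 204), high=(128, 0, 38)):
    """Map a density value in [0, d_max] to a '#rrggbb' colour on a
    two-stop gradient (low density -> `low`, high density -> `high`)."""
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    t = min(max(d / d_max, 0.0), 1.0)  # clamp out-of-range values
    r, g, b = (round(lerp(lo, hi, t)) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"


def contour_levels(d_max, n_levels):
    """Evenly spaced contour levels strictly between 0 and d_max."""
    step = d_max / (n_levels + 1)
    return [step * (i + 1) for i in range(n_levels)]


if __name__ == "__main__":
    d_max = 0.08  # hypothetical peak density of some estimate
    for level in contour_levels(d_max, 4):
        # Each contour line is drawn in the colour of its own level.
        print(f"{level:.3f}", density_to_hex(level, d_max))
```

The same mapping applied to every grid cell, rather than only to the contour levels, produces a continuous colour gradient instead of coloured lines.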

Figure 4: A lower bandwidth allows a more detailed image of the data to emerge.

Figure 5: A lower bandwidth allows a more detailed image of the data to emerge.
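The bandwidth controls how far each point's kernel spreads. A wide kernel blurs nearby groups into a single bump, while a narrow one keeps them apart. The sketch below demonstrates this with two hypothetical points two units apart. It counts the density peaks along the line through them for a small bandwidth and a large one. It assumes a Gaussian kernel with the same bandwidth `h` on both axes, and the names are illustrative.

```python
import math


def kde_2d(points, x, y, h):
    """Product-Gaussian KDE at (x, y) with the same bandwidth h on both axes."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(points))
    return norm * sum(
        math.exp(-0.5 * (((x - px) / h) ** 2 + ((y - py) / h) ** 2))
        for px, py in points
    )


def count_peaks_along_x(points, h, x_min, x_max, y=0.0, steps=200):
    """Count local maxima of the density along the horizontal line at y."""
    xs = [x_min + (x_max - x_min) * i / steps for i in range(steps + 1)]
    ds = [kde_2d(points, x, y, h) for x in xs]
    return sum(1 for i in range(1, len(ds) - 1) if ds[i - 1] < ds[i] > ds[i + 1])


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0)]  # hypothetical points, 2 units apart
    print(count_peaks_along_x(pts, h=0.5, x_min=-3.0, x_max=5.0))  # → 2
    print(count_peaks_along_x(pts, h=2.0, x_min=-3.0, x_max=5.0))  # → 1
```

For two equal Gaussian kernels, the sum has two peaks only when the points are more than `2 * h` apart. With `h = 0.5` both points remain visible. With `h = 2.0` they merge into one peak, which is the loss of detail that a larger bandwidth causes.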