Lines — Density Plots

Density plots are an excellent way of visualizing the distribution of univariate data sets. Although they are less common than histograms, and therefore less widely understood, they are intuitive.

Density plots are based on a “kernel density estimate” (KDE). Everitt and Hothorn describe the kernel estimator as:

… a sum of bumps placed at the observations. The kernel function determines the shape of the bumps while the window width h determines their width.
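In symbols, and assuming the standard textbook form of the estimator, the description above can be written as follows. The notation \(t\), \(n\), \(x_i\), \(K\) and \(\hat{f}\) is introduced here for illustration and does not come from the quoted text:

\[
\hat{f}(t) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{t - x_i}{h}\right)
\]

Here \(K\) is the kernel function, \(h\) is the window width (bandwidth), \(x_1, \dots, x_n\) are the observations and \(t\) is the point at which the density is evaluated. Each term of the sum is one "bump" centred on an observation.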

As an example, we will focus on a simple data set of 8 numbers:

\(x = (0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5)\)

Figure 1 shows a Gaussian KDE of these data with a bandwidth (h) of 0.4.

Figure 1: An example of how the Gaussian density function (red curve) is derived from 8 data points (green tick marks). The density of each individual data point (purple curves) reaches a maximum of approximately 0.125 (close to 1/8).

Notice that the density function is non-zero outside the range of the given values (e.g. for values below zero in Figure 1). Although this can be confusing at first, by looking at the underlying “bumps”, we can understand how the density function was calculated without having to know the mathematical formulae underpinning the function.
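The construction can also be reproduced numerically. The following is a minimal sketch in Python using only the standard library; it assumes the textbook Gaussian estimator, and the function names (`gaussian_kernel`, `bump`, `kde`) are illustrative rather than taken from any particular package. Plotting software may scale or truncate the kernel differently.

```python
import math

# Data and bandwidth taken from the example above.
x = [0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5]
h = 0.4

def gaussian_kernel(u):
    """Standard normal density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def bump(t, xi, h, n):
    """Contribution of a single observation xi to the density at t."""
    return gaussian_kernel((t - xi) / h) / (n * h)

def kde(t, data, h):
    """Kernel density estimate at t: the sum of all the bumps."""
    n = len(data)
    return sum(bump(t, xi, h, n) for xi in data)

# Height of one bump at its centre, and the density just outside the data range.
peak = bump(0.0, 0.0, h, len(x))
print(f"peak of one bump: {peak:.4f}")
print(f"density at t = -0.5: {kde(-0.5, x, h):.4f}")
```

Each bump peaks at about 0.1247, which is close to 1/8 only because 1/(h√(2π)) happens to be very nearly 1 when h = 0.4. The density at -0.5 is positive even though no observation is below zero, because the bump centred on 0.0 extends in both directions.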

The KDE can take a Gaussian, triangular or rectangular form. The choice can have a considerable effect on the visual outcome, but the Gaussian is generally the preferred choice. In addition to the shape, changing the bandwidth can misleadingly give the appearance of several clusters (low bandwidth) or a more smoothed distribution (high bandwidth), as shown below.
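To make the three kernel shapes concrete, here is a hedged sketch of each in Python. It assumes the simple textbook definitions, with the triangular and rectangular kernels supported on [-1, 1]; statistical software often rescales kernels (for example so that the bandwidth equals the kernel's standard deviation), so values from a plotting package may differ. All names are illustrative.

```python
import math

def gaussian(u):
    """Bell-shaped kernel with unbounded support."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def triangular(u):
    """Tent-shaped kernel, zero outside [-1, 1]."""
    return max(0.0, 1.0 - abs(u))

def rectangular(u):
    """Box-shaped (uniform) kernel, zero outside [-1, 1]."""
    return 0.5 if abs(u) <= 1.0 else 0.0

KERNELS = {
    "gaussian": gaussian,
    "triangular": triangular,
    "rectangular": rectangular,
}

def kde(t, data, h, kernel):
    """Kernel density estimate at t for a given kernel function."""
    return sum(kernel((t - xi) / h) for xi in data) / (len(data) * h)

# Same data and bandwidth as the earlier example, evaluated at one point.
x = [0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5]
for name, kernel in KERNELS.items():
    print(name, round(kde(1.5, x, 0.4, kernel), 3))
```

Every kernel integrates to 1, so each produces a valid density; only the shape of the individual bumps changes.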

To see the effect that bandwidth has on the shape of the density plot, consider Figure 2, which shows density plots of mammalian total sleep time drawn with different bandwidth values. It is easy to imagine how an inappropriate bandwidth could distort our understanding of the data’s distribution!

Figure 2: Density plots of mammalian total sleep time appear strikingly different when using different bandwidths.
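The same effect can be checked numerically. The sleep data are not reproduced here, so this sketch reuses the 8-point example from earlier and counts the local maxima of a Gaussian KDE on a fine grid for several bandwidths. The helper `count_modes`, its grid limits and the chosen bandwidths are assumptions made for illustration.

```python
import math

# The small example data set from earlier in the section.
x = [0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5]

def kde(t, data, h):
    """Gaussian kernel density estimate at t with bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((t - xi) / h) ** 2) for xi in data) / norm

def count_modes(data, h, lo=-2.0, hi=5.5, steps=750):
    """Count strict local maxima of the KDE on an evenly spaced grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    dens = [kde(t, data, h) for t in grid]
    return sum(
        1 for i in range(1, steps) if dens[i - 1] < dens[i] > dens[i + 1]
    )

# Low bandwidths leave many separate peaks; high bandwidths smooth them away.
for h in (0.1, 0.4, 1.0):
    print(f"h = {h}: {count_modes(x, h)} local maxima")
```

A small bandwidth leaves nearly every isolated observation as its own peak, which can look like several clusters, while a large bandwidth merges the bumps into a smoother curve with fewer peaks.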