Bars — Histograms

Histograms are one of the most common visualizations of univariate data. However, they suffer from two main deficiencies. First, they are sensitive to the choice of bins. For example, features such as a bimodal distribution can be masked by bins that are too wide. Binning also places one degree of separation between the raw data and what we see. It is therefore worthwhile to consider visualizations that do not rely on arbitrary choices like bin width. Neither strip charts nor box plots transform the data based on a binning or distribution statistic, so both are appropriate alternatives to histograms and density plots. Second, histograms are poorly suited to superposition. If we want to compare multiple distributions by overlaying them, the plot quickly becomes hard to read. This is discussed further in section sec-multiple-histograms.
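
The masking effect of wide bins can be illustrated with a small sketch. The data below are synthetic (two well-separated normal clusters drawn with NumPy), not data from this text, and the bin counts of 2 and 10 are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic bimodal sample (an assumption for illustration, not real data):
# two clusters of 200 values each, centred at 2 and 8.
sample = np.concatenate([
    rng.normal(loc=2.0, scale=0.5, size=200),
    rng.normal(loc=8.0, scale=0.5, size=200),
])

# Two wide bins: each bin is expected to hold roughly one whole cluster,
# so the two bars have similar heights and the gap is hidden.
coarse_counts, coarse_edges = np.histogram(sample, bins=2)

# Ten narrower bins: the sparse stretch between the clusters becomes visible.
fine_counts, fine_edges = np.histogram(sample, bins=10)

print("2 bins: ", coarse_counts)
print("10 bins:", fine_counts)
```

With two bins the counts should come out nearly equal, which gives no hint of the gap between the clusters. With ten bins, the bins between the clusters should be empty or nearly so, and the two modes appear.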

There are three types of histograms:

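Frequency
where the height of each bar is the actual count of observations in the bin,
Probability Density
where the bar heights are scaled so that the histogram has a total area of 1, and
Cumulative
where each successive bin provides a running count of its own and all previous values.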
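
Two common rules for choosing the bins are:

Sturges’ method
chooses the number of bins from the number of observations, so the bin width follows implicitly from the range of the data. This is a common default choice in many software packages.
Scott’s method
chooses the bin width from the sample standard deviation and the number of observations.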
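
The three histogram types, and the rules of Sturges and Scott, can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: `sleep_hours` is a synthetic stand-in generated here, not the total sleep time data used in this text, and the `"sturges"` and `"scott"` options are NumPy's implementations of those rules, which may differ in detail from other software.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for total sleep time in hours (an assumption for
# illustration; NOT the data set used in the text).
sleep_hours = rng.normal(loc=10.0, scale=3.0, size=80).clip(2.0, 20.0)

# Frequency: raw counts per bin, with the bins chosen by Sturges' rule.
counts, edges = np.histogram(sleep_hours, bins="sturges")

# Probability density: same bins, heights rescaled so the total area is 1.
density, _ = np.histogram(sleep_hours, bins=edges, density=True)
area = float(np.sum(density * np.diff(edges)))

# Cumulative: running total of the counts up to and including each bin.
cumulative = np.cumsum(counts)

# Scott's rule: bin width derived from the standard deviation and sample size.
scott_edges = np.histogram_bin_edges(sleep_hours, bins="scott")

print("frequency: ", counts)
print("area under density histogram:", round(area, 6))
print("cumulative:", cumulative)
print("bins (Sturges, Scott):", len(edges) - 1, len(scott_edges) - 1)
```

All three versions use the same bins and differ only in how the bar heights are scaled: the density heights are the counts divided by the sample size times the bin width, and the last cumulative value equals the number of observations.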

To understand how the three types differ, consider the following histograms based on the total sleep time we have been working with so far.

Figure 1: Histograms representing the same data using different visual parameters.