Heatmaps

Multivariate Comparisons using Heat Maps

Heat maps are often used in combination with a clustering algorithm, although how the data is clustered and how it is represented are independent decisions. Here we will use the Golub data set, which compares gene expression in different types of cancer cell lines, to draw a heat map and consider some of the difficulties with this type of visualisation.

Consider a heat map depicting gene expression log2 fold-changes. A log2 fold-change lies on a continuous scale, but it is often depicted with varying shades (saturation) of red and green. The first difficulty is that the reader must associate red (“bad”, “negative”) and green (“good”, “positive”) with a direction of change. This association is not intuitive, and it raises the question of how no change, i.e. 0, should be represented. Often black is chosen. There is no logical reason for black to sit between red and green, but here it serves to mask all the uninteresting values with a log2 ratio close to 0. It is also nearly impossible to read exact values from a heat map, because the perceived colour of a cell depends on the colours surrounding it. In addition, highly saturated colours draw attention to the few very high values and away from potentially more interesting ones, for example a large number of moderately high log2 values. Heat maps remain a common visualisation tool, although their popularity is arguably decreasing.
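To make the red-black-green encoding concrete, the following is a minimal Python sketch of how a log2 fold-change might be turned into a cell colour. It is an illustration under stated assumptions, not the scheme used by any particular plotting package: the function names, the clipping limit of 3, the choice of green for positive values, and the expression values are all hypothetical.

```python
import math


def log2_fold_change(treated, control):
    """Return log2(treated / control); both values must be positive."""
    return math.log2(treated / control)


def red_black_green(lfc, limit=3.0):
    """Map a log2 fold-change to an (R, G, B) tuple with channels in 0..255.

    Assumed convention: 0 maps to black, positive values to brighter green,
    negative values to brighter red. Values beyond +/-limit are clipped,
    so they all receive the same fully saturated colour.
    """
    clipped = max(-limit, min(limit, lfc))
    intensity = round(255 * abs(clipped) / limit)
    if clipped > 0:
        return (0, intensity, 0)
    if clipped < 0:
        return (intensity, 0, 0)
    return (0, 0, 0)


# Hypothetical (treated, control) expression pairs, for illustration only.
pairs = {
    "gene_a": (8.0, 1.0),  # strong increase
    "gene_b": (1.0, 1.0),  # no change
    "gene_c": (1.0, 4.0),  # decrease
    "gene_d": (1.1, 1.0),  # slight increase, close to 0 on the log2 scale
}

for gene, (treated, control) in pairs.items():
    lfc = log2_fold_change(treated, control)
    print(f"{gene}: log2FC = {lfc:+.2f}, RGB = {red_black_green(lfc)}")
```

In this sketch a small change such as that of gene_d is rendered almost as dark as no change at all, which is the masking effect described above, while every value beyond the clipping limit receives the same colour.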

Colour is not appropriate for encoding continuous variables, but it is popular in heat maps. Left: topological colour scheme. Middle: red-black-green colour scheme. Right: sequential colour scheme.