72  Data Classes

Broadly speaking, there are two ways in which we can describe the class of our data: continuous and categorical.¹

  • 1 Other dichotomous naming conventions include predictor and response variables in regression; either may be continuous or categorical. In tidy data notation we have ID variables (always categorical) and measure variables (typically continuous).

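    |             | Continuous                                | Categorical                              |
    |-------------|-------------------------------------------|------------------------------------------|
    | AKA         | Quantitative                              | Qualitative, discrete, factor            |
    | Description | Can take any numeric value within a range | Distinct groups that differ in qualities |
    | Example     | Weight, continuous time, gene expression  | Location, genotype, time point           |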

    A data’s class is malleable: it can change depending on how we understand it. For example, we can break up a continuous variable, such as the p-values from many tests, into those below and those above a certain threshold.
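As a minimal sketch of this kind of conversion, the snippet below turns continuous p-values into a two-level categorical variable. The p-values, the 0.05 threshold, and the name `classify` are assumptions made for illustration; the text does not fix any of them.

```python
# Hypothetical p-values from many tests (made-up numbers for illustration).
p_values = [0.001, 0.20, 0.049, 0.73, 0.05, 0.012]

ALPHA = 0.05  # assumed threshold; the text does not specify a value


def classify(p, alpha=ALPHA):
    """Map a continuous p-value to one of two categories."""
    return "below" if p < alpha else "at or above"


# The same observations, now described by a categorical (binary) variable.
labels = [classify(p) for p in p_values]
print(labels)
```

The underlying measurements are unchanged; only our description of them moves from continuous to categorical.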

    Sometimes we convert the type when plotting. For example, when we apply a binning statistic for a histogram, we transform a continuous variable into a discrete one made up of artificial categories.
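The binning step can be sketched in a few lines. This is an illustrative example only: the weights, the bin width of 10, and the helper `bin_label` are assumptions, and plotting libraries choose their bins in their own ways.

```python
from collections import Counter

# Hypothetical weights in kg (made-up values for illustration).
weights = [51.2, 58.9, 60.0, 63.4, 67.8, 69.9, 70.1, 74.5, 88.3]

BIN_WIDTH = 10  # assumed bin width


def bin_label(x, width=BIN_WIDTH):
    """Return the half-open interval [lower, upper) that contains x."""
    lower = int(x // width) * width
    return f"[{lower}, {lower + width})"


# Each continuous value is replaced by the artificial category it falls into.
counts = Counter(bin_label(w) for w in weights)

# A crude text histogram: one '#' per observation in each bin.
for label, n in sorted(counts.items()):
    print(f"{label:>10} {'#' * n}")
```

Changing `BIN_WIDTH` changes the categories themselves, which is why they are artificial: they come from the binning choice, not from the data.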

    There are three typical scales of categorical variables: binary, nominal and ordinal. Binary is the most basic type of data we can have. The scales are defined by the properties in the following table, which also lists a fourth scale, interval, discussed after the table:

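    Table: Types of Categorical Scales.

    | Scale    | Ordered | Quantitative | Number of levels/groups | Example                         |
    |----------|---------|--------------|-------------------------|---------------------------------|
    | Binary   | -       | -            | \(2\)                   | Status: Present, Absent         |
    | Nominal  | -       | -            | \(>2\)                  | Location: Berlin, Paris, London |
    | Ordinal  | Y       | -            | \(\geq{2}\)             | Severity: low, medium, high     |
    | Interval | Y       | Y            | \(\geq{2}\)             | Time: 0h, 24h, 48h, 96h         |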
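The "Ordered" column has a practical consequence, sketched below with the severity example from the table. The observations are made up for illustration. For an ordinal variable the level order has to be declared explicitly, because alphabetical sorting does not recover it.

```python
# Ordinal scale: the order of the levels is part of the data.
severity_levels = ["low", "medium", "high"]

# Hypothetical observations (made-up for illustration).
observations = ["high", "low", "medium", "low", "high"]

# Alphabetical sorting ignores the scale's order ...
alphabetical = sorted(observations)

# ... so we sort by each value's position in the declared level order.
by_level = sorted(observations, key=severity_levels.index)

print(alphabetical)
print(by_level)
```

For a nominal variable such as location there is no such order to declare, so any arrangement of the levels is a presentation choice.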

    Traditionally, interval does not refer to categorical variables. Rather, it is used to distinguish continuous variables that do not have a natural zero from those that do, termed ratio variables. Although this is an interesting distinction for data analysis, I find it useful, in the context of data visualization, to refer to categorical variables that are quantitative in addition to being ordered. That is, an axis may be ordinal, but does the distance between the categories, or even the size of each category, contain information as well? See example XX, below.
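One way to see the difference is to compare where the time points from the table would sit on an axis under each reading. This is a rough sketch, and parsing the hours from the labels is an assumption about how they are encoded.

```python
labels = ["0h", "24h", "48h", "96h"]

# Ordinal reading: only rank matters, so categories are evenly spaced.
ordinal_positions = list(range(len(labels)))

# Interval reading: the distance between categories carries information.
# Assumes every label is a whole number of hours followed by "h".
interval_positions = [int(label.rstrip("h")) for label in labels]

# Gaps between neighbouring categories under the interval reading.
gaps = [b - a for a, b in zip(interval_positions, interval_positions[1:])]

print(ordinal_positions)
print(interval_positions)
print(gaps)
```

Under the ordinal reading the step from 48h to 96h looks the same as the step from 24h to 48h; under the interval reading it is twice as long.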