Multiple Comparison Criteria for Predicted Values

compare() and plot_compare()

Author

biomAid package

Published

April 27, 2026

Overview

compare() computes a pairwise comparison criterion — HSD, LSD, or Bonferroni-corrected LSD — for predicted values obtained from an ASReml-R V4 model. The criterion is a single number: two predicted values in the same group are declared significantly different if and only if their absolute difference exceeds it.

Three comparison types are available:

Type	Formula	Multiple-testing control	Best used when
`"HSD"` (Tukey)	\frac{\overline{\text{SED}}}{\sqrt{2}} \times q_{1-\alpha}(n_g, \nu)	Family-wise (exact for balanced designs)	Routine variety/treatment comparisons
`"LSD"`	\overline{\text{SED}} \times t_{\alpha/2,\,\nu}	None	Exploratory or preplanned single comparisons
`"Bonferroni"`	\overline{\text{SED}} \times t_{\alpha/(2m),\,\nu}	Family-wise (conservative)	Small number of specific comparisons

The companion function plot_compare() turns a compare() result directly into publication-ready visualisations: a sorted dot plot with a criterion-width band, a compact letter display (CLD), or a pairwise significance heatmap.

Mathematical Framework

Prediction error variance and the average SED

Let \hat{\boldsymbol{\tau}}_g = (\hat{\tau}_{1g}, \ldots, \hat{\tau}_{n_g g})^\top be the vector of n_g predicted values in group g, with prediction error variance–covariance matrix

\mathbf{V}_g = \operatorname{Var}(\hat{\boldsymbol{\tau}}_g - \boldsymbol{\tau}_g),

returned by predict(model, classify = term, vcov = TRUE)$vcov in ASReml-R V4.

The standard error of a difference (SED) for the pair (i, j) is

\text{SED}_{ij} = \sqrt{V_{ii} + V_{jj} - 2V_{ij}}.

compare() uses the arithmetic mean SED across all m = \binom{n_g}{2} pairs as the representative spread of prediction error:

\overline{\text{SED}}_g = \frac{1}{m}\sum_{i < j} \text{SED}_{ij}.

Using the average SED (rather than a single representative pair) is robust when PEVs vary across units — as they do in unbalanced designs and spatial models.

Comparison criteria

Let \nu = \texttt{model\$nedf} be the error degrees of freedom.

Tukey HSD — controls the family-wise error rate (FWER) exactly for balanced designs by referring to the Studentised range distribution:

\text{HSD}_g = \frac{\overline{\text{SED}}_g}{\sqrt{2}} \times q_{1-\alpha}(n_g, \nu),

where q_{1-\alpha}(n_g, \nu) is the upper \alpha critical value of the Studentised range with n_g means and \nu degrees of freedom.

LSD — no multiplicity correction; equivalent to a two-sided t-test at level \alpha for each pair:

\text{LSD}_g = \overline{\text{SED}}_g \times t_{\alpha/2,\,\nu}.

Bonferroni — applies the Bonferroni correction across all m pairs, controlling the FWER conservatively:

\text{Bonferroni}_g = \overline{\text{SED}}_g \times t_{\alpha / (2m),\,\nu}.

For all three types: |\hat{\tau}_i - \hat{\tau}_j| > \text{criterion} declares pair (i, j) significantly different.

Posterior variance (`pev = FALSE`)

When term is in the random part of the model, the PEV \mathbf{V}_{\hat{\boldsymbol{\tau}}} underestimates the uncertainty of the difference between BLUPs. Setting pev = FALSE substitutes the posterior variance

\operatorname{Var}(\hat{\boldsymbol{\tau}} \mid \mathbf{y}) = \mathbf{G} - \mathbf{V}_{\hat{\boldsymbol{\tau}}}

which accounts for the shrinkage in BLUP estimation and gives a more appropriate measure of uncertainty for genotypic BLUPs.

Setup

Simulated data and model

Requires: ASReml-R V4. The vignette is pre-rendered so no licence is needed to read it; run the code chunks only if you have ASReml-R installed.

We use simTrialData() to generate a balanced multi-environment trial with 20 varieties, 4 sites, and 3 nitrogen treatments (T0, T1, T2).

out <- simTrialData(
  nvar          = 20L,
  nsite         = 4L,
  treatments    = c("T0", "T1", "T2"),
  treat_effects = c(0, 80, 200),
  nrep          = 3L,
  seed          = 42L,
  verbose       = FALSE
)
dat        <- out$data
dat$Row    <- factor(dat$Row)
dat$Column <- factor(dat$Column)

We fit a fixed Treatment × Site × Variety model with site-within-replicate random effects and a spatial residual structure. Variety must be in the fixed effects so that predict() returns BLUEs for every Variety × Site combination — the basis for the comparison criteria.

asreml.options(trace = FALSE)
model <- asreml(
  yield    ~ Treatment * Site * Variety,
  random   = ~ at(Site):Rep,
  residual = ~ dsum(~ ar1(Row):ar1(Column) | Site),
  data     = dat,
  maxit    = 30L
)
model <- update(model)

Example 1: HSD — single global group

The simplest use: compare all variety BLUEs across the trial in a single global group (no by argument). compare() calls predict() internally using term as the classify.

res1 <- compare(
  model,
  term = "Variety",
  type = "HSD"
)

head(res1[order(res1$predicted.value, decreasing = TRUE), ])

   Variety predicted.value std.error      HSD    avsed
12   Var12        5232.027  72.71496 296.5355 83.20462
9    Var09        5133.987  72.87949 296.5355 83.20462
20   Var20        4983.435  72.97436 296.5355 83.20462
1    Var01        4918.769  72.77043 296.5355 83.20462
11   Var11        4909.933  72.80557 296.5355 83.20462
7    Var07        4904.040  73.00495 296.5355 83.20462

The HSD column is constant within the group — all predicted values share the same criterion. Two varieties are significantly different if |pred_i - pred_j| > HSD. The avsed column records the average SED on which the criterion is based.

Applying the criterion manually:

# Are Var01 and Var20 significantly different?
preds <- setNames(res1$predicted.value, as.character(res1$Variety))
diff  <- abs(preds["Var01"] - preds["Var20"])
cat("|Var01 - Var20| =", round(diff, 1L),
    "  HSD =",           round(res1$HSD[1L], 1L),
    "  Significant:",    diff > res1$HSD[1L], "\n")

|Var01 - Var20| = 64.7   HSD = 296.5   Significant: FALSE

Example 2: HSD by site

Adding by = "Site" performs independent comparisons within each site, using each site’s own prediction error variance matrix and thus its own criterion. This is the most common usage: 20 varieties compared within each of 4 sites.

res_hsd <- compare(
  model,
  term = "Variety:Site",
  by   = "Site",
  type = "HSD"
)

# HSD varies by site (different PEV per site from the spatial model)
unique(res_hsd[, c("Site", "HSD", "avsed")])

  Site      HSD    avsed
1 Env1 582.2917 163.3847
2 Env2 611.5340 171.5898
3 Env3 602.9235 169.1737
4 Env4 574.6885 161.2513

The HSD differs between sites because the spatial AR1×AR1 residual structure gives each site its own effective error variance. Sites with tighter spatial control (lower avsed) have a smaller, more powerful criterion.

Example 3: Multi-factor `by`

When predictions span multiple factors, by accepts a character vector. Here varieties are compared within every Site × Treatment combination — asking “which varieties perform best under each treatment at each site?”. The colon-string form "Site:Treatment" is equivalent to the vector form.

# Compare varieties within each Site × Treatment combination
res3a <- compare(model, term = "Variety:Site:Treatment",
                 by = c("Site", "Treatment"), type = "HSD")
res3b <- compare(model, term = "Variety:Site:Treatment",
                 by = "Site:Treatment", type = "HSD")

# HSD per Site × Treatment group (first 6 of 12 groups)
unique(res3a[, c("Site", "Treatment", "HSD", "avsed")])[1:6, ]

  Site Treatment      HSD    avsed
1 Env1        T0 1011.127 283.7111
2 Env1        T1 1007.075 282.5742
3 Env1        T2 1006.158 282.3169
4 Env2        T0 1060.797 297.6481
5 Env2        T1 1061.415 297.8215
6 Env2        T2 1060.997 297.7040

The criterion varies across Site × Treatment combinations because each sub-environment has its own spatial error structure.

Example 4: LSD and Bonferroni

The three criteria differ only in the multiplier applied to avsed. LSD is the most liberal (no correction); Bonferroni is the most conservative; HSD sits between them for most group sizes.

res_lsd <- compare(model, term = "Variety:Site", by = "Site", type = "LSD")
res_bon <- compare(model, term = "Variety:Site", by = "Site",
                   type = "Bonferroni")

# Criteria for Env1 (res_hsd already computed in Example 2)
env1 <- function(r, col) round(unique(r[[col]][r$Site == "Env1"]), 1L)
cat("Env1 criteria (20 varieties):\n")

Env1 criteria (20 varieties):

cat("  LSD        =", env1(res_lsd, "LSD"),        "\n")

  LSD        = 321

cat("  HSD        =", env1(res_hsd, "HSD"),        "\n")

  HSD        = 582.3

cat("  Bonferroni =", env1(res_bon, "Bonferroni"), "\n")

  Bonferroni = 600.7

For 20 varieties, m = \binom{20}{2} = 190 pairs. The Bonferroni correction is very conservative at this scale — HSD is generally preferred for large variety trials.

Example 5: Variety BLUPs with `pev = FALSE`

When variety is in the random effects the PEV understates uncertainty because it ignores the shrinkage towards zero inherent in BLUP estimation. Setting pev = FALSE substitutes the posterior variance \mathbf{G} - \mathbf{PEV}, giving a more appropriate — and larger — criterion.

We refit the model with Variety in the random effects and Treatment × Site fixed, so predict() returns BLUPs rather than BLUEs.

model_rnd <- asreml(
  yield    ~ Treatment * Site,
  random   = ~ Variety + at(Site):Rep,
  residual = ~ dsum(~ ar1(Row):ar1(Column) | Site),
  data     = dat,
  maxit    = 30L
)
model_rnd <- update(model_rnd)

res_pev  <- compare(model_rnd, term = "Variety", type = "HSD", pev = TRUE)
res_post <- compare(model_rnd, term = "Variety", type = "HSD", pev = FALSE)

cat("HSD (PEV):                    ", round(res_pev$HSD[1L],  1L), "\n")

HSD (PEV):                     334.1

cat("HSD (posterior variance):     ", round(res_post$HSD[1L], 1L), "\n")

HSD (posterior variance):      2004.9

cat("Ratio (posterior / PEV HSD):  ",
    round(res_post$HSD[1L] / res_pev$HSD[1L], 2L), "\n")

Ratio (posterior / PEV HSD):   6

The posterior-variance HSD is larger than the PEV-based HSD, reflecting the additional uncertainty that BLUP shrinkage introduces. Using pev = TRUE on a random-effects term underestimates the true uncertainty of a difference between BLUPs and will declare too many pairs significant.

Note: pev = FALSE is only meaningful when term is in the random part of the model. When the term is fixed (as in Examples 1–4 above), pev = TRUE (the default) is always correct.

Example 6: Adjusting `alpha`

A stricter significance level widens the criterion, reducing the number of significant pairs.

r_05 <- compare(model, term = "Variety:Site", by = "Site",
                type = "HSD", alpha = 0.05)
r_01 <- compare(model, term = "Variety:Site", by = "Site",
                type = "HSD", alpha = 0.01)

env1_05 <- unique(r_05$HSD[r_05$Site == "Env1"])
env1_01 <- unique(r_01$HSD[r_01$Site == "Env1"])
cat("Env1  alpha=0.05 HSD:", round(env1_05, 1L), "\n")

Env1  alpha=0.05 HSD: 582.3

cat("Env1  alpha=0.01 HSD:", round(env1_01, 1L), "\n")

Env1  alpha=0.01 HSD: 657.4

Visualisation with `plot_compare()`

plot_compare() reads the compare() result directly and infers the grouping structure automatically — no extra arguments are needed to handle by groups.

Dot plot — `type = "dotplot"`

Varieties are sorted by predicted value, highest at the top. A shaded band of width equal to the HSD is drawn below the top-ranked variety. Any variety within the band is not significantly different from the best; those outside (coloured red) are. The dashed vertical line marks the top-ranked predicted value.

p1 <- plot_compare(res_hsd |> subset(Site == "Env1"), type = "dotplot") +
  ggtitle("Variety comparison — Env1 (HSD)")
print(p1)

When a by-grouped result is passed, each group becomes its own facet panel with its own site-specific criterion band:

p2 <- plot_compare(res_hsd, type = "dotplot") +
  ggtitle("Variety comparison by site (HSD)",
          subtitle = "Band width = site-specific HSD")
print(p2)

Reference variety — `reference =`

Setting reference centres the band on a named variety rather than the best. Points are coloured three ways: green (significantly better), blue (not significantly different), red (significantly worse). The reference variety is shown as a filled diamond.

This is the natural way to ask “which varieties are significantly better or worse than our commercial check?”

# Use Var10 as the commercial check reference
p3 <- plot_compare(res_hsd, type = "dotplot", reference = "Var10") +
  ggtitle("Variety comparison — reference: Var10",
          subtitle = "Green = sig. better | Blue = not sig. | Red = sig. worse")
print(p3)

Half-criterion error bars — `type = "errbar"`

Each variety is shown as a point (its predicted value) with a horizontal line segment spanning \hat{\tau} \pm \frac{1}{2}\text{HSD}. The visual significance rule is exact: two varieties are significantly different if and only if their bars do not overlap. This is equivalent to the criterion test |\hat{\tau}_i - \hat{\tau}_j| > \text{HSD}.

The error bar display is particularly useful when interest lies in all pairwise comparisons rather than just comparisons with the best variety — every non-overlapping pair stands out directly, not just those that fall outside a single band.

p_eb1 <- plot_compare(res_hsd |> subset(Site == "Env1"), type = "errbar") +
  ggtitle("Half-HSD error bars — Env1",
          subtitle = "Non-overlapping bars = significant difference")
print(p_eb1)

With a by-grouped result, each site gets its own panel. Bar widths differ between sites because each site has its own spatial error variance and therefore its own HSD.

p_eb2 <- plot_compare(res_hsd, type = "errbar") +
  ggtitle("Half-HSD error bars by site",
          subtitle = "Bar width = site-specific HSD/2")
print(p_eb2)

The criterion type changes the bar width directly — LSD bars are shorter (more discriminating) and Bonferroni bars are wider (more conservative). This makes the effect of the multiplicity correction immediately visible:

p_eb3 <- plot_compare(res_lsd |> subset(Site == "Env1"), type = "errbar") +
  ggtitle("Half-LSD error bars — Env1",
          subtitle = "LSD bars are shorter than HSD bars — more pairs declared significant")
print(p_eb3)

Compact letter display — `type = "letters"`

The compact letter display (CLD) is the standard reporting format in plant science. Varieties sharing at least one letter are not significantly different. Letters are assigned by the descending sweep algorithm using the HSD as the significance threshold.

p4 <- plot_compare(res_hsd |> subset(Site == "Env1"), type = "letters") +
  ggtitle("Compact letter display — Env1 (HSD)")
print(p4)

With by-groups, each site gets its own panel with independent letter assignments:

p5 <- plot_compare(res_hsd, type = "letters") +
  ggtitle("CLD — 20 varieties by site",
          subtitle = "Letters are independent within each site")
print(p5)

Pairwise significance heatmap — `type = "heatmap"`

The heatmap shows all \binom{20}{2} = 190 pairwise absolute differences as a colour-coded matrix. A white cross (×) marks pairs that exceed the HSD criterion. For 20+ varieties this reveals the full discrimination pattern at a glance — something no letter display can convey.

p6 <- plot_compare(res_hsd, type = "heatmap") +
  ggtitle("Variety pairwise significance — 4 sites",
          subtitle = "White × = significant at HSD level")
print(p6)

Interactive dotplot — `interactive = TRUE`

Setting interactive = TRUE returns a pc_interactive object. When printed, it renders an interactive plotly widget in the RStudio Viewer pane (not the Graphics pane). You can still use + to add ggplot2 layers before the conversion.

The dotplot has a special hover interaction: hovering over any variety moves the criterion band to be centred on that variety’s predicted value and recolours all points relative to it (green = significantly better, blue = not significant, red = significantly worse). Unhovering restores the original state.

# Single-site subset keeps the demo focused on the hover interaction
as_plotly(plot_compare(res_hsd |> subset(Site == "Env1"),
                       type = "dotplot", interactive = TRUE))

Note: interactive = TRUE requires the plotly package (install.packages("plotly")). The widget is embedded directly in the HTML document — hover over any variety point to move the criterion band and recolour all points relative to it.

Multi-panel use: pass the full by-grouped result to get one facet per group. as_plotly() automatically adjusts subplot spacing; set height (pixels) if you need more room, e.g. as_plotly(plot_compare(res_hsd, ...), height = 700).

Extracting plot data — `return_data = TRUE`

All three plot types support return_data = TRUE, which returns the tidy data frame used to build the plot:

df_dot <- plot_compare(res_hsd, type = "dotplot", return_data = TRUE)
head(df_dot[, c("Group", "Variety", "pred", "crit", "rank", "sig")])

   Group Variety     pred     crit rank sig
33  Env1   Var09 5144.623 582.2917    1  ns
45  Env1   Var12 5114.135 582.2917    2  ns
1   Env1   Var01 4904.885 582.2917    3  ns
77  Env1   Var20 4880.979 582.2917    4  ns
13  Env1   Var04 4777.878 582.2917    5  ns
41  Env1   Var11 4751.895 582.2917    6  ns

df_heat <- plot_compare(res_hsd |> subset(Site == "Env1"),
                        type   = "heatmap",
                        return_data = TRUE)
head(df_heat[, c("Var_x", "Var_y", "diff", "sig")])

  Var_x Var_y      diff   sig
1 Var09 Var09   0.00000 FALSE
2 Var12 Var09  30.48846 FALSE
3 Var01 Var09 239.73812 FALSE
4 Var20 Var09 263.64433 FALSE
5 Var04 Var09 366.74497 FALSE
6 Var11 Var09 392.72884 FALSE

The ggplot object returned by plot_compare() can also be extended with +:

plot_compare(res_hsd, type = "letters") +
  ggplot2::ggtitle("Tukey HSD — 20 varieties across 4 sites",
                   subtitle = "alpha = 0.05 | Letters independent per site") +
  ggplot2::xlab("Predicted yield (kg/ha)")

Summary

`compare()`

Task	Specification
HSD (Tukey) criterion	`type = "HSD"` (default)
LSD (unadjusted) criterion	`type = "LSD"`
Bonferroni criterion	`type = "Bonferroni"`
One global group	`by = NULL` (default)
Single factor by-group	`by = "Site"`
Multi-factor by-group	`by = c("Site", "Treatment")` or `by = "Site:Treatment"`
Stricter significance level	`alpha = 0.01`
Use posterior variance for BLUPs	`pev = FALSE`
Pass extra args to `predict()`	`...` (e.g. `present = ...`)

`plot_compare()`

Task	Specification
Sorted dot plot with criterion band	`type = "dotplot"` (default)
Half-criterion error bars (non-overlap = significant)	`type = "errbar"`
Compact letter display	`type = "letters"`
Pairwise significance heatmap	`type = "heatmap"`
Band anchored to best variety	`reference = NULL` (default)
Band anchored to specific variety	`reference = "VarName"`
Interactive plotly widget	`interactive = TRUE`
Extend before plotly conversion	`result + ggtitle(...)`
Export tidy data frame	`return_data = TRUE`

References

Tukey, J.W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114.

Piepho, H.P. (2004). An algorithm for a letter-based representation of all pairwise comparisons. Journal of Computational and Graphical Statistics, 13(2), 456–466.

Overview

Mathematical Framework

Prediction error variance and the average SED

Comparison criteria

Posterior variance (pev = FALSE)

Setup

Simulated data and model

Example 1: HSD — single global group

Example 2: HSD by site

Example 3: Multi-factor by

Example 4: LSD and Bonferroni

Example 5: Variety BLUPs with pev = FALSE

Example 6: Adjusting alpha

Visualisation with plot_compare()

Dot plot — type = "dotplot"

Reference variety — reference =

Half-criterion error bars — type = "errbar"

Compact letter display — type = "letters"

Pairwise significance heatmap — type = "heatmap"

Interactive dotplot — interactive = TRUE

Extracting plot data — return_data = TRUE