Wald Tests on Fixed-Effect Contrasts

waldTest() and plot_waldTest()

Author

Julian Taylor

Published

April 27, 2026


Overview

waldTest() performs Wald tests on linear contrasts of predicted values obtained from predict(). Because it works entirely from a prediction list (predicted values + prediction error variance–covariance matrix), it requires no access to model internals and works with any mixed-model software that exposes a vcov-style prediction object — including ASReml-R V4.

Two complementary test types are available:

Type Null hypothesis Use case
"con" — contrast H_0: \mathbf{c}^\top\boldsymbol{\tau} = 0 Pairwise, treatment vs control, custom comparisons
"zero" — joint zero H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0} Are several effects simultaneously zero?

The companion function plot_waldTest() produces a publication-ready forest plot directly from the result.


Mathematical Framework

Predicted values and their variance

Let \hat{\boldsymbol{\tau}} = (\hat{\tau}_1, \ldots, \hat{\tau}_n)^\top be the vector of n predicted treatment effects (BLUEs or BLUPs), obtained from a mixed model with prediction error variance–covariance matrix

\operatorname{Var}(\hat{\boldsymbol{\tau}} - \boldsymbol{\tau}) = \mathbf{V}_{\hat{\boldsymbol{\tau}}}.

In ASReml-R V4 this is returned by predict(model, classify = ..., vcov = TRUE)$vcov.


Contrast tests

A contrast is a scalar linear combination \hat{c} = \mathbf{c}^\top\hat{\boldsymbol{\tau}} where \mathbf{c} is a user-specified contrast vector. The null hypothesis is

H_0: \mathbf{c}^\top\boldsymbol{\tau} = 0.

The estimated contrast and its standard error are

\hat{c} = \mathbf{c}^\top\hat{\boldsymbol{\tau}}, \qquad \operatorname{SE}(\hat{c}) = \sqrt{\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}}.

The Wald statistic is

W = \frac{\hat{c}^2} {\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}} = \frac{\bigl(\mathbf{c}^\top\hat{\boldsymbol{\tau}}\bigr)^2} {\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}} \;\overset{H_0}{\sim}\; \chi^2_1.

When an error degrees-of-freedom value \nu is available (e.g. from a mixed model), the statistic F = W is instead referred to an F_{1,\nu} distribution, providing better finite-sample calibration.
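The single-contrast computation is simple enough to verify outside the model-fitting software. The sketch below is plain Python with made-up stand-in numbers for \hat{\boldsymbol{\tau}} and \mathbf{V}_{\hat{\boldsymbol{\tau}}} (nothing here comes from a fitted model); for one degree of freedom the \chi^2_1 upper tail has the closed form \operatorname{erfc}(\sqrt{W/2}).

```python
import math

# Stand-in numbers (not real model output): two predicted values and
# their 2x2 prediction error variance-covariance matrix
tau = [4305.0, 4596.0]
V   = [[13038.0, 12102.0],
       [12102.0, 13041.0]]
c   = [-1.0, 1.0]                  # contrast: level 2 minus level 1

est = sum(ci * ti for ci, ti in zip(c, tau))          # c' tau
var = sum(c[i] * V[i][j] * c[j]
          for i in range(2) for j in range(2))        # c' V c
se  = math.sqrt(var)                                  # SE of the contrast
W   = est ** 2 / var                                  # Wald statistic
p   = math.erfc(math.sqrt(W / 2.0))                   # chi^2_1 upper tail
```

Note how strong covariance between the two predictions shrinks the contrast variance well below the sum of the individual variances.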

Pairwise contrasts

For k treatment levels the \binom{k}{2} pairwise differences are a convenient special case. For levels i and j, the contrast vector \mathbf{c}_{ij} has +1 in position i, -1 in position j, and zeros elsewhere, so

\hat{c}_{ij} = \hat{\tau}_i - \hat{\tau}_j, \qquad \operatorname{Var}(\hat{c}_{ij}) = [\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{ii} - 2[\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{ij} + [\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{jj}.

Setting comp = "pairwise" in waldTest() generates all \binom{k}{2} contrasts automatically.
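Generating the pairwise contrast vectors and confirming the variance shortcut against the full quadratic form takes only a few lines. A plain-Python sketch with an arbitrary symmetric matrix standing in for \mathbf{V}_{\hat{\boldsymbol{\tau}}}:

```python
from itertools import combinations

k = 4
# Arbitrary symmetric stand-in for the prediction error vcov matrix
V = [[10.0,  4.0,  3.0,  2.0],
     [ 4.0, 11.0,  5.0,  3.0],
     [ 3.0,  5.0, 12.0,  6.0],
     [ 2.0,  3.0,  6.0, 13.0]]

results = {}
for i, j in combinations(range(k), 2):   # all C(k, 2) pairs
    c = [0.0] * k
    c[i], c[j] = 1.0, -1.0               # tau_i - tau_j
    quad = sum(c[a] * V[a][b] * c[b]
               for a in range(k) for b in range(k))
    shortcut = V[i][i] - 2.0 * V[i][j] + V[j][j]
    results[(i, j)] = (quad, shortcut)   # the two agree for every pair
```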

Custom contrast matrices

Multiple contrasts can be tested in a single call by supplying a contrast matrix \mathbf{C} (rows = contrasts, columns = levels). Each row \mathbf{c}_r^\top is tested independently:

W_r = \frac{\bigl(\mathbf{c}_r^\top\hat{\boldsymbol{\tau}}\bigr)^2} {\mathbf{c}_r^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}_r}, \quad r = 1, \ldots, m.


Joint zero tests

A joint zero test simultaneously tests whether a subset of q effects are all zero. Define the selection matrix \mathbf{Z} (q \times n) with a single 1 per row selecting the coefficients of interest. The null hypothesis is

H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0}.

The Wald statistic is

W = \hat{\boldsymbol{\tau}}^\top \mathbf{Z}^\top \bigl(\mathbf{Z}\,\mathbf{V}_{\hat{\boldsymbol{\tau}}}\,\mathbf{Z}^\top\bigr)^{-1} \mathbf{Z}\hat{\boldsymbol{\tau}} \;\overset{H_0}{\sim}\; \chi^2_q,

or F = W / q \sim F_{q,\nu} for the F-test form.
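The joint test is equally mechanical: for a selection matrix, \mathbf{Z}\hat{\boldsymbol{\tau}} and \mathbf{Z}\mathbf{V}_{\hat{\boldsymbol{\tau}}}\mathbf{Z}^\top are simply the corresponding sub-vector and sub-matrix. A plain-Python sketch with stand-in numbers and q = 2, so that both the 2x2 inverse and the \chi^2_2 upper tail (\exp(-W/2)) have closed forms:

```python
import math

# Stand-in numbers (not real model output)
tau = [50.0, 120.0, 250.0]         # predicted effects
V   = [[9.0,  3.0,  2.0],
       [3.0, 10.0,  4.0],
       [2.0,  4.0, 11.0]]
sel = [1, 2]                       # jointly test effects 2 and 3 (q = 2)

# For a selection matrix, Z tau and Z V Z' are just a sub-vector / sub-matrix
zt = [tau[i] for i in sel]
A  = [[V[i][j] for j in sel] for i in sel]

# Closed-form inverse of the 2x2 matrix Z V Z'
det  = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[ A[1][1] / det, -A[0][1] / det],
        [-A[1][0] / det,  A[0][0] / det]]

W = sum(zt[a] * Ainv[a][b] * zt[b]             # (Z tau)' (Z V Z')^(-1) (Z tau)
        for a in range(2) for b in range(2))
p = math.exp(-W / 2.0)                         # chi^2_2 upper tail
```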


P-value adjustment

When many contrasts are tested, the chance of at least one false positive grows; controlling the family-wise error rate (FWER) or the false discovery rate (FDR) guards against this. waldTest() passes all raw p-values within each group to p.adjust(). The key options are:

Method Controls Best used when
"none" Exploratory or single contrast
"bonferroni" FWER (conservative) Few contrasts, strict control needed
"holm" FWER (less conservative) Few to moderate contrasts
"fdr" / "BH" FDR Many contrasts, some true effects expected
"BY" FDR (arbitrary dependence) Dependent test statistics

Setup

Simulated data and model

We use simTrialData() to generate a balanced multi-environment trial with 20 varieties, 4 sites, and 4 nitrogen treatments (Control, LowN, MidN, HighN). The treatment effects are deliberately set to span a range — a small step from Control to LowN, a moderate step to MidN, and a large step to HighN — so that the resulting contrasts include a mix of non-significant, borderline, and clearly significant comparisons. A Treatment * Site fixed model allows treatment effects to vary across sites, which is reflected in the by-group testing section.

out <- simTrialData(
  nvar          = 20L,
  nsite         = 4L,
  treatments    = c("Control", "LowN", "MidN", "HighN"),
  treat_effects = c(0, 50, 120, 250),
  nrep          = 3L,
  seed          = 42L,
  verbose       = FALSE
)
dat            <- out$data
treatments     <- c("Control", "LowN", "MidN", "HighN")
dat$Row        <- factor(dat$Row)
dat$Column     <- factor(dat$Column)
asreml.options(trace = FALSE)
model <- asreml(
  yield    ~ Treatment * Site,
  random   = ~ Variety + at(Site):Rep,
  residual = ~ dsum(~ ar1(Row):ar1(Column) | Site),
  data     = dat,
  maxit    = 30L
)
model <- update(model)

Treatment BLUEs are obtained by predicting the Treatment term with a full prediction error variance–covariance matrix:

pred <- predict(model, classify = "Treatment", vcov = TRUE)
pred$pvals[, c("Treatment", "predicted.value", "std.error")]
  Treatment predicted.value std.error
1   Control        4305.265  114.1842
2      LowN        4375.257  114.1959
3      MidN        4510.323  114.1743
4     HighN        4596.544  114.1991

For the by-group examples (Example 6) we also obtain predictions for every Treatment × Site combination:

pred3 <- predict(model, classify = "Treatment:Site", vcov = TRUE)

Example 1: Pairwise contrasts

The most common use case — test all \binom{4}{2} = 6 pairwise differences among the nitrogen treatments. Setting comp = "pairwise" generates the full contrast matrix automatically.

res1 <- waldTest(
  pred,
  cc = list(
    list(coef = treatments, type = "con", comp = "pairwise")
  )
)

res1$Contrasts
        Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1  Control vs LowN  -69.99234  43.35122       2.606744  1 0.106410
2  Control vs MidN -205.05808  43.08954      22.646964  1 0.000002
3 Control vs HighN -291.27898  43.04651      45.787021  1 0.000000
4     LowN vs MidN -135.06574  43.03929       9.848276  1 0.001700
5    LowN vs HighN -221.28664  43.25291      26.174578  1 0.000000
6    MidN vs HighN  -86.22090  43.23593       3.976816  1 0.046131

Comparison reads as "A vs B" where A carries the positive coefficient (i.e. \hat{\tau}_A - \hat{\tau}_B). Wald.Statistic = \hat{c}^2 / \widehat{\operatorname{Var}}(\hat{c}); df = 1 for all single-contrast rows; P.Value is from \chi^2_1.

The HighN vs Control contrast should be the most significant, reflecting the largest mean difference in the simulated data.
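The printed columns are internally consistent: each Wald statistic equals Estimate^2 / Std.Error^2, and for one degree of freedom the \chi^2 p-value is \operatorname{erfc}(\sqrt{W/2}). Checking the first row by hand (plain Python, values copied from the table above):

```python
import math

# First row of res1$Contrasts, copied from the table above
est, se = -69.99234, 43.35122

W = (est / se) ** 2                   # Wald statistic = Estimate^2 / Std.Error^2
p = math.erfc(math.sqrt(W / 2.0))     # chi^2_1 upper-tail p-value
```

This recovers W = 2.6067 and p = 0.1064, matching the printed row.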


Example 2: Manual contrast vector

A manual contrast lets you specify exactly which comparison to make and how to label it. Here we test HighN minus Control using \mathbf{c} = (-1,\; 0,\; 0,\; 1)^\top — one element per level in coef order.

res2 <- waldTest(
  pred,
  cc = list(
    list(
      coef  = treatments,
      type  = "con",
      comp  = c(-1, 0, 0, 1),     # HighN - Control
      group = list(left = "HighN", right = "Control")
    )
  )
)

res2$Contrasts
        Comparison Estimate Std.Error Wald.Statistic df P.Value
1 HighN vs Control  291.279  43.04651       45.78702  1       0

The group argument overrides the auto-generated label so the output reads "HighN vs Control".


Example 3: Custom contrast matrix

A contrast matrix \mathbf{C} (rows = contrasts, columns = levels) tests multiple specific comparisons in one call. Here we fit three polynomial contrasts across the four equally-spaced nitrogen levels.

# Orthogonal polynomial contrasts for 4 equally-spaced levels
C_poly <- matrix(
  c(-3, -1,  1,  3,   # linear trend
     1, -1, -1,  1,   # quadratic trend
    -1,  3, -3,  1),  # cubic trend
  nrow = 3, byrow = TRUE
)

res3 <- waldTest(
  pred,
  cc = list(
    list(
      coef  = treatments,
      type  = "con",
      comp  = C_poly,
      group = list(left  = c("Linear", "Quadratic", "Cubic"),
                   right = c("trend",  "trend",     "trend"))
    )
  )
)

res3$Contrasts
          Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1    Linear vs trend 1008.90268 135.88947      55.122249  1 0.000000
2 Quadratic vs trend   16.22856  61.40695       0.069843  1 0.791565
3     Cubic vs trend -113.91825 136.33737       0.698163  1 0.403402

For a monotone nitrogen response we expect a strong linear trend and negligible quadratic or cubic components.
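Each Estimate is simply the corresponding row of C_poly dotted with the treatment BLUEs, so the column can be verified by hand (plain Python, predicted values copied from pred$pvals in the Setup section). The rows are also mutually orthogonal, so the three trends partition non-overlapping pieces of the treatment variation.

```python
# Treatment BLUEs from pred$pvals in the Setup section
tau = [4305.265, 4375.257, 4510.323, 4596.544]

C_poly = [[-3, -1,  1,  3],    # linear
          [ 1, -1, -1,  1],    # quadratic
          [-1,  3, -3,  1]]    # cubic

# Row-by-row dot products reproduce the Estimate column (to rounding)
estimates = [sum(c * t for c, t in zip(row, tau)) for row in C_poly]

# Pairwise dot products of the contrast rows are all zero (orthogonality)
dots = [sum(a * b for a, b in zip(C_poly[i], C_poly[j]))
        for i in range(3) for j in range(i + 1, 3)]
```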


Example 4: Joint zero test

A joint zero test asks: are these effects simultaneously equal to zero? This uses type = "zero" and tests the q-dimensional null H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0}.

Here we test whether the three non-Control treatment effects are jointly zero — i.e. does nitrogen application have any effect at all?

res4 <- waldTest(
  pred,
  cc = list(
    list(
      coef  = c("LowN", "MidN", "HighN"),
      type  = "zero",
      group = "Nitrogen joint zero"
    )
  )
)

res4$Zero
                 Test Wald.Statistic df P.Value
1 Nitrogen joint zero       1653.257  3       0

df = 3 because three effects are tested simultaneously. The statistic follows \chi^2_3 under H_0.


Example 5: Combining contrast and zero tests

Both test types can be combined in a single cc list. Results are returned in separate $Contrasts and $Zero slots.

res5 <- waldTest(
  pred,
  cc = list(
    list(coef = treatments,
         type = "con", comp = "pairwise"),
    list(coef  = c("LowN", "MidN", "HighN"),
         type  = "zero",
         group = "Nitrogen joint zero")
  )
)

cat("--- Contrasts ---\n")
--- Contrasts ---
res5$Contrasts
        Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1  Control vs LowN  -69.99234  43.35122       2.606744  1 0.106410
2  Control vs MidN -205.05808  43.08954      22.646964  1 0.000002
3 Control vs HighN -291.27898  43.04651      45.787021  1 0.000000
4     LowN vs MidN -135.06574  43.03929       9.848276  1 0.001700
5    LowN vs HighN -221.28664  43.25291      26.174578  1 0.000000
6    MidN vs HighN  -86.22090  43.23593       3.976816  1 0.046131
cat("\n--- Zero test ---\n")

--- Zero test ---
res5$Zero
                 Test Wald.Statistic df P.Value
1 Nitrogen joint zero       1653.257  3       0

Example 6: By-group testing

When predictions span multiple environments (e.g. a Treatment:Site classify), the by argument runs the same cc specification independently within each group. The grouping column is prepended to both output tables.

res6 <- waldTest(
  pred3,
  cc = list(
    list(coef = treatments, type = "con", comp = "pairwise")
  ),
  by = "Site"
)

res6$Contrasts
   Site       Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1  Env1  Control vs LowN  -71.43071  84.22855       0.719203  1 0.396406
2  Env1  Control vs MidN -276.82203  82.73088      11.196080  1 0.000820
3  Env1 Control vs HighN -339.68511  83.69062      16.474015  1 0.000049
4  Env1     LowN vs MidN -205.39132  83.67663       6.024983  1 0.014105
5  Env1    LowN vs HighN -268.25440  84.64289      10.044140  1 0.001528
6  Env1    MidN vs HighN  -62.86308  83.97848       0.560345  1 0.454121
7  Env2  Control vs LowN -113.76305  93.92238       1.467115  1 0.225801
8  Env2  Control vs MidN -222.60831  93.45713       5.673588  1 0.017222
9  Env2 Control vs HighN -178.81930  92.46443       3.740067  1 0.053122
10 Env2     LowN vs MidN -108.84526  92.96257       1.370891  1 0.241659
11 Env2    LowN vs HighN  -65.05625  92.92325       0.490150  1 0.483860
12 Env2    MidN vs HighN   43.78901  93.38093       0.219894  1 0.639121
13 Env3  Control vs LowN   61.92793  84.61035       0.535706  1 0.464218
14 Env3  Control vs MidN -120.92694  84.49191       2.048405  1 0.152366
15 Env3 Control vs HighN -372.00852  84.82992      19.231255  1 0.000012
16 Env3     LowN vs MidN -182.85487  84.23246       4.712529  1 0.029944
17 Env3    LowN vs HighN -433.93645  84.53462      26.350150  1 0.000000
18 Env3    MidN vs HighN -251.08159  84.53452       8.821888  1 0.002976
19 Env4  Control vs LowN -156.70353  83.61919       3.511928  1 0.060929
20 Env4  Control vs MidN -199.87506  83.58736       5.717892  1 0.016793
21 Env4 Control vs HighN -274.60298  83.04045      10.935311  1 0.000943
22 Env4     LowN vs MidN  -43.17153  83.04810       0.270231  1 0.603177
23 Env4    LowN vs HighN -117.89945  83.59917       1.988931  1 0.158453
24 Env4    MidN vs HighN  -74.72792  83.61004       0.798820  1 0.371446

The Site column identifies which group each contrast belongs to. The coef labels are resolved within each group, so the same contrast specification is applied independently at every Site.

Multi-factor by

When predictions cross two or more factors (e.g. Treatment:Site:Year), pass a character vector to by. The levels are pasted into a composite key and the output column is named accordingly:

# Illustrative — requires a Treatment:Site:Year classify; not run here
# pred_sy <- predict(model, classify = "Treatment:Site:Year", vcov = TRUE)
# res_sy <- waldTest(
#   pred_sy,
#   cc = list(list(coef = treatments, type = "con", comp = "pairwise")),
#   by = c("Site", "Year")
# )
# The grouping column in the output is named "Site:Year"

Example 7: F-tests

When an error degrees-of-freedom value \nu is available (e.g. model$nedf from ASReml-R), replacing the asymptotic \chi^2 reference with an F_{1,\nu} distribution gives better-calibrated p-values in small samples. Set test = "F" and supply df_error.

res7 <- waldTest(
  pred,
  cc       = list(list(coef = treatments, type = "con", comp = "pairwise")),
  test     = "F",
  df_error = model$nedf
)

res7$Contrasts
        Comparison   Estimate Std.Error F.Statistic df  P.Value
1  Control vs LowN  -69.99234  43.35122    2.606744  1 0.106744
2  Control vs MidN -205.05808  43.08954   22.646964  1 0.000002
3 Control vs HighN -291.27898  43.04651   45.787021  1 0.000000
4     LowN vs MidN -135.06574  43.03929    9.848276  1 0.001753
5    LowN vs HighN -221.28664  43.25291   26.174578  1 0.000000
6    MidN vs HighN  -86.22090  43.23593    3.976816  1 0.046418

The column is now F.Statistic (= Wald statistic for single contrasts). P-values are from F_{1,\nu} (where \nu = model$nedf) rather than \chi^2_1 — slightly more conservative in small samples.


Example 8: P-value adjustment

With \binom{4}{2} = 6 pairwise comparisons the chance of at least one false positive is inflated. The adjust argument applies p.adjust() across all contrasts within each group.

# Bonferroni
res_bon <- waldTest(pred,
                    cc     = list(list(coef = treatments, type = "con",
                                       comp = "pairwise")),
                    adjust = "bonferroni")

# FDR (Benjamini–Hochberg)
res_fdr <- waldTest(pred,
                    cc     = list(list(coef = treatments, type = "con",
                                       comp = "pairwise")),
                    adjust = "fdr")

cat("--- Bonferroni adjusted ---\n")
--- Bonferroni adjusted ---
res_bon$Contrasts[, c("Comparison", "P.Value")]
        Comparison  P.Value
1  Control vs LowN 0.638460
2  Control vs MidN 0.000012
3 Control vs HighN 0.000000
4     LowN vs MidN 0.010199
5    LowN vs HighN 0.000002
6    MidN vs HighN 0.276784
cat("\n--- FDR (BH) adjusted ---\n")

--- FDR (BH) adjusted ---
res_fdr$Contrasts[, c("Comparison", "P.Value")]
        Comparison  P.Value
1  Control vs LowN 0.106410
2  Control vs MidN 0.000004
3 Control vs HighN 0.000000
4     LowN vs MidN 0.002550
5    LowN vs HighN 0.000001
6    MidN vs HighN 0.055357

Bonferroni multiplies each p-value by the number of tests (capped at 1); FDR is less conservative and generally preferred when many contrasts are tested and some true effects are expected.
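Both adjustments can be reproduced by hand from the raw p-values in res1, up to the rounding of the printed values:

```python
m = 6                        # number of pairwise contrasts

# Control vs LowN: Bonferroni multiplies the raw p-value by m, capped at 1
p_raw = 0.106410
bon = min(1.0, p_raw * m)    # matches the Bonferroni table above

# MidN vs HighN has the 5th-smallest raw p-value, so its BH adjustment
# is p * m/5 (no cumulative-minimum correction is needed for this row)
p5  = 0.046131
bh5 = p5 * m / 5             # matches the FDR (BH) table above
```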


Forest plots with plot_waldTest()

plot_waldTest() turns a waldTest() result directly into a forest plot. Each contrast is shown as a filled circle with horizontal confidence interval bars. Points are coloured by -\log_{10}(p): non-significant results appear in grey; significant results follow a warm gradient from gold through orange to dark red as evidence strengthens. The raw p-value is printed beside each row.

Basic forest plot

p1 <- plot_waldTest(res1)
print(p1)

The vertical dashed line at x = 0 is the reference for no difference. The colour bar maps -\log_{10}(p) — results in the warm-gradient zone are significant at alpha = 0.05.

By-group with faceting (facet = TRUE)

When waldTest() was called with by, facet = TRUE (the default) produces one panel per group with free y-scales, making it easy to compare effect sizes across environments.

p_facet <- plot_waldTest(res6, facet = TRUE)
print(p_facet)

By-group on one panel (facet = FALSE)

facet = FALSE places all groups on a single panel with dotted horizontal separator lines between groups. Useful for compact side-by-side comparison when the number of contrasts per group is small.

p_single <- plot_waldTest(res6, facet = FALSE)
print(p_single)

Adjusting the CI level and significance threshold

ci_level controls the width of the CI arms; alpha shifts the colour-scale break between grey (non-significant) and the warm gradient.

# 99% CI arms; stricter significance threshold
p_99 <- plot_waldTest(res1, ci_level = 0.99, alpha = 0.01)
print(p_99)

Extracting the plot data

return_data = TRUE returns the tidy data frame used internally to build the plot, giving full control for bespoke customisation.

df <- plot_waldTest(res1, return_data = TRUE)
df[, c("label", "Estimate", "CI_lower", "CI_upper", "P.Value",
       "neg_log10_p", "significant", "p_label")]
             label   Estimate  CI_lower   CI_upper  P.Value neg_log10_p
1  Control vs LowN  -69.99234 -154.9592   14.97449 0.106410   0.9730176
2  Control vs MidN -205.05808 -289.5120 -120.60413 0.000002   5.6989700
3 Control vs HighN -291.27898 -375.6486 -206.90937 0.000000  15.6535598
4     LowN vs MidN -135.06574 -219.4212  -50.71028 0.001700   2.7695511
5    LowN vs HighN -221.28664 -306.0608 -136.51249 0.000000  15.6535598
6    MidN vs HighN  -86.22090 -170.9618   -1.48004 0.046131   1.3360071
  significant p_label
1       FALSE p=0.106
2        TRUE p<0.001
3        TRUE p<0.001
4        TRUE p=0.002
5        TRUE p<0.001
6        TRUE p=0.046

The ggplot object can be extended with + in the usual way:

p_custom <- plot_waldTest(res1) +
  ggplot2::ggtitle("Nitrogen treatment contrasts",
                   subtitle = "Wald tests (asymptotic chi-squared)")
print(p_custom)


Summary

Task Specification
All pairwise differences list(coef = levs, type = "con", comp = "pairwise")
Single manual contrast list(coef = levs, type = "con", comp = c(...))
Multiple contrasts list(coef = levs, type = "con", comp = matrix(...))
Joint zero test list(coef = levs, type = "zero", group = "label")
Both types Combine in the same cc list
By-group Add by = "ColumnName" to waldTest()
Multi-factor by Add by = c("Site", "Year")
F-test Add test = "F", df_error = model$nedf
Adjust p-values Add adjust = "fdr" (or "bonferroni", "holm", "BY")
Forest plot plot_waldTest(res)
Export plot data plot_waldTest(res, return_data = TRUE)

References

Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3), 426–482.

Searle, S.R., Casella, G. & McCulloch, C.E. (1992). Variance Components. Wiley.

Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3), 983–997.

Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57(1), 289–300.