out <- simTrialData(
nvar = 20L,
nsite = 4L,
treatments = c("Control", "LowN", "MidN", "HighN"),
nrep = 3L,
seed = 42L,
verbose = FALSE,
sim.options = list(treat_effects = c(0, 50, 120, 250))
)
dat <- out$data
treatments <- c("Control", "LowN", "MidN", "HighN")
dat$Row <- factor(dat$Row)
dat$Column <- factor(dat$Column)Wald Tests on Fixed-Effect Contrasts
waldTest() and plot_waldTest()
Overview
waldTest() performs Wald tests on linear contrasts of predicted values obtained from predict(). Because it works entirely from a prediction list (predicted values + prediction error variance–covariance matrix), it requires no access to model internals and works with any mixed-model software that exposes a vcov-style prediction object — including ASReml-R V4.
Two complementary test types are available:
| Type | Null hypothesis | Use case |
|---|---|---|
"con" — contrast |
H_0: \mathbf{c}^\top\hat{\boldsymbol{\tau}} = 0 | Pairwise, treatment vs control, custom comparisons |
"zero" — joint zero |
H_0: \mathbf{Z}\hat{\boldsymbol{\tau}} = \mathbf{0} | Are a set of effects simultaneously zero? |
The companion function plot_waldTest() produces a publication-ready forest plot directly from the result.
Mathematical Framework
Predicted values and their variance
Let \hat{\boldsymbol{\tau}} = (\hat{\tau}_1, \ldots, \hat{\tau}_n)^\top be the vector of n predicted treatment effects (BLUEs or BLUPs), obtained from a mixed model with prediction error variance–covariance matrix
\operatorname{Var}(\hat{\boldsymbol{\tau}} - \boldsymbol{\tau}) = \mathbf{V}_{\hat{\boldsymbol{\tau}}}.
In ASReml-R V4 this is returned by predict(model, classify = ..., vcov = TRUE)$vcov.
Contrast tests
A contrast is a scalar linear combination \hat{c} = \mathbf{c}^\top\hat{\boldsymbol{\tau}} where \mathbf{c} is a user-specified contrast vector. The null hypothesis is
H_0: \mathbf{c}^\top\boldsymbol{\tau} = 0.
The estimated contrast and its standard error are
\hat{c} = \mathbf{c}^\top\hat{\boldsymbol{\tau}}, \qquad \operatorname{SE}(\hat{c}) = \sqrt{\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}}.
The Wald statistic is
W = \frac{\hat{c}^2} {\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}} = \frac{\bigl(\mathbf{c}^\top\hat{\boldsymbol{\tau}}\bigr)^2} {\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}} \;\overset{H_0}{\sim}\; \chi^2_1.
When an error degrees of freedom \nu is available (e.g. from a mixed model), the F-statistic F = W is referred to an F_{1,\nu} distribution, providing better finite-sample calibration.
Pairwise contrasts
For k treatment levels the \binom{k}{2} pairwise differences are a convenient special case. For levels i and j, the contrast vector \mathbf{c}_{ij} has +1 in position i, -1 in position j, and zeros elsewhere, so
\hat{c}_{ij} = \hat{\tau}_i - \hat{\tau}_j, \qquad \operatorname{Var}(\hat{c}_{ij}) = [\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{ii} - 2[\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{ij} + [\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{jj}.
Setting comp = "pairwise" in waldTest() generates all \binom{k}{2} contrasts automatically.
Custom contrast matrices
Multiple contrasts can be tested in a single call by supplying a contrast matrix \mathbf{C} (rows = contrasts, columns = levels). Each row \mathbf{c}_r^\top is tested independently:
W_r = \frac{\bigl(\mathbf{c}_r^\top\hat{\boldsymbol{\tau}}\bigr)^2} {\mathbf{c}_r^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}_r}, \quad r = 1, \ldots, m.
Joint zero tests
A joint zero test simultaneously tests whether a subset of q effects are all zero. Define the selection matrix \mathbf{Z} (q \times n) with a single 1 per row selecting the coefficients of interest. The null hypothesis is
H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0}.
The Wald statistic is
W = \hat{\boldsymbol{\tau}}^\top \mathbf{Z}^\top \bigl(\mathbf{Z}\,\mathbf{V}_{\hat{\boldsymbol{\tau}}}\,\mathbf{Z}^\top\bigr)^{-1} \mathbf{Z}\hat{\boldsymbol{\tau}} \;\overset{H_0}{\sim}\; \chi^2_q,
or F = W / q \sim F_{q,\nu} for the F-test form.
P-value adjustment
When many contrasts are tested the family-wise error rate (FWER) or false discovery rate (FDR) inflates. waldTest() passes all raw p-values within each group to p.adjust(). The key options are:
| Method | Controls | Best used when |
|---|---|---|
"none" |
— | Exploratory or single contrast |
"bonferroni" |
FWER (conservative) | Few contrasts, strict control needed |
"holm" |
FWER (less conservative) | Few to moderate contrasts |
"fdr" / "BH" |
FDR | Many contrasts, some true effects expected |
"BY" |
FDR (arbitrary dependence) | Dependent test statistics |
Setup
Simulated data and model
We use simTrialData() to generate a balanced multi-environment trial with 20 varieties, 4 sites, and 4 nitrogen treatments (Control, LowN, MidN, HighN). The treatment effects are deliberately set to span a range — a small step from Control to LowN, a moderate step to MidN, and a large step to HighN — so that the resulting contrasts include a mix of non-significant, borderline, and clearly significant comparisons. A Treatment * Site fixed model allows treatment effects to vary across sites, which is reflected in the by-group testing section.
asreml.options(trace = FALSE)
model <- asreml(
yield ~ Treatment * Site,
random = ~ Variety + at(Site):Rep,
residual = ~ dsum(~ ar1(Row):ar1(Column) | Site),
data = dat,
maxit = 30L
)
model <- update(model)Treatment BLUEs are obtained by predicting the Treatment term with a full prediction error variance–covariance matrix:
pred <- predict(model, classify = "Treatment", vcov = TRUE)
pred$pvals[, c("Treatment", "predicted.value", "std.error")] Treatment predicted.value std.error
1 Control 4349.234 59.03510
2 LowN 4434.391 58.97173
3 MidN 4434.080 58.99101
4 HighN 4622.774 59.02937
For the by-group examples (Example 6) we also obtain predictions for every Treatment × Site combination:
pred3 <- predict(model, classify = "Treatment:Site", vcov = TRUE)Example 1: Pairwise contrasts
The most common use case — test all \binom{4}{2} = 6 pairwise differences among the nitrogen treatments. Setting comp = "pairwise" generates the full contrast matrix automatically.
res1 <- waldTest(
pred,
cc = list(
list(coef = treatments, type = "con", comp = "pairwise")
)
)
res1$Contrasts Comparison Estimate Std.Error Wald.Statistic df P.Value
1 Control vs LowN -85.156959 41.95440 4.119886 1 0.042382
2 Control vs MidN -84.845768 42.04940 4.071371 1 0.043616
3 Control vs HighN -273.539818 42.30613 41.805602 1 0.000000
4 LowN vs MidN 0.311191 42.00566 0.000055 1 0.994089
5 LowN vs HighN -188.382859 41.81043 20.300817 1 0.000007
6 MidN vs HighN -188.694050 42.13180 20.058406 1 0.000008
Comparison reads as "A vs B" where A carries the positive coefficient (i.e. \hat{\tau}_A - \hat{\tau}_B). Wald.Statistic = \hat{c}^2 / \widehat{\operatorname{Var}}(\hat{c}); df = 1 for all single-contrast rows; P.Value is from \chi^2_1.
The HighN vs Control contrast should be the most significant, reflecting the largest mean difference in the simulated data.
Example 2: Manual contrast vector
A manual contrast lets you specify exactly which comparison to make and how to label it. Here we test HighN minus Control using \mathbf{c} = (-1,\; 0,\; 0,\; 1)^\top — one element per level in coef order.
res2 <- waldTest(
pred,
cc = list(
list(
coef = treatments,
type = "con",
comp = c(-1, 0, 0, 1), # HighN - Control
group = list(left = "HighN", right = "Control")
)
)
)
res2$Contrasts Comparison Estimate Std.Error Wald.Statistic df P.Value
1 HighN vs Control 273.5398 42.30613 41.8056 1 0
The group argument overrides the auto-generated label so the output reads "HighN vs Control".
Example 3: Custom contrast matrix
A contrast matrix \mathbf{C} (rows = contrasts, columns = levels) tests multiple specific comparisons in one call. Here we fit three polynomial contrasts across the four equally-spaced nitrogen levels.
# Orthogonal polynomial contrasts for 4 equally-spaced levels
C_poly <- matrix(
c(-3, -1, 1, 3, # linear trend
1, -1, -1, 1, # quadratic trend
-1, 3, -3, 1), # cubic trend
nrow = 3, byrow = TRUE
)
res3 <- waldTest(
pred,
cc = list(
list(
coef = treatments,
type = "con",
comp = C_poly,
group = list(left = c("Linear", "Quadratic", "Cubic"),
right = c("trend", "trend", "trend"))
)
)
)
res3$Contrasts Comparison Estimate Std.Error Wald.Statistic df P.Value
1 Linear vs trend 820.3083 133.47571 37.770234 1 0.000000
2 Quadratic vs trend 103.5371 59.13747 3.065253 1 0.079983
3 Cubic vs trend 274.4734 133.14305 4.249751 1 0.039256
For a monotone nitrogen response we expect a strong linear trend and negligible quadratic or cubic components.
Example 4: Joint zero test
A joint zero test asks: are these effects simultaneously equal to zero? This uses type = "zero" and tests the q-dimensional null H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0}.
Here we test whether the three non-Control treatment effects are jointly zero — i.e. does nitrogen application have any effect at all?
res4 <- waldTest(
pred,
cc = list(
list(
coef = c("LowN", "MidN", "HighN"),
type = "zero",
group = "Nitrogen joint zero"
)
)
)
res4$Zero Test Wald.Statistic df P.Value
1 Nitrogen joint zero 7015.662 3 0
df = 3 because three effects are tested simultaneously. The statistic follows \chi^2_3 under H_0.
Example 5: Combining contrast and zero tests
Both test types can be combined in a single cc list. Results are returned in separate $Contrasts and $Zero slots.
res5 <- waldTest(
pred,
cc = list(
list(coef = treatments,
type = "con", comp = "pairwise"),
list(coef = c("LowN", "MidN", "HighN"),
type = "zero",
group = "Nitrogen joint zero")
)
)
cat("--- Contrasts ---\n")--- Contrasts ---
res5$Contrasts Comparison Estimate Std.Error Wald.Statistic df P.Value
1 Control vs LowN -85.156959 41.95440 4.119886 1 0.042382
2 Control vs MidN -84.845768 42.04940 4.071371 1 0.043616
3 Control vs HighN -273.539818 42.30613 41.805602 1 0.000000
4 LowN vs MidN 0.311191 42.00566 0.000055 1 0.994089
5 LowN vs HighN -188.382859 41.81043 20.300817 1 0.000007
6 MidN vs HighN -188.694050 42.13180 20.058406 1 0.000008
cat("\n--- Zero test ---\n")
--- Zero test ---
res5$Zero Test Wald.Statistic df P.Value
1 Nitrogen joint zero 7015.662 3 0
Example 6: By-group testing
When predictions span multiple environments (e.g. a Treatment:Site classify), the by argument runs the same cc specification independently within each group. The grouping column is prepended to both output tables.
res6 <- waldTest(
pred3,
cc = list(
list(coef = treatments, type = "con", comp = "pairwise")
),
by = "Site"
)
res6$Contrasts Site Comparison Estimate Std.Error Wald.Statistic df P.Value
1 Env1 Control vs LowN -64.80676 72.56873 0.797520 1 0.371836
2 Env1 Control vs MidN 65.41152 72.77694 0.807832 1 0.368762
3 Env1 Control vs HighN -80.90097 72.86790 1.232636 1 0.266895
4 Env1 LowN vs MidN 130.21829 72.77922 3.201320 1 0.073579
5 Env1 LowN vs HighN -16.09421 72.95934 0.048661 1 0.825411
6 Env1 MidN vs HighN -146.31250 72.56287 4.065687 1 0.043763
7 Env2 Control vs LowN -42.57883 94.29794 0.203884 1 0.651604
8 Env2 Control vs MidN -114.73385 92.54982 1.536852 1 0.215087
9 Env2 Control vs HighN -316.78090 92.20569 11.803276 1 0.000591
10 Env2 LowN vs MidN -72.15501 92.19406 0.612530 1 0.433837
11 Env2 LowN vs HighN -274.20207 92.95697 8.701167 1 0.003180
12 Env2 MidN vs HighN -202.04705 94.29667 4.591053 1 0.032139
13 Env3 Control vs LowN -34.92614 74.59234 0.219236 1 0.639622
14 Env3 Control vs MidN -121.30041 74.61098 2.643133 1 0.103998
15 Env3 Control vs HighN -275.18021 75.99328 13.112456 1 0.000293
16 Env3 LowN vs MidN -86.37427 73.32536 1.387587 1 0.238813
17 Env3 LowN vs HighN -240.25407 74.70690 10.342370 1 0.001300
18 Env3 MidN vs HighN -153.87980 74.70576 4.242824 1 0.039417
19 Env4 Control vs LowN -198.31610 91.86757 4.660060 1 0.030872
20 Env4 Control vs MidN -168.76034 94.13216 3.214139 1 0.073005
21 Env4 Control vs HighN -421.29719 95.11883 19.617522 1 0.000009
22 Env4 LowN vs MidN 29.55576 95.18969 0.096406 1 0.756186
23 Env4 LowN vs HighN -222.98109 91.80270 5.899635 1 0.015144
24 Env4 MidN vs HighN -252.53685 93.05548 7.364879 1 0.006651
The Site column identifies which group each contrast belongs to. The coef labels always refer to within-group row positions.
Multi-factor by
When predictions cross two or more factors (e.g. Treatment:Site:Year), pass a character vector to by. The levels are pasted into a composite key and the output column is named accordingly:
# Illustrative — requires a Treatment:Site:Year classify
# pred_sy <- predict(model, classify = "Treatment:Site:Year", vcov = TRUE)
res_sy <- waldTest(
pred_sy,
cc = list(list(coef = treatments, type = "con", comp = "pairwise")),
by = c("Site", "Year")
)
# Output column named "Site:Year"Example 7: F-tests
When an error degrees of freedom \nu is available (e.g. model$nedf from ASReml-R), replacing the asymptotic \chi^2 with an F_{1,\nu} distribution gives better-calibrated p-values in small samples. Set test = "F" and supply df_error.
res7 <- waldTest(
pred,
cc = list(list(coef = treatments, type = "con", comp = "pairwise")),
test = "F",
df_error = model$nedf
)
res7$Contrasts Comparison Estimate Std.Error F.Statistic df P.Value
1 Control vs LowN -85.156959 41.95440 4.119886 1 0.042662
2 Control vs MidN -84.845768 42.04940 4.071371 1 0.043898
3 Control vs HighN -273.539818 42.30613 41.805602 1 0.000000
4 LowN vs MidN 0.311191 42.00566 0.000055 1 0.994091
5 LowN vs HighN -188.382859 41.81043 20.300817 1 0.000007
6 MidN vs HighN -188.694050 42.13180 20.058406 1 0.000008
The column is now F.Statistic (= Wald statistic for single contrasts). P-values are from F_{1,\nu} (where \nu = model$nedf) rather than \chi^2_1 — slightly more conservative in small samples.
Example 8: P-value adjustment
With \binom{4}{2} = 6 pairwise comparisons the chance of at least one false positive is inflated. The adjust argument applies p.adjust() across all contrasts within each group.
# Bonferroni
res_bon <- waldTest(pred,
cc = list(list(coef = treatments, type = "con",
comp = "pairwise")),
adjust = "bonferroni")
# FDR (Benjamini–Hochberg)
res_fdr <- waldTest(pred,
cc = list(list(coef = treatments, type = "con",
comp = "pairwise")),
adjust = "fdr")
cat("--- Bonferroni adjusted ---\n")--- Bonferroni adjusted ---
res_bon$Contrasts[, c("Comparison", "P.Value")] Comparison P.Value
1 Control vs LowN 0.254292
2 Control vs MidN 0.261695
3 Control vs HighN 0.000000
4 LowN vs MidN 1.000000
5 LowN vs HighN 0.000040
6 MidN vs HighN 0.000045
cat("\n--- FDR (BH) adjusted ---\n")
--- FDR (BH) adjusted ---
res_fdr$Contrasts[, c("Comparison", "P.Value")] Comparison P.Value
1 Control vs LowN 0.052339
2 Control vs MidN 0.052339
3 Control vs HighN 0.000000
4 LowN vs MidN 0.994089
5 LowN vs HighN 0.000015
6 MidN vs HighN 0.000015
Bonferroni multiplies each p-value by the number of tests (capped at 1); FDR is less conservative and generally preferred when many contrasts are tested and some true effects are expected.
Forest plots with plot_waldTest()
plot_waldTest() turns a waldTest() result directly into a forest plot. Each contrast is shown as a filled circle with horizontal confidence interval bars. Points are coloured by -\log_{10}(p): non-significant results appear in grey; significant results follow a warm gradient from gold through orange to dark red as evidence strengthens. The raw p-value is printed beside each row.
Basic forest plot
p1 <- plot_waldTest(res1)
print(p1)The vertical dashed line at x = 0 is the reference for no difference. The colour bar maps -\log_{10}(p) — results in the warm-gradient zone are significant at alpha = 0.05.
By-group with faceting (facet = TRUE)
When waldTest() was called with by, facet = TRUE (the default) produces one panel per group with free y-scales, making it easy to compare effect sizes across environments.
p_facet <- plot_waldTest(res6, facet = TRUE)
print(p_facet)By-group on one panel (facet = FALSE)
facet = FALSE places all groups on a single panel with dotted horizontal separator lines between groups. Useful for compact side-by-side comparison when the number of contrasts per group is small.
p_single <- plot_waldTest(res6, facet = FALSE)
print(p_single)Adjusting the CI level and significance threshold
ci_level controls the width of the CI arms; alpha shifts the colour-scale break between grey (non-significant) and the warm gradient.
# 99% CI arms; stricter significance threshold
p_99 <- plot_waldTest(res1, ci_level = 0.99, alpha = 0.01)
print(p_99)Extracting the plot data
return_data = TRUE returns the tidy data frame used internally to build the plot, giving full control for bespoke customisation.
df <- plot_waldTest(res1, return_data = TRUE)
df[, c("label", "Estimate", "CI_lower", "CI_upper", "P.Value",
"neg_log10_p", "significant", "p_label")] label Estimate CI_lower CI_upper P.Value neg_log10_p
1 Control vs LowN -85.156959 -167.38608 -2.927840 0.042382 1.372818553
2 Control vs MidN -84.845768 -167.26109 -2.430451 0.043616 1.360354166
3 Control vs HighN -273.539818 -356.45831 -190.621325 0.000000 15.653559775
4 LowN vs MidN 0.311191 -82.01838 82.640764 0.994089 0.002574732
5 LowN vs HighN -188.382859 -270.32979 -106.435924 0.000007 5.154901960
6 MidN vs HighN -188.694050 -271.27086 -106.117243 0.000008 5.096910013
significant p_label
1 TRUE p=0.042
2 TRUE p=0.044
3 TRUE p<0.001
4 FALSE p=0.994
5 TRUE p<0.001
6 TRUE p<0.001
The ggplot object can be extended with + in the usual way:
p_custom <- plot_waldTest(res1) +
ggplot2::ggtitle("Nitrogen treatment contrasts",
subtitle = "Wald tests (asymptotic chi-squared)")
print(p_custom)Summary
| Task | Specification |
|---|---|
| All pairwise differences | list(coef = levs, type = "con", comp = "pairwise") |
| Single manual contrast | list(coef = levs, type = "con", comp = c(...)) |
| Multiple contrasts | list(coef = levs, type = "con", comp = matrix(...)) |
| Joint zero test | list(coef = levs, type = "zero", group = "label") |
| Both types | Combine in the same cc list |
| By-group | Add by = "ColumnName" to waldTest() |
| Multi-factor by | Add by = c("Site", "Year") |
| F-test | Add test = "F", df_error = model$nedf |
| Adjust p-values | Add adjust = "fdr" (or "bonferroni", "holm", "BY") |
| Forest plot | plot_waldTest(res) |
| Export plot data | plot_waldTest(res, return_data = TRUE) |
References
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3), 426–482.
Searle, S.R., Casella, G. & McCulloch, C.E. (1992). Variance Components. Wiley.
Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3), 983–997.
Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57(1), 289–300.