out <- simTrialData(
nvar = 20L,
nsite = 4L,
treatments = c("Control", "LowN", "MidN", "HighN"),
treat_effects = c(0, 50, 120, 250),
nrep = 3L,
seed = 42L,
verbose = FALSE
)
dat <- out$data
treatments <- c("Control", "LowN", "MidN", "HighN")
dat$Row <- factor(dat$Row)
dat$Column <- factor(dat$Column)

Wald Tests on Fixed-Effect Contrasts
waldTest() and plot_waldTest()
Overview
waldTest() performs Wald tests on linear contrasts of predicted values obtained from predict(). Because it works entirely from a prediction list (predicted values + prediction error variance–covariance matrix), it requires no access to model internals and works with any mixed-model software that exposes a vcov-style prediction object — including ASReml-R V4.
Two complementary test types are available:
| Type | Null hypothesis | Use case |
|---|---|---|
| "con" — contrast | H_0: \mathbf{c}^\top\boldsymbol{\tau} = 0 | Pairwise, treatment vs control, custom comparisons |
| "zero" — joint zero | H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0} | Are a set of effects simultaneously zero? |
The companion function plot_waldTest() produces a publication-ready forest plot directly from the result.
Mathematical Framework
Predicted values and their variance
Let \hat{\boldsymbol{\tau}} = (\hat{\tau}_1, \ldots, \hat{\tau}_n)^\top be the vector of n predicted treatment effects (BLUEs or BLUPs), obtained from a mixed model with prediction error variance–covariance matrix
\operatorname{Var}(\hat{\boldsymbol{\tau}} - \boldsymbol{\tau}) = \mathbf{V}_{\hat{\boldsymbol{\tau}}}.
In ASReml-R V4 this is returned by predict(model, classify = ..., vcov = TRUE)$vcov.
Contrast tests
A contrast is a scalar linear combination \hat{c} = \mathbf{c}^\top\hat{\boldsymbol{\tau}} where \mathbf{c} is a user-specified contrast vector. The null hypothesis is
H_0: \mathbf{c}^\top\boldsymbol{\tau} = 0.
The estimated contrast and its standard error are
\hat{c} = \mathbf{c}^\top\hat{\boldsymbol{\tau}}, \qquad \operatorname{SE}(\hat{c}) = \sqrt{\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}}.
The Wald statistic is
W = \frac{\hat{c}^2} {\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}} = \frac{\bigl(\mathbf{c}^\top\hat{\boldsymbol{\tau}}\bigr)^2} {\mathbf{c}^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}} \;\overset{H_0}{\sim}\; \chi^2_1.
When an error degrees of freedom \nu is available (e.g. from a mixed model), the F-statistic F = W is referred to an F_{1,\nu} distribution, providing better finite-sample calibration.
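The contrast arithmetic above is simple enough to verify in a few lines of base R. The numbers below are made up for illustration; they are not the model predictions used later in this vignette:

```r
# Toy predicted values and PEV matrix (illustrative only)
tau_hat <- c(Control = 10, LowN = 12, MidN = 15, HighN = 20)
V  <- diag(2, 4)                       # toy Var(tau_hat - tau)
cc <- c(-1, 0, 0, 1)                   # HighN - Control

est <- sum(cc * tau_hat)               # c' tau_hat = 10
se  <- sqrt(drop(t(cc) %*% V %*% cc))  # sqrt(c' V c) = 2
W   <- (est / se)^2                    # Wald statistic = 25
pchisq(W, df = 1, lower.tail = FALSE)  # asymptotic chi-squared(1) p-value
```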
Pairwise contrasts
For k treatment levels the \binom{k}{2} pairwise differences are a convenient special case. For levels i and j, the contrast vector \mathbf{c}_{ij} has +1 in position i, -1 in position j, and zeros elsewhere, so
\hat{c}_{ij} = \hat{\tau}_i - \hat{\tau}_j, \qquad \operatorname{Var}(\hat{c}_{ij}) = [\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{ii} - 2[\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{ij} + [\mathbf{V}_{\hat{\boldsymbol{\tau}}}]_{jj}.
Setting comp = "pairwise" in waldTest() generates all \binom{k}{2} contrasts automatically.
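The collapse of the quadratic form to the three-term expression above can be checked numerically; the symmetric matrix here is an arbitrary toy example, not a model output:

```r
# Toy symmetric PEV matrix
V <- matrix(c(4.0, 1.0, 0.5, 0.0,
              1.0, 3.0, 0.8, 0.2,
              0.5, 0.8, 5.0, 1.0,
              0.0, 0.2, 1.0, 2.0), nrow = 4)
cc <- c(1, 0, -1, 0)                        # tau_1 - tau_3
quad   <- drop(t(cc) %*% V %*% cc)          # c' V c
expand <- V[1, 1] - 2 * V[1, 3] + V[3, 3]   # expanded form: 4 - 2*0.5 + 5 = 8
all.equal(quad, expand)
```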
Custom contrast matrices
Multiple contrasts can be tested in a single call by supplying a contrast matrix \mathbf{C} (rows = contrasts, columns = levels). Each row \mathbf{c}_r^\top is tested independently:
W_r = \frac{\bigl(\mathbf{c}_r^\top\hat{\boldsymbol{\tau}}\bigr)^2} {\mathbf{c}_r^\top \mathbf{V}_{\hat{\boldsymbol{\tau}}} \mathbf{c}_r}, \quad r = 1, \ldots, m.
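Row-wise testing reduces to applying the single-contrast formula to each row of \mathbf{C}; a base-R sketch with toy numbers:

```r
tau_hat <- c(10, 12, 15, 20)   # toy predicted values
V <- diag(4)                   # toy PEV matrix
C <- rbind(linear    = c(-3, -1, 1, 3),
           quadratic = c( 1, -1, -1, 1))
# One Wald statistic per row: (c_r' tau_hat)^2 / (c_r' V c_r)
W_r <- apply(C, 1, function(cr)
  (sum(cr * tau_hat))^2 / drop(t(cr) %*% V %*% cr))
W_r
```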
Joint zero tests
A joint zero test simultaneously tests whether a subset of q effects are all zero. Define the selection matrix \mathbf{Z} (q \times n) with a single 1 per row selecting the coefficients of interest. The null hypothesis is
H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0}.
The Wald statistic is
W = \hat{\boldsymbol{\tau}}^\top \mathbf{Z}^\top \bigl(\mathbf{Z}\,\mathbf{V}_{\hat{\boldsymbol{\tau}}}\,\mathbf{Z}^\top\bigr)^{-1} \mathbf{Z}\hat{\boldsymbol{\tau}} \;\overset{H_0}{\sim}\; \chi^2_q,
or F = W / q \sim F_{q,\nu} for the F-test form.
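The joint statistic is a single matrix expression in base R; again the values are a toy illustration:

```r
tau_hat <- c(0.2, 3, 5, 8)     # toy effects; are effects 2-4 jointly zero?
V  <- diag(0.5, 4)             # toy PEV matrix
Z  <- diag(4)[2:4, ]           # selection matrix, q = 3 rows
zt <- drop(Z %*% tau_hat)
W  <- drop(t(zt) %*% solve(Z %*% V %*% t(Z)) %*% zt)  # (9 + 25 + 64) / 0.5 = 196
pchisq(W, df = nrow(Z), lower.tail = FALSE)
```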
P-value adjustment
When many contrasts are tested, the chance of at least one false positive inflates. waldTest() therefore passes all raw p-values within each group to p.adjust(), which can control either the family-wise error rate (FWER) or the false discovery rate (FDR). The key options are:
| Method | Controls | Best used when |
|---|---|---|
| "none" | — | Exploratory or single contrast |
| "bonferroni" | FWER (conservative) | Few contrasts, strict control needed |
| "holm" | FWER (less conservative) | Few to moderate contrasts |
| "fdr" / "BH" | FDR | Many contrasts, some true effects expected |
| "BY" | FDR (arbitrary dependence) | Dependent test statistics |
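All of these methods come from base R's p.adjust(); a quick illustration with made-up raw p-values (n = 6, matching six pairwise contrasts):

```r
p_raw <- c(0.40, 0.002, 0.0001, 0.03, 0.0005, 0.046)  # made-up raw p-values
p.adjust(p_raw, method = "bonferroni")  # multiply by 6, cap at 1
p.adjust(p_raw, method = "BH")          # step-up FDR adjustment
```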
Setup
Simulated data and model
We use simTrialData() to generate a balanced multi-environment trial with 20 varieties, 4 sites, and 4 nitrogen treatments (Control, LowN, MidN, HighN). The treatment effects are deliberately set to span a range — a small step from Control to LowN, a moderate step to MidN, and a large step to HighN — so that the resulting contrasts include a mix of non-significant, borderline, and clearly significant comparisons. A Treatment * Site fixed model allows treatment effects to vary across sites, which is reflected in the by-group testing section.
asreml.options(trace = FALSE)
model <- asreml(
yield ~ Treatment * Site,
random = ~ Variety + at(Site):Rep,
residual = ~ dsum(~ ar1(Row):ar1(Column) | Site),
data = dat,
maxit = 30L
)
model <- update(model)

Treatment BLUEs are obtained by predicting the Treatment term with a full prediction error variance–covariance matrix:
pred <- predict(model, classify = "Treatment", vcov = TRUE)
pred$pvals[, c("Treatment", "predicted.value", "std.error")]
  Treatment predicted.value std.error
1 Control 4305.265 114.1842
2 LowN 4375.257 114.1959
3 MidN 4510.323 114.1743
4 HighN 4596.544 114.1991
For the by-group examples (Example 6) we also obtain predictions for every Treatment × Site combination:
pred3 <- predict(model, classify = "Treatment:Site", vcov = TRUE)

Example 1: Pairwise contrasts
The most common use case — test all \binom{4}{2} = 6 pairwise differences among the nitrogen treatments. Setting comp = "pairwise" generates the full contrast matrix automatically.
res1 <- waldTest(
pred,
cc = list(
list(coef = treatments, type = "con", comp = "pairwise")
)
)
res1$Contrasts
        Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1 Control vs LowN -69.99234 43.35122 2.606744 1 0.106410
2 Control vs MidN -205.05808 43.08954 22.646964 1 0.000002
3 Control vs HighN -291.27898 43.04651 45.787021 1 0.000000
4 LowN vs MidN -135.06574 43.03929 9.848276 1 0.001700
5 LowN vs HighN -221.28664 43.25291 26.174578 1 0.000000
6 MidN vs HighN -86.22090 43.23593 3.976816 1 0.046131
Comparison reads as "A vs B" where A carries the positive coefficient (i.e. \hat{\tau}_A - \hat{\tau}_B). Wald.Statistic = \hat{c}^2 / \widehat{\operatorname{Var}}(\hat{c}); df = 1 for all single-contrast rows; P.Value is from \chi^2_1.
The HighN vs Control contrast should be the most significant, reflecting the largest mean difference in the simulated data.
Example 2: Manual contrast vector
A manual contrast lets you specify exactly which comparison to make and how to label it. Here we test HighN minus Control using \mathbf{c} = (-1,\; 0,\; 0,\; 1)^\top — one element per level in coef order.
res2 <- waldTest(
pred,
cc = list(
list(
coef = treatments,
type = "con",
comp = c(-1, 0, 0, 1), # HighN - Control
group = list(left = "HighN", right = "Control")
)
)
)
res2$Contrasts
        Comparison Estimate Std.Error Wald.Statistic df P.Value
1 HighN vs Control 291.279 43.04651 45.78702 1 0
The group argument overrides the auto-generated label so the output reads "HighN vs Control".
Example 3: Custom contrast matrix
A contrast matrix \mathbf{C} (rows = contrasts, columns = levels) tests multiple specific comparisons in one call. Here we fit three polynomial contrasts across the four equally-spaced nitrogen levels.
# Orthogonal polynomial contrasts for 4 equally-spaced levels
C_poly <- matrix(
c(-3, -1, 1, 3, # linear trend
1, -1, -1, 1, # quadratic trend
-1, 3, -3, 1), # cubic trend
nrow = 3, byrow = TRUE
)
res3 <- waldTest(
pred,
cc = list(
list(
coef = treatments,
type = "con",
comp = C_poly,
group = list(left = c("Linear", "Quadratic", "Cubic"),
right = c("trend", "trend", "trend"))
)
)
)
res3$Contrasts
         Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1 Linear vs trend 1008.90268 135.88947 55.122249 1 0.000000
2 Quadratic vs trend 16.22856 61.40695 0.069843 1 0.791565
3 Cubic vs trend -113.91825 136.33737 0.698163 1 0.403402
For a monotone nitrogen response we expect a strong linear trend and negligible quadratic or cubic components.
Example 4: Joint zero test
A joint zero test asks: are these effects simultaneously equal to zero? This uses type = "zero" and tests the q-dimensional null H_0: \mathbf{Z}\boldsymbol{\tau} = \mathbf{0}.
Here we test whether the three non-Control treatment effects are jointly zero — i.e. does nitrogen application have any effect at all?
res4 <- waldTest(
pred,
cc = list(
list(
coef = c("LowN", "MidN", "HighN"),
type = "zero",
group = "Nitrogen joint zero"
)
)
)
res4$Zero
                 Test Wald.Statistic df P.Value
1 Nitrogen joint zero 1653.257 3 0
df = 3 because three effects are tested simultaneously. The statistic follows \chi^2_3 under H_0.
Example 5: Combining contrast and zero tests
Both test types can be combined in a single cc list. Results are returned in separate $Contrasts and $Zero slots.
res5 <- waldTest(
pred,
cc = list(
list(coef = treatments,
type = "con", comp = "pairwise"),
list(coef = c("LowN", "MidN", "HighN"),
type = "zero",
group = "Nitrogen joint zero")
)
)
cat("--- Contrasts ---\n")
--- Contrasts ---
res5$Contrasts
        Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1 Control vs LowN -69.99234 43.35122 2.606744 1 0.106410
2 Control vs MidN -205.05808 43.08954 22.646964 1 0.000002
3 Control vs HighN -291.27898 43.04651 45.787021 1 0.000000
4 LowN vs MidN -135.06574 43.03929 9.848276 1 0.001700
5 LowN vs HighN -221.28664 43.25291 26.174578 1 0.000000
6 MidN vs HighN -86.22090 43.23593 3.976816 1 0.046131
cat("\n--- Zero test ---\n")
--- Zero test ---
res5$Zero
                 Test Wald.Statistic df P.Value
1 Nitrogen joint zero 1653.257 3 0
Example 6: By-group testing
When predictions span multiple environments (e.g. a Treatment:Site classify), the by argument runs the same cc specification independently within each group. The grouping column is prepended to both output tables.
res6 <- waldTest(
pred3,
cc = list(
list(coef = treatments, type = "con", comp = "pairwise")
),
by = "Site"
)
res6$Contrasts
   Site        Comparison   Estimate Std.Error Wald.Statistic df  P.Value
1 Env1 Control vs LowN -71.43071 84.22855 0.719203 1 0.396406
2 Env1 Control vs MidN -276.82203 82.73088 11.196080 1 0.000820
3 Env1 Control vs HighN -339.68511 83.69062 16.474015 1 0.000049
4 Env1 LowN vs MidN -205.39132 83.67663 6.024983 1 0.014105
5 Env1 LowN vs HighN -268.25440 84.64289 10.044140 1 0.001528
6 Env1 MidN vs HighN -62.86308 83.97848 0.560345 1 0.454121
7 Env2 Control vs LowN -113.76305 93.92238 1.467115 1 0.225801
8 Env2 Control vs MidN -222.60831 93.45713 5.673588 1 0.017222
9 Env2 Control vs HighN -178.81930 92.46443 3.740067 1 0.053122
10 Env2 LowN vs MidN -108.84526 92.96257 1.370891 1 0.241659
11 Env2 LowN vs HighN -65.05625 92.92325 0.490150 1 0.483860
12 Env2 MidN vs HighN 43.78901 93.38093 0.219894 1 0.639121
13 Env3 Control vs LowN 61.92793 84.61035 0.535706 1 0.464218
14 Env3 Control vs MidN -120.92694 84.49191 2.048405 1 0.152366
15 Env3 Control vs HighN -372.00852 84.82992 19.231255 1 0.000012
16 Env3 LowN vs MidN -182.85487 84.23246 4.712529 1 0.029944
17 Env3 LowN vs HighN -433.93645 84.53462 26.350150 1 0.000000
18 Env3 MidN vs HighN -251.08159 84.53452 8.821888 1 0.002976
19 Env4 Control vs LowN -156.70353 83.61919 3.511928 1 0.060929
20 Env4 Control vs MidN -199.87506 83.58736 5.717892 1 0.016793
21 Env4 Control vs HighN -274.60298 83.04045 10.935311 1 0.000943
22 Env4 LowN vs MidN -43.17153 83.04810 0.270231 1 0.603177
23 Env4 LowN vs HighN -117.89945 83.59917 1.988931 1 0.158453
24 Env4 MidN vs HighN -74.72792 83.61004 0.798820 1 0.371446
The Site column identifies which group each contrast belongs to. The same cc specification is applied separately within each Site, so the coef labels refer to the levels within each group.
Multi-factor by
When predictions cross two or more factors (e.g. Treatment:Site:Year), pass a character vector to by. The levels are pasted into a composite key and the output column is named accordingly:
# Illustrative — requires a Treatment:Site:Year classify
# pred_sy <- predict(model, classify = "Treatment:Site:Year", vcov = TRUE)
res_sy <- waldTest(
pred_sy,
cc = list(list(coef = treatments, type = "con", comp = "pairwise")),
by = c("Site", "Year")
)
# Output column named "Site:Year"

Example 7: F-tests
When an error degrees of freedom \nu is available (e.g. model$nedf from ASReml-R), replacing the asymptotic \chi^2 with an F_{1,\nu} distribution gives better-calibrated p-values in small samples. Set test = "F" and supply df_error.
res7 <- waldTest(
pred,
cc = list(list(coef = treatments, type = "con", comp = "pairwise")),
test = "F",
df_error = model$nedf
)
res7$Contrasts
        Comparison   Estimate Std.Error F.Statistic df  P.Value
1 Control vs LowN -69.99234 43.35122 2.606744 1 0.106744
2 Control vs MidN -205.05808 43.08954 22.646964 1 0.000002
3 Control vs HighN -291.27898 43.04651 45.787021 1 0.000000
4 LowN vs MidN -135.06574 43.03929 9.848276 1 0.001753
5 LowN vs HighN -221.28664 43.25291 26.174578 1 0.000000
6 MidN vs HighN -86.22090 43.23593 3.976816 1 0.046418
The column is now F.Statistic (= Wald statistic for single contrasts). P-values are from F_{1,\nu} (where \nu = model$nedf) rather than \chi^2_1 — slightly more conservative in small samples.
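The calibration difference is visible directly from the two tail probabilities; the ν used here is an arbitrary illustrative value, not the model$nedf of the fitted model above:

```r
W  <- 2.606744   # Wald statistic for Control vs LowN (from Example 1)
nu <- 180        # illustrative error df (assumption, not model$nedf)
pchisq(W, df = 1, lower.tail = FALSE)          # asymptotic chi-squared p-value
pf(W, df1 = 1, df2 = nu, lower.tail = FALSE)   # F-based p-value, slightly larger
```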
Example 8: P-value adjustment
With \binom{4}{2} = 6 pairwise comparisons the chance of at least one false positive is inflated. The adjust argument applies p.adjust() across all contrasts within each group.
# Bonferroni
res_bon <- waldTest(pred,
cc = list(list(coef = treatments, type = "con",
comp = "pairwise")),
adjust = "bonferroni")
# FDR (Benjamini–Hochberg)
res_fdr <- waldTest(pred,
cc = list(list(coef = treatments, type = "con",
comp = "pairwise")),
adjust = "fdr")
cat("--- Bonferroni adjusted ---\n")
--- Bonferroni adjusted ---
res_bon$Contrasts[, c("Comparison", "P.Value")]
          Comparison  P.Value
1 Control vs LowN 0.638460
2 Control vs MidN 0.000012
3 Control vs HighN 0.000000
4 LowN vs MidN 0.010199
5 LowN vs HighN 0.000002
6 MidN vs HighN 0.276784
cat("\n--- FDR (BH) adjusted ---\n")
--- FDR (BH) adjusted ---
res_fdr$Contrasts[, c("Comparison", "P.Value")]
          Comparison  P.Value
1 Control vs LowN 0.106410
2 Control vs MidN 0.000004
3 Control vs HighN 0.000000
4 LowN vs MidN 0.002550
5 LowN vs HighN 0.000001
6 MidN vs HighN 0.055357
Bonferroni multiplies each p-value by the number of tests (capped at 1); FDR is less conservative and generally preferred when many contrasts are tested and some true effects are expected.
Forest plots with plot_waldTest()
plot_waldTest() turns a waldTest() result directly into a forest plot. Each contrast is shown as a filled circle with horizontal confidence interval bars. Points are coloured by -\log_{10}(p): non-significant results appear in grey; significant results follow a warm gradient from gold through orange to dark red as evidence strengthens. The raw p-value is printed beside each row.
Basic forest plot
p1 <- plot_waldTest(res1)
print(p1)

The vertical dashed line at x = 0 is the reference for no difference. The colour bar maps -\log_{10}(p) — results in the warm-gradient zone are significant at alpha = 0.05.
By-group with faceting (facet = TRUE)
When waldTest() was called with by, facet = TRUE (the default) produces one panel per group with free y-scales, making it easy to compare effect sizes across environments.
p_facet <- plot_waldTest(res6, facet = TRUE)
print(p_facet)

By-group on one panel (facet = FALSE)
facet = FALSE places all groups on a single panel with dotted horizontal separator lines between groups. Useful for compact side-by-side comparison when the number of contrasts per group is small.
p_single <- plot_waldTest(res6, facet = FALSE)
print(p_single)

Adjusting the CI level and significance threshold
ci_level controls the width of the CI arms; alpha shifts the colour-scale break between grey (non-significant) and the warm gradient.
# 99% CI arms; stricter significance threshold
p_99 <- plot_waldTest(res1, ci_level = 0.99, alpha = 0.01)
print(p_99)

Extracting the plot data
return_data = TRUE returns the tidy data frame used internally to build the plot, giving full control for bespoke customisation.
df <- plot_waldTest(res1, return_data = TRUE)
df[, c("label", "Estimate", "CI_lower", "CI_upper", "P.Value",
       "neg_log10_p", "significant", "p_label")]
               label   Estimate   CI_lower   CI_upper  P.Value neg_log10_p
1 Control vs LowN -69.99234 -154.9592 14.97449 0.106410 0.9730176
2 Control vs MidN -205.05808 -289.5120 -120.60413 0.000002 5.6989700
3 Control vs HighN -291.27898 -375.6486 -206.90937 0.000000 15.6535598
4 LowN vs MidN -135.06574 -219.4212 -50.71028 0.001700 2.7695511
5 LowN vs HighN -221.28664 -306.0608 -136.51249 0.000000 15.6535598
6 MidN vs HighN -86.22090 -170.9618 -1.48004 0.046131 1.3360071
significant p_label
1 FALSE p=0.106
2 TRUE p<0.001
3 TRUE p<0.001
4 TRUE p=0.002
5 TRUE p<0.001
6 TRUE p=0.046
The ggplot object can be extended with + in the usual way:
p_custom <- plot_waldTest(res1) +
ggplot2::ggtitle("Nitrogen treatment contrasts",
subtitle = "Wald tests (asymptotic chi-squared)")
print(p_custom)

Summary
| Task | Specification |
|---|---|
| All pairwise differences | list(coef = levs, type = "con", comp = "pairwise") |
| Single manual contrast | list(coef = levs, type = "con", comp = c(...)) |
| Multiple contrasts | list(coef = levs, type = "con", comp = matrix(...)) |
| Joint zero test | list(coef = levs, type = "zero", group = "label") |
| Both types | Combine in the same cc list |
| By-group | Add by = "ColumnName" to waldTest() |
| Multi-factor by | Add by = c("Site", "Year") |
| F-test | Add test = "F", df_error = model$nedf |
| Adjust p-values | Add adjust = "fdr" (or "bonferroni", "holm", "BY") |
| Forest plot | plot_waldTest(res) |
| Export plot data | plot_waldTest(res, return_data = TRUE) |
References
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3), 426–482.
Searle, S.R., Casella, G. & McCulloch, C.E. (1992). Variance Components. Wiley.
Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3), 983–997.
Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57(1), 289–300.