BLUP Accuracy in Multi-Environment Trials

accuracy() and plot_accuracy()

Author

Julian Taylor

Published

May 8, 2026


Overview

accuracy() computes model-based BLUP accuracy for each environment (or group) in a multi-environment trial (MET) fitted with ASReml-R V4. Two complementary metrics are available:

  • Mrode accuracy — per-variety accuracy derived from the prediction error variance (PEV) and the genetic variance.
  • Generalised H² — a reliability analogue based on the mean squared standard error of difference (SED) between BLUPs.

Both metrics are bounded between 0 and 1, with values closer to 1 indicating better prediction. The function handles all interaction combinations of major ASReml-R V4 random structures for the Site term: fa(), diag(), corgh(), corh(), us(), as well as structures for the variety term id(),ide(),vm() (see below for more info).

plot_accuracy() visualises these results with six complementary plot types covering single-model summaries and head-to-head model comparisons.


Mathematical Framework

Genetic variance and BLUP prediction error

Let \hat{u}_{ij} be the BLUP for variety i in environment j, with prediction error variance

\operatorname{PEV}_{ij} = \operatorname{Var}(\hat{u}_{ij} - u_{ij}).

The genetic (co)variance for environment j is denoted G_{jj} — the j-th diagonal of the \mathbf{G} (or the corresponding variance component for simpler structures such as diag()).


Mrode accuracy

Following Mrode (2014), the accuracy for variety i in environment j is

r_{ij} = \sqrt{1 - \frac{\operatorname{PEV}_{ij}}{G_{jj}}},

which is the square root of the reliability — the proportion of true genetic variance that is recovered by the BLUP. The mean accuracy for environment j is

\bar{r}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} r_{ij},

where n_j is the number of varieties with observed data in environment j.

accuracy() invokes the argument only = in predict.asreml() to ensure the correct \operatorname{PEV}_{ij} are returned for the target random term supplied.


Generalised H²

Cullis et al. (2006) proposed a heritability analogue based on the mean pairwise squared standard error of difference (SED):

H^2_j = 1 - \frac{\overline{\text{SED}}^2_j}{2\,G_{jj}},

where \overline{\text{SED}}^2_j is the mean of the upper-triangular elements of the SED matrix for environment j. Unlike Mrode accuracy, H^2_j captures the precision of variety comparisons rather than individual predictions, and is equivalent to heritability on a variety-mean basis.


ASReml-R term specifications

The term argument of accuracy() takes the full interaction string from the model’s random = formula (auto-detected when NULL) that is used to specify the Site \times Variety or Site \times Treatment \times Variety interaction.

Supported ASReml-R covariance/correlation structures are:

Description Structure functions
Site relationships fa(), us(), corgh(), corh(), diag()
Variety relationships id(), ide(), vm()

For ASReml-R multi-site models that used corgh() and corh(), accuracy() will extract the estimated matrix and convert it to a fully unstructured variance matrix \mathbf{G} before accuracy or generalised heritability calculations are undertaken. For single-environment models users can omit the site components and supply the variety term alone, e.g. "id(Variety)" or "vm(Variety, K)".

Example term strings

# Multi-environment
accuracy(model, term = "fa(Site, 2):id(Variety)")
accuracy(model, term = "fa(Site, 1):vm(Variety, K)")
accuracy(model, term = "diag(Site):id(Variety)")
accuracy(model, term = "corgh(Site):id(Variety)")
accuracy(model, term = "us(Site):vm(Variety, K)")
# Single-environment
accuracy(model, term = "id(Variety)")
accuracy(model, term = "vm(Variety, K)")

Setup

Simulated data

Requires: ASReml-R V4. The vignette is pre-rendered so no licence is needed to read it; run the code chunks only if you have the package installed.

We use simTrialData() to generate a MET with 50 varieties across 8 environments, 2 replicates, and an "unbalanced" incidence structure. The auto-generated genetic covariance matrix \mathbf{G} ensures environments share genuine genetic covariance (pairwise correlations in the default range [0.20, 0.90]), and the unbalanced incidence ensures that varieties appear in different subsets of environments and this produces meaningful variety-to-variety variation in prediction error variance that makes the per-variety accuracy plots informative.

out <- simTrialData(
  nvar      = 50L,
  nsite     = 8L,
  nrep      = 2L,
  incidence = "unbalanced",
  seed      = 11L,
  verbose   = FALSE
)
dat         <- out$data
dat$Row     <- factor(dat$Row)
dat$Column  <- factor(dat$Column)
dat$Site    <- factor(dat$Site)
dat$Variety <- factor(dat$Variety)

# Connectivity summary
cat("Varieties per site: mean =", round(mean(colSums(out$params$incidence)), 1),
    " range [", min(colSums(out$params$incidence)), ",",
    max(colSums(out$params$incidence)), "]\n")
Varieties per site: mean = 32.2  range [ 28 , 37 ]
cat("Sites per variety:  mean =", round(mean(rowSums(out$params$incidence)), 1),
    " range [", min(rowSums(out$params$incidence)), ",",
    max(rowSums(out$params$incidence)), "]\n")
Sites per variety:  mean = 5.2  range [ 4 , 6 ]

A quick look at the data structure:

head(dat[, c("Site", "Variety", "Rep", "Row", "Column", "yield")])
  Site Variety Rep Row Column  yield
1 Env1   Var24   1   1      1 4261.8
2 Env1   Var20   1   1      2 4448.4
3 Env1   Var38   1   1      3 4371.9
4 Env1   Var01   1   1      4 4627.5
5 Env1   Var23   1   1      5 5393.0
6 Env1   Var03   1   1      6 5204.1

Models

We fit three random structures for the Site × Variety genetic term, in increasing order of complexity:

Model Variety random structure Genetic parameters
Diag diag(Site):id(Variety) 6 (one \sigma^2_j per site)
FA(1) fa(Site, 1):id(Variety) 12 (6 loadings + 6 specific variances)
FA(2) fa(Site, 2):id(Variety) 18 (12 loadings + 6 specific variances)

All three models include at(Site):Rep for replicate effects and a common homogeneous residual. The diag model assumes no genetic correlation between sites — varieties are re-ranked completely independently across environments. The FA models borrow information across environments: FA(1) captures one dominant axis of Site \times Variety interaction; FA(2) adds a second, allowing richer rank-change patterns of the varieties across sites.

m_diag <- asreml(
  yield    ~ Site,
  random   = ~ diag(Site):id(Variety) + at(Site):Rep,
  data     = dat,
  maxit    = 30L
)
m_diag <- update(m_diag)
m_fa1 <- asreml(
  yield    ~ Site,
  random   = ~ fa(Site, 1):id(Variety) + at(Site):Rep,
  data     = dat,
  maxit    = 30L
)
m_fa1 <- update(m_fa1)
m_fa2 <- asreml(
  yield    ~ Site,
  random   = ~ fa(Site, 2):id(Variety) + at(Site):Rep,
  data     = dat,
  maxit    = 30L
)
m_fa2 <- update(m_fa2)

A quick comparison of log-likelihoods confirms the FA models provide a better fit to the data. AIC is computed as -2 \ell + 2p where p is the number of variance parameters:

.aic <- function(m) -2 * m$loglik + 2 * nrow(summary(m)$varcomp)
data.frame(
  Model  = c("Diag", "FA(1)", "FA(2)"),
  LogLik = round(c(m_diag$loglik, m_fa1$loglik, m_fa2$loglik), 2),
  AIC    = round(c(.aic(m_diag),  .aic(m_fa1),  .aic(m_fa2)),  2)
)
  Model   LogLik     AIC
1  Diag -3324.67 6683.34
2 FA(1) -3307.76 6665.53
3 FA(2) -3303.50 6673.00

Computing accuracy with accuracy()

Group-level summaries

The default call returns both Mrode accuracy and Cullis H² for every site. Results are at the group level — one row per environment.

acc_diag <- accuracy(m_diag, term = "diag(Site):id(Variety)",
                     metric = c("accuracy", "gen.H2"))
acc_diag
  group n_vars     G_jj  mean_acc sd_acc    gen.H2
1  Env1     31 42455.56 0.6421706      0 0.4261292
2  Env2     28 35813.71 0.6094134      0 0.3851397
3  Env3     30 94677.26 0.7763371      0 0.6234820
4  Env4     35 86156.99 0.7641505      0 0.6011002
5  Env5     29 71125.42 0.7316075      0 0.5543655
6  Env6     37 45692.91 0.6574073      0 0.4441894
7  Env7     33 57384.15 0.6969455      0 0.5009121
8  Env8     35 79318.10 0.7513391      0 0.5811137
acc_fa1 <- accuracy(m_fa1, term = "fa(Site, 1):id(Variety)",
                    metric = c("accuracy", "gen.H2"))
acc_fa1
  group n_vars     G_jj  mean_acc      sd_acc    gen.H2
1  Env1     31 43627.74 0.7513244 0.010951316 0.5788061
2  Env2     28 35484.86 0.7118326 0.014463595 0.5204685
3  Env3     30 94638.84 0.8602221 0.013229917 0.7575082
4  Env4     35 79527.63 0.8130065 0.014177024 0.6768774
5  Env5     29 71800.83 0.7979008 0.010769787 0.6537135
6  Env6     37 44947.67 0.7183730 0.009046940 0.5286547
7  Env7     33 57023.52 0.7082196 0.001432365 0.5165038
8  Env8     35 77488.21 0.8144103 0.013056715 0.6791303
acc_fa2 <- accuracy(m_fa2, term = "fa(Site, 2):id(Variety)",
                    metric = c("accuracy", "gen.H2"))
acc_fa2
  group n_vars     G_jj  mean_acc      sd_acc    gen.H2
1  Env1     31 40356.25 0.7644036 0.018761968 0.5989723
2  Env2     28 41547.91 0.7554403 0.040908976 0.5875338
3  Env3     30 92519.54 0.8612144 0.019033489 0.7597134
4  Env4     35 78624.75 0.8286386 0.018439622 0.7028509
5  Env5     29 78774.04 0.8327142 0.021363641 0.7109168
6  Env6     37 46589.57 0.7215698 0.006399877 0.5334692
7  Env7     33 57249.83 0.7336345 0.011556271 0.5532187
8  Env8     35 84699.92 0.8447034 0.023837796 0.7307984

The G_jj column reports the environment-specific genetic variance used as the denominator. mean_acc is \bar{r}_j and sd_acc captures the spread of per-variety accuracies within the environment. gen.H2 is H^2_j.

Requesting a single metric

Pass a single string to metric when you only need one measure:

# Mrode accuracy only
accuracy(m_fa2, term = "fa(Site, 2):id(Variety)", metric = "accuracy")
  group n_vars     G_jj  mean_acc      sd_acc
1  Env1     31 40356.25 0.7644036 0.018761968
2  Env2     28 41547.91 0.7554403 0.040908976
3  Env3     30 92519.54 0.8612144 0.019033489
4  Env4     35 78624.75 0.8286386 0.018439622
5  Env5     29 78774.04 0.8327142 0.021363641
6  Env6     37 46589.57 0.7215698 0.006399877
7  Env7     33 57249.83 0.7336345 0.011556271
8  Env8     35 84699.92 0.8447034 0.023837796
# Cullis H² (gen.H2) only
accuracy(m_fa2, term = "fa(Site, 2):id(Variety)", metric = "gen.H2")
  group n_vars     G_jj    gen.H2
1  Env1     31 40356.25 0.5989723
2  Env2     28 41547.91 0.5875338
3  Env3     30 92519.54 0.7597134
4  Env4     35 78624.75 0.7028509
5  Env5     29 78774.04 0.7109168
6  Env6     37 46589.57 0.5334692
7  Env7     33 57249.83 0.5532187
8  Env8     35 84699.92 0.7307984

Per-variety accuracies

Set by_variety = TRUE to return one row per variety × environment. All combinations with a finite prediction standard error are returned, including variety × site combinations that were not directly observed in the data (for FA models the G-matrix supports prediction at unobserved combinations). The present column indicates which combinations were observed in the data (TRUE) and which are unobserved (FALSE).

acc_fa2_bv    <- accuracy(m_fa2,  term = "fa(Site, 2):id(Variety)",
                           metric = c("accuracy", "gen.H2"),
                           by_variety = TRUE)
acc_diag_bv   <- accuracy(m_diag, term = "diag(Site):id(Variety)",
                           metric = c("accuracy", "gen.H2"),
                           by_variety = TRUE)

head(acc_fa2_bv)
  group variety     G_jj      pev present  accuracy    gen.H2
1  Env1   Var01 40356.25 17361.30    TRUE 0.7548503 0.5336221
2  Env1   Var02 40356.25 17040.40    TRUE 0.7600992 0.5336221
3  Env1   Var03 40356.25 15624.40    TRUE 0.7828397 0.5336221
4  Env1   Var04 40356.25 20717.32   FALSE 0.6975953 0.5336221
5  Env1   Var05 40356.25 20631.04   FALSE 0.6991260 0.5336221
6  Env1   Var06 40356.25 15629.92    TRUE 0.7827525 0.5336221

The present column flags which variety × site combinations were directly observed in the trial data. For an unbalanced trial, some combinations will be FALSE where varieties are not grown at that site but still receive a BLUP prediction. in most instances they will have a a higher (worse) PEV.

# How many combinations are observed vs unobserved?
table(Observed = acc_fa2_bv$present)
Observed
FALSE  TRUE 
  142   258 

Visualising with plot_accuracy()

Lollipop — per-site accuracy summary

The lollipop plot is the go-to summary for group-level results. Each site appears as a row, with the lollipop head at the mean accuracy. Horizontal error bars span ±1 standard deviation of the per-variety accuracies (Mrode metric only). When both metrics are requested the plot facets into two panels.

plot_accuracy(acc_fa2, type = "lollipop",
              metric = c("accuracy", "gen.H2"))

A single-metric call gives a cleaner, unfaceted plot:

plot_accuracy(acc_fa2, type = "lollipop", metric = "accuracy")

Note that Cullis H² is systematically lower than Mrode accuracy — H^2 measures comparison precision while r measures individual prediction reliability, and the two capture subtly different aspects of model quality.


Lollipop — observed vs unobserved accuracy

Passing by_variety = TRUE data to type = "lollipop" automatically triggers a dumbbell layout when both observed and unobserved combinations are present. The blue dot shows the mean accuracy across varieties directly observed at each site; the orange dot shows the mean accuracy for unobserved varieties. The dashed segment connecting the two makes the size of the accuracy gap visible at a glance.

plot_accuracy(acc_fa2_bv, type = "lollipop", metric = "accuracy")

Sites where the two dots sit close together have strong inter-site genetic correlations — the FA G-matrix can borrow enough information from other sites to predict unobserved varieties almost as well as observed ones. Sites with a wider gap have weaker covariance structure and provide less support for unobserved predictions.


Violin — per-variety accuracy distribution

The violin plot requires by_variety = TRUE data and displays the Mrode Accuracy metric only. Cullis H² is a group-level constant and is suppressed automatically — use type = "lollipop" to visualise it. The width of each violin reflects the spread of per-variety accuracies within each environment. Individual points are coloured by present status: blue for varieties directly observed at that site, orange for varieties that were unobserved.

plot_accuracy(acc_fa2_bv, type = "violin", metric = "accuracy")

The orange points cluster at the lower end of each violin, confirming that unobserved varieties have higher PEV and lower accuracy. “Core” varieties observed at most sites appear as blue points at the high-accuracy end; rarely-observed varieties as blue points lower down; and the orange unobserved predictions occupy the tail.


Heatmap — variety × site accuracy grid

The heatmap visualises per-variety accuracy across all environments simultaneously. The full variety × site grid is displayed — observed combinations (present = TRUE) and unobserved combinations (present = FALSE) alike. Varieties are sorted from highest mean accuracy (top) to lowest (bottom); environments are ordered by increasing mean accuracy. The viridis colour scale maps accuracy onto a perceptually uniform gradient. A small grey circle marks each tile where the variety was not observed at that site (present = FALSE); the underlying colour still shows the model-based accuracy for that combination.

plot_accuracy(acc_fa2_bv, type = "heatmap", metric = "accuracy")

Tiles with a grey circle are systematically darker than their circled-free neighbours in the same column, reflecting the higher PEV of unobserved predictions. A column that is uniformly darker than its neighbours indicates an environment where all varieties are predicted less reliably, regardless of observed status.


Accuracy gap — unobserved vs observed

Passing by_variety = TRUE data to type = "diff" without a second model computes the accuracy gap per site: the signed difference between the mean accuracy of unobserved varieties and the mean accuracy of observed varieties (Unobserved − Observed). All sites are expected to show a negative gap — unobserved varieties have higher PEV, so their accuracy is lower.

plot_accuracy(acc_fa2_bv, type = "diff", metric = "accuracy")

Sites with the smallest gap (least negative) have the strongest inter-site genetic correlations. The connecticity in the FA modelling approach can borrow enough information from neighbouring sites to support near-observed-quality predictions. Sites with a larger gap are more informationally isolated: varieties not grown there receive much less support from the rest of the trial.


Comparing models using the res2 argument

The three comparison plot types take a second accuracy object via res2 and directly show the change in accuracy between models. This is where the story of moving from diag to FA models becomes clearly visible.

Dumbbell — paired group-level comparison

The dumbbell plot places two dots per site connected by a horizontal segment. The segment colour encodes the direction of change: green when the second model improves accuracy, red when it declines.

Diag → FA(1)

plot_accuracy(acc_diag, acc_fa1,
              type   = "dumbbell",
              metric = "accuracy",
              label1 = "Diag",
              label2 = "FA(1)")

FA(1) → FA(2)

plot_accuracy(acc_fa1, acc_fa2,
              type   = "dumbbell",
              metric = "accuracy",
              label1 = "FA(1)",
              label2 = "FA(2)")

With both metrics faceted:

plot_accuracy(acc_diag, acc_fa2,
              type   = "dumbbell",
              metric = c("accuracy", "gen.H2"),
              label1 = "Diag",
              label2 = "FA(2)")

The improvement in FA(2) over diag tends to be larger for environments with stronger GEI — those are the environments where borrowing information via the factor structure is most beneficial.


Scatter — per-variety comparison

The scatter plot places each variety × environment combination as a point, with model1 accuracy on the x-axis and model2 accuracy on the y-axis. Points above the dashed diagonal (x = y) represent variety × environment combinations where model2 is more accurate. Points are coloured by environment.

plot_accuracy(acc_diag_bv, acc_fa2_bv,
              type   = "scatter",
              metric = "accuracy",
              label1 = "Diag",
              label2 = "FA(2)")

The scatter plot reveals whether the accuracy gain is uniform across the genetic material or concentrated in particular varieties — for instance, rare or poorly-replicated varieties often benefit most from the additional shrinkage imposed by the FA structure.


Diff — signed accuracy gain between models

The diff plot shows (model2 − model1) for each environment as a lollipop centred on zero. It distils the dumbbell into a single signed quantity, making it easy to rank environments by how much they benefit from the more complex model. The x-axis is always symmetric around zero.

Diag → FA(2)

plot_accuracy(acc_diag, acc_fa2,
              type   = "diff",
              metric = "accuracy",
              label1 = "Diag",
              label2 = "FA(2)")

FA(1) → FA(2), both metrics

plot_accuracy(acc_fa1, acc_fa2,
              type   = "diff",
              metric = c("accuracy", "gen.H2"),
              label1 = "FA(1)",
              label2 = "FA(2)")


Extracting plot data — return_data = TRUE

All six plot types support return_data = TRUE, returning a named list with $plot and $data. This is useful for further customisation or downstream analysis.

rd <- plot_accuracy(acc_diag, acc_fa2,
                    type        = "diff",
                    metric      = c("accuracy", "gen.H2"),
                    label1      = "Diag",
                    label2      = "FA(2)",
                    return_data = TRUE)

head(rd$data)
  group   metric_label   value_1 n_vars   value_2   diff_val gain_dir
1  Env1        Gen. H² 0.4261292     31 0.5989723 0.17284307 Improved
2  Env1 Mrode Accuracy 0.6421706     31 0.7644036 0.12223299 Improved
3  Env2        Gen. H² 0.3851397     28 0.5875338 0.20239409 Improved
4  Env2 Mrode Accuracy 0.6094134     28 0.7554403 0.14602683 Improved
5  Env3        Gen. H² 0.6234820     30 0.7597134 0.13623137 Improved
6  Env3 Mrode Accuracy 0.7763371     30 0.8612144 0.08487727 Improved

The ggplot object is fully extensible with standard ggplot2 additions:

rd$plot +
  ggplot2::ggtitle(
    "Accuracy gain: FA(2) over Diag",
    subtitle = "Positive values indicate FA(2) is more accurate"
  )


Summary

accuracy() arguments

Argument Default Description
model Fitted ASReml-R V4 model object
term Auto-detected Full random-effect interaction term, e.g. "fa(Site, 2):id(Variety)" or "corgh(Site):vm(Variety, giv1)"; auto-detected from the model formula when NULL
metric c("accuracy","gen.H2") Metrics to compute; one or both
pworkspace "2gb" Passed to predict.asreml()
by_variety FALSE Return one row per variety × group; includes a present column (TRUE = observed in data, FALSE = unobserved)

plot_accuracy() types

Type Input Best for
"lollipop" Group-level or by_variety Accuracy summary per site; dumbbell of observed vs unobserved when present split present
"violin" by_variety Distribution of accuracy; points coloured blue (observed) / orange (unobserved)
"heatmap" by_variety Variety × site accuracy grid; grey circle marks unobserved tiles
"dumbbell" Group-level + res2 Paired model comparison per site
"scatter" by_variety + res2 Per-variety model comparison
"diff" Group-level + res2 or by_variety alone Signed model accuracy gain; or unobserved vs observed accuracy gap when no res2

Faceting into two panels is automatic when metric = c("accuracy","gen.H2").


References

Cullis, B.R., Smith, A.B. & Coombes, N.E. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological and Environmental Statistics, 11(4), 381–393. https://doi.org/10.1198/108571106X154443

Mrode, R.A. (2014). Linear Models for the Prediction of Animal Breeding Values, 3rd edition. CAB International, Wallingford.

Smith, A., Cullis, B. & Thompson, R. (2001). Analysing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics, 57(4), 1138–1147. https://doi.org/10.1111/j.0006-341X.2001.01138.x

Gilmour, A.R., Gogel, B.J., Cullis, B.R. & Thompson, R. (2009). ASReml User Guide Release 3.0. VSN International, Hemel Hempstead.