Simulate Multi-Environment Plant Breeding Trial Data
Generates a balanced or unbalanced field trial dataset with a realistic genetic covariance structure across environments.
Trial structure: treatments = NULL produces a MET-only randomised
complete block design. Supply treatments = c("T0", "T1", ...) for a
split-plot design where whole plots are treatment strips and sub-plots are
varieties; the genetic structure then operates over all Treatment x Site
(TSite) combinations.
Genetic covariance is controlled by the G argument:
G = "auto" (default): a random symmetric positive-definite covariance matrix is generated with pairwise correlations approximately in the range (g_cor_min, g_cor_max), set via sim.options. The matrix is scaled to sigma_genetic.
G = matrix: a user-supplied J x J symmetric positive-definite covariance matrix used directly. J = nsite for MET-only, or J = nsite * length(treatments) for multi-treatment designs. Group order must match outer(treatments, sites).
Variety BLUPs are drawn from \(\mathcal{N}(\mathbf{0}, \mathbf{G})\) via Cholesky decomposition regardless of which mode is used.
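The Cholesky draw described above can be illustrated in base R. This is a minimal sketch of the general technique, not the package's internal code; `G_demo`, `nvar_demo` and `g_demo` are illustrative names, and the large `nvar_demo` is chosen only so the empirical covariance is visibly close to the target.

```r
# Sketch: draw variety effects from N(0, G) using a Cholesky factor.
# G_demo, nvar_demo and g_demo are illustrative names, not package objects.
set.seed(1)
G_demo <- matrix(c(40000, 15000,
                   15000, 40000), nrow = 2, byrow = TRUE)
nvar_demo <- 5000L

# chol() returns the upper-triangular factor R with t(R) %*% R equal to G_demo
R <- chol(G_demo)

# Rows of Z are independent N(0, I); rows of Z %*% R then have covariance G_demo
Z <- matrix(rnorm(nvar_demo * ncol(G_demo)), nrow = nvar_demo)
g_demo <- Z %*% R

# The empirical covariance should be close to G_demo for large nvar_demo
round(cov(g_demo))
```

Each row of `g_demo` plays the role of one variety's vector of true genetic values across the J groups.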
Incidence structure (incidence argument) controls which varieties
are observed at which sites. All nvar varieties have a true genetic
value at every site but only observed varieties contribute plots.
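As a rough illustration of how an incidence matrix separates "true value exists" from "plot is observed", the following base R sketch masks a matrix of genetic values. All object names are hypothetical, and the `Var1` / `Env1` label format is an assumption based on the default prefixes, not confirmed output of the function.

```r
# Sketch: true values exist for every variety x site cell,
# but only cells with incidence 1 would contribute plots.
set.seed(2)
nvar_demo  <- 6L
nsite_demo <- 3L
labels <- list(paste0("Var", seq_len(nvar_demo)),   # assumed label format
               paste0("Env", seq_len(nsite_demo)))  # assumed label format

g_true <- matrix(rnorm(nvar_demo * nsite_demo, sd = 250),
                 nvar_demo, nsite_demo, dimnames = labels)

inc <- matrix(1L, nvar_demo, nsite_demo, dimnames = labels)
inc[1:2, 1] <- 0L  # Var1 and Var2 are not grown at Env1

# Rules stated for a user-supplied incidence matrix:
# every variety in at least 1 site, every site with at least 2 varieties
stopifnot(all(rowSums(inc) >= 1), all(colSums(inc) >= 2))

# Keep only the observed variety x site combinations
obs <- which(inc == 1L, arr.ind = TRUE)
observed <- data.frame(
  Variety = rownames(inc)[obs[, "row"]],
  Site    = colnames(inc)[obs[, "col"]],
  g       = g_true[obs]
)
nrow(observed)
```

Here `g_true` still holds values for the two unobserved cells; they are simply absent from `observed`.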
Usage
simTrialData(
  nvar = 20L,
  nsite = 10L,
  treatments = NULL,
  nrep = 2L,
  G = "auto",
  incidence = "balanced",
  seed = NULL,
  verbose = TRUE,
  sim.options = list()
)
Arguments
- nvar
Integer. Maximum number of varieties. Default 20.
- nsite
Integer. Number of environments (sites). Default 10.
- treatments
Character vector of treatment labels (length >= 2), or NULL (default) for a MET-only trial with no treatment structure.
- nrep
Integer. Replicates per environment. Default 2.
- G
Controls the true genetic covariance structure. One of:
"auto" (default): Auto-generate a random SPD covariance matrix with pairwise correlations approximately in the range (g_cor_min, g_cor_max), set via sim.options. Scale is controlled by sigma_genetic.
Matrix: A user-supplied J x J symmetric positive-definite covariance matrix used directly. J = nsite for MET-only designs or J = nsite * length(treatments) for multi-treatment designs. Group order must match outer(treatments, sites).
- incidence
Controls which varieties are observed at which sites. One of:
"balanced" (default): All nvar varieties are observed at every site. Produces identical prediction error variance (PEV) for all varieties within a site.
"unbalanced": Automatically generates a two-tier incidence structure. Proportions are controlled by five sim.options entries: core_pct, core_min_sites_pct, reg_min_sites_pct, reg_max_sites_pct, and min_vars_pct (see below).
Matrix: An nvar x nsite matrix of 0/1 or TRUE/FALSE. Entry [v, s] = 1 means variety v is observed at site s. Every variety must appear in at least 1 site; every site must have at least 2 varieties.
- seed
Integer or NULL. Passed to set.seed. Default NULL.
- verbose
Logical. Print a design summary and suggested ASReml-R model. Default TRUE.
- sim.options
Named list of optional simulation controls. Unknown names produce a warning. Recognised elements (with defaults) are:
site_mean: Numeric. Grand mean yield. Default 4500. When changed, any of the six scale-dependent parameters below that are not explicitly supplied are automatically scaled in proportion to site_mean / 4500, preserving the default signal-to-noise ratios at any yield scale. Supply a parameter explicitly to override auto-scaling for that parameter.
site_sd: Numeric. SD of site mean yields. Auto-scaled default: site_mean * (600 / 4500).
treat_effects: Numeric vector (length = length(treatments)) of fixed treatment effects, or NULL for auto-spacing from 0 to site_mean * 0.10. Multi-treatment only.
sigma_genetic: Numeric. Target mean genetic SD per group. Auto-scaled default: site_mean * (250 / 4500).
rep_sd: Numeric. SD of replicate effects. Auto-scaled default: site_mean * (150 / 4500).
row_sd: Numeric. SD of row spatial effects. Auto-scaled default: site_mean * (80 / 4500).
col_sd: Numeric. SD of column spatial effects. Auto-scaled default: site_mean * (60 / 4500).
error_sd: Numeric. SD of the independent (nugget) plot error. Auto-scaled default: site_mean * (350 / 4500).
ar1_rho_row: Controls the first-order autoregressive (AR1) spatial correlation in the row direction. NULL (default) disables the row-direction component; the spatial term as a whole is active only when at least one of ar1_rho_row or ar1_rho_col is non-NULL. A scalar in \((-1, 1)\) applies the same correlation at every site. A length-2 vector c(lo, hi) draws an independent per-site correlation uniformly from [lo, hi], so each site has its own spatial structure.
ar1_rho_col: Controls AR1 correlation in the column direction. Same specification as ar1_rho_row. Default NULL.
sigma_spatial: Numeric. SD of the spatially correlated error component. Active only when at least one of ar1_rho_row or ar1_rho_col is non-NULL. When not supplied, defaults to error_sd * 0.5 (at the current auto-scaled level). The covariance between plots at row-distance \(\Delta r\) and column-distance \(\Delta c\) within a site is \(\sigma^2_{\text{spatial}} \rho_r^{|\Delta r|} \rho_c^{|\Delta c|}\), and total plot-level variance is \(\sigma^2_{\text{spatial}} + \sigma^2_{\text{error}}\).
sep: Character. Separator for TSite labels. Default "-".
variety_prefix: Character. Prefix for variety labels. Default "Var".
site_prefix: Character. Prefix for site labels. Default "Env".
outfile: Character path or NULL. CSV output. Default NULL.
g_cor_min: Numeric in (-1, 1). Minimum pairwise genetic correlation for the auto-generated G matrix. Used only when G = "auto". Default 0.20.
g_cor_max: Numeric in (-1, 1]. Maximum pairwise genetic correlation. Must be > g_cor_min. Used only when G = "auto". Default 0.90.
core_pct: Numeric in (0, 1). Proportion of varieties designated as "core" entries when incidence = "unbalanced". Default 0.20.
core_min_sites_pct: Numeric in (0, 1]. Minimum proportion of sites that each core variety must appear in. Default 0.75.
reg_min_sites_pct: Numeric in (0, 1]. Minimum proportion of sites for regular (non-core) varieties. Default 0.40.
reg_max_sites_pct: Numeric in (0, 1]. Maximum proportion of sites for regular varieties. Must be >= reg_min_sites_pct. Default 0.85.
min_vars_pct: Numeric in (0, 1]. Minimum proportion of nvar that every site must contain. Default 0.40.
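The auto-scaling rule and the spatial covariance formula in sim.options can be checked numerically. The sketch below is an illustration in base R under stated assumptions: `scaled_defaults()`, `ar1_cor()` and `idx()` are hypothetical helpers written for this example (they are not package functions), both AR1 directions are taken as active, and plots within a site are assumed to be ordered with row varying fastest within column.

```r
# Hypothetical helper, not part of the package: scale the six
# scale-dependent defaults by site_mean / 4500 unless supplied explicitly.
scaled_defaults <- function(site_mean = 4500, supplied = list()) {
  base <- c(site_sd = 600, sigma_genetic = 250, rep_sd = 150,
            row_sd = 80, col_sd = 60, error_sd = 350)
  modifyList(as.list(base * site_mean / 4500), supplied)
}

opts <- scaled_defaults(site_mean = 9000, supplied = list(rep_sd = 100))
sigma_spatial <- opts$error_sd * 0.5  # documented default when not supplied

# AR1 correlation matrix: entry (i, j) is rho^|i - j|
ar1_cor <- function(n, rho) rho^abs(outer(seq_len(n), seq_len(n), "-"))

n_row <- 4L; n_col <- 3L
rho_r <- 0.6; rho_c <- 0.3

# Separable AR1 x AR1 spatial covariance plus independent nugget error
V <- sigma_spatial^2 * kronecker(ar1_cor(n_col, rho_c), ar1_cor(n_row, rho_r)) +
  opts$error_sd^2 * diag(n_row * n_col)

# Position of the plot in row i, column j under the assumed ordering
idx <- function(i, j) (j - 1L) * n_row + i

# Plots (row 1, col 1) and (row 3, col 2): row distance 2, column distance 1
stopifnot(
  isTRUE(all.equal(V[idx(1, 1), idx(3, 2)],
                   sigma_spatial^2 * rho_r^2 * rho_c^1)),
  isTRUE(all.equal(V[1, 1], sigma_spatial^2 + opts$error_sd^2))
)
```

With site_mean = 9000 the unsupplied defaults double (for example error_sd becomes 700), while the explicitly supplied rep_sd stays at 100.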
Value
A named list:
data: Data frame ordered by Site, Row, Column with columns Site, Variety, Rep, Row, Column and yield (MET-only), plus Treatment and TSite for multi-treatment designs. Only observed variety x site combinations are included.
params: Named list of true simulation parameters:
G: True genetic covariance matrix (J x J).
site_means: True site mean yields (length nsite).
incidence: Integer nvar x nsite incidence matrix.
treat_effects: True treatment fixed effects (multi-treatment only).
g_arr: nvar x ngroup matrix of true genetic BLUPs. Always returned. Column names are site names for MET-only designs and TSite labels for multi-treatment designs.
Examples
if (FALSE) { # \dontrun{
# Simplest call: just run it
out <- simTrialData(verbose = FALSE)
head(out$data)

# MET-only: 30 varieties, 8 sites
out <- simTrialData(nvar = 30, nsite = 8, seed = 42)
round(cov2cor(out$params$G), 2) # true genetic correlations

# Unbalanced: produces spread in per-variety accuracy
out2 <- simTrialData(nvar = 30, nsite = 8,
                     incidence = "unbalanced", seed = 42)
table(rowSums(out2$params$incidence)) # sites per variety

# Custom correlation range
out3 <- simTrialData(nvar = 20, nsite = 6, seed = 99,
                     sim.options = list(g_cor_min = 0.3, g_cor_max = 0.8))
round(cov2cor(out3$params$G), 2)

# User-supplied G matrix
G_true <- matrix(c(40000, 15000, 15000, 40000), 2, 2)
out4 <- simTrialData(nvar = 20, nsite = 2, G = G_true, seed = 10)

# User-supplied incidence matrix
inc <- matrix(1L, nrow = 20, ncol = 6)
inc[1:5, 1:3] <- 0L
out5 <- simTrialData(nvar = 20, nsite = 6, incidence = inc, seed = 1)

# Multi-treatment split-plot
out6 <- simTrialData(nvar = 20, nsite = 6,
                     treatments = c("T0", "T1", "T2"),
                     incidence = "unbalanced",
                     seed = 1,
                     sim.options = list(treat_effects = c(0, 150, 350)))
} # }
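
For multi-treatment designs, a user-supplied G must have J = nsite * length(treatments) rows in the group order of outer(treatments, sites). One possible way to build such a matrix is a Kronecker product of a site correlation matrix and a treatment correlation matrix. The sketch below is base R only and rests on assumptions: the group order is taken to be the column-major vectorisation of outer(treatments, sites) (treatment varying fastest within site), the "T0-Env1" label format is inferred from the default sep and site_prefix, and `C_site`, `C_treat`, `sigma_g` and `G_user` are illustrative names.

```r
treatments <- c("T0", "T1")
sites <- paste0("Env", 1:3)  # assumed site label format

# Assumed group order: treatment varies fastest within site
groups <- as.vector(outer(treatments, sites, paste, sep = "-"))

# Illustrative correlation structures (compound symmetry)
C_site  <- matrix(0.6, 3, 3); diag(C_site)  <- 1
C_treat <- matrix(0.8, 2, 2); diag(C_treat) <- 1
sigma_g <- 250  # common genetic SD for every group

# kronecker(A, B) indexes A slowest and B fastest, matching `groups`
G_user <- sigma_g^2 * kronecker(C_site, C_treat)
dimnames(G_user) <- list(groups, groups)

# A usable G must be symmetric and positive definite
stopifnot(isSymmetric(G_user),
          all(eigen(G_user, symmetric = TRUE)$values > 0))

round(cov2cor(G_user), 2)
```

A matrix built this way would then be passed as G = G_user together with nsite = 3 and treatments = treatments. Whether the function inspects the dimnames of G is not stated on this page, so the labels here serve only to make the ordering visible.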