Generates a balanced or unbalanced field trial dataset with a realistic genetic covariance structure across environments.

Trial structure: treatments = NULL produces a MET-only randomised complete block design. Supply treatments = c("T0", "T1", ...) for a split-plot design where whole plots are treatment strips and sub-plots are varieties; the genetic structure then operates over all Treatment x Site (TSite) combinations.

Genetic covariance is controlled by the G argument:

  • G = "auto" (default): a random symmetric positive-definite covariance matrix is generated with pairwise correlations approximately in the range (g_cor_min, g_cor_max), set via sim.options. The matrix is scaled to the target genetic SD, sigma_genetic.

  • G = matrix: a user-supplied J x J symmetric positive-definite covariance matrix used directly. J = nsite for MET-only, or J = nsite * length(treatments) for multi-treatment designs. Group order must match outer(treatments, sites).

True variety genetic values (the quantities that BLUPs predict) are drawn from \(\mathcal{N}(\mathbf{0}, \mathbf{G})\) via Cholesky decomposition, regardless of which mode is used.
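The Cholesky draw described above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the matrix `G`, the sample size and the object names are all stand-ins chosen for the example.

```r
# Illustrative stand-in values; not the package's internal code.
set.seed(1)
nvar <- 2000L                        # large, so the sample covariance is stable
G <- matrix(c(40000, 15000,
              15000, 40000), 2, 2)   # 2 x 2 SPD covariance (two sites)

# chol() returns the upper-triangular factor R with t(R) %*% R == G
R <- chol(G)

# Z holds independent standard normals; each row of Z %*% R has covariance G
Z <- matrix(rnorm(nvar * ncol(G)), nrow = nvar)
g <- Z %*% R

round(cov(g))   # should be close to G
```

Each row of `g` plays the role of one variety's vector of genetic values across the J groups.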

Incidence structure (incidence argument) controls which varieties are observed at which sites. All nvar varieties have a true genetic value at every site but only observed varieties contribute plots.

Usage

simTrialData(
  nvar = 20L,
  nsite = 10L,
  treatments = NULL,
  nrep = 2L,
  G = "auto",
  incidence = "balanced",
  seed = NULL,
  verbose = TRUE,
  sim.options = list()
)

Arguments

nvar

Integer. Maximum number of varieties. Default 20.

nsite

Integer. Number of environments (sites). Default 10.

treatments

Character vector of treatment labels (length >= 2), or NULL (default) for a MET-only trial with no treatment structure.

nrep

Integer. Replicates per environment. Default 2.

G

Controls the true genetic covariance structure. One of:

"auto"

(Default) Auto-generate a random SPD covariance matrix with pairwise correlations approximately in the range (g_cor_min, g_cor_max) set via sim.options. Scale is controlled by sigma_genetic.

Matrix

A user-supplied J x J symmetric positive-definite covariance matrix used directly. J = nsite for MET-only designs or J = nsite * length(treatments) for multi-treatment designs. Group order must match outer(treatments, sites).
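As a rough sketch of the required group order, the snippet below lists the groups produced by `outer(treatments, sites)` and builds one possible J x J matrix in that order. The site labels, the correlation values and the Kronecker construction are assumptions made for illustration; any symmetric positive-definite matrix in the same order would do.

```r
treatments <- c("T0", "T1")
sites <- c("Env1", "Env2", "Env3")   # assumed labels, for illustration only

# outer() fills column-major, so treatments vary fastest within each site
groups <- as.vector(outer(treatments, sites, paste, sep = "-"))
groups

# One way to build a valid J x J matrix in that order (illustrative values)
G_site  <- 250^2 * (0.5 * diag(3) + 0.5)   # between-site correlation 0.5
G_treat <- 0.2 * diag(2) + 0.8             # between-treatment correlation 0.8
G_user  <- kronecker(G_site, G_treat)      # treatment index varies fastest
dimnames(G_user) <- list(groups, groups)

min(eigen(G_user, symmetric = TRUE)$values) > 0   # positive-definite check
```

Putting the site matrix first in `kronecker()` makes the treatment index vary fastest, which matches the order of `groups`.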

incidence

Controls which varieties are observed at which sites. One of:

"balanced"

(Default) All nvar varieties are observed at every site. Produces identical prediction error variance (PEV) for all varieties within a site.

"unbalanced"

Automatically generates a two-tier incidence structure. Proportions are controlled by five sim.options entries: core_pct, core_min_sites_pct, reg_min_sites_pct, reg_max_sites_pct, and min_vars_pct (see below).

Matrix

An nvar x nsite matrix of 0/1 or TRUE/FALSE. Entry [v, s] = 1 means variety v is observed at site s. Every variety must appear in at least 1 site; every site must have at least 2 varieties.
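The two constraints on a user-supplied incidence matrix can be checked before calling the function. The helper below is hypothetical (it is not part of the package) and only mirrors the rules stated above.

```r
# Hypothetical helper, not part of the package
check_incidence <- function(inc) {
  inc <- inc != 0                        # accept 0/1 or logical input
  c(every_variety_observed = all(rowSums(inc) >= 1),
    every_site_has_two     = all(colSums(inc) >= 2))
}

inc <- matrix(1L, nrow = 20, ncol = 6)
inc[1:5, 1:3] <- 0L                      # varieties 1-5 skip sites 1-3
check_incidence(inc)

bad <- inc
bad[1, ] <- 0L                           # variety 1 is observed nowhere
check_incidence(bad)
```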

seed

Integer or NULL. Passed to set.seed. Default NULL.

verbose

Logical. Print a design summary and suggested ASReml-R model. Default TRUE.

sim.options

Named list of optional simulation controls. Unknown names produce a warning. Recognised elements (with defaults) are:

site_mean

Numeric. Grand mean yield. Default 4500. When changed, any of the six scale-dependent parameters below that are not explicitly supplied are automatically scaled proportionally to site_mean / 4500, preserving the default signal-to-noise ratios at any yield scale. Supply a parameter explicitly to override auto-scaling for that parameter.
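The auto-scaling rule can be worked through by hand. The sketch below uses the six defaults documented for site_mean = 4500; the helper `scale_defaults()` is hypothetical and only reproduces the stated arithmetic, not the package's code.

```r
# Documented defaults at site_mean = 4500
base <- c(site_sd = 600, sigma_genetic = 250, rep_sd = 150,
          row_sd = 80, col_sd = 60, error_sd = 350)

# Hypothetical helper mirroring the documented rule
scale_defaults <- function(site_mean, supplied = list()) {
  scaled <- as.list(base * site_mean / 4500)
  unlist(modifyList(scaled, supplied))   # explicit values override scaling
}

scale_defaults(9000)                                   # every SD doubles
scale_defaults(9000, supplied = list(error_sd = 400))  # error_sd kept at 400
```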

site_sd

Numeric. SD of site mean yields. Auto-scaled default: site_mean * (600 / 4500).

treat_effects

Numeric vector (length = length(treatments)) of fixed treatment effects, or NULL for auto-spacing from 0 to site_mean * 0.10. Multi-treatment only.

sigma_genetic

Numeric. Target mean genetic SD per group. Auto-scaled default: site_mean * (250 / 4500).

rep_sd

Numeric. SD of replicate effects. Auto-scaled default: site_mean * (150 / 4500).

row_sd

Numeric. SD of row spatial effects. Auto-scaled default: site_mean * (80 / 4500).

col_sd

Numeric. SD of column spatial effects. Auto-scaled default: site_mean * (60 / 4500).

error_sd

Numeric. SD of the independent (nugget) plot error. Auto-scaled default: site_mean * (350 / 4500).

ar1_rho_row

Controls the first-order autoregressive (AR1) spatial correlation in the row direction. NULL (default) disables the spatial component entirely. A scalar in \((-1, 1)\) applies the same correlation at every site. A length-2 vector c(lo, hi) draws an independent per-site correlation uniformly from [lo, hi], so each site has its own spatial structure.

ar1_rho_col

Controls AR1 correlation in the column direction. Same specification as ar1_rho_row. Default NULL.

sigma_spatial

Numeric. SD of the spatially correlated error component. Active only when at least one of ar1_rho_row or ar1_rho_col is non-NULL. When not supplied, defaults to error_sd * 0.5 (at the current auto-scaled level). The covariance between plots at row-distance \(\Delta r\) and column-distance \(\Delta c\) within a site is \(\sigma^2_{\text{spatial}} \rho_r^{|\Delta r|} \rho_c^{|\Delta c|}\), and total plot-level variance is \(\sigma^2_{\text{spatial}} + \sigma^2_{\text{error}}\).
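The separable AR1 covariance above can be written as a Kronecker product. The sketch below assumes both ar1_rho_row and ar1_rho_col are set, uses a small 4 x 3 field, and orders plots with row varying fastest within column; the helper `ar1_cor()` and all values are illustrative rather than taken from the package.

```r
# Hypothetical helper: AR1 correlation matrix for n equally spaced positions
ar1_cor <- function(n, rho) rho^abs(outer(seq_len(n), seq_len(n), "-"))

nrow_f <- 4L; ncol_f <- 3L               # small field, for illustration
rho_r <- 0.6; rho_c <- 0.3
sigma_spatial <- 175; error_sd <- 350    # default error_sd and error_sd * 0.5

# Plots ordered with row varying fastest within column
V <- sigma_spatial^2 *
  kronecker(ar1_cor(ncol_f, rho_c), ar1_cor(nrow_f, rho_r)) +
  error_sd^2 * diag(nrow_f * ncol_f)

# Covariance between plot (row 1, col 1) and plot (row 3, col 2)
V[1, (2 - 1) * nrow_f + 3]
sigma_spatial^2 * rho_r^2 * rho_c^1      # same value from the formula

V[1, 1]                                  # sigma_spatial^2 + error_sd^2
```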

sep

Character. Separator for TSite labels. Default "-".

variety_prefix

Character. Prefix for variety labels. Default "Var".

site_prefix

Character. Prefix for site labels. Default "Env".

outfile

Character path or NULL. If a path is supplied, the simulated data are written to it as CSV. Default NULL.

g_cor_min

Numeric in (-1, 1). Minimum pairwise genetic correlation for the auto-generated G matrix. Used only when G = "auto". Default 0.20.

g_cor_max

Numeric in (-1, 1]. Maximum pairwise genetic correlation. Must be > g_cor_min. Used only when G = "auto". Default 0.90.

core_pct

Numeric in (0, 1). Proportion of varieties designated as "core" entries when incidence = "unbalanced". Default 0.20.

core_min_sites_pct

Numeric in (0, 1]. Minimum proportion of sites that each core variety must appear in. Default 0.75.

reg_min_sites_pct

Numeric in (0, 1]. Minimum proportion of sites for regular (non-core) varieties. Default 0.40.

reg_max_sites_pct

Numeric in (0, 1]. Maximum proportion of sites for regular varieties. Must be >= reg_min_sites_pct. Default 0.85.

min_vars_pct

Numeric in (0, 1]. Minimum proportion of nvar that every site must contain. Default 0.40.
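To get a feel for the five unbalanced-incidence proportions, the sketch below converts the defaults into counts for nvar = 30 and nsite = 8. The values are left unrounded because the rounding rule is not stated here; the object names are illustrative.

```r
nvar <- 30L; nsite <- 8L
opts <- list(core_pct = 0.20, core_min_sites_pct = 0.75,
             reg_min_sites_pct = 0.40, reg_max_sites_pct = 0.85,
             min_vars_pct = 0.40)        # documented defaults

# Raw (unrounded) thresholds implied by the proportions
thresholds <- c(
  n_core         = opts$core_pct * nvar,
  core_min_sites = opts$core_min_sites_pct * nsite,
  reg_min_sites  = opts$reg_min_sites_pct * nsite,
  reg_max_sites  = opts$reg_max_sites_pct * nsite,
  min_vars_site  = opts$min_vars_pct * nvar
)
thresholds
```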

Value

A named list:

data

Data frame ordered by Site, Row, Column. MET-only designs have columns Site, Variety, Rep, Row, Column and yield; multi-treatment designs additionally have Treatment and TSite. Only observed variety x site combinations are included.

params

Named list of true simulation parameters:

G

True genetic covariance matrix (J x J).

site_means

True site mean yields (length nsite).

incidence

Integer nvar x nsite incidence matrix.

treat_effects

True treatment fixed effects (multi-treatment only).

g_arr

nvar x ngroup matrix of true genetic values (the quantities that BLUPs predict). Always returned. Column names are site names for MET-only designs and TSite labels for multi-treatment designs.

Examples

if (FALSE) { # \dontrun{
# Simplest call — just run it
out <- simTrialData(verbose = FALSE)
head(out$data)

# MET-only: 30 varieties, 8 sites
out <- simTrialData(nvar = 30, nsite = 8, seed = 42)
round(cov2cor(out$params$G), 2)   # true genetic correlations

# Unbalanced: produces spread in per-variety accuracy
out2 <- simTrialData(nvar = 30, nsite = 8,
                     incidence = "unbalanced", seed = 42)
table(rowSums(out2$params$incidence))   # sites per variety

# Custom correlation range
out3 <- simTrialData(nvar = 20, nsite = 6, seed = 99,
                     sim.options = list(g_cor_min = 0.3, g_cor_max = 0.8))
round(cov2cor(out3$params$G), 2)

# User-supplied G matrix
G_true <- matrix(c(40000, 15000, 15000, 40000), 2, 2)
out4 <- simTrialData(nvar = 20, nsite = 2, G = G_true, seed = 10)

# User-supplied incidence matrix
inc <- matrix(1L, nrow = 20, ncol = 6)
inc[1:5, 1:3] <- 0L
out5 <- simTrialData(nvar = 20, nsite = 6, incidence = inc, seed = 1)

# Multi-treatment split-plot
out6 <- simTrialData(nvar = 20, nsite = 6,
                     treatments  = c("T0", "T1", "T2"),
                     incidence   = "unbalanced",
                     seed        = 1,
                     sim.options = list(treat_effects = c(0, 150, 350)))
} # }