Fit a spatial or spatiotemporal predictive-process GLMM with TMB. This can be useful for (dynamic) species distribution models and relative abundance index standardization among many other uses.

sdmTMB(
  formula,
  data,
  spde,
  time = NULL,
  family = gaussian(link = "identity"),
  time_varying = NULL,
  weights = NULL,
  extra_time = NULL,
  reml = FALSE,
  silent = TRUE,
  anisotropy = FALSE,
  control = sdmTMBcontrol(),
  priors = sdmTMBpriors(),
  fields = c("IID", "AR1", "RW"),
  include_spatial = TRUE,
  spatial_trend = FALSE,
  spatial_only = identical(length(unique(data[[time]])), 1L),
  share_range = TRUE,
  previous_fit = NULL,
  epsilon_predictor = NULL,
  do_fit = TRUE,
  ...
)

Arguments

formula

Model formula. See the Details section below for how to specify offsets and threshold parameters. For index standardization, you may wish to include 0 + as.factor(year) (or whatever the time column is called) in the formula. IID random intercepts are possible using lme4 syntax, e.g., + (1 | g) where g is a column with factor levels. Penalized splines are possible via mgcv with s(). See examples and details below.

data

A data frame.

spde

An object from make_mesh().

time

An optional time column name (as character). Can be left as NULL for a model with only spatial random fields unless you wish to use one of the index or center of gravity functions over time.

family

The family and link. Supports gaussian(), Gamma(), binomial(), poisson(), Beta(), nbinom2(), and tweedie()]. For binomial family options, see the 'Binomial families' in the Details section below.

time_varying

An optional formula describing covariates that should be modelled as a random walk through time. Be careful not to include covariates (including the intercept) in both the main and time-varying formula. I.e., at least one should have ~ 0 or ~ -1.

weights

Optional likelihood weights for the conditional model. Implemented as in glmmTMB. In other words, weights do not have to sum to one and are not internally modified. Can also be used for trials with the binomial family. See Details below.

extra_time

Optional extra time slices (e.g., years) to include for interpolation or forecasting with the predict function. See the Details section below.

reml

Logical: use REML (restricted maximum likelihood) estimation rather than maximum likelihood? Internally, this adds the fixed effects to the list of random effects to intetgrate over.

silent

Silent or include optimization details?

anisotropy

Logical: allow for anisotropy? See plot_anisotropy().

control

Optimization control options. See sdmTMBcontrol().

priors

Optional penalties/priors. See sdmTMBpriors().

fields

Estimate the spatiotemporal random fields as "IID" (independent and identically distributed; default), stationary "AR1" (first-order autoregressive), or as a random walk ("RW"). Note that the spatiotemporal standard deviation represents the marginal steady-state standard deviation of the process in the case of the AR1. I.e., it is scaled according to the correlation. See the TMB documentation. If the AR1 correlation coefficient (rho) is estimated close to 1, say > 0.99, then you may wish to switch to the random walk "RW".

include_spatial

Should a separate spatial random field be estimated? If enabled then there will be separate spatial and spatiotemporal fields.

spatial_trend

Should a separate spatial field be included in the trend that represents local time trends? Requires spatiotemporal data. See doi: 10.1111/ecog.05176 and the spatial trends vignette.

spatial_only

Logical: should only a spatial model be fit (i.e., do not include spatiotemporal random effects)? By default a spatial-only model will be fit if there is only one unique value in the time column or the time argument is left at its default value of NULL.

share_range

Logical: estimate a shared spatial and spatiotemporal range parameter (TRUE) or independent range parameters (FALSE).

previous_fit

A previously fitted sdmTMB model to initialize the optimization with. Can greatly speed up fitting. Note that the data and model must be set up exactly the same way. However, the weights argument can change, which can be useful for cross-validation.

epsilon_predictor

(Experimental) A column name (as character) of a predictor of a linear trend (in log space) of the spatiotemporal standard deviation. By default, this is NULL and fits a model with a constant spatiotemporal variance. However, this argument can also be a character name in the original data frame (a covariate that ideally has been standardized to have mean 0 and standard deviation = 1). Because the spatiotemporal field varies by time step, the standardization should be done by time. If the name of a predictor is included, a log-linear model is fit where the predictor is used to model effects on the standard deviation, e.g. log(sd(i)) = B0 + B1 * epsilon_predictor(i).

do_fit

Fit the model (TRUE) or return the processed data without fitting (FALSE)?

...

Not currently used.

Details

Model description

For now, see the model description vignette for a start. There are also descriptions of particular models in Anderson et al. (2019) and Barnett et al. (2020) (see reference list below).

Offsets

In the model formula, an offset can be included by including + offset in the model formula (a reserved word). The offset will be included in any prediction. offset must be a column in data.

Binomial families

Following the structure of stats::glm() and glmmTMB, a binomial family can be specified in one of 4 ways: (1) the response may be a factor (and the model classifies the first level versus all others), (2) the response may be binomial (0/1), (3) the response can be a matrix of form cbind(success, failure), and (4) the response may be the observed proportions, and the 'weights' argument is used to specify the Binomial size (N) parameter (prob ~ ..., weights = N).

Smooth terms

Smooth terms can be included following GAMs (generalized additive models) using + s(x), which implements a smooth from mgcv::s(). sdmTMB uses penalized smooths, constructed via mgcv::smooth2random(). This is a similar approach implemented in gamm4 and brms, among other packages. Within these smooths, the same syntax commonly used in mgcv::s() can be applied, e.g. 2-dimensional smooths may be constructed with + s(x, y); smooths can be specific to various factor levels, + s(x, by = group); the basis function dimensions may be specified, e.g. + s(x, k = 4); and various types of splines may be constructed such as cyclic splines to model seasonality, + s(month, bs = "cc", k = 12). Prior to version 0.0.18, sdmTMB implemented unpenalized splines.

Threshold models

A linear break-point relationship for a covariate can be included via + breakpt(variable) in the formula, where variable is a single covariate corresponding to a column in data. In this case, the relationship is linear up to a point and then constant.

Similarly, a logistic-function threshold model can be included via + logistic(variable). This option models the relationship as a logistic function of the 50% and 95% values. This is similar to length- or size-based selectivity in fisheries, and is parameterized by the points at which f(x) = 0.5 or 0.95. See the vignette.

Note that only a single threshold covariate can be included.

See the threshold vignette.

Forecasting or interpolating

Extra time slices (e.g., years) can be included for interpolation or forecasting with the predict function via the extra_time argument. The predict function requires all time slices to be defined when fitting the model to ensure the various time indices are set up correctly. Be careful if including extra time slices that the model remains identifiable. For example, including + as.factor(year) in formula will render a model with no data to inform the expected value in a missing year. sdmTMB() makes no attempt to determine if the model makes sense for forecasting or interpolation. The options time_varying, fields = "RW", and fields = "AR1" provide mechanisms to predict over missing time slices with process error.

Index standardization

For index standardization, you may wish to include 0 + as.factor(year) (or whatever the time column is called) in the formula. See a basic example of index standardization in the relevant package vignette. You will need to specify the time argument. See get_index() and/or get_index_sims().

Regularization and priors

You can achieve regularization via penalties (priors) on the fixed effect parameters. See sdmTMBpriors(). These should not include offset terms and care should be taken if used with splines. You can fit the model once without penalties and inspect the element head(your_model$tmb_data$X_ij) if you want to see how the formula is translated to the fixed effect model matrix.

References

Main reference/report introducing the package. We plan to write a paper to cite in the near future:

Anderson, S.C., E.A. Keppel, A.M. Edwards, 2019. A reproducible data synopsis for over 100 species of British Columbia groundfish. DFO Can. Sci. Advis. Sec. Res. Doc. 2019/041. vii + 321 p. https://www.dfo-mpo.gc.ca/csas-sccs/Publications/ResDocs-DocRech/2019/2019_041-eng.html

Reference for local trends:

Barnett, L.A.K., E.J. Ward, S.C. Anderson. Improving estimates of species distribution change by incorporating local trends. Ecography. 44(3):427-439. doi: 10.1111/ecog.05176 .

Further explanation of the model and application to calculating climate velocities:

English, P., E.J. Ward, C.N. Rooper, R.E. Forrest, L.A. Rogers, K.L. Hunter, A.M. Edwards, B.M. Connors, S.C. Anderson. 2021. Contrasting climate velocity impacts in warm and cool locations show that effects of marine warming are worse in already warmer temperate waters. In press at Fish and Fisheries. doi: 10.1111/faf.12613 .

Code for implementing the barrier-SPDE written by Olav Nikolai Breivik and Hans Skaug and adapted via the VAST R package.

A number of sections of the original TMB model code were adapted from the VAST R package:

Thorson, J.T., 2019. Guidance for decisions using the Vector Autoregressive Spatio-Temporal (VAST) package in stock, ecosystem, habitat and climate assessments. Fish. Res. 210:143–161. doi: 10.1016/j.fishres.2018.10.013 .

Code for the family R-to-TMB implementation, selected parameterizations of the observation distributions, general package structure inspiration, and the idea behind the TMB prediction approach were adapted from the glmmTMB R package:

Mollie E. Brooks, Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler and Benjamin M. Bolker (2017). glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9(2):378-400. doi: 10.32614/rj-2017-066 .

Examples

if (inla_installed()) {

pcod_spde <- make_mesh(pcod_2011, c("X", "Y"), cutoff = 25) # a coarse mesh for example speed
plot(pcod_spde)

# Tweedie:
m <- sdmTMB(density ~ 0 + depth_scaled + depth_scaled2 + as.factor(year),
  data = pcod_2011, time = "year", spde = pcod_spde, family = tweedie(link = "log"))
print(m)
tidy(m, conf.int = TRUE)
tidy(m, effects = "ran_par", conf.int = TRUE)

# Run extra optimization steps to help convergence:
# (Not typically needed)
m1 <- run_extra_optimization(m, nlminb_loops = 0, newton_loops = 1)
max(m$gradients)
max(m1$gradients)

# Bernoulli:
pcod_binom <- pcod_2011
pcod_binom$present <- ifelse(pcod_binom$density > 0, 1L, 0L)
m_bin <- sdmTMB(present ~ 0 + as.factor(year) + depth_scaled + depth_scaled2,
  data = pcod_binom, time = "year", spde = pcod_spde,
  family = binomial(link = "logit"))
print(m_bin)

# Fit a spatial-only model (by not specifying `time`):
m <- sdmTMB(
  density ~ depth_scaled + depth_scaled2, data = pcod_2011,
  spde = pcod_spde, family = tweedie(link = "log"))
print(m)

# Gaussian:
pcod_gaus <- subset(pcod_2011, density > 0 & year >= 2013)
pcod_spde_gaus <- make_mesh(pcod_gaus, c("X", "Y"), cutoff = 30)
m_pos <- sdmTMB(log(density) ~ 0 + as.factor(year) + depth_scaled + depth_scaled2,
  data = pcod_gaus, time = "year", spde = pcod_spde_gaus)
print(m_pos)

# With p-splines via mgcv:
m_gam <- sdmTMB(log(density) ~ s(depth_scaled),
  data = pcod_gaus, time = "year", spde = pcod_spde_gaus)

# With IID random intercepts:
# Simulate some data:
set.seed(1)
x <- runif(500, -1, 1)
y <- runif(500, -1, 1)
loc <- data.frame(x = x, y = y)
spde <- make_mesh(loc, c("x", "y"), n_knots = 50, type = "kmeans")
s <- sdmTMB_sim(x = x, y = y, betas = 0, time = 1L,
  phi = 0.1, range = 1.4, sigma_O = 0.2, sigma_E = 0, mesh = spde)
s$g <- gl(50, 10)
iid_re_vals <- rnorm(50, 0, 0.3)
s$observed <- s$observed + iid_re_vals[s$g]

# Fit it:
m <- sdmTMB(observed ~ 1 + (1 | g), spde = spde, data = s)
print(m)
tidy(m, "ran_pars", conf.int = TRUE) # see tau_G
theta <- as.list(m$sd_report, "Estimate")
plot(iid_re_vals, theta$RE)
}
#> Spatial model fit by ML ['sdmTMB']
#> Formula: density ~ 0 + depth_scaled + depth_scaled2 + as.factor(year)
#> Time column: "year"
#> SPDE: pcod_spde
#> Data: pcod_2011
#> Family: tweedie(link = 'log')
#>                     coef.est coef.se
#> depth_scaled           -1.56    0.21
#> depth_scaled2          -1.09    0.10
#> as.factor(year)2011     3.70    0.30
#> as.factor(year)2013     3.89    0.30
#> as.factor(year)2015     3.74    0.30
#> as.factor(year)2017     3.02    0.32
#> 
#> Dispersion parameter: 13.99
#> Tweedie p: 1.57
#> Matern range: 3.70
#> Spatial SD: 7.86
#> Spatiotemporal SD: 7.24
#> ML criterion at convergence: 2948.031
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
#> Spatial model fit by ML ['sdmTMB']
#> Formula: present ~ 0 + as.factor(year) + depth_scaled + depth_scaled2
#> Time column: "year"
#> SPDE: pcod_spde
#> Data: pcod_binom
#> Family: binomial(link = 'logit')
#>                     coef.est coef.se
#> as.factor(year)2011     0.35    0.45
#> as.factor(year)2013     1.63    0.46
#> as.factor(year)2015     1.21    0.46
#> as.factor(year)2017     0.31    0.45
#> depth_scaled           -2.39    0.31
#> depth_scaled2          -1.72    0.18
#> 
#> Matern range: 14.16
#> Spatial SD: 4.30
#> Spatiotemporal SD: 1.13
#> ML criterion at convergence: 465.476
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
#> Spatial model fit by ML ['sdmTMB']
#> Formula: density ~ depth_scaled + depth_scaled2
#> Time column: NULL
#> SPDE: pcod_spde
#> Data: pcod_2011
#> Family: tweedie(link = 'log')
#>               coef.est coef.se
#> (Intercept)       3.73    0.27
#> depth_scaled     -1.52    0.21
#> depth_scaled2    -1.01    0.10
#> 
#> Dispersion parameter: 14.74
#> Tweedie p: 1.59
#> Matern range: 8.93
#> Spatial SD: 4.39
#> Spatiotemporal SD: not estimated
#> ML criterion at convergence: 2971.253
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
#> Spatial model fit by ML ['sdmTMB']
#> Formula: log(density) ~ 0 + as.factor(year) + depth_scaled + depth_scaled2
#> Time column: "year"
#> SPDE: pcod_spde_gaus
#> Data: pcod_gaus
#> Family: gaussian(link = 'identity')
#>                     coef.est coef.se
#> as.factor(year)2013     3.63    0.12
#> as.factor(year)2015     3.68    0.13
#> as.factor(year)2017     3.36    0.15
#> depth_scaled           -0.21    0.14
#> depth_scaled2          -0.39    0.10
#> 
#> Dispersion parameter: 1.35
#> Matern range: 0.22
#> Spatial SD: 0.01
#> Spatiotemporal SD: 0.01
#> ML criterion at convergence: 599.014
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
#> Warning: Detected a `s()` smoother. Smoothers are penalized in sdmTMB as
#> of version 0.0.19, but used to be unpenalized.
#> You no longer need to specify `k` since the degree of wiggliness
#> is determined by the data.
#> Warning: NaNs produced
#> Warning: The model may not have converged: non-positive-definite Hessian matrix.

#> Spatial model fit by ML ['sdmTMB']
#> Formula: observed ~ 1 + (1 | g)
#> Time column: NULL
#> SPDE: spde
#> Data: s
#> Family: gaussian(link = 'identity')
#>             coef.est coef.se
#> (Intercept)     0.16    0.38
#> 
#>               Std. Dev.
#> g (Intercept)      0.28
#> 
#> Dispersion parameter: 0.10
#> Matern range: 2.39
#> Spatial SD: 0.29
#> Spatiotemporal SD: not estimated
#> ML criterion at convergence: -273.194
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.


# \donttest{
if (inla_installed()) {

# Spatial-trend example:
m <- sdmTMB(density ~ depth_scaled, data = pcod_2011,
  spde = pcod_spde, family = tweedie(link = "log"),
  spatial_trend = TRUE, time = "year")
tidy(m, effects = "ran_par")

# Time-varying effects of depth and depth squared:
m <- sdmTMB(density ~ 0 + as.factor(year),
  time_varying = ~ 0 + depth_scaled + depth_scaled2,
  data = pcod_2011, time = "year", spde = pcod_spde,
  family = tweedie(link = "log"))
print(m)

# See the b_rw_t estimates; these are the time-varying (random walk) effects.
# These could be added to tidy.sdmTMB() eventually.
summary(m$sd_report)[1:19,]

# Linear breakpoint model on depth:
m_pos <- sdmTMB(log(density) ~ 0 + as.factor(year) +
    breakpt(depth_scaled) + depth_scaled2, data = pcod_gaus,
  time = "year", spde = pcod_spde_gaus)
print(m_pos)

# Linear covariate on log(sigma_epsilon):
# First we will center the years around their mean
# to help with convergence. Also constrain the slope to be (-1, 1)
# to help convergence (estimation is done in log-space)
pcod_2011$year_centered <- pcod_2011$year - mean(pcod_2011$year)
m <- sdmTMB(density ~ 0 + depth_scaled + depth_scaled2 + as.factor(year),
  data = pcod_2011, time = "year", spde = pcod_spde, family = tweedie(link = "log"),
  epsilon_predictor = "year_centered",
  control = sdmTMBcontrol(lower = list(b_epsilon = -1), upper = list(b_epsilon = 1)))
print(m) # sigma_E varies with time now

# coefficient is not yet in tidy.sdmTMB:
as.list(m$sd_report, "Estimate", report = TRUE)$b_epsilon
as.list(m$sd_report, "Std. Error", report = TRUE)$b_epsilon
}
#> Spatial model fit by ML ['sdmTMB']
#> Formula: density ~ 0 + as.factor(year)
#> Time column: "year"
#> SPDE: pcod_spde
#> Data: pcod_2011
#> Family: tweedie(link = 'log')
#>                     coef.est coef.se
#> as.factor(year)2011     4.03    0.30
#> as.factor(year)2013     3.67    0.29
#> as.factor(year)2015     3.97    0.31
#> as.factor(year)2017     3.38    0.33
#> 
#> Time-varying parameters:
#>                    coef.est coef.se
#> depth_scaled-2011     -0.82    0.16
#> depth_scaled-2013     -0.77    0.13
#> depth_scaled-2015     -0.71    0.14
#> depth_scaled-2017     -1.06    0.24
#> depth_scaled2-2011    -1.81    0.26
#> depth_scaled2-2013    -0.79    0.13
#> depth_scaled2-2015    -1.48    0.21
#> depth_scaled2-2017    -2.10    0.35
#> 
#> Dispersion parameter: 13.57
#> Tweedie p: 1.57
#> Matern range: 0.02
#> Spatial SD: 1571.16
#> Spatiotemporal SD: 1347.30
#> ML criterion at convergence: 2936.098
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
#> Warning: NaNs produced
#> Warning: The model may not have converged: non-positive-definite Hessian matrix.
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Warning: NaNs produced
#> Spatial model fit by ML ['sdmTMB']
#> Formula: log(density) ~ 0 + as.factor(year) + breakpt(depth_scaled) + 
#> Formula:     depth_scaled2
#> Time column: "year"
#> SPDE: pcod_spde_gaus
#> Data: pcod_gaus
#> Family: gaussian(link = 'identity')
#>                      coef.est coef.se
#> as.factor(year)2013      4.24    0.20
#> as.factor(year)2015      4.30    0.21
#> as.factor(year)2017      4.03    0.23
#> depth_scaled2            0.21    0.19
#> depth_scaled-slope       1.35    0.43
#> depth_scaled-breakpt    -0.72    0.14
#> 
#> Dispersion parameter: 1.33
#> Matern range: 0.21
#> Spatial SD: 0.01
#> Spatiotemporal SD: 0.01
#> ML criterion at convergence: 592.503
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
#> Setting lower limit for b_epsilon to -1.
#> Setting upper limit for b_epsilon to 1.
#> Spatial model fit by ML ['sdmTMB']
#> Formula: density ~ 0 + depth_scaled + depth_scaled2 + as.factor(year)
#> Time column: "year"
#> SPDE: pcod_spde
#> Data: pcod_2011
#> Family: tweedie(link = 'log')
#>                     coef.est coef.se
#> depth_scaled           -1.57    0.21
#> depth_scaled2          -1.10    0.10
#> as.factor(year)2011     3.67    0.32
#> as.factor(year)2013     3.89    0.31
#> as.factor(year)2015     3.74    0.30
#> as.factor(year)2017     3.04    0.31
#> 
#> Dispersion parameter: 13.98
#> Tweedie p: 1.57
#> Matern range: 4.02
#> Spatial SD: 7.40
#> Spatiotemporal SD: 7.40
#> Spatiotemporal SD: 6.83
#> Spatiotemporal SD: 6.31
#> Spatiotemporal SD: 5.83
#> ML criterion at convergence: 2947.949
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.
#> [1] 0.09951372
# }