Predict from an sdmTMB model — predict.sdmTMB • sdmTMB

Make predictions from an sdmTMB model; can predict on the original or new data.

Usage

# S3 method for class 'sdmTMB'
predict(
  object,
  newdata = NULL,
  type = c("link", "response"),
  se_fit = FALSE,
  re_form = NULL,
  re_form_iid = NULL,
  nsim = 0,
  sims_var = "est",
  model = c(NA, 1, 2),
  offset = NULL,
  mcmc_samples = NULL,
  return_tmb_object = FALSE,
  return_tmb_report = FALSE,
  return_tmb_data = FALSE,
  tmbstan_model = deprecated(),
  sims = deprecated(),
  area = deprecated(),
  ...
)

Arguments

object: A model fitted with sdmTMB().
newdata: A data frame to make predictions on. This should be a data frame with the same predictor columns as in the fitted data and a time column (if this is a spatiotemporal model) with the same name as in the fitted data.
type: Should the est column be in link (default) or response space?
se_fit: Should standard errors on predictions at the new locations given by newdata be calculated? Warning: the current implementation can be slow for large data sets or high-resolution projections unless re_form = NA (omitting random fields). A faster option to approximate point-wise uncertainty is to use the nsim argument.
re_form: NULL to specify including all spatial/spatiotemporal random effects in predictions. ~0 or NA for population-level predictions. Likely to be used in conjunction with se_fit = TRUE. This does not affect get_index() calculations.
re_form_iid: NULL to specify including all random intercepts in the predictions. ~0 or NA for population-level predictions. No other options (e.g., some but not all random intercepts) are implemented yet. Only affects predictions with newdata. This does affects get_index().
nsim: If > 0, simulate from the joint precision matrix with nsim draws. Returns a matrix of nrow(data) by nsim representing the estimates of the linear predictor (i.e., in link space). Can be useful for deriving uncertainty on predictions (e.g., apply(x, 1, sd)) or propagating uncertainty. This is currently the fastest way to characterize uncertainty on predictions in space with sdmTMB.
sims_var: Experimental: Which TMB reported variable from the model should be extracted from the joint precision matrix simulation draws? Defaults to link-space predictions. Options include: "omega_s", "zeta_s", "epsilon_st", and "est_rf" (as described below). Other options will be passed verbatim.
model: Type of prediction if a delta/hurdle model and nsim > 0 or mcmc_samples is supplied: NA returns the combined prediction from both components on the link scale for the positive component; 1 or 2 return the first or second model component only on the link or response scale depending on the argument type. For regular prediction from delta models, both sets of predictions are returned.
offset: A numeric vector of optional offset values. If left at default NULL, the offset is implicitly left at 0.
mcmc_samples: See extract_mcmc() in the sdmTMBextra package for more details and the Bayesian vignette. If specified, the predict function will return a matrix of a similar form as if nsim > 0 but representing Bayesian posterior samples from the Stan model.
return_tmb_object: Logical. If TRUE, will include the TMB object in a list format output. Necessary for the get_index() or get_cog() functions.
return_tmb_report: Logical: return the output from the TMB report? For regular prediction, this is all the reported variables at the MLE parameter values. For nsim > 0 or when mcmc_samples is supplied, this is a list where each element is a sample and the contents of each element is the output of the report for that sample.
return_tmb_data: Logical: return formatted data for TMB? Used internally.
tmbstan_model: Deprecated. See mcmc_samples.
sims: Deprecated. Please use nsim instead.
area: Deprecated. Please use area in get_index().
...: Not implemented.

Value

If return_tmb_object = FALSE (and nsim = 0 and mcmc_samples = NULL):

A data frame:

est: Estimate in link space (everything is in link space)
est_non_rf: Estimate from everything that isn't a random field
est_rf: Estimate from all random fields combined
omega_s: Spatial (intercept) random field that is constant through time
zeta_s: Spatial slope random field
epsilon_st: Spatiotemporal (intercept) random fields, could be off (zero), IID, AR1, or random walk

If return_tmb_object = TRUE (and nsim = 0 and mcmc_samples = NULL):

A list:

data: The data frame described above
report: The TMB report on parameter values
obj: The TMB object returned from the prediction run
fit_obj: The original TMB model object

In this case, you likely only need the data element as an end user. The other elements are included for other functions.

If nsim > 0 or mcmc_samples is not NULL:

A matrix:

Columns represent samples
Rows represent predictions with one row per row of newdata

Examples


d <- pcod_2011
mesh <- make_mesh(d, c("X", "Y"), cutoff = 30) # a coarse mesh for example speed
m <- sdmTMB(
 data = d, formula = density ~ 0 + as.factor(year) + depth_scaled + depth_scaled2,
 time = "year", mesh = mesh, family = tweedie(link = "log")
)

# Predictions at original data locations -------------------------------

predictions <- predict(m)
head(predictions)
#> # A tibble: 6 × 17
#>    year     X     Y depth density present   lat   lon depth_mean depth_sd
#>   <int> <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl> <dbl>      <dbl>    <dbl>
#> 1  2011  435. 5718.   241    245.       1  51.6 -130.       5.16    0.445
#> 2  2011  487. 5719.    52      0        0  51.6 -129.       5.16    0.445
#> 3  2011  490. 5717.    47      0        0  51.6 -129.       5.16    0.445
#> 4  2011  545. 5717.   157      0        0  51.6 -128.       5.16    0.445
#> 5  2011  404. 5720.   398      0        0  51.6 -130.       5.16    0.445
#> 6  2011  420. 5721.   486      0        0  51.6 -130.       5.16    0.445
#> # ℹ 7 more variables: depth_scaled <dbl>, depth_scaled2 <dbl>, est <dbl>,
#> #   est_non_rf <dbl>, est_rf <dbl>, omega_s <dbl>, epsilon_st <dbl>

predictions$resids <- residuals(m) # randomized quantile residuals
#> Note what used to be the default sdmTMB residuals (before version 0.4.3.9005)
#> are now `type = 'mle-eb'`. We recommend using the current default `'mle-mvn'`,
#> which takes one sample from the approximate posterior of the random effects or
#> `dharma_residuals()` using a similar approach.

library(ggplot2)
ggplot(predictions, aes(X, Y, col = resids)) + scale_colour_gradient2() +
  geom_point() + facet_wrap(~year)

hist(predictions$resids)

qqnorm(predictions$resids);abline(a = 0, b = 1)


# Predictions onto new data --------------------------------------------

qcs_grid_2011 <- replicate_df(qcs_grid, "year", unique(pcod_2011$year))
predictions <- predict(m, newdata = qcs_grid_2011)

# \donttest{
# A short function for plotting our predictions:
plot_map <- function(dat, column = est) {
  ggplot(dat, aes(X, Y, fill = {{ column }})) +
    geom_raster() +
    facet_wrap(~year) +
    coord_fixed()
}

plot_map(predictions, exp(est)) +
  scale_fill_viridis_c(trans = "sqrt") +
  ggtitle("Prediction (fixed effects + all random effects)")


plot_map(predictions, exp(est_non_rf)) +
  ggtitle("Prediction (fixed effects and any time-varying effects)") +
  scale_fill_viridis_c(trans = "sqrt")


plot_map(predictions, est_rf) +
  ggtitle("All random field estimates") +
  scale_fill_gradient2()


plot_map(predictions, omega_s) +
  ggtitle("Spatial random effects only") +
  scale_fill_gradient2()


plot_map(predictions, epsilon_st) +
  ggtitle("Spatiotemporal random effects only") +
  scale_fill_gradient2()


# Visualizing a marginal effect ----------------------------------------

# See the visreg package or the ggeffects::ggeffect() or
# ggeffects::ggpredict() functions
# To do this manually:

nd <- data.frame(depth_scaled =
  seq(min(d$depth_scaled), max(d$depth_scaled), length.out = 100))
nd$depth_scaled2 <- nd$depth_scaled^2

# Because this is a spatiotemporal model, you'll need at least one time
# element. If time isn't also a fixed effect then it doesn't matter what you pick:
nd$year <- 2011L # L: integer to match original data
p <- predict(m, newdata = nd, se_fit = TRUE, re_form = NA)
ggplot(p, aes(depth_scaled, exp(est),
  ymin = exp(est - 1.96 * est_se), ymax = exp(est + 1.96 * est_se))) +
  geom_line() + geom_ribbon(alpha = 0.4)


# Plotting marginal effect of a spline ---------------------------------

m_gam <- sdmTMB(
 data = d, formula = density ~ 0 + as.factor(year) + s(depth_scaled, k = 5),
 time = "year", mesh = mesh, family = tweedie(link = "log")
)
if (require("visreg", quietly = TRUE)) {
  visreg::visreg(m_gam, "depth_scaled")
}


# or manually:
nd <- data.frame(depth_scaled =
  seq(min(d$depth_scaled), max(d$depth_scaled), length.out = 100))
nd$year <- 2011L
p <- predict(m_gam, newdata = nd, se_fit = TRUE, re_form = NA)
ggplot(p, aes(depth_scaled, exp(est),
  ymin = exp(est - 1.96 * est_se), ymax = exp(est + 1.96 * est_se))) +
  geom_line() + geom_ribbon(alpha = 0.4)


# Forecasting ----------------------------------------------------------
mesh <- make_mesh(d, c("X", "Y"), cutoff = 15)

unique(d$year)
#> [1] 2011 2013 2015 2017
m <- sdmTMB(
  data = d, formula = density ~ 1,
  spatiotemporal = "AR1", # using an AR1 to have something to forecast with
  extra_time = 2019L, # `L` for integer to match our data
  spatial = "off",
  time = "year", mesh = mesh, family = tweedie(link = "log")
)

# Add a year to our grid:
grid2019 <- qcs_grid_2011[qcs_grid_2011$year == max(qcs_grid_2011$year), ]
grid2019$year <- 2019L # `L` because `year` is an integer in the data
qcsgrid_forecast <- rbind(qcs_grid_2011, grid2019)

predictions <- predict(m, newdata = qcsgrid_forecast)
plot_map(predictions, exp(est)) +
  scale_fill_viridis_c(trans = "log10")

plot_map(predictions, epsilon_st) +
  scale_fill_gradient2()


# Estimating local trends ----------------------------------------------

d <- pcod
d$year_scaled <- as.numeric(scale(d$year))
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)
m <- sdmTMB(data = d, formula = density ~ depth_scaled + depth_scaled2,
  mesh = mesh, family = tweedie(link = "log"),
  spatial_varying = ~ 0 + year_scaled, time = "year", spatiotemporal = "off")
nd <- replicate_df(qcs_grid, "year", unique(pcod$year))
nd$year_scaled <- (nd$year - mean(d$year)) / sd(d$year)
p <- predict(m, newdata = nd)

plot_map(subset(p, year == 2003), zeta_s_year_scaled) + # pick any year
  ggtitle("Spatial slopes") +
  scale_fill_gradient2()


plot_map(p, est_rf) +
  ggtitle("Random field estimates") +
  scale_fill_gradient2()


plot_map(p, exp(est_non_rf)) +
  ggtitle("Prediction (fixed effects only)") +
  scale_fill_viridis_c(trans = "sqrt")


plot_map(p, exp(est)) +
  ggtitle("Prediction (fixed effects + all random effects)") +
  scale_fill_viridis_c(trans = "sqrt")

# }