Processing math: 100%
+ - 0:00:00
Notes for current slide
Notes for next slide

Model selection with sdmTMB

NOAA PSAW Seminar Series

March 9, 2022

1 / 16

What is the goal of modeling?

  • Straight prediction?

    • in-sample, out-of-sample, both?
  • Parsimony?

    • balance of bias-variance tradeoff
  • Not all metrics appropriate for all applications

2 / 16

Too many metrics to discuss

  • Root mean squared error

  • AUC (see pseudo-absence example)

    • true and false positive rates
  • AIC: widely used tool in ecology

    • AIC=2logL+2K, where K is the number of parameters
    • designed for fixed effects models (Burnham and Anderson 2002)
3 / 16

AIC and likelihood with sdmTMB

  • AIC() and logLik() methods work, just like glm()
mesh <- make_mesh(pcod,
xy_cols = c("X", "Y"),
cutoff = 10
)
fit <- sdmTMB(
present ~ 1,
data = pcod,
mesh = mesh,
family = binomial(link = "logit"),
spatial = "on"
)
logLik(fit) # log likelihood
AIC(fit) # AIC
4 / 16

When to use restricted maximum likelihood (REML)

  • Integrates over random effects and fixed effects; sometimes helps convergence too

  • Can use REML when comparing different random effect structures

  • Don't use REML when comparing alternative fixed effects

  • Don't use REML for index standardization

fit <- sdmTMB(..., reml = FALSE)
5 / 16

Reminder: fixed and random effects in sdmTMB

Random effects include

  • random intercepts: (1 | group)
  • smooth terms: s(year)
  • time-varying coefficients
  • all random fields
  • spatially varying coefficients (also random fields)
6 / 16

Limitations of AIC

  • Originally conceived for fixed effects

    • Burnham and Anderson (2002)
  • Approximate & problematic for mixed effects

    • Vaida and Blanchard (2005)
    • Liang et al. (2008)
  • Great FAQ on glmmTMB by Ben Bolker

7 / 16

Predictive model selection

  • Ideal world: use cross validation to evaluate predictive accuracy

  • Split data into train / test sets

  • Objective function:

    • maximize log-likelihood of test data
8 / 16

ELPD

Expected log pointwise predictive density

Interpreted as theoretical expectation for new dataset

ELPD: log((exp(logL(y|θ))))N,

where: y are left-out data, θ are parameters, and N is the number of left-out data points

Vehtari, Gelman, and Gabry (2017)

9 / 16

Cross validation in sdmTMB

fit <- sdmTMB_cv(
present ~ 1,
data = pcod,
mesh = mesh,
family = binomial(link = "logit"),
k_folds = 8,
fold_ids = NULL,
parallel = TRUE,
use_initial_fit = FALSE
)
  • More folds = more computation time
  • use_initial_fit = TRUE
    • fits first fold, and initializes subsequent model fits from that
10 / 16

Cross validation in sdmTMB

sdmTMB_cv() returns:

  • A list of models (each sdmTMB() object)

  • fold_loglik: sum of held out likelihoods for each fold

  • sum_loglik: sum across fold_loglik, or all data

  • fold_elpd: expected log predictive density for each fold

  • elpd: Expected log predictive density across all data

11 / 16

How to choose folds? How many?

  • Words of wisdom:

    • Can be highly variable based on data, folds, sampling scheme
  • Spatial sampling or random?

  • How to sample with time / years?

    • LOOCV (leave-one-out...) vs. LFOCV (leave-future-out...)
  • sdmTMB_cv() does random fold assignment

    • Custom folds can be specified with fold_ids
12 / 16

Basic cross validation example

Set a future::plan(); the folds would be fit in parallel

library(future)
plan(multisession)
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
m_cv <- sdmTMB_cv(
density ~ s(depth, k = 5), data = pcod, mesh = mesh,
family = tweedie(link = "log"),
k_folds = 2
)
# Sum of log likelihoods of left-out data:
m_cv$sum_loglik
#> [1] -12541.86
# Expected log pointwise predictive density
# of left-out data:
m_cv$elpd
#> [1] -1.00918
13 / 16

Ensembling

Rather than choose best model, we can do model averaging

  • Ensemble may perform better than any single model

  • AIC weights (please don't!)

  • Stacking

14 / 16

Stacking

  • Borrowed from Bayesian literature, e.g. Yao et al. (2018)

  • Given set of models, m1,m2,...,mM

  • Estimate weights to maximize combined predictive density

p(y|y)=Mi=1wip(y|y,mi) y is new data (e.g. test); Mi=1wi=1

15 / 16

Stacking in sdmTMB

sdmTMB_stacking(model_list,
include_folds = NULL
)
  • model_list is a collection of models fit with sdmTMB_cv

  • optim() is used to find the best weights wi to maximize the combined predictive density

  • Not in main branch yet!

16 / 16

What is the goal of modeling?

  • Straight prediction?

    • in-sample, out-of-sample, both?
  • Parsimony?

    • balance of bias-variance tradeoff
  • Not all metrics appropriate for all applications

2 / 16
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow