Introduction to sdmTMB

# Introduction to sdmTMB
## NOAA PSAW Seminar Series
### March 2, 2022

---

# sdmTMB origins

A need for reproducible code and methods for assessments [(Anderson et al. 2019)](https://www.dfo-mpo.gc.ca/csas-sccs/Publications/ResDocs-DocRech/2019/2019_041-eng.html)

---

# sdmTMB highlights

sdmTMB is a user-friendly R package for modeling spatial processes

* Familiar syntax to widely used functions/packages; `glm()`, mgcv, glmmTMB, etc.
  
* Performs fast (marginal) maximum likelihood estimation via TMB
  
* Widely applicable to species distribution modelling, stock assessment inputs, bycatch models, etc. that include spatially referenced data

---

# Installing sdmTMB

Currently:

* Install a C++ compiler
  * e.g., [Rtools](https://cran.r-project.org/bin/windows/Rtools/rtools40.html) on Windows
  * e.g., Xcode command line tools on a Mac:  
  `xcode-select --install`

```r
# install.packages("remotes")
remotes::install_github("pbs-assess/sdmTMB", 
  dependencies = TRUE)
```
]

Should be on CRAN soon so the C++ compiler won't be necessary.

---

# Installing sdmTMB

Warning: The INLA package is large, not on CRAN, and can be slow to install. If you run into problems, try installing it separately first:

<https://www.r-inla.org/download-install>

---

# sdmTMB workflow

1. Prepare data .xsmall[(convert to UTMs, scale covariates, ...)]

2. Construct a mesh

3. Fit the model

4. Inspect the model .xsmall[(and possibly refit the model)]

5. Predict from the model

6. Calculate any derived quantities

---

# sdmTMB workflow

1. Prepare data: .blue[`add_utm_columns()`]

2. Construct a mesh: .blue[`make_mesh()`]

3. Fit the model: .blue[`sdmTMB()`]

4. Inspect the model: .blue[`print()`], .blue[`tidy()`], .blue[`residuals()`]

5. Predict from the model: .blue[`predict()`]

6. Get derived quantities: .blue[`get_index()`], .blue[`get_cog()`]

---

# Preparing data: getting into constant distance coordinates

---

# Getting into UTMs

Need a projection that preserves constant distance

UTMs work well for regional analyses

Helper function: `sdmTMB::add_utm_columns()`

Internally, guesses at UTM zone and uses the sf package:  
`sf::st_as_sf()`, `sf::st_transform()`, and `sf::st_coordinates()`

Or use the sf or sp packages yourself

---

# Example of adding UTM columns

```r
d <- data.frame(
  lat = c(52.1, 53.4), 
  lon = c(-130.0, -131.4)
)
d <- sdmTMB::add_utm_columns(d, c("lon", "lat"))
#> Detected UTM zone 9N; CRS = 32609.
#> Visit https://epsg.io/32609 to verify.
d
#>    lat    lon        X        Y
#> 1 52.1 -130.0 431.5034 5772.632
#> 2 53.4 -131.4 340.4411 5919.452
```

* Note default `units = "km"`
* Why? Range parameter estimated in units of X and Y
* Should be not too big or small for estimation

---

# Constructing a mesh

---

# Constructing a mesh

`make_mesh()` has 2 shortcuts to mesh construction

1. K-means algorithm: used to cluster points (e.g., `n_knots = 100`); approach taken in VAST; sensitive to random `seed` argument!

2. Cutoff: minimum allowed distance between vertices (e.g., `cutoff = 10`)
  
Alternatively, build any INLA mesh and supply it to the `mesh` argument in `make_mesh()`

---

# Constructing a mesh

`cutoff` is in units of x and y (minimum triangle size)
]

```r
d <- data.frame(x = runif(500), y = runif(500))
mesh <- make_mesh(d, xy_cols = c("x", "y"), cutoff = 0.1)
mesh$mesh$n
#> [1] 91
plot(mesh)
```

<img src="02-intro-sdmTMB_files/figure-html/make-mesh-1.png" width="280px" style="display: block; margin: auto;" />
]

---

# Fitting the model: sdmTMB()

---

# sdmTMB()

Set up is similar to glmmTMB(). Common arguments:

```r
fit <- sdmTMB(
  formula,
  data,
  mesh,
  time = NULL,
  family = gaussian(link = "identity"),
  spatial = c("on", "off"),
  spatiotemporal = c("iid", "ar1", "rw", "off"),
  silent = TRUE,
  ...
)
```

See `?sdmTMB`

---

# Syntax: formulas for non-spatial model components

---

# Formula interface

sdmTMB uses a similar formula interface to widely used R packages

A formula is used to specify fixed effects and (optionally) random intercepts

```r
# linear effect of x1:
formula = y ~ x1

# add smoother effect of x2:
formula = y ~ x1 + s(x2)

# add random intercept by group g:
formula = y ~ x1 + s(x2) + (1 | g)
```
]

---

# Smoothers (as in mgcv)

```r
# smoother effect of x:
formula = y ~ s(x)

# basis dimension of 5:
formula = y ~ s(x, k = 5)

# bivariate smoother effect of x & y:
formula = y ~ s(x, y)

# smoother effect of x1 varying by x2:
formula = y ~ s(x1, by = x2)

# other kinds of mgcv smoothers:
formula = ~ s(month, bs = "cc", k = 12)
```

Smoothers are penalized ('p-splines'), i.e. data determine 'wiggliness'
]

---

# Other common R formula options

Polynomials and omitting the intercept:

```r
# transformations using `I()` notation
y ~ depth + I(depth^2)

# polynomial functions using `poly`
y ~ poly(depth, degree = 2)

# omit intercept
y ~ -1 + as.factor(year)
y ~ 0 + as.factor(year)
```

---

# Breakpoint functions for threshold analyses

```r
cpue ~ breakpt(temperature)
```

---

# Logistic functions for threshold analyses

```r
cpue ~ logistic(temperature)
```

---

# Syntax: families and links

---

# Families

Many of the same families used in `glm()`, `glmmTMB()`, `mgcv::gam()` can be used here

Includes:
  `gaussian()`, `Gamma()`, 
  `binomial()`, `poisson()`, `Beta()`, 
  `student()`, `tweedie()`, 
  `nbinom1()`, `nbinom2()`,
  `truncated_nbinom1()`, `truncated_nbinom2()`
  
All have `link` arguments
  
See `?sdmTMB::Families`

---

# An aside on the Tweedie

Dispersion ( `$\phi$` ) and power ( `$p$` ) parameters allow for a wide variety of shapes including many zeros

Also known as compound Poisson-Gamma distribution
]

---

# Syntax: spatial model components

---

# Spatial vs. spatiotemporal fields

* A spatial field can be thought of as a spatial intercept

* a wiggly spatial process that is constant in time
  
--

* Spatiotemporal variation represents separate fields estimated for each time
  slice (possibly correlated)
  
  * wiggly spatial processes that change through time

* Refer to [model description](https://pbs-assess.github.io/sdmTMB/articles/model-description.html) vignette for math notation

---

# Spatial fields can be turned on/off

* By default `sdmTMB()` estimates a spatial field

```r
fit <- sdmTMB(
  y ~ x,
  family = gaussian(),
  data = dat,
  mesh = mesh,
* spatial = "on",
  ...
)
```

---

# Why *not* estimate a spatial field?

* If shared process across time slices isn't of interest

* If magnitude of spatiotemporal variability >> spatial variation

* If confounded with other parameters (more later)

---

# Spatiotemporal fields can be turned on/off

* By default `sdmTMB()` estimates a spatiotemporal field if the `time` argument is specified

```r
fit <- sdmTMB(
  y ~ x,
  family = gaussian(),
  data = dat,
  mesh = mesh,
* time = "year", # column in `data`
* spatiotemporal = "iid",
  ...
)
```

---

# Types of spatiotemporal fields

* None (`spatiotemporal = "off"`)

* Independent (`spatiotemporal = "iid"`)

* Random walk (`spatiotemporal = "rw"`)

* Autoregressive (`spatiotemporal = "ar1"`)

---

# Independent (IID) spatiotemporal fields

---

# AR1 spatiotemporal fields

---

# Spatiotemporal fields

* Why include spatiotemporal fields?

* If the data are collected in both space and time *and* 
    there are 'latent' spatial processes that vary through time
  * E.g., effect of water temperature on abundance if temperature wasn't in the model
  * Represents all the missing variables that vary through time

* Why would a field be IID vs RW/AR1?

* Do we expect hotspots to be independent with each time slice or adapt slowly over time?

---

# After model fitting

Inspecting, summarizing, predicting, etc.

Covered in examples in next slides.

* `predict()`: `?predict.sdmTMB`
* `residuals()`: `?residuals.sdmTMB`
* `print()`
* `tidy()`: `?tidy.sdmTMB`
* `get_index()`
*  `...`