class: center, middle, inverse, title-slide .title[ # Introduction to sdmTMB ] .subtitle[ ## IMR sdmTMB workshop ] .author[ ### ] .date[ ### May 23–25 2023 ] --- <!-- Build with: xaringan::inf_mr() --> # sdmTMB origins A need for reproducible code and methods for assessments [(Anderson et al. 2019)](https://www.dfo-mpo.gc.ca/csas-sccs/Publications/ResDocs-DocRech/2019/2019_041-eng.html) .center[ <img src="images/yellowtail_synopsis.png" width="450px" height = "400px"/> ] --- # sdmTMB highlights sdmTMB is a user-friendly R package for modeling spatial processes * Familiar syntax to widely used functions/packages; `glm()`, mgcv, glmmTMB, etc. * Performs fast (marginal) maximum likelihood estimation via TMB * Widely applicable to species distribution modelling, stock assessment inputs, bycatch models, etc. that include spatially referenced data --- # Installing sdmTMB From CRAN: ```r install.packages("sdmTMB", dependencies = TRUE) ``` From source on GitHub: * Install a C++ compiler * e.g., [Rtools](https://cran.r-project.org/bin/windows/Rtools/rtools40.html) on Windows * e.g., Xcode command line tools on a Mac: `xcode-select --install` .small[ ```r # install.packages("remotes") remotes::install_github("pbs-assess/sdmTMB", dependencies = TRUE) ``` ] --- # Installing sdmTMB Warning: The INLA package is large, not on CRAN, and can be slow to install. If you run into problems, try installing it separately first: <https://www.r-inla.org/download-install> --- # General sdmTMB workflow 1. Prepare data .xsmall[(convert to UTMs, scale covariates, ...)] 2. Construct a mesh 3. Fit the model 4. Inspect the model .xsmall[(and possibly refit the model)] 5. Predict from the model 6. Calculate any derived quantities --- # General sdmTMB workflow 1. Prepare data: .blue[`add_utm_columns()`] 2. Construct a mesh: .blue[`make_mesh()`] 3. Fit the model: .blue[`sdmTMB()`] 4. Inspect the model: .blue[`print()`], .blue[`sanity()`], .blue[`tidy()`], .blue[`residuals()`] 5. Predict from the model: .blue[`predict()`] 6. Get derived quantities: .blue[`get_index()`] <!-- <img src="images/sdmTMB_workflow.png" height = "500px" width="900px" /> --> --- class: center, middle, inverse # Preparing the data --- # The usual things Prepare the data in wide format (i.e. row = observation) Make sure there is no **NA** in the used covariates **NA** in the response value is OK (internally checked) --- # Preparing the "space": the use of UTMs Need a projection that preserves constant distance UTMs work well for regional analyses Helper function: `sdmTMB::add_utm_columns()` Internally, guesses at UTM zone and uses the sf package: `sf::st_as_sf()`, `sf::st_transform()`, and `sf::st_coordinates()` Or use the sf or sp packages yourself --- # Example of adding UTM columns ```r d <- data.frame( lat = c(52.1, 53.4), lon = c(-130.0, -131.4) ) d <- sdmTMB::add_utm_columns(d, c("lon", "lat")) #> Detected UTM zone 9N; CRS = 32609. #> Visit https://epsg.io/32609 to verify. d #> lat lon X Y #> 1 52.1 -130.0 431.5034 5772.632 #> 2 53.4 -131.4 340.4411 5919.452 ``` * Note default `units = "km"` * Why? Range parameter estimated in units of X and Y * Should be not too big or small for estimation --- class: center, middle, inverse # Constructing a mesh --- # Constructing a mesh `make_mesh()` has 2 shortcuts to mesh construction 1. K-means algorithm: used to cluster points (e.g., `n_knots = 100`); approach taken in VAST; sensitive to random `seed` argument! 2. Cutoff: minimum allowed distance between vertices (e.g., `cutoff = 10`) - **_the preferred option_** Alternatively, build any INLA mesh and supply it to the `mesh` argument in `make_mesh()` --- # Constructing a mesh .small[ Size of mesh has the single largest impact on fitting speed `cutoff` is in units of x and y (minimum triangle size) ] .small[ ```r d <- data.frame(x = runif(500), y = runif(500)) mesh <- make_mesh(d, xy_cols = c("x", "y"), cutoff = 0.1) mesh$mesh$n #> [1] 89 plot(mesh) ``` <img src="02-intro-sdmTMB_files/figure-html/make-mesh-1.png" width="280px" style="display: block; margin: auto;" /> ] --- class: center, middle, inverse # Fitting the model: sdmTMB() --- # sdmTMB() Set up is similar to `glmmTMB()`. Common arguments: ```r fit <- sdmTMB( formula, data, mesh, time = NULL, family = gaussian(link = "identity"), spatial = c("on", "off"), spatiotemporal = c("iid", "ar1", "rw", "off"), silent = TRUE, ... ) ``` See `?sdmTMB` --- class: center, middle, inverse # Syntax: formulas for non-spatial model components --- # Formula interface sdmTMB uses a similar formula interface to widely used R packages A formula is used to specify fixed effects and (optionally) random intercepts .small[ ```r # linear effect of x1: formula = y ~ x1 # add smoother effect of x2: formula = y ~ x1 + s(x2) # add random intercept by group g: formula = y ~ x1 + s(x2) + (1 | g) ``` ] --- # Smoothers (as in mgcv) .small[ ```r # smoother effect of x: formula = y ~ s(x) # basis dimension of 5: formula = y ~ s(x, k = 5) # bivariate smoother effect of x & y: formula = y ~ s(x, y) # smoother effect of x1 varying by x2: formula = y ~ s(x1, by = x2) # other kinds of mgcv smoothers: formula = ~ s(month, bs = "cc", k = 12) ``` Smoothers are penalized ('p-splines'), i.e. data determine 'wiggliness' ] --- # Other common R formula options Polynomials and omitting the intercept: ```r # transformations using `I()` notation y ~ depth + I(depth^2) # polynomial functions using `poly` y ~ poly(depth, degree = 2) # omit intercept y ~ -1 + as.factor(year) y ~ 0 + as.factor(year) ``` --- # Breakpoint functions for threshold analyses ```r cpue ~ breakpt(temperature) ``` <img src="02-intro-sdmTMB_files/figure-html/make-breakpt-1.png" width="400px" style="display: block; margin: auto;" /> .tiny[ Essington, T.E. S.C. Anderson, L.A.K. Barnett, H.M. Berger, S.A. Siedlecki, E.J. Ward. Advancing statistical models to reveal the effect of dissolved oxygen on the spatial distribution of marine taxa using thresholds and a physiologically based index. 2022. Ecography 2022(8): e06249. <https://doi.org/10.1111/ecog.06249> ] --- # Logistic functions for threshold analyses ```r cpue ~ logistic(temperature) ``` <img src="02-intro-sdmTMB_files/figure-html/make-logistic-1.png" width="500px" style="display: block; margin: auto;" /> --- class: center, middle, inverse # Syntax: families and links --- # Families Many of the same families used in `glm()`, `glmmTMB()`, `mgcv::gam()` can be used here Includes: `gaussian()`, `Gamma()`, `binomial()`, `poisson()`, `Beta()`, `student()`, `tweedie()`, `nbinom1()`, `nbinom2()`, `truncated_nbinom1()`, `truncated_nbinom2()`, `delta_gamma()`, `delta_lognormal()`, `delta_beta()`, and more... All have `link` arguments See `?sdmTMB::Families` --- # An aside on the Tweedie .small[ Useful for positive continuous data with zeros (e.g., biomass density per unit effort) Dispersion ( `\(\phi\)` ) and power ( `\(p\)` ) parameters allow for a wide variety of shapes including many zeros Also known as compound Poisson-Gamma distribution ] <img src="02-intro-sdmTMB_files/figure-html/sim-tweedie-1.png" width="400px" style="display: block; margin: auto;" /> --- # An aside on the student-t .small[ Useful for continuous data with extreme values. Can also be used to deal with positive data when using _log_ `link` `df` parameters controls for the amount of tail in the distribution i.e. degree of "extreme" values P.S: beware when df << 3. All noise is assimilated in the "observation error" ] .tiny[ Anderson, S. C., Branch, T. A., Cooper, A. B., and Dulvy, N. K. 2017. Black-swan events in animal populations. Proceedings of the National Academy of Sciences, 114: 3252–3257. http://www.pnas.org/lookup/doi/10.1073/pnas.16115. ] --- # An aside on delta models * Delta/hurdle model has 2 sub-models: - presence/absence (binomial, logit link) - positive model (link varies by family) * `family` argument to sdmTMB can be a list() - for convenience, many delta- families implemented: `delta_gamma`, `delta_lognormal`, `delta_truncated_nbinom2` etc * More detail coming on day 3 --- # An aside on mixture models Positive components may be modeled as a mixture of 2 distributions * Finite mixture model (2 components) * Also referred to as "ECE" (extreme catch event) model, Thorson et al. (2012) * Mechanisms: shoaling, etc. * See `gamma_mix()` and `lognormal_mix()` * But requires data --- # Decision tree for family choice .center[ <img src="images/family_diagram_highlighted.png" width="800px" height = "500px"/> ] --- class: center, middle, inverse # Syntax: spatial model components --- # Spatial vs. spatiotemporal fields - notion * A spatial field can be thought of as a spatial intercept * a wiggly spatial process that is constant in time. e.g. areas that has on average a higher/lower animal density, constant "hot/cold-spot" -- * Spatiotemporal variation represents separate fields estimated for each time slice (possibly correlated) * wiggly spatial processes that change through time. e.g. inter-annual variability in "hot/cold-spot" locations -- * See [model description](https://pbs-assess.github.io/sdmTMB/articles/model-description.html) vignette for math notation --- # Spatial fields can be turned on/off * By default `sdmTMB()` estimates a spatial field ```r fit <- sdmTMB( y ~ x, family = gaussian(), data = dat, mesh = mesh, * spatial = "on", ... ) ``` --- # Why *not* estimate a spatial field? * If shared process across time slices isn't of interest * If magnitude of spatiotemporal variability >> spatial variation * If confounded with other parameters (more later) --- # Spatiotemporal fields can be turned on/off * By default `sdmTMB()` estimates a spatiotemporal field if the `time` argument is specified ```r fit <- sdmTMB( y ~ x, family = gaussian(), data = dat, mesh = mesh, * time = "year", # column in `data` * spatiotemporal = "iid", ... ) ``` --- # Types of spatiotemporal fields * None (`spatiotemporal = "off"`) * Independent (`spatiotemporal = "iid"`) * Random walk (`spatiotemporal = "rw"`) * Autoregressive (`spatiotemporal = "ar1"`) --- # Independent (IID) spatiotemporal fields .small[ Useful if pattern changes much between years ] <img src="02-intro-sdmTMB_files/figure-html/iid-demo-1.png" width="700px" style="display: block; margin: auto;" /> --- # AR1 spatiotemporal fields .small[ Useful if pattern are related between years. P.S: Random walk = AR1 with 1.0 correlation ] <img src="02-intro-sdmTMB_files/figure-html/ar1-demo-1.png" width="700px" style="display: block; margin: auto;" /> --- # Spatiotemporal fields * Why include spatiotemporal fields? * If the data are collected in both space and time *and* there are 'latent' spatial processes that vary through time * E.g., effect of water temperature on abundance if temperature wasn't in the model * Represents all the missing variables that vary through time -- * Why would a field be IID vs RW/AR1? * Do we expect hotspots to be independent with each time slice or adapt slowly over time? --- # After model fitting (more tomorrow) Inspecting, summarizing, predicting, etc. Covered in examples in tomorrow's slides. * `predict()`: `?predict.sdmTMB` * `residuals()`: `?residuals.sdmTMB` * `tidy()`: `?tidy.sdmTMB` * `print()` * `sanity()` * `get_index()` * `...`