class: center, middle, inverse, title-slide

.title[
# Spatial modeling of presence-only data
]
.subtitle[
## DFO TESA sdmTMB workshop
]
.author[
### 
]
.date[
### January 16–19 2023
]

---

<!-- Build with: xaringan::inf_mr() -->

# Spatial patterning of trees: bei dataset

* Barro Colorado Island (Panama): a classic study site in ecology
* `bei` (spatstat): locations of *Beilschmiedia pendula* trees in a 1000 × 500 m plot
* Data from this forest informed Hubbell's (2001) unified neutral theory of biodiversity

<img src="13-presence-only_files/figure-html/plot-trees-1.png" width="700px" style="display: block; margin: auto;" />

---

# Pseudo-absences

* Data are georeferenced tree locations only: presences, no 0s
* Need to generate 0s: how?
  * quadrature points ([Renner et al. 2015](https://doi.org/10.1111/2041-210X.12352))
* Strategy?
  * Regularly spaced or random points?
  * Use enough points that predictive performance no longer changes as more are added

---

# Pseudo-absences for sdmTMB

* Example of a uniform grid strategy (5 m spacing)

```r
res <- 5 # grid spacing (m)

# regular grid of pseudo-absences covering the 1000 x 500 m plot
zeros <- expand.grid(
  x = seq(0, 1000, by = res),
  y = seq(0, 500, by = res)
)
```

---

# Bind the observed and pseudo-zeros together

.small[
```r
library(sdmTMB)

dat$present <- 1   # observed trees
zeros$present <- 0 # quadrature points
all_dat <- rbind(dat, zeros) # rbind() needs the same columns in both

mesh <- make_mesh(
  all_dat,
  xy_cols = c("x", "y"),
  cutoff = 25 # min. distance between mesh vertices
)
mesh$mesh$n # number of mesh vertices (knots)
#> [1] 678
```
]

---

# Combined data

* Blue dots: observed trees; red grid dots: quadrature points
* Grey triangles: SPDE mesh

.small[
<img src="13-presence-only_files/figure-html/mesh-viz-1.png" width="700px" style="display: block; margin: auto;" />
]

---

# Infinitely Weighted Logistic Regression (IWLR)

* [Fithian & Hastie (2013)](https://doi.org/10.1214/13-AOAS667)

```r
nW <- 1.0e6 # large weight standing in for "infinity"

# weight 1 for presences (nW^0), nW for pseudo-absences (nW^1)
all_dat$wt <- nW^(1 - all_dat$present)
```

* Weights can be passed to the model of choice
  * `glm()`, `glmmTMB()`, etc.
* Adding random fields makes this a "spatial log-Gaussian Cox process"

---

# IWLR & sdmTMB

* Convergence may be sensitive to the number of pseudo-absences
* The intercept and log-likelihood depend on the arbitrary choice of `nW`

```r
fit <- sdmTMB(
  present ~ 1,
  data = all_dat,
  mesh = mesh,
  family = binomial(link = "logit"),
* weights = all_dat$wt
)
```

---

# Downweighted Poisson Regression (DWPR)

* Similar to IWLR, but with different weights
* Avoids IWLR's arbitrary effects on the intercept and log-likelihood

```r
# pseudo-absences: area per quadrature point
tot_area <- diff(range(dat$x)) * diff(range(dat$y))
n_zeros <- sum(all_dat$present == 0)

# presences get a tiny weight; pseudo-absences get the area per point
all_dat$wt <- ifelse(all_dat$present == 1,
  1e-6, tot_area / n_zeros
)
```

---

# DWPR & sdmTMB

```r
fit <- sdmTMB(
  present / wt ~ 1, # response divided by the weights
  data = all_dat,
  mesh = mesh,
  family = poisson(link = "log"),
* weights = all_dat$wt
)
```

---

# Plotting spatial random effects

.small[
<img src="13-presence-only_files/figure-html/plot-rf-1.png" width="700px" style="display: block; margin: auto;" />
]

---

# Predictions in link (log) space

.xsmall[
<img src="13-presence-only_files/figure-html/plot-link-1.png" width="700px" style="display: block; margin: auto;" />
]

---

# Does adding more pseudo-absences improve performance?

* Increase the number of 0s from ~20K to ~30K
* AUC is similar (other criteria could be used)

```
#> [1] 0.8162975
```

---

# What about using a higher resolution mesh?

* Change `cutoff` from 25 to 15
* Number of knots increases from ~700 to ~1750
* Marginal gain in AUC with the finer mesh
* Note: here, refining the mesh matters more than adding pseudo-absences

```
#> [1] 0.8451451
```

---

# Benefits of pseudo-absence modeling

* Estimate of the spatial range isn't sensitive to the choice of raster / lattice resolution
* Doesn't require wrangling raw data (e.g., aggregating to a larger cell size to model counts)
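---

# Aside: aggregating to counts

For comparison, the count-based alternative mentioned above might look like this. It is a minimal base-R sketch, not workshop code, assuming simulated points in place of the tree locations and a hypothetical helper `count_cells()`. It shows that the data set itself changes with the chosen cell size.

.small[
```r
# Sketch only: simulated points stand in for the tree locations (hypothetical)
set.seed(1)
pts <- data.frame(x = runif(200, 0, 1000), y = runif(200, 0, 500))

# Hypothetical helper: count points per square cell of side `cell` (m)
# over the 1000 x 500 m plot; empty cells are kept as 0s
count_cells <- function(pts, cell) {
  ix <- cut(pts$x, breaks = seq(0, 1000, by = cell), include.lowest = TRUE)
  iy <- cut(pts$y, breaks = seq(0, 500, by = cell), include.lowest = TRUE)
  counts <- as.data.frame(table(ix = ix, iy = iy))
  names(counts)[names(counts) == "Freq"] <- "n"
  counts
}

coarse <- count_cells(pts, cell = 100) # 10 x 5 = 50 cells
fine <- count_cells(pts, cell = 25)    # 40 x 20 = 800 cells
c(nrow(coarse), nrow(fine))
#> [1]  50 800
```
]

The counts `n` would then be modeled with a count family (e.g., Poisson with log cell area as an offset), and every result is conditional on the chosen `cell` size.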