class: center, middle, inverse, title-slide # Spatial modeling of presence-only data ## NOAA PSAW Seminar Series ### March 9, 2022 --- <!-- Build with: xaringan::inf_mr() --> # Spatial patterning of trees: bei dataset <img src="06-presence-only_files/figure-html/plot-trees-1.png" width="700px" style="display: block; margin: auto;" /> --- # Pseudo-absences * Need to generate 0s: how? * quadrature points ([Renner et al. 2015](https://doi.org/10.1111/2041-210X.12352)) * Strategy? * generate regularly spaced or random * or generate higher density where environmental variability high * How many? * Large enough so that predictive performance does not change as more are added --- # Pseudo-absences from sdmTMB * uniform grid strategy * this creates ~ 20000 points ```r res <- 5 zeros <- expand.grid( x = seq(0, 1000, by = res), y = seq(0, 500, by = res) ) ``` --- # Bind the observed and pseudo-zeros together .small[ ```r dat$present <- 1 zeros$present <- 0 all_dat <- rbind(dat, zeros) mesh <- make_mesh( all_dat, xy_cols = c("x", "y"), cutoff = 25 # min. distance between knots in X-Y units ) mesh$mesh$n # extract number of vertices/knots #> [1] 678 ``` ] --- # Combined data * blue dots are data; red grid dots are quadrature points * grey triangles are from the SPDE mesh .small[ <img src="06-presence-only_files/figure-html/mesh-viz-1.png" width="700px" style="display: block; margin: auto;" /> ] --- # Infinitely Weighted Logistic Regression (IWLR) * [Fithian & Hastie (2013)](https://doi.org/10.1214/13-AOAS667) ```r # results sensitive to choice of nW nW <- 1.0e6 all_dat$wt <- nW^(1 - all_dat$present) ``` * weights can be passed into model of choice * `glm()`, `glmmTMB()`, etc. * these options don't deal with spatial correlation/latent spatial processes * adding random fields makes this a "spatial log-Gaussian Cox process" --- # IWLR & sdmTMB * convergence may be affected by size of pseudo-absences ```r fit <- sdmTMB( present ~ 1, data = all_dat, mesh = mesh, family = binomial(link = "logit"), * weights = all_dat$wt ) ``` --- # Inspecting the model output * But intercept and log-likelihood affected by `nW` ([Renner et al. 2015](https://doi.org/10.1111/2041-210X.12352)) .small[ ```r summary(fit) #> Spatial model fit by ML ['sdmTMB'] #> Formula: present ~ 1 #> Mesh: mesh #> Data: all_dat #> Family: binomial(link = 'logit') #> coef.est coef.se #> (Intercept) -16.7 0.26 #> #> Matern range: 103.93 #> Spatial SD: 1.63 #> ML criterion at convergence: 57397.974 #> #> See ?tidy.sdmTMB to extract these values as a data frame. ``` ] --- # Downweighted Poisson Regression (DWPR) * Similar to IWLR, uses different weights * Doesn't have same arbitrary effects on intercept, likelihood ```r # small values at presence locations all_dat$wt <- 1e-6 # pseudo-absences: area per quadrature point tot_area <- diff(range(dat$x)) * diff(range(dat$y)) n_zeros <- length(which(all_dat$present == 0)) all_dat$wt <- ifelse(all_dat$present == 1, 1e-6, tot_area / n_zeros ) ``` --- # DWPR & sdmTMB ```r fit <- sdmTMB( present / wt ~ 1, data = all_dat, mesh = mesh, family = poisson(link = "log"), * weights = all_dat$wt ) ``` --- # Inspecting the model output .small[ ```r summary(fit) #> Spatial model fit by ML ['sdmTMB'] #> Formula: present/wt ~ 1 #> Mesh: mesh #> Data: all_dat #> Family: poisson(link = 'log') #> coef.est coef.se #> (Intercept) -6.09 0.26 #> #> Matern range: 103.93 #> Spatial SD: 1.63 #> ML criterion at convergence: 65335.224 #> #> See ?tidy.sdmTMB to extract these values as a data frame. ``` ] --- # Plotting spatial random effects .small[ <img src="06-presence-only_files/figure-html/plot-rf-1.png" width="700px" style="display: block; margin: auto;" /> ] --- # Predictions in link (log) space .xsmall[ <img src="06-presence-only_files/figure-html/plot-link-1.png" width="700px" style="display: block; margin: auto;" /> ] --- # Predictions in natural space .xsmall[ <img src="06-presence-only_files/figure-html/plot-norm-1.png" width="700px" style="display: block; margin: auto;" /> ] --- # Quantifying predictive performance * Lots of options for binary data, AUC common * Values near 0.5 ~ random, want close to 1 ```r all_dat$p <- predict(fit)$est # total predictions, logit rocr <- ROCR::prediction(exp(all_dat$p), all_dat$present) ROCR::performance(rocr, measure = "auc")@y.values[[1]] #> [1] 0.8155031 ``` --- # Does adding more pseudo-absences improve performance? * Increase 0s from ~ 20K to 30K * AUC similar ``` #> [1] 0.8162975 ``` --- # Does decreasing pseudo-absences worsen things? * Decrease 0s from ~ 20K to 5K * AUC similar * So, 20K probably overkill for this mesh ``` #> [1] 0.8148655 ``` --- # What about using a higher resolution mesh? * Change cutoff from 25 to 15 * Knots change from ~700 to ~1750 * Marginal gains in AUC with finer mesh * Note: it's not adding more pseudo-absences but changing the mesh that's more important here ``` #> [1] 0.8451724 ``` --- # Benefits of pseudo-absence modeling * Estimate of spatial range isn't sensitive to choice of raster / lattice resolution --- # How is the result different from the truncated negative binomial? <img src="06-presence-only_files/figure-html/compare-fig-1.png" width="500px" style="display: block; margin: auto;" />