Implements geostatistical models of trawl or longline survey data. The models can be fit with INLA, sdmTMB, or glmmfields (currently disabled); INLA and glmmfields provide Bayesian inference.

fit_survey_sets(dat, years, survey = NULL,
  density_column = "density_kgpm2", chains = 4, iter = 1000,
  max_knots = 20, adapt_delta = 0.95, thin = 1,
  mcmc_posterior_samples = 150, required_obs_percent = 0.05,
  utm_zone = 9, model = c("inla", "sdmTMB", "glmmfields"),
  include_depth = TRUE, survey_boundary = NULL, premade_grid = NULL,
  tmb_knots = 200, inla_knots_pos = 75, inla_knots_bin = 100,
  gamma_scaling = 1000, cell_width = 2, ...)

Arguments

dat

Output from get_survey_sets().

years

The year to include in the model. Should be a single year.

survey

The survey abbreviation. Should match the contents of the column survey_abbrev in the data frame returned by get_survey_sets().

density_column

The name of the column that contains the relative biomass density to use. For example, "density_kgpm2" for trawl surveys or "density_ppkm2" for longline surveys.

chains

The number of MCMC chains. Only applies to the glmmfields model.

iter

The number of MCMC iterations per chain. Only applies to the glmmfields model.

max_knots

The maximum number of knots to use in the predictive process approximation in the glmmfields model. If this number is larger than the number of data points then the number of knots is set to the number of data points minus 2. Only applies to the glmmfields model.
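The knot-capping rule above can be restated as a small R function. This is a minimal sketch that only encodes the documented rule; the helper name n_knots_for() is hypothetical and is not part of gfplot.

```r
# Hypothetical helper (not part of gfplot): restates the documented rule
# for choosing the number of predictive-process knots in glmmfields.
n_knots_for <- function(n_obs, max_knots = 20) {
  if (max_knots > n_obs) {
    # More knots requested than data points: use data points minus 2.
    n_obs - 2
  } else {
    max_knots
  }
}

n_knots_for(100) # 20
n_knots_for(15)  # 13
```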

adapt_delta

Value to pass to rstan::sampling(). Values closer to 1 make smaller steps in the Hamiltonian MCMC at the expense of speed. Only applies to the glmmfields model.

thin

Value to pass to rstan::sampling() to thin the MCMC samples. Only applies to the glmmfields model.

mcmc_posterior_samples

The number of final posterior samples to return. Applies to both glmmfields and INLA.

required_obs_percent

The minimum fraction of sets with positive catches required before a model is fit.
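A minimal sketch of the threshold check, assuming the fraction is computed as the share of sets with density greater than zero and compared with >=. The helper name enough_positive_sets() and the example densities are hypothetical; the exact comparison gfplot uses is an assumption.

```r
# Hypothetical check (an assumption about how the threshold is applied):
# fit only if the fraction of sets with positive density meets the threshold.
enough_positive_sets <- function(density, required_obs_percent = 0.05) {
  mean(density > 0) >= required_obs_percent
}

# Made-up densities: 2 positive sets out of 50 (a fraction of 0.04).
dens <- c(rep(0, 48), 0.01, 0.2)
enough_positive_sets(dens)                              # FALSE
enough_positive_sets(dens, required_obs_percent = 0.02) # TRUE
```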

utm_zone

The UTM zone to perform the modeling in. Defaults to zone 9.

model

The backend software used to fit the model. Options are "inla", "sdmTMB", or "glmmfields" (currently disabled until it is on CRAN). INLA approximates the posterior with the integrated nested Laplace approximation rather than sampling it; the results are usually quite good, and INLA is likely to be considerably faster than glmmfields, especially for larger data sets or many knots.
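Because model is declared as a character vector of choices, the first element ("inla") is presumably the default. The sketch below shows the usual R idiom for resolving such an argument with match.arg(); whether fit_survey_sets() uses match.arg() internally is an assumption, and pick_model() is a hypothetical stand-in.

```r
# Sketch of the common R idiom for a choice argument like `model`.
# Assumption: fit_survey_sets() resolves `model` in a similar way.
pick_model <- function(model = c("inla", "sdmTMB", "glmmfields")) {
  match.arg(model)
}

pick_model()         # "inla" (the first option is the default)
pick_model("sdmTMB") # "sdmTMB"
```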

include_depth

Logical: should depth be included as a predictor? If FALSE then the model will only have a spatial random field as the predictor. Currently only applies to the INLA model.

survey_boundary

If not NULL, a data frame with the survey boundary defined in columns X and Y in longitude and latitude coordinates. If NULL, the function searches for a matching element in the included data object gfplot::survey_boundaries based on the survey argument (after removing "SYN" from the name).
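A minimal sketch of both cases, under stated assumptions. The lookup key is derived by removing "SYN" and trimming whitespace; the exact string handling in gfplot is an assumption, and boundary_key() is a hypothetical helper. The boundary coordinates are made up purely to show the expected X/Y column layout.

```r
# Hypothetical helper: lookup key for gfplot::survey_boundaries after
# removing "SYN" from the survey abbreviation (string handling assumed).
boundary_key <- function(survey) {
  trimws(gsub("SYN", "", survey, fixed = TRUE))
}
boundary_key("SYN QCS") # "QCS"

# A user-supplied boundary: X = longitude, Y = latitude.
# These coordinates are made up for illustration only.
my_boundary <- data.frame(
  X = c(-130, -129, -129, -130),
  Y = c(51, 51, 52, 52)
)
# Then pass survey_boundary = my_boundary to fit_survey_sets().
```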

premade_grid

If not NULL, a list object with an element grid, a data frame with columns X, Y, and depth, and an element cell_area, a single numeric value giving the area of each grid cell (in square kilometers). The package includes a survey grid for the HBLL surveys in gfplot::hbll_grid.
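The list structure described above can be built by hand. In this sketch the coordinates, depths, and the 4 square-kilometer cell area (a 2 km by 2 km cell) are placeholder values, not real survey data.

```r
# Sketch of the list structure expected by `premade_grid`.
# All values below are made-up placeholders.
grid_df <- expand.grid(
  X = seq(400, 404, by = 2),
  Y = seq(5700, 5704, by = 2)
)
grid_df$depth <- 50 + seq_len(nrow(grid_df)) # placeholder depths

my_grid <- list(grid = grid_df, cell_area = 4)
str(my_grid)
# Then pass premade_grid = my_grid to fit_survey_sets().
```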

tmb_knots

The number of knots to pass to sdmTMB::sdmTMB().

inla_knots_pos

The number of knots for the positive component model if fit with INLA.

inla_knots_bin

The number of knots for the binary component model if fit with INLA.

gamma_scaling

A value to multiply the positive densities by internally before fitting the Gamma GLMM. The predictions are then divided by this value internally to return predictions on the original scale. This is done because values that are too small can create computational problems for INLA.
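The scaling round trip can be illustrated with toy numbers. No model is fit here: the scaled values stand in for model predictions, so the sketch only shows that multiplying before fitting and dividing afterwards returns values on the original scale.

```r
# Sketch of the documented scaling round trip (toy numbers, no model fit).
gamma_scaling <- 1000
density <- c(0.00012, 0.0031, 0.045) # small positive densities

scaled <- density * gamma_scaling    # values the Gamma GLMM would see
pred_scaled <- scaled                # stand-in for model predictions
pred <- pred_scaled / gamma_scaling  # back on the original scale

all.equal(pred, density) # TRUE
```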

cell_width

The cell width if a prediction grid is made on the fly.
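A hypothetical sketch of how a regular grid could be built from cell_width. It assumes coordinates in kilometers (for example UTM km) and a rectangular extent; gfplot's own grid construction may differ (for instance by clipping to the survey boundary and adding depth), and make_grid_sketch() is not a gfplot function.

```r
# Hypothetical sketch: a regular grid spanning the data extent.
# Assumes kilometer coordinates; not gfplot's exact implementation.
make_grid_sketch <- function(x, y, cell_width = 2) {
  xs <- seq(min(x), max(x), by = cell_width)
  ys <- seq(min(y), max(y), by = cell_width)
  list(grid = expand.grid(X = xs, Y = ys), cell_area = cell_width^2)
}

g <- make_grid_sketch(x = c(300, 310), y = c(5600, 5606), cell_width = 2)
nrow(g$grid) # 24 (6 X values times 4 Y values)
g$cell_area  # 4
```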

...

Any other arguments to pass on to the modelling function.

Examples

set.seed(123)
# pop_surv <- get_survey_sets("pacific ocean perch")
# or use built-in data:
fit <- fit_survey_sets(pop_surv, years = 2015, survey = "SYN QCS")
#> Preloading interpolated depth for prediction grid...
#> Predicting density onto grid...
#> INLA max_edge = c(20, 100)
#> INLA max_edge = c(20, 100)
names(fit)
#> [1] "predictions" "data" "models" "survey" "years"
plot_survey_sets(fit$predictions, fit$data, fill_column = "combined")