Get all data — get_all_survey

These functions get all survey set or sample data for a set of species by major area, activity, or specific surveys. The main functions in this package focus on retrieving the more commonly used types of data and are often limited to sets and samples that conform to current design-based standards and survey grids. These functions, by contrast, retrieve everything and therefore require careful consideration of which data types are reasonable to include for a given purpose. For this reason these functions return a lot of columns, although the exact number depends on which types of surveys are being returned.

Usage

get_all_survey_samples(
  species,
  ssid = NULL,
  major = NULL,
  usability = NULL,
  unsorted_only = FALSE,
  random_only = FALSE,
  grouping_only = FALSE,
  keep_all_ages = FALSE,
  include_event_info = FALSE,
  include_activity_matches = FALSE,
  remove_bad_data = TRUE,
  remove_duplicates = TRUE,
  return_dna_info = FALSE,
  return_specimen_type = FALSE,
  drop_na_columns = TRUE,
  quiet_option = "message"
)

get_all_survey_sets(
  species,
  ssid = NULL,
  major = NULL,
  years = NULL,
  join_sample_ids = FALSE,
  remove_false_zeros = TRUE,
  remove_bad_data = TRUE,
  remove_duplicates = TRUE,
  include_activity_matches = FALSE,
  usability = NULL,
  grouping_only = FALSE,
  drop_na_columns = TRUE,
  quiet_option = "message"
)

Arguments

species: One or more species common names (e.g. "pacific ocean perch") or one or more species codes (e.g. 396). Species codes can be specified as numeric vectors c(396, 442) or characters c("396", "442"). Numeric values shorter than 3 digits will be expanded to 3 digits and converted to character objects (1 turns into "001"). Species common names and species codes should not be mixed. If any element is missing a species code, then all elements will be assumed to be species common names. Does not work with non-numeric species codes, so in those cases the common name will be needed.
ssid: A numeric vector of survey series IDs. Run get_ssids() for a look-up table of available survey series IDs with survey series descriptions. Default is to return all data from all surveys. Some of the most useful IDs include: contemporary trawl (1, 3, 4, 16), historic trawl (2), IPHC (14), sablefish (35), and HBLL (22, 36, 39, 40).
major: Character string (or character vector) of major stat area code(s) to include. Use get_major_areas() to look up area codes with descriptions. Default is NULL.
usability: A vector of usability codes to include. Defaults to NULL, but a typical set for a design-based trawl survey index is c(0, 1, 2, 6). IPHC codes may differ from those of other surveys, and the modern Sablefish survey doesn't seem to assign usabilities.
unsorted_only: Defaults to FALSE, which will return all specimens collected on research trips. TRUE returns only unsorted (1) and NA specimens for both species_category_code and sample_source_code.
random_only: Defaults to FALSE, which will return all specimens collected on research trips. TRUE returns only randomly sampled specimens (sample_type_code = 1, 2, 6, 7, or 8).
grouping_only: Defaults to FALSE, which will return all specimens or sets collected on research trips. TRUE returns only sets or specimens from fishing events with grouping codes that match that expected for a survey. Can also be achieved by filtering for specimens where !is.na(grouping_code).
keep_all_ages: Defaults to FALSE, which keeps only ages determined with standard methods for all surveys other than the NMFS Triennial.
include_event_info: Logical for whether to append all relevant fishing event info (location, timing, effort, catch, etc.). Defaults to FALSE.
include_activity_matches: Logical for whether to get all surveys with activity codes that match the chosen ssids. Defaults to FALSE.
remove_bad_data: Remove known bad data, such as unrealistic length or weight values and duplications due to trips that include multiple surveys. Default is TRUE.
remove_duplicates: Logical for whether to remove duplicated event records due to overlapping survey stratifications when original_ind = 'N'. Default is TRUE. Setting this to FALSE is only possible when ssids are supplied and activity matches aren't included; otherwise it is turned on automatically.
return_dna_info: Should DNA container ids and sample type be returned? This can create duplication of specimen ids for some species. Defaults to FALSE.
return_specimen_type: Should non-otolith structure types be returned? This can create duplication of specimen ids for some species. Defaults to FALSE.
drop_na_columns: Logical for removing all columns that only contain NAs. Defaults to TRUE.
quiet_option: Default option, "message", suppresses messages from sections of code with lots of join_by messages. Any other string will allow messages.
years: Default is NULL, which returns all years.
join_sample_ids: This option was problematic, so now reverts to FALSE.
remove_false_zeros: Default of TRUE will make sure weights > 0 don't have associated counts of 0 and vice versa. Mostly useful for trawl data where counts are only taken for small catches.
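
The padding rule described for the species argument can be sketched in base R. This is only an illustration of the documented behaviour, not the package's internal code; pad_species_code() is a hypothetical helper name.

```r
# Hypothetical helper (not part of the package): mimics the documented
# handling of species codes versus common names.
pad_species_code <- function(x) {
  x <- as.character(x)
  is_code <- grepl("^[0-9]+$", x)
  # If any element is not a numeric code, treat all elements as common names
  if (!all(is_code)) {
    return(x)
  }
  # Left-pad numeric codes with zeros to at least three characters
  sprintf("%03d", as.integer(x))
}

pad_species_code(1)                     # "001"
pad_species_code(c(396, 442))           # "396" "442"
pad_species_code("pacific ocean perch") # returned unchanged
```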

Examples

if (FALSE) { # \dontrun{
## Import survey catch density and location data by tow or set for plotting
## Specify single or multiple species by common name or species code and
## single or multiple survey series id(s).
## Notes:
## `area_km` is the stratum area used in design-based index calculation.
## `area_swept` is in m^2 and is used to calculate density for trawl surveys.
## It is based on `area_swept1` (`doorspread_m` x `tow_length_m`) except
## when `tow_length_m` is missing, and then we use `area_swept2`
## (`doorspread_m` x `duration_min` x `speed_mpm`).
## `duration_min` is derived in the SQL procedure "proc_catmat_2011" and
## differs slightly from the difference between `time_deployed` and
## `time_retrieved`.
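##
## Possible calls, sketched from the Usage signatures above. The species,
## ssid, and usability values are illustrative picks from the argument
## descriptions; running them assumes a working database connection.
d_sets <- get_all_survey_sets(
  species = "pacific ocean perch",
  ssid = c(1, 3, 4, 16),
  usability = c(0, 1, 2, 6)
)
d_samples <- get_all_survey_samples(
  species = c(396, 442),
  ssid = c(1, 3, 4, 16),
  include_event_info = TRUE
)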
} # }
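
The `area_swept` fallback described in the notes above can be illustrated with a small base R sketch. The data frame and its values are invented for the example, and the code only mirrors the rule as stated in the notes; it is not taken from the package source.

```r
# Toy data with made-up values; column names follow the notes above.
sets <- data.frame(
  doorspread_m = c(60, 62, 58),
  tow_length_m = c(1800, NA, 2000),
  duration_min = c(20, 21, 22),
  speed_mpm    = c(90, 88, 91)
)

# area_swept1: doorspread x tow length (m^2)
sets$area_swept1 <- sets$doorspread_m * sets$tow_length_m
# area_swept2: doorspread x duration x speed (m^2)
sets$area_swept2 <- sets$doorspread_m * sets$duration_min * sets$speed_mpm
# Use area_swept1 unless tow_length_m is missing, then fall back to area_swept2
sets$area_swept <- ifelse(
  is.na(sets$tow_length_m),
  sets$area_swept2,
  sets$area_swept1
)

sets[, c("tow_length_m", "area_swept")]
```

Only the second row, where `tow_length_m` is NA, takes its value from `area_swept2`.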