Visualise and calculate life history parameters for fisheries science using ggplot2. R package version 0.3.0
Overview
The ggFishPlots package for R allows quickly plotting and calculating life history parameters required by stock assessment models. The package uses ggplot2 for plotting and the tidyverse packages for calculations.
The ggOceanMaps package has been developed by the Institute of Marine Research. Note that the package comes with absolutely no warranty. Any bug reports and code fixes are warmly welcomed. See Contributions for further details.
If you are looking for other similar packages, make sure to check out the AquaticLifeHistory, FSA, and fishmethods packages, as well as the FishR webpage.
Installation
The most up to date version of the package can always be found from GitHub. If the CRAN version does not work as shown in the examples on this website, try installing the GitHub version. You can do this by using the devtools or remotes packages.
remotes::install_github("DeepWaterIMR/ggFishPlots")
Usage
The package has a website. At the time of writing, the package produces four kinds of plots: growth curves, maturity plots, length-weight relationships, and catch curves. Each function returns a ggplot2 plot and the estimated parameters as a text string that can be used in Rmarkdown and Shiny applications as well as a data frame for further use of the parameters. The elements are returned as a list. The package contains example data to illustrate the functionality.
Growth curves
Note how the text
and params
are returned as a list together with plot
.
library(ggFishPlots)
data(survey_ghl) # example data
head(survey_ghl)
#> # A tibble: 6 × 5
#> age sex length weight maturity
#> <dbl> <chr> <dbl> <dbl> <int>
#> 1 NA <NA> 35 NA 0
#> 2 NA <NA> 43 NA 0
#> 3 NA <NA> 51 NA 0
#> 4 NA <NA> 31 NA 0
#> 5 NA <NA> 32 NA 0
#> 6 NA <NA> 32 NA 0
plot_growth(survey_ghl, length = "length", age = "age")
#> $plot
#>
#> $text
#> [1] "von Bertalanffy growth function coefficients: \n Linf (asymptotic average length) = 91.2 cm +/- 88.3 - 94.6 (95% CIs) \n K (growth rate coefficient) = 0.0633 +/- 0.059 - 0.068 (95% CIs) \n t0 (age at length 0) = -3.04 (years) +/- -3.337 - -2.769 (95% CIs) \n tmax (life span; t0 + 3/K) = 44.4 years \n Number of included specimens = 10401 \n Total number of measured = 618779 \n Excluded (length or age missing): \n Length = 0; age = 608378"
#>
#> $params
#> # A tibble: 3 × 7
#> term estimate std.error statistic p.value conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Sinf 91.2 1.51 60.3 0 88.3 94.6
#> 2 K 0.0633 0.00231 27.4 4.90e-160 0.0586 0.0680
#> 3 t0 -3.04 0.139 -21.8 1.71e-103 -3.34 -2.77
The text
object can be rendered to R markdown documents using the results = 'asis'
setting in the code chunk header (i.e. {r, results = 'asis'}
) and the cat()
function after replacing "\n"
by "\ n"
:
htmlcat <- function(text){
cat(gsub(pattern = "\n", replacement = " \n", x = text))
}
htmlcat(plot_growth(survey_ghl)$text)
von Bertalanffy growth function coefficients:
Linf (asymptotic average length) = 91.2 cm +/- 88.3 - 94.6 (95% CIs)
K (growth rate coefficient) = 0.0633 +/- 0.059 - 0.068 (95% CIs)
t0 (age at length 0) = -3.04 (years) +/- -3.337 - -2.769 (95% CIs)
tmax (life span; t0 + 3/K) = 44.4 years
Number of included specimens = 10401
Total number of measured = 618779
Excluded (length or age missing):
Length = 0; age = 608378
Split by sex
Specifying length
, age
and sex
arguments have been omitted since they are the argument names by default and the same than in the example data. Only the plot
element is returned this time.
plot_growth(survey_ghl, split.by.sex = TRUE)$plot
The dashed lines are Sinf. Data behind the growth curves are shown as box plots by default. It is possible to plot the data as points by defining boxplot = FALSE
. We can also force zero group into the curves if know the length of it. Assumed as 14 cm here. Zero group forcing strength is 10 % of number of observations by default and can be adjusted using the force.zero.group.strength
argument.
plot_growth(survey_ghl, force.zero.group.length = 14, boxplot = FALSE)$plot
Maturity plots
L50 plots
Maturity ogives are estimated using a logistic (family = binomial(link = "logit")
) glm()
plot_maturity(survey_ghl, length = "length", maturity = "maturity")
#> $plot
#>
#> $text
#> [1] "50% maturity at length (L50) based on logit regressions:\n54.784 cm. 95% confidence intervals: 52.852 - 56.787\n Number of specimens: 64265.\n Confidence intervals estimated from the glm object."
#>
#> $params
#> mean ci.min ci.max sex intercept slope n
#> 1 54.78361 52.85249 56.787 both -5.755492 0.1050587 64265
The error bars represent 95% confidence intervals calculated from the model object using the confint()
function and back-transformed to the original scale. The grey stepped line is a binned average defined using the length.bin.width
argument.
The function also contains an option to bootstrap the confidence intervals (CIs) which will produce narrower CIs. Bootstrapping is probably a more correct way of estimating CIs in this application. Using only 10 replicates here to save processing time. In real application use at least 1000.
plot_maturity(survey_ghl, bootstrap.n = 10)
#> $plot
#>
#> $text
#> [1] "50% maturity at length (L50) based on logit regressions:\n54.812 cm. 95% confidence intervals: 54.584 - 54.924\n Number of specimens: 64265\n\n Confidence intervals estimated using 10 bootstrap replicates."
#>
#> $params
#> mean ci.min ci.max sex intercept slope n
#> 1 54.81204 54.58416 54.92397 both -5.755492 0.1050587 64265
Split by sex
plot_maturity(survey_ghl, split.by.sex = TRUE)$plot
A50 plots
The same principle can be used to produce A50 (50% maturity at age) plots:
plot_maturity(survey_ghl, length = "age", length.unit = "years",
xlab = "Age", length.bin.width = 1, split.by.sex = TRUE)$plot
Addition of juveniles
Also the plot_maturity()
function has the option to add juveniles (zero group fish). The addition of juveniles can be necessary for to make the glm()
to converge if small immature fish are scarce in the dataset.
plot_maturity(survey_ghl, length = "age", length.unit = "years",
xlab = "Age", length.bin.width = 1,
force.zero.group.length = 0,
force.zero.group.strength = 100,
split.by.sex = TRUE)$plot
Note how the addition increases the estimate for males because the dataset did not contain a sufficient number of young males which the number of females seems sufficient because females mature older in this species.
One option can be to estimate the number of added juveniles using the plot_catchcurve()
function:
plot_maturity(survey_ghl, length = "age", length.unit = "years",
xlab = "Age", length.bin.width = 1,
force.zero.group.length = 0,
force.zero.group.n = c("F" = exp(11.363), "M" = exp(11.885)),
split.by.sex = TRUE)$plot
Length-weight relationships
Log-linearized models
Simple plot using log-transformation and linear models by default.
plot_lw(survey_ghl, length = "length", weight = "weight")
#> $plot
#>
#> $text
#> [1] "Logarithm transformed linear length-weight model. Not splitted by sex: \n a = 3.8236e-06 +/- 3.7869e-06 - 3.8606e-06 (95% CIs). \n b = 3.221 +/- 3.22 - 3.22 (95% CIs). \n Length in cm and weight in kg \n Number of included specimens = 67457 \n Total number of measured = 618779 \n Excluded (data missing): \n Length = 0; weight = 551322; outlier = 0"
#>
#> $params
#> # A tibble: 2 × 14
#> term estimate std.error statistic p.value conf.low conf.high r.squared
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 0.00000382 0.00491 -2540. 0 0.00000379 0.00000386 0.989
#> 2 b 3.22 0.00128 2519. 0 3.22 3.22 0.989
#> # ℹ 6 more variables: AIC <dbl>, nobs <int>, length <chr>, length.unit <chr>,
#> # weight <chr>, weight.unit <chr>
The dashed lines represent 95% confidence intervals.
Non-linear least square models
Use non-linear least squares instead:
plot_lw(survey_ghl, use.nls = TRUE)
#> $plot
#>
#> $text
#> [1] "Nonlinear least squares length-weight model. Not splitted by sex: \n a = 1.7268e-06 +/- 1.6962e-06 - 1.758e-06 (95% CIs). \n b = 3.419 +/- 3.42 - 3.42 (95% CIs). \n Length in cm and weight in kg \n Number of included specimens = 67457 \n Total number of measured = 618779 \n Excluded (data missing): \n Length = 0; weight = 551322; outlier = 0"
#>
#> $params
#> # A tibble: 2 × 13
#> term estimate std.error statistic p.value conf.low conf.high AIC nobs
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 a 0.00000173 1.56e-8 110. 0 1.70e-6 1.76e-6 -27925. 67457
#> 2 b 3.42 2.15e-3 1590. 0 3.42e+0 3.42e+0 -27925. 67457
#> # ℹ 4 more variables: length <chr>, length.unit <chr>, weight <chr>,
#> # weight.unit <chr>
Split by sex
The decimal point of a and b estimates depends on the length and weight units. FishBase uses centimeters and grams. The function can correct for the units when asked (but length.unit
and weight.unit
parameters have to be defined correctly).
plot_lw(survey_ghl, split.by.sex = TRUE, correct.units = TRUE)
#> $plot
#>
#> $text
#> [1] "Logarithm transformed linear length-weight model for females and males, respectively: \n a = 0.0035 +/- 0.00344 - 0.00355 (95% CIs) and 0.00507 +/- 0.00497 - 0.00517 (95% CIs). \n b = 3.247 +/- 3.24 - 3.25 (95% CIs) and 3.143 +/- 3.14 - 3.15 (95% CIs). \n Length in cm and weight in g \n Number of included specimens = 34889 and 30354 \n Total number of measured = 618779 \n Excluded (data missing): \n Length = 0; weight = 456891; sex = 96645; outlier = 0"
#>
#> $params
#> # A tibble: 4 × 15
#> # Groups: sex [2]
#> sex term estimate std.error statistic p.value conf.low conf.high r.squared
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 F a 0.00350 0.00765 -739. 0 0.00344 0.00355 0.988
#> 2 F b 3.25 0.00195 1666. 0 3.24 3.25 0.988
#> 3 M a 0.00507 0.0101 -525. 0 0.00497 0.00517 0.979
#> 4 M b 3.14 0.00265 1188. 0 3.14 3.15 0.979
#> # ℹ 6 more variables: AIC <dbl>, nobs <int>, length <chr>, length.unit <chr>,
#> # weight <chr>, weight.unit <chr>
You can also transform the parameters according to the formulas given in the FishBase.
plot_lw(survey_ghl %>% dplyr::mutate(weight = weight*1000), weight.unit = "g")$params
#> # A tibble: 2 × 14
#> term estimate std.error statistic p.value conf.low conf.high r.squared
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 0.00382 0.00491 -1133. 0 0.00379 0.00386 0.989
#> 2 b 3.22 0.00128 2519. 0 3.22 3.22 0.989
#> # ℹ 6 more variables: AIC <dbl>, nobs <int>, length <chr>, length.unit <chr>,
#> # weight <chr>, weight.unit <chr>
Log-log axes to see differences
plot_lw(survey_ghl, split.by.sex = TRUE, log.axes = TRUE)$plot
Outlier removal
plot_lw(survey_ghl, outlier.percentile = 99.5, annotate.coefficients = TRUE)$plot
Catch curves to estimate instantaneous total mortality (Z)
Catch curves were added to the version 0.2.3 and may not be available in the CRAN version. A FishR tutorial written by Ogle (2013) contains a nice explanation of catch curves. At the time of writing ggFishPlots calculates only the simple log-linearised regression.
plot_catchcurve(survey_ghl)
#> $plot
#>
#> $text
#> [1] "Instantenous total mortality (Z) estimated using a catch curve and\nage range .\n\nZ = 0.19 (0.123-0.257 95% CIs)\nN at age 0 = 1448 (449-4674 95% CIs)\nLongevity = 38.3 years (23.8 - 68.8 95% CIs)\n\n"
#>
#> $params
#> # A tibble: 2 × 8
#> sex term estimate std.error statistic p.value conf.low conf.high
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 both (Intercept) 7.28 0.573 12.7 2.25e-13 6.11 8.45
#> 2 both age -0.190 0.0328 -5.79 2.85e- 6 -0.257 -0.123
The ages to be included to the Z estimation can be adjusted using the age.range
argument.
plot_catchcurve(survey_ghl, age.range = c(10,26))$plot
In the plot above, -b is Z (i.e. Z = 0.356) and exp(a) (i.e. 68391) is the number of 0 age fish assuming constant mortality.
Split by sex
plot_catchcurve(survey_ghl, age.range = c(10,26), split.by.sex = TRUE)$plot
Use a named list to use separate age ranges for females and males.
tmp <- plot_catchcurve(survey_ghl,
age.range = list("F" = c(13,26), "M" = c(10,26)),
split.by.sex = TRUE)
tmp$plot
Parameter estimates for the catch curve above:
htmlcat(tmp$text)
Instantenous total mortality (Z) estimated using a catch curve and
age range for females and for males.
Females:
Z = 0.37 (0.31-0.431 95% CIs)
N at age 0 = 86119 (25990-285354 95% CIs)
Longevity = 30.7 years (23.6 - 40.5 95% CIs)
Males:
Z = 0.511 (0.492-0.53 95% CIs)
N at age 0 = 145002 (105241-199785 95% CIs)
Longevity = 23.3 years (21.8 - 24.8 95% CIs)
Citations and data sources
The data used in the package are a property of the Institute of Marine Research and the Norwegian Government. They are distributed under the Creative Commons (CCBY or NLOD) licenses allowing free use as long as the source (IMR) is cited. We ask any user to refer to the package if plots or estimates are used in reports or scientific articles. For up-to-date citation information, please use:
citation("ggFishPlots")
#> To cite package 'ggFishPlots' in publications use:
#>
#> Vihtakari M (2024). _ggFishPlots: Visualise and Calculate Life
#> History Parameters for Fisheries Science using 'ggplot2'_. R package
#> version 0.3.0, <https://deepwaterimr.github.io/ggFishPlots/>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {ggFishPlots: Visualise and Calculate Life History Parameters for Fisheries Science using 'ggplot2'},
#> author = {Mikko Vihtakari},
#> year = {2024},
#> note = {R package version 0.3.0},
#> url = {https://deepwaterimr.github.io/ggFishPlots/},
#> }
Contributions
Any contributions to the package are more than welcome. Please contact the package maintainer Mikko Vihtakari (mikko.vihtakari@hi.no) to discuss your ideas on improving the package. Bug reports and corrections should be submitted directly to the GitHub site. Please include a minimal reproducible example. Considerable contributions to the package development will be credited with an authorship.