Reporting reproducibile analyses

Introduction

One of the main motivations in creating respR was to promote open, reproducible respirometry analyses.

Respirometry is ubiquitous in experimental biology, however there is a lack of standard approaches in analysing respirometry data. Reporting of respirometry analyses is often incomplete, vague or lacking in detail (Killen et al. 2021). This is sometimes understandable given journal word limits, however it hinders attempts to reproduce or scrutinise the data and analyses. In particular, many analysis criteria and parameters are under-reported. For example, the criteria used to select regions over which rates were determined, or in the case of maximum or minimum rates, the width used when fitting multiple regressions (Prinzing et al. 2021).

In the era of cheap data storage, and easy sharing of code and files online, there is little reason why all stages of analyses and raw data cannot be made available to reviewers, readers and other investigators as supplementary material or in specialised online repositories. We have designed respR to help make this process straightforward.

Readable, concise code

The functions in respR have been designed to have clear, unambiguous inputs describing the parameters used in analysis. Therefore, given some familiarity with the package functions, the code is very human-readable, and it is usually clear under what parameters an analysis was conducted. We have minimised the number of function operators where possible, mainly to streamline and simplify analyses, but this also has the side benefit of making input code less verbose and more readable.

This readable code, along with the fact that an end-to-end workflow could potentially consist of processing raw data through as few as three functions, means a respR analysis for a single experiment could consist of only a few lines of code. Combine this with piping and code is even more concise. This example uses the native |> pipes introduced in R v4.1):

urchins.rd |>                                            # Using the urchins data,
  inspect(1, 15) |>                                      # inspect columns 1 and 15, then
  calc_rate(from = 4, to = 29, by = "time") |>           # calculate rate between times 4 and 29
  adjust_rate(by = calc_rate.bg(urchins.rd, time = 1,    # calculate the background...
                                oxygen = 18:19)) |>      # ... and adjust rate
  convert_rate(oxy.unit = "mgl-1", time.unit = "m",     
               output.unit = "mg/s/kg", 
               volume = 1.09, mass = 0.19)               # and finally convert

Transparent, open analyses

While there is commercial software which can analyse respirometry data (e.g. AutoResp by Loligo), these are often poorly designed, complex, inefficient, and often prohibitively expensive and locked to single platforms, if not single computers. The main drawback of these is that they are not open-source, and so inherently prevent transparent and reproducible analyses.

Alternatively, many investigators have their own analysis workflows in Excel and R. These are perfectly fine and work well on an individual basis or within a lab, but even if the spreadsheet or code is shared, their custom nature means that reproducibility is difficult for an independent user or may require a significant time commitment. respR seeks to solve these problems.

We have designed respR to avoid it being a black box, into which data is fed and a result spit out with the user having little understanding or oversight into what is happening. We feel it important a user understands their data, which is why we include exploratory functions to visualise the data, look for common errors, and prepare it for analysis. We designed the functions to allow straightforward natural-language querying of different parts of the data in terms the user can understand, so they should always be aware of what they are doing. All the analytical functions also provide visualisations of the analysis and results. All output objects support the base R plot(), print() and summary() commands, which allow quick assessment of the most important outputs. As with all R packages and functions, the code underlying the analysis is easily viewable, so advanced users can investigate how it works for themselves. Through hosting the code on Github we encourage users to contribute to the code via Issues, Pull Requests, and Forking.

Documenting reproducible workflows

The easiest way of providing a reproducible workflow using respR is to include the raw data as a .csv or .rda file , and include a .R script containing all the code used to import and process that data file.

Because respR has been designed as an end-to-end workflow covering every step in the analysis of respirometry data, code for all stages from raw data input, data preparation and exploration, determining rates, and converting the rates to specific units can be included in a single file, and usually this will comprise only a few lines of code. There may be additional data inputs (experiment specific parameters such as volumes, masses, temperatures, etc.). These can be entered as values in the code, but ideally would be included as an additional data file, and most R users should be familiar with how to format these in good database-style formatting, and how to reference the relevant entries in code.

To demonstrate how brief an entire end-to-end respR workflow can be, here’s an example of a typical script which uses the sardine.rd dataset included in the package. We inspect the data, determine both the most linear rate, and the maximum rate over a 10 minute period, and finally convert rates to mass-specific units.

## Load the package
library(respR)

## Import data
## Here we would typically use import_file() to import a data file, 
## but sardine.rd is already loaded

## Experiment parameters (this might be an existing object containing data for many experiments)
exp_param <- data.frame(volume = 12.3, mass = 21.41, DO.unit = "mg/l", time.unit = "s")

## Inspect the data, and save it as an object
exp_1 <- inspect(sardine.rd)

## Determine rates - most linear, and highest rate over 15 mins (900 rows)
exp_1_linear_rate <- auto_rate(exp_1)
exp_1_max_rate <- auto_rate(exp_1, method = "highest", width = 900)

## Convert rates to mass-specific units using the experimental parameters
exp_1_linear_rate_conv <- convert_rate(exp_1_linear_rate, 
                                       output.unit = "mg/h/kg",
                                       oxy.unit = exp_param$DO.unit,
                                       time.unit = exp_param$time.unit,
                                       volume = exp_param$volume,
                                       mass = exp_param$mass)

exp_1_max_rate_conv <- convert_rate(exp_1_max_rate, 
                                    output.unit = "mg/h/kg",
                                    oxy.unit = exp_param$DO.unit,
                                    time.unit = exp_param$time.unit,
                                    volume = exp_param$volume,
                                    mass = exp_param$mass)

## Results
exp_1_linear_rate_conv
#> 
#> # print.convert_rate # ------------------
#> Rank 1 of 39 rates:
#> 
#> Input:
#> [1] -0.000661
#> [1] "mg/L" "sec" 
#> Converted:
#> [1] -1.37
#> [1] "mgO2/hr/kg"
#> 
#> To see other results use 'pos' input. 
#> To see full results use summary().
#> -----------------------------------------

exp_1_max_rate_conv
#> 
#> # print.convert_rate # ------------------
#> Rank 1 of 6614 rates:
#> 
#> Input:
#> [1] -0.00114
#> [1] "mg/L" "sec" 
#> Converted:
#> [1] -2.35
#> [1] "mgO2/hr/kg"
#> 
#> To see other results use 'pos' input. 
#> To see full results use summary().
#> -----------------------------------------

This shows how an entire analysis workflow can be conducted in only a few lines of code.

Saving single respR objects

For more documented analyses, for documenting parts of a workflow, or for saving the outputs of specific functions, we can use the base R saveRDS function. The main functions in respR output R objects which include raw data, results and also metadata in the form of subsetting criteria, models and input parameters. Thus, each of these objects contains all the information saved from the previous function that created it, and so the information required to continue through the respR workflow from any intermediate stage.

These objects can be saved using saveRDS and distributed like any other file. This can allow each stage of the analysis to be saved, and shared or archived. An advantage of using one of these objects is that an analysis can be continued from a certain point without having to re-run the entire workflow. For example, data can be imported, inspected, a rate determined through the auto_rate function, and the resulting output object exported via saveRDS. This could be shared with a colleague who can then import it into their own R project using readRDS, and feed it into convert_rate to convert it to whatever units they wish without having to run any other parts of the workflow.

## Running this code will show how saveRDS and readRDS preserves values 
## correctly, and gives the exact same results when used in further 
## stages of the workflow

## Inspect
urch_data <- inspect(urchins.rd, time = 1, oxygen = 15)

## Export resulting object to working directory
saveRDS(urch_data, file = "urch_data.rds")

## Re-import it to new object
urch_data_in <- readRDS(file = "urch_data.rds")

## Check values are preserved - should be TRUE
all.equal(urch_data, urch_data_in)

One example of using this method to document a reproducible workflow, would be to save the object created by inpect and include that along with the script for the rest of the analysis rather than including original data files. This object includes the raw data, but using it tells the next user the data has already been checked for common errors such as missing data, and if so these errors fixed, and so that step is unnecessary.

Saving all respR objects

To document or archive an entire working environment we can use the save.image() command. This function saves to the working directory every object and value in the current R environment (though not scripts, i.e. .R files.). This way, all objects from the analysis of a single experiment, or multiple experiments, can be saved to a single file.

## Will save entire environment to current working directory
save.image(file = "experiment_1.rda")

The environment can be reimported by double clicking on the file (usually RStudio will ask if you want to import it to the current project), or load() command.

## Will load entire environment to current working project. 
## Path to the file must be specified, or it must be in current working directory.
load(file = "experiment_1.rda")

This avoids having to save every object individually.

Conclusion

We hope this brief vignette has helped you understand how respR may be used to produce open and reproducible science from your respirometry data. As scientists, we have probably all wished we could get the raw data from a publication, or struggled to extract data from old plots and tables. So we encourage you to use our package, and when you come to submit your paper include your analysis as supplementary material to allow others to scrutinise and reproduce your results, and in the future utilise your data for new analyses. This benefits the science community as a whole.

If you do use respR in your analyses, please cite the publication (or run citation("respR")), and let us know. We would be happy to help promote and increase the visibility of your studies.

See vignette("citations") for a running list of projects which have used respR.