R/auto_rate.R
auto_rate.Rd
auto_rate
performs rolling regressions on a dataset to determine the most
linear, highest, lowest, maximum, minimum, rolling, and interval rates of
change in oxygen against time. A rolling regression of the specified width
is performed on the entire dataset, then based on the "method" input, the
resulting regressions are ranked or ordered, and the output summarised.
auto_rate(x, method = "linear", width = NULL, by = "row", plot = TRUE, ...)
x: data frame, or object of class inspect containing oxygen~time data.
method: string. "linear", "highest", "lowest", "maximum", "minimum",
"rolling" or "interval". Defaults to "linear". See Details.
width: numeric. Width of the rolling regression. For by = "row", either a
value between 0 and 1 representing a proportion of the data length, or an
integer of 2 or greater representing an exact number of rows. If by = "time"
it represents a time window in the units of the time data. If NULL, it
defaults to 0.2 or a window of 20% of the data length. See Details.
by: string. "row" or "time". Defaults to "row". Metric by which to apply the
width input if it is above 1.
plot: logical. Defaults to TRUE. Plot the results.
...: Allows additional plotting controls to be passed, such as pos, panel,
and quiet = TRUE.
Output is a list object of class auto_rate containing input parameters and
data, various summary data, metadata, linear models, and the primary output
of interest $rate, which can be background adjusted in adjust_rate or
converted to units in convert_rate.
Currently, auto_rate contains seven ranking and ordering algorithms that
can be applied using the method input:
linear: Uses kernel density estimation (KDE) to learn the shape of the
entire dataset and automatically identify the most linear regions of the
timeseries. This is achieved by using the smoothing bandwidth of the KDE to
re-sample the "peaks" in the KDE to determine linear regions of the data. The
summary output will contain only the regressions identified as coming from
linear regions of the data, ranked by order of the KDE density analysis. This
is present in the $summary component of the output as $density. Under this
method, the width input is used as a starting seed value, but the resulting
regressions may be of any width. See here for full details.
highest: Every regression of the specified width across the entire
timeseries is calculated, then ordered using absolute rate values from
highest to lowest. Essentially, this option ignores the sign of the rate, and
can only be used when rates all have the same sign. Rates will be ordered
from highest to lowest in the $summary table regardless of whether they are
oxygen uptake or oxygen production rates.
lowest: Every regression of the specified width across the entire
timeseries is calculated, then ordered using absolute rate values from
lowest to highest. Essentially, this option ignores the sign of the rate, and
can only be used when rates all have the same sign. Rates will be ordered
from lowest to highest in the $summary table regardless of whether they are
oxygen uptake or oxygen production rates.
maximum: Every regression of the specified width across the entire
timeseries is calculated, then ordered using numerical rate values from
maximum to minimum. This takes full account of the sign of the rate.
Oxygen uptake rates, which in respR are negative, would therefore be ordered
from lowest uptake (least negative) to highest uptake (most negative) in the
summary table. Generally, this method should only be used when rates are a
mix of oxygen consumption and production rates, such as when positive rates
may result from regressions fit over flush periods in intermittent-flow
respirometry. For most analyses where maximum or minimum rates are of
interest, the "highest" or "lowest" methods should be used.
minimum: Every regression of the specified width across the entire
timeseries is calculated, then ordered using numerical rate values from
minimum to maximum. This takes full account of the sign of the rate.
Oxygen uptake rates, which in respR are negative, would therefore be ordered
from highest uptake (most negative) to lowest uptake (least negative) in the
summary table. Generally, this method should only be used when rates are a
mix of oxygen consumption and production rates, such as when positive rates
may result from regressions fit over flush periods in intermittent-flow
respirometry. For most analyses where maximum or minimum rates are of
interest, the "highest" or "lowest" methods should be used.
rolling: A rolling regression of the specified width is performed
across the entire timeseries. No reordering of results is performed.
interval: Multiple, successive, non-overlapping regressions of the
specified width are extracted from the rolling regressions, ordered by time.
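The ordering rules described for the methods above can be illustrated with a small base R sketch. It is not respR's implementation: the roll_slopes() helper and the simulated oxygen trace are hypothetical, the KDE-based "linear" method is not covered, and only the ordering logic is intended to mirror the descriptions.

```r
# Hypothetical helper (not part of respR): slope of a linear regression
# fitted to every window of `width` rows.
roll_slopes <- function(time, oxy, width) {
  starts <- seq_len(length(time) - width + 1)
  vapply(starts, function(i) {
    idx <- i:(i + width - 1)
    unname(coef(lm(oxy[idx] ~ time[idx]))[2])
  }, numeric(1))
}

# Simulated trace: an uptake phase followed by a flush in which oxygen rises
set.seed(1)
time <- 0:199
oxy  <- c(100 - 0.05 * (0:119), 94.05 + 0.10 * (1:80)) + rnorm(200, sd = 0.01)

width <- 20
rates <- roll_slopes(time, oxy, width)

# "rolling": results stay in time order
rolling <- rates

# "maximum" / "minimum": numerical ordering, the sign is respected
maximum <- sort(rates, decreasing = TRUE)
minimum <- sort(rates, decreasing = FALSE)

# "highest" / "lowest": ordering by absolute value; only meaningful when all
# rates share a sign, so this sketch restricts to the uptake (negative) rates
uptake  <- rates[rates < 0]
highest <- uptake[order(abs(uptake), decreasing = TRUE)]
lowest  <- uptake[order(abs(uptake), decreasing = FALSE)]

# "interval": successive, non-overlapping windows, ordered by time
interval <- rates[seq(1, length(rates), by = width)]
```

In this sketch, for the uptake-only subset the first "highest" rate is also the numerical minimum, which is why the numerical methods are mainly of use with mixed-sign data.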
For further selection or subsetting of auto_rate results, see the dedicated
select_rate() function, which allows subsetting of rates by various
criteria, including r-squared, data region, percentiles, and more.
There are no units involved in auto_rate. This is a deliberate decision.
The units of oxygen concentration and time will be specified later in
convert_rate() when rates are converted to specific output units.
width and by inputs
If by = "time", the width input represents a time window in the units of
the time data in x.
If by = "row" and width is between 0 and 1 it represents a proportion of
the total data length, as in the equation
floor(width * number of data rows). For example, 0.2 represents a rolling
window of 20% of the data length. Otherwise, if entered as an integer of 2 or
greater, the width represents the number of rows.
For both by inputs, if left as width = NULL it defaults to 0.2 or a
window of 20% of the data length.
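As a worked example of the row-based rule, the following base R sketch applies the floor() equation quoted above. The width_in_rows() helper is hypothetical and not part of respR; the 7513 rows are the data length implied by the sardine.rd example output.

```r
# Hypothetical helper illustrating how `width` is interpreted when
# by = "row"; it follows the floor() rule quoted above, not respR's source.
width_in_rows <- function(width = NULL, n_rows) {
  if (is.null(width)) width <- 0.2   # documented default
  if (width > 0 && width < 1) {
    floor(width * n_rows)            # proportion of the data length
  } else {
    width                            # exact number of rows
  }
}

width_in_rows(NULL, n_rows = 7513)   # 1502
width_in_rows(0.1,  n_rows = 7513)   # 751
width_in_rows(600,  n_rows = 7513)   # 600
```

The first result matches the "width of 1502" reported in the example output when the default is applied to sardine.rd.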
In most cases, by should be left as the default "row", and the width
chosen with this in mind, as it is considerably more computationally
efficient. Changing to "time" causes the function to perform checks for
irregular time intervals at every iteration of the rolling regression, which
adds to computation time. This is to ensure the specified width input is
honoured in the time units and rates correctly calculated, even if the data
is unevenly spaced or has gaps.
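To see why a time-based window needs a check at every step, consider this base R sketch with a gap in the time data. It is a hypothetical illustration, not respR code: the data, the window_end() helper, and the way the window is located are all assumptions made for the example.

```r
# Hypothetical illustration (not respR code): with unevenly spaced data a
# fixed number of rows no longer spans a fixed amount of time, so the end of
# each time window has to be located by value.
time <- c(0, 1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 13)   # gap between 4 and 7
oxy  <- 100 - 0.01 * time

width <- 5   # in the units of the time data

# For each start row, find the last row still inside the time window
window_end <- function(i) max(which(time <= time[i] + width))

starts <- which(time + width <= max(time))   # windows that fit in the data
ends   <- vapply(starts, window_end, integer(1))

# Regression slope over each window
rates <- mapply(function(s, e) {
  unname(coef(lm(oxy[s:e] ~ time[s:e]))[2])
}, starts, ends)

data.frame(row = starts, endrow = ends,
           time = time[starts], endtime = time[ends], rate = rates)
```

The number of rows per window varies across the gap, which is the situation the per-iteration checks are there to handle.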
A plot is produced (provided plot = TRUE) showing the original data
timeseries of oxygen against time (bottom blue axis) and row index (top red
axis), with the rate result region highlighted. The second panel is a
close-up of the rate region with linear model coefficients. The third panel
is a rolling rate plot (note the reversed y-axis so that higher oxygen uptake
rates are plotted higher) of a rolling rate of the input width across the
whole dataset. Each rate is plotted against the middle of the time and row
range used to calculate it. The dashed line indicates the value of the
current rate result plotted in panels 1 and 2. The fourth and fifth panels
are summary plots of fit and residuals, and for the linear method the sixth
panel shows the results of the kernel density analysis, with the dashed line
again indicating the value of the current rate result plotted in panels 1
and 2.
If multiple rates have been calculated, by default the first (pos = 1) is
plotted. Others can be plotted by changing the pos input either in the main
function call, or by plotting the output, e.g. plot(object, pos = 2). In
addition, each sub-panel can be examined individually by using the panel
input, e.g. plot(object, panel = 2).
Console output messages can be suppressed using quiet = TRUE. If axis
labels or other text boxes obscure parts of the plot they can be suppressed
using legend = FALSE. The rate in the rolling rate plot can be plotted
not reversed by passing rate.rev = FALSE, for instance when examining
oxygen production rates so that higher production rates appear higher. If
axis labels (particularly y-axis) are difficult to read, las = 2 can be
passed to make axis labels horizontal, and oma (outer margins, default
oma = c(0.4, 1, 1.5, 0.4)) and mai (inner margins, default
mai = c(0.3, 0.15, 0.35, 0.15)) used to adjust plot margins.
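The plotting inputs described above might be combined as in the following sketch. It assumes the respR package is installed and uses the sardine.rd dataset from the examples; the object name ar is arbitrary, and the choice of method and width is only for illustration.

```r
# Assumes respR is installed; the plotting inputs are those described above.
if (requireNamespace("respR", quietly = TRUE)) {
  library(respR)

  # Save the result without plotting, then plot the saved object
  ar <- auto_rate(sardine.rd, method = "highest", width = 600, plot = FALSE)

  plot(ar, pos = 2, quiet = TRUE)      # second-ranked result, no console output
  plot(ar, panel = 3)                  # third panel (rolling rate) only
  plot(ar, legend = FALSE)             # suppress labels and text boxes
  plot(ar, rate.rev = FALSE, las = 2)  # rate axis not reversed, horizontal labels
}
```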
Saved output objects can be used in the generic S3 functions print(),
summary(), and mean().
print(): prints a single result, by default the first rate. Others can be
printed by passing the pos input, e.g. print(x, pos = 2).
summary(): prints summary table of all results and metadata, or those
specified by the pos input, e.g. summary(x, pos = 1:5). The summary can
be exported as a separate data frame by passing export = TRUE.
mean(): calculates the mean of all rates, or those specified by the pos
input, e.g. mean(x, pos = 1:5). The mean can be exported as a separate value
by passing export = TRUE.
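A short sketch of these generics in use, assuming the respR package is installed. The object names ar, top5, and mean_rate are arbitrary, and the method and width are chosen only for illustration.

```r
# Assumes respR is installed; the calls follow the descriptions above.
if (requireNamespace("respR", quietly = TRUE)) {
  library(respR)

  ar <- auto_rate(sardine.rd, method = "lowest", width = 600, plot = FALSE)

  print(ar, pos = 2)                                  # second-ranked result
  top5      <- summary(ar, pos = 1:5, export = TRUE)  # first five results
  mean_rate <- mean(ar, pos = 1:5, export = TRUE)     # mean of those five rates
}
```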
For additional help, documentation, vignettes, and more visit the respR
website at https://januarharianto.github.io/respR/
# \donttest{
# Most linear section of an entire dataset
inspect(sardine.rd, time = 1, oxygen = 2) %>%
  auto_rate()
#> inspect: No issues detected while inspecting data frame.
#>
#> # print.inspect # -----------------------
#> Time Oxygen
#> numeric pass pass
#> Inf/-Inf pass pass
#> NA/NaN pass pass
#> sequential pass -
#> duplicated pass -
#> evenly-spaced pass -
#>
#> -----------------------------------------
#> auto_rate: Applying default 'width' of 0.2
#>
#> # print.auto_rate # ---------------------
#> Data extracted by 'row' using 'width' of 1502.
#> Rates computed using 'linear' method.
#> 39 linear regions detected in the kernel density estimate.
#> To see all results use summary().
#>
#> Position 1 of 39 :
#> Rate: -0.000660665
#> R.sq: 0.982
#> Rows: 3659 to 6736
#> Time: 3658 to 6735
#> -----------------------------------------
# What is the lowest oxygen consumption rate over a 10 minute (600s) period?
inspect(sardine.rd, time = 1, oxygen = 2) %>%
  auto_rate(method = "lowest", width = 600, by = "time") %>%
  summary()
#> inspect: No issues detected while inspecting data frame.
#>
#> # print.inspect # -----------------------
#> Time Oxygen
#> numeric pass pass
#> Inf/-Inf pass pass
#> NA/NaN pass pass
#> sequential pass -
#> duplicated pass -
#> evenly-spaced pass -
#>
#> -----------------------------------------
#>
#> # summary.auto_rate # -------------------
#>
#> === Summary of Results by Lowest Rate ===
#> rep rank intercept_b0 slope_b1 rsq density row endrow time endtime oxy endoxy rate
#> <num> <int> <num> <num> <num> <lgcl> <int> <int> <int> <int> <num> <num> <num>
#> 1: NA 1 94.69791 -0.0005403066 0.5867958 NA 2259 2859 2258 2858 93.5 93.2 -0.0005403066
#> 2: NA 2 94.70075 -0.0005414343 0.5879174 NA 2258 2858 2257 2857 93.5 93.2 -0.0005414343
#> 3: NA 3 94.70318 -0.0005424790 0.5872572 NA 2260 2860 2259 2859 93.4 93.0 -0.0005424790
#> 4: NA 4 94.70355 -0.0005425454 0.5890225 NA 2257 2857 2256 2856 93.5 93.3 -0.0005425454
#> 5: NA 5 94.23628 -0.0005437062 0.6172363 NA 5843 6443 5842 6442 91.1 90.9 -0.0005437062
#> ---
#> 6909: NA 6909 95.90440 -0.0011924976 0.8592368 NA 794 1394 793 1393 94.9 94.2 -0.0011924976
#> 6910: NA 6910 95.90479 -0.0011924976 0.8592368 NA 796 1396 795 1395 95.0 94.3 -0.0011924976
#> 6911: NA 6911 95.90461 -0.0011925141 0.8592435 NA 795 1395 794 1394 94.9 94.3 -0.0011925141
#> 6912: NA 6912 95.90255 -0.0011926910 0.8647637 NA 774 1374 773 1373 95.0 94.2 -0.0011926910
#> 6913: NA 6913 95.90614 -0.0011938629 0.8595803 NA 791 1391 790 1390 95.0 94.2 -0.0011938629
#>
#> Regressions : 6913 | Results : 6913 | Method : lowest | Roll width : 600 | Roll type : time
#> -----------------------------------------
# What is the highest oxygen consumption rate over a 10 minute (600s) period?
inspect(sardine.rd, time = 1, oxygen = 2) %>%
  auto_rate(method = "highest", width = 600, by = "time") %>%
  summary()
#> inspect: No issues detected while inspecting data frame.
#>
#> # print.inspect # -----------------------
#> Time Oxygen
#> numeric pass pass
#> Inf/-Inf pass pass
#> NA/NaN pass pass
#> sequential pass -
#> duplicated pass -
#> evenly-spaced pass -
#>
#> -----------------------------------------
#>
#> # summary.auto_rate # -------------------
#>
#> === Summary of Results by Highest Rate ===
#> rep rank intercept_b0 slope_b1 rsq density row endrow time endtime oxy endoxy rate
#> <num> <int> <num> <num> <num> <lgcl> <int> <int> <int> <int> <num> <num> <num>
#> 1: NA 1 95.90614 -0.0011938629 0.8595803 NA 791 1391 790 1390 95.0 94.2 -0.0011938629
#> 2: NA 2 95.90255 -0.0011926910 0.8647637 NA 774 1374 773 1373 95.0 94.2 -0.0011926910
#> 3: NA 3 95.90461 -0.0011925141 0.8592435 NA 795 1395 794 1394 94.9 94.3 -0.0011925141
#> 4: NA 4 95.90479 -0.0011924976 0.8592368 NA 796 1396 795 1395 95.0 94.3 -0.0011924976
#> 5: NA 5 95.90440 -0.0011924976 0.8592368 NA 794 1394 793 1393 94.9 94.2 -0.0011924976
#> ---
#> 6909: NA 6909 94.23628 -0.0005437062 0.6172363 NA 5843 6443 5842 6442 91.1 90.9 -0.0005437062
#> 6910: NA 6910 94.70355 -0.0005425454 0.5890225 NA 2257 2857 2256 2856 93.5 93.3 -0.0005425454
#> 6911: NA 6911 94.70318 -0.0005424790 0.5872572 NA 2260 2860 2259 2859 93.4 93.0 -0.0005424790
#> 6912: NA 6912 94.70075 -0.0005414343 0.5879174 NA 2258 2858 2257 2857 93.5 93.2 -0.0005414343
#> 6913: NA 6913 94.69791 -0.0005403066 0.5867958 NA 2259 2859 2258 2858 93.5 93.2 -0.0005403066
#>
#> Regressions : 6913 | Results : 6913 | Method : highest | Roll width : 600 | Roll type : time
#> -----------------------------------------
# What is the NUMERICAL minimum oxygen consumption rate over a 5 minute (300s)
# period in intermittent-flow respirometry data?
# NOTE: because uptake rates are negative, this would actually be
# the HIGHEST uptake rate.
auto_rate(intermittent.rd, method = "minimum", width = 300, by = "time") %>%
  summary()
#> auto_rate: Note dataset contains both negative and positive rates. Ensure ordering 'method' is appropriate.
#>
#> # summary.auto_rate # -------------------
#>
#> === Summary of Results by Minimum Rate ===
#> rep rank intercept_b0 slope_b1 rsq density row endrow time endtime oxy endoxy rate
#> <num> <int> <num> <num> <num> <lgcl> <int> <int> <int> <int> <num> <num> <num>
#> 1: NA 1 7.188147 -0.0007172339 0.9100089 NA 152 452 151 451 7.09 6.87 -0.0007172339
#> 2: NA 2 7.187775 -0.0007169567 0.9095930 NA 156 456 155 455 7.09 6.84 -0.0007169567
#> 3: NA 3 7.187642 -0.0007167851 0.9095981 NA 157 457 156 456 7.08 6.85 -0.0007167851
#> 4: NA 4 7.187706 -0.0007163583 0.9096754 NA 155 455 154 454 7.09 6.84 -0.0007163583
#> 5: NA 5 7.187809 -0.0007161603 0.9098124 NA 153 453 152 452 7.09 6.87 -0.0007161603
#> ---
#> 4527: NA 4527 -3.013661 0.0048978592 0.9352893 NA 1823 2123 1822 2122 6.12 7.19 0.0048978592
#> 4528: NA 4528 -3.015906 0.0048982993 0.9353444 NA 1824 2124 1823 2123 6.11 7.18 0.0048982993
#> 4529: NA 4529 -3.020091 0.0048990913 0.9354343 NA 1826 2126 1825 2125 6.11 7.21 0.0048990913
#> 4530: NA 4530 -3.022963 0.0048992849 0.9354567 NA 1828 2128 1827 2127 6.10 7.21 0.0048992849
#> 4531: NA 4531 -3.022005 0.0048994302 0.9354734 NA 1827 2127 1826 2126 6.11 7.21 0.0048994302
#>
#> Regressions : 4531 | Results : 4531 | Method : minimum | Roll width : 300 | Roll type : time
#> -----------------------------------------
# What is the NUMERICAL maximum oxygen consumption rate over a 20 minute
# (1200 rows) period in respirometry data in which oxygen is declining?
# NOTE: because uptake rates are negative, this would actually be
# the LOWEST uptake rate.
sardine.rd %>%
  inspect() %>%
  auto_rate(method = "maximum", width = 1200, by = "row") %>%
  summary()
#> inspect: Applying column default of 'time = 1'
#> inspect: Applying column default of 'oxygen = 2'
#> inspect: No issues detected while inspecting data frame.
#>
#> # print.inspect # -----------------------
#> Time Oxygen
#> numeric pass pass
#> Inf/-Inf pass pass
#> NA/NaN pass pass
#> sequential pass -
#> duplicated pass -
#> evenly-spaced pass -
#>
#> -----------------------------------------
#>
#> # summary.auto_rate # -------------------
#>
#> === Summary of Results by Maximum Rate ===
#> rep rank intercept_b0 slope_b1 rsq density row endrow time endtime oxy endoxy rate
#> <num> <int> <num> <num> <num> <lgcl> <int> <num> <int> <int> <num> <num> <num>
#> 1: NA 1 94.65936 -0.0006119306 0.8860806 NA 5258 6457 5257 6456 91.4 90.9 -0.0006119306
#> 2: NA 2 94.66030 -0.0006121060 0.8873339 NA 5255 6454 5254 6453 91.5 90.7 -0.0006121060
#> 3: NA 3 94.66062 -0.0006121414 0.8861755 NA 5259 6458 5258 6457 91.3 90.7 -0.0006121414
#> 4: NA 4 94.66144 -0.0006122921 0.8873521 NA 5254 6453 5253 6452 91.5 90.8 -0.0006122921
#> 5: NA 5 94.66291 -0.0006124744 0.8881668 NA 5245 6444 5244 6443 91.5 90.6 -0.0006124744
#> ---
#> 6310: NA 6310 95.78939 -0.0010891497 0.9541232 NA 693 1892 692 1891 95.0 93.8 -0.0010891497
#> 6311: NA 6311 95.78963 -0.0010892612 0.9541131 NA 695 1894 694 1893 95.0 93.8 -0.0010892612
#> 6312: NA 6312 95.78948 -0.0010892893 0.9541535 NA 692 1891 691 1890 95.0 93.8 -0.0010892893
#> 6313: NA 6313 95.78974 -0.0010894181 0.9541472 NA 694 1893 693 1892 95.0 93.7 -0.0010894181
#> 6314: NA 6314 95.78956 -0.0010894205 0.9541820 NA 691 1890 690 1889 95.0 93.7 -0.0010894205
#>
#> Regressions : 6314 | Results : 6314 | Method : maximum | Roll width : 1200 | Roll type : row
#> -----------------------------------------
# Perform a rolling regression of 10 minutes width across the entire dataset.
# Results are not ordered under this method.
sardine.rd %>%
  inspect() %>%
  auto_rate(method = "rolling", width = 600, by = "time") %>%
  summary()
#> inspect: Applying column default of 'time = 1'
#> inspect: Applying column default of 'oxygen = 2'
#> inspect: No issues detected while inspecting data frame.
#>
#> # print.inspect # -----------------------
#> Time Oxygen
#> numeric pass pass
#> Inf/-Inf pass pass
#> NA/NaN pass pass
#> sequential pass -
#> duplicated pass -
#> evenly-spaced pass -
#>
#> -----------------------------------------
#>
#> # summary.auto_rate # -------------------
#>
#> === Summary of Results by Rolling Order ===
#> rep rank intercept_b0 slope_b1 rsq density row endrow time endtime oxy endoxy rate
#> <num> <int> <num> <num> <num> <lgcl> <int> <int> <int> <int> <num> <num> <num>
#> 1: NA 1 95.58876 -0.0009658708 0.8098300 NA 1 601 0 600 95.6 95.1 -0.0009658708
#> 2: NA 2 95.58805 -0.0009625044 0.8073351 NA 2 602 1 601 95.6 95.2 -0.0009625044
#> 3: NA 3 95.58799 -0.0009624325 0.8073155 NA 3 603 2 602 95.6 95.0 -0.0009624325
#> 4: NA 4 95.58792 -0.0009623275 0.8072869 NA 4 604 3 603 95.6 95.0 -0.0009623275
#> 5: NA 5 95.58751 -0.0009605309 0.8063767 NA 5 605 4 604 95.6 95.1 -0.0009605309
#> ---
#> 6909: NA 6909 95.52650 -0.0007437493 0.7536903 NA 6909 7509 6908 7508 90.4 90.0 -0.0007437493
#> 6910: NA 6910 95.50646 -0.0007409356 0.7508723 NA 6910 7510 6909 7509 90.4 90.1 -0.0007409356
#> 6911: NA 6911 95.48630 -0.0007381054 0.7480377 NA 6911 7511 6910 7510 90.3 90.1 -0.0007381054
#> 6912: NA 6912 95.46638 -0.0007352640 0.7432546 NA 6912 7512 6911 7511 90.4 90.2 -0.0007352640
#> 6913: NA 6913 95.42246 -0.0007290949 0.7329032 NA 6913 7513 6912 7512 90.4 90.3 -0.0007290949
#>
#> Regressions : 6913 | Results : 6913 | Method : rolling | Roll width : 600 | Roll type : time
#> -----------------------------------------
# }