auto_rate()
's linear method.R/sim_data.R
sim_data.Rd
Generate data of size len
that is coerced to mimic common respirometry
data. This is an internal function not intending for public use, though may
prove of interest or utility. We may modify this function at any time, which
may irreversibly change the outputs. This function was first created to test
auto_rate()
using the other internal function, test_lin()
, but we decided
to publish the code as it is an effective (and visually-appealing) tool for
teaching, testing and visualising purposes. This function is by no means
comprehensive and we encourage users to generate data that suit their own
unique situations.
sim_data(len = 300, type = "default", sd = 0.05, preview = TRUE)
numeric. Defaults at 300. Number of observations in the dataset.
character. What kind of data should the function generate? Available for use: "default", "corrupted" and "segmented".
numeric. Defaults at 0.05. This is the amount of noise to add to the system, randomly generated as a standard deviation based on the entire data set.
logical. Defaults to TRUE. Plots the generated data as an xy.
A list containing the dataframe, the slope and the length of the
linear section of the data, to be used for analysis in the function
test_lin()
.
sim_data()
creates 3 types of data that we think are common in respirometry
or oxygen flux data. The data types can be selected using the type
input:
"default"
: data is made up of a linear segment of known length and slope,
with a non-linear segment, generated by a sine or cosine function depending
on whether the slope is positive or negative, appended to the beginning of
the data. The shape of the dataset is designed to mimic many similar data
whereby the initial sections of the data are often non-linear. Here the slope
is randomly generated using rnorm(1, 0, 0.025)
, the length of the inital
segment randomly generated using floor(abs(rnorm(1, .25*len, .05*len)))
where len
is the total number of observations in the data, and the
amplitude of the segment also randomly generated using rnorm(1, .8, .05)
.
"corrupted"
: same as "default"
, but "corrupted" data is inserted
randomly at any point in the linear segment. The data corruption is chosen as
a sudden dip in the reading, which recovers. This event mimics equipment
interference that sometimes happens, but does not necessarily invalidate the
dataset if the corrupted section is omitted from analysis, The dip is
generated by a cosine function of fixed amplitude of 1, and the length is
randomly generated.
"segmented"
: same as "default"
, but the data is modified to contain two
linear segments. The slope of the second linear segment is randomly picked at
between 0.5 and 0.6 of the first linear segment. Its length is also randomly
generated but always smaller than the first segment.
Normally-distributed noise is added to the dataset to add variation, which
can be modified using the "sd"
input.
# Generate data of length 200
sim_data(len = 200)
# Generate data that contains a "corruption"
sim_data(type = "corrupted")
# Generate noisy data
sim_data(type = "segmented", sd = .2)
# Generate "perfect" non-noisy data
sim_data(sd = 0)