fn_process creates the data structures necessary to analyze nucleotide recoding RNA-seq data with the MLE and Hybrid implementations in bakRFit. The input to fn_process must be an object of class bakRFnData.
Usage
fn_process(
obj,
totcut = 50,
totcut_all = 10,
Chase = FALSE,
FOI = c(),
concat = TRUE
)
Arguments
- obj
An object of class bakRFnData
- totcut
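Numeric; transcripts with fewer than this number of sequencing reads in at least one replicate of every experimental condition are filtered out. Equivalently, a transcript is kept only if it has sufficient reads in all replicates of at least one experimental condition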
- totcut_all
Numeric; transcripts with fewer than this number of sequencing reads in any sample are filtered out
- Chase
Boolean; if TRUE, pulse-chase analysis strategy is implemented
- FOI
Features of interest; character vector containing names of features to analyze. If FOI is non-null and concat is TRUE, then all minimally reliable FOIs will be combined with reliable features passing all set filters (totcut and totcut_all). If concat is FALSE, only the minimally reliable FOIs will be kept. A minimally reliable FOI is one that passes filtering with minimally stringent parameters.
- concat
Boolean; If TRUE, FOI is concatenated with output of reliableFeatures
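The interaction between FOI and concat can be sketched in base R. This is only an illustration of the behavior described above: select_features is a hypothetical helper, not a bakR function, and the handling of an empty FOI (fall back to the reliable features) is an assumption based on the default FOI = c().

```r
# Hypothetical helper, NOT part of bakR: combines feature sets the way the
# FOI / concat arguments are described in this page.
select_features <- function(reliable, foi_min_reliable, concat = TRUE) {
  if (length(foi_min_reliable) == 0) {
    # Assumption: with no FOI supplied, only the reliable features are used.
    return(reliable)
  }
  if (concat) {
    # Reliable features plus the minimally reliable FOIs, without duplicates.
    union(reliable, foi_min_reliable)
  } else {
    # Only the minimally reliable FOIs.
    foi_min_reliable
  }
}

reliable <- c("geneA", "geneB")   # made-up features passing totcut and totcut_all
foi      <- c("geneB", "geneC")   # made-up minimally reliable FOIs

kept_concat <- select_features(reliable, foi, concat = TRUE)
kept_foi    <- select_features(reliable, foi, concat = FALSE)
```

With concat = TRUE the two sets are merged; with concat = FALSE features that pass the filters but are not in FOI are dropped.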
Value
Returns a list of objects that can be passed to TL_stan and/or fast_analysis. Those objects are:
- Stan_data; list that can be passed to TL_stan with Hybrid_Fit = TRUE. Consists of metadata as well as data that Stan will analyze. Data to be analyzed consists of equal length vectors. The contents of Stan_data are:
  - NE; Number of datapoints for 'Stan' to analyze (NE = Number of Elements)
  - NF; Number of features in dataset
  - TP; Numerical indicator of s4U feed (0 = no s4U feed, 1 = s4U fed)
  - FE; Numerical indicator of feature
  - num_mut; Number of U-to-C mutations observed in a particular set of reads
  - MT; Numerical indicator of experimental condition (Exp_ID from metadf)
  - nMT; Number of experimental conditions
  - R; Numerical indicator of replicate
  - nrep; Number of replicates (maximum across experimental conditions)
  - nrep_vect; Vector of number of replicates in each experimental condition
  - tl; Vector of label times for each experimental condition
  - Avg_Reads; Standardized log10(average read counts) for a particular feature in a particular condition, averaged over replicates
  - sdf; Dataframe that maps numerical feature ID to original feature name. Also has read depth information
  - sample_lookup; Lookup table relating MT and R to the original sample name
- Fn_est; A data frame containing fraction new estimates for +s4U samples:
  - sample; Original sample name
  - XF; Original feature name
  - fn; Fraction new estimate
  - n; Number of reads
  - Feature_ID; Numerical ID for each feature
  - Replicate; Numerical ID for each replicate
  - Exp_ID; Numerical ID for each experimental condition
  - tl; s4U label time
  - logit_fn; logit of fraction new estimate
  - kdeg; degradation rate constant estimate
  - log_kdeg; log of degradation rate constant estimate
  - logit_fn_se; Uncertainty of logit(fraction new) estimate
  - log_kd_se; Uncertainty of log(kdeg) estimate
- Count_Matrix; A matrix with read count information. Each column represents a sample and each row represents a feature. Each entry is the raw number of read counts mapping to a particular feature in a particular sample. Column names are the corresponding sample names and row names are the corresponding feature names.
- Ctl_data; Identical content to Fn_est but for any -s4U data (and thus with fn estimates set to 0). Will be NULL if no -s4U data is present
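The transformed columns of Fn_est (logit_fn, kdeg, log_kdeg) can be illustrated with a small base R sketch. The formula for kdeg is not stated on this page; the sketch assumes the usual pulse-label relation fn = 1 - exp(-kdeg * tl), which may not match what fn_process does internally (in particular when Chase = TRUE). The data values are made up.

```r
# Made-up fraction new estimates; column names follow the Fn_est description.
fn_est <- data.frame(
  XF = c("geneA", "geneB"),
  fn = c(0.5, 0.2),   # fraction new estimates
  tl = c(2, 2),       # s4U label time
  stringsAsFactors = FALSE
)

# logit(fn) = log(fn / (1 - fn)); qlogis() is base R's logit.
fn_est$logit_fn <- qlogis(fn_est$fn)

# Assumption: pulse-label kinetics, fn = 1 - exp(-kdeg * tl),
# so kdeg = -log(1 - fn) / tl.
fn_est$kdeg <- -log(1 - fn_est$fn) / fn_est$tl

fn_est$log_kdeg <- log(fn_est$kdeg)
```

Working on the logit and log scales keeps the estimates unbounded, which is presumably why the uncertainties (logit_fn_se, log_kd_se) are reported on those scales.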
Details
fn_process first filters out features with insufficient read coverage, as determined by totcut and totcut_all. It then creates the necessary data structures for analysis with bakRFit and some of the visualization functions (namely plotMA).
The 1st step executed by fn_process is to find the names of features which are deemed "reliable". A reliable feature is one with sufficient read coverage in every single sample (i.e., > totcut_all reads in all samples) and sufficient read coverage in all replicates of at least one experimental condition (i.e., > totcut reads in all replicates for one or more experimental conditions). This is done with a call to reliableFeatures.
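The two reliability criteria can be sketched on a toy count matrix in base R. This is an illustration of the rule as worded above, not the code of reliableFeatures; the matrix, the sample names, and the exp_id vector are made up for the example.

```r
# Toy count matrix: rows are features, columns are samples.
counts <- matrix(
  c(100, 120,  90, 110,   # geneA: well covered everywhere
     60,  70,   5,  80,   # geneB: one sample at or below totcut_all
     20,  60,  55,  30,   # geneC: no condition with all replicates above totcut
     15,  20,  70,  65),  # geneD: condition 2 exceeds totcut in both replicates
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3", "s4"))
)

# Assumed layout: samples s1, s2 belong to condition 1; s3, s4 to condition 2.
exp_id <- c(s1 = 1, s2 = 1, s3 = 2, s4 = 2)

totcut     <- 50
totcut_all <- 10

# Criterion 1: more than totcut_all reads in every sample.
pass_all <- apply(counts > totcut_all, 1, all)

# Criterion 2: more than totcut reads in all replicates of at least one condition.
pass_cond <- apply(counts, 1, function(x) {
  any(tapply(x > totcut, exp_id, all))
})

reliable <- rownames(counts)[pass_all & pass_cond]
```

Here geneB fails the first criterion and geneC fails the second, so only geneA and geneD are kept.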
The 2nd step is to extract only the reliable features from the fns dataframe in the bakRFnData object. During this process, a numerical ID is given to each reliable feature, with the numerical ID corresponding to their order when arranged using dplyr::arrange.
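The ID assignment can be sketched as follows. To keep the example free of package dependencies, base R's sort() is swapped in for dplyr::arrange, so the exact ordering may differ from what fn_process produces for names with mixed case or special characters. The column names XF and Feature_ID follow the Fn_est description; the feature names are made up.

```r
# Made-up reliable feature names, in arbitrary order.
reliable <- c("geneD", "geneA", "geneC")

# Stand-in for dplyr::arrange: radix sort uses the C locale for character
# vectors, so the result does not depend on the system locale.
ordered_names <- sort(reliable, method = "radix")

# Number the features 1..NF in sorted order.
feature_lookup <- data.frame(
  XF = ordered_names,
  Feature_ID = seq_along(ordered_names),
  stringsAsFactors = FALSE
)

# Map a feature name back to its numerical ID.
id_of_geneD <- feature_lookup$Feature_ID[match("geneD", feature_lookup$XF)]
```

A lookup table of this kind is what lets numerical indicators be translated back to the original feature names after model fitting.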
The 3rd step is to prepare data structures that can be passed to fast_analysis and TL_stan (usually accessed via the bakRFit helper function).