Identify features (e.g., transcripts) with high-quality data

This function identifies all features (e.g., transcripts or exons) whose mutation rate in the control (-s4U) samples is below a set threshold and whose read count passes a set cutoff in all samples. If there is no -s4U sample, only the read count cutoff is applied. The remaining filters are relevant only when working with short-read RNA-seq data: they remove features with extremely low empirical U-content (i.e., the average number of Us in sequencing reads from that feature) and features with very few reads containing at least 3 Us.

Usage

reliableFeatures(
  obj,
  high_p = 0.2,
  totcut = 50,
  totcut_all = 10,
  Ucut = 0.25,
  AvgU = 4
)

Arguments

obj: Object of class bakRData
high_p: Numeric; highest mutation rate accepted in control (-s4U) samples
totcut: Numeric; Any features with fewer than this number of sequencing reads in any replicate of all experimental conditions are filtered out
totcut_all: Numeric; Any features with fewer than this number of sequencing reads in any sample are filtered out
Ucut: Numeric; the fraction of reads with 2 or fewer Us must be less than this cutoff in all samples
AvgU: Numeric; the average number of Us per read must be greater than this
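
The cutoffs above combine into a per-feature pass/fail decision. The following is a minimal base-R sketch of that logic on a toy table, not bakR's actual implementation. The column names (sample, feature, TC, nT, n), the helper sketch_reliable(), and the single read-count cutoff (standing in for both totcut and totcut_all) are assumptions made for illustration.

```r
# Toy read table; column names are assumptions chosen for this sketch.
reads <- data.frame(
  sample  = c("ctl", "ctl", "ctl", "ctl", "trt", "trt", "trt", "trt"),
  feature = c("geneA", "geneA", "geneB", "geneB",
              "geneA", "geneA", "geneB", "geneB"),
  TC      = c(0, 1, 0, 3, 0, 2, 0, 1),        # mutations per read
  nT      = c(10, 12, 2, 6, 10, 12, 2, 6),    # Us per read
  n       = c(90, 10, 40, 20, 60, 40, 50, 10) # reads with this TC/nT pair
)

# Summarise each feature within each sample
groups <- split(reads, list(reads$feature, reads$sample), drop = TRUE)
stats <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    feature     = d$feature[1],
    sample      = d$sample[1],
    total_reads = sum(d$n),
    mut_rate    = sum(d$TC * d$n) / sum(d$nT * d$n),
    frac_low_U  = sum(d$n[d$nT <= 2]) / sum(d$n), # reads with 2 or fewer Us
    avg_U       = sum(d$nT * d$n) / sum(d$n)
  )
}))

# Hypothetical helper: a feature passes only if every sample passes
sketch_reliable <- function(stats, control_samples, high_p = 0.2,
                            totcut = 50, Ucut = 0.25, AvgU = 4) {
  ok <- stats$total_reads >= totcut &
    stats$frac_low_U < Ucut &
    stats$avg_U > AvgU

  # The mutation-rate cutoff applies to control (-s4U) samples only
  is_ctl <- stats$sample %in% control_samples
  ok[is_ctl] <- ok[is_ctl] & stats$mut_rate[is_ctl] < high_p

  passes_all <- tapply(ok, stats$feature, all)
  names(passes_all)[as.logical(passes_all)]
}

sketch_reliable(stats, control_samples = "ctl")
```

In this toy data geneB fails on several counts: its control mutation rate is 0.3, most of its reads carry 2 or fewer Us, and its average U-content is below 4. Only geneA is returned.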

Value

Vector of feature names (e.g., gene names) that passed the reliability filter

Examples

# \donttest{

# Load cB
data("cB_small")

# Load metadf
data("metadf")

# Create bakRData
bakRData <- bakRData(cB_small, metadf)

# Find reliable features
features_to_keep <- reliableFeatures(obj = bakRData)
# }
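
The returned names are typically used to restrict downstream analysis to the reliable features. The following is a minimal sketch of that subsetting step on a toy table. The feature-name column XF is an assumption here (check the column names of your own cB), and keep is a hypothetical stand-in for the vector returned by reliableFeatures.

```r
# Toy stand-in for a cB-like table; the XF column name is an assumption.
cB_toy <- data.frame(
  XF = c("geneA", "geneB", "geneA", "geneC"),
  n  = c(12, 3, 7, 1)
)

# Hypothetical stand-in for the vector returned by reliableFeatures()
keep <- c("geneA", "geneC")

# Keep only the rows belonging to features that passed the filter
cB_filtered <- cB_toy[cB_toy$XF %in% keep, ]
```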