Correcting for metabolic labeling induced RNA dropout

Dropout is the name given to a phenomenon originally identified by our lab and further detailed in two independent publications (Zimmer et al. (2023) and Berg et al. (2023)). Dropout is the under-representation of reads from RNA containing metabolic label (most commonly 4-thiouridine or 6-thioguanosine). Two likely causes have been proposed: loss of 4-thiouridine (s4U) containing RNA on plastic surfaces, and RT dropoff caused by modifications that recoding chemistry introduces on s4U. While protocols can be altered to drastically reduce dropout, you may still have datasets collected with suboptimal handling that you want to analyze with bakR. That is where CorrectDropout comes in.

Usage

CorrectDropout(
  obj,
  scale_init = 1.05,
  pdo_init = 0.3,
  recalc_uncertainty = FALSE,
  ...
)

Arguments

obj: bakRFit or bakRFnFit object
scale_init: Numeric; initial estimate for the -s4U/+s4U scale factor. This is the factor difference in RPM-normalized read counts for completely unlabeled transcripts (i.e., highly stable transcripts) between the +s4U and -s4U samples.
pdo_init: Numeric; initial estimate for the dropout rate. This is the probability that an s4U-labeled RNA molecule is lost during library preparation.
recalc_uncertainty: Logical; if TRUE, the fraction new uncertainty is recalculated using the adjusted fn and a simple binomial model of estimate uncertainty. This will slightly underestimate the fn uncertainty, but will be far less biased for low-coverage features or for samples with low pnews.
...: Additional (optional) parameters to be passed to stats::nls()
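
To make the "simple binomial model" mentioned for recalc_uncertainty concrete, here is a minimal base R sketch. It assumes each read is an independent draw that is new with probability fn; the helper name `binomial_fn_se` is hypothetical and is not part of bakR, and the exact calculation bakR performs may differ.

```r
# Hypothetical helper (not a bakR function): binomial standard error of a
# fraction new estimate `fn` for a feature covered by `n_reads` reads.
binomial_fn_se <- function(fn, n_reads) {
  stopifnot(all(fn > 0 & fn < 1), all(n_reads > 0))

  # Standard error of a binomial proportion
  se_fn <- sqrt(fn * (1 - fn) / n_reads)

  # Delta method: d logit(fn) / d fn = 1 / (fn * (1 - fn))
  se_logit_fn <- se_fn / (fn * (1 - fn))

  data.frame(fn = fn, n_reads = n_reads,
             se_fn = se_fn, se_logit_fn = se_logit_fn)
}

# Lower coverage gives a larger standard error
binomial_fn_se(fn = c(0.1, 0.5), n_reads = c(50, 500))
```

Because this treats every read's new/old status as known, it ignores the uncertainty in assigning reads, which is consistent with the slight underestimate noted above.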

Value

A bakRFit or bakRFnFit object (same type as was passed in). Fraction new estimates and read counts in Fast_Fit$Fn_Estimates and (in the case of a bakRFnFit input) Data_lists$Fn_Est are dropout corrected. A count matrix with corrected read counts (Data_lists$Count_Matrix_corrected) is also output, along with a data frame with information about the dropout rate estimated for each sample (Data_lists$Dropout_df).

Details

CorrectDropout estimates the fraction of 4-thiouridine containing RNA that was lost during library preparation (pdo). It then uses this estimate of pdo to correct fraction new estimates and read counts. Both corrections are analytically derived from a rigorous generative model of NR-seq data. Importantly, the read count correction preserves the total library size to avoid artificially inflating read counts.
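
The idea behind the two corrections can be illustrated with a small base R sketch. It assumes a deliberately simplified model in which every labeled (new) molecule is lost independently with probability pdo and unlabeled molecules are never lost. The helper names (`observe_fn`, `correct_fn`, `correct_counts`) are hypothetical and are not bakR internals; the formulas bakR actually uses may differ in detail.

```r
# Hypothetical helpers illustrating a simplified dropout model (not bakR code).

# Fraction new that would be observed if new RNA is lost with probability pdo
observe_fn <- function(fn_true, pdo) {
  fn_true * (1 - pdo) / (fn_true * (1 - pdo) + (1 - fn_true))
}

# Algebraic inverse of observe_fn(): recover the true fraction new
correct_fn <- function(fn_obs, pdo) {
  fn_obs / ((1 - pdo) + fn_obs * pdo)
}

# Adjust read counts for the reads lost to dropout, then rescale so the
# corrected library has the same total size as the observed one
correct_counts <- function(counts, fn_true, pdo) {
  # Under this model a fraction fn_true * pdo of each feature's reads is lost
  adjusted <- counts / (1 - fn_true * pdo)
  adjusted * sum(counts) / sum(adjusted)
}

fn_true <- c(0.1, 0.5, 0.9)
fn_obs <- observe_fn(fn_true, pdo = 0.4)       # biased towards lower values
fn_corrected <- correct_fn(fn_obs, pdo = 0.4)  # recovers fn_true

counts <- c(1000, 600, 200)
corrected <- correct_counts(counts, fn_corrected, pdo = 0.4)
```

In this sketch, features with a high fraction new gain reads relative to stable features, while the rescaling step keeps `sum(corrected)` equal to `sum(counts)`, mirroring the library size preservation described above.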

Examples

# \donttest{
# Simulate data for 500 genes and 2 replicates with 40% dropout
sim <- Simulate_relative_bakRData(500, 100000, nreps = 2, p_do = 0.4)

# Fit data with fast implementation
Fit <- bakRFit(sim$bakRData)
#> Finding reliable Features
#> Filtering out unwanted or unreliable features
#> Processing data...
#> Estimating pnew with likelihood maximization
#> Estimating unlabeled mutation rate with -s4U data
#> Estimated pnews and polds for each sample are:
#> # A tibble: 4 × 4
#> # Groups:   mut [2]
#>     mut  reps   pnew    pold
#>   <int> <dbl>  <dbl>   <dbl>
#> 1     1     1 0.0501 0.00102
#> 2     1     2 0.0505 0.00102
#> 3     2     1 0.0498 0.00102
#> 4     2     2 0.0500 0.00102
#> Estimating fraction labeled
#> Estimating per replicate uncertainties
#> Estimating read count-variance relationship
#> Averaging replicate data and regularizing estimates
#> Assessing statistical significance
#> All done! Run QC_checks() on your bakRFit object to assess the 
#>             quality of your data and get recommendations for next steps.

# Correct for dropout
Fit <- CorrectDropout(Fit)
#> Estimated rates of dropout are:
#>   Exp_ID Replicate       pdo
#> 1      1         1 0.4169686
#> 2      1         2 0.2494506
#> 3      2         1 0.0000000
#> 4      2         2 0.0000000
#> Mapping sample name to sample characteristics
#> Filtering out low coverage features
#> Processing data...
#> Estimating read count-variance relationship
#> Averaging replicate data and regularizing estimates
#> Assessing statistical significance
#> All done! Run QC_checks() on your bakRFit object to assess the 
#>             quality of your data and get recommendations for next steps.
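
# Inspect the results of the correction. The element names below are the
# ones listed in the Value section; output is not shown here.

# Per-sample dropout rate estimates
Fit$Data_lists$Dropout_df

# Dropout-corrected read counts
head(Fit$Data_lists$Count_Matrix_corrected)

# Dropout-corrected fraction new estimates
head(Fit$Fast_Fit$Fn_Estimates)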

# }