Check data quality and make suggestions to the user about which analyses to run.
Source: R/QC.R
QC_checks.Rd
QC_checks takes as input a bakRFit or bakRFnFit object and uses the Fast_Fit object to assess data quality and suggest which implementation to run next. It takes into account the mutation rates in all samples, the fraction new distributions, the reproducibility of fraction new estimates, and the read lengths. It then outputs a number of diagnostic plots that can alert users to problems in their data, along with messages recommending which implementation to run next.
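To make the fraction new check concrete, here is a minimal base R sketch. It is not bakR's internal code. The data frame `fn_est` and its simulated values are hypothetical, and the 0.2 to 0.8 range is taken from the QC_checks messages shown in the example output, used here as an assumed rule of thumb for an appropriate label time.

```r
# Hypothetical per-feature fraction new estimates (illustration only,
# not bakR's internal data structure)
set.seed(42)
fn_est <- data.frame(
  Exp_ID    = rep(c(1, 1, 2, 2), each = 100),
  Replicate = rep(c(1, 2, 1, 2), each = 100),
  fn        = rbeta(400, shape1 = 2, shape2 = 2)  # values in (0, 1)
)

# Average fraction new within each sample (Exp_ID x Replicate)
avg_fn <- aggregate(fn ~ Exp_ID + Replicate, data = fn_est, FUN = mean)
names(avg_fn)[3] <- "avg_fn"

# Flag samples whose average falls inside the assumed 0.2-0.8 range
avg_fn$label_time_ok <- avg_fn$avg_fn > 0.2 & avg_fn$avg_fn < 0.8
print(avg_fn)
```

A sample flagged `FALSE` here would have most features almost entirely new or almost entirely old, which suggests a label time that is too long or too short for that sample.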
Value
A list with 3 components:
raw_mutrates. This is a plot of the raw T-to-C mutation rates in all samples analyzed by bakR. It includes horizontal lines as a reference for what could be considered "too low" to be useful in s4U-fed samples.
conversion_rates. This is a plot of the estimated T-to-C mutation rates in new and old reads. Thus, each bar represents the probability that a U in a new/old read is mutated. It includes horizontal lines as a reference for what could be considered good mutation rates.
correlation_plots. This is a list of ggplot objects. Each is a scatter plot comparing estimates of the fraction new in one replicate to another replicate in the same experimental condition. A y=x guide line is included to reveal any estimation biases.
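The replicate comparison behind the correlation plots can be sketched numerically in base R. This is an illustration under stated assumptions, not bakR's implementation: the vectors `fn_rep1` and `fn_rep2` are simulated, hypothetical fraction new estimates for the same features in two replicates. The sketch computes the correlation on the logit scale, as reported in the QC_checks messages, and the mean offset from the y = x line as a simple indicator of estimation bias.

```r
# Hypothetical fraction new estimates for the same 200 features
# in two replicates (simulated for illustration only)
set.seed(1)
true_logit_fn <- rnorm(200, mean = 0, sd = 1)
fn_rep1 <- plogis(true_logit_fn + rnorm(200, sd = 0.3))
fn_rep2 <- plogis(true_logit_fn + rnorm(200, sd = 0.3))

# Pearson correlation of the estimates on the logit scale
logit_cor <- cor(qlogis(fn_rep1), qlogis(fn_rep2))

# Mean offset from the y = x line on the logit scale;
# values far from 0 would point to a systematic bias between replicates
logit_bias <- mean(qlogis(fn_rep2) - qlogis(fn_rep1))

print(c(correlation = logit_cor, bias = logit_bias))
```

Working on the logit scale spreads out estimates near 0 and 1, so features with very low or very high fraction new contribute to the comparison instead of being compressed at the boundaries.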
Examples
# \donttest{
# Simulate data for 500 genes and 2 replicates
sim <- Simulate_bakRData(500, nreps = 2)
# Fit data with fast implementation
Fit <- bakRFit(sim$bakRData)
#> Finding reliable Features
#> Filtering out unwanted or unreliable features
#> Processing data...
#> Estimating pnew with likelihood maximization
#> Estimating unlabeled mutation rate with -s4U data
#> Estimated pnews and polds for each sample are:
#> # A tibble: 4 × 4
#> # Groups: mut [2]
#> mut reps pnew pold
#> <int> <dbl> <dbl> <dbl>
#> 1 1 1 0.0500 0.000993
#> 2 1 2 0.0500 0.000993
#> 3 2 1 0.0501 0.000993
#> 4 2 2 0.0501 0.000993
#> Estimating fraction labeled
#> Estimating per replicate uncertainties
#> Estimating read count-variance relationship
#> Averaging replicate data and regularizing estimates
#> Assessing statistical significance
#> All done! Run QC_checks() on your bakRFit object to assess the
#> quality of your data and get recommendations for next steps.
# Run QC
QC <- QC_checks(Fit)
#> Mutation rates in new reads looks good!
#> Background mutation rate looks good!
#> Average fraction news for each sample are:
#> # A tibble: 4 × 3
#> # Groups: Exp_ID [2]
#> Exp_ID Replicate avg_fn
#> <int> <dbl> <dbl>
#> 1 1 1 0.505
#> 2 1 2 0.510
#> 3 2 1 0.506
#> 4 2 2 0.500
#> The average fraction news in all samples are between 0.2 and 0.8,
#> suggesting an appropriate label time!
#> logit(fn) correlations for each pair of replicates are:
#> Exp_ID Rep_ID1 Rep_ID2 correlation
#> 1 1 1 2 0.9253125
#> 2 2 1 2 0.9377533
#> logit(fn) correlations are high, suggesting good reproducibility!
#> I suggest running the Hybrid implementation next. This can be done
#> with bakRFit(Fit, HybridFit = TRUE), where Fit is your bakRFit object.
# }