Check data quality and suggest which analyses to run next.

QC_checks takes a bakRFit or bakRFnFit object as input and uses the Fast_Fit object it contains to assess data quality and suggest which implementation to run next. It considers the mutation rates in all samples, the fraction new distributions, the reproducibility of fraction new estimates, and the read lengths. It then outputs several diagnostic plots that may alert users to problems in their data, along with messages recommending which implementation to run next.

Usage

QC_checks(obj)

Arguments

obj: bakRFit or bakRFnFit object

Value

A list with 3 components:

raw_mutrates. This is a plot of the raw T-to-C mutation rates in all samples analyzed by bakR. Horizontal reference lines mark what could be considered "too low" to be useful in s4U-fed samples.
conversion_rates. This is a plot of the estimated T-to-C mutation rates in new and old reads. Each bar represents the probability that a U in a new or old read is mutated. Horizontal reference lines mark what could be considered good mutation rates.
correlation_plots. This is a list of ggplot objects. Each is a scatter plot comparing fraction new estimates from one replicate to those from another replicate in the same experimental condition. A y = x guide line is included to reveal any estimation biases.
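
As a rough illustration of the comparison that correlation_plots visualizes, the base R sketch below computes a logit(fn) correlation between two replicates and draws a scatter plot with a y = x guide line. It does not call bakR: the simulated values, the variable names, and the use of a Pearson correlation are all assumptions made for illustration, not a description of bakR's internals.

```r
# Hypothetical fraction new (fn) estimates for 200 features in two replicates.
# The noise level (sd = 0.3 on the logit scale) is an arbitrary choice.
set.seed(42)
true_logit_fn <- rnorm(200, mean = 0, sd = 1)
fn_rep1 <- plogis(true_logit_fn + rnorm(200, sd = 0.3))
fn_rep2 <- plogis(true_logit_fn + rnorm(200, sd = 0.3))

# Compare replicates on the logit scale (Pearson correlation assumed here)
logit_cor <- cor(qlogis(fn_rep1), qlogis(fn_rep2))
print(logit_cor)

# Scatter plot with a y = x guide line; points drifting away from the
# line in one direction would hint at an estimation bias
plot(qlogis(fn_rep1), qlogis(fn_rep2),
     xlab = "logit(fn), replicate 1",
     ylab = "logit(fn), replicate 2")
abline(a = 0, b = 1, lty = 2)
```

A correlation close to 1 with points scattered evenly around the guide line corresponds to the "good reproducibility" situation that QC_checks reports.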

Examples

# \donttest{
# Simulate data for 500 genes and 2 replicates
sim <- Simulate_bakRData(500, nreps = 2)

# Fit data with fast implementation
Fit <- bakRFit(sim$bakRData)
#> Finding reliable Features
#> Filtering out unwanted or unreliable features
#> Processing data...
#> Estimating pnew with likelihood maximization
#> Estimating unlabeled mutation rate with -s4U data
#> Estimated pnews and polds for each sample are:
#> # A tibble: 4 × 4
#> # Groups:   mut [2]
#>     mut  reps   pnew     pold
#>   <int> <dbl>  <dbl>    <dbl>
#> 1     1     1 0.0500 0.000993
#> 2     1     2 0.0500 0.000993
#> 3     2     1 0.0501 0.000993
#> 4     2     2 0.0501 0.000993
#> Estimating fraction labeled
#> Estimating per replicate uncertainties
#> Estimating read count-variance relationship
#> Averaging replicate data and regularizing estimates
#> Assessing statistical significance
#> All done! Run QC_checks() on your bakRFit object to assess the 
#>             quality of your data and get recommendations for next steps.

# Run QC
QC <- QC_checks(Fit)
#> Mutation rates in new reads looks good!
#> Background mutation rate looks good!
#> Average fraction news for each sample are:
#> # A tibble: 4 × 3
#> # Groups:   Exp_ID [2]
#>   Exp_ID Replicate avg_fn
#>    <int>     <dbl>  <dbl>
#> 1      1         1  0.505
#> 2      1         2  0.510
#> 3      2         1  0.506
#> 4      2         2  0.500
#> The average fraction news in all samples are between 0.2 and 0.8, 
#>               suggesting an appropriate label time!
#> logit(fn) correlations for each pair of replicates are:
#>   Exp_ID Rep_ID1 Rep_ID2 correlation
#> 1      1       1       2   0.9253125
#> 2      2       1       2   0.9377533
#> logit(fn) correlations are high, suggesting good reproducibility!
#> I suggest running the Hybrid implementation next. This can be done 
#>               with bakRFit(Fit, HybridFit = TRUE), where Fit is your bakRFit object.

# }
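
The label time message printed above can be mimicked with a few lines of base R. This is only a sketch of the idea, using the average fraction new values from the example output and the 0.2 to 0.8 range quoted in the message; the data frame, the variable names, and the choice of strict inequalities are assumptions, not bakR's own code.

```r
# Average fraction new per sample, copied from the example output above
avg_fn <- data.frame(
  Exp_ID    = c(1, 1, 2, 2),
  Replicate = c(1, 2, 1, 2),
  avg_fn    = c(0.505, 0.510, 0.506, 0.500)
)

# Range quoted in the QC_checks message; whether the bounds are
# inclusive is an assumption in this sketch
lower <- 0.2
upper <- 0.8

# TRUE only if every sample falls inside the range
label_time_ok <- all(avg_fn$avg_fn > lower & avg_fn$avg_fn < upper)

if (label_time_ok) {
  message("Average fraction news are within range in all samples.")
} else {
  message("At least one sample has an average fraction new outside the range.")
}
```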