Simpler interface to retrieve a data.frame of PR metrics from a happy_result object.

pr_data(happy_result, var_type = c("both", "snv", "indel"),
  filter = c("ALL", "PASS", "SEL"), subtype = c("*", "C16_PLUS",
  "C1_5", "C6_15", "D16_PLUS", "D1_5", "D6_15", "I16_PLUS", "I1_5",
  "I6_15"), subset = "*", quietly = TRUE)

Arguments

happy_result

a happy result loaded via read_happy

var_type

restrict to insertions and deletions ("indel"), SNVs ("snv"), or keep both ("both")

filter

include all records (ALL), only passing (PASS) or with selective filters applied (SEL)

subtype

variant subtype of the form [IDC]length_range, e.g. "D6_15" is deletions of length \(>=6\) and \(<=15\)

subset

when run with stratification regions, the subset is the region ID. "*" for genome-wide PR data. See details.

quietly

suppress info messages

Value

a data.frame of Precision-Recall metrics for the selected subset

Details

Subsets: hap.py v0.3.7+ writes subsets TS_contained and TS_boundary by default, corresponding to truth variants well contained or at the boundary of confident regions. In some truthsets, those in TS_boundary will show worse performance metrics due to issues with variant representation or a partial haplotype description.
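
As a rough sketch of how the two default subsets might be pulled out side by side: the calls below use only the arguments documented above, but they assume happyR is installed and that the loaded result actually contains TS_contained and TS_boundary records, which depends on the hap.py version and how it was run. The variable names are illustrative.

```r
# Sketch only: skipped entirely when happyR is not installed.
if (requireNamespace("happyR", quietly = TRUE)) {
  library(happyR)

  # demo data shipped with the package
  happy_input <- system.file("extdata", "happy_demo.summary.csv", package = "happyR")
  happy_prefix <- sub(".summary.csv", "", happy_input, fixed = TRUE)
  hapdata <- read_happy(happy_prefix)

  # assumption: these subset IDs are present in the loaded PR data
  pr_contained <- pr_data(hapdata, var_type = "indel", subset = "TS_contained")
  pr_boundary <- pr_data(hapdata, var_type = "indel", subset = "TS_boundary")
}
```

Comparing the two resulting data.frames is one way to check whether boundary variants are pulling down the genome-wide metrics.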

Subtypes: Indel subtypes are of the form [IDC]length_range, where the first letter indicates the variant classification: I insertion; D deletion; and C complex. Hap.py bins the lengths of these records into ranges by ALT allele length in basepairs: 1_5, 6_15 and 16_PLUS.
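
To make the naming scheme concrete, here is a minimal base R sketch that splits a subtype label into its classification and length bounds. `parse_subtype` is a hypothetical helper written for illustration; it is not part of happyR or hap.py.

```r
# Hypothetical helper (not part of happyR): decode a subtype label such as
# "D6_15" or "I16_PLUS" into its class and inclusive length bounds.
parse_subtype <- function(subtype) {
  pattern <- "^([IDC])([0-9]+)_([0-9]+|PLUS)$"
  m <- regmatches(subtype, regexec(pattern, subtype))[[1]]
  if (length(m) == 0) {
    stop("unrecognised subtype: ", subtype)
  }
  classes <- c(I = "insertion", D = "deletion", C = "complex")
  list(
    class = unname(classes[m[2]]),
    min_length = as.integer(m[3]),
    # "PLUS" marks an open-ended bin, represented here as Inf
    max_length = if (m[4] == "PLUS") Inf else as.integer(m[4])
  )
}

str(parse_subtype("D6_15"))
str(parse_subtype("I16_PLUS"))
```

The wildcard "*" deliberately fails to parse, since it selects all subtypes rather than naming a single bin.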

Examples

# figure out prefix from pkg install location
happy_input <- system.file("extdata", "happy_demo.summary.csv", package = "happyR")
happy_prefix <- sub(".summary.csv", "", happy_input)

# load happy result
hapdata <- read_happy(happy_prefix)
#> Reading summary table
#> Reading extended table
#> Reading precision-recall curve data

# long deletion PR curve
del_pr <- pr_data(hapdata, var_type = "indel", subtype = "D16_PLUS")