Estimate success rates and their highest density intervals (HDIs) from hap.py counts using a Binomial model and empirical Bayes. See the package docs for details of the method's implementation.

estimate_hdi(df, successes_col, totals_col, group_cols, aggregate_only = TRUE,
  significance = 0.05, sample_size = 1e+05, max_alpha1 = 1000)

Arguments

df

A data.frame. Required columns: Replicate.Id, Subset, and the columns specified in the group_cols argument.

successes_col

Name of the column that contains success counts.

totals_col

Name of the column that contains total counts.

group_cols

Vector of column names to group counts by. Observations within the same group will be treated as replicates.

aggregate_only

If TRUE, estimate HDIs for the aggregate replicate only (speeds up execution). Default: TRUE.

significance

Significance level for HDI estimation. Default: 0.05 (i.e. 95% HDIs).

sample_size

Number of observations to draw from the Beta posterior to estimate HDIs. Default: 1e5.

max_alpha1

Upper bound for the alpha hyperparameter in the aggregate Beta posterior. Default: 1000.
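
To make the roles of significance and sample_size more concrete, the following is a minimal base R sketch of estimating an HDI from draws of a Beta posterior. It is not the package's implementation: the counts are made up, and a flat Beta(1, 1) prior is assumed here in place of the empirical Bayes hyperparameters that estimate_hdi fits.

```r
# Illustrative sketch only; not how estimate_hdi() is implemented.
set.seed(42)

# Hypothetical counts for a single replicate
successes <- 950
total <- 1000

# Assumption: flat Beta(1, 1) prior. With a Binomial likelihood the
# posterior is Beta(alpha + successes, beta + failures).
alpha_post <- 1 + successes
beta_post <- 1 + (total - successes)

significance <- 0.05
sample_size <- 1e5

# Draw from the posterior and sort the draws
draws <- sort(rbeta(sample_size, alpha_post, beta_post))

# The HDI is the narrowest window that holds (1 - significance) of the draws
n_in <- ceiling((1 - significance) * sample_size)
n_windows <- sample_size - n_in + 1
widths <- draws[n_in:sample_size] - draws[1:n_windows]
i <- which.min(widths)

hdi <- c(lower = draws[i], upper = draws[i + n_in - 1])
posterior_mean <- alpha_post / (alpha_post + beta_post)

print(hdi)
print(posterior_mean)
```

A larger sample_size gives a more stable interval at the cost of run time, and a smaller significance widens the interval.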

Value

A data.frame with performance counts, model hyperparameters, success rate and HDI estimates.

Examples

# NOT RUN {
hdi <- estimate_hdi(df, successes_col = 'TRUTH.TP', totals_col = 'TRUTH.TOTAL',
  group_cols = c('Group.Id', 'Subset', 'Type', 'Subtype'))
# }
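
The call above assumes that df already holds the replicate, grouping and count columns. As a rough sketch of that shape, the toy data.frame below uses the column names from the example. All values are invented for illustration and do not come from real hap.py output.

```r
# Toy input; every value here is made up for illustration.
df <- data.frame(
  Group.Id     = "A",
  Replicate.Id = c("rep1", "rep2", "rep3"),
  Subset       = "*",
  Type         = "SNP",
  Subtype      = "*",
  TRUTH.TP     = c(980, 975, 990),
  TRUTH.TOTAL  = c(1000, 1000, 1000),
  stringsAsFactors = FALSE
)

# Check that every column the example call relies on is present
required <- c("Replicate.Id", "Subset", "Group.Id", "Type", "Subtype",
              "TRUTH.TP", "TRUTH.TOTAL")
stopifnot(all(required %in% names(df)))

# With the package that provides estimate_hdi() attached, the call would be:
# hdi <- estimate_hdi(df, successes_col = "TRUTH.TP", totals_col = "TRUTH.TOTAL",
#                     group_cols = c("Group.Id", "Subset", "Type", "Subtype"))
```

The three rows share the same values in every group_cols column, so they would be treated as replicates of one group.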