Annotation Engine vs Data update
Background
Updates to annotations fall into two broad categories:
- Annotation engine (Annotator) update.
- Annotation data update.
Understanding the difference between these two types of update is key to deciding what needs to change when annotations are updated.
Annotator update
The annotator is the engine that contains the logic for core annotations, such as computing variant consequences, HGVS notations, and mapped positions (e.g. cDNA, CDS, and protein positions), and detecting gene fusions. It also performs annotation lookups from external data sources such as dbSNP, gnomAD, ClinVar, and OMIM; these are known as supplementary annotations (SA). An update to the annotator entails new features or bug fixes in the compute or lookup mechanism. Such an update is completely independent of a data update, such as moving dbSNP from v154 to v155. In other words, the same annotator can annotate with either dbSNP v154 or dbSNP v155 when provided with the appropriate data files.
Data update
The annotator uses data from various sources (listed in Introduction). For example, gene models used for core annotations are obtained from RefSeq and Ensembl. Supplementary annotations come from various sources such as dbSNP, gnomAD, ClinVar, and OMIM. Any of these data sources can be updated without updating the annotator, as long as the file formats remain compatible.
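The separation between engine and data can be illustrated with a minimal Python sketch. This is a toy model, not the annotator's actual implementation: the `DataSource` and `Annotator` classes, the `schema` field, the `SUPPORTED_SCHEMA` value, and the variant keys and identifiers are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical file-format version understood by the engine (illustrative value).
SUPPORTED_SCHEMA = 1


@dataclass(frozen=True)
class DataSource:
    """A supplementary annotation (SA) data file, reduced to what matters here."""
    name: str          # e.g. "dbSNP"
    data_version: str  # release of the data, e.g. "v154"
    schema: int        # file-format version (hypothetical field)
    records: dict      # variant key -> annotation value (toy stand-in)


class Annotator:
    """Toy engine: the lookup logic is fixed, the data is supplied at run time."""

    def __init__(self, sources):
        for src in sources:
            # A data update is only safe if the file format is still compatible.
            if src.schema != SUPPORTED_SCHEMA:
                raise ValueError(
                    f"{src.name} {src.data_version}: schema {src.schema} is not supported"
                )
        self.sources = list(sources)

    def annotate(self, variant):
        # The same lookup code runs regardless of which data release was loaded.
        return {
            f"{src.name} {src.data_version}": src.records[variant]
            for src in self.sources
            if variant in src.records
        }


# Placeholder records; the keys and identifiers are made up for this example.
dbsnp_v154 = DataSource("dbSNP", "v154", 1, {"chr1:100:A:G": "rsEXAMPLE1"})
dbsnp_v155 = DataSource(
    "dbSNP", "v155", 1,
    {"chr1:100:A:G": "rsEXAMPLE1", "chr1:200:C:T": "rsEXAMPLE2"},
)

# One engine class, two data releases: only the supplied files differ.
old = Annotator([dbsnp_v154]).annotate("chr1:200:C:T")
new = Annotator([dbsnp_v155]).annotate("chr1:200:C:T")
```

In this sketch, swapping `dbsnp_v154` for `dbsnp_v155` changes the results without touching the `Annotator` class, while a data file with an unsupported `schema` is rejected, which corresponds to the case where the annotator itself would need an update.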
Update scenarios
Let us look at a few update scenarios.
| Requirement | What needs to be updated/added | Suggested action |
|---|---|---|
| New transcripts and gene symbols | Cache files from RefSeq and Ensembl | Run DataManager |
| Update ClinVar | ClinVar SA files | Run DataManager |
| New external annotation | New SA files required | Submit feature request |
| New annotation feature | Annotator | Submit feature request |
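
The scenarios in the table can also be expressed as a small lookup, which may be handy when triaging update requests. The sketch below simply encodes the four rows above; the `Action` enum and the `suggested_action` and `is_data_only` helpers are hypothetical names and are not part of the annotator or DataManager.

```python
from enum import Enum


class Action(Enum):
    RUN_DATA_MANAGER = "Run DataManager"
    FEATURE_REQUEST = "Submit feature request"


# Rows of the table: requirement -> (what needs updating, suggested action).
SCENARIOS = {
    "New transcripts and gene symbols": (
        "Cache files from RefSeq and Ensembl", Action.RUN_DATA_MANAGER),
    "Update ClinVar": ("ClinVar SA files", Action.RUN_DATA_MANAGER),
    "New external annotation": ("New SA files required", Action.FEATURE_REQUEST),
    "New annotation feature": ("Annotator", Action.FEATURE_REQUEST),
}


def suggested_action(requirement):
    """Return the suggested action for a requirement listed in the table."""
    _, action = SCENARIOS[requirement]
    return action


def is_data_only(requirement):
    """True when the requirement is met by refreshing existing data files."""
    return suggested_action(requirement) is Action.RUN_DATA_MANAGER
```

For example, `is_data_only("Update ClinVar")` is true because that row only needs refreshed SA files, whereas `is_data_only("New annotation feature")` is false because that row requires a change to the annotator.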