Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.
Illumina Connected Annotations takes VCFs as input and produces a structured JSON representation of all annotation and sample information (as extracted from the VCF). It handles multiple alternate alleles and multiple samples with ease.
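As a minimal sketch of consuming the output downstream, the Python below loads an annotation file and lists its top-level keys. It assumes the output is a single JSON object that may be gzip-compressed. The helper names `load_annotations` and `top_level_keys` and the file name `sample.json.gz` are hypothetical and are not part of Illumina Connected Annotations; the actual key names should be checked against the JSON layout documentation.

```python
import gzip
import json
from pathlib import Path


def load_annotations(path):
    """Load an annotation JSON file (hypothetical helper, not part of the tool)."""
    path = Path(path)
    # Assumption: files ending in .gz are gzip-compressed; anything else is plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def top_level_keys(path):
    """Return the sorted top-level keys of the annotation document."""
    return sorted(load_annotations(path))
```

Calling `top_level_keys("sample.json.gz")` is a quick way to confirm what sections a given output file contains before writing any parsing logic against it. Note that this reads the whole file into memory, which may not suit very large outputs.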
The software is developed under a rigorous software development life cycle (SDLC) and testing process to ensure the accuracy of the results and to enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline in which millions of variant annotations are checked against baseline values daily.
What does Illumina Connected Annotations annotate?
We use Sequence Ontology consequences to describe how each variant impacts a given transcript.
Reference genome
For GRCh38, our reference genome uses version GRCh38.p14. The list of chromosomes and contigs in this version can be checked from this file.
caution
There was a bug in our previous genome reference. Previously, we generated the genome reference using GRCh38.p13 version 109.20190607 for the chromosome and contig names, but used GRCh38.p12 version 109 for the FASTA sequence. This resulted in a mismatch between the names of some alt contigs and their FASTA sequences. The mismatch can cause minor issues in the variant ID and HGVS g. notation for variants located on the affected alt contigs. The issue does not affect variants on the main chromosomes (chromosomes 1-22, X, and Y).
With release GRCh38.p14, we have fixed this issue. The GRCh38.p14 release is also backward compatible with the previous release: old transcript and gene model files and all supplementary annotation data will work with the new genome reference version without issue. We advise updating to the current genome reference version.
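To gauge whether results produced with the earlier reference could be affected, one option is to flag variants that fall outside the main chromosomes. The sketch below is an illustration only: the names `is_main_chromosome` and `possibly_affected` are hypothetical, and it assumes contig names are either bare (`1`, `X`) or carry a `chr` prefix (`chr1`, `chrX`).

```python
# Main chromosomes named in the caution above: 1-22, X, and Y.
MAIN_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def is_main_chromosome(contig):
    """Return True if the contig is chromosome 1-22, X, or Y."""
    # Assumption: an optional "chr" prefix is the only naming variation.
    name = contig[3:] if contig.startswith("chr") else contig
    return name in MAIN_CHROMOSOMES


def possibly_affected(contigs):
    """Return the contigs that are not main chromosomes and may need re-checking."""
    return [c for c in contigs if not is_main_chromosome(c)]
```

The check is deliberately conservative: it flags every contig outside the list above, including contigs that the mismatch may not have touched, so a flagged contig is a prompt to re-annotate with the current reference rather than proof of an error.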
Transcript and Gene Models
The transcript and gene models are obtained from RefSeq and Ensembl. The current officially supported versions for GRCh38 are:
Data Source | Version | Release Date |
---|---|---|
RefSeq | GCF_000001405.40-RS_2023_10 | 2023-10-07 |
Ensembl | 112 | 2024-05-14 |
For GRCh37:
Data Source | Version | Release Date |
---|---|---|
RefSeq | 105.20220307 | 2022-03-10 |
Ensembl | 110 | 2023-02-08 |
note
For the GRCh37 Ensembl release, although the version is 110 (and newer releases 111 and 112 also exist), the annotation data is effectively the same as version 87, which was released back in 2017.
For the GRCh37 RefSeq release, NCBI has stopped releasing new gene annotations for GRCh37, and version 105.20220307 is the last release for this assembly.
For the gene symbols supported by the transcript models above, we downloaded the data from the HGNC website as of 2024-06-03.
Supplementary Annotation
In addition, Illumina Connected Annotations uses external data sources to provide additional context for each variant. It provides annotations from the following sources, divided into two tiers: Professional and Basic. The Basic tier can be accessed free of charge. The Professional tier requires a license. Please see Licensed Content for details. For access, please contact annotation_support@illumina.com.
Data Source | Availability | Latest Supported Version |
---|---|---|
COSMIC | Professional | 99 |
OMIM | Professional | 20240807 |
Primate AI-3D | Professional | 1.0 |
Promoter AI | Professional | 1.0 |
Splice AI | Professional | 1.3 |
1000 Genomes Project | Basic | Phase 3 v3plus |
Cancer Hotspots | Basic | 2017 |
ClinGen | Basic | 20240807 |
ClinVar | Basic | 20240730 |
DANN | Basic | 20200205 |
dbSNP | Basic | 156 |
DECIPHER | Basic | 201509 |
FusionCatcher | Basic | 1.33 |
GERP | Basic | 20110522 |
GME Variome | Basic | 20160618 |
gnomAD (GRCh37) | Basic | 2.1 |
gnomAD (GRCh38) | Basic | 4.1 |
MITOMAP | Basic | 20200819 |
MultiZ 100 way | Basic | 20171006 |
REVEL | Basic | 20200205 |
TOPMed | Basic | freeze 5 |
Download manager
To download all of the data efficiently, we provide a download manager. Please see the DataManager page for details.
Download
Please visit Illumina Connected Annotations.