Illumina Annotator JSON File Format
Overview
Conventions
In the Illumina Annotator JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:
- With boolean key/value pairs, we only output the keys that have a true value. That is, there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
- When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Annotator treats periods like empty or null strings and therefore will not output those entries.
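Because false booleans are never written, a consumer should read a missing boolean key as false. A minimal Python sketch, assuming the record has already been decoded with the standard `json` module; `flag` is a hypothetical helper name, not part of Illumina Annotator:

```python
import json

def flag(record: dict, key: str) -> bool:
    """Return a boolean field, treating an absent key as False.

    Illumina Annotator only writes boolean keys whose value is true,
    so a missing key means the flag is false.
    """
    return bool(record.get(key, False))

variant = json.loads('{"vid": "2:48010488:A", "isReferenceMinorAllele": true}')
print(flag(variant, "isReferenceMinorAllele"))  # True
print(flag(variant, "isStructuralVariant"))     # False (key omitted)
```

The same `.get()` pattern applies to fields dropped because the VCF held a period: use `record.get(key)` and handle `None` rather than indexing directly.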
JSON Layout
In general, each position corresponds to a row in the original VCF file.
For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.
Parsing
We've put together a new section that discusses how to parse our JSON files easily using examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.
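For small files, the whole document can be loaded with the Python standard library. The sketch below assumes the output is either plain JSON or gzip-compressed (`.json.gz`), and that the top level holds `header`, `positions` and a gene section under a `genes` key; the `genes` key name and the helper names are assumptions for illustration. Large whole-genome outputs are better queried with JASIX than loaded into memory.

```python
import gzip
import json
import os
import tempfile

def load_annotations(path: str) -> dict:
    """Load an Illumina Annotator JSON file (plain or gzip-compressed) into memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)

def iter_positions(document: dict):
    """Yield each entry of the positions section (one per VCF row)."""
    yield from document.get("positions", [])

# Usage with a tiny hypothetical file written on the fly.
example = {
    "header": {"annotator": "IlluminaAnnotator", "samples": ["NA12878"]},
    "positions": [{"chromosome": "chr2", "position": 48010488,
                   "refAllele": "G", "altAlleles": ["A"]}],
    "genes": [],  # assumed key name for the gene section
}
path = os.path.join(tempfile.mkdtemp(), "example.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    json.dump(example, handle)

document = load_annotations(path)
for pos in iter_positions(document):
    print(pos["chromosome"], pos["position"], pos["altAlleles"])
```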
Header
{
"header":{
"annotator":"IlluminaAnnotator 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
Field | Type | Notes |
---|---|---|
annotator | string | the name of the annotator and the current version |
creationTime | string | yyyy-MM-dd hh:mm:ss |
genomeAssembly | string | see possible values below |
schemaVersion | integer | incremented whenever the core structure of the JSON file introduces breaking changes |
dataVersion | string | |
dataSources | object array | see Data Source entry below |
samples | string array | the order of these sample names will be used throughout the JSON file when enumerating samples |
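Two header fields are especially useful downstream: the data source versions (for provenance) and the sample order (for indexing per-position sample arrays). A minimal sketch using a trimmed copy of the header example above; the variable names are illustrative:

```python
import json

header_json = """{"header": {
  "annotator": "IlluminaAnnotator 3.0.0-alpha.5+g6c52e247",
  "genomeAssembly": "GRCh37",
  "dataSources": [
    {"name": "ClinVar", "version": "20170503", "releaseDate": "2017-05-03"},
    {"name": "phyloP", "version": "hg19", "releaseDate": "2009-11-10"}
  ],
  "samples": ["NA12878", "NA12891", "NA12892"]
}}"""

header = json.loads(header_json)["header"]

# Map each data source name to its version for provenance reporting.
source_versions = {src["name"]: src["version"] for src in header.get("dataSources", [])}

# The header sample order defines the order of every per-position "samples" array.
sample_index = {name: i for i, name in enumerate(header.get("samples", []))}

print(header["genomeAssembly"], source_versions["ClinVar"], sample_index["NA12891"])
```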
Data Source
Field | Type | Notes |
---|---|---|
name | string | |
version | string | |
description | string | optional description of the data source |
releaseDate | string | yyyy-MM-dd |
Genome Assemblies
- GRCh37
- GRCh38
- hg19
- SARSCoV2
Positions
"positions":[
{
"chromosome":"chr2",
"position":48010488,
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
Field | Type | Variant Type | Notes |
---|---|---|---|
chromosome | string | all | exactly as displayed in the VCF |
position | integer | all | exactly as displayed in the VCF (1-based notation). Range: 1 - 250 million |
repeatUnit | string | STR | provided by ExpansionHunter |
refRepeatCount | integer | STR | provided by ExpansionHunter |
svEnd | integer | SV | |
refAllele | string | all | exactly as displayed in the VCF |
altAlleles | string array | all | exactly as displayed in the VCF |
quality | float | all | exactly as displayed in the VCF (normally an integer, but some variant callers use floating-point values; values as high as 500k have been observed) |
filters | string array | all | exactly as displayed in the VCF |
ciPos | integer array | SV | |
ciEnd | integer array | SV | |
svLength | integer | SV | |
strandBias | float | small variant | provided by GATK (from SB) |
jointSomaticNormalQuality | integer | SV | provided by the Manta variant caller (SOMATICSCORE) |
cytogeneticBand | string | all | e.g. 17p13.1 |
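For structural variants, ciPos and ciEnd can be turned into genomic windows for each breakpoint. This sketch assumes they are offsets relative to position and svEnd respectively, mirroring the VCF CIPOS/CIEND convention; `breakpoint_windows` is a hypothetical helper:

```python
def breakpoint_windows(position: dict) -> dict:
    """Return (min, max) genomic windows for the SV start and end breakpoints.

    Assumes ciPos/ciEnd are offsets relative to position/svEnd,
    as CIPOS/CIEND are in the VCF specification.
    """
    windows = {}
    if "ciPos" in position:
        low, high = position["ciPos"]
        windows["start"] = (position["position"] + low, position["position"] + high)
    if "svEnd" in position and "ciEnd" in position:
        low, high = position["ciEnd"]
        windows["end"] = (position["svEnd"] + low, position["svEnd"] + high)
    return windows

example = {"position": 48010488, "svEnd": 48020488,
           "ciPos": [-170, 170], "ciEnd": [-175, 175]}
print(breakpoint_windows(example))
# {'start': (48010318, 48010658), 'end': (48020313, 48020663)}
```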
ClinGen
"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes |
---|---|---|
clingen | object array | |
chromosome | string | Ensembl-style chromosome names |
begin | integer | 1-based position |
end | integer | 1-based position |
variantType | string | Any of the Sequence Ontology sequence alterations. |
id | string | Identifier from the data source. Alternatively a VID |
clinicalInterpretation | string | see possible values below |
observedGains | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
observedLosses | integer | Range: 0 - (2^31 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
validated | boolean | |
phenotypes | string array | Description of the phenotype. |
phenotypeIds | string array | Phenotype identifiers, e.g. HPO (HP:) and MedGen IDs. |
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions). |
clinicalInterpretation
- benign
- curated benign
- curated pathogenic
- likely benign
- likely pathogenic
- path gain
- path loss
- pathogenic
- uncertain
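Reciprocal overlap and annotation overlap can be computed from the two intervals. The sketch below uses the common definitions (reciprocal overlap is the smaller of the shared fraction of each interval; annotation overlap is the shared fraction of the annotation interval) with 1-based inclusive coordinates and 5-decimal rounding. These definitions are consistent with the dosage sensitivity example values, but the exact Illumina Annotator implementation (e.g. insertion handling) is an assumption here:

```python
def overlap_length(begin1: int, end1: int, begin2: int, end2: int) -> int:
    """Number of shared bases between two 1-based, inclusive intervals."""
    return max(0, min(end1, end2) - max(begin1, begin2) + 1)

def reciprocal_overlap(variant: tuple, annotation: tuple) -> float:
    """Smaller of the two shared fractions, rounded to 5 decimal places."""
    shared = overlap_length(*variant, *annotation)
    variant_len = variant[1] - variant[0] + 1
    annotation_len = annotation[1] - annotation[0] + 1
    return round(min(shared / variant_len, shared / annotation_len), 5)

def annotation_overlap(variant: tuple, annotation: tuple) -> float:
    """Fraction of the annotation interval covered by the variant."""
    shared = overlap_length(*variant, *annotation)
    return round(shared / (annotation[1] - annotation[0] + 1), 5)

variant = (1, 100)
region = (51, 100)
print(reciprocal_overlap(variant, region))  # 0.5
print(annotation_overlap(variant, region))  # 1.0
```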
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes |
---|---|---|
clingenDosageSensitivityMap | object array | |
chromosome | string | Ensembl-style chromosome names |
begin | integer | 1-based position |
end | integer | 1-based position |
haploinsufficiency | string | see possible values below |
triplosensitivity | string | (same as haploinsufficiency) |
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions). |
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions). |
haploinsufficiency and triplosensitivity
- no evidence to suggest that dosage sensitivity is associated with clinical phenotype
- little evidence suggesting dosage sensitivity is associated with clinical phenotype
- emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
- sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
- gene associated with autosomal recessive phenotype
- dosage sensitivity unlikely
1000 Genomes (SV)
"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
Field | Type | Notes |
---|---|---|
chromosome | string | |
begin | integer | |
end | integer | |
variantType | string | |
id | string | |
allAn | integer | allele number for all populations. Non-zero integer. |
allAc | integer | allele count for all populations. Integer. |
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
sasAf | floating point | allele frequency for the South Asian super population. Range: 0 - 1.0 |
reciprocalOverlap | floating point | range: 0 - 1. |
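The frequency fields follow the usual relationship AF = AC / AN. A quick consistency check on the 1000 Genomes SV example above; rounding to 6 decimal places is an assumption based on the example values:

```python
def allele_frequency(allele_count: int, allele_number: int, ndigits: int = 6) -> float:
    """Allele frequency as allele count divided by allele number."""
    if allele_number <= 0:
        raise ValueError("allele number must be a positive integer")
    return round(allele_count / allele_number, ndigits)

entry = {"allAn": 5008, "allAc": 2702, "allAf": 0.539537}
computed = allele_frequency(entry["allAc"], entry["allAn"])
print(computed, abs(computed - entry["allAf"]) < 1e-6)  # 0.539537 True
```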
gnomAD (SV)
"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]
Field | Type | Notes |
---|---|---|
chromosome | string | chromosome number |
begin | integer | position interval start |
end | integer | position interval end |
variantType | string | structural variant type |
variantId | string | gnomAD ID |
allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
amrAf | floating point | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0 |
femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0 |
maleAf | floating point | allele frequency for male population. Range: 0 - 1.0 |
allAc | integer | allele count for all populations. |
afrAc | integer | allele count for the African super population. |
amrAc | integer | allele count for the Ad Mixed American super population. |
easAc | integer | allele count for the East Asian super population. |
eurAc | integer | allele count for the European super population. |
othAc | integer | allele count for all other populations. |
maleAc | integer | allele count for male population. |
femaleAc | integer | allele count for female population. |
allAn | integer | allele number for all populations. |
afrAn | integer | allele number for the African super population. |
amrAn | integer | allele number for the Ad Mixed American super population. |
easAn | integer | allele number for the East Asian super population. |
eurAn | integer | allele number for the European super population. |
othAn | integer | allele number for all other populations. |
femaleAn | integer | allele number for female population. |
maleAn | integer | allele number for male population. |
allHc | integer | count of homozygous individuals for all populations. |
afrHc | integer | count of homozygous individuals for the African / African American population. |
amrHc | integer | count of homozygous individuals for the Latino population. |
easHc | integer | count of homozygous individuals for the East Asian population. |
eurHc | integer | count of homozygous individuals for the European super population. |
othHc | integer | count of homozygous individuals for all other populations. |
maleHc | integer | count of homozygous individuals for male population. |
femaleHc | integer | count of homozygous individuals for female population. |
failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters) |
reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |
annotationOverlap | floating point | Annotation overlap. Range: 0 - 1.0 |
Note: The following fields are not available in GRCh38 because the source file does not contain this information:
Field |
---|
femaleAf |
maleAf |
maleAc |
femaleAc |
femaleAn |
maleAn |
allHc |
afrHc |
amrHc |
easHc |
eurHc |
othHc |
maleHc |
femaleHc |
failedFilter |
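On GRCh38 an absent failedFilter key does not mean "false": the value simply isn't available. A minimal sketch that returns `None` for fields missing from the GRCh38 source; the helper and set names are hypothetical:

```python
from typing import Optional

# Fields listed above as unavailable for GRCh38 gnomAD SV records.
GRCH38_MISSING = {
    "femaleAf", "maleAf", "maleAc", "femaleAc", "femaleAn", "maleAn",
    "allHc", "afrHc", "amrHc", "easHc", "eurHc", "othHc", "maleHc", "femaleHc",
    "failedFilter",
}

def gnomad_sv_value(entry: dict, key: str, assembly: str) -> Optional[object]:
    """Return a field value, or None when it cannot be known for this assembly."""
    if assembly == "GRCh38" and key in GRCH38_MISSING:
        return None  # absent from the source data, so omission carries no meaning
    if key == "failedFilter":
        return bool(entry.get(key, False))  # omitted boolean means false
    return entry.get(key)

print(gnomad_sv_value({}, "failedFilter", "GRCh37"))  # False
print(gnomad_sv_value({}, "failedFilter", "GRCh38"))  # None
```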
MITOMAP (SV)
"mitomap":[
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
Field | Type | Notes |
---|---|---|
chromosome | string | |
begin | integer | |
end | integer | |
variantType | string | |
reciprocalOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
annotationOverlap | float | Range: 0 - 1. Specified up to 5 decimal places |
Samples
"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
Field | Type | VCF | Notes |
---|---|---|---|
genotype | string | GT | |
variantFrequencies | float array | VF, AD | range: 0 - 1.0. One value per alternate allele |
totalDepth | integer | DP | non-negative integer values |
genotypeQuality | integer | GQ | non-negative integer values. Typically maxes out at 99 |
copyNumber | integer | CN | non-negative integer values |
minorHaplotypeCopyNumber | integer | MCN | non-negative integer values |
repeatUnitCounts | integer array | REPCN | ExpansionHunter-specific |
alleleDepths | integer array | AD | non-negative integer values |
failedFilter | bool | FT | |
splitReadCounts | integer array | SR | Manta-specific |
pairedEndReadCounts | integer array | PR | Manta-specific |
isDeNovo | bool | DN | |
deNovoQuality | float | DQ | |
diseaseAffectedStatuses | string array | DST | ExpansionHunter-specific |
artifactAdjustedQualityScore | float | AQ | PEPE-specific. Range: 0 - 100.0 |
likelihoodRatioQualityScore | float | LQ | PEPE-specific. Range: 0 - 100.0 |
lossOfHeterozygosity | bool | CN, MCN | |
somaticQuality | float | SQ | |
heteroplasmyPercentile | float array | VF | range: 0 - 100. 2 decimal places. One value per alternate allele |
binCount | integer | BC | non-negative integer values |
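When VF is absent, variantFrequencies can be derived from AD. The sketch below assumes each value is the alternate allele depth divided by the sum of all allele depths, rounded to three decimals; this reproduces the example above (alleleDepths [10, 20, 30] gives [0.333, 0.5]) but the exact rule is not stated by the source:

```python
from typing import List, Optional

def variant_frequencies_from_ad(allele_depths: List[int], ndigits: int = 3) -> Optional[List[float]]:
    """One frequency per alternate allele: alt depth / total depth over all alleles.

    allele_depths[0] is the reference depth, as in the VCF AD field.
    Returns None when there are no reads.
    """
    total = sum(allele_depths)
    if total == 0:
        return None
    return [round(depth / total, ndigits) for depth in allele_depths[1:]]

print(variant_frequencies_from_ad([10, 20, 30]))  # [0.333, 0.5]
```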
Empty Samples
If a sample does not contain any entries, we create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.
"samples":[
{
"isEmpty":true
}
],
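Because empty samples keep their slot, the header sample names can be zipped positionally with each position's samples array. A minimal sketch that skips intentionally empty samples; `samples_by_name` is a hypothetical helper:

```python
def samples_by_name(sample_names: list, position: dict) -> dict:
    """Pair header sample names with per-position sample objects, dropping empty ones."""
    result = {}
    for name, sample in zip(sample_names, position.get("samples", [])):
        if sample.get("isEmpty", False):
            continue  # placeholder that only preserves ordering
        result[name] = sample
    return result

names = ["NA12878", "NA12891", "NA12892"]
position = {"samples": [{"genotype": "0/1"}, {"isEmpty": True}, {"genotype": "1/1"}]}
print(samples_by_name(names, position))
# {'NA12878': {'genotype': '0/1'}, 'NA12892': {'genotype': '1/1'}}
```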
Variants
"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"isDecomposedVariant":true,
"isRecomposedVariant":true,
"linkedVids":["2:48010488:GTA:ATC"],
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
Field | Type | Notes |
---|---|---|
vid | string | see Variant Identifiers |
chromosome | string | |
begin | int | 1-based non-negative integer values. Range: 1 - 250 million |
end | int | 1-based non-negative integer values. Range: 1 - 250 million |
isReferenceMinorAllele | bool | true when this is a reference minor allele |
isStructuralVariant | bool | true when the variant is a structural variant |
inLowComplexityRegion | bool | true when the variant lies in a low complexity region (gnomAD low complexity regions) |
refAllele | string | parsimonious representation of the reference allele |
altAllele | string | parsimonious representation of the alternate allele. |
variantType | string | uses Sequence Ontology sequence alterations |
isDecomposedVariant | bool | true when this variant has been used to create a recomposed variant |
isRecomposedVariant | bool | true when the variant is recomposed from two or more decomposed variants |
linkedVids | string array | list of VIDs for variants connecting decomposed and recomposed variants |
hgvsg | string | HGVS g. notation |
phylopScore | float | phyloP conservation score. Range: -14.08 to 6.424 |
Reference Minor Alleles
Illumina Annotator supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.
Flagging Decomposed & Recomposed Variants
When two or more decomposed variants are recomposed into an MNV, the decomposed variants will be marked with "isDecomposedVariant":true.
Similarly, the recomposed variant will be shown as a new VCF position. This recomposed variant will be flagged with "isRecomposedVariant":true.
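Since a recomposed MNV appears as a new position alongside the decomposed variants it was built from, counting every variant would double count the same change. One possible policy, sketched below, keeps either the recomposed MNVs or the original decomposed rows; the helper name and the VIDs in the usage example are hypothetical:

```python
def variants_for_counting(positions: list, keep: str = "recomposed"):
    """Yield each variant once, avoiding double counting of decomposed/recomposed pairs.

    keep="recomposed": drop decomposed variants already merged into an MNV.
    keep="decomposed": drop recomposed MNVs and keep the original VCF rows.
    """
    if keep not in ("recomposed", "decomposed"):
        raise ValueError("keep must be 'recomposed' or 'decomposed'")
    skip_flag = "isDecomposedVariant" if keep == "recomposed" else "isRecomposedVariant"
    for position in positions:
        for variant in position.get("variants", []):
            if variant.get(skip_flag, False):  # omitted boolean means false
                continue
            yield variant

positions = [
    {"variants": [{"vid": "snv-1", "isDecomposedVariant": True}]},
    {"variants": [{"vid": "snv-2", "isDecomposedVariant": True}]},
    {"variants": [{"vid": "mnv-1", "isRecomposedVariant": True}]},
    {"variants": [{"vid": "snv-3"}]},
]
print([v["vid"] for v in variants_for_counting(positions)])  # ['mnv-1', 'snv-3']
```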
Transcripts
"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"nonsense_mediated_decay",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268",
"cdsPos":"116",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"polyPhenScore":0.95,
"polyPhenPrediction":"probably damaging",
"proteinId":"ENSP00000405294.1",
"siftScore":0.61,
"siftPrediction":"tolerated",
"completeOverlap":true
}
]
Field | Type | Notes |
---|---|---|
transcript | string | transcript ID. e.g. ENST00000445503.1 |
source | string | RefSeq / Ensembl |
bioType | string | Ensembl biotype, e.g. nonsense_mediated_decay |
codons | string | |
aminoAcids | string | |
cdnaPos | string | |
cdsPos | string | |
exons | string | exons affected by the variant |
introns | string | introns affected by the variant |
proteinPos | string | |
geneId | string | gene ID. e.g. ENSG00000116062 |
hgnc | string | gene symbol. e.g. MSH6 |
consequence | string array | Sequence Ontology Consequences |
hgvsc | string | HGVS coding nomenclature |
hgvsp | string | HGVS protein nomenclature |
geneFusion | object | see Gene Fusions entry below |
isCanonical | bool | true when this is a canonical transcript |
isManeSelect | bool | true when this is a MANE select transcript |
polyPhenScore | float | range: 0 - 1.0 |
polyPhenPrediction | string | see possible values below |
proteinId | string | protein ID. E.g. ENSP00000405294.1 |
siftScore | float | range: 0 - 1.0 |
siftPrediction | string | see possible values below |
completeOverlap | bool | true when this transcript is completely overlapped by the variant |
cancerHotspots | object array | see Cancer Hotspots entry below |
MANE Select
MANE select tags are only available for RefSeq transcripts on GRCh38.
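When a single transcript per variant is needed, the isManeSelect and isCanonical flags can drive the choice. The preference order below (MANE Select, then canonical, then the first listed) is an illustrative choice rather than a documented rule, and the transcript IDs are hypothetical. Remember that MANE Select flags only appear for RefSeq transcripts on GRCh38:

```python
from typing import Optional

def pick_transcript(transcripts: list) -> Optional[dict]:
    """Choose one representative transcript: MANE Select, then canonical, then first listed."""
    for flag in ("isManeSelect", "isCanonical"):
        for transcript in transcripts:
            if transcript.get(flag, False):  # absent boolean keys mean false
                return transcript
    return transcripts[0] if transcripts else None

transcripts = [
    {"transcript": "tx-1", "source": "Ensembl"},
    {"transcript": "tx-2", "source": "Ensembl", "isCanonical": True},
    {"transcript": "tx-3", "source": "RefSeq", "isManeSelect": True},
]
print(pick_transcript(transcripts)["transcript"])  # tx-3
```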
PolyPhen
- probably damaging
- possibly damaging
- benign
- unknown
SIFT
- tolerated
- deleterious
- tolerated - low confidence
- deleterious - low confidence
Amino Acid Conservation
"aminoAcidConservation": {
"scores": [0.34]
}
Field | Type | Notes |
---|---|---|
aminoAcidConservation | object | |
scores | float array | fraction conserved with respect to the human amino acid residue (e.g. 0.34 = 34%). Range: 0.01 - 1.00 |
Gene Fusions
Field | Type | Notes |
---|---|---|
exon | int | actual exon where the breakpoint was located |
intron | int | actual intron where the breakpoint was located |
fusions | object array | see Fusion entry below |
Fusion
Field | Type | Notes |
---|---|---|
exon | int | actual exon where the other breakpoint was located |
intron | int | actual intron where the other breakpoint was located |
hgvsc | string | HGVS coding nomenclature describing the two fused genes and their transcripts |
Cancer Hotspots
Field | Type | Notes |
---|---|---|
residue | string | |
numSamples | int | how many samples are associated with a variant at the same amino acid position |
numAltAminoAcidSamples | int | how many samples are associated with a variant at the same amino acid position and with the same alternate amino acid |
qValue | double | |
Regulatory Regions
"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
Field | Type | Notes |
---|---|---|
id | string | |
type | string | see possible values below |
consequence | string array | see possible values below |
Regulatory Types
- CTCF_binding_site
- enhancer
- open_chromatin_region
- promoter
- promoter_flanking_region
- TF_binding_site
Regulatory Consequences
- regulatory_region_variant
- regulatory_region_ablation
- regulatory_region_amplification
- regulatory_region_truncation
ClinVar
small variants:
"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]
large variants:
"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
Field | Type | Notes |
---|---|---|
id | string | ClinVar ID |
variationId | string | ClinVar VCV ID |
variantType | string | variant type |
reviewStatus | string | see possible values below |
alleleOrigins | string array | see possible values below |
refAllele | string | |
altAllele | string | |
phenotypes | string array | |
medGenIds | string array | MedGen IDs |
omimIds | string array | OMIM IDs |
orphanetIds | string array | Orphanet IDs |
significance | string array | see possible values below |
lastUpdatedDate | string | yyyy-MM-dd |
pubMedIds | string array | PubMed IDs |
isAlleleSpecific | bool | true when the current variant alternate allele matches the ClinVar alternate allele |
reviewStatus:
- no assertion provided
- no assertion criteria provided
- criteria provided, single submitter
- practice guideline
- classified by multiple submitters
- criteria provided, conflicting interpretations
- criteria provided, multiple submitters, no conflicts
- no interpretation for the single variant
alleleOrigins:
- unknown
- other
- germline
- somatic
- inherited
- paternal
- maternal
- de-novo
- biparental
- uniparental
- not-tested
- tested-inconclusive
significance:
- uncertain significance
- not provided
- benign
- likely benign
- likely pathogenic
- pathogenic
- drug response
- histocompatibility
- association
- risk factor
- protective
- affects
- conflicting data from submitters
- other
- no interpretation for the single variant
- conflicting interpretations of pathogenicity
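A common task is pulling out records with a pathogenic interpretation. Judging from the examples above, VCV entries summarize the variation while RCV entries carry phenotypes, and large-variant records may lack isAlleleSpecific, so the sketch makes that requirement optional. The helper name and the choice of terms counted as pathogenic are assumptions:

```python
PATHOGENIC_TERMS = {"pathogenic", "likely pathogenic"}

def pathogenic_records(clinvar_entries: list, require_allele_specific: bool = True) -> list:
    """Return IDs of RCV records whose significance includes a pathogenic term."""
    hits = []
    for entry in clinvar_entries:
        if not entry.get("id", "").startswith("RCV"):
            continue  # skip VCV summary records
        if require_allele_specific and not entry.get("isAlleleSpecific", False):
            continue  # omitted boolean means the alleles did not match
        if PATHOGENIC_TERMS.intersection(entry.get("significance", [])):
            hits.append(entry["id"])
    return hits

entries = [
    {"id": "VCV000036581.3", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV000030258.4", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV000051993.4", "significance": ["pathogenic"]},
    {"id": "VCV000058242.1", "significance": ["pathogenic"]},
]
print(pathogenic_records(entries))                                 # []
print(pathogenic_records(entries, require_allele_specific=False))  # ['RCV000051993.4']
```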
1000 Genomes
"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
Field | Type | Notes |
---|---|---|
allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
allAc | int | allele count for all populations. Integer. |
allAn | int | allele number for all populations. Non-zero integer. |
afrAf | float | allele frequency for the African super population. Range: 0 - 1.0 |
afrAc | int | allele count for the African super population. Integer. |
afrAn | int | allele number for the African super population. Non-zero integer. |
amrAf | float | allele frequency for the Ad Mixed American super population. Range: 0 - 1.0 |
amrAc | int | allele count for the Ad Mixed American super population. Integer. |
amrAn | int | allele number for the Ad Mixed American super population. Non-zero integer. |
easAf | float | allele frequency for the East Asian super population. Range: 0 - 1.0 |
easAc | int | allele count for the East Asian super population. Integer. |
easAn | int | allele number for the East Asian super population. Non-zero integer. |
eurAf | float | allele frequency for the European super population. Range: 0 - 1.0 |
eurAc | int | allele count for the European super population. Integer. |
eurAn | int | allele number for the European super population. Non-zero integer. |
sasAf | float | allele frequency for the South Asian super population. Range: 0 - 1.0 |
sasAc | int | allele count for the South Asian super population. Integer. |
sasAn | int | allele number for the South Asian super population. Non-zero integer. |
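The per-population fields share a `<population>Af` naming pattern, which makes it easy to find the super population with the highest allele frequency. A minimal sketch using the 1000 Genomes example above; `popmax` is a hypothetical helper:

```python
from typing import Optional, Tuple

SUPER_POPULATIONS = ("afr", "amr", "eas", "eur", "sas")

def popmax(one_kg: dict) -> Optional[Tuple[str, float]]:
    """Return (population, allele frequency) for the highest super-population AF."""
    best = None
    for population in SUPER_POPULATIONS:
        af = one_kg.get(f"{population}Af")
        if af is not None and (best is None or af > best[1]):
            best = (population, af)
    return best

one_kg = {"allAf": 0.200879, "afrAf": 0.210287, "amrAf": 0.139769,
          "easAf": 0.275794, "eurAf": 0.181909, "sasAf": 0.173824}
print(popmax(one_kg))  # ('eas', 0.275794)
```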
DANN
"dannScore": 0.27
Field | Type | Notes |
---|---|---|
dannScore | float | Range: 0 - 1.0 |
dbSNP
"dbsnp":[
"rs1042821"
]
Field | Type | Notes |
---|---|---|
dbsnp | string array | dbSNP rsIDs |
DECIPHER
"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
Field | Type | Notes |
---|---|---|
chromosome | string | Ensembl-style chromosome names |
begin | int | 1-based position |
end | int | 1-based position |
numDeletions | int | # of observed deletions |
deletionFrequency | float | deletion frequency |
numDuplications | int | # of observed duplications |
duplicationFrequency | float | duplication frequency |
sampleSize | int | total # of samples |
reciprocalOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap |
annotationOverlap | float | Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap |
GERP
"gerpScore": 1.27
Field | Type | Notes |
---|---|---|
gerpScore | float | Range: -∞ to +∞ |
GME Variome
"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
Field | Type | Notes |
---|---|---|
allAc | int | GME allele count |
allAn | int | GME allele number |
allAf | float | GME allele frequency |
failedFilter | bool | True if this variant failed any filters |
gnomAD
"gnomad":{
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
Field | Type | Notes |
---|---|---|
coverage | int | average coverage (non-negative integer values) |
allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
maleAf | float | allele frequency for male population. Range: 0 - 1.0 |
femaleAf | float | allele frequency for female population. Range: 0 - 1.0 |
controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0 |
allAc | int | allele count for all populations. Integer. |
maleAc | int | allele count for male population. Integer. |
femaleAc | int | allele count for female population. Integer. |
controlsAllAc | int | allele count for the controls subset. Integer. |
allAn | int | allele number for all populations. Non-zero integer. |
maleAn | int | allele number for male population. Non-zero integer. |
femaleAn | int | allele number for female population. Non-zero integer. |
controlsAllAn | int | allele number for the controls subset. Non-zero integer. |
allHc | int | count of homozygous individuals for all populations. Non-negative integer. |
maleHc | int | count of homozygous individuals for male population. Non-negative integer. |
femaleHc | int | count of homozygous individuals for female population. Non-negative integer. |
afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0 |
afrAc | int | allele count for the African / African American population. Integer. |
afrAn | int | allele number for the African / African American population. Non-zero integer. |
afrHc | int | count of homozygous individuals for African / African American population. Non-negative integer. |
amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0 |
amrAc | int | allele count for the Latino population. Integer. |
amrAn | int | allele number for the Latino population. Non-zero integer. |
amrHc | int | count of homozygous individuals for Latino population. Non-negative integer. |
easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0 |
easAc | int | allele count for the East Asian population. Integer. |
easAn | int | allele number for the East Asian population. Non-zero integer. |
easHc | int | count of homozygous individuals for East Asian population. Non-negative integer. |
finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0 |
finAc | int | allele count for the Finnish population. Integer. |
finAn | int | allele number for the Finnish population. Non-zero integer. |
finHc | int | count of homozygous individuals for Finnish population. Non-negative integer. |
nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0 |
nfeAc | int | allele count for the Non-Finnish European population. Integer. |
nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer. |
nfeHc | int | count of homozygous individuals for Non-Finnish European population. Non-negative integer. |
othAf | float | allele frequency for the Other population. Range: 0 - 1.0 |
othAc | int | allele count for the Other population. Integer. |
othAn | int | allele number for the Other population. Non-zero integer. |
othHc | int | count of homozygous individuals for Other population. Non-negative integer. |
asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0 |
asjAc | int | allele count for the Ashkenazi Jewish population. Integer. |
asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer. |
asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer. |
sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0 |
sasAc | int | allele count for the South Asian population. Integer. |
sasAn | int | allele number for the South Asian population. Non-zero integer. |
sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer. |
failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters) |
lowComplexityRegion | bool | True if this variant is located in a low complexity region. |
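Because absent keys carry meaning under the conventions described at the top of this page (boolean keys are only written when true, and missing VCF values are dropped), code that reads a gnomAD object should not assume every population field is present. The minimal sketch below, in Python to match the parsing notebooks, collects per-population allele frequencies from one parsed gnomAD object. The helper names and the population list (limited to the prefixes shown in this part of the table) are our own assumptions, not part of the format.

```python
from typing import Dict, Optional

# Population prefixes taken from the rows shown above. The complete gnomAD
# table may list more; this list is an assumption, extend it as needed.
POPULATIONS = ["eas", "fin", "nfe", "oth", "asj", "sas"]


def population_afs(gnomad: Dict) -> Dict[str, float]:
    """Map population prefix -> allele frequency for the populations present.

    Absent keys are skipped: fields whose VCF value was "." are not written.
    """
    return {pop: gnomad[f"{pop}Af"] for pop in POPULATIONS if f"{pop}Af" in gnomad}


def max_population_af(gnomad: Dict) -> Optional[float]:
    """Highest per-population allele frequency, or None if none is reported."""
    afs = population_afs(gnomad)
    return max(afs.values()) if afs else None


def passed_filters(gnomad: Dict) -> bool:
    """Boolean keys are only written when true, so a missing failedFilter means False."""
    return not gnomad.get("failedFilter", False)


# Hypothetical gnomAD object for illustration.
example = {"easAf": 0.0012, "nfeAf": 0.0003, "failedFilter": True}
print(max_population_af(example))  # 0.0012
print(passed_filters(example))     # False
```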
MITOMAP
"mitomap":[
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
Field | Type | Notes |
---|---|---|
refAllele | string | |
altAllele | string | |
diseases | string array | associated diseases |
hasHomoplasmy | boolean | |
hasHeteroplasmy | boolean | |
status | string | record status |
clinicalSignificance | string | predicted pathogenicity |
scorePercentile | float | MitoTIP score |
numGenBankFullLengthSeqs | integer | # of GenBank full-length sequences |
pubMedIds | string array | |
isAlleleSpecific | boolean | true when the current variant alternate allele matches the MITOMAP alternate allele |
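A position can carry several MITOMAP records, and only those with `isAlleleSpecific` set describe the alternate allele actually being annotated. The sketch below (hypothetical helper names) keeps those records and gathers their disease names. It reads booleans with a `False` default, which works whether a false value is omitted (the general convention) or written out explicitly, as `hasHomoplasmy` is in the example above.

```python
from typing import Dict, List


def allele_specific_mitomap(entries: List[Dict]) -> List[Dict]:
    """Keep MITOMAP records whose alternate allele matches the current variant."""
    return [e for e in entries if e.get("isAlleleSpecific", False)]


def mitomap_diseases(entries: List[Dict]) -> List[str]:
    """Sorted, de-duplicated disease names across allele-specific records."""
    diseases = set()
    for entry in allele_specific_mitomap(entries):
        diseases.update(entry.get("diseases", []))
    return sorted(diseases)


# Trimmed copy of the example record above.
mitomap = [{"refAllele": "G", "altAllele": "A",
            "diseases": ["Bipolar disorder", "Melanoma"],
            "isAlleleSpecific": True}]
print(mitomap_diseases(mitomap))  # ['Bipolar disorder', 'Melanoma']
```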
Primate AI
"primateAI":[
{
"hgnc":"TP53",
"scorePercentile":0.3
}
]
Field | Type | Notes |
---|---|---|
hgnc | string | |
scorePercentile | float | range: 0 - 1.0 |
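Since `primateAI` is an array keyed by gene symbol, a lookup by `hgnc` is the usual access pattern. A minimal sketch, assuming one entry per gene (the helper name is hypothetical):

```python
from typing import Dict, List, Optional


def primate_ai_score(entries: List[Dict], gene: str) -> Optional[float]:
    """Return the PrimateAI scorePercentile (0 - 1.0) reported for `gene`, if any."""
    for entry in entries:
        if entry.get("hgnc") == gene:
            return entry.get("scorePercentile")
    return None


primate_ai = [{"hgnc": "TP53", "scorePercentile": 0.3}]
print(primate_ai_score(primate_ai, "TP53"))   # 0.3
print(primate_ai_score(primate_ai, "BRCA1"))  # None
```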
REVEL
"revel":{
"score":0.027
}
Field | Type | Notes |
---|---|---|
score | float | Range: 0 - 1.0 |
Splice AI
"spliceAI":[
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
Field | Type | Notes |
---|---|---|
hgnc | string | HGNC gene symbol |
acceptorGainDistance | int | ± bp from current position |
acceptorGainScore | float | range: 0 - 1.0. 1 decimal place |
acceptorLossDistance | int | ± bp from current position |
acceptorLossScore | float | range: 0 - 1.0. 1 decimal place |
donorGainDistance | int | ± bp from current position |
donorGainScore | float | range: 0 - 1.0. 1 decimal place |
donorLossDistance | int | ± bp from current position |
donorLossScore | float | range: 0 - 1.0. 1 decimal place |
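As the example shows, each Splice AI entry carries only some of the four score/distance pairs. The sketch below picks the strongest reported effect for one gene entry. Treating an absent pair as "not reported" (skipped) rather than as a score of 0 is our assumption; the helper and constant names are hypothetical.

```python
from typing import Dict, Optional, Tuple

EFFECTS = ["acceptorGain", "acceptorLoss", "donorGain", "donorLoss"]


def strongest_splice_effect(entry: Dict) -> Optional[Tuple[str, float, Optional[int]]]:
    """Return (effect, score, distance) for the highest-scoring effect in one entry.

    Effects whose *Score key is absent are skipped rather than treated as 0.
    """
    best = None
    for effect in EFFECTS:
        score = entry.get(f"{effect}Score")
        if score is None:
            continue
        if best is None or score > best[1]:
            best = (effect, score, entry.get(f"{effect}Distance"))
    return best


# The two entries from the example above.
splice_ai = [
    {"hgnc": "BLCAP", "acceptorGainDistance": -3, "acceptorGainScore": 0.3,
     "donorLossDistance": 7, "donorLossScore": 0.9},
    {"hgnc": "NNAT", "acceptorGainDistance": -1, "acceptorGainScore": 0.2,
     "donorGainDistance": -2, "donorGainScore": 0.3},
]
for entry in splice_ai:
    print(entry["hgnc"], strongest_splice_effect(entry))
# BLCAP ('donorLoss', 0.9, 7)
# NNAT ('donorGain', 0.3, -2)
```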
TOPMed
"topmed":{
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
Field | Type | Notes |
---|---|---|
allAc | int | TOPMed allele count |
allAn | int | TOPMed allele number. Non-zero integer. |
allAf | float | TOPMed allele frequency (computed by Illumina Annotator) |
allHc | int | TOPMed homozygous count |
failedFilter | bool | True if this variant failed any filters |
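The table only says that `allAf` is computed by the annotator rather than copied from TOPMed. The example values are consistent with allele count divided by allele number (20 / 125568 ≈ 0.000159), so the sketch below recomputes it that way. That definition is an assumption on our part, and the comparison uses a tolerance because the reported value appears to be rounded.

```python
def topmed_af(topmed: dict) -> float:
    """Recompute the allele frequency as allAc / allAn (assumed definition).

    allAn is documented as non-zero, so the division is safe for well-formed records.
    """
    return topmed["allAc"] / topmed["allAn"]


example = {"allAc": 20, "allAn": 125568, "allAf": 0.000159, "allHc": 0, "failedFilter": True}
print(abs(topmed_af(example) - example["allAf"]) < 1e-6)  # True
```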
Genes
Illumina Annotator reports gene annotations for all genes that overlap a variant, except when the variant is only a flanking variant (i.e. it only causes upstream_gene_variant or downstream_gene_variant).
"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
Field | Type | Notes |
---|---|---|
name | string | HGNC gene symbol |
hgncId | int | HGNC ID |
summary | string | short description of the gene from OMIM |
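Because gene-level data is stored once per gene rather than repeated per position, it is convenient to index the `genes` array by symbol and look genes up from the transcripts. A minimal sketch; the helper names are hypothetical, and placing OMIM entries under an `omim` key inside each gene object follows the placeholder comment in the example above.

```python
from typing import Dict, List


def index_genes(genes: List[Dict]) -> Dict[str, Dict]:
    """Map each HGNC gene symbol ("name") to its gene-level record."""
    return {gene["name"]: gene for gene in genes}


def omim_numbers(gene: Dict) -> List[int]:
    """OMIM gene MIM numbers attached to a gene record ([] if it has no OMIM data)."""
    return [entry["mimNumber"] for entry in gene.get("omim", [])]


genes = [{"name": "MSH6", "hgncId": 7329, "omim": [{"mimNumber": 600678}]}]
by_name = index_genes(genes)
print(by_name["MSH6"]["hgncId"])      # 7329
print(omim_numbers(by_name["MSH6"]))  # [600678]
```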
OMIM
"omim":[
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes |
---|---|---|
mimNumber | int | OMIM ID for gene |
geneName | string | gene name |
description | string | |
phenotypes | object array | see Phenotype entry below |
Phenotype
Field | Type | Notes |
---|---|---|
mimNumber | int | |
phenotype | string | |
description | string | |
mapping | string | see possible values below |
inheritances | string array | see possible values below |
comments | string array | see possible values below |
Mapping
- disorder was positioned by mapping of the wild type gene
- disease phenotype itself was mapped
- molecular basis of the disorder is known
- disorder is a chromosome deletion or duplication syndrome
Inheritance
- autosomal recessive
- autosomal dominant
Comments
- contributes to the susceptibility to multifactorial disorders
- variations that lead to apparently abnormal laboratory test values
- unconfirmed mapping
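A common task is finding the phenotypes for a gene that follow a given mode of inheritance. The sketch below reads the `inheritances` key used in the example JSON and matches case-insensitively, because the example capitalizes values ("Autosomal dominant") while the list above is lower-case. Phenotypes without the key (such as the endometrial cancer entry) never match. The helper name is hypothetical.

```python
from typing import Dict, List, Tuple


def phenotypes_with_inheritance(omim_entries: List[Dict], mode: str) -> List[Tuple[int, str]]:
    """Return (mimNumber, phenotype) for phenotypes whose inheritances include `mode`."""
    wanted = mode.lower()
    hits = []
    for entry in omim_entries:
        for pheno in entry.get("phenotypes", []):
            modes = {m.lower() for m in pheno.get("inheritances", [])}
            if wanted in modes:
                hits.append((pheno["mimNumber"], pheno["phenotype"]))
    return hits


# Trimmed copy of the MSH6 example above.
omim = [{
    "mimNumber": 600678,
    "phenotypes": [
        {"mimNumber": 614350, "phenotype": "Colorectal cancer, hereditary nonpolyposis, type 5",
         "inheritances": ["Autosomal dominant"]},
        {"mimNumber": 608089, "phenotype": "Endometrial cancer, familial"},
        {"mimNumber": 276300, "phenotype": "Mismatch repair cancer syndrome",
         "inheritances": ["Autosomal recessive"]},
    ],
}]
print(phenotypes_with_inheritance(omim, "autosomal recessive"))
# [(276300, 'Mismatch repair cancer syndrome')]
```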
gnomAD LoF Gene Metrics
"gnomAD":{
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
Field | Type | Notes |
---|---|---|
pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
pNull | float | probability of being completely tolerant of loss-of-function variation (observed = expected) |
pRec | float | probability of being intolerant of two loss-of-function variants (like recessive genes, observed ~ 0.5*expected) |
synZ | float | corrected synonymous Z score |
misZ | float | corrected missense Z score |
loeuf | float | loss-of-function observed/expected upper bound fraction (LOEUF) |
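These metrics are often reduced to a single "LoF intolerant" flag. The sketch below does that with a high pLi or a low LOEUF. The default cut-offs (pLI ≥ 0.9, LOEUF < 0.35) are assumptions chosen to resemble thresholds commonly used in the literature, not values defined by this format; check the gnomAD documentation for the release you annotate with. The helper name is hypothetical.

```python
from typing import Dict


def is_lof_intolerant(metrics: Dict, pli_min: float = 0.9, loeuf_max: float = 0.35) -> bool:
    """Flag a gene as loss-of-function intolerant from its gnomAD metrics.

    The cut-offs are illustrative defaults; a missing metric does not contribute.
    """
    pli = metrics.get("pLi")
    loeuf = metrics.get("loeuf")
    high_pli = pli is not None and pli >= pli_min
    low_loeuf = loeuf is not None and loeuf < loeuf_max
    return high_pli or low_loeuf


gnomad = {"pLi": 1.00e0, "pNull": 8.94e-40, "pRec": 1.84e-16,
          "synZ": -8.44e-2, "misZ": 5.96e-1, "loeuf": 1.13e0}
print(is_lof_intolerant(gnomad))  # True (via pLi)
```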
ClinGen Disease Validity
"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes |
---|---|---|
clingenGeneValidity | object array | one entry per curated gene-disease relationship |
diseaseId | string | Monarch Disease Ontology ID (MONDO) |
disease | string | disease label |
classification | string | see below for possible values |
classificationDate | string | yyyy-MM-dd |
Classification
- no reported evidence
- disputed
- limited
- moderate
- definitive
- strong
- refuted
- no known disease relationship
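To act on these curations you typically keep only the stronger classifications and parse `classificationDate` (yyyy-MM-dd, i.e. `%Y-%m-%d` in Python). In the sketch below, which classifications count as "supportive" is an illustrative policy of our own, and the helper name is hypothetical. The example entries above are both "no reported evidence", so they yield an empty list.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Illustrative policy: which classifications count as supportive is our choice.
SUPPORTIVE = {"definitive", "strong", "moderate"}


def supported_diseases(validity: List[Dict]) -> List[Tuple]:
    """Return (diseaseId, disease, classification, date) for supportive curations, newest first."""
    rows = []
    for entry in validity:
        if entry.get("classification") in SUPPORTIVE:
            curated = datetime.strptime(entry["classificationDate"], "%Y-%m-%d").date()
            rows.append((entry["diseaseId"], entry["disease"], entry["classification"], curated))
    return sorted(rows, key=lambda row: row[3], reverse=True)


example = [
    {"diseaseId": "MONDO_0007893", "disease": "Noonan syndrome with multiple lentigines",
     "classification": "no reported evidence", "classificationDate": "2018-06-07"},
    {"diseaseId": "MONDO_0015280", "disease": "cardiofaciocutaneous syndrome",
     "classification": "no reported evidence", "classificationDate": "2018-06-07"},
]
print(supported_diseases(example))  # []
```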
COSMIC Cancer Gene Census
{
"name": "PRDM16",
"hgncId": 14000,
"ncbiGeneId": "63976",
"ensemblGeneId": "ENSG00000142611",
"cosmic": {
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
Field | Type | Notes |
---|---|---|
roleInCancer | string array | Possible roles in cancer |
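Genes that are not in the Cancer Gene Census presumably have no `cosmic` object at all, so reading it defensively avoids key errors. A minimal sketch, with a hypothetical helper name:

```python
from typing import Dict, List


def cancer_roles(gene: Dict) -> List[str]:
    """COSMIC roleInCancer values for a gene record, or [] if it has no COSMIC entry."""
    return gene.get("cosmic", {}).get("roleInCancer", [])


prdm16 = {"name": "PRDM16", "hgncId": 14000, "ncbiGeneId": "63976",
          "ensemblGeneId": "ENSG00000142611",
          "cosmic": {"roleInCancer": ["oncogene", "fusion"]}}
print("oncogene" in cancer_roles(prdm16))  # True
print(cancer_roles({"name": "MSH6"}))      # []
```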