Skip to main content
Version: 3.22

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

  • With boolean key/value pairs, we only output the keys that have a true value. I.e. there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
  • When transferring data from the VCF file to the JSON (e.g. for allele depths (AD)), it is common to use a period (.) as a placeholder for missing data in the VCF file. Illumina Connected Annotations treats periods like empty or null strings and therefore will not output those entries.

JSON Layout

info

In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.

Parsing

info

We've put together a new section that discusses how to parse our JSON files easily using examples in a Python Jupyter notebook and a R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.

{
"header":{
"annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
"creationTime":"2017-06-14 15:53:13",
"genomeAssembly":"GRCh37",
"dataSources":[
{
"name":"OMIM",
"version":"unknown",
"description":"An Online Catalog of Human Genes and Genetic Disorders",
"releaseDate":"2017-05-03"
},
{
"name":"VEP",
"version":"84",
"description":"BothRefSeqAndEnsembl",
"releaseDate":"2017-01-16"
},
{
"name":"ClinVar",
"version":"20170503",
"description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
"releaseDate":"2017-05-03"
},
{
"name":"phyloP",
"version":"hg19",
"description":"46 way conservation score between humans and 45 other vertebrates",
"releaseDate":"2009-11-10"
}
],
"samples":[
"NA12878",
"NA12891",
"NA12892"
]
},
FieldTypeNotes
annotatorstringthe name of the annotator and the current version
creationTimestringyyyy-MM-dd hh:mm:ss
genomeAssemblystringsee possible values below
schemaVersionintegerincremented whenever the core structure of the JSON file introduces breaking changes
dataVersionstring
dataSourcesobject arraysee Data Source entry below
samplesstring arraythe order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

FieldTypeNotes
namestring
versionstring
descriptionstringoptional description of the data source
releaseDatestringyyyy-MM-dd

Genome Assemblies

  • GRCh37
  • GRCh38
  • hg19
  • SARSCoV2

Positions

"positions":[
{
"chromosome":"chr2",
"position":48010488,
"repeatUnit":"GGCCCC",
"refRepeatCount":3,
"svEnd":48020488,
"refAllele":"G",
"altAlleles":[
"A",
"GT"
],
"quality":461,
"filters":[
"PASS"
],
"ciPos":[
-170,
170
],
"ciEnd":[
-175,
175
],
"svLength":1000,
"strandBias":1.23,
"jointSomaticNormalQuality":29,
"cytogeneticBand":"2p16.3",
FieldTypeVariant TypeNotes
chromosomestringallexactly as displayed in the vcf
positionintegerallexactly as displayed in the vcf (1-based notation). Range: 1 - 250 million
repeatUnitstringSTRprovided by ExpansionHunter
refRepeatCountintegerSTRprovided by ExpansionHunter
svEndintegerSV
refAllelestringallexactly as displayed in the vcf
altAllelestring arrayallexactly as displayed in the vcf
qualityfloatallexactly as displayed in the vcf (Normally an integer, but some variant callers using floating point. Has been observed as high as 500k)
filtersstring arrayallexactly as displayed in the vcf
ciPosinteger arraySV
ciEndinteger arraySV
svLengthintegerSV
strandBiasfloatsmall variantprovided by GATK (from SB)
jointSomaticNormalQualityintegerSVprovided by the Manta variant caller (SOMATICSCORE)
cytogeneticBandstringalle.g. 17p13.1

ClinGen

"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
FieldTypeNotes
clingenobject array
chromosomestringEnsembl-style chromosome names
begininteger1-based position
endinteger1-based position
variantTypestringAny of the sequence alterations defined here.
idstringIdentifier from the data source. Alternatively a VID
clinicalInterpretationstringsee possible values below
observedGainsintegerRange: 0 - (231 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLossesintegerRange: 0 - (231 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validatedboolean
phenotypesstring arrayDescription of the phenotype.
phenotypeIdsstring arrayDescription of the phenotype IDs.
reciprocalOverlapfloating pointRange: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

  • benign
  • curated benign
  • curated pathogenic
  • likely benign
  • likely pathogenic
  • path gain
  • path loss
  • pathogenic
  • uncertain
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
FieldTypeNotes
clingenDosageSensitivityMapobject array
chromosomestringEnsembl-style chromosome names
begininteger1-based position
endinteger1-based position
haploinsufficiencystringsee possible values below
triplosensitivitystring(same as haploinsufficiency) 
reciprocalOverlapfloating pointRange: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlapfloating pointRange: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

  • no evidence to suggest that dosage sensitivity is associated with clinical phenotype
  • little evidence suggesting dosage sensitivity is associated with clinical phenotype
  • emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
  • sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
  • gene associated with autosomal recessive phenotype
  • dosage sensitivity unlikely

1000 Genomes (SV)

"oneKg":[
{
"chromosome":"1",
"begin":1595369,
"end":1612441,
"variantType": "copy_number_variation",
"id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
"allAn": 5008,
"allAc": 2702,
"allAf": 0.539537,
"afrAf": 0.6052,
"amrAf": 0.3675,
"eurAf": 0.5357,
"easAf": 0.5368,
"sasAf": 0.5797,
"reciprocalOverlap": 0.07555
}
],
FieldTypeNotes
chromosomestring
begininteger
endinteger
variantTypestring
idstring
allAnintegerallele number for all populations. Non-zero integer.
allAcintegerallele count for all populations. Integer.
allAffloating pointallele frequency for all populations. Range: 0 - 1.0
afrAffloating pointallele frequency for the African super population. Range: 0 - 1.0
amrAffloating pointallele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAffloating pointallele frequency for the European super population. Range: 0 - 1.0
easAffloating pointallele frequency for the East Asian super population. Range: 0 - 1.0
sasAffloating pointallele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlapfloating pointrange: 0 - 1.

gnomAD (SV)

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

FieldTypeNotes
chromosomestringchromosome number
beginintegerposition interval start
endintegerposition internal end
variantTypestringstructural variant type
variantIdstringgnomAD ID
allAffloating pointallele frequency for all populations. Range: 0 - 1.0
afrAffloating pointallele frequency for the African super population. Range: 0 - 1.0
amrAffloating pointallele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAffloating pointallele frequency for the East Asian super population. Range: 0 - 1.0
eurAffloating pointallele frequency for the European super population. Range: 0 - 1.0
othAffloating pointallele frequency for all other populations. Range: 0 - 1.0
femaleAffloating pointallele frequency for female population. Range: 0 - 1.0
maleAffloating pointallele frequency for male population. Range: 0 - 1.0
allAcintegerallele count for all populations.
afrAcintegerallele count for the African super population.
amrAcintegerallele count for the Ad Mixed American super population.
easAcintegerallele count for the East Asian super population.
eurAcintegerallele count for the European super population.
othAcintegerallele count for all other populations.
maleAcintegerallele count for male population.
femaleAcintegerallele count for female population.
allAnintegerallele number for all populations.
afrAnintegerallele number for the African super population.
amrAnintegerallele number for the Ad Mixed American super population.
easAnintegerallele number for the East Asian super population.
eurAnintegerallele number for the European super population.
othAnintegerallele number for all other populations.
femaleAnintegerallele number for female population.
maleAnintegerallele number for male population.
allHcintegercount of homozygous individuals for all populations.
afrHcintegercount of homozygous individuals for the African / African American population.
amrHcintegercount of homozygous individuals for the Latino population.
easHcintegercount of homozygous individuals for the East Asian population.
eurAcintegercount of homozygous individuals for the European super population.
othHcintegercount of homozygous individuals for all other populations.
maleHcintegercount of homozygous individuals for male population.
femaleHcintegercount of homozygous individuals for female population.
failedFilterbooleanTrue if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlapfloating pointReciprocal overlap. Range: 0 - 1.0
annotationOverlapfloating pointReciprocal overlap. Range: 0 - 1.0

Note: Following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurAc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
FieldTypeNotes
chromosomestring
begininteger
endinteger
variantTypestring array
reciprocalOverlapfloatRange: 0 - 1. Specified up to 5 decimal places
annotationOverlapfloatRange: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
{
"genotype":"0/1",
"variantFrequencies":[
0.333,
0.5
],
"totalDepth":57,
"genotypeQuality":12,
"copyNumber":3,
"repeatUnitCounts":[
10,
20
],
"alleleDepths":[
10,
20,
30
],
"failedFilter":true,
"splitReadCounts":[
10,
20
],
"pairedEndReadCounts":[
10,
20
],
"isDeNovo":true,
"diseaseAffectedStatuses":[
"-"
],
"artifactAdjustedQualityScore":89.3,
"likelihoodRatioQualityScore":78.2,
"heteroplasmyPercentile":[
23.13,
12.65
]
}
]
FieldTypeVCFNotes
genotypestringGT
variantFrequenciesfloat arrayVF, ADrange: 0 - 1.0. One value per alternate allele
totalDepthintegerDPnon-negative integer values
genotypeQualityintegerGQnon-negative integer values. Typically maxes out at 99
copyNumberintegerCNnon-negative integer values
minorHaplotypeCopyNumberintegerMCNnon-negative integer values
repeatUnitCountsinteger arrayREPCNExpansionHunter-specific
alleleDepthsinteger arrayADnon-negative integer values
failedFilterboolFT
splitReadCountsinteger arraySRManta-specific
pairedEndReadCountsinteger arrayPRManta-specific
isDeNovoboolDN
deNovoQualityfloatDQ
diseaseAffectedStatusesstring arrayDSTExpansionHunter-specific
artifactAdjustedQualityScorefloatAQPEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScorefloatLQPEPE-specific. Range: 0 - 100.0
lossOfHeterozygosityboolCN, MCN
somaticQualityfloatSQ
heteroplasmyPercentilefloatVFrange: 0 - 100. 2 decimal places. One value per alternate allele
binCountintegerBCnon-negative integer values
Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
{
"isEmpty":true
}
],

Variants

"variants":[
{
"vid":"2:48010488:A",
"chromosome":"chr2",
"begin":48010488,
"end":48010488,
"isReferenceMinorAllele":true,
"isStructuralVariant":true,
"refAllele":"G",
"altAllele":"A",
"variantType":"SNV",
"isDecomposedVariant":true,
"isRecomposedVariant":true,
"linkedVids":["2:48010488:GTA:ATC"],
"hgvsg":"NC_000002.11:g.48010488G>A",
"phylopScore":0.459
FieldTypeNotes
vidstringsee Variant Identifiers
chromosomestring
beginint1-based non-negative integer values. Range: 1 - 250 million
endint1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllelebooltrue when this is a reference minor allele
isStructuralVariantbooltrue when the variant is a structural variant
inLowComplexityRegionbooltrue when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllelestringparsimonious representation of the reference allele
altAllelestringparsimonious representation of the alternate allele.
variantTypestringuses Sequence Ontology sequence alterations
isDecomposedVariantbooltrue when the decomposed variant has been used to create another recomposed variant
isRecomposedVariantbooltrue when the variant is recomposed from two or more decomposed variants
linkedVidsstring arraylist of VIDs for variants connecting decomposed and recomposed variants
hgvsgstringHGVS g. notation
phylopScorefloatphyloP conservation score. Range: -14.08 to 6.424
Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.

Flagging Decomposed & Recomposed Variants

When two or more decomposed variants are recomposed into an MNV, the decomposed variants will be marked with "isDecomposedVariant":true.

Similarly, the recomposed variant will be shown as a new VCF position. This recomposed variant will be flagged with "isRecomposedVariant":true.

Transcripts

"transcripts":[
{
"transcript":"ENST00000445503.1",
"source":"Ensembl",
"bioType":"nonsense_mediated_decay",
"codons":"gGg/gAg",
"aminoAcids":"G/E",
"cdnaPos":"268",
"cdsPos":"116",
"exons":"1/9",
"introns":"1/8",
"proteinPos":"39",
"geneId":"ENSG00000116062",
"hgnc":"MSH6",
"consequence":[
"missense_variant",
"NMD_transcript_variant"
],
"hgvsc":"ENST00000445503.1:c.116G>A",
"hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
"geneFusion":{
"exon":6,
"intron":5,
"fusions":[
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
"exon":3,
"intron":2
},
{
"hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
"exon":2,
"intron":1
}
]
},
"isCanonical":true,
"polyPhenScore":0.95,
"polyPhenPrediction":"probably damaging",
"proteinId":"ENSP00000405294.1",
"siftScore":0.61,
"siftPrediction":"tolerated",
"completeOverlap":true
}
]
FieldTypeNotes
transcriptstringtranscript ID. e.g. ENST00000445503.1
sourcestringRefSeq / Ensembl
bioTypestringdescriptions of the biotypes from Ensembl
codonsstring
aminoAcidsstring
cdnaPosstring
cdsPosstring
exonsstringexons affected by the variant
intronsstringintrons affected by the variant
proteinPosstring
geneIdstringgene ID. e.g. ENSG00000116062
hgncstringgene symbol. e.g. MSH6
consequencestring arraySequence Ontology Consequences
hgvscstringHGVS coding nomenclature
hgvspstringHGVS protein nomenclature
geneFusionobjectsee Gene Fusions entry below
isCanonicalbooltrue when this is a canonical transcript
isManeSelectbooltrue when this is a MANE select transcript
polyPhenScorefloatrange: 0 - 1.0
polyPhenPredictionstringsee possible values below
proteinIdstringprotein ID. E.g. ENSP00000405294.1
siftScorefloatrange: 0 - 1.0
siftPredictionstringsee possible values below
completeOverlapbooltrue when this transcript is completely overlapped by the variant
cancerHotspotsstring arraysee Cancer Hotspots entry below
MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

PolyPhen

  • probably damaging
  • possibly damaging
  • benign
  • unknown

SIFT

  • tolerated
  • deleterious
  • tolerated - low confidence
  • deleterious - low confidence

Amino Acid Conservation

"aminoAcidConservation": {
"scores": [0.34]
}
FieldTypeNotes
aminoAcidConservationobject
scoresobject array of doublespercent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

FieldTypeNotes
exonintactual exon where the breakpoint was located
intronintactual intron where the breakpoint was located
fusionsobject arraysee Fusion entry below

Fusion

FieldTypeNotes
exonintactual exon where the other breakpoint was located
intronintactual intron where the other breakpoint was located
hgvscstringHGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

FieldTypeNotes
residuestring
numSamplesinthow many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamplesinthow many samples are associated with a variant with the same position and alternate amino acid position
qValuedouble

Regulatory Regions

"regulatoryRegions":[
{
"id":"ENSR00001542175",
"type":"promoter",
"consequence":[
"regulatory_region_variant"
]
}
]
FieldTypeNotes
idstring
typestringsee possible values below
consequencestring arraysee possible values below

Regulatory Types

  • CTCF_binding_site
  • enhancer
  • open_chromatin_region
  • promoter
  • promoter_flanking_region
  • TF_binding_site

Regulatory Consequences

  • regulatory_region_variant
  • regulatory_region_ablation
  • regulatory_region_amplification
  • regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
{
"id":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"significance":[
"benign"
],
"refAllele":"G",
"altAllele":"A",
"lastUpdatedDate":"2020-03-01",
"isAlleleSpecific":true
},
{
"id":"RCV000030258.4",
"variationId":"VCV000036581.3",
"reviewStatus":"reviewed by expert panel",
"alleleOrigins":[
"germline"
],
"refAllele":"G",
"altAllele":"A",
"phenotypes":[
"Lynch syndrome"
],
"medGenIds":[
"C1333990"
],
"omimIds":[
"120435"
],
"significance":[
"benign"
],
"lastUpdatedDate":"2017-05-01",
"isAlleleSpecific":true
}
]

large variants:

"clinvar":[
{
"chromosome":"1",
"begin":629025,
"end":8537745,
"variantType":"copy_number_loss",
"id":"RCV000051993.4",
"variationId":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"alleleOrigins":[
"not provided"
],
"phenotypes":[
"See cases"
],
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21",
"pubMedIds":[
"21844811"
]
},
{
"id":"VCV000058242.1",
"reviewStatus":"criteria provided, single submitter",
"significance":[
"pathogenic"
],
"lastUpdatedDate":"2022-04-21"
},
......
]
FieldTypeNotes
idstringClinVar ID
variationIdstringClinVar VCV ID
variantTypestringvariant type
reviewStatusstringsee possible values below
alleleOriginsstring arraysee possible values below
refAllelestring
altAllelestring
phenotypesstring array
medGenIdsstring arrayMedGen IDs
omimIdsstring arrayOMIM IDs
orphanetIdsstring arrayOrphanet IDs
significancestring arraysee possible values below
lastUpdatedDatestringyyyy-MM-dd
pubMedIdsstring arrayPubMed IDs
isAlleleSpecificbooltrue when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

  • no assertion provided
  • no assertion criteria provided
  • criteria provided, single submitter
  • practice guideline
  • classified by multiple submitters
  • criteria provided, conflicting interpretations
  • criteria provided, multiple submitters, no conflicts
  • no interpretation for the single variant

alleleOrigins:

  • unknown
  • other
  • germline
  • somatic
  • inherited
  • paternal
  • maternal
  • de-novo
  • biparental
  • uniparental
  • not-tested
  • tested-inconclusive

significance:

  • uncertain significance
  • not provided
  • benign
  • likely benign
  • likely pathogenic
  • pathogenic
  • drug response
  • histocompatibility
  • association
  • risk factor
  • protective
  • affects
  • conflicting data from submitters
  • other
  • no interpretation for the single variant
  • conflicting interpretations of pathogenicity

1000 Genomes

"oneKg":{
"allAf":0.200879,
"afrAf":0.210287,
"amrAf":0.139769,
"easAf":0.275794,
"eurAf":0.181909,
"sasAf":0.173824,
"allAn":5008,
"afrAn":1322,
"amrAn":694,
"easAn":1008,
"eurAn":1006,
"sasAn":978,
"allAc":1006,
"afrAc":278,
"amrAc":97,
"easAc":278,
"eurAc":183,
"sasAc":170
}
FieldTypeNotes
allAffloatallele frequency for all populations. Range: 0 - 1.0
allAcintallele count for all populations. Integer.
allAnintallele number for all populations. Non-zero integer.
afrAffloatallele frequency for the African super population. Range: 0 - 1.0
afrAcintallele count for the African super population. Integer.
afrAnintallele number for the African super population. Non-zero integer.
amrAffloatallele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAcintallele count for the Ad Mixed American super population. Integer.
amrAnintallele number for the Ad Mixed American super population. Non-zero integer.
easAffloatallele frequency for the East Asian super population. Range: 0 - 1.0
easAcintallele count for the East Asian super population. Integer.
easAnintallele number for the East Asian super population. Non-zero integer.
eurAffloatallele frequency for the European super population. Range: 0 - 1.0
eurAcintallele count for the European super population. Integer.
eurAnintallele number for the European super population. Non-zero integer.
sasAffloatallele frequency for the South Asian super population. Range: 0 - 1.0
sasAcintallele count for the South Asian super population. Integer.
sasAnintallele number for the South Asian super population. Non-zero integer.

DANN

"dannScore": 0.27
FieldTypeNotes
dannScorefloatRange: 0 - 1.0

dbSNP

"dbsnp":[
"rs1042821"
]
FieldTypeNotes
dbsnpstring arraydbSNP rsIDs

DECIPHER

"decipher":[
{
"chromosome":"1",
"begin":13516,
"end":91073,
"numDeletions":27,
"deletionFrequency":0.675,
"numDuplications":27,
"duplicationFrequency":0.675,
"sampleSize":40,
"reciprocalOverlap": 0.27555,
"annotationOverlap": 0.5901
}
],
FieldTypeNotes
chromosomeintEnsembl-style chromosome names
beginint1-based position
endint1-based position
numDeletionsint# of observed deletions
deletionFrequencyfloatdeletion frequency
numDuplicationsint# of observed duplications
duplicationFrequencyfloatduplication frequency
sampleSizeinttotal # of samples
reciprocalOverlapfloatRange: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlapfloatRange: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap

GERP

"gerpScore": 1.27
FieldTypeNotes
gerpScorefloatRange: -∞ to +∞

GME Variome

"gmeVariome":{
"allAc":10,
"allAn":202,
"allAf":0.049504,
"failedFilter":true
}
FieldTypeNotes
allAcintGME allele count
allAnintGME allele number
allAffloatGME allele frequency
failedFilterboolTrue if this variant failed any filters

gnomAD

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
FieldTypeNotes
coverageintaverage coverage (non-negative integer values)
allAffloatallele frequency for all populations. Range: 0 - 1.0
maleAffloatallele frequency for male population. Range: 0 - 1.0
femaleAffloatallele frequency for female population. Range: 0 - 1.0
controlsAllAffloatallele frequency for the controls subset. Range: 0 - 1.0
allAcintallele count for all populations. Integer.
maleAcintallele count for male population. Integer.
femaleAcintallele count for female population. Integer.
controlsAllAcintallele count for the controls subset. Integer.
allAnintallele number for all populations. Non-zero integer.
maleAnintallele number for male population. Non-zero integer.
femaleAnintallele number for female population. Non-zero integer.
controlsAllAnintallele number for the controls subset. Non-zero integer.
allHcintcount of homozygous individuals for all populations. Non-negative integer.
maleHcintcount of homozygous individuals for male population. Non-negative integer.
femaleHcintcount of homozygous individuals for female population. Non-negative integer.
afrAffloatallele frequency for the African / African American population. Range: 0 - 1.0
afrAcintallele count for the African / African American population. Integer.
afrAnintallele number for the African / African American population. Non-zero integer.
afrHcintcount of homozygous individuals for African / African American population. Non-negative integer.
amrAffloatallele frequency for the Latino population. Range: 0 - 1.0
amrAcintallele count for the Latino population. Integer.
amrAnintallele number for the Latino population. Non-zero integer.
amrHcintcount of homozygous individuals for Latino population. Non-negative integer.
easAffloatallele frequency for the East Asian population. Range: 0 - 1.0
easAcintallele count for the East Asian population. Integer.
easAnintallele number for the East Asian population. Non-zero integer.
easHcintcount of homozygous individuals for East Asian population. Non-negative integer.
finAffloatallele frequency for the Finnish population. Range: 0 - 1.0
finAcintallele count for the Finnish population. Integer.
finAnintallele number for the Finnish population. Non-zero integer.
finHcintcount of homozygous individuals for Finnish population. Non-negative integer
nfeAffloatallele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAcintallele count for the Non-Finnish European population. Integer.
nfeAnintallele number for the Non-Finnish European population. Non-zero integer.
nfeHcintcount of homozygous individuals for Non-Finnish European population. Non-negative integer
othAffloatallele frequency for the Other population. Range: 0 - 1.0
othAcintallele count for the Other population. Integer.
othAnintallele number for the Other population. Non-zero integer.
othHcintcount of homozygous individuals for Other population. Non-negative integer
asjAffloatallele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAcintallele count for the Ashkenazi Jewish population Integer.
asjAnintallele number for the Ashkenazi Jewish population. Non-zero integer.
asjHcintcount of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer
sasAffloatallele frequency for the South Asian population. Range: 0 - 1.0
sasAcintallele count for the South Asian population Integer.
sasAnintallele number for the South Asian population. Non-zero integer.
sasHcintcount of homozygous individuals for the South Asian population. Non-negative integer.
failedFilterboolTrue if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegionboolTrue if this variant is located in a low complexity region.

MITOMAP

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
FieldTypeNotes
refAllelestring
altAllelestring
diseasesstring arrayassociated diseases
hasHomoplasmyboolean
hasHeteroplasmyboolean
statusstringrecord status
clinicalSignificancestringpredicted pathogenicity
scorePercentilefloatMitoTIP score
numGenBankFullLengthSeqsinteger# of GenBank full-length sequences
pubMedIdsstring array
isAlleleSpecificbooleantrue when the current variant alternate allele matches the MITOMAP alternate allele

Primate AI

GRCh38

"primateAI-3D": [
{
"aminoAcidPosition": 2,
"refAminoAcid": "V",
"altAminoAcid": "M",
"score": 0.616944,
"scorePercentile": 0.52,
"ensemblTranscriptId": "ENST00000335137.4",
"refSeqTranscriptId": "NM_001005484.1"
}
]
FieldTypeNotes
aminoAcidPositionintAmino Acid Position (1-based)
refAminoAcidstringReference Amino Acid
altAminoAcidstringAlternate Amino Acid
ensemblTranscriptIdstringTranscript ID (Ensembl)
refSeqTranscriptIdstringTranscript ID (RefSeq)
scorePercentilefloatrange: 0 - 1.0
scorefloatrange: 0 - 1.0

GRCh37

"primateAI": [
{
"hgnc":"TP53",
"scorePercentile":0.3,
}
]
FieldTypeNotes
hgncstringHGNC Gene Symbol
scorePercentilefloatrange: 0 - 1.0

REVEL

"revel":{ 
"score":0.027
}
FieldTypeNotes
scorefloatRange: 0 - 1.0

Splice AI

"spliceAI":[ 
{
"hgnc":"BLCAP",
"acceptorGainDistance":-3,
"acceptorGainScore":0.3,
"donorLossDistance":7,
"donorLossScore":0.9
},
{
"hgnc":"NNAT",
"acceptorGainDistance":-1,
"acceptorGainScore":0.2,
"donorGainDistance":-2,
"donorGainScore":0.3
}
]
FieldTypeNotes
hgncstringHGNC gene symbol
acceptorGainDistanceint± bp from current position
acceptorGainScorefloatrange: 0 - 1.0. 1 decimal place
acceptorLossDistanceint± bp from current position
acceptorLossScorefloatrange: 0 - 1.0. 1 decimal place
donorGainDistanceint± bp from current position
donorGainScorefloatrange: 0 - 1.0. 1 decimal place
donorLossDistanceint± bp from current position
donorLossScorefloatrange: 0 - 1.0. 1 decimal place

TOPMed

"topmed":{ 
"allAc":20,
"allAn":125568,
"allAf":0.000159,
"allHc":0,
"failedFilter":true
}
FieldTypeNotes
allAcintTOPMed allele count
allAnintTOPMed allele number. Non-zero integer.
allAffloatTOPMed allele frequency (computed by Illumina Connected Annotations)
allHcintTOPMed homozygous count
failedFilterboolTrue if this variant failed any filters

Genes

Illumina Connected Annotations repots gene annotations for all genes that have an overlapping variant with the exception of flanking variants (i.e. variants that only cause upstream_gene_variant or downstream_gene_variant).

"genes":[
{
"name":"MSH6",
"hgncId":7329,
"summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
/* this is where gene-level data sources can be found e.g. OMIM */
}
]
FieldTypeNotes
namestringHGNC gene symbol
hgncIdintHGNC ID
summarystringshort description of the gene from OMIM

OMIM

"omim":[ 
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
FieldTypeNotes
mimNumberintOMIM ID for gene
geneNamestringgene name
descriptionstring
phenotypesobject arraysee Phenotype entry below

Phenotype

FieldTypeNotes
mimNumberint
phenotypestring
descriptionstring
mappingstringsee possible values below
inheritancestring arraysee possible values below
commentsstring arraysee possible values below

Mapping

  1. disorder was positioned by mapping of the wild type gene
  2. disease phenotype itself was mapped
  3. molecular basis of the disorder is known
  4. disorder is a chromosome deletion or duplication syndrome

Inheritance

  • autosomal recessive
  • autosomal dominant

Comments

  • contributes to the susceptibility to multifactorial disorders
  • variations that lead to apparently abnormal laboratory test values
  • unconfirmed mapping

gnomAD LoF Gene Metrics

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
FieldTypeNotes
pLifloatprobability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNullfloatprobability of being completely tolerant of loss of function variation (observed = expected)
pRecfloatprobability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZfloatcorrected synonymous Z score
misZfloatcorrected missense Z score
loeuffloatloss of function observed/expected upper bound fraction (LOEUF)

ClinGen Disease Validity

"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
FieldTypeNotes
clingenGeneValidityobject
diseaseIdstringMonarch Disease Ontology ID (MONDO)
diseasestringdisease label
classificationstringsee below for possible values
classificationDatestringyyyy-MM-dd

classification

  • no reported evidence
  • disputed
  • limited
  • moderate
  • definitive
  • strong
  • refuted
  • no known disease relationship

COSMIC Cancer Gene Census

   {
"name": "PRDM16",
"hgncId": 14000,
"ncbiGeneId": "63976",
"ensemblGeneId": "ENSG00000142611",
"cosmic": {
"roleInCancer": [
"oncogene",
"fusion"
]
}
}
FieldTypeNotes
roleInCancerstring arrayPossible roles in caner