Version: 3.28 (unreleased)

Illumina Connected Annotations JSON File Format

Overview

Conventions

In the Illumina Connected Annotations JSON representation, we try to maximize the amount of useful information that is relayed in the output file. As such, we have several conventions that are useful to know about:

With boolean key/value pairs, we only output the keys that have a true value. For example, there's no reason to display "isStructuralVariant":false a few million times when annotating a small variant VCF.
VCF files commonly use a period (.) as a placeholder for missing data (e.g. for allele depths (AD)). When transferring data from the VCF file to the JSON, Illumina Connected Annotations treats periods like empty or null strings and therefore does not output those entries.
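
Because false booleans and missing values are omitted, a consumer should treat an absent key as false (or missing) rather than as an error. A minimal Python sketch, assuming the JSON object has already been parsed into a dict; the helper names `flag` and `optional` are hypothetical and not part of any Illumina tooling:

```python
def flag(obj: dict, key: str) -> bool:
    """Return a boolean annotation, treating an omitted key as False."""
    return bool(obj.get(key, False))


def optional(obj: dict, key: str, default=None):
    """Return a value that may have been omitted (e.g. '.' in the VCF)."""
    return obj.get(key, default)


# A small variant: isStructuralVariant and phylopScore are simply absent.
variant = {"vid": "2-48010488-G-A", "variantType": "SNV"}
is_sv = flag(variant, "isStructuralVariant")
phylop = optional(variant, "phylopScore")
```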

JSON Layout


In general, each position corresponds to a row in the original VCF file.

For each gene that was referenced in the transcripts found in the positions section, there will be additional gene-level annotation in the gene section.

Parsing


We've put together a section that discusses how to parse our JSON files easily, using examples in a Python Jupyter notebook and an R version as well. In addition, we have information about how to quickly dump content from our JSON file using a tabix-like utility called JASIX.
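
As a rough illustration of the layout, the sketch below parses a tiny stand-in document with Python's standard `json` module. It assumes the top-level object holds `header` and `positions` keys and that `variants` are nested inside each position; the inline document is made up for the example, and a real file may be too large to load in one call.

```python
import json

# Tiny stand-in for an annotated file; real files are much larger.
raw = """
{
  "header": {"genomeAssembly": "GRCh37", "samples": ["NA12878"]},
  "positions": [
    {"chromosome": "chr2", "position": 48010488,
     "refAllele": "G", "altAlleles": ["A"],
     "variants": [{"vid": "2-48010488-G-A", "variantType": "SNV"}]}
  ]
}
"""

data = json.loads(raw)
header = data["header"]

# Each position normally corresponds to one row of the original VCF.
vids = [
    variant["vid"]
    for position in data.get("positions", [])
    for variant in position.get("variants", [])
]
```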

{
   "header":{
      "annotator":"IlluminaConnectedAnnotations 3.0.0-alpha.5+g6c52e247",
      "creationTime":"2017-06-14 15:53:13",
      "genomeAssembly":"GRCh37",
      "dataSources":[
         {
            "name":"OMIM",
            "version":"unknown",
            "description":"An Online Catalog of Human Genes and Genetic Disorders",
            "releaseDate":"2017-05-03"
         },
         {
            "name":"VEP",
            "version":"84",
            "description":"BothRefSeqAndEnsembl",
            "releaseDate":"2017-01-16"
         },
         {
            "name":"ClinVar",
            "version":"20170503",
            "description":"A freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence",
            "releaseDate":"2017-05-03"
         },
         {
            "name":"phyloP",
            "version":"hg19",
            "description":"46 way conservation score between humans and 45 other vertebrates",
            "releaseDate":"2009-11-10"
         }
      ],
      "samples":[
         "NA12878",
         "NA12891",
         "NA12892"
      ]
   },

Field	Type	Notes
annotator	string	the name of the annotator and the current version
creationTime	string	yyyy-MM-dd hh:mm:ss
genomeAssembly	string	see possible values below
schemaVersion	integer	incremented whenever the core structure of the JSON file introduces breaking changes
dataVersion	string
dataSources	object array	see Data Source entry below
samples	string array	the order of these sample names will be used throughout the JSON file when enumerating samples

Data Source

Field	Type	Notes
name	string
version	string
description	string	optional description of the data source
releaseDate	string	yyyy-MM-dd

Genome Assemblies

GRCh37
GRCh38
hg19
SARSCoV2
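
The header fields above can be checked with a few lines of Python. This is only a sketch: the `header` dict is trimmed from the example, and the 24-hour `%H` directive is an assumption based on the example timestamp (15:53:13).

```python
from datetime import date, datetime

KNOWN_ASSEMBLIES = {"GRCh37", "GRCh38", "hg19", "SARSCoV2"}

header = {
    "creationTime": "2017-06-14 15:53:13",
    "genomeAssembly": "GRCh37",
    "dataSources": [
        {"name": "ClinVar", "version": "20170503", "releaseDate": "2017-05-03"},
        {"name": "phyloP", "version": "hg19", "releaseDate": "2009-11-10"},
    ],
}

if header["genomeAssembly"] not in KNOWN_ASSEMBLIES:
    raise ValueError(f"unexpected assembly: {header['genomeAssembly']}")

# Assumption: the timestamp uses a 24-hour clock, as in the example.
created = datetime.strptime(header["creationTime"], "%Y-%m-%d %H:%M:%S")

# releaseDate is documented as yyyy-MM-dd, which is ISO 8601.
release_dates = {
    source["name"]: date.fromisoformat(source["releaseDate"])
    for source in header["dataSources"]
}
```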

Positions

"positions":[
   {
      "chromosome":"chr2",
      "position":48010488,
      "id":"4",
      "repeatUnit":"GGCCCC",
      "refRepeatCount":3,
      "svEnd":48020488,
      "refAllele":"G",
      "altAlleles":[
         "A",
         "GT"
      ],
      "quality":461,
      "filters":[
         "PASS"
      ],
      "ciPos":[
         -170,
         170
      ],
      "ciEnd":[
         -175,
         175
      ],
      "svLength":1000,
      "strandBias":1.23,
      "jointSomaticNormalQuality":29,
      "cytogeneticBand":"2p16.3",

Field	Type	Variant Type	Notes
chromosome	string	all	exactly as displayed in the vcf
position	integer	all	exactly as displayed in the vcf (1-based notation). Range: 1 - 250 million
id	string	all	taken from the ID column in the VCF file. This field is omitted if the ID is empty or "."
repeatUnit	string	STR	provided by ExpansionHunter
refRepeatCount	integer	STR	provided by ExpansionHunter
svEnd	integer	SV
refAllele	string	all	exactly as displayed in the vcf
altAlleles	string array	all	exactly as displayed in the vcf
quality	float	all	exactly as displayed in the vcf (Normally an integer, but some variant callers use floating point. Has been observed as high as 500k)
filters	string array	all	exactly as displayed in the vcf
ciPos	integer array	SV
ciEnd	integer array	SV
svLength	integer	SV
strandBias	float	small variant	provided by GATK (from SB)
jointSomaticNormalQuality	integer	SV	provided by the Manta variant caller (SOMATICSCORE)
cytogeneticBand	string	all	e.g. 17p13.1
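
For structural variants, the confidence intervals can be turned into absolute coordinates. The sketch below assumes that ciPos and ciEnd mirror the VCF CIPOS and CIEND fields, i.e. they are offsets relative to position and svEnd; `confidence_window` is a hypothetical helper name.

```python
def confidence_window(coordinate, interval):
    """Apply a [lower, upper] offset pair (e.g. ciPos) to a 1-based coordinate."""
    lower, upper = interval
    return coordinate + lower, coordinate + upper


# Values taken from the example position above.
position = {
    "position": 48010488,
    "svEnd": 48020488,
    "ciPos": [-170, 170],
    "ciEnd": [-175, 175],
}

start_window = confidence_window(position["position"], position["ciPos"])
end_window = confidence_window(position["svEnd"], position["ciEnd"])
```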

ClinGen

"clingen":[
   {
      "chromosome":"17",
      "begin":525,
      "end":14667519,
      "variantType":"copy_number_gain",
      "id":"nsv996083",
      "clinicalInterpretation":"pathogenic",
      "observedGains":1,
      "validated":true,
      "phenotypes":[
         "Intrauterine growth retardation"
      ],
      "phenotypeIds":[
         "HP:0001511",
         "MedGen:C1853481"
      ],
      "reciprocalOverlap":0.00131
   },
   {
      "chromosome":"17",
      "begin":45835,
      "end":7600330,
      "variantType":"copy_number_loss",
      "id":"nsv869419",
      "clinicalInterpretation":"pathogenic",
      "observedLosses":1,
      "validated":true,
      "phenotypes":[
         "Developmental delay AND/OR other significant developmental or morphological phenotypes"
      ],
      "reciprocalOverlap":0.00254
   }
]

Field	Type	Notes
clingen	object array
chromosome	string	Ensembl-style chromosome names
begin	integer	1-based position
end	integer	1-based position
variantType	string	Any of the sequence alterations defined here.
id	string	Identifier from the data source. Alternatively a VID
clinicalInterpretation	string	see possible values below
observedGains	integer	Range: 0 - (2³¹ - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
observedLosses	integer	Range: 0 - (2³¹ - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain.
validated	boolean
phenotypes	string array	Description of the phenotype.
phenotypeIds	string array	Description of the phenotype IDs.
reciprocalOverlap	floating point	Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).

clinicalInterpretation

benign
curated benign
curated pathogenic
likely benign
likely pathogenic
path gain
path loss
pathogenic
uncertain
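
A consumer will often keep only some interpretations and require a minimum overlap. A minimal sketch, where `select_clingen` is a hypothetical helper and the choice of which interpretations to keep is left to the caller:

```python
def select_clingen(entries, wanted, min_overlap=0.0):
    """Keep entries with a wanted interpretation and a large enough overlap."""
    selected = []
    for entry in entries:
        if entry.get("clinicalInterpretation") not in wanted:
            continue
        # reciprocalOverlap is not reported for insertions, so it may be absent.
        overlap = entry.get("reciprocalOverlap")
        if overlap is not None and overlap < min_overlap:
            continue
        selected.append(entry)
    return selected


clingen = [
    {"id": "nsv996083", "clinicalInterpretation": "pathogenic", "reciprocalOverlap": 0.00131},
    {"id": "nsv869419", "clinicalInterpretation": "pathogenic", "reciprocalOverlap": 0.00254},
]
kept = select_clingen(clingen, wanted={"pathogenic", "likely pathogenic"}, min_overlap=0.002)
```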

"clingenDosageSensitivityMap": [{
    "chromosome": "15",
    "begin": 30900686,
    "end": 32153204,
    "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
    "triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
    "reciprocalOverlap": 0.00147,
    "annotationOverlap": 0.33994
},
{
    "chromosome": "15",
    "begin": 31727418,
    "end": 32153204,
    "haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
    "triplosensitivity": "dosage sensitivity unlikely",
    "reciprocalOverlap": 0.00147,
    "annotationOverlap": 1
}]

Field	Type	Notes
clingenDosageSensitivityMap	object array
chromosome	string	Ensembl-style chromosome names
begin	integer	1-based position
end	integer	1-based position
haploinsufficiency	string	see possible values below
triplosensitivity	string	(same as haploinsufficiency)
reciprocalOverlap	floating point	Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions).
annotationOverlap	floating point	Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap. Specified up to 5 decimal places (Not reported for Insertions).

haploinsufficiency and triplosensitivity

no evidence to suggest that dosage sensitivity is associated with clinical phenotype
little evidence suggesting dosage sensitivity is associated with clinical phenotype
emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
gene associated with autosomal recessive phenotype
dosage sensitivity unlikely
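
This document does not define how reciprocalOverlap and annotationOverlap are computed. The sketch below uses one common definition as an assumption: the shared length divided by the longer of the two intervals (reciprocal) and by the annotation length (annotation). The annotator's exact computation may differ.

```python
def overlap_fractions(var_begin, var_end, ann_begin, ann_end):
    """Return (reciprocal, annotation) overlap for 1-based inclusive intervals.

    Assumed definitions (not confirmed by the format description):
      reciprocal = shared / max(variant length, annotation length)
      annotation = shared / annotation length
    """
    var_len = var_end - var_begin + 1
    ann_len = ann_end - ann_begin + 1
    shared = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if shared <= 0:
        return 0.0, 0.0
    # Five decimal places, matching the precision stated in the tables.
    return round(shared / max(var_len, ann_len), 5), round(shared / ann_len, 5)


half = overlap_fractions(1, 100, 51, 150)
contained = overlap_fractions(1, 1000, 101, 200)
disjoint = overlap_fractions(1, 10, 20, 30)
```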

1000 Genomes (SV)

"oneKg":[
   {
      "chromosome":"1",
      "begin":1595369,
      "end":1612441,
      "variantType": "copy_number_variation",
      "id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
      "allAn": 5008,
      "allAc": 2702,
      "allAf": 0.539537,
      "afrAf": 0.6052,
      "amrAf": 0.3675,
      "eurAf": 0.5357,
      "easAf": 0.5368,
      "sasAf": 0.5797,
      "reciprocalOverlap": 0.07555
   }
],

Field	Type	Notes
chromosome	string
begin	integer
end	integer
variantType	string
id	string
allAn	integer	allele number for all populations. Non-zero integer.
allAc	integer	allele count for all populations. Integer.
allAf	floating point	allele frequency for all populations. Range: 0 - 1.0
afrAf	floating point	allele frequency for the African super population. Range: 0 - 1.0
amrAf	floating point	allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
eurAf	floating point	allele frequency for the European super population. Range: 0 - 1.0
easAf	floating point	allele frequency for the East Asian super population. Range: 0 - 1.0
sasAf	floating point	allele frequency for the South Asian super population. Range: 0 - 1.0
reciprocalOverlap	floating point	range: 0 - 1.
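
In the example above, the id field carries several identifiers in one string. A short sketch for splitting them, assuming (from the example only) that a semicolon is the separator:

```python
entry = {
    "id": "esv3635753;esv3635754;esv3635755;esv3635756;esv3635757",
    "variantType": "copy_number_variation",
}

# Assumption: multiple source identifiers are joined with ';' as in the example.
source_ids = [part for part in entry["id"].split(";") if part]
```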

gnomAD (SV)

"gnomAD-preview": [
  {
    "chromosome": "1",
    "begin": 40001,
    "end": 47200,
    "variantId": "gnomAD-SV_v2.1_DUP_1_1",
    "variantType": "duplication",
    "failedFilter": true,
    "allAf": 0.068963,
    "afrAf": 0.135694,
    "amrAf": 0.022876,
    "easAf": 0.01101,
    "eurAf": 0.007846,
    "othAf": 0.017544,
    "femaleAf": 0.065288,
    "maleAf": 0.07255,
    "allAc": 943,
    "afrAc": 866,
    "amrAc": 21,
    "easAc": 17,
    "eurAc": 37,
    "othAc": 2,
    "femaleAc": 442,
    "maleAc": 499,
    "allAn": 13674,
    "afrAn": 6382,
    "amrAn": 918,
    "easAn": 1544,
    "eurAn": 4716,
    "othAn": 114,
    "femaleAn": 6770,
    "maleAn": 6878,
    "allHc": 91,
    "afrHc": 90,
    "amrHc": 1,
    "easHc": 0,
    "eurHc": 0,
    "othHc": 55,
    "femaleHc": 44,
    "maleHc": 47,
    "reciprocalOverlap": 0.01839,
    "annotationOverlap": 0.16667
  }
]

Field	Type	Notes
chromosome	string	chromosome number
begin	integer	position interval start
end	integer	position interval end
variantType	string	structural variant type
variantId	string	gnomAD ID
allAf	floating point	allele frequency for all populations. Range: 0 - 1.0
afrAf	floating point	allele frequency for the African super population. Range: 0 - 1.0
amrAf	floating point	allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
easAf	floating point	allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf	floating point	allele frequency for the European super population. Range: 0 - 1.0
othAf	floating point	allele frequency for all other populations. Range: 0 - 1.0
femaleAf	floating point	allele frequency for female population. Range: 0 - 1.0
maleAf	floating point	allele frequency for male population. Range: 0 - 1.0
allAc	integer	allele count for all populations.
afrAc	integer	allele count for the African super population.
amrAc	integer	allele count for the Ad Mixed American super population.
easAc	integer	allele count for the East Asian super population.
eurAc	integer	allele count for the European super population.
othAc	integer	allele count for all other populations.
maleAc	integer	allele count for male population.
femaleAc	integer	allele count for female population.
allAn	integer	allele number for all populations.
afrAn	integer	allele number for the African super population.
amrAn	integer	allele number for the Ad Mixed American super population.
easAn	integer	allele number for the East Asian super population.
eurAn	integer	allele number for the European super population.
othAn	integer	allele number for all other populations.
femaleAn	integer	allele number for female population.
maleAn	integer	allele number for male population.
allHc	integer	count of homozygous individuals for all populations.
afrHc	integer	count of homozygous individuals for the African / African American population.
amrHc	integer	count of homozygous individuals for the Latino population.
easHc	integer	count of homozygous individuals for the East Asian population.
eurHc	integer	count of homozygous individuals for the European super population.
othHc	integer	count of homozygous individuals for all other populations.
maleHc	integer	count of homozygous individuals for male population.
femaleHc	integer	count of homozygous individuals for female population.
failedFilter	boolean	True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap	floating point	Reciprocal overlap. Range: 0 - 1.0
annotationOverlap	floating point	Annotation overlap. Range: 0 - 1.0

Note: The following fields are not available in GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter

MITOMAP (SV)

"mitomap":[ 
   { 
      "chromosome":"MT",
      "begin":3166,
      "end":14152,
      "variantType":"deletion",
      "reciprocalOverlap":0.18068,
      "annotationOverlap":0.42405
   }
]

Field	Type	Notes
chromosome	string
begin	integer
end	integer
variantType	string array
reciprocalOverlap	float	Range: 0 - 1. Specified up to 5 decimal places
annotationOverlap	float	Range: 0 - 1. Specified up to 5 decimal places

Samples

"samples":[
   {
      "genotype":"0/1",
      "variantFrequencies":[
         0.333,
         0.5
      ],
      "totalDepth":57,
      "genotypeQuality":12,
      "copyNumber":3,
      "repeatUnitCounts":[
         10,
         20
      ],
      "alleleDepths":[
         10,
         20,
         30
      ],
      "failedFilter":true,
      "splitReadCounts":[
         10,
         20
      ],
      "pairedEndReadCounts":[
         10,
         20
      ],
      "isDeNovo":true,
      "diseaseAffectedStatuses":[
         "-"
      ],
      "artifactAdjustedQualityScore":89.3,
      "likelihoodRatioQualityScore":78.2,
      "heteroplasmyPercentile":[
         23.13,
         12.65
      ]
   }
]

Field	Type	VCF	Notes
genotype	string	GT
variantFrequencies	float array	VF, AD	range: 0 - 1.0. One value per alternate allele
totalDepth	integer	DP	non-negative integer values
genotypeQuality	integer	GQ	non-negative integer values. Typically maxes out at 99
copyNumber	integer	CN	non-negative integer values
minorHaplotypeCopyNumber	integer	MCN	non-negative integer values
repeatUnitCounts	integer array	REPCN	ExpansionHunter-specific
alleleDepths	integer array	AD	non-negative integer values
failedFilter	bool	FT
splitReadCounts	integer array	SR	Manta-specific
pairedEndReadCounts	integer array	PR	Manta-specific
isDeNovo	bool	DN
deNovoQuality	float	DQ
diseaseAffectedStatuses	string array	DST	ExpansionHunter-specific
artifactAdjustedQualityScore	float	AQ	PEPE-specific. Range: 0 - 100.0
likelihoodRatioQualityScore	float	LQ	PEPE-specific. Range: 0 - 100.0
lossOfHeterozygosity	bool	CN, MCN
somaticQuality	float	SQ
heteroplasmyPercentile	float array	VF	range: 0 - 100. 2 decimal places. One value per alternate allele
binCount	integer	BC	non-negative integer values
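
The variantFrequencies values in the example are consistent with the allele depths: alleleDepths lists the reference depth first, followed by one depth per alternate allele. The sketch below recomputes the frequencies from AD. It is an illustration rather than the annotator's actual logic, and the three-decimal rounding is an assumption taken from the example.

```python
def variant_frequencies(allele_depths):
    """Return one frequency per alternate allele from [ref, alt1, alt2, ...] depths."""
    total = sum(allele_depths)
    if total == 0:
        return []
    # Skip the first entry, which is the reference allele depth.
    return [round(depth / total, 3) for depth in allele_depths[1:]]


sample = {"alleleDepths": [10, 20, 30]}
frequencies = variant_frequencies(sample["alleleDepths"])
```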

Empty Samples

If a sample does not contain any entries, we will create a sample object that contains the isEmpty key. This ensures that sample ordering is preserved while indicating that a sample is intentionally empty.

"samples":[
   {
      "isEmpty":true
   }
],
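
Since sample objects follow the order of the sample names in the header, the two lists can be paired by index. A minimal sketch that skips empty samples; `samples_by_name` is a hypothetical helper:

```python
def samples_by_name(sample_names, sample_objects):
    """Pair header sample names with per-position sample objects, skipping empty ones."""
    named = {}
    for name, sample in zip(sample_names, sample_objects):
        # isEmpty is only written when true, so a missing key means "not empty".
        if sample.get("isEmpty", False):
            continue
        named[name] = sample
    return named


names = ["NA12878", "NA12891", "NA12892"]
objects = [{"genotype": "0/1"}, {"isEmpty": True}, {"genotype": "1/1"}]
by_name = samples_by_name(names, objects)
```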

Variants

"variants":[
   {
      "vid":"2-48010488-G-A",
      "chromosome":"chr2",
      "begin":48010488,
      "end":48010488,
      "isReferenceMinorAllele":true,
      "isStructuralVariant":true,
      "refAllele":"G",
      "altAllele":"A",
      "variantType":"SNV",
      "hgvsg":"NC_000002.11:g.48010488G>A",
      "phylopScore":0.459

Field	Type	Notes
vid	string	see Variant Identifiers
chromosome	string
begin	int	1-based non-negative integer values. Range: 1 - 250 million
end	int	1-based non-negative integer values. Range: 1 - 250 million
isReferenceMinorAllele	bool	true when this is a reference minor allele
isStructuralVariant	bool	true when the variant is a structural variant
inLowComplexityRegion	bool	true when the variant lies in a low complexity region (gnomAD low complexity regions)
refAllele	string	parsimonious representation of the reference allele
altAllele	string	parsimonious representation of the alternate allele.
variantType	string	uses Sequence Ontology sequence alterations
hgvsg	string	HGVS g. notation
phylopScore	float	phyloP conservation score. Range: -14.08 to 6.424

Reference Minor Alleles

Illumina Connected Annotations supports annotating reference minor alleles. In such a case, refAllele will be replaced by the global major allele and altAllele will be replaced with the original reference allele.
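
Under that swap, the original reference base can be recovered from the variant object. A small sketch, assuming the variant has been parsed into a dict; the helper name is hypothetical:

```python
def original_reference_allele(variant):
    """Return the reference-genome allele, undoing the reference minor allele swap."""
    # isReferenceMinorAllele is only written when true.
    if variant.get("isReferenceMinorAllele", False):
        # For reference minor alleles, altAllele holds the original reference allele.
        return variant["altAllele"]
    return variant["refAllele"]


rma = {"refAllele": "G", "altAllele": "A", "isReferenceMinorAllele": True}
regular = {"refAllele": "G", "altAllele": "A"}
```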

Transcripts

"transcripts":[
   {
      "transcript":"ENST00000445503.1",
      "source":"Ensembl",
      "bioType":"NMD_transcript_variant",
      "codons":"gGg/gAg",
      "aminoAcids":"G/E",
      "cdnaPos":"268/4158",
      "cdsPos":"116/483",
      "exons":"1/9",
      "introns":"1/8",
      "proteinPos":"39/160",
      "geneId":"ENSG00000116062",
      "hgnc":"MSH6",
      "consequence":[
         "missense_variant",
         "NMD_transcript_variant"
      ],
      "impact": "moderate",
      "hgvsc":"ENST00000445503.1:c.116G>A",
      "hgvsp":"ENSP00000405294.1:p.(Gly39Glu)",
      "geneFusion":{
         "exon":6,
         "intron":5,
         "fusions":[
            {
               "hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000437180.1}:c.58+568_1443",
               "exon":3,
               "intron":2
            },
            {
               "hgvsc":"ETV6{ENST00000396373.4}:c.1_1009+3402_RUNX1{ENST00000300305.3}:c.58+568_1443",
               "exon":2,
               "intron":1
            }
         ]
      },
      "isCanonical":true,
      "proteinId":"ENSP00000405294.1",
      "completeOverlap":true
   }
]

Field	Type	Notes
transcript	string	transcript ID. e.g. ENST00000445503.1
source	string	RefSeq / Ensembl
bioType	string	descriptions of the biotypes from Ensembl
codons	string
aminoAcids	string
cdnaPos	string	Format: start-end/Length
cdsPos	string	Format: start-end/Length
exons	string	exons affected by the variant
introns	string	introns affected by the variant
proteinPos	string	Format: start-end/Length
geneId	string	gene ID. e.g. ENSG00000116062
hgnc	string	gene symbol. e.g. MSH6
consequence	string array	Sequence Ontology Consequences
impact	string	See Consequence Impact for details
hgvsc	string	HGVS coding nomenclature
hgvsp	string	HGVS protein nomenclature
geneFusion	object	see Gene Fusions entry below
isCanonical	bool	true when this is a canonical transcript
isManeSelect	bool	true when this is a MANE select transcript
proteinId	string	protein ID. E.g. ENSP00000405294.1
completeOverlap	bool	true when this transcript is completely overlapped by the variant
cancerHotspots	string array	see Cancer Hotspots entry below
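
The cdnaPos, cdsPos, and proteinPos strings can be split into numbers. The sketch below assumes every part is a plain integer, as in the examples above; if a value contains anything else, `int()` raises ValueError and the caller has to decide how to handle it. `parse_transcript_position` is a hypothetical helper.

```python
def parse_transcript_position(value):
    """Parse 'start-end/length' or 'start/length' into (start, end, length)."""
    span, length = value.split("/")
    start_text, _, end_text = span.partition("-")
    start = int(start_text)
    # A single coordinate means the start and end are the same.
    end = int(end_text) if end_text else start
    return start, end, int(length)


single = parse_transcript_position("268/4158")
ranged = parse_transcript_position("10-12/100")
```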

MANE Select

MANE select tags are only available for RefSeq transcripts on GRCh38.

Amino Acid Conservation

"aminoAcidConservation": {
    "scores": [0.34]
}

Field	Type	Notes
aminoAcidConservation	object
scores	array of doubles	percent conserved with respect to human amino acid residue. Range: 0.01 - 1.00

Gene Fusions

Field	Type	Notes
exon	int	actual exon where the breakpoint was located
intron	int	actual intron where the breakpoint was located
fusions	object array	see Fusion entry below

Fusion

Field	Type	Notes
exon	int	actual exon where the other breakpoint was located
intron	int	actual intron where the other breakpoint was located
hgvsc	string	HGVS coding nomenclature describing the two genes and the transcripts that are fused along with

Cancer Hotspots

Field	Type	Notes
residue	string
numSamples	int	how many samples are associated with a variant at the same amino acid position
numAltAminoAcidSamples	int	how many samples are associated with a variant with the same position and alternate amino acid position
qValue	double

Regulatory Regions

"regulatoryRegions":[
   {
      "id":"ENSR00001542175",
      "type":"promoter",
      "consequence":[
         "regulatory_region_variant"
      ]
   }
]

Field	Type	Notes
id	string
type	string	see possible values below
consequence	string array	see possible values below

Regulatory Types

CTCF_binding_site
enhancer
open_chromatin_region
promoter
promoter_flanking_region
TF_binding_site

Regulatory Consequences

regulatory_region_variant
regulatory_region_ablation
regulatory_region_amplification
regulatory_region_truncation

ClinVar

small variants:

"clinvar":[
   {
      "id":"VCV000036581.3",
      "reviewStatus":"reviewed by expert panel",
      "significance":[
         "benign"
      ],
      "refAllele":"G",
      "altAllele":"A",
      "lastUpdatedDate":"2020-03-01",
      "isAlleleSpecific":true
   },
   {
      "id":"RCV000030258.4",
      "variationId":"VCV000036581.3",
      "reviewStatus":"reviewed by expert panel",
      "alleleOrigins":[
         "germline"
      ],
      "refAllele":"G",
      "altAllele":"A",
      "phenotypes":[
         "Lynch syndrome"
      ],
      "medGenIds":[
         "C1333990"
      ],
      "omimIds":[
         "120435"
      ],
      "significance":[
         "benign"
      ],
      "lastUpdatedDate":"2017-05-01",
      "isAlleleSpecific":true
   }
]

large variants:

"clinvar":[
   {
      "chromosome":"1", 
      "begin":629025, 
      "end":8537745, 
      "variantType":"copy_number_loss", 
      "id":"RCV000051993.4", 
      "variationId":"VCV000058242.1", 
      "reviewStatus":"criteria provided, single submitter", 
      "alleleOrigins":[
         "not provided"
      ], 
      "phenotypes":[
         "See cases"
      ], 
      "significance":[
         "pathogenic"
      ], 
      "lastUpdatedDate":"2022-04-21", 
      "pubMedIds":[
         "21844811"
      ]
   },
   {
      "id":"VCV000058242.1",
      "reviewStatus":"criteria provided, single submitter",
      "significance":[
         "pathogenic"
      ],
      "lastUpdatedDate":"2022-04-21"
   },
        ......
]

Field	Type	Notes
id	string	ClinVar ID
variationId	string	ClinVar VCV ID
variantType	string	variant type
reviewStatus	string	see possible values below
alleleOrigins	string array	see possible values below
refAllele	string
altAllele	string
phenotypes	string array
medGenIds	string array	MedGen IDs
omimIds	string array	OMIM IDs
orphanetIds	string array	Orphanet IDs
significance	string array	see possible values below
lastUpdatedDate	string	yyyy-MM-dd
pubMedIds	string array	PubMed IDs
isAlleleSpecific	bool	true when the current variant alternate allele matches the ClinVar alternate allele

reviewStatus:

no assertion provided
no assertion criteria provided
criteria provided, single submitter
practice guideline
classified by multiple submitters
criteria provided, conflicting interpretations
criteria provided, multiple submitters, no conflicts
no interpretation for the single variant

alleleOrigins:

unknown
other
germline
somatic
inherited
paternal
maternal
de-novo
biparental
uniparental
not-tested
tested-inconclusive

significance:

uncertain significance
not provided
benign
likely benign
likely pathogenic
pathogenic
drug response
histocompatibility
association
risk factor
protective
affects
conflicting data from submitters
other
no interpretation for the single variant
conflicting interpretations of pathogenicity
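
The small-variant example mixes VCV and RCV entries in one array. The sketch below separates them by ID prefix and keeps only allele-specific entries. The prefix convention is inferred from the examples, and the large-variant example carries no isAlleleSpecific key, so this filter is meant for small variants only.

```python
def allele_specific(entries, id_prefix):
    """Return allele-specific entries whose id starts with id_prefix."""
    return [
        entry
        for entry in entries
        if entry["id"].startswith(id_prefix) and entry.get("isAlleleSpecific", False)
    ]


clinvar = [
    {"id": "VCV000036581.3", "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV000030258.4", "variationId": "VCV000036581.3",
     "significance": ["benign"], "isAlleleSpecific": True},
    {"id": "RCV000051993.4", "significance": ["pathogenic"]},
]
rcv_records = allele_specific(clinvar, "RCV")
vcv_records = allele_specific(clinvar, "VCV")
```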

1000 Genomes

"oneKg":{
   "allAf":0.200879,
   "afrAf":0.210287,
   "amrAf":0.139769,
   "easAf":0.275794,
   "eurAf":0.181909,
   "sasAf":0.173824,
   "allAn":5008,
   "afrAn":1322,
   "amrAn":694,
   "easAn":1008,
   "eurAn":1006,
   "sasAn":978,
   "allAc":1006,
   "afrAc":278,
   "amrAc":97,
   "easAc":278,
   "eurAc":183,
   "sasAc":170
}

Field	Type	Notes
allAf	float	allele frequency for all populations. Range: 0 - 1.0
allAc	int	allele count for all populations. Integer.
allAn	int	allele number for all populations. Non-zero integer.
afrAf	float	allele frequency for the African super population. Range: 0 - 1.0
afrAc	int	allele count for the African super population. Integer.
afrAn	int	allele number for the African super population. Non-zero integer.
amrAf	float	allele frequency for the Ad Mixed American super population. Range: 0 - 1.0
amrAc	int	allele count for the Ad Mixed American super population. Integer.
amrAn	int	allele number for the Ad Mixed American super population. Non-zero integer.
easAf	float	allele frequency for the East Asian super population. Range: 0 - 1.0
easAc	int	allele count for the East Asian super population. Integer.
easAn	int	allele number for the East Asian super population. Non-zero integer.
eurAf	float	allele frequency for the European super population. Range: 0 - 1.0
eurAc	int	allele count for the European super population. Integer.
eurAn	int	allele number for the European super population. Non-zero integer.
sasAf	float	allele frequency for the South Asian super population. Range: 0 - 1.0
sasAc	int	allele count for the South Asian super population. Integer.
sasAn	int	allele number for the South Asian super population. Non-zero integer.
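
The three fields per population are related: the allele frequencies in the example equal allele count divided by allele number, rounded to six decimal places. A small sketch that recomputes them, using a subset of the example values; `recomputed_af` is a hypothetical helper:

```python
one_kg = {
    "allAf": 0.200879, "allAn": 5008, "allAc": 1006,
    "afrAf": 0.210287, "afrAn": 1322, "afrAc": 278,
}


def recomputed_af(entry, population):
    """Return AC / AN for a population prefix such as 'all' or 'afr', or None."""
    allele_number = entry.get(f"{population}An")
    allele_count = entry.get(f"{population}Ac")
    # A missing or zero allele number would make the division meaningless.
    if not allele_number or allele_count is None:
        return None
    return round(allele_count / allele_number, 6)


all_af = recomputed_af(one_kg, "all")
afr_af = recomputed_af(one_kg, "afr")
```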

DANN

"dannScore": 0.27

Field	Type	Notes
dannScore	float	Range: 0 - 1.0

dbSNP

"dbsnp":[
   "rs1042821"
]

Field	Type	Notes
dbsnp	string array	dbSNP rsIDs

DECIPHER

"decipher":[
  {
    "chromosome":"1",
    "begin":13516,
    "end":91073,
    "numDeletions":27,
    "deletionFrequency":0.675,
    "numDuplications":27,
    "duplicationFrequency":0.675,
    "sampleSize":40,
    "reciprocalOverlap": 0.27555,
    "annotationOverlap": 0.5901
  }
],

Field	Type	Notes
chromosome	string	Ensembl-style chromosome names
begin	int	1-based position
end	int	1-based position
numDeletions	int	# of observed deletions
deletionFrequency	float	deletion frequency
numDuplications	int	# of observed duplications
duplicationFrequency	float	duplication frequency
sampleSize	int	total # of samples
reciprocalOverlap	float	Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap
annotationOverlap	float	Range: 0 - 1. E.g. 0.57 would indicate a 57% annotation overlap

GERP

"gerpScore": 1.27

Field	Type	Notes
gerpScore	float	Range: -∞ to +∞

GME Variome

"gmeVariome":{
   "allAc":10,
   "allAn":202,
   "allAf":0.049504,
   "failedFilter":true
}

Field	Type	Notes
allAc	int	GME allele count
allAn	int	GME allele number
allAf	float	GME allele frequency
failedFilter	bool	True if this variant failed any filters

gnomAD

"gnomad":{ 
   "coverage":20,
   "allAf":0.190317,
   "maleAf":0.193,
   "femaleAf": 0.1935,
   "afrAf":0.222876,
   "amrAf":0.121394,
   "easAf":0.239802,
   "finAf":0.136833,
   "nfeAf":0.181282,
   "asjAf":0.258278,
   "othAf":0.186094,
   "allAn":30796,
   "maleAn":15096,
   "femaleAn":15700,
   "afrAn":8664,
   "amrAn":832,
   "easAn":1618,
   "finAn":3486,
   "nfeAn":14916,
   "asjAn":302,
   "othAn":978,
   "allAc":5861,
   "maleAc":2930,
   "femaleAc": 2931,
   "afrAc":1931,
   "amrAc":101,
   "easAc":388,
   "finAc":477,
   "nfeAc":2704,
   "asjAc":78,
   "othAc":182,
   "allHc":561,
   "afrHc":208,
   "amrHc":6,
   "easHc":42,
   "finHc":31,
   "nfeHc":242,
   "asjHc":13,
   "othHc":19,
   "maleHc":280,
   "femaleHc":281,
   "controlsAllAf":0.190317,
   "controlsAllAn":30796,
   "controlsAllAc":5861,
   "lowComplexityRegion":true,
   "failedFilter":true
}

Field	Type	Notes
coverage	int	average coverage (non-negative integer values)
allAf	float	allele frequency for all populations. Range: 0 - 1.0
maleAf	float	allele frequency for male population. Range: 0 - 1.0
femaleAf	float	allele frequency for female population. Range: 0 - 1.0
controlsAllAf	float	allele frequency for the controls subset. Range: 0 - 1.0
allAc	int	allele count for all populations. Integer.
maleAc	int	allele count for male population. Integer.
femaleAc	int	allele count for female population. Integer.
controlsAllAc	int	allele count for the controls subset. Integer.
allAn	int	allele number for all populations. Non-zero integer.
maleAn	int	allele number for male population. Non-zero integer.
femaleAn	int	allele number for female population. Non-zero integer.
controlsAllAn	int	allele number for the controls subset. Non-zero integer.
allHc	int	count of homozygous individuals for all populations. Non-negative integer.
maleHc	int	count of homozygous individuals for male population. Non-negative integer.
femaleHc	int	count of homozygous individuals for female population. Non-negative integer.
afrAf	float	allele frequency for the African / African American population. Range: 0 - 1.0
afrAc	int	allele count for the African / African American population. Integer.
afrAn	int	allele number for the African / African American population. Non-zero integer.
afrHc	int	count of homozygous individuals for African / African American population. Non-negative integer.
amrAf	float	allele frequency for the Latino population. Range: 0 - 1.0
amrAc	int	allele count for the Latino population. Integer.
amrAn	int	allele number for the Latino population. Non-zero integer.
amrHc	int	count of homozygous individuals for Latino population. Non-negative integer.
easAf	float	allele frequency for the East Asian population. Range: 0 - 1.0
easAc	int	allele count for the East Asian population. Integer.
easAn	int	allele number for the East Asian population. Non-zero integer.
easHc	int	count of homozygous individuals for East Asian population. Non-negative integer.
finAf	float	allele frequency for the Finnish population. Range: 0 - 1.0
finAc	int	allele count for the Finnish population. Integer.
finAn	int	allele number for the Finnish population. Non-zero integer.
finHc	int	count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf	float	allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc	int	allele count for the Non-Finnish European population. Integer.
nfeAn	int	allele number for the Non-Finnish European population. Non-zero integer.
nfeHc	int	count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf	float	allele frequency for the Other population. Range: 0 - 1.0
othAc	int	allele count for the Other population. Integer.
othAn	int	allele number for the Other population. Non-zero integer.
othHc	int	count of homozygous individuals for Other population. Non-negative integer.
asjAf	float	allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc	int	allele count for the Ashkenazi Jewish population. Integer.
asjAn	int	allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc	int	count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf	float	allele frequency for the South Asian population. Range: 0 - 1.0
sasAc	int	allele count for the South Asian population. Integer.
sasAn	int	allele number for the South Asian population. Non-zero integer.
sasHc	int	count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter	bool	True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion	bool	True if this variant is located in a low complexity region.
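
A common use of the per-population frequencies is to find the population with the highest allele frequency. The sketch below scans every population prefix listed in the table and ignores those that are absent. This is a simple maximum over whatever is present, not gnomAD's own popmax definition, and `max_population_af` is a hypothetical helper.

```python
POPULATIONS = ("afr", "amr", "eas", "fin", "nfe", "asj", "sas", "oth")


def max_population_af(entry):
    """Return (population, frequency) for the highest reported AF, or None."""
    present = {
        population: entry[f"{population}Af"]
        for population in POPULATIONS
        if f"{population}Af" in entry
    }
    if not present:
        return None
    best = max(present, key=present.get)
    return best, present[best]


# Frequencies taken from the example above (sasAf is absent there).
gnomad = {
    "afrAf": 0.222876, "amrAf": 0.121394, "easAf": 0.239802, "finAf": 0.136833,
    "nfeAf": 0.181282, "asjAf": 0.258278, "othAf": 0.186094,
}
top = max_population_af(gnomad)
```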

MITOMAP

"mitomap":[ 
   { 
      "refAllele":"G",
      "altAllele":"A",
      "diseases":[ 
         "Bipolar disorder",
         "Melanoma"
      ],
      "hasHomoplasmy":false,
      "hasHeteroplasmy":true,
      "status":"Reported",
      "clinicalSignificance":"confirmed pathogenic",
      "scorePercentile":83.30,
      "numGenBankFullLengthSeqs":2,
      "pubMedIds":["2316527","6299878","6301949"],
      "isAlleleSpecific":true
   }
]

Field	Type	Notes
refAllele	string
altAllele	string
diseases	string array	associated diseases
hasHomoplasmy	boolean
hasHeteroplasmy	boolean
status	string	record status
clinicalSignificance	string	predicted pathogenicity
scorePercentile	float	MitoTIP score
numGenBankFullLengthSeqs	integer	# of GenBank full-length sequences
pubMedIds	string array
isAlleleSpecific	boolean	true when the current variant alternate allele matches the MITOMAP alternate allele
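
Since isAlleleSpecific indicates whether the MITOMAP alternate allele matches the current variant, a consumer may want to keep only the matching entries. The sketch below assumes the mitomap array has been parsed into a list of dicts. The helper name `allele_specific_diseases` is hypothetical, and `.get` is used because boolean keys may be absent when false.

```python
def allele_specific_diseases(mitomap_entries: list) -> list:
    """Collect diseases from MITOMAP entries that match the variant's alternate allele."""
    diseases = []
    for entry in mitomap_entries:
        # A missing isAlleleSpecific key is treated as False.
        if entry.get("isAlleleSpecific", False):
            diseases.extend(entry.get("diseases", []))
    return diseases


mitomap = [
    {
        "refAllele": "G",
        "altAllele": "A",
        "diseases": ["Bipolar disorder", "Melanoma"],
        "isAlleleSpecific": True,
    }
]
print(allele_specific_diseases(mitomap))
```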

Primate AI

"primateAI-3D": [
  {
    "aminoAcidPosition": 2,
    "refAminoAcid": "V",
    "altAminoAcid": "M",
    "score": 0.616944,
    "scorePercentile": 0.52,
    "genePercentile": 0.7,
    "classification": "pathogenic",    
    "ensemblTranscriptId": "ENST00000335137.4",
    "refSeqTranscriptId": "NM_001005484.1",
    "geneSymbol":"OR4F5" 
  }
]

Field	Type	Notes
aminoAcidPosition	int	Amino Acid Position (1-based)
refAminoAcid	string	Reference Amino Acid
altAminoAcid	string	Alternate Amino Acid
ensemblTranscriptId	string	Transcript ID (Ensembl)
refSeqTranscriptId	string	Transcript ID (RefSeq)
scorePercentile	float	range: 0 - 1.0
genePercentile	float	range: 0 - 1.0
score	float	range: 0 - 1.0
classification	string	pathogenic or benign classification
geneSymbol	string	HGNC gene symbol
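
Because primateAI-3D is an array, it can be convenient to reduce it to a single representative entry. The following sketch, which assumes the array has been parsed into a list of dicts, picks the entry with the highest score. The helper name `top_primate_ai` is hypothetical, and choosing the maximum is just one possible policy.

```python
def top_primate_ai(entries: list):
    """Return the entry with the highest score, or None for an empty list."""
    return max(entries, key=lambda e: e["score"], default=None)


primate_ai = [
    {
        "aminoAcidPosition": 2,
        "refAminoAcid": "V",
        "altAminoAcid": "M",
        "score": 0.616944,
        "classification": "pathogenic",
        "geneSymbol": "OR4F5",
    }
]
best = top_primate_ai(primate_ai)
print(best["geneSymbol"], best["score"])
```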

REVEL

"revel":{ 
   "score":0.027
}

Field	Type	Notes
score	float	Range: 0 - 1.0

Splice AI

"spliceAI":[ 
   {
      "hgnc":"BLCAP",
      "acceptorGainDistance":-3,
      "acceptorGainScore":0.3,
      "donorLossDistance":7,
      "donorLossScore":0.9
   },
   { 
      "hgnc":"NNAT",
      "acceptorGainDistance":-1,
      "acceptorGainScore":0.2,
      "donorGainDistance":-2,
      "donorGainScore":0.3
   }
]

Field	Type	Notes
hgnc	string	HGNC gene symbol
acceptorGainDistance	int	± bp from current position
acceptorGainScore	float	range: 0 - 1.0. 1 decimal place
acceptorLossDistance	int	± bp from current position
acceptorLossScore	float	range: 0 - 1.0. 1 decimal place
donorGainDistance	int	± bp from current position
donorGainScore	float	range: 0 - 1.0. 1 decimal place
donorLossDistance	int	± bp from current position
donorLossScore	float	range: 0 - 1.0. 1 decimal place
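
As the example shows, each spliceAI entry carries only some of the four score/distance pairs. A minimal sketch for summarizing an entry by its largest score, assuming the entries are parsed into dicts, is shown below. The helper name `max_splice_score` is hypothetical.

```python
SCORE_KEYS = (
    "acceptorGainScore",
    "acceptorLossScore",
    "donorGainScore",
    "donorLossScore",
)


def max_splice_score(entry: dict):
    """Return the largest of the scores present in the entry, or None if there are none."""
    scores = [entry[key] for key in SCORE_KEYS if key in entry]
    return max(scores, default=None)


splice_ai = [
    {"hgnc": "BLCAP", "acceptorGainDistance": -3, "acceptorGainScore": 0.3,
     "donorLossDistance": 7, "donorLossScore": 0.9},
    {"hgnc": "NNAT", "acceptorGainDistance": -1, "acceptorGainScore": 0.2,
     "donorGainDistance": -2, "donorGainScore": 0.3},
]
for entry in splice_ai:
    print(entry["hgnc"], max_splice_score(entry))
```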

TOPMed

"topmed":{ 
   "allAc":20,
   "allAn":125568,
   "allAf":0.000159,
   "allHc":0,
   "failedFilter":true
}

Field	Type	Notes
allAc	int	TOPMed allele count
allAn	int	TOPMed allele number. Non-zero integer.
allAf	float	TOPMed allele frequency (computed by Illumina Connected Annotations)
allHc	int	TOPMed homozygous count
failedFilter	bool	True if this variant failed any filters
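
The table notes that allAf is computed rather than copied from the source. Assuming it is the allele count divided by the allele number (the usual definition of allele frequency, which this document does not spell out), the value in the example can be reproduced as follows. The helper name `allele_frequency` and the rounding to six decimal places are assumptions made for illustration.

```python
def allele_frequency(allele_count: int, allele_number: int):
    """Return allele_count / allele_number, or None when the allele number is zero."""
    if allele_number == 0:
        return None
    return allele_count / allele_number


topmed = {"allAc": 20, "allAn": 125568, "allAf": 0.000159, "allHc": 0, "failedFilter": True}
computed = allele_frequency(topmed["allAc"], topmed["allAn"])
print(round(computed, 6), topmed["allAf"])
```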

Genes

Illumina Connected Annotations reports gene annotations for every gene that is overlapped by a variant, except when the variant is only flanking the gene (i.e. its only consequences are upstream_gene_variant or downstream_gene_variant).

"genes":[
   {
      "name":"MSH6",
      "hgncId":7329,
      "summary":"This gene encodes a member of the DNA mismatch repair MutS family. In E. coli, the MutS protein helps in the recognition of mismatched nucleotides prior to their repair. A highly conserved region of approximately 150 aa, called the Walker-A adenine nucleotide binding motif, exists in MutS homologs. The encoded protein heterodimerizes with MSH2 to form a mismatch recognition complex that functions as a bidirectional molecular switch that exchanges ADP and ATP as DNA mismatches are bound and dissociated. Mutations in this gene may be associated with hereditary nonpolyposis colon cancer, colorectal cancer, and endometrial cancer. Transcripts variants encoding different isoforms have been described. [provided by RefSeq, Jul 2013]",
      /* this is where gene-level data sources can be found e.g. OMIM */
   }
]

Field	Type	Notes
name	string	HGNC gene symbol
hgncId	int	HGNC ID
summary	string	short description of the gene from OMIM

OMIM

"omim":[ 
   { 
      "mimNumber":600678,
      "geneName":"MutS, E. coli, homolog of, 6",
      "description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
      "phenotypes":[ 
         { 
            "mimNumber":614350,
            "phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
            "description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
            "mapping":"molecular basis of the disorder is known",
            "inheritances":[ 
               "Autosomal dominant"
            ]
         },
         { 
            "mimNumber":608089,
            "phenotype":"Endometrial cancer, familial",
            "mapping":"molecular basis of the disorder is known"
         },
         { 
            "mimNumber":276300,
            "phenotype":"Mismatch repair cancer syndrome",
            "description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
            "mapping":"molecular basis of the disorder is known",
            "inheritances":[ 
               "Autosomal recessive"
            ],
            "comments"     : [
                "contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
                "unconfirmed or possibly spurious mapping"
            ]
         }
      ]
   }
]

Field	Type	Notes
mimNumber	int	OMIM ID for gene
geneName	string	gene name
description	string
phenotypes	object array	see Phenotype entry below

Phenotype

Field	Type	Notes
mimNumber	int
phenotype	string
description	string
mapping	string	see possible values below
inheritances	string array	see possible values below
comments	string array	see possible values below

Mapping

disorder was positioned by mapping of the wild type gene
disease phenotype itself was mapped
molecular basis of the disorder is known
disorder is a chromosome deletion or duplication syndrome

Inheritance

autosomal recessive
autosomal dominant

Comments

contributes to the susceptibility to multifactorial disorders
variations that lead to apparently abnormal laboratory test values
unconfirmed mapping
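
The nested OMIM structure (gene entries containing phenotype entries) can be flattened for reporting. The sketch below assumes the omim array has been parsed into Python objects and uses the `inheritances` key as it appears in the JSON example above. The helper name `phenotypes_by_inheritance` and the "unspecified" bucket are hypothetical.

```python
def phenotypes_by_inheritance(omim_entries: list) -> dict:
    """Group phenotype names by inheritance mode."""
    grouped = {}
    for entry in omim_entries:
        for phenotype in entry.get("phenotypes", []):
            # Phenotypes without an inheritances key go into a placeholder bucket.
            modes = phenotype.get("inheritances", ["unspecified"])
            for mode in modes:
                grouped.setdefault(mode, []).append(phenotype["phenotype"])
    return grouped


omim = [
    {
        "mimNumber": 600678,
        "phenotypes": [
            {"mimNumber": 614350,
             "phenotype": "Colorectal cancer, hereditary nonpolyposis, type 5",
             "inheritances": ["Autosomal dominant"]},
            {"mimNumber": 608089, "phenotype": "Endometrial cancer, familial"},
            {"mimNumber": 276300, "phenotype": "Mismatch repair cancer syndrome",
             "inheritances": ["Autosomal recessive"]},
        ],
    }
]
print(phenotypes_by_inheritance(omim))
```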

gnomAD LoF Gene Metrics

"gnomAD":{ 
   "pLi":1.00e0,
   "pNull":8.94e-40,
   "pRec":1.84e-16,
   "synZ":-8.44e-2,
   "misZ":5.96e-1,
   "loeuf":1.13e0
}

Field	Type	Notes
pLi	float	probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull	float	probability of being completely tolerant of loss of function variation (observed = expected)
pRec	float	probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ	float	corrected synonymous Z score
misZ	float	corrected missense Z score
loeuf	float	loss of function observed/expected upper bound fraction (LOEUF)
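
The pLi, pRec, and pNull fields are probabilities for three alternative gene classes, so one simple way to summarize them is to report the most probable class. This is a sketch only, assuming the gnomAD gene object has been parsed into a dict. The helper name `most_likely_lof_class` is hypothetical and is not part of Illumina Connected Annotations.

```python
def most_likely_lof_class(metrics: dict):
    """Return 'pLi', 'pRec', or 'pNull', whichever has the highest probability."""
    candidates = {key: metrics[key] for key in ("pLi", "pRec", "pNull") if key in metrics}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


gene_metrics = {
    "pLi": 1.00e0,
    "pNull": 8.94e-40,
    "pRec": 1.84e-16,
    "synZ": -8.44e-2,
    "misZ": 5.96e-1,
    "loeuf": 1.13e0,
}
print(most_likely_lof_class(gene_metrics))
```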

ClinGen Disease Validity

"clingenGeneValidity":[
   {
      "diseaseId":"MONDO_0007893",
      "disease":"Noonan syndrome with multiple lentigines",
      "classification":"no reported evidence",
      "classificationDate":"2018-06-07"
   },
   {
      "diseaseId":"MONDO_0015280",
      "disease":"cardiofaciocutaneous syndrome",
      "classification":"no reported evidence",
      "classificationDate":"2018-06-07"
   }
]

Field	Type	Notes
clingenGeneValidity	object array
diseaseId	string	Monarch Disease Ontology ID (MONDO)
disease	string	disease label
classification	string	see below for possible values
classificationDate	string	yyyy-MM-dd

classification

no reported evidence
disputed
limited
moderate
definitive
strong
refuted
no known disease relationship
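
Because classificationDate is documented as yyyy-MM-dd, it can be parsed with the standard library and used for filtering. The sketch below assumes the clingenGeneValidity array has been parsed into a list of dicts. The helper name `classified_since` and the cutoff dates are hypothetical.

```python
from datetime import date, datetime


def classified_since(entries: list, cutoff: date) -> list:
    """Return entries whose classificationDate is on or after the cutoff."""
    kept = []
    for entry in entries:
        # %Y-%m-%d corresponds to the documented yyyy-MM-dd format.
        classified = datetime.strptime(entry["classificationDate"], "%Y-%m-%d").date()
        if classified >= cutoff:
            kept.append(entry)
    return kept


validity = [
    {"diseaseId": "MONDO_0007893", "disease": "Noonan syndrome with multiple lentigines",
     "classification": "no reported evidence", "classificationDate": "2018-06-07"},
    {"diseaseId": "MONDO_0015280", "disease": "cardiofaciocutaneous syndrome",
     "classification": "no reported evidence", "classificationDate": "2018-06-07"},
]
print([e["diseaseId"] for e in classified_since(validity, date(2018, 1, 1))])
```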

COSMIC Cancer Gene Census

{
  "name": "PRDM16",
  "ensemblGeneId": "ENSG00000142611",
  "ncbiGeneId": "63976",
  "hgncId": 14000,
  "cosmic": {
    "tier": 1,
    "roleInCancer": [
      "oncogene",
      "fusion"
    ]
  }
}

Field	Type	Notes
roleInCancer	string array	Possible roles in cancer
tier	number	COSMIC tier (1 or 2)
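
In the example above, the cosmic object is nested inside a gene entry. A minimal sketch for selecting genes by Cancer Gene Census tier, assuming the genes section has been parsed into a list of dicts, might look like this. The helper name `cosmic_genes` is hypothetical, and the second gene in the sample data is a made-up placeholder.

```python
def cosmic_genes(genes: list, tier: int = 1) -> list:
    """Return the names of genes whose cosmic.tier equals the requested tier."""
    return [
        gene["name"]
        for gene in genes
        # Genes without a cosmic object are skipped.
        if gene.get("cosmic", {}).get("tier") == tier
    ]


genes = [
    {"name": "PRDM16", "hgncId": 14000,
     "cosmic": {"tier": 1, "roleInCancer": ["oncogene", "fusion"]}},
    {"name": "HYPOTHETICAL_GENE"},  # placeholder entry without COSMIC data
]
print(cosmic_genes(genes))
```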

Samples

ISCN

For CNV and ploidy VCFs, an ISCN-like simple nomenclature, which encodes chromosomal variations using cytogenetic banding patterns, is provided at the sample level.

"samples":[
  {
    "id":"LP0129_C06_ATCC2323_13112017_S1_Proband",
    "simpleNomenclature":"46,XX"
  },
  {
    "id":"LP0129_F05_ATCC2322T_13112017_S1_Proband",
    "simpleNomenclature":"46,XX"
  }
]

Field	Type	Notes
id	string	Unique identifier for the sample
simpleNomenclature	string	ISCN-like representation of the karyotype
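
As a small sketch, assuming the samples array has been parsed into a list of dicts, the nomenclature can be looked up by sample id. The helper name `nomenclature_by_sample` is hypothetical. Samples without a simpleNomenclature key are skipped.

```python
def nomenclature_by_sample(samples: list) -> dict:
    """Map each sample id to its ISCN-like simpleNomenclature string."""
    return {
        sample["id"]: sample["simpleNomenclature"]
        for sample in samples
        if "simpleNomenclature" in sample
    }


samples = [
    {"id": "LP0129_C06_ATCC2323_13112017_S1_Proband", "simpleNomenclature": "46,XX"},
    {"id": "LP0129_F05_ATCC2322T_13112017_S1_Proband", "simpleNomenclature": "46,XX"},
]
print(nomenclature_by_sample(samples))
```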
