ClinGen
Overview
ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.
Publication
Heidi L. Rehm, Ph.D., Jonathan S. Berg, M.D., Ph.D., Lisa D. Brooks, Ph.D., Carlos D. Bustamante, Ph.D., James P. Evans, M.D., Ph.D., Melissa J. Landrum, Ph.D., David H. Ledbetter, Ph.D., Donna R. Maglott, Ph.D., Christa Lese Martin, Ph.D., Robert L. Nussbaum, M.D., Sharon E. Plon, M.D., Ph.D., Erin M. Ramos, Ph.D., Stephen T. Sherry, Ph.D., and Michael S. Watson, Ph.D., for ClinGen. ClinGen The Clinical Genome Resource. N Engl J Med 2015; 372:2235-2242 June 4, 2015 DOI: 10.1056/NEJMsr1406261.
ISCA Regions
TSV Extraction
ClinGen contains only copy number variation variants, since the coordinates in ClinGen original file follow the same rule as BED format, the coordinates had to be adjusted to [BEGIN+1, END].
#bin chrom chromStart chromEnd name score strand thickStart thickEnd attrCount attrTags attrVals
nsv530705 1 564405 8597804 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530706 1 564424 3262790 0 1 copy_number_loss pathogenic False Abnormal facial shape,Abnormality of cardiac morphology,Global developmental delay,Muscular hypotonia HP:0001252,HP:0001263,HP:0001627,HP:0001999,MedGen:CN001147,MedGen:CN001157,MedGen:CN001482,MedGen:CN001810
nsv530707 1 564424 7068738 0 1 copy_number_loss pathogenic False Abnormality of cardiac morphology,Cleft upper lip,Failure to thrive,Global developmental delay,Intrauterine growth retardation,Microcephaly,Short stature HP:0000204,HP:0000252,HP:0001263,HP:0001508,HP:0001511,HP:0001627,HP:0004322,MedGen:C0349588,MedGen:C1845868,MedGen:C1853481,MedGen:C2364119,MedGen:CN000197,MedGen:CN001157,MedGen:CN001482
nsv533512 1 564435 649748 0 1 copy_number_loss benign False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv931338 1 714078 4958499 0 1 copy_number_loss pathogenic False Developmental delay AND/OR other significant developmental or morphological phenotypes
nsv530300 1 728138 5066371 1 0 copy_number_gain pathogenic False Abnormality of cardiac morphology,Cleft palate,Global developmental delay HP:0000175,HP:0001263,HP:0001627,MedGen:C2240378,MedGen:CN001157,MedGen:CN001482
Status levels
- benign
- curated benign
- curated pathogenic
- likely benign
- likely pathogenic
- path gain
- path loss
- pathogenic
- uncertain
Parsing
We parse the ClinGen tsv file and extract the following:
- chrom
- chromStart (note this a 0-based coordinate)
- chromEnd
- attrTags
- attrVals
attrTags
and attrVals
are comma separated lists. attrTags
contains the field keys and attrVals
contains the field values. We will parse the following keys from the two fields:
- parent (this will be used as the ID in our JSON output)
- clinical_int
- validated
- phenotype (this should be a string array)
- phenotype_id (this should be a string array)
Observed losses and observed gains will be calculated from entries that share a common parent ID.
- variants with a common parent ID and same coordinates are grouped
- calculated observed losses, observed gains for each group
- Clinical significance and validation status are collapsed using the priority strategy described below
- Variants with the same parent ID can have different coordinates (mapped to hg38)
- nsv491508 : chr14:105583663-106881350 and chr14:105605043-106766076 (only one example)
- we kept both variants
Conflict Resolution
Clinical significance priority
When there are a mixture of variants belonging to the same parent ID, we will choose the most pathogenic clinical significance from the available values. i.e. if 3 samples were deemed pathogenic and 2 samples were likely pathogenic, we would list the variant as pathogenic.
Priority (high to low)
- Priority
- Pathogenic
- Likely pathogenic
- Benign
- Likely benign
- Uncertain significance
Validation Priority
When there are a mixture of variants belonging to same parent ID, we will set the validation status to true if any of the variants were validated.
Download URL
https://cirm.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=iscaComposite
JSON Output
"clingen":[
{
"chromosome":"17",
"begin":525,
"end":14667519,
"variantType":"copy_number_gain",
"id":"nsv996083",
"clinicalInterpretation":"pathogenic",
"observedGains":1,
"validated":true,
"phenotypes":[
"Intrauterine growth retardation"
],
"phenotypeIds":[
"HP:0001511",
"MedGen:C1853481"
],
"reciprocalOverlap":0.00131
},
{
"chromosome":"17",
"begin":45835,
"end":7600330,
"variantType":"copy_number_loss",
"id":"nsv869419",
"clinicalInterpretation":"pathogenic",
"observedLosses":1,
"validated":true,
"phenotypes":[
"Developmental delay AND/OR other significant developmental or morphological phenotypes"
],
"reciprocalOverlap":0.00254
}
]
Field | Type | Notes |
---|---|---|
clingen | object array | |
chromosome | string | Ensembl-style chromosome names |
begin | integer | 1-based position |
end | integer | 1-based position |
variantType | string | Any of the sequence alterations defined here. |
id | string | Identifier from the data source. Alternatively a VID |
clinicalInterpretation | string | see possible values below |
observedGains | integer | Range: 0 - (231 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
observedLosses | integer | Range: 0 - (231 - 1). Only used if copy_number_variation, copy_number_loss, or copy_number_gain. |
validated | boolean | |
phenotypes | string array | Description of the phenotype. |
phenotypeIds | string array | Description of the phenotype IDs. |
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions). |
clinicalInterpretation
- benign
- curated benign
- curated pathogenic
- likely benign
- likely pathogenic
- path gain
- path loss
- pathogenic
- uncertain
Dosage Sensitivity Map
The Clinical Genome Resource (ClinGen) consortium is curating genes and regions of the genome to assess whether there is evidence to support that these genes/regions are dosage sensitive and should be targeted on a cytogenomic array. Illumina Connected Annotations reports these annotations for overlapping SVs.
Publication
Riggs ER, Nelson T, Merz A, Ackley T, Bunke B, Collins CD, Collinson MN, Fan YS, Goodenberger ML, Golden DM, Haglund-Hazy L, Krgovic D, Lamb AN, Lewis Z, Li G, Liu Y, Meck J, Neufeld-Kaiser W, Runke CK, Sanmann JN, Stavropoulos DJ, Strong E, Su M, Tayeh MK, Kokalj Vokac N, Thorland EC, Andersen E, Martin CL. Copy number variant discrepancy resolution using the ClinGen dosage sensitivity map results in updated clinical interpretations in ClinVar. Hum Mutat. 2018 Nov;39(11):1650-1659. doi: 10.1002/humu.23610. PMID: 30095202; PMCID: PMC7374944.
TSV Source files
Regions
#ClinGen Region Curation Results
#07 May,2019
#Genomic Locations are reported on GRCh38 (hg38): GCF_000001405.36
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_region.cgi?id=key
#ISCA ID ISCA Region Name cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
ISCA-46299 Xp11.22 region (includes HUWE1) Xp11.22 tbd 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 22840365 20655035 26692240 2018-11-19
ISCA-46295 15q13.3 recurrent region (D-CHRNA7 to BP5) (includes CHRNA7 and OTUD7A) 15q13.3 chr15:31727418-32153204 3 Sufficient evidence for dosage pathogenicity 19898479 20236110 22775350 40 Dosage sensitivity unlikely 26968334 22420048 2018-05-10
ISCA-46291 7q11.23 recurrent distal region (includes HIP1, YWHAG) 7q11.23 chr7:75528718-76433859 2 Some evidence for dosage pathogenicity 21109226 16971481 1 Little evidence for dosage pathogenicity 21109226 27867344 2018-12-31
ISCA-46290 Xp11.22p11.23 recurrent region (includes SHROOM4) Xp11.22-p11.23 chrX: 48447780-52444264 0 No evidence available 3 Sufficient evidence for dosage pathogenicity 19716111 21418194 25425167 2017-12-14 300801
Genes
#ClinGen Gene Curation Results
#24 May,2019
#Genomic Locations are reported on GRCh37 (hg19): GCF_000001405.13
#https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen
#to create link: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/clingen_gene.cgi?sym=Gene Symbol
#Gene Symbol Gene ID cytoBand Genomic Location Haploinsufficiency Score Haploinsufficiency Description Haploinsufficiency PMID1 Haploinsufficiency PMID2 Haploinsufficiency PMID3 Triplosensitivity Score Triplosensitivity Description Triplosensitivity PMID1 Triplosensitivity PMID2 Triplosensitivity PMID3 Date Last Evaluated Loss phenotype OMIM ID Triplosensitive phenotype OMIM ID
A4GALT 53947 22q13.2 chr22:43088121-43117307 30 Gene associated with autosomal recessive phenotype 0 No evidence available 2014-12-11 111400
AAGAB 79719 15q23 chr15:67493013-67547536 3 Sufficient evidence for dosage pathogenicity 23064416 23000146 0 No evidence available 2013-02-28 148600
Dosage Rating System
Rating | Possible Clinical Interpretation |
---|---|
0 | No evidence to suggest that dosage sensitivity is associated with clinical phenotype |
1 | Little evidence suggesting dosage sensitivity is associated with clinical phenotype |
2 | Emerging evidence suggesting dosage sensitivity is associated with clinical phenotype |
3 | Sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype |
30 | Gene associated with autosomal recessive phenotype |
40 | Dosage sensitivity unlikely |
Reference: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/help.shtml
Download URL
JSON Output
"clingenDosageSensitivityMap": [{
"chromosome": "15",
"begin": 30900686,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "little evidence suggesting dosage sensitivity is associated with clinical phenotype",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 0.33994
},
{
"chromosome": "15",
"begin": 31727418,
"end": 32153204,
"haploinsufficiency": "sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype",
"triplosensitivity": "dosage sensitivity unlikely",
"reciprocalOverlap": 0.00147,
"annotationOverlap": 1
}]
Field | Type | Notes |
---|---|---|
clingenDosageSensitivityMap | object array | |
chromosome | string | Ensembl-style chromosome names |
begin | integer | 1-based position |
end | integer | 1-based position |
haploinsufficiency | string | see possible values below |
triplosensitivity | string | (same as haploinsufficiency) |
reciprocalOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions). |
annotationOverlap | floating point | Range: 0 - 1. E.g. 0.57 would indicate a 57% reciprocal overlap. Specified up to 5 decimal places (Not reported for Insertions). |
haploinsufficiency and triplosensitivity
- no evidence to suggest that dosage sensitivity is associated with clinical phenotype
- little evidence suggesting dosage sensitivity is associated with clinical phenotype
- emerging evidence suggesting dosage sensitivity is associated with clinical phenotype
- sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype
- gene associated with autosomal recessive phenotype
- dosage sensitivity unlikely
Building the supplementary files
The gene dosage sensitivity .nga
for Illumina Connected Annotations can be built using the SAUtils
command's DosageSensitivity
subcommand. The required data file is ClinGen_gene_curation_list_{ASSEMBLY}.tsv
(url provided above) and its associated .version
file.
NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)
Here is a sample run:
dotnet SAUtils.dll DosageSensitivity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
USAGE: dotnet SAUtils.dll dosagesensitivity [options]
Creates a gene annotation database from dbVar data
OPTIONS:
--tsv, -t <VALUE> input tsv file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version
dotnet SAUtils.dll DosageSensitivity --out SupplementaryDatabase/64/GRCh37 --tsv ClinGen_gene_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
Time: 00:00:00.1
For building the .nsi
files, we use the SAUtils
command's DosageMapRegions
subcommand. The required data file is ClinGen_region_curation_list_{ASSEMBLY}.tsv
(url provided above) and its associated .version
file.
NAME=ClinGen Dosage Sensitivity Map
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Dosage sensitivity map from ClinGen (dbVar)
Here is a sample run:
dotnet SAUtils.dll DosageMapRegions
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
USAGE: dotnet SAUtils.dll dosagemapregions [options]
Creates an interval annotation database from dbVar data
OPTIONS:
--tsv, -t <VALUE> input tsv file
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version
dotnet SAUtils.dll DosageMapRegions --out SupplementaryDatabase/64/GRCh37 --ref References/7/Homo_sapiens.GRCh37.Nirvana.dat --tsv ClinGen_region_curation_list_GRCh37.tsv
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
Writing 505 intervals to database...
Time: 00:00:00.1
You can also use SAUtils
command's subcommands AutoDownloadGenerate
to generate ClinGen files. To use AutoDownloadGenerate
, read more in SAUtils
section.
Gene-Disease Validity
The ClinGen Gene-Disease Clinical Validity curation process involves evaluating the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. Illumina Connected Annotations reports these annotations for genes in the genes section of the JSON.
Publication
Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. doi:10.1016/j.ajhg.2017.04.015
Source TSV
The source data comes in a CSV file that we convert to a TSV.
CLINGEN GENE VALIDITY CURATIONS
FILE CREATED: 2019-05-28
WEBPAGE: https://search.clinicalgenome.org/kb/gene-validity
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),SOP,CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE
+++++++++++,++++++++++++++,+++++++++++++,++++++++++++++++++,+++++++++,++++++++++++++,+++++++++++++,+++++++++++++++++++
A2ML1,HGNC:23336,Noonan syndrome with multiple lentigines,MONDO_0007893,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/59b87033-dd91-4f1e-aec1-c9b1f5124b16--2018-06-07T14:37:47,2018-06-07T14:37:47.175Z
A2ML1,HGNC:23336,cardiofaciocutaneous syndrome,MONDO_0015280,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/fc3c41d8-8497-489b-a350-c9e30016bc6a--2018-06-07T14:31:03,2018-06-07T14:31:03.696Z
A2ML1,HGNC:23336,Costello syndrome,MONDO_0009026,SOP5,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/ea72ba8d-cf62-44bc-86be-da64e3848eba--2018-06-07T14:34:05,2018-06-07T14:34:05.324Z
Download URL
https://search.clinicalgenome.org/kb/downloads#section_gene-disease-validity
Conflict Resolution
Multiple Classifications
Here is an example of multiple classifications.
$ grep MONDO_0010192 ClinGen-Gene-Disease-Summary-2019-12-02.csv | grep EDNRB
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Moderate,https://search.clinicalgenome.org/kb/gene-validity/d7abbd45-7915-437b-849b-dea876bfc2f5--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
EDNRB,HGNC:3180,Waardenburg syndrome type 4A,MONDO_0010192,SOP6,Limited,https://search.clinicalgenome.org/kb/gene-validity/73ee9727-60c1-40fd-830f-08c2b513d2ee--2018-05-08T04:00:00,2018-05-08T04:00:00.000Z
In such cases, we select the more severe classification.
Multiple Dates
$ grep MONDO_0016419 ClinGen-Gene-Disease-Summary-2019-12-02.csv | grep MUTYH
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9904,2017-05-24T00:00:00
MUTYH,HGNC:7527,hereditary breast carcinoma,MONDO_0016419,SOP4,No Reported Evidence,https://search.clinicalgenome.org/kb/gene-validity/9902,2017-05-25T00:00:00
If the classifications are the same, we should select the latest classification date.
JSON Output
"clingenGeneValidity":[
{
"diseaseId":"MONDO_0007893",
"disease":"Noonan syndrome with multiple lentigines",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
},
{
"diseaseId":"MONDO_0015280",
"disease":"cardiofaciocutaneous syndrome",
"classification":"no reported evidence",
"classificationDate":"2018-06-07"
}
]
Field | Type | Notes |
---|---|---|
clingenGeneValidity | object | |
diseaseId | string | Monarch Disease Ontology ID (MONDO) |
disease | string | disease label |
classification | string | see below for possible values |
classificationDate | string | yyyy-MM-dd |
classification
- no reported evidence
- disputed
- limited
- moderate
- definitive
- strong
- refuted
- no known disease relationship
Building the supplementary files
The gene disease validity .nga
for Illumina Connected Annotations can be built using the SAUtils
command's DiseaseValidity
subcommand. The only required data file is Clingen-Gene-Disease-Summary-2021-12-01.tsv
(url provided above) and its associated .version
file.
NAME=ClinGen disease validity curations
VERSION=20211201
DATE=2021-12-01
DESCRIPTION=Disease validity curations from ClinGen (dbVar)
Here is a sample run:
dotnet SAUtils.dll DiseaseValidity
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
USAGE: dotnet SAUtils.dll diseasevalidity [options]
Creates a gene annotation database from ClinGen gene validity data
OPTIONS:
--csv, -i <VALUE> ClinGen gene validity file path
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version
dotnet SAUtils.dll DiseaseValidity --tsv Clingen-Gene-Disease-Summary-2021-12-01.tsv \\
--uga Cache --out SupplementaryDatabase
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
Number of geneIds missing from the cache:0 (0%)
Time: 00:00:00.2
You can also use SAUtils
command's subcommands AutoDownloadGenerate
to generate ClinGen files. To use AutoDownloadGenerate
, read more in SAUtils
section.