OMIM
Overview
OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily.
Publications
Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019 Jan 8;47(D1):D1038-D1043. doi:10.1093/nar/gky1151. PMID: 30445645.
Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015 Jan;43(Database issue):D789-98. PMID: 25428349.
Professional data source
OMIM is a Professional data source in Illumina Connected Annotations and is not freely available. To obtain it, please contact annotation_support@illumina.com.
Parse OMIM data
Illumina Connected Annotations uses gene symbols as the gene identifiers internally. To generate the OMIM database, we first map the MIM numbers, which are the primary identifiers used by OMIM, to gene symbols supported by Illumina Connected Annotations. Please note that there can be multiple MIM numbers mapped to one gene symbol. Only MIM numbers successfully mapped to an Illumina Connected Annotations gene symbol are further processed. The OMIM API is used to fetch all the information associated with a gene MIM number, except the gene symbols.
mim2gene.txt
The mim2gene.txt file (http://omim.org/static/omim/data/mim2gene.txt) provides the mapping between MIM numbers and gene symbols. An excerpt of this file is given below:
# MIM Number MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq) Entrez Gene ID (NCBI) Approved Gene Symbol (HGNC) Ensembl Gene ID (Ensembl)
100050 predominantly phenotypes
100070 phenotype 100329167
100100 phenotype
100200 predominantly phenotypes
100300 phenotype
100500 moved/removed
100600 phenotype
100640 gene 216 ALDH1A1 ENSG00000165092
100650 gene/phenotype 217 ALDH2 ENSG00000111275
100660 gene 218 ALDH3A1 ENSG00000108602
100670 gene 219 ALDH1B1 ENSG00000137124
100675 predominantly phenotypes
100678 gene 39 ACAT2 ENSG00000120437
The information in the "Entrez Gene ID (NCBI)", "Approved Gene Symbol (HGNC)" and "Ensembl Gene ID (Ensembl)" columns is used to find the gene symbol supported by Illumina Connected Annotations, which may or may not be the same as the gene symbol listed here.
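As an illustration of reading this file, here is a minimal Python sketch. It is not the Illumina Connected Annotations implementation. It assumes the file is tab-delimited with the five columns named in the header comment, and it assumes that only rows of type "gene" or "gene/phenotype" are of interest. The function name `parse_mim2gene` and the dictionary keys are hypothetical.

```python
import csv
import io

# Hypothetical key names for the five columns described in the file header.
COLUMNS = ("mim_number", "entry_type", "entrez_gene_id", "hgnc_symbol", "ensembl_gene_id")


def parse_mim2gene(handle):
    """Yield one dict per gene-like row of a tab-delimited mim2gene.txt."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#' comment/header lines.
        if not row or row[0].startswith("#"):
            continue
        # Rows without gene identifiers may be short; pad them with empty strings.
        row = row + [""] * (len(COLUMNS) - len(row))
        record = dict(zip(COLUMNS, row))
        # Assumption: only gene and gene/phenotype entries are kept.
        if record["entry_type"] in ("gene", "gene/phenotype"):
            record["mim_number"] = int(record["mim_number"])
            yield record


sample = (
    "# MIM Number\tMIM Entry Type\tEntrez Gene ID\tApproved Gene Symbol\tEnsembl Gene ID\n"
    "100050\tpredominantly phenotypes\t\t\t\n"
    "100640\tgene\t216\tALDH1A1\tENSG00000165092\n"
    "100650\tgene/phenotype\t217\tALDH2\tENSG00000111275\n"
)
genes = list(parse_mim2gene(io.StringIO(sample)))
```

Each resulting record carries the three identifiers that can then be used to look up the supported gene symbol.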
OMIM API
Illumina Connected Annotations retrieves the OMIM annotations from the OMIM API JSON responses. The "entry" handler is used to fetch all the annotations associated with a given OMIM gene. A sample JSON response from the API is provided below:
{
"omim": {
"version": "1.0",
"entryList": [
{
"entry": {
"prefix": "*",
"mimNumber": 100640,
"status": "live",
"titles": {
"preferredTitle": "ALDEHYDE DEHYDROGENASE 1 FAMILY, MEMBER A1; ALDH1A1",
"alternativeTitles": "ALDEHYDE DEHYDROGENASE 1; ALDH1;;\nACETALDEHYDE DEHYDROGENASE 1;;\nALDH, LIVER CYTOSOLIC;;\nRETINAL DEHYDROGENASE 1; RALDH1"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985})."
}
}
],
"geneMap": {
"sequenceID": 7709,
"chromosome": 9,
"chromosomeSymbol": "9",
"chromosomeSort": 225,
"chromosomeLocationStart": 72900670,
"chromosomeLocationEnd": 72953052,
"transcript": "ENST00000297785.7",
"cytoLocation": "9q21",
"computedCytoLocation": "9q21.13",
"mimNumber": 100640,
"geneSymbols": "ALDH1A1",
"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble",
"mappingMethod": "REa, A",
"confidence": "P",
"mouseGeneSymbol": "Aldh1a1",
"mouseMgiID": "MGI:1353450",
"geneInheritance": null
},
"externalLinks": {
"geneIDs": "216",
"hgncID": "402",
"ensemblIDs": "ENSG00000165092,ENST00000297785.8",
"approvedGeneSymbols": "ALDH1A1",
"ncbiReferenceSequences": "1519246465",
"proteinSequences": "194378740,211947843,2183299,178400,119582947,119582948,178372,40807656,194375548,30582681,209402710,4262707,194739599,4261625,178394,261487497,16306661,21361176,32815082,118495,62089228",
"uniGenes": "Hs.76392",
"swissProtIDs": "P00352",
"decipherGene": false,
"umlsIDs": "C1412333",
"gtr": true,
"cmgGene": false,
"keggPathways": true,
"gwasCatalog": false,
}
}
},
{
"entry": {
"prefix": "*",
"mimNumber": 102560,
"status": "live",
"titles": {
"preferredTitle": "ACTIN, GAMMA-1; ACTG1",
"alternativeTitles": "ACTIN, GAMMA; ACTG;;\nCYTOSKELETAL GAMMA-ACTIN;;\nACTIN, CYTOPLASMIC, 2"
},
"textSectionList": [
{
"textSection": {
"textSectionName": "description",
"textSectionTitle": "Description",
"textSectionContent": "Actins are a family of highly conserved cytoskeletal proteins that play fundamental roles in nearly all aspects of eukaryotic cell biology. The ability of a cell to divide, move, endocytose, generate contractile force, and maintain shape is reliant upon functional actin-based structures. Actin isoforms are grouped according to expression patterns: muscle actins predominate in striated and smooth muscle (e.g., ACTA1, {102610}, and ACTA2, {102620}, respectively), whereas the 2 cytoplasmic nonmuscle actins, gamma-actin (ACTG1) and beta-actin (ACTB; {102630}), are found in all cells ({13:Sonnemann et al., 2006})."
}
}
],
"geneMap": {
"sequenceID": 13666,
"chromosome": 17,
"chromosomeSymbol": "17",
"chromosomeSort": 947,
"chromosomeLocationStart": 81509970,
"chromosomeLocationEnd": 81512798,
"transcript": "ENST00000331925.7",
"cytoLocation": "17q25.3",
"computedCytoLocation": "17q25.3",
"mimNumber": 102560,
"geneSymbols": "ACTG1, DFNA20, DFNA26, BRWS2",
"geneName": "Actin, gamma-1",
"mappingMethod": "REa, A, Fd",
"confidence": "C",
"mouseGeneSymbol": "Actg1",
"mouseMgiID": "MGI:87906",
"geneInheritance": null,
"phenotypeMapList": [
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Baraitser-Winter syndrome 2",
"phenotypeMimNumber": 614583,
"phenotypicSeriesNumber": "PS243310",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
},
{
"phenotypeMap": {
"mimNumber": 102560,
"phenotype": "Deafness, autosomal dominant 20/26",
"phenotypeMimNumber": 604717,
"phenotypicSeriesNumber": "PS124900",
"phenotypeMappingKey": 3,
"phenotypeInheritance": "Autosomal dominant"
}
}
]
}
}
}
]
}
}
Content from the OMIM API JSON response is reorganized as shown in the Illumina Connected Annotations JSON output section.
The mappings between the Illumina Connected Annotations JSON output and the OMIM API JSON response are listed in the table below:
Illumina Connected Annotations JSON key chain | OMIM API JSON key chain |
---|---|
omim:mimNumber | omim:entryList:entry:mimNumber |
omim:geneName | omim:entryList:entry:geneMap:geneName |
omim:description | omim:entryList:entry:textSectionList:textSection:textSectionContent |
omim:phenotypes:mimNumber | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMimNumber |
omim:phenotypes:phenotype | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype |
omim:phenotypes:description | omim:entryList:entry:textSectionList:textSection:textSectionContent |
omim:phenotypes:mapping | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeMappingKey (see mapping below) |
omim:phenotypes:inheritances | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotypeInheritance |
omim:phenotypes:comments | omim:entryList:entry:geneMap:phenotypeMapList:phenotypeMap:phenotype (see mapping below) |
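The gene-level rows of this table can be sketched in Python as follows. This is an illustrative sketch rather than the actual implementation. It assumes that the description is taken from the text section whose textSectionName is "description", as in the sample response, and that keys without a value are omitted from the output. The function name `to_gene_entry` is hypothetical.

```python
def to_gene_entry(entry):
    """Map one OMIM API 'entry' object to the gene-level output keys."""
    output = {"mimNumber": entry["mimNumber"]}

    # omim:geneName <- entry:geneMap:geneName
    gene_name = entry.get("geneMap", {}).get("geneName")
    if gene_name:
        output["geneName"] = gene_name

    # omim:description <- entry:textSectionList:textSection:textSectionContent
    for item in entry.get("textSectionList", []):
        section = item["textSection"]
        # Assumption: the section named "description" is the one to use.
        if section.get("textSectionName") == "description":
            output["description"] = section.get("textSectionContent", "")
            break
    return output


# Trimmed-down version of the first entry in the sample response.
api_entry = {
    "mimNumber": 100640,
    "textSectionList": [
        {"textSection": {"textSectionName": "description",
                         "textSectionContent": "The ALDH1A1 gene encodes ..."}}
    ],
    "geneMap": {"geneName": "Aldehyde dehydrogenase-1 family, member A1, soluble"},
}
gene_entry = to_gene_entry(api_entry)
```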
Mapping key to content
Mapping key | Content |
---|---|
1 | disorder was positioned by mapping of the wild type gene |
2 | disease phenotype itself was mapped |
3 | molecular basis of the disorder is known |
4 | disorder is a chromosome deletion or duplication syndrome |
Phenotype character to comment
Phenotype character | Comment |
---|---|
? | unconfirmed or possibly spurious mapping |
[ / ] | nondiseases |
{ / } | contribute to susceptibility to multifactorial disorders or to susceptibility to infection |
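The two lookups above can be sketched in Python as follows. This is a sketch under stated assumptions, not the actual implementation. It assumes the special characters may appear anywhere in the OMIM phenotype string, and the order of the returned comments is an arbitrary choice. The names `MAPPING_KEY_TEXT` and `phenotype_comments` are hypothetical, and the phenotype string in the usage line is a made-up example.

```python
# Text for each phenotypeMappingKey value.
MAPPING_KEY_TEXT = {
    1: "disorder was positioned by mapping of the wild type gene",
    2: "disease phenotype itself was mapped",
    3: "molecular basis of the disorder is known",
    4: "disorder is a chromosome deletion or duplication syndrome",
}

SUSCEPTIBILITY = ("contribute to susceptibility to multifactorial disorders "
                  "or to susceptibility to infection")
NONDISEASE = "nondiseases"
UNCONFIRMED = "unconfirmed or possibly spurious mapping"


def phenotype_comments(phenotype):
    """Return the comments implied by special characters in a phenotype string."""
    comments = []
    # Assumption: a marker counts wherever it occurs in the string.
    if "{" in phenotype and "}" in phenotype:
        comments.append(SUSCEPTIBILITY)
    if "[" in phenotype and "]" in phenotype:
        comments.append(NONDISEASE)
    if "?" in phenotype:
        comments.append(UNCONFIRMED)
    return comments


mapping = MAPPING_KEY_TEXT[3]
comments = phenotype_comments("{?Example phenotype}")  # made-up phenotype string
```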
Remove links in OMIM descriptions
The OMIM description section contains several types of links. For example, the JSON response above includes the following description of MIM entry 100640:
The ALDH1A1 gene encodes a liver cytosolic isoform of acetaldehyde dehydrogenase ({EC 1.2.1.3}), an enzyme involved in the major pathway of alcohol metabolism after alcohol dehydrogenase (ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}), variation in which has been implicated in different responses to alcohol ingestion.\n\nALDH1 is associated with a low Km for NAD, a high Km for acetaldehyde, and is strongly inactivated by disulfiram. ALDH2 is associated with a high Km for NAD, and low Km for acetaldehyde, and is insensitive to inhibition by disulfiram ({4:Hsu et al., 1985}).
As the descriptions will be shown as plain text, we remove the curly brackets surrounding links and try to keep the text readable with minimal modifications. Briefly:
- Links referring to another MIM entry (e.g. {100650}) are removed. Any word(s) specifically associated with the removed link are also removed. For example, "(ADH, see {103700})" becomes "(ADH)".
- Links referring to a literature reference are processed to remove the internal index and the curly brackets. For example, "{4:Hsu et al., 1985}" becomes "Hsu et al., 1985".
- All other links simply have their curly brackets removed. For example, "{EC 1.2.1.3}" becomes "EC 1.2.1.3".
- If the content within a pair of parentheses becomes empty after processing, the parentheses are removed as well and the surrounding white space is adjusted. For example, "ALDH2 ({100650})," becomes "ALDH2,".
The following examples show how the description section is expected to be processed:
Original text | Processed text |
---|---|
({516030}, {516040}, and {516050}) | |
(e.g., D1, {168461}; D2, {123833}; D3, {123834}) | (e.g., D1; D2; D3) |
(desmocollins; see DSC2, {125645}) | (desmocollins; see DSC2) |
(e.g., see {102700}, {300755}) | |
(ADH, see {103700}). See also liver mitochondrial ALDH2 ({100650}) | (ADH). See also liver mitochondrial ALDH2 |
(see, e.g., CACNA1A; {601011}) | (see, e.g., CACNA1A) |
(e.g., GSTA1; {138359}), mu (e.g., {138350}) | (e.g., GSTA1), mu |
(NFKB; see {164011}) | (NFKB) |
(see ISGF3G, {147574}) | (see ISGF3G) |
(DCK; {EC 2.7.1.74}; {125450}) | (DCK; EC 2.7.1.74) |
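The rules and examples above can be approximated with a few regular expressions. The sketch below is illustrative only and is not the implementation used by Illumina Connected Annotations. The list of filler words removed together with a MIM link ("see", "and", "e.g.,") is an assumption derived from the examples in the table, and the names `remove_links`, `MIM_LINK`, and so on are hypothetical.

```python
import re

# {4:Hsu et al., 1985} -> Hsu et al., 1985
REFERENCE_LINK = re.compile(r"\{\d+:([^{}]*)\}")
# {100650}, plus an optional leading separator and filler words.
# Assumption: "see", "and" and "e.g.," are the only filler words handled.
MIM_LINK = re.compile(r"[,;]?\s*(?:\b(?:see|and|e\.g\.,)\s+)*\{\d+\}")
# {EC 1.2.1.3} -> EC 1.2.1.3
OTHER_LINK = re.compile(r"\{([^{}]*)\}")
# Parentheses left empty by the steps above, with any preceding white space.
EMPTY_PARENS = re.compile(r"\s*\(\s*\)")


def remove_links(text):
    """Strip OMIM link markup from a description, keeping it readable."""
    text = REFERENCE_LINK.sub(r"\1", text)  # drop the index and the brackets
    text = MIM_LINK.sub("", text)           # drop MIM links and filler words
    text = OTHER_LINK.sub(r"\1", text)      # unwrap every remaining link
    text = EMPTY_PARENS.sub("", text)       # clean up emptied parentheses
    return text


processed = remove_links("(DCK; {EC 2.7.1.74}; {125450})")
```

The order of the substitutions matters in this sketch: reference links are handled first so that their numeric index is not mistaken for a MIM number, and the empty-parentheses cleanup runs last.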
JSON output
"omim":[
{
"mimNumber":600678,
"geneName":"MutS, E. coli, homolog of, 6",
"description":"The transcription factor p53 responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. In addition, p53 appears to induce apoptosis through nontranscriptional cytoplasmic processes. In unstressed cells, p53 is kept inactive essentially through the actions of the ubiquitin ligase MDM2, which inhibits p53 transcriptional activity and ubiquitinates p53 to promote its degradation. Numerous posttranslational modifications modulate p53 activity, most notably phosphorylation and acetylation. Several less abundant p53 isoforms also modulate p53 activity. Activity of p53 is ubiquitously lost in human cancer either by mutation of the p53 gene itself or by loss of cell signaling upstream or downstream of p53 (Toledo and Wahl, 2006; Bourdon, 2007; Vousden and Lane, 2007)",
"phenotypes":[
{
"mimNumber":614350,
"phenotype":"Colorectal cancer, hereditary nonpolyposis, type 5",
"description":"Hereditary nonpolyposis colorectal cancer type 5 is a cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal dominant"
]
},
{
"mimNumber":608089,
"phenotype":"Endometrial cancer, familial",
"mapping":"molecular basis of the disorder is known"
},
{
"mimNumber":276300,
"phenotype":"Mismatch repair cancer syndrome",
"description":"Constitutional mismatch repair deficiency is a rare childhood cancer predisposition syndrome ...",
"mapping":"molecular basis of the disorder is known",
"inheritances":[
"Autosomal recessive"
],
"comments" : [
"contribute to susceptibility to multifactorial disorders or to susceptibility to infection",
"unconfirmed or possibly spurious mapping"
]
}
]
}
]
Field | Type | Notes |
---|---|---|
mimNumber | int | OMIM ID for gene |
geneName | string | gene name |
description | string | |
phenotypes | object array | see Phenotype entry below |
Phenotype
Field | Type | Notes |
---|---|---|
mimNumber | int | |
phenotype | string | |
description | string | |
mapping | string | see possible values below |
inheritances | string array | see possible values below |
comments | string array | see possible values below |
Mapping
- disorder was positioned by mapping of the wild type gene
- disease phenotype itself was mapped
- molecular basis of the disorder is known
- disorder is a chromosome deletion or duplication syndrome
Inheritance
- autosomal recessive
- autosomal dominant
Comments
- contributes to the susceptibility to multifactorial disorders
- variations that lead to apparently abnormal laboratory test values
- unconfirmed mapping
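For readers consuming this output, the following Python sketch walks the "omim" array of a gene annotation and lists its phenotypes. It is only an illustration. It assumes that "inheritances" may be absent from a phenotype, as in the second phenotype of the example above. The function name `summarize_omim` and the tab-separated output format are hypothetical choices.

```python
def summarize_omim(gene_annotation):
    """Return one tab-separated line per phenotype in the 'omim' array."""
    lines = []
    for entry in gene_annotation.get("omim", []):
        for phenotype in entry.get("phenotypes", []):
            # 'inheritances' is treated as optional and defaults to "n/a".
            inheritances = ", ".join(phenotype.get("inheritances", [])) or "n/a"
            lines.append("\t".join([
                str(entry["mimNumber"]),
                str(phenotype["mimNumber"]),
                phenotype["phenotype"],
                inheritances,
            ]))
    return lines


# Trimmed-down version of the JSON output example.
annotation = {
    "omim": [{
        "mimNumber": 600678,
        "phenotypes": [
            {"mimNumber": 614350,
             "phenotype": "Colorectal cancer, hereditary nonpolyposis, type 5",
             "inheritances": ["Autosomal dominant"]},
            {"mimNumber": 608089, "phenotype": "Endometrial cancer, familial"},
        ],
    }]
}
summary = summarize_omim(annotation)
```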
Building the supplementary files
The first step in building the OMIM .nga files is to use the SAUtils command's subcommand downloadOMIM to download the necessary data. To download the data, the user must possess an API key obtained from OMIM. This key has to be set as the environment variable OmimApiKey.
export OmimApiKey=<users-omim-api-key>
dotnet SAUtils.dll downloadOMIM
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
USAGE: dotnet SAUtils.dll downloadomim [options]
Download the OMIM gene annotation data
OPTIONS:
--cache, -c <directory>
input cache directory
--ref, -r <filename> input reference filename
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version
dotnet SAUtils.dll downloadOMIM --ref References/7/Homo_sapiens.GRCh38.Nirvana.dat --uga Cache/ --out ExternalDataSources/OMIM/2021-06-14
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
Gene Symbol Update Statistics
============================================
{
"NumGeneSymbolsUpToDate": 16788,
"NumGeneSymbolsUpdated": 95,
"NumGenesWhereBothIdsAreNull": 0,
"NumGeneSymbolsNotInCache": 106,
"NumResolvedGeneSymbolConflicts": 15,
"NumUnresolvedGeneSymbolConflicts": 0
}
Time: 00:04:08.9
Once the download has succeeded, the .nga files can be produced using the SAUtils command's subcommand omim.
dotnet SAUtils.dll omim
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
USAGE: dotnet SAUtils.dll omim [options]
Creates a gene annotation database from OMIM data
OPTIONS:
--m2g, -m <VALUE> MimToGeneSymbol tsv file
--json, -j <VALUE> OMIM entry json file
--out, -o <VALUE> output directory
--help, -h displays the help menu
--version, -v displays the version
dotnet SAUtils.dll omim --m2g ExternalDataSources/OMIM/2021-06-14/MimToGeneSymbol.tsv --json ExternalDataSources/OMIM/2021-06-14/MimEntries.json.gz --out SupplementaryDatabase/63/
---------------------------------------------------------------------------
SAUtils (c) 2023 Illumina, Inc.
Stromberg, Roy, Platzer, Siddiqui, Ouyang, et al 3.21.0-0-gd2a0e953
---------------------------------------------------------------------------
Time: 00:00:04.5
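When scripting these two steps, it can help to check for the API key and assemble the command line programmatically. The Python sketch below is a hypothetical wrapper, not part of SAUtils. It uses only the OmimApiKey variable and the --m2g, --json and --out options shown above, and the function names `require_api_key` and `omim_command` are assumptions made for illustration.

```python
import os


def require_api_key(env=None):
    """Return the OMIM API key, or raise if OmimApiKey is not set."""
    env = os.environ if env is None else env
    key = env.get("OmimApiKey")
    if not key:
        raise RuntimeError("Set the OmimApiKey environment variable before running downloadOMIM")
    return key


def omim_command(m2g, entries_json, out_dir):
    """Build the argument list for the 'omim' subcommand shown above."""
    return ["dotnet", "SAUtils.dll", "omim",
            "--m2g", m2g, "--json", entries_json, "--out", out_dir]


command = omim_command(
    "ExternalDataSources/OMIM/2021-06-14/MimToGeneSymbol.tsv",
    "ExternalDataSources/OMIM/2021-06-14/MimEntries.json.gz",
    "SupplementaryDatabase/63/",
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains SAUtils.dll.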