Skip to main content
Version: 3.18

MITOMAP

Overview

MITOMAP provides a compendium of polymorphisms and mutations in human mitochondrial DNA.

Publication

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., and Wallace, D.C. mtDNA variation and analysis using MITOMAP and MITOMASTER. Current Protocols in Bioinformatics 1(123):1.23.1-26 (2013). http://www.mitomap.org

Scraping HTML Pages

Example

MITOMAP is unique in that it doesn't offer the data in a downloadable format. As a result, the annotation content in Nirvana is scraped from the following MITOMAP pages:

  1. mtDNA Control Region Sequence Variants
  2. mtDNA Coding Region & RNA Sequence Variants
  3. Reported Mitochondrial DNA Base Substitution Diseases: rRNA/tRNA mutations
  4. Reported Mitochondrial DNA Base Substitution Diseases: Coding and Control Region Point Mutations
  5. Reported mtDNA Deletions
  6. mtDNA Simple Insertions

Parsing

Here's what the HTML code looks like:

["582","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","Mitochondrial myopathy","T582C","tRNA Phe","-","+","Reported","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=582&alt=C&quart=2'><u>72.90%</u></a> <i class='fa fa-arrow-up' style='color:orange' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=90165,91590&title=RNA+Mutation+T582C' target='_blank'>2</a>"],
["583","<a href='/MITOMAP/GenomeLoci#MTTF'>MT-TF</a>","MELAS / MM & EXIT","G583A","tRNA Phe","-","+","Cfrm","<span style='display:inline-block;white-space:nowrap;'><a href='/cgi-bin/mitotip?pos=583&alt=A&quart=0'><u>93.10%</u></a> <i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i><i class='fa fa-arrow-up' style='color:red' aria-hidden='true'></i></span>","0","<a href='/cgi-bin/print_ref_list?refs=2066,90532,91590&title=RNA+Mutation+G583A' target='_blank'>3</a>"],

We're mainly interested in the following columns (numbers indicate the HTML page above):

  • Position1,2,3,4
  • Disease3,4
  • Nucleotide Change1,2
  • Allele3,4
  • Homoplasmy3,4
  • Heteroplasmy3,4
  • Status3,4
  • MitoTIP3,4
  • GB Seqs FL(CR)1,2,3,4
  • Deletion Junction5
  • Insert (nt)6
  • Insert Point (nt)6
  • References/Curated References1,2,3,4
MitoTIP

The MitoTIP information is used to populate the clinicalSignificance and scorePercentile JSON keys. The "frequency alert" entries are skipped since it's not directly relevant to clinical significance.

Left alignment

Many of the variants in MITOMAP have not been normalized. As part of our import procedure, we left align all insertions and deletions.

Variant Enumeration

Sometimes MITOMAP provides data that indicates that multiple values have been observed. Some examples of this are C-C(2-8) and A-AC or ACC. Alternate alleles containing IUPAC ambiguity codes are similarly enumerated.

Inversions

MITOMAP inversions are currently treated as MNVs.

Allele Parsing

The following MITOMAP allele parsing conventions are supported:

  • C123T
  • 16021_16022del
  • 8042del2
  • C9537insC
  • 3902_3908invACCTTGC
  • A-AC or ACC
  • C-C(2-8)
  • 8042delAT

PostgreSQL Dump File

Example

COPY mitomap.reference (id, authors, title, publication, editors, volume, number, pages, date, city, publisher, keywords, abstract, nlmid) FROM stdin;
1 Albring, M., Griffith, J. and Attardi, G. Association of a protein structure of probable membrane derivation with HeLa cell mitochondrial DNA near its origin of replication Proceedings of the National Academy of Sciences of the United States of America . 74 4 1348-1352 1977 . . Deoxyribonucleoproteins; DNA Replication; DNA, Mitochondrial; Hela Cells; Membrane Proteins; Microscopy, Electron; Molecular Weight; Neoplasm Proteins; Protein Binding Almost all (about 95 percent) of the mitochondrial DNA molecules released by Triton X-100 lysis of HeLa cell mitochondria in the presence of 0.15 M salt are associated with a single protein-containing structure varying in appearance between a 10-20 nm knob and a 100-500 nm membrane-like patch. Analysis by high resolution electron microscopy and by polyacrylamide gel electrophoresis after cleavage of mitochondrial DNA with the endonucleases EcoRI, HindIII, and Hpa II has shown that the protein structure is attached to the DNA in the region of the D-loop, and probably near the origin of mitochondrial DNA replication. The data strongly suggest that HeLa cell mitochondrial DNA is attached in vivo to the inner mitochondrial membrane at or near the origin of replication, and that a membrane fragment of variable size remains associated with the DNA during the isolation. After sodium dodecyl sulfate extraction of mitochondrial DNA, a small 5-10 nm protein is found at the same site on a fraction of the mitochondrial DNA molecules. 266177
2 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G. Sequence and organization of the human mitochondrial genome Nature . 290 5806 457-465 1981 . . Base Sequence; Codon; DNA Replication; mtDNA; Evolution; Genes, Structural; Human; Nucleic Acid Precursors; Peptide Chain Initiation; Peptide Chain Termination; RNA, Ribosomal; RNA, Transfer; Transcription, Genetic The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post- transcriptionally by polyadenylation of the mRNAs. 7219534

Parsing

From the PostgreSQL dump file, we're interested in parsing the mapping between reference IDs and the PubMed IDs:

  • id
  • nlmid
Why not use the PostgreSQL file for everything?

Ideally we would use this file for parsing all of our data, but the schema contains 80+ tables and we haven't invested the time yet to see how the tables are linked together to produce the 6 main HTML pages that we're interested in.

Known Issues

Duplicated records

Multiple records describing the same nucleotide change are merged into the same record. If any conflicting information is found (homoplasmy, heteroplasmy, status, clinical significance, score percentile, end coordinate, variant type), an exception is thrown.

  • For diseases and PubMed IDs, we take the union of the values in the duplicated records.
  • For full length GenBank sequences, we take the largest number from each of the duplicated records since it provides the strongest evidence for this variant.
Skipped records

Records that represent an alternate notation of the original variant are skipped. Similarly some variants with confusing alleles (T961delT+ / -C(n)ins) are also skipped.

Download URLs

JSON Output

Small Variants

"mitomap":[ 
{
"refAllele":"G",
"altAllele":"A",
"diseases":[
"Bipolar disorder",
"Melanoma"
],
"hasHomoplasmy":false,
"hasHeteroplasmy":true,
"status":"Reported",
"clinicalSignificance":"confirmed pathogenic",
"scorePercentile":83.30,
"numGenBankFullLengthSeqs":2,
"pubMedIds":["2316527","6299878","6301949"],
"isAlleleSpecific":true
}
]
FieldTypeNotes
refAllelestring
altAllelestring
diseasesstring arrayassociated diseases
hasHomoplasmyboolean
hasHeteroplasmyboolean
statusstringrecord status
clinicalSignificancestringpredicted pathogenicity
scorePercentilefloatMitoTIP score
numGenBankFullLengthSeqsinteger# of GenBank full-length sequences
pubMedIdsstring array
isAlleleSpecificbooleantrue when the current variant alternate allele matches the MITOMAP alternate allele

Structural Variants

"mitomap":[ 
{
"chromosome":"MT",
"begin":3166,
"end":14152,
"variantType":"deletion",
"reciprocalOverlap":0.18068,
"annotationOverlap":0.42405
}
]
FieldTypeNotes
chromosomestring
begininteger
endinteger
variantTypestring array
reciprocalOverlapfloatRange: 0 - 1. Specified up to 5 decimal places
annotationOverlapfloatRange: 0 - 1. Specified up to 5 decimal places