AlphaMissense
Overview
AlphaMissense is a deep learning model that predicts the pathogenicity of missense variants across the human proteome. It produces a pathogenicity score between 0 and 1, where higher values indicate more pathogenic predictions.
This release provides pre-computed predictions for all possible human amino acid substitutions across major transcripts and isoforms.
For more details, refer to:
Publication
Jun Cheng, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, Lai Hong Wong, Michal Zielinski, Tobias Sargeant, Rosalia G. Schneider, Andrew W. Senior, John Jumper, Demis Hassabis, Pushmeet Kohli, Žiga Avsec. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (2023). https://doi.org/10.1126/science.adg7492
Parsing
AlphaMissense provides two tab-separated files:
- a canonical TSV file (required)
  - AlphaMissense_hg38.tsv.gz for GRCh38
  - AlphaMissense_hg19.tsv.gz for GRCh37
- an isoforms TSV file (optional)
  - AlphaMissense_isoforms_hg38.tsv.gz, available only for GRCh38
Only a subset of the columns is consumed during parsing. Column indices below are 0-based.
TSV File
AlphaMissense_hg38.tsv.gz Snippet
#CHROM POS REF ALT genome uniprot_id transcript_id protein_variant am_pathogenicity am_class
chr1 69094 G T hg38 Q8NH21 ENST00000335137.4 V2L 0.2937 likely_benign
chr1 69094 G C hg38 Q8NH21 ENST00000335137.4 V2L 0.2937 likely_benign
chr1 69094 G A hg38 Q8NH21 ENST00000335137.4 V2M 0.3296 likely_benign
chr1 69103 T C hg38 Q8NH21 ENST00000335137.4 F5L 0.9110 likely_pathogenic
chr1 69103 T G hg38 Q8NH21 ENST00000335137.4 F5V 0.4055 ambiguous
From the canonical file, the following columns are parsed (0-based indices):
- 0: #CHROM (reference name)
- 1: pos (1-based position)
- 2: ref (reference allele)
- 3: alt (alternate allele)
- 6: transcriptId (Ensembl transcript)
- 7: proteinVariant (amino-acid substitution, e.g., V2L)
- 8: pathogenicity (0-1)
- 9: classification (e.g., likely_benign / likely_pathogenic / ambiguous)
Columns present but not consumed:
- 4: genome
- 5: uniprot_id (UniProt accession; see the UniProt release notes)
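The column selection above can be illustrated with a minimal Python sketch. It is not the SAUtils implementation; the `CanonicalRecord` class and `parse_canonical_line` function are hypothetical names introduced here, and the sketch assumes that every line beginning with `#` (the header included) can be skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CanonicalRecord:  # hypothetical name, for illustration only
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str
    transcript_id: str
    protein_variant: str
    pathogenicity: float
    classification: str


def parse_canonical_line(line: str) -> Optional[CanonicalRecord]:
    """Parse one canonical TSV line; return None for blank or '#' lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    # Columns 4 (genome) and 5 (uniprot_id) are present but not consumed.
    return CanonicalRecord(
        chrom=cols[0],
        pos=int(cols[1]),
        ref=cols[2],
        alt=cols[3],
        transcript_id=cols[6],
        protein_variant=cols[7],
        pathogenicity=float(cols[8]),
        classification=cols[9],
    )


# First data row of the canonical snippet, with tabs written explicitly.
sample = "chr1\t69094\tG\tT\thg38\tQ8NH21\tENST00000335137.4\tV2L\t0.2937\tlikely_benign\n"
record = parse_canonical_line(sample)
print(record)
```

To read the compressed file directly, the same function could be applied to each line yielded by `gzip.open(path, "rt")` from the Python standard library.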
AlphaMissense_isoforms_hg38.tsv.gz Snippet
#CHROM POS REF ALT genome transcript_id protein_variant am_pathogenicity am_class
chr1 65568 A C hg38 ENST00000641515.2 K2Q 0.0938 likely_benign
chr1 65568 A G hg38 ENST00000641515.2 K2E 0.0766 likely_benign
chr1 65569 A G hg38 ENST00000641515.2 K2R 0.0756 likely_benign
chr1 65569 A T hg38 ENST00000641515.2 K2M 0.1732 likely_benign
chr1 65569 A C hg38 ENST00000641515.2 K2T 0.1186 likely_benign
chr1 65570 G T hg38 ENST00000641515.2 K2N 0.1432 likely_benign
From the isoforms file, the following columns are parsed (0-based indices):
- 0: #CHROM
- 1: pos
- 2: ref
- 3: alt
- 5: transcriptId
- 6: proteinVariant
- 7: pathogenicity
- 8: classification
During ingestion, transcripts present in the canonical file take precedence. Isoform records with the same transcriptId as a canonical record are skipped.
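The precedence rule can be sketched as follows. This is an illustration rather than the actual ingestion code: `merge_records` is a hypothetical helper, records are modeled as plain dictionaries, and the sketch assumes that transcript IDs are compared as exact strings, version suffix included.

```python
from typing import Dict, Iterable, List


def merge_records(canonical: Iterable[Dict], isoforms: Iterable[Dict]) -> List[Dict]:
    """Keep all canonical records; add isoform records only for new transcripts."""
    canonical = list(canonical)
    canonical_ids = {rec["transcriptId"] for rec in canonical}
    merged = [dict(rec, isIsoform=False) for rec in canonical]
    for rec in isoforms:
        # Skip isoform records whose transcript already appears in the canonical file.
        if rec["transcriptId"] in canonical_ids:
            continue
        merged.append(dict(rec, isIsoform=True))
    return merged


canonical = [
    {"transcriptId": "ENST00000335137.4", "proteinVariant": "V2L", "pathogenicity": 0.2937},
]
isoforms = [
    {"transcriptId": "ENST00000641515.2", "proteinVariant": "K2Q", "pathogenicity": 0.0938},
    # Made-up duplicate of the canonical transcript, included to show the skip.
    {"transcriptId": "ENST00000335137.4", "proteinVariant": "V2L", "pathogenicity": 0.2937},
]
merged = merge_records(canonical, isoforms)
for rec in merged:
    print(rec["transcriptId"], rec["isIsoform"])
```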
Classification labels
AlphaMissense provides am_class labels such as likely_benign, likely_pathogenic, and ambiguous.
SA Generation
dotnet SAUtils.dll \
AlphaMissense \
--r "${References}/Homo_sapiens.GRCh38.Nirvana.dat" \
--t "${ExternalDataSources}/AlphaMissense/AlphaMissense_hg38.tsv.gz" \
--i "${ExternalDataSources}/AlphaMissense/AlphaMissense_isoforms_hg38.tsv.gz" \
--o "${SaUtilsOutput}"
Notes:
- --i is optional. If omitted, only canonical records are ingested.
- Output files are written with an automatically derived version name based on the .version sidecar file.
Known Issues
Some transcript IDs defined in the AlphaMissense files may be obsolete, retired, or updated. They are not modified by Illumina Connected Annotations and are passed as-is from the data source.
License and Disclaimer
Disclaimer
AlphaMissense predictions have varying confidence; they are not medical advice and are not approved for clinical use. This is not an officially supported Google product.
License
We use and redistribute AlphaMissense predictions only, which are licensed under Creative Commons Attribution 4.0 (CC BY 4.0). See CC BY 4.0 legal code.
Attribution (CC BY 4.0): credit DeepMind/AlphaMissense and the authors, link to the license and source, indicate changes, and do not imply endorsement.
AlphaMissense predictions © 2023 DeepMind Technologies Limited, used under CC BY 4.0. Adapted for Illumina Connected Annotations.
Download URL
https://console.cloud.google.com/storage/browser/dm_alphamissense
Contact
For questions about the dataset, contact alphamissense@google.com.
JSON Output
"alphaMissense": [
{
"transcriptId": "ENST00000335137.4",
"proteinVariant": "V2L",
"pathogenicity": 0.2937,
"classification": "likely_benign",
"isIsoform": false
}
]
| Field | Type | Notes |
|---|---|---|
| transcriptId | string | Transcript ID (Ensembl) |
| proteinVariant | string | Protein change (e.g., V2L) |
| pathogenicity | float | range: 0 - 1.0 |
| classification | string | e.g., likely_benign, likely_pathogenic, ambiguous |
| isIsoform | bool | true if the record originated from the isoforms TSV |
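As a usage illustration, the sketch below reads the `alphaMissense` array from an annotated variant and picks the entry with the highest pathogenicity, optionally ignoring isoform-derived entries. The enclosing JSON object and the `top_alpha_missense` helper are assumptions made for this example; only the field names come from the table above.

```python
import json
from typing import Optional

# The "alphaMissense" fragment shown above, wrapped in an object so that it parses.
variant_json = """
{
  "alphaMissense": [
    {
      "transcriptId": "ENST00000335137.4",
      "proteinVariant": "V2L",
      "pathogenicity": 0.2937,
      "classification": "likely_benign",
      "isIsoform": false
    }
  ]
}
"""


def top_alpha_missense(variant: dict, include_isoforms: bool = True) -> Optional[dict]:
    """Return the entry with the highest pathogenicity, or None if there is none."""
    entries = variant.get("alphaMissense", [])
    if not include_isoforms:
        entries = [e for e in entries if not e.get("isIsoform", False)]
    return max(entries, key=lambda e: e["pathogenicity"], default=None)


variant = json.loads(variant_json)
top = top_alpha_missense(variant, include_isoforms=False)
print(top)
```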