Version: 3.18

MNV Recomposition

Overview

Most annotation tools handle variants independently. The problem with this approach is that nearby variants can affect the same codon, leading to a very different annotation. Consider the following example (Danecek, 2017):

When handled independently, the two variants (C→T & G→A) would be annotated as missense annotations. However, if we consider them together, the resulting MNV would yield a stop gain.
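The effect can be reproduced with a few lines of Python. This is only an illustrative sketch: the codon CGG and the two substitutions are a hypothetical case chosen to show the same behavior (they are not taken from the Danecek paper), and the helper names `apply_snvs` and `consequence` are made up for this example. The translation table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def consequence(ref_codon, alt_codon):
    """Classify a codon change in a simplified way (hypothetical helper)."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    return "missense"


ref = "CGG"  # hypothetical reference codon (Arg)
first = apply_snvs(ref, [(0, "T")])              # C>T alone: TGG (Trp)
second = apply_snvs(ref, [(1, "A")])             # G>A alone: CAG (Gln)
combined = apply_snvs(ref, [(0, "T"), (1, "A")])  # both: TAG (stop)

print(consequence(ref, first), consequence(ref, second), consequence(ref, combined))
```

Annotating each SNV on its own reports two missense changes, while annotating the combined codon reports a stop gain.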

By default, Nirvana identifies cases where two or more SNVs affect the same codon. It can also perform this operation on VCFs containing large numbers of samples (we've tested this on 2,500+ samples using the 1000 Genomes Project VCF files).

Publication

Petr Danecek, Shane A McCarthy, BCFtools/csq: haplotype-aware variant consequences, Bioinformatics, Volume 33, Issue 13, 1 July 2017, Pages 2037–2039

Supported variant types

At the moment, Nirvana only supports recomposing multiple SNVs into an MNV. The Danecek paper makes a compelling case for supporting frameshifting variants paired with frame-restoring variants. We've also received requests for supporting the recomposition of an SNV with insertions and deletions. While this is something we've looked into, it represents functionality that many of our clinical customers are not yet comfortable with.

Criteria

Nirvana will recompose a set of SNVs if two or more of them are located in the same codon of any overlapping transcript.

The following criteria must also be met for at least one sample:

  1. Genotypes are provided for the VCF variants and all variants are in phase or homozygous variant.
  2. All the available phase set IDs are the same (homozygous variants are available to all phase sets).
  3. The genotype ploidy is the same for all the variants.
  4. No unsupported variant type (i.e. insertion or deletion) overlaps the recomposed variants.
  5. The first and last base in at least one of the recomposed alleles must be non-reference.
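As a rough sketch, criteria 1 to 3 can be checked for a single sample as shown below. This is not Nirvana's implementation: the function names and the `(GT, PS)` input shape are assumptions made for illustration, any homozygous genotype is treated as compatible with every phase set, and criteria 4 and 5 are left out because they need the surrounding variants and the recomposed alleles.

```python
def parse_gt(gt):
    """Split a VCF GT string into (alleles, is_phased)."""
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    return alleles, phased


def sample_can_recompose(calls):
    """Check criteria 1-3 for one sample.

    calls: list of (GT, PS) strings, one per candidate SNV; PS is "." if absent.
    Hypothetical helper: the input shape is an assumption for this example.
    """
    ploidies = set()
    phase_sets = set()
    for gt, ps in calls:
        alleles, phased = parse_gt(gt)
        if "." in alleles:
            return False  # criterion 1: a genotype must be provided
        homozygous = len(set(alleles)) == 1
        if not (phased or homozygous):
            return False  # criterion 1: unphased heterozygous call
        ploidies.add(len(alleles))
        if not homozygous and ps != ".":
            phase_sets.add(ps)  # homozygous calls fit every phase set
    # criterion 3: one ploidy; criterion 2: at most one distinct phase set
    return len(ploidies) == 1 and len(phase_sets) <= 1


print(sample_can_recompose([("0|1", "567"), ("1/1", ".")]))    # phased het + hom
print(sample_can_recompose([("0|1", "567"), ("0|1", "890")]))  # two phase sets
print(sample_can_recompose([("0|1", "."), ("0/1", ".")]))      # unphased het
```

Since the criteria only need to hold for at least one sample, a multi-sample caller would apply such a check per sample and recompose if any sample passes.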

Examples

During variant recomposition, if two SNVs affect the same codon, it becomes the seed codon. If there are SNVs in the adjacent codons, they will be aggregated into the seed codon.

  • Three SNVs in two adjacent codons. The recomposed alternate allele is ATAG:

  • Three SNVs in two adjacent codons (larger distance). The recomposed alternate allele is ATATCC:

  • Nirvana can use multiple reading frames to aggregate the seed codon. In this example, the seed codon is highlighted in green. If we look at reading frame 1, we see that the T→A variant occurs in the ACT codon. The adjacent codon to the left also has a variant C→T. As a result, there can be up to four bases between SNVs when aggregating the flanking codons. The recomposed alternate allele is TTCACATAGCACTCAC:

  • Nothing will be recomposed if there's no seed codon:
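The seed-and-extend idea can be sketched for a single reading frame as follows. The function name `recomposition_spans` and the `frame_start` parameter are hypothetical, and the sketch assumes that extension continues through every consecutive neighboring codon that carries an SNV; Nirvana's actual logic also considers multiple reading frames and transcripts.

```python
from collections import Counter


def recomposition_spans(snv_positions, frame_start):
    """Return (first, last) positions of SNV groups to recompose.

    Assumes a single reading frame whose first codon starts at frame_start.
    """
    def codon_of(pos):
        return (pos - frame_start) // 3

    snvs_per_codon = Counter(codon_of(p) for p in snv_positions)
    # A seed codon is any codon hit by two or more SNVs.
    seeds = sorted(c for c, n in snvs_per_codon.items() if n >= 2)

    spans = []
    for seed in seeds:
        lo = hi = seed
        # Extend into adjacent codons as long as they contain an SNV.
        while lo - 1 in snvs_per_codon:
            lo -= 1
        while hi + 1 in snvs_per_codon:
            hi += 1
        members = sorted(p for p in snv_positions if lo <= codon_of(p) <= hi)
        span = (members[0], members[-1])  # trimmed to non-reference ends
        if span not in spans:
            spans.append(span)
    return spans


print(recomposition_spans([100, 102, 103], frame_start=100))  # 4-base allele
print(recomposition_spans([100, 101, 105], frame_start=100))  # 6-base allele
print(recomposition_spans([100, 103], frame_start=100))       # no seed codon
```

The first two calls mirror the shape of the first two examples above (a four-base and a six-base recomposed allele), and the last call returns no spans because no codon contains two SNVs.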

Multiple Samples

Recomposing variants while handling multiple samples can be complex. The recomposition criteria described above often lead to sample-specific recomposed variants. Here we show the recomposition of three variants where the outcome depends on each sample's genotypes:

                      POS  REF  ALT       Sample 1  Sample 2  Sample 3
Decomposed Variant 1  100  A    C         0|1       0|1       1|1
Decomposed Variant 2  101  C    G         0/1       1|1       0|0
Decomposed Variant 3  102  T    A         1|1       .         0|1
Recomposed Variant 1  100  AC   AG, CG    .         1|2       .
Recomposed Variant 2  100  ACT  CCT, CCA  .         .         1|2

In the example above, the unphased heterozygous genotype (0/1) in sample 1 at position 101 would prevent the MNVs from being recomposed. Similarly, the unknown genotype for sample 2 at position 102 would produce a smaller MNV than the one expressed for sample 3.
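The recomposed alleles for samples 2 and 3 can be derived by applying each SNV to the haplotypes that carry it. The sketch below assumes the sample has already passed the recomposition criteria; `recompose_haplotypes` and its input tuples are hypothetical names used only for illustration.

```python
def recompose_haplotypes(ref_start, ref_seq, snvs):
    """Build one sequence per haplotype for a single sample.

    ref_start: position of the first reference base
    ref_seq:   reference bases covering the recomposed span
    snvs:      list of (position, alt_base, GT) with biallelic genotypes
    """
    ploidy = len(snvs[0][2].replace("|", "/").split("/"))
    haplotypes = [list(ref_seq) for _ in range(ploidy)]
    for pos, alt, gt in snvs:
        for hap_index, allele in enumerate(gt.replace("|", "/").split("/")):
            if allele == "1":  # this haplotype carries the alternate base
                haplotypes[hap_index][pos - ref_start] = alt
    return ["".join(h) for h in haplotypes]


# Sample 2: only positions 100 and 101 have usable genotypes.
sample2 = recompose_haplotypes(100, "AC", [(100, "C", "0|1"), (101, "G", "1|1")])
# Sample 3: all three positions are usable.
sample3 = recompose_haplotypes(
    100, "ACT", [(100, "C", "1|1"), (101, "G", "0|0"), (102, "A", "0|1")]
)
print(sample2, sample3)
```

Each sample ends up with two distinct non-reference haplotypes, which is why both recomposed variants carry two alternate alleles and a 1|2 genotype.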

Phase Sets

Homozygous variants, same phase set

The recomposed phase set becomes . (missing) since homozygous variants belong to all phase sets.

                      POS  REF  ALT  Genotype  Phase Set
Decomposed Variant 1  100  A    T    1|1       567
Decomposed Variant 2  101  C    G    1|1       567
Recomposed Variant    100  AC   TG   1|1       .

Mixing phased and unphased variants

                      POS  REF  ALT     Genotype  Phase Set
Decomposed Variant 1  100  A    T       0|1       567
Decomposed Variant 2  101  C    G       1/1       .
Recomposed Variant    100  AC   AG, TG  1|2       567

Variants in different phase sets

                      POS  REF  ALT     Genotype  Phase Set
Decomposed Variant 1  100  A    T       0|1       567
Decomposed Variant 2  101  C    G       1|1       890
Recomposed Variant    100  AC   AG, TG  1|2       .

Unphased homozygous variants

                      POS  REF  ALT  Genotype  Phase Set
Decomposed Variant 1  100  A    T    1/1       .
Decomposed Variant 2  101  C    G    1/1       .
Recomposed Variant    100  AC   TG   1/1       .

Homozygous variants are not commutative

                      POS  REF  ALT  Genotype  Phase Set
Decomposed Variant 1  100  A    T    0|1       567
Decomposed Variant 2  101  C    G    1|1       567
Decomposed Variant 3  102  G    T    0|1       890

In this example, the homozygous variant at position 101 cannot bridge the gap between the other two variants since there could be a switching error between phase sets 567 and 890. As a result, we have to create two overlapping MNVs:

                      POS  REF  ALT     Genotype  Phase Set
Recomposed Variant 1  100  AC   AG, TG  1|2       567
Recomposed Variant 2  101  CG   GG, GT  1|2       890

Conflicting Genotypes

JSON Output

Given the following VCF entries:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
chr1 12861477 . T C . PASS . GT:PS 0/0:. 0/0:. 0|1:12861477
chr1 12861478 . G A . PASS . GT:PS 0/0:. 0/0:. 0|1:12861477

Each original variant would be annotated as usual. The difference is that both will now have an isDecomposedVariant flag set to true in addition to an entry in the linkedVids field that points to the new MNV:

{
  "chromosome":"chr1",
  "position":12861477,
  "refAllele":"T",
  "altAlleles":[
    "C"
  ],
  "filters":[
    "PASS"
  ],
  "samples":[
    {
      "genotype":"0/0"
    },
    {
      "genotype":"0/0"
    },
    {
      "genotype":"0|1"
    }
  ],
  "variants":[
    {
      "vid":"1-12861477-T-C",
      "chromosome":"chr1",
      "begin":12861477,
      "end":12861477,
      "refAllele":"T",
      "altAllele":"C",
      "variantType":"SNV",
      "isDecomposedVariant":true,
      "linkedVids":[
        "1-12861477-TG-CA"
      ],
      "hgvsg":"NC_000001.11:g.12861477T>C",
      "transcripts":[ ... ]
    }
  ]
},
{
  "chromosome":"chr1",
  "position":12861478,
  "refAllele":"G",
  "altAlleles":[
    "A"
  ],
  "filters":[
    "PASS"
  ],
  "samples":[
    {
      "genotype":"0/0"
    },
    {
      "genotype":"0/0"
    },
    {
      "genotype":"0|1"
    }
  ],
  "variants":[
    {
      "vid":"1-12861478-G-A",
      "chromosome":"chr1",
      "begin":12861478,
      "end":12861478,
      "refAllele":"G",
      "altAllele":"A",
      "variantType":"SNV",
      "isDecomposedVariant":true,
      "linkedVids":[
        "1-12861477-TG-CA"
      ],
      "hgvsg":"NC_000001.11:g.12861478G>A",
      "transcripts":[ ... ]
    }
  ]
}

The recomposed variant gets a separate entry where the isRecomposedVariant flag is set to true and the linkedVids field links to the constituent SNVs:

{
  "chromosome": "chr1",
  "position": 12861477,
  "refAllele": "TG",
  "altAlleles": [
    "CA"
  ],
  "filters": [
    "PASS"
  ],
  "cytogeneticBand": "1p36.21",
  "samples": [
    {
      "genotype": "0|0"
    },
    {
      "genotype": "0|0"
    },
    {
      "genotype": "0|1"
    }
  ],
  "variants": [
    {
      "vid": "1-12861477-TG-CA",
      "chromosome": "chr1",
      "begin": 12861477,
      "end": 12861478,
      "refAllele": "TG",
      "altAllele": "CA",
      "variantType": "MNV",
      "isRecomposedVariant": true,
      "linkedVids": [
        "1-12861477-T-C",
        "1-12861478-G-A"
      ],
      "hgvsg": "NC_000001.11:g.12861477_12861478inv",
      "transcripts":[ ... ]
    }
  ]
}
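
A downstream consumer can use these fields to connect each MNV to its constituent SNVs. The following is a minimal sketch that assumes the position entries have already been parsed into a Python list; `link_recomposed` is a hypothetical helper, and the inline data is trimmed to the fields used here (vid, isDecomposedVariant, isRecomposedVariant, linkedVids).

```python
def link_recomposed(positions):
    """Map each recomposed variant's vid to the vids of its constituent SNVs."""
    by_vid = {}
    for position in positions:
        for variant in position.get("variants", []):
            by_vid[variant["vid"]] = variant

    links = {}
    for vid, variant in by_vid.items():
        if variant.get("isRecomposedVariant"):
            # Keep only linked vids that are present in the parsed entries.
            links[vid] = [v for v in variant.get("linkedVids", []) if v in by_vid]
    return links


# Trimmed-down version of the three position entries shown above.
positions = [
    {"variants": [{"vid": "1-12861477-T-C", "isDecomposedVariant": True,
                   "linkedVids": ["1-12861477-TG-CA"]}]},
    {"variants": [{"vid": "1-12861478-G-A", "isDecomposedVariant": True,
                   "linkedVids": ["1-12861477-TG-CA"]}]},
    {"variants": [{"vid": "1-12861477-TG-CA", "isRecomposedVariant": True,
                   "linkedVids": ["1-12861477-T-C", "1-12861478-G-A"]}]},
]

links = link_recomposed(positions)
print(links)
```
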

Recomposed QUAL, FILTER, and GQ

Although the example above does not demonstrate it, Nirvana tries to set the quality score (QUAL), filter, and genotype quality (GQ) for the recomposed variant. QUAL is the minimum QUAL score across the constituent SNVs, and GQ is calculated the same way. For the filters field, PASS is used if all constituent variants passed their filters; otherwise it is set to FilteredVariantsRecomposed.
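
These rules can be expressed compactly. The sketch below assumes every constituent SNV has numeric QUAL and GQ values, which real VCFs do not guarantee; the function name, the dictionary keys, and the LowQual filter are hypothetical and used only for illustration.

```python
def recomposed_quality(constituents):
    """Combine QUAL, GQ, and filters from the constituent SNVs.

    constituents: list of dicts with "qual", "gq", and "filters" keys
    (a made-up input shape for this example).
    """
    all_pass = all(c["filters"] == ["PASS"] for c in constituents)
    return {
        "qual": min(c["qual"] for c in constituents),  # lowest QUAL wins
        "gq": min(c["gq"] for c in constituents),      # lowest GQ wins
        "filters": ["PASS"] if all_pass else ["FilteredVariantsRecomposed"],
    }


passing = recomposed_quality([
    {"qual": 50.0, "gq": 30, "filters": ["PASS"]},
    {"qual": 20.0, "gq": 45, "filters": ["PASS"]},
])
filtered = recomposed_quality([
    {"qual": 50.0, "gq": 30, "filters": ["PASS"]},
    {"qual": 20.0, "gq": 45, "filters": ["LowQual"]},  # hypothetical filter name
])
print(passing, filtered)
```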