Version: 3.26 (unreleased)

Variant Classification

Introduction

Illumina Connected Annotations follows the VCF specification when parsing VCF files. In VCF, a variant's type is determined by its REF and ALT alleles and by certain INFO fields, such as SVTYPE.

Although the VCF variant type does not depend on variant size (both ATG->A and A-><DEL> are deletions), for annotation purposes it is useful to classify variants as small or structural (large). For example, reporting the ClinVar annotation for a small deletion such as 1-1000-ATG-A on a 20 kbp <DEL> at the same location is not helpful, and vice versa. Illumina Connected Annotations uses the following guidelines.

Info: Structural variants are treated as intervals when providing supplementary annotations such as ClinVar and gnomAD.
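Because supplementary annotations are matched against structural variants as intervals, it can help to see what an interval overlap check involves. The sketch below is a generic illustration, not the product's matching logic. It assumes an SV covers POS through END inclusive (whether the padding base at POS belongs to the interval is an assumption here), `Interval`, `sv_interval`, `overlaps`, and `overlap_fraction` are hypothetical names, and the annotated region is made up.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A 1-based, closed genomic interval (hypothetical helper)."""
    chrom: str
    start: int
    end: int


def sv_interval(chrom: str, pos: int, end: int) -> Interval:
    """Assumption: the SV covers POS..END inclusive."""
    if end < pos:
        raise ValueError("END must not be smaller than POS")
    return Interval(chrom, pos, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def overlap_fraction(sv: Interval, other: Interval) -> float:
    """Fraction of the SV covered by the other interval (0.0 if they do not overlap)."""
    if not overlaps(sv, other):
        return 0.0
    shared = min(sv.end, other.end) - max(sv.start, other.start) + 1
    return shared / (sv.end - sv.start + 1)


deletion = sv_interval("chr3", 3456789, 3457890)   # the <DEL> example on this page
annotated = Interval("chr3", 3457000, 3460000)      # made-up annotation interval
print(overlaps(deletion, annotated))                # True
print(round(overlap_fraction(deletion, annotated), 3))  # 0.809
```

Whether a real matcher uses plain overlap, a coverage fraction, or reciprocal overlap is not stated on this page; the fraction is shown only as one common option.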

Classifying Small and Structural Variants

| Variant classification | Criteria | Example |
|---|---|---|
| Small variant | Single nucleotide changes | G -> A (SNP) |
| Small variant | Short insertions/deletions | T -> TAA (insertion) |
| Structural variant | Explicit alleles with SVTYPE or SVLEN in INFO | ACCCCT.... with SVLEN=200 or SVTYPE=DEL |
| Structural variant | Symbolic notation (<DEL>, <DUP>, etc.) | <DEL> |
| Structural variant | IMPRECISE flag in the INFO field | IMPRECISE in INFO |
| Structural variant | Breakend notation ([ or ]) | N[chr10:1234567[ (translocation) |
| Structural variant | Expansion of short tandem repeats (STRs) | <STR> with SVTYPE=STR |
| Structural variant | Copy number variation (CNV) or loss of heterozygosity (LOH) | SVTYPE=CNV or SVTYPE=LOH |
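The criteria above can be read as a short decision procedure. The following is a minimal sketch of that reading, not the Illumina Connected Annotations source code: `parse_info` and `classify` are hypothetical names, the checks run in the order of the table, and no VCF validation is done.

```python
import re
from typing import Dict, Union

SYMBOLIC_ALLELE = re.compile(r"^<[^<>]+>$")   # <DEL>, <DUP>, <STR>, <ROH>, ...
BREAKEND_ALLELE = re.compile(r"[\[\]]")       # N[chr10:1234567[ and the other bracket forms


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Turn 'SVTYPE=DEL;END=100;IMPRECISE' into a dict; flag fields map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key:
            fields[key] = value if sep else True
    return fields


def classify(alt: str, info: str) -> str:
    """Return 'structural' or 'small' for one record, following the criteria table."""
    fields = parse_info(info)
    if "SVTYPE" in fields or "SVLEN" in fields:
        return "structural"   # explicit or symbolic alleles tagged with SVTYPE/SVLEN
    if "IMPRECISE" in fields:
        return "structural"   # IMPRECISE flag in INFO
    for allele in alt.split(","):
        if SYMBOLIC_ALLELE.match(allele) or BREAKEND_ALLELE.search(allele):
            return "structural"   # symbolic allele or breakend notation
    return "small"


# ALT and INFO values from the examples on this page.
for alt, info in [("A", "."), ("TAA", "."), ("A,C", "."),
                  ("<DEL>", "SVTYPE=DEL;END=3457890"),
                  ("N[chr10:1234567[", "SVTYPE=BND")]:
    print(alt, classify(alt, info))
```

The STR, ROH, and CNV examples are caught by the SVTYPE check and, independently, by their symbolic alleles.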

Examples

Small Variants

SNP

Input Data:
chr1    1234567    .    G    A    .    PASS    .
Classification:
  • This is a single nucleotide polymorphism (SNP)
  • It does not contain breakend notation or symbolic alleles
  • It does not contain SVTYPE or SVLEN in the INFO field
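Each example starts from one VCF data line. A minimal sketch of reading the eight fixed columns and applying the SNP checks listed above, using hypothetical names (`VcfRecord`, `parse_record`, `info_keys`, `is_snp`). Real VCF files are tab-delimited; `split()` is used here so the space-aligned lines on this page parse too.

```python
from typing import NamedTuple, Set


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: str
    qual: str
    filters: str
    info: str


def parse_record(line: str) -> VcfRecord:
    """Parse the eight fixed VCF columns; FORMAT and sample columns are ignored."""
    cols = line.split()   # tabs in real files; spaces in the examples on this page
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}")
    chrom, pos, variant_id, ref, alt, qual, filters, info = cols[:8]
    return VcfRecord(chrom, int(pos), variant_id, ref, alt, qual, filters, info)


def info_keys(info: str) -> Set[str]:
    """Return the INFO keys; '.' means the field is empty."""
    if info in ("", "."):
        return set()
    return {entry.split("=", 1)[0] for entry in info.split(";") if entry}


def is_snp(record: VcfRecord) -> bool:
    """One REF base, every ALT allele a single A/C/G/T, and no SVTYPE/SVLEN tags."""
    single_bases = len(record.ref) == 1 and all(
        len(allele) == 1 and allele in "ACGT" for allele in record.alt.split(",")
    )
    return single_bases and not ({"SVTYPE", "SVLEN"} & info_keys(record.info))


record = parse_record("chr1    1234567    .    G    A    .    PASS    .")
print(is_snp(record))   # True
```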

Short Indel

Input Data:
chr2    8901234    .    T    TAA    .    PASS    .
Classification:
  • This is an insertion of 2 base pairs
  • The change is short, so it is classified as a small variant
  • It does not contain SVTYPE or SVLEN in the INFO field
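The inserted or deleted length follows from the difference in allele lengths when REF and ALT share a leading padding base, as in these examples. A small sketch with a hypothetical `describe_change` helper; it does not handle symbolic or breakend alleles.

```python
from typing import Tuple


def describe_change(ref: str, alt: str) -> Tuple[str, int]:
    """Return the kind of change and its net length in base pairs."""
    delta = len(alt) - len(ref)
    if delta > 0:
        return "insertion", delta
    if delta < 0:
        return "deletion", -delta
    return "substitution", 0


print(describe_change("T", "TAA"))   # ('insertion', 2)
print(describe_change("ATG", "A"))   # ('deletion', 2)
```

This page does not give a numeric size cutoff between small and structural variants, so the sketch reports the length without applying one.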

Multiallelic SNP

Input Data:
chr7    2345678    .    G    A,C    .    PASS    .
Classification:
  • This represents a multiallelic SNP, where two different alternative alleles exist (A and C)
  • Despite multiple alleles, it does not involve large rearrangements
  • It does not contain SVTYPE or SVLEN in the INFO field
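In a multiallelic record, each comma-separated ALT allele is compared with REF on its own. A rough sketch with a hypothetical `allele_types` helper; the labels, including `MNV` for equal-length multi-base changes, are illustrative and not the product's output vocabulary.

```python
from typing import Dict


def allele_types(ref: str, alt: str) -> Dict[str, str]:
    """Label each comma-separated ALT allele relative to REF."""
    labels: Dict[str, str] = {}
    for allele in alt.split(","):
        if allele.startswith("<") and allele.endswith(">"):
            labels[allele] = "symbolic"
        elif len(ref) == 1 and len(allele) == 1:
            labels[allele] = "SNP"
        elif len(allele) > len(ref):
            labels[allele] = "insertion"
        elif len(allele) < len(ref):
            labels[allele] = "deletion"
        else:
            labels[allele] = "MNV"   # equal-length multi-base substitution
    return labels


print(allele_types("G", "A,C"))   # {'A': 'SNP', 'C': 'SNP'}
```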

Structural Variants

Deletion

Input Data:
chr3    3456789    .    N    <DEL>    .    PASS    SVTYPE=DEL;END=3457890
Classification:
  • The alternate allele (<DEL>) is symbolic
  • The SVTYPE=DEL confirms it is a structural variant
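A symbolic allele carries no bases, so the affected span has to come from INFO. A minimal sketch, assuming the VCF 4.2 convention in which a symbolic `<DEL>` keeps one padding base at POS and deletes the bases after it, so the deleted length is END - POS. `parse_info` and `symbolic_deletion_length` are hypothetical helpers.

```python
import re
from typing import Dict, Union

SYMBOLIC_ALLELE = re.compile(r"^<([^<>]+)>$")


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split INFO into key/value pairs; flag fields map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key:
            fields[key] = value if sep else True
    return fields


def symbolic_deletion_length(pos: int, alt: str, info: str) -> int:
    """Deleted length for a symbolic <DEL>, assuming one padding base at POS."""
    match = SYMBOLIC_ALLELE.match(alt)
    if match is None or not match.group(1).startswith("DEL"):
        raise ValueError(f"{alt!r} is not a symbolic deletion")
    end = parse_info(info).get("END")
    if not isinstance(end, str):
        raise ValueError("INFO has no END value")
    return int(end) - pos


print(symbolic_deletion_length(3456789, "<DEL>", "SVTYPE=DEL;END=3457890"))   # 1101
```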

Deletion with explicit alleles

Input Data:
chr1    934064    .    AGGGGAGGCGGCTGCGTTACAGGTGGGCAGGGGAGGCGGCTCCGTTACAGGTGGGCAGGGGAGGCGGCTGCGTTACA    A    .    .    END=934904;SVTYPE=DEL
Classification:
  • The reference and alternate alleles are explicit (not symbolic), but SVTYPE=DEL indicates that it is a structural variant
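The point of this example is that the INFO tags, not the look of the alleles, decide the class. The hypothetical `variant_class` helper below reports the allele shape from REF/ALT and the small/structural class from SVTYPE/SVLEN, so this record and a short ATG -> A deletion share a shape but not a class.

```python
from typing import Set, Tuple


def info_keys(info: str) -> Set[str]:
    """Return the INFO keys; '.' means the field is empty."""
    if info in ("", "."):
        return set()
    return {entry.split("=", 1)[0] for entry in info.split(";") if entry}


def allele_shape(ref: str, alt: str) -> str:
    """Describe the change using only the REF/ALT strings."""
    if len(alt) < len(ref):
        return "deletion"
    if len(alt) > len(ref):
        return "insertion"
    return "substitution"


def variant_class(ref: str, alt: str, info: str) -> Tuple[str, str]:
    """Return (allele shape, 'small' or 'structural'); SVTYPE/SVLEN decide the class."""
    size_class = "structural" if {"SVTYPE", "SVLEN"} & info_keys(info) else "small"
    return allele_shape(ref, alt), size_class


REF = "AGGGGAGGCGGCTGCGTTACAGGTGGGCAGGGGAGGCGGCTCCGTTACAGGTGGGCAGGGGAGGCGGCTGCGTTACA"
print(variant_class(REF, "A", "END=934904;SVTYPE=DEL"))   # ('deletion', 'structural')
print(variant_class("ATG", "A", "."))                      # ('deletion', 'small')
```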

Run of Homozygosity

Input Data:
1   4000000    .    N    <ROH>    .    ROHLC    SVTYPE=ROH;END=4001000    GT    .    .    1
Classification:
  • The alternate allele (<ROH>) is symbolic
  • The SVTYPE=ROH confirms it is a structural variant

Repeat Expansion

Input Data:
chrX    9876543    .    CAG    <STR>    .    PASS    SVTYPE=STR;REPEAT_UNIT=CAG
Classification:
  • The alternate allele <STR> is symbolic, indicating a repeat expansion
  • SVTYPE=STR confirms it is a structural variant related to repeat instability

Copy Number Variation

Input Data:
chr7    110541589    .    N    <DEL>    27    cnvLength    SVTYPE=CNV;END=110548681;REFLEN=7092
Classification:
  • The alternate allele <DEL> is symbolic and indicates a copy number loss
  • SVTYPE=CNV confirms it is a structural variant (copy number variation)
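In this record, REFLEN equals END - POS (110548681 - 110541589 = 7092), and the <DEL> allele marks the change as a copy number loss. The sketch below checks both; `cnv_summary` is a hypothetical helper, and treating `<DUP>` as a gain is an assumption not stated on this page.

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split INFO into key/value pairs; flag fields map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key:
            fields[key] = value if sep else True
    return fields


def cnv_summary(line: str) -> Dict[str, object]:
    """Summarize a CNV record: SVTYPE, span from END, REFLEN check, and direction."""
    cols = line.split()
    pos, alt = int(cols[1]), cols[4]
    info = parse_info(cols[7])
    span = int(str(info["END"])) - pos
    reflen = info.get("REFLEN")
    return {
        "svtype": info.get("SVTYPE"),
        "span": span,
        "reflen_matches": isinstance(reflen, str) and int(reflen) == span,
        # <DEL> as a loss follows this page; <DUP> as a gain is an assumption.
        "direction": {"<DEL>": "loss", "<DUP>": "gain"}.get(alt, "unknown"),
    }


record = "chr7    110541589    .    N    <DEL>    27    cnvLength    SVTYPE=CNV;END=110548681;REFLEN=7092"
print(cnv_summary(record))
# {'svtype': 'CNV', 'span': 7092, 'reflen_matches': True, 'direction': 'loss'}
```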

Translocation

Input Data:
chr4    5678901    .    N    N[chr10:1234567[    .    PASS    SVTYPE=BND
Classification:
  • The alternate allele contains a square bracket ([), which signifies breakend notation
  • SVTYPE=BND confirms this is a structural variant (translocation)
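Breakend ALT alleles encode the mate position inside the brackets. The sketch below parses the four bracket forms described in the VCF 4.2 specification (t[p[, t]p], ]p]t, [p[t) with a regular expression and flags a translocation when the mate lies on another chromosome. `Breakend`, `parse_breakend`, and `is_interchromosomal` are hypothetical names; single breakends and other extensions are not handled.

```python
import re
from typing import NamedTuple, Optional

# before-bases, opening bracket, mate chrom:pos, the same bracket again, after-bases
BREAKEND = re.compile(
    r"^(?P<before>[^\[\]]*)(?P<bracket>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=bracket)(?P<after>[^\[\]]*)$"
)


class Breakend(NamedTuple):
    mate_chrom: str
    mate_pos: int
    bracket: str      # "[" or "]"
    ref_first: bool   # True for t[p[ and t]p]; False for ]p]t and [p[t


def parse_breakend(alt: str) -> Optional[Breakend]:
    """Return the mate location of a bracketed breakend allele, or None."""
    match = BREAKEND.match(alt)
    if match is None:
        return None
    before, after = match.group("before"), match.group("after")
    if bool(before) == bool(after):
        return None   # exactly one side must carry the REF bases
    return Breakend(match.group("chrom"), int(match.group("pos")),
                    match.group("bracket"), bool(before))


def is_interchromosomal(chrom: str, alt: str) -> bool:
    """True when the breakend's mate lies on a different chromosome."""
    breakend = parse_breakend(alt)
    return breakend is not None and breakend.mate_chrom != chrom


print(parse_breakend("N[chr10:1234567["))
print(is_interchromosomal("chr4", "N[chr10:1234567["))   # True
```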