Version: 3.22

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Small Variants

VCF extraction

We currently extract the following info fields from gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following additional fields from the gnomAD exome VCF files:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

  • Coverage
  • Allele count, homozygous count, allele number, and allele frequency for:
      • Global population
      • African/African Americans
      • Admixed Americans
      • Ashkenazi Jews
      • East Asians
      • Finnish
      • Non-Finnish Europeans
      • South Asian
      • Others (population not assigned)
      • Male
      • Female
      • Controls
Note
  • Coverage = DP / AN. Frequencies are computed as AC / AN for each population.
  • gnomAD currently contains no genome sequencing data for the South Asian (SAS) population.
  • Allele count, homozygous count, allele number, and allele frequencies for the controls subset are provided for the global population only.
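
The two formulas in the note can be sketched in a few lines of Python. This is only an illustration, not the annotator's actual code: the function names are hypothetical, the DP value is made up so that the result matches the sample coverage, and rounding coverage to the nearest integer is an assumption (the documentation only states that coverage is reported as an integer).

```python
def allele_frequency(ac, an):
    """Return AC / AN, or None when AN is zero or missing."""
    if not an:
        return None
    return ac / an


def coverage(dp, an):
    """Return DP / AN as an integer (the rounding mode is an assumption)."""
    if not an:
        return None
    return round(dp / an)


# AC/AN values taken from the JSON sample on this page; DP is hypothetical.
info = {"AC": 5861, "AN": 30796, "DP": 615920, "AC_afr": 1931, "AN_afr": 8664}

all_af = allele_frequency(info["AC"], info["AN"])
afr_af = allele_frequency(info["AC_afr"], info["AN_afr"])
print(round(all_af, 6), round(afr_af, 6), coverage(info["DP"], info["AN"]))
```

Guarding against AN = 0 avoids a division by zero for populations with no called alleles at a site.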

Merging genomes and exomes

When merging the genomes and exomes, the allele counts and allele numbers will be summed across both of the data sets.

info
  • For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.
  • For GRCh38, Illumina Connected Annotations currently uses gnomAD version 3.0 which doesn't contain the exomes data. Therefore, only genomes data are presented in the output.

Filters

The following strategy will be used when there's a conflict in filter status:

|                 | Genomes PASS         | Genomes Filtered    |
|-----------------|----------------------|---------------------|
| Exomes PASS     | PASS                 | Only use exome data |
| Exomes Filtered | Only use genome data | Filtered            |
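
As a rough sketch of how the merging rule and the filter table interact, the following Python function combines one genome record and one exome record. The record layout (`ac`, `an`, `failed_filter`) and the function name are hypothetical. Summing the counts when both sources are filtered is an assumption, since the table only states that the merged result is marked as filtered.

```python
def merge_counts(genome, exome):
    """Merge genome and exome records for one variant.

    Each record is a dict with 'ac', 'an' and 'failed_filter' keys
    (hypothetical names), or None if the variant is absent from that source.
    """
    if genome is None or exome is None:
        # Only one source has the variant: nothing to merge.
        return genome or exome
    if genome["failed_filter"] != exome["failed_filter"]:
        # Filter conflict: use only the source that passed.
        return exome if genome["failed_filter"] else genome
    # Same filter status in both sources: sum allele counts and allele numbers.
    return {
        "ac": genome["ac"] + exome["ac"],
        "an": genome["an"] + exome["an"],
        "failed_filter": genome["failed_filter"],
    }


genome = {"ac": 10, "an": 100, "failed_filter": False}
exome = {"ac": 5, "an": 50, "failed_filter": False}
merged = merge_counts(genome, exome)
print(merged["ac"], merged["an"], merged["ac"] / merged["an"])
```

Note that the merged allele frequency is recomputed from the summed AC and AN rather than averaged from the two source frequencies.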

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700,
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
| Field | Type | Notes |
|-------|------|-------|
| coverage | int | average coverage (non-negative integer values) |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| maleAf | float | allele frequency for male population. Range: 0 - 1.0 |
| femaleAf | float | allele frequency for female population. Range: 0 - 1.0 |
| controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Integer. |
| maleAc | int | allele count for male population. Integer. |
| femaleAc | int | allele count for female population. Integer. |
| controlsAllAc | int | allele count for the controls subset. Integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| maleAn | int | allele number for male population. Non-zero integer. |
| femaleAn | int | allele number for female population. Non-zero integer. |
| controlsAllAn | int | allele number for the controls subset. Non-zero integer. |
| allHc | int | count of homozygous individuals for all populations. Non-negative integer. |
| maleHc | int | count of homozygous individuals for male population. Non-negative integer. |
| femaleHc | int | count of homozygous individuals for female population. Non-negative integer. |
| afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African / African American population. Integer. |
| afrAn | int | allele number for the African / African American population. Non-zero integer. |
| afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer. |
| amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Latino population. Integer. |
| amrAn | int | allele number for the Latino population. Non-zero integer. |
| amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer. |
| easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian population. Integer. |
| easAn | int | allele number for the East Asian population. Non-zero integer. |
| easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer. |
| finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0 |
| finAc | int | allele count for the Finnish population. Integer. |
| finAn | int | allele number for the Finnish population. Non-zero integer. |
| finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer. |
| nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0 |
| nfeAc | int | allele count for the Non-Finnish European population. Integer. |
| nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer. |
| nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer. |
| othAf | float | allele frequency for the Other population. Range: 0 - 1.0 |
| othAc | int | allele count for the Other population. Integer. |
| othAn | int | allele number for the Other population. Non-zero integer. |
| othHc | int | count of homozygous individuals for the Other population. Non-negative integer. |
| asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0 |
| asjAc | int | allele count for the Ashkenazi Jewish population. Integer. |
| asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer. |
| asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer. |
| sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian population. Integer. |
| sasAn | int | allele number for the South Asian population. Non-zero integer. |
| sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer. |
| failedFilter | bool | True if this variant failed any filters (Note: we do not list the failed filters) |
| lowComplexityRegion | bool | True if this variant is located in a low complexity region. |

Building the supplementary files

The gnomAD .nsa for Illumina Connected Annotations can be built using the SAUtils command's gnomad subcommand. We will describe building gnomAD version 3.1 here.

Source data files

Building the .nsa file requires a folder containing the input VCF files (one per chromosome) and a .version file. For example, the folder may contain:

chr10.vcf.bgz  chr22.vcf.bgz
chr11.vcf.bgz chr2.vcf.bgz
chr12.vcf.bgz chr3.vcf.bgz
chr13.vcf.bgz chr4.vcf.bgz
chr14.vcf.bgz chr5.vcf.bgz
chr15.vcf.bgz chr6.vcf.bgz
chr16.vcf.bgz chr7.vcf.bgz
chr17.vcf.bgz chr8.vcf.bgz
chr18.vcf.bgz chr9.vcf.bgz
chr19.vcf.bgz chrM.vcf.bgz
chr1.vcf.bgz chrX.vcf.bgz
chr20.vcf.bgz chrY.vcf.bgz
chr21.vcf.bgz gnomad.r3.1.version

The version file is a text file with the following content.

NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
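
For scripted builds it can help to sanity-check the version file before running SAUtils. The snippet below is a minimal sketch that reads the KEY=VALUE layout shown above. The function name and the list of required keys are assumptions drawn from this example only, not from the SAUtils source.

```python
REQUIRED_KEYS = ("NAME", "VERSION", "DATE", "DESCRIPTION")  # assumed from the example above


def parse_version_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blank or malformed lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


example = """NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
"""

fields = parse_version_file(example)
missing = [key for key in REQUIRED_KEYS if key not in fields]
print(fields["VERSION"], missing)
```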

The help menu for the utility is as follows:

SAUtils.dll gnomad
---------------------------------------------------------------------------
SAUtils (c) 2021 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang 3.17.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll gnomad [options]
Reads provided supplementary data files and populates tsv files

OPTIONS:
      --ref, -r <VALUE>      compressed reference sequence file
      --genome, -g <VALUE>   input directory containing VCF (and .version) files with genomic frequencies
      --exome, -e <VALUE>    input directory containing VCF (and .version) files with exomic frequencies
      --temp, -t <VALUE>     output temp directory for intermediate (per chrom) NSA files
      --out, -o <VALUE>      output directory for NSA file
      --help, -h             displays the help menu
      --version, -v          displays the version

Here is a sample execution:

dotnet SAUtils.dll gnomad \
--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \
--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

| JSON key | TSV column | Description |
|----------|------------|-------------|
| pLi | pLI | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
| pNull | pNull | probability of being completely tolerant of loss of function variation (observed = expected) |
| pRec | pRec | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected) |
| synZ | syn_z | corrected synonymous Z score |
| misZ | mis_z | corrected missense Z score |
| loeuf | oe_lof_upper | loss of function observed/expected upper bound fraction (LOEUF) |
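
The mapping above can be expressed as a small lookup table. The following Python sketch applies it to a reduced, tab-delimited excerpt that keeps only the relevant columns of the MED13 row. The function name is hypothetical, and skipping empty or `NA` values is an assumption about how missing metrics are handled.

```python
import csv
import io

# TSV column -> JSON key, as listed in the mapping table.
TSV_TO_JSON = {
    "pLI": "pLi",
    "pNull": "pNull",
    "pRec": "pRec",
    "syn_z": "synZ",
    "mis_z": "misZ",
    "oe_lof_upper": "loeuf",
}


def row_to_json(row):
    """Convert one TSV row (as a dict) into the JSON key/value pairs."""
    out = {}
    for column, key in TSV_TO_JSON.items():
        value = row.get(column)
        if value in (None, "", "NA"):
            continue  # assumption: missing metrics are omitted from the output
        out[key] = float(value)
    return out


# Reduced excerpt of the MED13 row (only the mapped columns are kept).
sample = (
    "gene\tpLI\tpNull\tpRec\toe_lof_upper\tsyn_z\tmis_z\n"
    "MED13\t1.0000e+00\t8.9436e-40\t1.8383e-16\t3.0000e-02\t-1.3765e+00\t2.6232e+00\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
print(rows[0]["gene"], row_to_json(rows[0]))
```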

Gene symbol update

The input file provides an Ensembl gene ID for each entry. We observed that these IDs are unique, whereas gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache contains Ensembl gene IDs, we use these IDs to look up the gene symbols in the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Illumina Connected Annotations uses HGNC gene symbols. Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in conflicts.

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999

In such cases, Illumina Connected Annotations chooses the entry with the smallest "LOEUF" value. The reason for choosing this value can be highlighted by the following table:

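| LOEUF decile | Haplo-insufficient | Autosomal Dominant | Autosomal Recessive | Olfactory Genes |
|--------------|--------------------|--------------------|---------------------|-----------------|
| 0-10%   | 104 | 140 | 36  | 0   |
| 10-20%  | 47  | 128 | 72  | 1   |
| 20-30%  | 17  | 86  | 112 | 0   |
| 30-40%  | 8   | 80  | 173 | 4   |
| 40-50%  | 7   | 65  | 206 | 8   |
| 50-60%  | 4   | 54  | 207 | 6   |
| 60-70%  | 0   | 46  | 154 | 18  |
| 70-80%  | 2   | 49  | 120 | 49  |
| 80-90%  | 0   | 34  | 58  | 96  |
| 90-100% | 0   | 26  | 40  | 174 |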
Note

List of genes with conflicting entries

MDGA2:
{"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
{"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
{"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
{"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
{"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
{"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
{"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
{"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
{"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
{"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
{"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
{"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
{"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
{"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
{"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
{"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
{"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
{"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
{"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
{"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
{"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
{"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
{"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
{"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
{"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
{"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
{"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
{"synZ":-1.98e0,"misZ":-1.44e0}
{"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

  • Pick the entry with the lowest LOEUF score
  • If the same, pick the lowest pLI
  • Otherwise pick the entry with the max absolute value of synZ + misZ
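
These three rules can be sketched as a single sort key. This is an illustration under stated assumptions rather than the annotator's implementation: the function name is hypothetical, "max absolute value of synZ + misZ" is read here as |synZ + misZ|, and entries lacking a LOEUF or pLI value are ranked after entries that have one.

```python
import math


def resolve_conflict(entries):
    """Pick one entry among several that share a gene symbol."""

    def sort_key(entry):
        loeuf = entry.get("loeuf", math.inf)  # rule 1: lowest LOEUF first
        pli = entry.get("pLI", math.inf)      # rule 2: then lowest pLI
        z_sum = abs(entry.get("synZ", 0.0) + entry.get("misZ", 0.0))
        return (loeuf, pli, -z_sum)           # rule 3: then largest |synZ + misZ|

    return min(entries, key=sort_key)


# The two MDGA2 entries from the list of conflicting genes.
mdga2 = [
    {"pLI": 9.99e-1, "pRec": 7.81e-4, "pNull": 8.65e-12, "synZ": 8.30e-1, "misZ": 1.68e0, "loeuf": 2.39e-1},
    {"pLI": 6.65e-1, "pRec": 3.35e-1, "pNull": 5.58e-10, "synZ": 8.39e-1, "misZ": 1.74e0, "loeuf": 3.51e-1},
]
print(resolve_conflict(mdga2)["loeuf"])
```

Using a tuple as the key applies the rules in order, so a later rule is only consulted when all earlier ones tie.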

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
| Field | Type | Notes |
|-------|------|-------|
| pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
| pNull | float | probability of being completely tolerant of loss of function variation (observed = expected) |
| pRec | float | probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected) |
| synZ | float | corrected synonymous Z score |
| misZ | float | corrected missense Z score |
| loeuf | float | loss of function observed/expected upper bound fraction (LOEUF) |

Structural Variants

Publication

Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. Nature 581, pp.444–451. https://doi.org/10.1038/s41586-020-2287-8

Note: The gnomAD structural variant annotations are currently in a preview stage and do not yet include translocation breakends. Future updates will improve how structural variants are annotated.

Source Files

Bed Example

The BED file for GRCh37 was obtained from the original source.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    END    EVIDENCE    HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HET    AFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF  
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1 10641 10642 gnomAD-SV_v2.1_BND_1_1 BND manta False 15 NA NA 10643 10643 PE,SR False False True 10642 NA NA NA False NA NA NA NA NA NA NA NA NA -1 BND SINGLE_ENDER_-- False False 21366 145 0.006785999983549118 10683 10543 135 5 0.9868950247764587 0.012636899948120117 0.00046803298755548894 10866 69 0.00634999992325902 5433 5366 65 2 0.987667977809906 0.011963900178670883 0.000368120992789045 NA NA NA NA False 10454 76 0.007269999943673615227 5154 70 3 0.9860339760780334 0.013392000459134579 0.0005739430198445916 0.015956999734044075 93972 0.007660999894142151 4699 4629 68 2 0.9851030111312866 0.014471200294792652 0.0004256220126990229 5154 33 0.006403000093996525 2577 2544 33 0 0.9871940016746521 0.012805599719285965 0.0NA NA NA NA 4232 39 0.009216000325977802 2116 2079 35 2 0.9825140237808228 0.01654059998691082 0.0009451800142414868 1910 7 0.003664999967440963 955 949 5 1 0.9937170147895813 0.00523559981957078 0.001047119963914156 950 4 0.004211000166833401 475 472 2 1 0.9936839938163757 0.00421052984893322 0.0021052600350230932 NA NA NA NA 952 3 0.0031510000117123127 476473 3 0 0.9936969876289368 0.006302520167082548 0.0 2296 31 0.013501999899744987 1148 11131 0 0.9729970097541809 0.02700350061058998 0.0 1312 13 0.009909000247716904 656 643 13 0.9801830053329468 0.01981710083782673 0.0 NA NA NA NA 976 18 0.018442999571561813 488470 18 0 0.9631149768829346 0.03688519820570946 0.0 7574 32 0.004224999807775021 3787 37528 2 0.9920780062675476 0.007393720094114542 0.0005281229969114065 3374 17 0.005038999952375889 1681671 15 1 0.9905160069465637 0.008891520090401173 0.000592768017668277 NA NA NA NA 41815 0.003587000072002411 2091 2077 13 1 0.9933050274848938 0.006217120215296745 0.00047823999193497188 3 0.015956999734044075 94 91 3 0 0.968084990978241 0.03191490098834038 0.0 76 0.026316000148653984 38 36 2 0 0.9473680257797241 0.05263160169124603 0.0 NA NA NA NA 112 1 0.008929000236093998 56 55 1 0 0.982142984867096 0.017857100814580917 0.0UNRESOLVED

TSV Example

The TSV file for GRCh38 was obtained from the lifted-over dataset created by dbVar.

#variant_call_accession variant_call_id variant_call_type   experiment_id   sample_id   sampleset_id    assembly    chr    contig   outer_start start   inner_start inner_stop  stop    outer_stop  insertion_length    variant_region_acc  variant_region_id   copy_number description validation  zygosity    origin  phenotype   hgvs_name   placement_method    placement_rank  placements_per_assembly remap_alignment remap_best_within_cluster   remap_coverage  remap_diff_chr  remap_failure_code  allele_count    allele_frequency    allele_number
nssv15777856 gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation 1 1 GRCh38.p12 10 736806 738184 nsv4039284 10__782746___784124______GRCh37.p13_copy_number_variation 0 Remapped BestAvailable Single First Pass 0 1 AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0
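
In the dbVar TSV, the allele_count, allele_frequency and allele_number columns pack the per-population values into comma-separated NAME=VALUE strings. The Python sketch below shows one way to unpack them into JSON-style keys such as allAc or eurAf. The function names are hypothetical, and the key naming (lowercased population prefix plus Ac/Af/An) is inferred from the JSON output shown on this page.

```python
def parse_population_field(field, cast):
    """Turn 'AC=21,AFR_AC=10' into {'all': 21, 'afr': 10}."""
    values = {}
    for item in field.split(","):
        name, _, raw = item.partition("=")
        prefix, _, _ = name.rpartition("_")  # 'AFR_AC' -> 'AFR'; 'AC' -> ''
        population = prefix.lower() if prefix else "all"
        values[population] = cast(raw)
    return values


def to_json_keys(ac_field, af_field, an_field):
    """Combine the three packed columns into one flat dict."""
    out = {}
    for suffix, field, cast in (("Ac", ac_field, int), ("Af", af_field, float), ("An", an_field, int)):
        for population, value in parse_population_field(field, cast).items():
            out[population + suffix] = value
    return out


# Values copied from the TSV example row above.
record = to_json_keys(
    "AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0",
    "AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0",
    "AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0",
)
print(record["allAc"], record["allAn"], record["allAf"])
```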

Structural Variant Type Mapping

The source files represented the structural variants with keys using various naming conventions. In the Illumina Connected Annotations JSON output, these keys will be mapped according to the following.

| Illumina Connected Annotations JSON SV Type Key | GRCh37 Source SV Type Key | GRCh38 Source SV Type Key |
|-------------------------------|--------------|--------------------------|
| copy_number_variation         |              | copy number variation    |
| deletion                      | DEL, CN=0    | deletion                 |
| duplication                   | DUP          | duplication              |
| insertion                     | INS          | insertion                |
| inversion                     | INV          | inversion                |
| mobile_element_insertion      | INS:ME       | mobile element insertion |
| mobile_element_insertion      | INS:ME:ALU   | alu insertion            |
| mobile_element_insertion      | INS:ME:LINE1 | line1 insertion          |
| mobile_element_insertion      | INS:ME:SVA   | sva insertion            |
| structural alteration         |              | sequence alteration      |
| complex_structural_alteration | CPX          |                          |
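
One way to picture this mapping is as a pair of lookup dictionaries, one per assembly. The sketch below is illustrative only: the names are hypothetical, treating "DEL" and "CN=0" as two separate GRCh37 keys is an assumption, and unknown source keys simply return None.

```python
# Source SV type key -> JSON SV type key, per assembly (from the table above).
GRCH37_SV_TYPES = {
    "DEL": "deletion",
    "CN=0": "deletion",  # assumption: listed together with DEL in the table
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}

GRCH38_SV_TYPES = {
    "copy number variation": "copy_number_variation",
    "deletion": "deletion",
    "duplication": "duplication",
    "insertion": "insertion",
    "inversion": "inversion",
    "mobile element insertion": "mobile_element_insertion",
    "alu insertion": "mobile_element_insertion",
    "line1 insertion": "mobile_element_insertion",
    "sva insertion": "mobile_element_insertion",
    "sequence alteration": "structural alteration",
}


def to_json_sv_type(source_key, assembly):
    """Return the JSON SV type key, or None if the source key is not mapped."""
    table = GRCH37_SV_TYPES if assembly == "GRCh37" else GRCH38_SV_TYPES
    return table.get(source_key)


print(to_json_sv_type("INS:ME:ALU", "GRCh37"), to_json_sv_type("copy number variation", "GRCh38"))
```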

Download URLs

GRCh37

The GRCh37 file was downloaded from the original source. Following table gives some essential data metrics:

https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz

GRCh38

Note: GRCh38 data was not available from the original gnomAD 2.1 source. Instead, we use the lifted-over structural variant dataset created by dbVar, obtained from https://www.ncbi.nlm.nih.gov/sites/dbvarapp/studies/nstd166/.

Download URL

https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/tsv/nstd166.GRCh38.variant_call.tsv.gz

JSON output

"gnomAD-preview": [
{
"chromosome": "1",
"begin": 40001,
"end": 47200,
"variantId": "gnomAD-SV_v2.1_DUP_1_1",
"variantType": "duplication",
"failedFilter": true,
"allAf": 0.068963,
"afrAf": 0.135694,
"amrAf": 0.022876,
"easAf": 0.01101,
"eurAf": 0.007846,
"othAf": 0.017544,
"femaleAf": 0.065288,
"maleAf": 0.07255,
"allAc": 943,
"afrAc": 866,
"amrAc": 21,
"easAc": 17,
"eurAc": 37,
"othAc": 2,
"femaleAc": 442,
"maleAc": 499,
"allAn": 13674,
"afrAn": 6382,
"amrAn": 918,
"easAn": 1544,
"eurAn": 4716,
"othAn": 114,
"femaleAn": 6770,
"maleAn": 6878,
"allHc": 91,
"afrHc": 90,
"amrHc": 1,
"easHc": 0,
"eurHc": 0,
"othHc": 55,
"femaleHc": 44,
"maleHc": 47,
"reciprocalOverlap": 0.01839,
"annotationOverlap": 0.16667
}
]

| Field | Type | Notes |
|-------|------|-------|
| chromosome | string | chromosome number |
| begin | integer | position interval start |
| end | integer | position interval end |
| variantType | string | structural variant type |
| variantId | string | gnomAD ID |
| allAf | floating point | allele frequency for all populations. Range: 0 - 1.0 |
| afrAf | floating point | allele frequency for the African super population. Range: 0 - 1.0 |
| amrAf | floating point | allele frequency for the Admixed American super population. Range: 0 - 1.0 |
| easAf | floating point | allele frequency for the East Asian super population. Range: 0 - 1.0 |
| eurAf | floating point | allele frequency for the European super population. Range: 0 - 1.0 |
| othAf | floating point | allele frequency for all other populations. Range: 0 - 1.0 |
| femaleAf | floating point | allele frequency for female population. Range: 0 - 1.0 |
| maleAf | floating point | allele frequency for male population. Range: 0 - 1.0 |
| allAc | integer | allele count for all populations. |
| afrAc | integer | allele count for the African super population. |
| amrAc | integer | allele count for the Admixed American super population. |
| easAc | integer | allele count for the East Asian super population. |
| eurAc | integer | allele count for the European super population. |
| othAc | integer | allele count for all other populations. |
| maleAc | integer | allele count for male population. |
| femaleAc | integer | allele count for female population. |
| allAn | integer | allele number for all populations. |
| afrAn | integer | allele number for the African super population. |
| amrAn | integer | allele number for the Admixed American super population. |
| easAn | integer | allele number for the East Asian super population. |
| eurAn | integer | allele number for the European super population. |
| othAn | integer | allele number for all other populations. |
| femaleAn | integer | allele number for female population. |
| maleAn | integer | allele number for male population. |
| allHc | integer | count of homozygous individuals for all populations. |
| afrHc | integer | count of homozygous individuals for the African / African American population. |
| amrHc | integer | count of homozygous individuals for the Latino population. |
| easHc | integer | count of homozygous individuals for the East Asian population. |
| eurHc | integer | count of homozygous individuals for the European super population. |
| othHc | integer | count of homozygous individuals for all other populations. |
| maleHc | integer | count of homozygous individuals for male population. |
| femaleHc | integer | count of homozygous individuals for female population. |
| failedFilter | boolean | True if this variant failed any filters (Note: we do not list the failed filters) |
| reciprocalOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |
| annotationOverlap | floating point | Reciprocal overlap. Range: 0 - 1.0 |

Note: The following fields are not available for GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter