Version: 3.22

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Small Variants

VCF extraction

We currently extract the following info fields from gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following additional fields from the gnomAD exome VCF file:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

Coverage
Allele count, Homozygous count, allele number and allele frequencies for:
Global population
African/African Americans
Admixed Americans
Ashkenazi Jews
East Asians
Finnish
Non-Finnish Europeans
South Asian
Others (population not assigned)
Male
Female
Controls

Note

Coverage = DP / AN. Frequencies are computed using AC/AN for each population.
Please note that gnomAD currently has no genome sequencing data for the South Asian (SAS) population.
Allele count, homozygous count, allele number, and allele frequencies for the controls subset are also provided, for the global population only.
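The two formulas in the note can be sketched as follows. This is a minimal illustration and not the Illumina Connected Annotations implementation: the function names are hypothetical, and both the rounding of coverage to the nearest integer and the handling of AN = 0 are assumptions.

```python
from typing import Optional


def allele_frequency(ac: int, an: int) -> Optional[float]:
    """AF = AC / AN. Returning None for AN == 0 is an assumption."""
    if an == 0:
        return None
    return ac / an


def coverage(dp: int, an: int) -> Optional[int]:
    """Coverage = DP / AN, as stated in the note.

    Rounding to the nearest integer is an assumption; the document only
    says that the reported coverage is a non-negative integer.
    """
    if an == 0:
        return None
    return round(dp / an)


# allAc and allAn taken from the JSON example in this document.
af = allele_frequency(5861, 30796)
print(f"{af:.6f}")  # 0.190317
```

The printed value matches the allAf field shown in the JSON output example.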

Merging genomes and exomes

When merging the genomes and exomes, the allele counts and allele numbers will be summed across both of the data sets.
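As a rough sketch of the merge described above, the snippet below sums allele counts and allele numbers per population and recomputes the frequency as AC/AN. The dictionary layout and the function name are hypothetical and chosen only for illustration. A population that is missing from one data set (for example SAS in the genomes) is treated as contributing zero, which is an assumption.

```python
from typing import Dict, Optional

Counts = Dict[str, Dict[str, int]]


def merge_counts(genome: Counts, exome: Counts) -> Dict[str, Dict[str, Optional[float]]]:
    """Sum AC and AN across genomes and exomes, then recompute AF = AC / AN."""
    merged = {}
    for pop in sorted(set(genome) | set(exome)):
        # A population absent from one data set contributes zero (assumption).
        ac = genome.get(pop, {}).get("ac", 0) + exome.get(pop, {}).get("ac", 0)
        an = genome.get(pop, {}).get("an", 0) + exome.get(pop, {}).get("an", 0)
        merged[pop] = {"ac": ac, "an": an, "af": (ac / an) if an else None}
    return merged


# Made-up counts, for illustration only.
genome = {"all": {"ac": 10, "an": 100}}
exome = {"all": {"ac": 5, "an": 300}, "sas": {"ac": 2, "an": 50}}
print(merge_counts(genome, exome))
```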

info

For GRCh37, Illumina Connected Annotations currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.
For GRCh38, Illumina Connected Annotations currently uses gnomAD version 3.0 which doesn't contain the exomes data. Therefore, only genomes data are presented in the output.

Filters

The following strategy will be used when there's a conflict in filter status:

	Genomes PASS	Genomes Filtered
Exomes PASS	PASS	Only use exome data
Exomes Filtered	Only use genome data	Filtered
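The filter table above can be read as a small decision function. The sketch below is illustrative only; the function name and return shape are hypothetical. When both data sets are filtered, the table only says "Filtered", so keeping both data sets and setting the failed-filter flag is an assumption.

```python
from typing import Tuple


def resolve_filter(genome_pass: bool, exome_pass: bool) -> Tuple[bool, bool, bool]:
    """Return (use_genome, use_exome, failed_filter) for a variant in both data sets."""
    if genome_pass == exome_pass:
        # Both PASS: use both, not failed.
        # Both filtered: use both and flag as failed (assumption).
        return True, True, not genome_pass
    # Exactly one data set passed: use only that one.
    return genome_pass, exome_pass, False


for g in (True, False):
    for e in (True, False):
        print(f"genomes PASS={g}, exomes PASS={e} -> {resolve_filter(g, e)}")
```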

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{ 
   "coverage":20,
   "allAf":0.190317,
   "maleAf":0.193,
   "femaleAf": 0.1935,
   "afrAf":0.222876,
   "amrAf":0.121394,
   "easAf":0.239802,
   "finAf":0.136833,
   "nfeAf":0.181282,
   "asjAf":0.258278,
   "othAf":0.186094,
   "allAn":30796,
   "maleAn":15096,
   "femaleAn":15700,
   "afrAn":8664,
   "amrAn":832,
   "easAn":1618,
   "finAn":3486,
   "nfeAn":14916,
   "asjAn":302,
   "othAn":978,
   "allAc":5861,
   "maleAc":2930,
   "femaleAc": 2931,
   "afrAc":1931,
   "amrAc":101,
   "easAc":388,
   "finAc":477,
   "nfeAc":2704,
   "asjAc":78,
   "othAc":182,
   "allHc":561,
   "afrHc":208,
   "amrHc":6,
   "easHc":42,
   "finHc":31,
   "nfeHc":242,
   "asjHc":13,
   "othHc":19,
   "maleHc":280,
   "femaleHc":281,
   "controlsAllAf":0.190317,
   "controlsAllAn":30796,
   "controlsAllAc":5861,
   "lowComplexityRegion":true,
   "failedFilter":true
}

Field	Type	Notes
coverage	int	average coverage (non-negative integer values)
allAf	float	allele frequency for all populations. Range: 0 - 1.0
maleAf	float	allele frequency for male population. Range: 0 - 1.0
femaleAf	float	allele frequency for female population. Range: 0 - 1.0
controlsAllAf	float	allele frequency for the controls subset. Range: 0 - 1.0
allAc	int	allele count for all populations. Integer.
maleAc	int	allele count for male population. Integer.
femaleAc	int	allele count for female population. Integer.
controlsAllAc	int	allele count for the controls subset. Integer.
allAn	int	allele number for all populations. Non-zero integer.
maleAn	int	allele number for male population. Non-zero integer.
femaleAn	int	allele number for female population. Non-zero integer.
controlsAllAn	int	allele number for the controls subset. Non-zero integer.
allHc	int	count of homozygous individuals for all populations. Non-negative integer.
maleHc	int	count of homozygous individuals for male population. Non-negative integer.
femaleHc	int	count of homozygous individuals for female population. Non-negative integer.
afrAf	float	allele frequency for the African / African American population. Range: 0 - 1.0
afrAc	int	allele count for the African / African American population. Integer.
afrAn	int	allele number for the African / African American population. Non-zero integer.
afrHc	int	count of homozygous individuals for African / African American population. Non-negative integer.
amrAf	float	allele frequency for the Latino population. Range: 0 - 1.0
amrAc	int	allele count for the Latino population. Integer.
amrAn	int	allele number for the Latino population. Non-zero integer.
amrHc	int	count of homozygous individuals for Latino population. Non-negative integer.
easAf	float	allele frequency for the East Asian population. Range: 0 - 1.0
easAc	int	allele count for the East Asian population. Integer.
easAn	int	allele number for the East Asian population. Non-zero integer.
easHc	int	count of homozygous individuals for East Asian population. Non-negative integer.
finAf	float	allele frequency for the Finnish population. Range: 0 - 1.0
finAc	int	allele count for the Finnish population. Integer.
finAn	int	allele number for the Finnish population. Non-zero integer.
finHc	int	count of homozygous individuals for Finnish population. Non-negative integer.
nfeAf	float	allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc	int	allele count for the Non-Finnish European population. Integer.
nfeAn	int	allele number for the Non-Finnish European population. Non-zero integer.
nfeHc	int	count of homozygous individuals for Non-Finnish European population. Non-negative integer.
othAf	float	allele frequency for the Other population. Range: 0 - 1.0
othAc	int	allele count for the Other population. Integer.
othAn	int	allele number for the Other population. Non-zero integer.
othHc	int	count of homozygous individuals for Other population. Non-negative integer.
asjAf	float	allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc	int	allele count for the Ashkenazi Jewish population. Integer.
asjAn	int	allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc	int	count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer.
sasAf	float	allele frequency for the South Asian population. Range: 0 - 1.0
sasAc	int	allele count for the South Asian population. Integer.
sasAn	int	allele number for the South Asian population. Non-zero integer.
sasHc	int	count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter	bool	True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion	bool	True if this variant is located in a low complexity region.

Building the supplementary files

The gnomAD .nsa for Illumina Connected Annotations can be built using the SAUtils command's gnomad subcommand. We will describe building gnomAD version 3.1 here.

Source data files

Input VCF files (one per chromosome) and a .version file are required in a folder to build the .nsa file. For example, my directory contains:

chr10.vcf.bgz  chr22.vcf.bgz
chr11.vcf.bgz  chr2.vcf.bgz
chr12.vcf.bgz  chr3.vcf.bgz
chr13.vcf.bgz  chr4.vcf.bgz
chr14.vcf.bgz  chr5.vcf.bgz
chr15.vcf.bgz  chr6.vcf.bgz
chr16.vcf.bgz  chr7.vcf.bgz
chr17.vcf.bgz  chr8.vcf.bgz
chr18.vcf.bgz  chr9.vcf.bgz
chr19.vcf.bgz  chrM.vcf.bgz
chr1.vcf.bgz   chrX.vcf.bgz
chr20.vcf.bgz  chrY.vcf.bgz
chr21.vcf.bgz  gnomad.r3.1.version

The version file is a text file with the following content.

NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
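The version file is a simple list of KEY=VALUE lines. As a sanity check before running the build, it could be read with a few lines of Python. This helper is hypothetical and not part of SAUtils; skipping blank or malformed lines is an assumption.

```python
from typing import Dict


def parse_version_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines such as NAME, VERSION, DATE and DESCRIPTION."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # ignore blank or malformed lines (assumption)
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


example = """NAME=gnomAD
VERSION=3.1
DATE=2020-10-29
DESCRIPTION=Allele frequencies from Genome Aggregation Database (gnomAD)
"""
print(parse_version_file(example)["VERSION"])  # 3.1
```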

The help menu for the utility is as follows:

SAUtils.dll gnomad
---------------------------------------------------------------------------
SAUtils                                             (c) 2021 Illumina, Inc.
Stromberg, Roy, Lajugie, Jiang, Li, and Kang                         3.17.0
---------------------------------------------------------------------------

USAGE: dotnet SAUtils.dll gnomad [options]
Reads provided supplementary data files and populates tsv files

OPTIONS:
      --ref, -r <VALUE>      compressed reference sequence file
      --genome, -g <VALUE>   input directory containing VCF (and .version)
                               files with genomic frequencies
      --exome, -e <VALUE>    input directory containing VCF (and .version)
                               files with exomic frequencies
      --temp, -t <VALUE>     output temp directory for intermediate (per chrom)
                                NSA files
      --out, -o <VALUE>      output directory for NSA file
      --help, -h             displays the help menu
      --version, -v          displays the version

Here is a sample execution:

dotnet SAUtils.dll gnomad \
--ref ~/References/7/Homo_sapiens.GRCh38.Nirvana.dat --genome genomes/ \
--out ~/SupplementaryDatabase/63/GRCh38 --temp ~/ExternalDataSources/gnomAD/3.1/GRCh38/temp

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

JSON key	TSV column	Description
pLi	pLI	probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull	pNull	probability of being completely tolerant of loss of function variation (observed = expected)
pRec	pRec	probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ	syn_z	corrected synonymous Z score
misZ	mis_z	corrected missense Z score
loeuf	oe_lof_upper	loss of function observed/expected upper bound fraction (LOEUF)
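The mapping table above can be applied while reading the tab-delimited file. The following is a minimal sketch using Python's csv module; the function names are hypothetical, and skipping empty or "NA" values (rather than emitting them) is an assumption.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple

# JSON key -> TSV column, copied from the mapping table.
JSON_KEY_TO_TSV_COLUMN = {
    "pLi": "pLI",
    "pNull": "pNull",
    "pRec": "pRec",
    "synZ": "syn_z",
    "misZ": "mis_z",
    "loeuf": "oe_lof_upper",
}


def row_to_json_fields(row: Dict[str, str]) -> Dict[str, float]:
    """Extract the mapped columns from one TSV row as floats."""
    fields = {}
    for json_key, column in JSON_KEY_TO_TSV_COLUMN.items():
        raw = row.get(column)
        if raw in (None, "", "NA"):
            continue  # missing values are skipped (assumption)
        fields[json_key] = float(raw)
    return fields


def read_lof_metrics(handle: TextIO) -> Iterator[Tuple[str, str, Dict[str, float]]]:
    """Yield (gene symbol, Ensembl gene id, JSON fields) per TSV row."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row["gene"], row["gene_id"], row_to_json_fields(row)


# Reduced example containing only the columns used here.
sample = (
    "gene\tgene_id\tpLI\tpNull\tpRec\tsyn_z\tmis_z\toe_lof_upper\n"
    "MED13\tENSG00000108510\t1.0000e+00\t8.9436e-40\t1.8383e-16\t-1.3765e+00\t2.6232e+00\t3.0000e-02\n"
)
for gene, gene_id, fields in read_lof_metrics(io.StringIO(sample)):
    print(gene, gene_id, fields)
```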

Gene symbol update

The input file provides Ensembl gene IDs for each entry. We observed that these IDs are unique, while gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene IDs are more stable, and the Illumina Connected Annotations transcript cache contains Ensembl gene IDs, we use these IDs to extract the gene symbols from the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but the Illumina Connected Annotations cache says ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.
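The symbol update amounts to a lookup keyed by Ensembl gene ID. The sketch below uses a plain dictionary as a stand-in for the transcript cache; the function name is hypothetical, and falling back to the input symbol when an ID is absent from the cache is an assumption.

```python
from typing import Dict


def update_gene_symbol(gene_id: str, input_symbol: str, cache_symbols: Dict[str, str]) -> str:
    """Prefer the symbol the cache assigns to this Ensembl gene ID."""
    # Fallback to the input symbol for unknown IDs is an assumption.
    return cache_symbols.get(gene_id, input_symbol)


# Stand-in for the transcript cache, using the example from the text.
cache = {"ENSG0001": "GENE2"}
print(update_gene_symbol("ENSG0001", "GENE1", cache))  # GENE2
```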

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Illumina Connected Annotations uses HGNC gene symbols. Multiple Ensembl gene IDs can map to the same HGNC symbol, which may result in conflicts. For example:

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2   ENST00000439988 438 5.5311e+02  7.9189e-01  2.9490e-05  6608    105 2.0496e+02  5.1228e-01  2386    180 1.9491e+02  9.2351e-01  9.8371e-06  2048    11  2.8074e-06  627 5.1882e+01  6.6457e-01  5.5841e-10  3.3543e-01  2.1202e-01  8.1700e-01  1.0450e+00  7.3100e-01  8.5700e-01  1.3200e-01  3.5100e-01      8.3940e-01  1.7393e+00  5.2595e+00  2989    1   0   9   3.6173e-05  4.0463e-06  124782  9   0   124791  3.6061e-05  1.6228e-04  6.4986e-05  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  4.4275e-05  1.6672e-04  3.2680e-05  6.4577e-05  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  4.4135e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000272781 3   3075    17  protein_coding  832866  NA  NA  NA  NA  NA  14  47311134    48143999

In such cases, Illumina Connected Annotations chooses the entry with the smallest "LOEUF" value. The reason for choosing this value can be highlighted by the following table:

LOEUF decile	Haplo-insufficient	Autosomal Dominant	Autosomal Recessive	Olfactory Genes
0-10%	104	140	36	0
10-20%	47	128	72	1
20-30%	17	86	112	0
30-40%	8	80	173	4
40-50%	7	65	206	8
50-60%	4	54	207	6
60-70%	0	46	154	18
70-80%	2	49	120	49
80-90%	0	34	58	96
90-100%	0	26	40	174

Note

Table source: https://www.biorxiv.org/content/biorxiv/early/2019/01/28/531210.full-text.pdf
This table indicates that genes with lower LOEUF scores are less tolerant of loss-of-function variation: known haploinsufficient genes are concentrated in the lowest deciles.
Only 15 out of 19685 genes have conflicting entries.

List of genes with conflicting entries

MDGA2:
 {"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
 {"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
 {"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
 {"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
 {"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
 {"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
 {"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
 {"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
 {"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
 {"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
 {"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
 {"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
 {"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
 {"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
 {"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
 {"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
 {"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
 {"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
 {"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
 {"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
 {"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
 {"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
 {"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
 {"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
 {"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
 {"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
 {"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
 {"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
 {"synZ":-1.98e0,"misZ":-1.44e0}
 {"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

Pick the entry with the lowest LOEUF score.
If the LOEUF scores are the same, pick the entry with the lowest pLI.
Otherwise, pick the entry with the maximum absolute value of synZ + misZ.
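The rules above could be sketched as follows. This is one possible reading, not the actual implementation: the function name is hypothetical, "absolute value of synZ + misZ" is interpreted as |synZ + misZ|, and an entry that has a LOEUF value is assumed to win over one that lacks it (as in the PRAMEF11 case).

```python
import math
from typing import Dict, List

Entry = Dict[str, float]


def pick_entry(entries: List[Entry]) -> Entry:
    """Choose one entry among conflicting records for the same gene symbol."""
    with_loeuf = [e for e in entries if "loeuf" in e]
    if with_loeuf:
        # Lowest LOEUF first; ties are broken by the lowest pLI.
        return min(with_loeuf, key=lambda e: (e["loeuf"], e.get("pLI", math.inf)))
    # No LOEUF available: largest |synZ + misZ| (interpretation, see lead-in).
    return max(entries, key=lambda e: abs(e.get("synZ", 0.0) + e.get("misZ", 0.0)))


# Entries copied from the conflict list in this document.
mdga2 = [
    {"pLI": 9.99e-1, "pRec": 7.81e-4, "pNull": 8.65e-12, "synZ": 8.30e-1, "misZ": 1.68e0, "loeuf": 2.39e-1},
    {"pLI": 6.65e-1, "pRec": 3.35e-1, "pNull": 5.58e-10, "synZ": 8.39e-1, "misZ": 1.74e0, "loeuf": 3.51e-1},
]
fam231d = [
    {"synZ": -1.98e0, "misZ": -1.44e0},
    {"synZ": 1.07e0, "misZ": 3.13e-1},
]
print(pick_entry(mdga2)["loeuf"])   # 0.239
print(pick_entry(fam231d)["synZ"])  # -1.98
```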

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
   "pLi":1.00e0,
   "pNull":8.94e-40,
   "pRec":1.84e-16,
   "synZ":-8.44e-2,
   "misZ":5.96e-1,
   "loeuf":1.13e0
}

Field	Type	Notes
pLi	float	probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull	float	probability of being completely tolerant of loss of function variation (observed = expected)
pRec	float	probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ	float	corrected synonymous Z score
misZ	float	corrected missense Z score
loeuf	float	loss of function observed/expected upper bound fraction (LOEUF)

Structural Variants

Publication

Collins, R.L., Brand, H., Karczewski, K.J. et al. 2020. A structural variation reference for medical and population genetics. Nature 581, pp.444–451. https://doi.org/10.1038/s41586-020-2287-8

Note: The gnomAD structural variant annotations are currently in a preview stage and do not yet include translocation breakends. Future updates will include a better way of annotating the structural variants.

Source Files

Bed Example

The BED file for GRCh37 was obtained from the original source.

#chrom  start   end name    svtype  ALGORITHMS  BOTHSIDES_SUPPORT   CHR2    CPX_INTERVALS   CPX_TYPE    END2    END EVIDENCE    HIGH_SR_BACKGROUND  PCRPLUS_DEPLETED    PESR_GT_OVERDISPERSION  POS2    PROTEIN_CODING__COPY_GAIN   PROTEIN_CODING__DUP_LOF PROTEIN_CODING__DUP_PARTIAL PROTEIN_CODING__INTERGENIC  PROTEIN_CODING__INTRONIC    PROTEIN_CODING__INV_SPAN    PROTEIN_CODING__LOF PROTEIN_CODING__MSV_EXON_OVR    PROTEIN_CODING__NEAREST_TSS PROTEIN_CODING__PROMOTER    PROTEIN_CODING__UTR SOURCE  STRANDS SVLEN   SVTYPE  UNRESOLVED_TYPE UNSTABLE_AF_PCRPLUS VARIABLE_ACROSS_BATCHES AN  AC  AF  N_BI_GENOS  N_HOMREF    N_HET   N_HOMALT    FREQ_HOMREF FREQ_HET    FREQ_HOMALT MALE_AN MALE_AC MALE_AF MALE_N_BI_GENOS MALE_N_HOMREF   MALE_N_HET  MALE_N_HOMALT   MALE_FREQ_HOMREF    MALE_FREQ_HET   MALE_FREQ_HOMALT    MALE_N_HEMIREF  MALE_N_HEMIALT  MALE_FREQ_HEMIREF   MALE_FREQ_HEMIALT   PAR FEMALE_AN   FEMALE_AC   FEMALE_AF   FEMALE_N_BI_GENOS   FEMALE_N_HOMREF FEMALE_N_HET    FEMALE_N_HOMALT FEMALE_FREQ_HOMREF  FEMALE_FREQ_HET FEMALE_FREQ_HOMALT  POPMAX_AF   AFR_AN  AFR_AC  AFR_AF  AFR_N_BI_GENOS  AFR_N_HOMREF    AFR_N_HET   AFR_N_HOMALT    AFR_FREQ_HOMREF AFR_FREQ_HET    AFR_FREQ_HOMALT  AFR_MALE_AN AFR_MALE_AC AFR_MALE_AF AFR_MALE_N_BI_GENOS AFR_MALE_N_HOMREF   AFR_MALE_N_HET  AFR_MALE_N_HOMALT   AFR_MALE_FREQ_HOMREF    AFR_MALE_FREQ_HET   AFR_MALE_FREQ_HOMALT    AFR_MALE_N_HEMIREF  AFR_MALE_N_HEMIALT  AFR_MALE_FREQ_HEMIREF   AFR_MALE_FREQ_HEMIALT   AFR_FEMALE_AN   AFR_FEMALE_AC   AFR_FEMALE_AF   AFR_FEMALE_N_BI_GENOS   AFR_FEMALE_N_HOMREF AFR_FEMALE_N_HET    AFR_FEMALE_N_HOMALT AFR_FEMALE_FREQ_HOMREF  AFR_FEMALE_FREQ_HET AFR_FEMALE_FREQ_HOMALT  AMR_AN  AMR_AC  AMR_AF  AMR_N_BI_GENOS  AMR_N_HOMREF    AMR_N_HET   AMR_N_HOMALT    AMR_FREQ_HOMREF AMR_FREQ_HET    AMR_FREQ_HOMALT AMR_MALE_AN AMR_MALE_AC AMR_MALE_AF AMR_MALE_N_BI_GENOS AMR_MALE_N_HOMREF   AMR_MALE_N_HET  AMR_MALE_N_HOMALT   AMR_MALE_FREQ_HOMREF    AMR_MALE_FREQ_HET   AMR_MALE_FREQ_HOMALT    AMR_MALE_N_HEMIREF
AMR_MALE_N_HEMIALT  AMR_MALE_FREQ_HEMIREF   AMR_MALE_FREQ_HEMIALT   AMR_FEMALE_AN   AMR_FEMALE_AC   AMR_FEMALE_AF   AMR_FEMALE_N_BI_GENOS   AMR_FEMALE_N_HOMREF AMR_FEMALE_N_HET    AMR_FEMALE_N_HOMALT AMR_FEMALE_FREQ_HOMREF  AMR_FEMALE_FREQ_HET AMR_FEMALE_FREQ_HOMALT  EAS_AN  EAS_AC  EAS_AF  EAS_N_BI_GENOS  EAS_N_HOMREF    EAS_N_HET   EAS_N_HOMALT    EAS_FREQ_HOMREF EAS_FREQ_HET    EAS_FREQ_HOMALT EAS_MALE_AN EAS_MALE_AC EAS_MALE_AF EAS_MALE_N_BI_GENOS EAS_MALE_N_HOMREF   EAS_MALE_N_HET  EAS_MALE_N_HOMALT   EAS_MALE_FREQ_HOMREF    EAS_MALE_FREQ_HET   EAS_MALE_FREQ_HOMALT    EAS_MALE_N_HEMIREF  EAS_MALE_N_HEMIALT  EAS_MALE_FREQ_HEMIREF   EAS_MALE_FREQ_HEMIALT   EAS_FEMALE_AN   EAS_FEMALE_AC   EAS_FEMALE_AF   EAS_FEMALE_N_BI_GENOS   EAS_FEMALE_N_HOMREF EAS_FEMALE_N_HET    EAS_FEMALE_N_HOMALT EAS_FEMALE_FREQ_HOMREF  EAS_FEMALE_FREQ_HET EAS_FEMALE_FREQ_HOMALT  EUR_AN  EUR_AC  EUR_AF  EUR_N_BI_GENOS  EUR_N_HOMREF    EUR_N_HET   EUR_N_HOMALT    EUR_FREQ_HOMREF EUR_FREQ_HET    EUR_FREQ_HOMALT EUR_MALE_AN EUR_MALE_AC EUR_MALE_AF EUR_MALE_N_BI_GENOS EUR_MALE_N_HOMREF   EUR_MALE_N_HET  EUR_MALE_N_HOMALT   EUR_MALE_FREQ_HOMREF    EUR_MALE_FREQ_HET   EUR_MALE_FREQ_HOMALT    EUR_MALE_N_HEMIREF  EUR_MALE_N_HEMIALT  EUR_MALE_FREQ_HEMIREF   EUR_MALE_FREQ_HEMIALT   EUR_FEMALE_AN   EUR_FEMALE_AC   EUR_FEMALE_AF   EUR_FEMALE_N_BI_GENOS   EUR_FEMALE_N_HOMREF EUR_FEMALE_N_HET    EUR_FEMALE_N_HOMALT EUR_FEMALE_FREQ_HOMREF  EUR_FEMALE_FREQ_HET EUR_FEMALE_FREQ_HOMALT  OTH_AN  OTH_AC  OTH_AF  OTH_N_BI_GENOS  OTH_N_HOMREF    OTH_N_HET   OTH_N_HOMALT    OTH_FREQ_HOMREF OTH_FREQ_HET    OTH_FREQ_HOMALT OTH_MALE_AN OTH_MALE_AC OTH_MALE_AF OTH_MALE_N_BI_GENOS OTH_MALE_N_HOMREF   OTH_MALE_N_HET  OTH_MALE_N_HOMALT   OTH_MALE_FREQ_HOMREF    OTH_MALE_FREQ_HET   OTH_MALE_FREQ_HOMALT    OTH_MALE_N_HEMIREF  OTH_MALE_N_HEMIALT  OTH_MALE_FREQ_HEMIREF   OTH_MALE_FREQ_HEMIALT   OTH_FEMALE_AN   OTH_FEMALE_AC   OTH_FEMALE_AF   OTH_FEMALE_N_BI_GENOS   OTH_FEMALE_N_HOMREF OTH_FEMALE_N_HET    OTH_FEMALE_N_HOMALT 
OTH_FEMALE_FREQ_HOMREF  OTH_FEMALE_FREQ_HET OTH_FEMALE_FREQ_HOMALT  FILTER
1   10641   10642   gnomAD-SV_v2.1_BND_1_1  BND manta   False   15  NA  NA  10643   10643   PE,SR   False   False   True    10642   NA  NA  NA  False   NA  NA  NA  NA  NA  NA  NA  NA  NA  -1  BND SINGLE_ENDER_-- False   False   21366   145 0.006785999983549118    10683   10543   135 5   0.9868950247764587  0.012636899948120117    0.00046803298755548894  10866   69  0.00634999992325902 5433    5366    65  2   0.987667977809906   0.011963900178670883    0.000368120992789045    NA  NA  NA  NA  False   10454   76  0.007269999943673615227 5154    70  3   0.9860339760780334  0.013392000459134579    0.0005739430198445916   0.015956999734044075    93972   0.007660999894142151    4699    4629    68  2   0.9851030111312866  0.014471200294792652    0.0004256220126990229   5154    33  0.006403000093996525    2577    2544    33  0   0.9871940016746521  0.012805599719285965    0.0NA   NA  NA  NA  4232    39  0.009216000325977802    2116    2079    35  2   0.9825140237808228  0.01654059998691082 0.0009451800142414868   1910    7   0.003664999967440963    955 949 5   1   0.9937170147895813  0.00523559981957078 0.001047119963914156    950 4   0.004211000166833401    475 472 2   1   0.9936839938163757  0.00421052984893322 0.0021052600350230932   NA  NA  NA  NA  952 3   0.0031510000117123127   476473  3   0   0.9936969876289368  0.006302520167082548    0.0 2296    31  0.013501999899744987    1148    11131   0   0.9729970097541809  0.02700350061058998 0.0 1312    13  0.009909000247716904    656 643 13  0.9801830053329468  0.01981710083782673 0.0 NA  NA  NA  NA  976 18  0.018442999571561813    488470  18  0   0.9631149768829346  0.03688519820570946 0.0 7574    32  0.004224999807775021    3787    37528   2   0.9920780062675476  0.007393720094114542    0.0005281229969114065   3374    17  0.005038999952375889    1681671 15  1   0.9905160069465637  0.008891520090401173    0.000592768017668277    NA  NA  NA  NA  41815   0.003587000072002411    2091    2077    13  1   0.9933050274848938  
0.006217120215296745    0.00047823999193497188  3   0.015956999734044075    94  91  3   0   0.968084990978241   0.03191490098834038 0.0 76  0.026316000148653984    38  36  2   0   0.9473680257797241  0.05263160169124603 0.0 NA  NA  NA  NA  112 1   0.008929000236093998    56  55  1   0   0.982142984867096   0.017857100814580917    0.0UNRESOLVED

TSV Example

The TSV file for GRCh38 was obtained from the lifted-over dataset created by dbVar.

#variant_call_accession variant_call_id variant_call_type   experiment_id   sample_id   sampleset_id    assembly    chr contig  outer_start start   inner_start inner_stop  stop    outer_stop  insertion_length    variant_region_acc  variant_region_id   copy_number description validation  zygosity    origin  phenotype   hgvs_name   placement_method    placement_rank  placements_per_assembly remap_alignment remap_best_within_cluster   remap_coverage  remap_diff_chr  remap_failure_code  allele_count    allele_frequency    allele_number
nssv15777856    gnomAD-SV_v2.1_CNV_10_564_alt_1 copy number variation   1       1   GRCh38.p12  10          736806          738184          nsv4039284  10__782746___784124______GRCh37.p13_copy_number_variation   0   Remapped    BestAvailable   Single  First Pass  0   1           AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0 AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0 AN=540,AFR_AN=224,AMR_AN=230,EAS_AN=0,EUR_AN=86,OTH_AN=0
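In the dbVar TSV, the allele_count, allele_frequency and allele_number columns each pack the per-population values into one comma-separated KEY=VALUE string. A possible way to unpack them is sketched below. The function name is hypothetical, and the conversion of names such as AFR_AC to JSON-style keys such as afrAc is an assumption based on the key names shown in the JSON output of this document.

```python
from typing import Callable, Dict, Union

Number = Union[int, float]


def parse_population_field(value: str, cast: Callable[[str], Number]) -> Dict[str, Number]:
    """Turn 'AC=21,AFR_AC=10,...' into {'allAc': 21, 'afrAc': 10, ...}."""
    result = {}
    for item in value.split(","):
        name, _, raw = item.partition("=")
        # "AFR_AC" -> population "AFR", metric "AC"; plain "AC" has no population.
        pop, _, metric = name.rpartition("_")
        key = (pop.lower() or "all") + metric.capitalize()
        result[key] = cast(raw)
    return result


# Values copied from the TSV example row.
counts = parse_population_field("AC=21,AFR_AC=10,AMR_AC=9,EAS_AC=0,EUR_AC=2,OTH_AC=0", int)
freqs = parse_population_field(
    "AF=0.038889,AFR_AF=0.044643,AMR_AF=0.03913,EAS_AF=0,EUR_AF=0.023256,OTH_AF=0", float
)
print(counts)
print(freqs["allAf"])
```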

Structural Variant Type Mapping

The source files represented the structural variants with keys using various naming conventions. In the Illumina Connected Annotations JSON output, these keys will be mapped according to the following.

Illumina Connected Annotations JSON SV Type Key	GRCh37 Source SV Type Key	GRCh38 Source SV Type Key
copy_number_variation		copy number variation
deletion	DEL, CN=0	deletion
duplication	DUP	duplication
insertion	INS	insertion
inversion	INV	inversion
mobile_element_insertion	INS:ME	mobile element insertion
mobile_element_insertion	INS:ME:ALU	alu insertion
mobile_element_insertion	INS:ME:LINE1	line1 insertion
mobile_element_insertion	INS:ME:SVA	sva insertion
structural alteration		sequence alteration
complex_structural_alteration	CPX
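The mapping table can be expressed as two lookup dictionaries, one per source file. The sketch below copies the keys and values from the table as written; the function name is hypothetical, and returning None for unmapped source types (such as the BND type seen in the BED example) is an assumption.

```python
from typing import Optional

# GRCh37 source key -> JSON SV type key, copied from the table.
GRCH37_SV_TYPES = {
    "DEL": "deletion",
    "CN=0": "deletion",
    "DUP": "duplication",
    "INS": "insertion",
    "INV": "inversion",
    "INS:ME": "mobile_element_insertion",
    "INS:ME:ALU": "mobile_element_insertion",
    "INS:ME:LINE1": "mobile_element_insertion",
    "INS:ME:SVA": "mobile_element_insertion",
    "CPX": "complex_structural_alteration",
}

# GRCh38 (dbVar) source key -> JSON SV type key, copied from the table.
GRCH38_SV_TYPES = {
    "copy number variation": "copy_number_variation",
    "deletion": "deletion",
    "duplication": "duplication",
    "insertion": "insertion",
    "inversion": "inversion",
    "mobile element insertion": "mobile_element_insertion",
    "alu insertion": "mobile_element_insertion",
    "line1 insertion": "mobile_element_insertion",
    "sva insertion": "mobile_element_insertion",
    "sequence alteration": "structural alteration",
}


def map_sv_type(source_key: str, assembly: str) -> Optional[str]:
    """Look up the JSON SV type; None for unmapped keys (assumption)."""
    table = GRCH37_SV_TYPES if assembly == "GRCh37" else GRCH38_SV_TYPES
    return table.get(source_key)


print(map_sv_type("INS:ME:ALU", "GRCh37"))             # mobile_element_insertion
print(map_sv_type("copy number variation", "GRCh38"))  # copy_number_variation
```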

Download URLs

GRCh37

The GRCh37 file was downloaded from the original source. Following table gives some essential data metrics:

https://storage.googleapis.com/gcp-public-data--gnomad/papers/2019-sv/gnomad_v2.1_sv.sites.bed.gz

GRCh38

Note: GRCh38 data is not available from the original gnomAD 2.1 source. Instead, we use the lifted-over structural variant dataset created by dbVar, obtained from https://www.ncbi.nlm.nih.gov/sites/dbvarapp/studies/nstd166/.

Download URL

https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/tsv/nstd166.GRCh38.variant_call.tsv.gz

JSON output

"gnomAD-preview": [
  {
    "chromosome": "1",
    "begin": 40001,
    "end": 47200,
    "variantId": "gnomAD-SV_v2.1_DUP_1_1",
    "variantType": "duplication",
    "failedFilter": true,
    "allAf": 0.068963,
    "afrAf": 0.135694,
    "amrAf": 0.022876,
    "easAf": 0.01101,
    "eurAf": 0.007846,
    "othAf": 0.017544,
    "femaleAf": 0.065288,
    "maleAf": 0.07255,
    "allAc": 943,
    "afrAc": 866,
    "amrAc": 21,
    "easAc": 17,
    "eurAc": 37,
    "othAc": 2,
    "femaleAc": 442,
    "maleAc": 499,
    "allAn": 13674,
    "afrAn": 6382,
    "amrAn": 918,
    "easAn": 1544,
    "eurAn": 4716,
    "othAn": 114,
    "femaleAn": 6770,
    "maleAn": 6878,
    "allHc": 91,
    "afrHc": 90,
    "amrHc": 1,
    "easHc": 0,
    "eurHc": 0,
    "othHc": 55,
    "femaleHc": 44,
    "maleHc": 47,
    "reciprocalOverlap": 0.01839,
    "annotationOverlap": 0.16667
  }
]
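
As a quick sanity check when consuming this output, the frequency fields can be compared against the corresponding count and number fields. The sketch below assumes a biallelic site, where allele frequency equals allele count divided by allele number, and a tolerance of 1e-6 to absorb rounding. The helper name `frequency_matches` is hypothetical, and the record is a trimmed copy of the example above wrapped in braces so that it parses as standalone JSON.

```python
import json

record_text = """
{
  "gnomAD-preview": [
    {
      "variantId": "gnomAD-SV_v2.1_DUP_1_1",
      "allAf": 0.068963, "allAc": 943, "allAn": 13674,
      "afrAf": 0.135694, "afrAc": 866, "afrAn": 6382
    }
  ]
}
"""


def frequency_matches(entry: dict, prefix: str, tolerance: float = 1e-6) -> bool:
    """Check that <prefix>Af is consistent with <prefix>Ac / <prefix>An."""
    allele_number = entry[f"{prefix}An"]
    if allele_number == 0:
        # No called alleles: the only consistent frequency is zero.
        return entry[f"{prefix}Af"] == 0
    expected = entry[f"{prefix}Ac"] / allele_number
    return abs(expected - entry[f"{prefix}Af"]) <= tolerance


entry = json.loads(record_text)["gnomAD-preview"][0]
for prefix in ("all", "afr"):
    print(prefix, frequency_matches(entry, prefix))
```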

Field	Type	Notes
chromosome	string	chromosome number
begin	integer	position interval start
end	integer	position interval end
variantType	string	structural variant type
variantId	string	gnomAD ID
allAf	floating point	allele frequency for all populations. Range: 0 - 1.0
afrAf	floating point	allele frequency for the African super population. Range: 0 - 1.0
amrAf	floating point	allele frequency for the Admixed American super population. Range: 0 - 1.0
easAf	floating point	allele frequency for the East Asian super population. Range: 0 - 1.0
eurAf	floating point	allele frequency for the European super population. Range: 0 - 1.0
othAf	floating point	allele frequency for all other populations. Range: 0 - 1.0
femaleAf	floating point	allele frequency for female population. Range: 0 - 1.0
maleAf	floating point	allele frequency for male population. Range: 0 - 1.0
allAc	integer	allele count for all populations.
afrAc	integer	allele count for the African super population.
amrAc	integer	allele count for the Admixed American super population.
easAc	integer	allele count for the East Asian super population.
eurAc	integer	allele count for the European super population.
othAc	integer	allele count for all other populations.
maleAc	integer	allele count for male population.
femaleAc	integer	allele count for female population.
allAn	integer	allele number for all populations.
afrAn	integer	allele number for the African super population.
amrAn	integer	allele number for the Admixed American super population.
easAn	integer	allele number for the East Asian super population.
eurAn	integer	allele number for the European super population.
othAn	integer	allele number for all other populations.
femaleAn	integer	allele number for female population.
maleAn	integer	allele number for male population.
allHc	integer	count of homozygous individuals for all populations.
afrHc	integer	count of homozygous individuals for the African / African American population.
amrHc	integer	count of homozygous individuals for the Latino population.
easHc	integer	count of homozygous individuals for the East Asian population.
eurHc	integer	count of homozygous individuals for the European super population.
othHc	integer	count of homozygous individuals for all other populations.
maleHc	integer	count of homozygous individuals for male population.
femaleHc	integer	count of homozygous individuals for female population.
failedFilter	boolean	True if this variant failed any filters (Note: we do not list the failed filters)
reciprocalOverlap	floating point	Reciprocal overlap. Range: 0 - 1.0
annotationOverlap	floating point	Fraction of the annotation interval overlapped by the variant. Range: 0 - 1.0
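
To illustrate how the two overlap fields relate to interval coordinates, here is a minimal sketch. It rests on two assumptions that the table does not spell out: reciprocal overlap is taken in its conventional sense (overlap length divided by the longer of the two intervals), and annotation overlap is taken as overlap length divided by the annotation length. Intervals are treated as 1-based and inclusive. The function `overlap_metrics` and the coordinates in the usage lines are hypothetical.

```python
def overlap_metrics(var_begin: int, var_end: int, ann_begin: int, ann_end: int):
    """Return (reciprocal_overlap, annotation_overlap), or None if the intervals are disjoint."""
    overlap = min(var_end, ann_end) - max(var_begin, ann_begin) + 1
    if overlap <= 0:
        return None
    var_length = var_end - var_begin + 1
    ann_length = ann_end - ann_begin + 1
    # Assumed definitions, see the paragraph above.
    reciprocal = overlap / max(var_length, ann_length)
    annotation = overlap / ann_length
    return reciprocal, annotation


# Hypothetical query variant 101-500 against a hypothetical annotation 401-600:
# 100 shared bases, variant length 400, annotation length 200.
print(overlap_metrics(101, 500, 401, 600))
```

Under these assumptions, a large variant that covers only part of a small annotation produces a low reciprocal overlap together with a higher annotation overlap.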

Note: The following fields are not available for GRCh38 because the source file does not contain this information:

Field
femaleAf
maleAf
maleAc
femaleAc
femaleAn
maleAn
allHc
afrHc
amrHc
easHc
eurHc
othHc
maleHc
femaleHc
failedFilter
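
Code that reads annotations from both assemblies may want to treat the fields listed above as optional. The sketch below assumes that an unavailable field is simply absent from the JSON object rather than set to null, which this page does not confirm. The function name `summarize` is hypothetical, and the sample entry reuses the identifier and frequency from the TSV example.

```python
def summarize(entry: dict) -> dict:
    """Collect a few fields, tolerating keys that are missing for GRCh38."""
    return {
        "variantId": entry["variantId"],
        "allAf": entry["allAf"],
        # Optional fields: dict.get() returns None when the key is absent.
        "allHc": entry.get("allHc"),
        "failedFilter": entry.get("failedFilter"),
    }


grch38_entry = {"variantId": "gnomAD-SV_v2.1_CNV_10_564_alt_1", "allAf": 0.038889}
print(summarize(grch38_entry))
```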
