Version: 3.16

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), pp.448-448.

Small Variants

VCF extraction

We currently extract the following info fields from gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following extra fields from gnomAD exome VCF file:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">

Computation

Using these, we compute the following:

Coverage
Allele count, Homozygous count, allele number and allele frequencies for:
- Global population
- African/African Americans
- Admixed Americans
- Ashkenazi Jews
- East Asians
- Finnish
- Non-Finnish Europeans
- South Asian
- Others (population not assigned)
- Male
- Female
- Controls

Note

Coverage = DP / AN. Frequencies are computed using AC/AN for each population.
Please note that currently there is no genome sequencing data of south asian (SAS) population available in gnomAD.
Allele Count, Homozygous count, allele number and allele frequencies for control groups are also provided for the global population.

Merging genomes and exomes

When merging the genomes and exomes, the allele counts and allele numbers will be summed across both of the data sets.

info

For GRCh37, Nirvana currently uses gnomAD version 2.1 which contains both genomes and exomes data. Genomes and exomes data are merged in the output.
For GRCh38, Nirvana currently uses gnomAD version 3.0 which doesn't contain the exomes data. Therefore, only genomes data are presented in the output.

Filters

The following strategy will be used when there's a conflict in filter status:

	Genomes PASS	Genomes Filtered
Exomes PASS	PASS	Only use exome data
Exomes Filtered	Only use genome data	Filtered

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{ 
   "coverage":20,
   "allAf":0.190317,
   "maleAf":0.193,
   "femaleAf": 0.1935,
   "afrAf":0.222876,
   "amrAf":0.121394,
   "easAf":0.239802,
   "finAf":0.136833,
   "nfeAf":0.181282,
   "asjAf":0.258278,
   "othAf":0.186094,
   "allAn":30796,
   "maleAn":15096,
   "femaleAn":15700
   "afrAn":8664,
   "amrAn":832,
   "easAn":1618,
   "finAn":3486,
   "nfeAn":14916,
   "asjAn":302,
   "othAn":978,
   "allAc":5861,
   "maleAc":2930,
   "femaleAc": 2931,
   "afrAc":1931,
   "amrAc":101,
   "easAc":388,
   "finAc":477,
   "nfeAc":2704,
   "asjAc":78,
   "othAc":182,
   "allHc":561,
   "afrHc":208,
   "amrHc":6,
   "easHc":42,
   "finHc":31,
   "nfeHc":242,
   "asjHc":13,
   "othHc":19,
   "maleHc":280,
   "femaleHc":281,
   "controlsAllAf":0.190317,
   "controlsAllAn":30796,
   "controlsAllAc":5861,
   "lowComplexityRegion":true,
   "failedFilter":true
}

Field	Type	Notes
coverage	int	average coverage (non-negative integer values)
allAf	float	allele frequency for all populations. Range: 0 - 1.0
maleAf	float	allele frequency for male population. Range: 0 - 1.0
femaleAf	float	allele frequency for female population. Range: 0 - 1.0
controlsAllAf	float	allele frequency for the controls subset. Range: 0 - 1.0
allAc	int	allele count for all populations. Integer.
maleAc	int	allele count for male population. Integer.
femaleAc	int	allele count for female population. Integer.
controlsAllAc	int	allele count for the controls subset. Integer.
allAn	int	allele number for all populations. Non-zero integer.
maleAn	int	allele number for male population. Non-zero integer.
femaleAn	int	allele number for female population. Non-zero integer.
controlsAllAn	int	allele number for the controls subset. Non-zero integer.
allHc	int	count of homozygous individuals for all populations. Non-negative integer.
maleHc	int	count of homozygous individuals for male population. Non-negative integer.
femaleHc	int	count of homozygous individuals for female population. Non-negative integer.
afrAf	float	allele frequency for the African / African American population. Range: 0 - 1.0
afrAc	int	allele count for the African / African American population. Integer.
afrAn	int	allele number for the African / African American population. Non-zero integer.
afrHc	int	count of homozygous individuals for African / African American population. Non-negative integer.
amrAf	float	allele frequency for the Latino population. Range: 0 - 1.0
amrAc	int	allele count for the Latino population. Integer.
amrAn	int	allele number for the Latino population. Non-zero integer.
amrHc	int	count of homozygous individuals for Latino population. Non-negative integer.
easAf	float	allele frequency for the East Asian population. Range: 0 - 1.0
easAc	int	allele count for the East Asian population. Integer.
easAn	int	allele number for the East Asian population. Non-zero integer.
easHc	int	count of homozygous individuals for East Asian population. Non-negative integer.
finAf	float	allele frequency for the Finnish population. Range: 0 - 1.0
finAc	int	allele count for the Finnish population. Integer.
finAn	int	allele number for the Finnish population. Non-zero integer.
finHc	int	count of homozygous individuals for Finnish population. Non-negative integer
nfeAf	float	allele frequency for the Non-Finnish European population. Range: 0 - 1.0
nfeAc	int	allele count for the Non-Finnish European population. Integer.
nfeAn	int	allele number for the Non-Finnish European population. Non-zero integer.
nfeHc	int	count of homozygous individuals for Non-Finnish European population. Non-negative integer
othAf	float	allele frequency for the Other population. Range: 0 - 1.0
othAc	int	allele count for the Other population. Integer.
othAn	int	allele number for the Other population. Non-zero integer.
othHc	int	count of homozygous individuals for Other population. Non-negative integer
asjAf	float	allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0
asjAc	int	allele count for the Ashkenazi Jewish population Integer.
asjAn	int	allele number for the Ashkenazi Jewish population. Non-zero integer.
asjHc	int	count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer
sasAf	float	allele frequency for the South Asian population. Range: 0 - 1.0
sasAc	int	allele count for the South Asian population Integer.
sasAn	int	allele number for the South Asian population. Non-zero integer.
sasHc	int	count of homozygous individuals for the South Asian population. Non-negative integer.
failedFilter	bool	True if this variant failed any filters (Note: we do not list the failed filters)
lowComplexityRegion	bool	True if this variant is located in a low complexity region.

LoF Gene Metrics

Tab delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_zmis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfep_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_positionend_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

JSON key	TSV column	Description
pLi	pLI	probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull	pNull	probability of being completely tolerant of loss of function variation (observed = expected)
pRec	pRec	probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ	syn_z	corrected synonymous Z score
misZ	mis_z	corrected missense Z score
loeuf	oe_lof_upper	loss of function observed/expected upper bound fraction (LOEUF)

Gene symbol update

The input file provides Ensembl gene ids for each entry. We observed that they were unique while gene symbols may be repeated (multiple lines may have the same gene symbol). Since Ensembl gene Ids are more stable, and Nirvana transcript cache data contains Ensembl gene ids, we use these ids to extract the gene symbols from the transcript cache. For example, if ENSG0001 has gene symbol GENE1 in the input but Nirvana cache say ENSG0001 maps to GENE2, we use GENE2 as the gene symbol for that entry.

Conflict resolution

gnomAD uses Ensembl GeneID as unique identifiers in the source file but Nirvana uses HGNC gene symbols. Multiple Ensembl GeneIDs can map to the same HGNC symbol and therefore may result is conflict.

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2   ENST00000439988 438 5.5311e+02  7.9189e-01  2.9490e-05  6608    105 2.0496e+02  5.1228e-01  2386    180 1.9491e+02  9.2351e-01  9.8371e-06  2048    11  2.8074e-06  627 5.1882e+01  6.6457e-01  5.5841e-10  3.3543e-01  2.1202e-01  8.1700e-01  1.0450e+00  7.3100e-01  8.5700e-01  1.3200e-01  3.5100e-01      8.3940e-01  1.7393e+00  5.2595e+00  2989    1   0   9   3.6173e-05  4.0463e-06  124782  9   0   124791  3.6061e-05  1.6228e-04  6.4986e-05  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  4.4275e-05  1.6672e-04  3.2680e-05  6.4577e-05  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  4.4135e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000272781 3   3075    17  protein_coding  832866  NA  NA  NA  NA  NA  14  47311134    48143999

In such cases, Nirvana chooses the entry with the smallest "LOEUF" value. The reason for choosing this value can be highlighted by the following table:

LOEUF decile	Haplo-insufficient	Autosomal Dominant	Autosomal Recessive	Olfactory Genes
0-10%	104	140	36	0
10-20%	47	128	72	1
20-30%	17	86	112	0
30-40%	8	80	173	4
40-50%	7	65	206	8
50-60%	4	54	207	6
60-70%	0	46	154	18
70-80%	2	49	120	49
80-90%	0	34	58	96
90-100%	0	26	40	174

Note

Table source: https://www.biorxiv.org/content/biorxiv/early/2019/01/28/531210.full-text.pdf
This table indicates that lower LOEUF scores have more deleterious effect on genes.
Only 15 out of 19685 genes have conflicting entries.

List of genes with conflicting entries

MDGA2:
 {"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
 {"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
 {"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
 {"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
 {"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
 {"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
 {"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
 {"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
 {"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
 {"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
 {"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
 {"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
 {"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
 {"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
 {"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
 {"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
 {"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
 {"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
 {"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
 {"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
 {"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
 {"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
 {"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
 {"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
 {"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
 {"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
 {"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
 {"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
 {"synZ":-1.98e0,"misZ":-1.44e0}
 {"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

Pick the entry with the lowest LOEUF score
If the same, pick the lowest pLI
Otherwise pick the entry with the max absolute value of synZ + misZ

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
   "pLi":1.00e0,
   "pNull":8.94e-40,
   "pRec":1.84e-16,
   "synZ":-8.44e-2,
   "misZ":5.96e-1,
   "loeuf":1.13e0
}

Field	Type	Notes
pLi	float	probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected)
pNull	float	probability of being completely tolerant of loss of function variation (observed = expected)
pRec	float	probability of being intolerant of two loss of function variants (like recessive genes, observed ~ 0.5*expected)
synZ	float	corrected synonymous Z score
misZ	float	corrected missense Z score
loeuf	float	loss of function observed/expected upper bound fraction (LOEUF)

Overview​

Publication

Small Variants​

VCF extraction​

Computation​

Note

Merging genomes and exomes​

info

Filters​

VCF download instructions​

JSON output​

LoF Gene Metrics​

Tab delimited file example​

JSON key to TSV column mapping​

Gene symbol update​

Conflict resolution​

Note

Download URL​

JSON output​

Overview

Small Variants

VCF extraction

Computation

Merging genomes and exomes

Filters

VCF download instructions

JSON output

LoF Gene Metrics

Tab delimited file example

JSON key to TSV column mapping

Gene symbol update

Conflict resolution

Download URL

JSON output