Version: 3.17

gnomAD

Overview

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

Publication

Koch, L., 2020. Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), p.448.

Small Variants

VCF extraction

We currently extract the following INFO fields from the gnomAD genome and exome VCF files:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
##INFO=<ID=AN,Number=A,Type=Integer,Description="Total number of alleles in samples">
##INFO=<ID=nhomalt,Number=A,Type=Integer,Description="Count of homozygous individuals in samples">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered">
##INFO=<ID=lcr,Number=0,Type=Flag,Description="Variant falls within a low complexity region">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples of African-American ancestry">
##INFO=<ID=AN_afr,Number=A,Type=Integer,Description="Total number of alleles in samples of African-American ancestry">
##INFO=<ID=AF_afr,Number=A,Type=Float,Description="Alternate allele frequency in samples of African-American ancestry">
##INFO=<ID=nhomalt_afr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of African-American ancestry">
##INFO=<ID=AC_amr,Number=A,Type=Integer,Description="Alternate allele count for samples of Latino ancestry">
##INFO=<ID=AN_amr,Number=A,Type=Integer,Description="Total number of alleles in samples of Latino ancestry">
##INFO=<ID=nhomalt_amr,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Latino ancestry">
##INFO=<ID=AC_eas,Number=A,Type=Integer,Description="Alternate allele count for samples of East Asian ancestry">
##INFO=<ID=AN_eas,Number=A,Type=Integer,Description="Total number of alleles in samples of East Asian ancestry">
##INFO=<ID=nhomalt_eas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of East Asian ancestry">
##INFO=<ID=AC_female,Number=A,Type=Integer,Description="Alternate allele count for female samples">
##INFO=<ID=AN_female,Number=A,Type=Integer,Description="Total number of alleles in female samples">
##INFO=<ID=nhomalt_female,Number=A,Type=Integer,Description="Count of homozygous individuals in female samples">
##INFO=<ID=AC_nfe,Number=A,Type=Integer,Description="Alternate allele count for samples of non-Finnish European ancestry">
##INFO=<ID=AN_nfe,Number=A,Type=Integer,Description="Total number of alleles in samples of non-Finnish European ancestry">
##INFO=<ID=nhomalt_nfe,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of non-Finnish European ancestry">
##INFO=<ID=AC_fin,Number=A,Type=Integer,Description="Alternate allele count for samples of Finnish ancestry">
##INFO=<ID=AN_fin,Number=A,Type=Integer,Description="Total number of alleles in samples of Finnish ancestry">
##INFO=<ID=nhomalt_fin,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Finnish ancestry">
##INFO=<ID=AC_asj,Number=A,Type=Integer,Description="Alternate allele count for samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AN_asj,Number=A,Type=Integer,Description="Total number of alleles in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=nhomalt_asj,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of Ashkenazi Jewish ancestry">
##INFO=<ID=AC_oth,Number=A,Type=Integer,Description="Alternate allele count for samples of uncertain ancestry">
##INFO=<ID=AN_oth,Number=A,Type=Integer,Description="Total number of alleles in samples of uncertain ancestry">
##INFO=<ID=nhomalt_oth,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of uncertain ancestry">
##INFO=<ID=AC_male,Number=A,Type=Integer,Description="Alternate allele count for male samples">
##INFO=<ID=AN_male,Number=A,Type=Integer,Description="Total number of alleles in male samples">
##INFO=<ID=nhomalt_male,Number=A,Type=Integer,Description="Count of homozygous individuals in male samples">
##INFO=<ID=controls_AC,Number=A,Type=Integer,Description="Alternate allele count for samples in the controls subset">
##INFO=<ID=controls_AN,Number=A,Type=Integer,Description="Total number of alleles in samples in the controls subset">

We also extract the following additional fields from the gnomAD exome VCF file:

##INFO=<ID=AC_sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN_sas,Number=A,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=nhomalt_sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">
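The following minimal Python sketch shows the extraction step on a subset of the listed IDs from a VCF INFO column. It assumes a single ALT allele per record, so each `Number=A` field holds one value. It treats the `lcr` flag as a boolean and skips missing (`.`) values. The names `extract_info`, `WANTED_KEYS` and `FLAG_KEYS` are hypothetical, and this is not Nirvana's parser.

```python
from typing import Dict, Union

# A subset of the INFO IDs listed above (hypothetical constant; extend as needed).
WANTED_KEYS = {
    "AC", "AN", "nhomalt", "DP", "lcr",
    "AC_afr", "AN_afr", "nhomalt_afr",
    "AC_sas", "AN_sas", "nhomalt_sas",
    "controls_AC", "controls_AN",
}
FLAG_KEYS = {"lcr"}


def extract_info(info_column: str) -> Dict[str, Union[int, bool]]:
    """Pull the wanted INFO values from one single-ALT VCF record (sketch)."""
    # Flags default to False and become True only if present.
    values: Dict[str, Union[int, bool]] = {key: False for key in FLAG_KEYS}
    for item in info_column.split(";"):
        key, has_value, raw = item.partition("=")
        if key not in WANTED_KEYS:
            continue
        if key in FLAG_KEYS:
            values[key] = True  # a Flag is present without a value
        elif has_value and raw != ".":
            # Every non-flag key in WANTED_KEYS is Type=Integer, Number=A;
            # with a single ALT allele there is exactly one value.
            values[key] = int(raw)
    return values
```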

Computation

Using these, we compute the following:

  • Coverage
  • Allele count, homozygous count, allele number and allele frequency for:
    • Global population
    • African/African Americans
    • Admixed Americans
    • Ashkenazi Jews
    • East Asians
    • Finnish
    • Non-Finnish Europeans
    • South Asians
    • Others (population not assigned)
    • Male
    • Female
    • Controls
Note
  • Coverage = DP / AN. Allele frequencies are computed as AC / AN for each population.
  • gnomAD currently provides no genome sequencing data for the South Asian (SAS) population.
  • For the controls subset, allele count, allele number and allele frequency are provided for the global population only.
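The arithmetic in this note can be sketched in Python as follows. The sketch assumes the INFO values have already been extracted into a dict keyed by their VCF IDs, for example `AC_afr`. Two behaviors are assumptions rather than documented ones: coverage is rounded to an integer, and the functions return `None` when AN is missing or zero. The function names are hypothetical.

```python
from typing import Dict, Optional

# VCF suffixes for per-group counts (AC_afr, AN_afr, ...); each maps to a JSON prefix.
GROUPS = ["afr", "amr", "asj", "eas", "fin", "nfe", "sas", "oth", "male", "female"]


def allele_frequency(ac: Optional[int], an: Optional[int]) -> Optional[float]:
    """AF = AC / AN; returns None when either value is missing or AN is zero (assumed)."""
    if ac is None or not an:
        return None
    return ac / an


def compute_frequencies(info: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Derive coverage and allele frequencies from extracted INFO counts (sketch)."""
    out: Dict[str, Optional[float]] = {}
    an = info.get("AN")
    if info.get("DP") is not None and an:
        # Coverage = DP / AN; rounding to an integer is an assumption.
        out["coverage"] = round(info["DP"] / an)
    out["allAf"] = allele_frequency(info.get("AC"), an)
    for group in GROUPS:
        out[f"{group}Af"] = allele_frequency(info.get(f"AC_{group}"), info.get(f"AN_{group}"))
    # Controls are reported for the global population only.
    out["controlsAllAf"] = allele_frequency(info.get("controls_AC"), info.get("controls_AN"))
    return out
```

For example, AC = 5861 and AN = 30796 give an `allAf` of about 0.190317.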

Merging genomes and exomes

When genomes and exomes are merged, allele counts and allele numbers are summed across both data sets.

info
  • For GRCh37, Nirvana currently uses gnomAD version 2.1, which contains both genome and exome data. The genome and exome data are merged in the output.
  • For GRCh38, Nirvana currently uses gnomAD version 3.0, which contains no exome data. Therefore, only genome data are reported in the output.
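Below is a sketch of the GRCh37 merge. It assumes each data set's counts are held in a dict keyed by VCF INFO ID. Only allele-count and allele-number fields are summed, which matches the description above. Other fields are left out because the text does not say how the merge treats them, and the helper names are hypothetical.

```python
from typing import Dict


def is_count_key(key: str) -> bool:
    """True for allele-count and allele-number IDs such as AC, AN_afr or controls_AC."""
    return key in ("AC", "AN") or key.startswith(("AC_", "AN_")) or key.endswith(("_AC", "_AN"))


def merge_counts(genome: Dict[str, int], exome: Dict[str, int]) -> Dict[str, int]:
    """Sum AC/AN fields across the genome and exome data sets (sketch)."""
    merged: Dict[str, int] = {}
    for source in (genome, exome):
        for key, value in source.items():
            # Non-count fields (e.g. the lcr flag) are skipped.
            if is_count_key(key) and value is not None:
                merged[key] = merged.get(key, 0) + value
    return merged
```

Frequencies are then recomputed from the summed values as AC / AN. For example, a genome AC/AN of 10/100 and an exome AC/AN of 5/50 merge to 15/150, which gives an allele frequency of 0.1.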

Filters

The following strategy is used when the genome and exome filter statuses conflict:

| | Genomes PASS | Genomes Filtered |
|---|---|---|
| Exomes PASS | PASS | Only use exome data |
| Exomes Filtered | Only use genome data | Filtered |
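The table can be read as a small decision procedure, which the Python sketch below applies to a variant's genome and exome records. The table does not specify three things, so the sketch assumes them. The first is the record layout. The second is how to handle a variant present in only one data set. The third is what to do when both data sets are filtered: here the counts are summed.

```python
from typing import Dict, Optional, Tuple

# Each data set is given as {"pass": bool, "counts": {"AC": ..., "AN": ...}},
# or None when the variant is absent from that data set (hypothetical layout).
DataSet = Optional[Dict]


def apply_filter_strategy(genome: DataSet, exome: DataSet) -> Tuple[Dict[str, int], bool]:
    """Return (counts to report, failedFilter); at least one data set must be present."""
    if genome is None or exome is None:
        # Only one data set has the variant: report it with its own status (assumption).
        only = genome if genome is not None else exome
        return dict(only["counts"]), not only["pass"]
    if genome["pass"] != exome["pass"]:
        # Conflicting statuses: keep only the data set that passed.
        kept = genome if genome["pass"] else exome
        return dict(kept["counts"]), False
    # Both PASS, or both filtered: sum the counts (summing filtered data is assumed).
    keys = genome["counts"].keys() | exome["counts"].keys()
    merged = {k: genome["counts"].get(k, 0) + exome["counts"].get(k, 0) for k in keys}
    return merged, not genome["pass"]
```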

VCF download instructions

https://gnomad.broadinstitute.org/downloads

JSON output

"gnomad":{ 
"coverage":20,
"allAf":0.190317,
"maleAf":0.193,
"femaleAf": 0.1935,
"afrAf":0.222876,
"amrAf":0.121394,
"easAf":0.239802,
"finAf":0.136833,
"nfeAf":0.181282,
"asjAf":0.258278,
"othAf":0.186094,
"allAn":30796,
"maleAn":15096,
"femaleAn":15700
"afrAn":8664,
"amrAn":832,
"easAn":1618,
"finAn":3486,
"nfeAn":14916,
"asjAn":302,
"othAn":978,
"allAc":5861,
"maleAc":2930,
"femaleAc": 2931,
"afrAc":1931,
"amrAc":101,
"easAc":388,
"finAc":477,
"nfeAc":2704,
"asjAc":78,
"othAc":182,
"allHc":561,
"afrHc":208,
"amrHc":6,
"easHc":42,
"finHc":31,
"nfeHc":242,
"asjHc":13,
"othHc":19,
"maleHc":280,
"femaleHc":281,
"controlsAllAf":0.190317,
"controlsAllAn":30796,
"controlsAllAc":5861,
"lowComplexityRegion":true,
"failedFilter":true
}
| Field | Type | Notes |
|---|---|---|
| coverage | int | average coverage (non-negative integer values) |
| allAf | float | allele frequency for all populations. Range: 0 - 1.0 |
| maleAf | float | allele frequency for the male population. Range: 0 - 1.0 |
| femaleAf | float | allele frequency for the female population. Range: 0 - 1.0 |
| controlsAllAf | float | allele frequency for the controls subset. Range: 0 - 1.0 |
| allAc | int | allele count for all populations. Non-negative integer. |
| maleAc | int | allele count for the male population. Non-negative integer. |
| femaleAc | int | allele count for the female population. Non-negative integer. |
| controlsAllAc | int | allele count for the controls subset. Non-negative integer. |
| allAn | int | allele number for all populations. Non-zero integer. |
| maleAn | int | allele number for the male population. Non-zero integer. |
| femaleAn | int | allele number for the female population. Non-zero integer. |
| controlsAllAn | int | allele number for the controls subset. Non-zero integer. |
| allHc | int | count of homozygous individuals for all populations. Non-negative integer. |
| maleHc | int | count of homozygous individuals for the male population. Non-negative integer. |
| femaleHc | int | count of homozygous individuals for the female population. Non-negative integer. |
| afrAf | float | allele frequency for the African / African American population. Range: 0 - 1.0 |
| afrAc | int | allele count for the African / African American population. Non-negative integer. |
| afrAn | int | allele number for the African / African American population. Non-zero integer. |
| afrHc | int | count of homozygous individuals for the African / African American population. Non-negative integer. |
| amrAf | float | allele frequency for the Latino population. Range: 0 - 1.0 |
| amrAc | int | allele count for the Latino population. Non-negative integer. |
| amrAn | int | allele number for the Latino population. Non-zero integer. |
| amrHc | int | count of homozygous individuals for the Latino population. Non-negative integer. |
| easAf | float | allele frequency for the East Asian population. Range: 0 - 1.0 |
| easAc | int | allele count for the East Asian population. Non-negative integer. |
| easAn | int | allele number for the East Asian population. Non-zero integer. |
| easHc | int | count of homozygous individuals for the East Asian population. Non-negative integer. |
| finAf | float | allele frequency for the Finnish population. Range: 0 - 1.0 |
| finAc | int | allele count for the Finnish population. Non-negative integer. |
| finAn | int | allele number for the Finnish population. Non-zero integer. |
| finHc | int | count of homozygous individuals for the Finnish population. Non-negative integer. |
| nfeAf | float | allele frequency for the Non-Finnish European population. Range: 0 - 1.0 |
| nfeAc | int | allele count for the Non-Finnish European population. Non-negative integer. |
| nfeAn | int | allele number for the Non-Finnish European population. Non-zero integer. |
| nfeHc | int | count of homozygous individuals for the Non-Finnish European population. Non-negative integer. |
| othAf | float | allele frequency for the Other population. Range: 0 - 1.0 |
| othAc | int | allele count for the Other population. Non-negative integer. |
| othAn | int | allele number for the Other population. Non-zero integer. |
| othHc | int | count of homozygous individuals for the Other population. Non-negative integer. |
| asjAf | float | allele frequency for the Ashkenazi Jewish population. Range: 0 - 1.0 |
| asjAc | int | allele count for the Ashkenazi Jewish population. Non-negative integer. |
| asjAn | int | allele number for the Ashkenazi Jewish population. Non-zero integer. |
| asjHc | int | count of homozygous individuals for the Ashkenazi Jewish population. Non-negative integer. |
| sasAf | float | allele frequency for the South Asian population. Range: 0 - 1.0 |
| sasAc | int | allele count for the South Asian population. Non-negative integer. |
| sasAn | int | allele number for the South Asian population. Non-zero integer. |
| sasHc | int | count of homozygous individuals for the South Asian population. Non-negative integer. |
| failedFilter | bool | true if this variant failed any filters (note: the failed filters are not listed) |
| lowComplexityRegion | bool | true if this variant is located in a low complexity region |
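The ranges in this table, together with the rule that each frequency is AC / AN, can be checked mechanically on parsed output. The sketch below assumes the `gnomad` object has been loaded into a Python dict, for example with `json.loads`. The function name and the 1e-5 tolerance are our own choices. The tolerance allows for rounding of the printed frequencies.

```python
from typing import Dict, List, Union

Number = Union[int, float, bool]


def check_gnomad(obj: Dict[str, Number], tol: float = 1e-5) -> List[str]:
    """List violations of the documented ranges and of Af = Ac / An (sketch)."""
    problems = []
    for key, value in obj.items():
        if key.endswith("Af") and not 0.0 <= value <= 1.0:
            problems.append(f"{key} out of range: {value}")
        elif key.endswith("An") and value <= 0:
            problems.append(f"{key} must be positive: {value}")
        elif key.endswith(("Ac", "Hc")) and value < 0:
            problems.append(f"{key} must be non-negative: {value}")
    for key, af in obj.items():
        if not key.endswith("Af"):
            continue
        prefix = key[: -len("Af")]  # e.g. "afr", "all", "controlsAll"
        ac, an = obj.get(prefix + "Ac"), obj.get(prefix + "An")
        # Frequencies are rounded in the output, hence the tolerance.
        if ac is not None and an and abs(af - ac / an) > tol:
            problems.append(f"{key}={af} but {prefix}Ac/{prefix}An={ac / an:.6f}")
    return problems
```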

LoF Gene Metrics

Tab-delimited file example

gene transcript obs_mis exp_mis oe_mis mu_mis possible_mis obs_mis_pphen exp_mis_pphen oe_mis_pphen possible_mis_pphen obs_syn exp_syn oe_syn mu_syn possible_syn obs_lof mu_lof possible_lof exp_lof pLI pNull pRec oe_lof oe_syn_lower oe_syn_upper oe_mis_lower oe_mis_upper oe_lof_lower oe_lof_upper constraint_flag syn_z mis_z lof_z oe_lof_upper_rank oe_lof_upper_bin oe_lof_upper_bin_6 n_sites classic_caf max_af no_lofs obs_het_lof obs_hom_lof defined p exp_hom_lof classic_caf_afr classic_caf_amr classic_caf_asj classic_caf_eas classic_caf_fin classic_caf_nfe classic_caf_oth classic_caf_sas p_afr p_amr p_asj p_eas p_fin p_nfe p_oth p_sas transcript_type gene_id transcript_level cds_length num_coding_exons gene_type gene_length exac_pLI exac_obs_lof exac_exp_lof exac_oe_lof brain_expression chromosome start_position end_position
MED13 ENST00000397786 871 1.1178e+03 7.7921e-01 5.5598e-05 14195 314 5.2975e+02 5.9273e-01 6708 422 3.8753e+02 1.0890e+00 1.9097e-05 4248 0 4.9203e-06 1257 9.8429e+01 1.0000e+00 8.9436e-40 1.8383e-16 0.0000e+00 1.0050e+00 1.1800e+00 7.3600e-01 8.2400e-01 0.0000e+00 3.0000e-02 -1.3765e+00 2.6232e+00 9.1935e+00 0 0 0 2 1.2058e-05 8.0492e-06 124782 3 0 124785 1.2021e-05 1.8031e-05 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2812e-05 8.8571e-06 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 9.2760e-05 8.8276e-06 0.0000e+00 0.0000e+00 protein_coding ENSG00000108510 2 6522 30 protein_coding 122678 1.0000e+00 0 6.4393e+01 0.0000e+00 NA 17 60019966 60142643

JSON key to TSV column mapping

| JSON key | TSV column | Description |
|---|---|---|
| pLi | pLI | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
| pNull | pNull | probability of being completely tolerant of loss-of-function variation (observed = expected) |
| pRec | pRec | probability of being intolerant of two loss-of-function variants (like recessive genes, observed ~ 0.5*expected) |
| synZ | syn_z | corrected synonymous Z score |
| misZ | mis_z | corrected missense Z score |
| loeuf | oe_lof_upper | loss-of-function observed/expected upper bound fraction (LOEUF) |
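Here is a Python sketch that reads the file and renames the six columns to their JSON keys. It relies on the fact that BGZF, the `.bgz` format used by the download URL given later on this page, is gzip-compatible, so the standard `gzip` module can read it. Two choices are assumptions: `NA` and empty cells are treated as missing, and missing values are dropped. The function names are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, Optional, Tuple

# JSON key -> TSV column, copied from the mapping table above.
JSON_TO_TSV = {
    "pLi": "pLI",
    "pNull": "pNull",
    "pRec": "pRec",
    "synZ": "syn_z",
    "misZ": "mis_z",
    "loeuf": "oe_lof_upper",
}


def to_float(cell: str) -> Optional[float]:
    """Parse a TSV cell; 'NA' and empty cells are treated as missing (assumption)."""
    return None if cell in ("", "NA") else float(cell)


def parse_lof_metrics(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield (Ensembl gene ID, {JSON key: value}) for each row of the metrics file."""
    for row in csv.DictReader(lines, delimiter="\t"):
        metrics = {key: to_float(row[column]) for key, column in JSON_TO_TSV.items()}
        # Missing values are dropped so they are simply absent from the output.
        yield row["gene_id"], {k: v for k, v in metrics.items() if v is not None}


def read_lof_metrics(path: str) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Read the downloaded .bgz file; BGZF is gzip-compatible, so gzip can open it."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from parse_lof_metrics(handle)
```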

Gene symbol update

The input file provides an Ensembl gene ID for each entry. We observed that these IDs are unique, whereas gene symbols may repeat (multiple lines can share the same gene symbol). Because Ensembl gene IDs are more stable, and the Nirvana transcript cache contains Ensembl gene IDs, we use these IDs to look up the gene symbols in the transcript cache. For example, if ENSG0001 has the gene symbol GENE1 in the input but the Nirvana cache maps ENSG0001 to GENE2, we use GENE2 as the gene symbol for that entry.
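The lookup amounts to a dictionary keyed by Ensembl gene ID. The sketch below assumes the transcript cache has been reduced to such a dict, called `cache_symbols` here (a hypothetical name). When an ID is absent from the cache, the sketch keeps the input symbol; the text does not cover this case, so that behavior is an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def update_symbols(
    rows: Iterable[Tuple[str, str, dict]], cache_symbols: Dict[str, str]
) -> List[Tuple[str, str, dict]]:
    """Replace each input gene symbol with the one the cache gives for its Ensembl ID."""
    updated = []
    for gene_id, input_symbol, metrics in rows:
        # Keep the input symbol when the cache has no entry for this ID (assumption).
        updated.append((gene_id, cache_symbols.get(gene_id, input_symbol), metrics))
    return updated
```

With `cache_symbols = {"ENSG0001": "GENE2"}`, an input row `("ENSG0001", "GENE1", ...)` comes back with the symbol GENE2, as in the example above.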

Conflict resolution

gnomAD uses Ensembl gene IDs as unique identifiers in the source file, but Nirvana uses HGNC gene symbols. Because multiple Ensembl gene IDs can map to the same HGNC symbol, this can result in conflicts, as in the following two entries for MDGA2:

MDGA2   ENST00000426342 306 4.0043e+02  7.6419e-01  2.1096e-05  4724    78  1.6525e+02  4.7202e-01  1923    125 1.3737e+02  9.0993e-01  7.1973e-06  1413    4   2.0926e-06  453 3.8316e+01  9.9922e-01  8.6490e-12  7.8128e-04  1.0440e-01  7.8600e-01  1.0560e+00  6.9500e-01  8.4000e-01  5.0000e-02  2.3900e-01      8.2988e-01  1.6769e+00  5.1372e+00  1529    0   0   7   2.8103e-05  4.0317e-06  124784  7   0   124791  2.8047e-05  9.8167e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5391e-05  1.6672e-04  3.2680e-05  0.0000e+00  2.8962e-05  0.0000e+00  0.0000e+00  0.0000e+00  3.5308e-05  1.6492e-04  3.2678e-05  protein_coding  ENSG00000139915 2   2181    13  protein_coding  835332  9.9322e-01  3   2.7833e+01  1.0779e-01  NA  14  47308826    48144157
MDGA2 ENST00000439988 438 5.5311e+02 7.9189e-01 2.9490e-05 6608 105 2.0496e+02 5.1228e-01 2386 180 1.9491e+02 9.2351e-01 9.8371e-06 2048 11 2.8074e-06 627 5.1882e+01 6.6457e-01 5.5841e-10 3.3543e-01 2.1202e-01 8.1700e-01 1.0450e+00 7.3100e-01 8.5700e-01 1.3200e-01 3.5100e-01 8.3940e-01 1.7393e+00 5.2595e+00 2989 1 0 9 3.6173e-05 4.0463e-06 124782 9 0 124791 3.6061e-05 1.6228e-04 6.4986e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4275e-05 1.6672e-04 3.2680e-05 6.4577e-05 2.8962e-05 0.0000e+00 0.0000e+00 0.0000e+00 4.4135e-05 1.6492e-04 3.2678e-05 protein_coding ENSG00000272781 3 3075 17 protein_coding 832866 NA NA NA NA NA 14 47311134 48143999

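In such cases, Nirvana chooses the entry with the smallest LOEUF value, i.e., the entry most constrained against loss of function. The following table illustrates the rationale. It gives the number of genes from each gene set that fall into each LOEUF decile. Haploinsufficient and autosomal dominant genes are concentrated in the lowest deciles, whereas olfactory genes are concentrated in the highest.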

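| LOEUF decile | Haploinsufficient | Autosomal Dominant | Autosomal Recessive | Olfactory Genes |
|---|---|---|---|---|
| 0-10% | 104 | 140 | 36 | 0 |
| 10-20% | 47 | 128 | 72 | 1 |
| 20-30% | 17 | 86 | 112 | 0 |
| 30-40% | 8 | 80 | 173 | 4 |
| 40-50% | 7 | 65 | 206 | 8 |
| 50-60% | 4 | 54 | 207 | 6 |
| 60-70% | 0 | 46 | 154 | 18 |
| 70-80% | 2 | 49 | 120 | 49 |
| 80-90% | 0 | 34 | 58 | 96 |
| 90-100% | 0 | 26 | 40 | 174 |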
Note

List of genes with conflicting entries

MDGA2:
{"pLI":9.99e-1,"pRec":7.81e-4,"pNull":8.65e-12,"synZ":8.30e-1,"misZ":1.68e0,"loeuf":2.39e-1}
{"pLI":6.65e-1,"pRec":3.35e-1,"pNull":5.58e-10,"synZ":8.39e-1,"misZ":1.74e0,"loeuf":3.51e-1}
CRYBG3:
{"pLI":9.27e-5,"pRec":1.00e0,"pNull":1.88e-7,"synZ":1.82e0,"misZ":4.68e-1,"loeuf":4.93e-1}
{"pLI":2.69e-4,"pRec":1.00e0,"pNull":1.20e-4,"synZ":2.63e0,"misZ":9.80e-1,"loeuf":5.98e-1}
CHTF8:
{"pLI":8.29e-1,"pRec":1.67e-1,"pNull":3.21e-3,"synZ":1.94e0,"misZ":9.48e-1,"loeuf":5.13e-1}
{"pLI":3.73e-1,"pRec":5.84e-1,"pNull":4.29e-2,"synZ":3.33e-1,"misZ":2.91e-1,"loeuf":9.92e-1}
SEPT1:
{"pLI":6.77e-8,"pRec":8.90e-1,"pNull":1.10e-1,"synZ":1.58e-1,"misZ":1.57e0,"loeuf":9.68e-1}
{"pLI":1.96e-8,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":1.68e-1,"misZ":1.41e0,"loeuf":1.08e0}
ARL14EPL:
{"pLI":3.48e-2,"pRec":8.38e-1,"pNull":1.28e-1,"synZ":3.56e-1,"misZ":-1.87e-1,"loeuf":1.23e0}
{"pLI":3.23e-2,"pRec":8.29e-1,"pNull":1.38e-1,"synZ":1.15e0,"misZ":-4.05e-1,"loeuf":1.26e0}
UGT2A1:
{"pLI":2.90e-13,"pRec":1.40e-1,"pNull":8.60e-1,"synZ":-1.29e0,"misZ":-1.77e0,"loeuf":1.18e0}
{"pLI":3.88e-17,"pRec":2.87e-3,"pNull":9.97e-1,"synZ":-8.00e-1,"misZ":-1.40e0,"loeuf":1.53e0}
LTB4R2:
{"pLI":4.39e-4,"pRec":6.71e-1,"pNull":3.29e-1,"synZ":-5.24e-1,"misZ":-2.96e-1,"loeuf":1.40e0}
{"pLI":1.38e-5,"pRec":4.12e-1,"pNull":5.88e-1,"synZ":-4.58e-1,"misZ":-2.02e-1,"loeuf":1.54e0}
CDRT1:
{"pLI":4.98e-14,"pRec":5.31e-1,"pNull":4.69e-1,"synZ":8.18e-1,"misZ":6.57e-1,"loeuf":1.00e0}
{"pLI":3.50e-3,"pRec":6.37e-1,"pNull":3.59e-1,"synZ":4.89e-1,"misZ":6.90e-1,"loeuf":1.63e0}
MUC3A:
{"pLI":1.48e-10,"pRec":5.76e-1,"pNull":4.24e-1,"synZ":5.81e-2,"misZ":-6.01e-1,"loeuf":1.06e0}
{"pLI":4.03e-1,"pRec":4.79e-1,"pNull":1.17e-1,"synZ":4.05e-2,"misZ":-1.60e-1,"loeuf":1.70e0}
COG8:
{"pLI":2.97e-9,"pRec":5.04e-1,"pNull":4.96e-1,"synZ":-1.35e0,"misZ":-9.37e-2,"loeuf":1.13e0}
{"pLI":2.31e-3,"pRec":5.47e-1,"pNull":4.50e-1,"synZ":-4.94e-1,"misZ":-1.48e-1,"loeuf":1.76e0}
AC006486.1:
{"pLI":9.37e-1,"pRec":6.27e-2,"pNull":2.47e-4,"synZ":1.44e0,"misZ":2.12e0,"loeuf":3.41e-1}
{"pLI":1.14e-1,"pRec":6.16e-1,"pNull":2.70e-1,"synZ":-7.57e-2,"misZ":8.33e-2,"loeuf":1.84e0}
AL645922.1:
{"pLI":4.67e-16,"pRec":1.00e0,"pNull":4.15e-5,"synZ":7.99e-1,"misZ":1.61e0,"loeuf":6.92e-1}
{"pLI":1.60e-3,"pRec":2.78e-1,"pNull":7.21e-1,"synZ":-7.30e-2,"misZ":3.21e-1,"loeuf":1.96e0}
NBPF20:
{"pLI":1.42e-7,"pRec":3.40e-2,"pNull":9.66e-1,"synZ":-1.86e0,"misZ":-2.88e0,"loeuf":1.97e0}
{"pLI":1.92e-22,"pRec":7.96e-6,"pNull":1.00e0,"synZ":-9.73e0,"misZ":-7.67e0,"loeuf":1.97e0}
PRAMEF11:
{"pLI":6.16e-4,"pRec":7.42e-1,"pNull":2.58e-1,"synZ":-4.02e0,"misZ":-3.69e0,"loeuf":1.31e0}
{"synZ":-3.33e0,"misZ":-2.59e0}
FAM231D:
{"synZ":-1.98e0,"misZ":-1.44e0}
{"synZ":1.07e0,"misZ":3.13e-1}

Conflict resolution

  • Pick the entry with the lowest LOEUF score.
  • If the LOEUF scores are equal, pick the entry with the lowest pLI.
  • Otherwise, pick the entry with the largest absolute value of synZ + misZ.
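The three rules can be sketched in Python as follows. Each entry is a dict that uses the keys from the conflict list above. The sketch makes two assumptions about how to read the rules. First, a rule is applied only when every candidate has the value it needs, since PRAMEF11 and FAM231D have entries without LOEUF or pLI. Second, "max absolute value of synZ + misZ" is taken to mean the largest |synZ + misZ|. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def resolve_conflict(entries: List[dict]) -> dict:
    """Pick one LoF-metrics entry among several that share a gene symbol (sketch)."""
    if len(entries) == 1:
        return entries[0]
    candidates = entries
    # Rule 1: lowest LOEUF, applied only when every entry has a LOEUF value (assumption).
    if all("loeuf" in e for e in candidates):
        lowest = min(e["loeuf"] for e in candidates)
        candidates = [e for e in candidates if e["loeuf"] == lowest]
        # Rule 2: on a LOEUF tie, lowest pLI (again only if every tied entry has pLI).
        if len(candidates) > 1 and all("pLI" in e for e in candidates):
            lowest = min(e["pLI"] for e in candidates)
            candidates = [e for e in candidates if e["pLI"] == lowest]
    if len(candidates) == 1:
        return candidates[0]
    # Rule 3: largest |synZ + misZ| (our reading of the rule).
    return max(candidates, key=lambda e: abs(e.get("synZ", 0.0) + e.get("misZ", 0.0)))


def one_entry_per_symbol(rows: Iterable[Tuple[str, dict]]) -> Dict[str, dict]:
    """Group (symbol, metrics) rows by gene symbol and resolve each group."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for symbol, metrics in rows:
        groups[symbol].append(metrics)
    return {symbol: resolve_conflict(entries) for symbol, entries in groups.items()}
```

For the MDGA2 entries listed above, this keeps the first entry (LOEUF 0.239).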

Download URL

https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz

JSON output

"gnomAD":{ 
"pLi":1.00e0,
"pNull":8.94e-40,
"pRec":1.84e-16,
"synZ":-8.44e-2,
"misZ":5.96e-1,
"loeuf":1.13e0
}
| Field | Type | Notes |
|---|---|---|
| pLi | float | probability of being intolerant of a single loss-of-function variant (like haploinsufficient genes, observed ~ 0.1*expected) |
| pNull | float | probability of being completely tolerant of loss-of-function variation (observed = expected) |
| pRec | float | probability of being intolerant of two loss-of-function variants (like recessive genes, observed ~ 0.5*expected) |
| synZ | float | corrected synonymous Z score |
| misZ | float | corrected missense Z score |
| loeuf | float | loss-of-function observed/expected upper bound fraction (LOEUF) |