Skip to main content
Version: 3.25

Data Manager

Overview

DataManager is a utility software from Illumina Connected Annotation software suite for user to easily pick and choose prebuilt annotation databases compatible with Illumina Connected Annotation. Previously, we provided the Downloader software that the user can run to download the latest annotation database automatically. With the new DataManager utility, users can choose the exact version of an annotation database they want.

info

DataManager utility requires credential file. Please follow the guide to generate this file from Prerequisite.

caution

Downloader.dll utility will no longer be provided and instead DataManager.dll should be used.

Commands

Datamanager utility has several commands. Below are those commands:

List

The first command is list. This command is to list all available data annotation and version.

dotnet DataManager.dll list
---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

USAGE: dotnet DataManager.dll list [options]
List available data

OPTIONS:
--credentials-file, -l <VALUE>
Path to credential file. The default file
location is ~/.ilmnAnnotations/credentials.json
--ref, -r <VALUE> Assembly reference (GRCh38/GRCh37)
--dir, -d <VALUE> Local directory
--sources, -s <VALUE> data source to show
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

There are several parameters that you can pass.

  • Param --ref or -r is a required parameter. This is to filter data based on genome assembly reference. The value is either GRCh37 or GRCh38
  • Param --dir or -d is an optional parameter. This param is the path to your local data directory. If you provide this param, the list will mark data source version that you already have in the directory.
  • Param --credentials-file or -l is an optional parameter. This param is the path to your credential file.
  • Param --sources or -s is a filter based on data source. Without this param, the list will display all data source but only display the latest version available. If you use this param, you will see all available version for that particular data source.

Below are some example of the output:

  • List all latest data source version and providing local directory
dotnet DataManager.dll list -r GRCh38 -d /path/to/folder

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

Using default credentials file: /Users/bkalbuaji/.ilmnAnnotations/credentials.json
======================================================================================================================================================
Data Source | Annotation Type | Description | Version
======================================================================================================================================================
DANN | Score | DANN | 20200205 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
DECIPHER | StructuralVariant | DECIPHER | 201509 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
Ensembl | GeneModels | Ensembl | 112 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
FusionCatcher | GeneFusion | FusionCatcher | 1.33 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
GME | SmallVariant | GME | 20160618 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
GenomeAssembly | GenomeAssembly | GenomeAssembly | GRCh38.p14 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
Gerp | Score | Gerp | 20110522 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
HGNC | GeneModels | HGNC | 20240603 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
MultiZ100Way | Protein | MultiZ100Way | 20171006 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
PrimateAI | SmallVariant | PrimateAI | 0.2 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
PromoterAI | SmallVariant | PromoterAI | 1.0 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
REVEL | SmallVariant | REVEL | 20200205 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
RefSeq | GeneModels | RefSeq | GCF_000001405.40-RS_2023_10 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
TOPMed | SmallVariant | TOPMed | freeze_5 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
clingen | Gene | ClinGen disease validity curations | 20241007 (*)
| | ClinGen Dosage Sensitivity Map | 20241007 (*)
| StructuralVariant | ClinGen Dosage Sensitivity Map | 20241007 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
clingen (legacy) | StructuralVariant | ClinGen | 20160414 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
clinvar | SmallVariant | ClinVar | 20241001 (*)
| StructuralVariant | ClinVar | 20241001 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
clinvar-preview | SmallVariant | ClinVarPreview | 20240930 (*)
| StructuralVariant | ClinVarPreview | 20240930 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
cosmic | Gene | Cosmic Cancer Gene Census | 99 (*)
| GeneFusion | COSMIC gene fusions | 99 (*)
| SmallVariant | COSMIC | 99 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
dbSNP | SmallVariant | dbSNP | 156 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
globalAllele | SmallVariant | dbSNP | 151 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
gnomad | Gene | gnomAD_gene_scores | 4.1 (*)
| LowComplexityRegions | gnomAD_LCR | 2.1 (*)
| SmallVariant | gnomAD | 4.1 (*)
| StructuralVariant | gnomAD_SV | 4.1 (*)
| | gnomAD-cnv | 4.1 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
gnomad-exome | SmallVariant | gnomAD_exome | 4.1 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
mitomap | SmallVariant | MITOMAP | 20200819 (*)
| StructuralVariant | MITOMAP_SV | 20200819 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
omim | Gene | OMIM | 20241007 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
oneKg | RefMinor | 1000 Genomes Project | Phase 3 v3plus (*)
| SmallVariant | 1000 Genomes Project | Phase 3 v3plus (*)
| StructuralVariant | 1000 Genomes Project (SV) | Phase 3 v5a (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
phylopScore | ConservationScore | phyloP | hg38 (*)
| Score | PhyloPPrimate | 1.0 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
primateAI-3D | SmallVariant | PrimateAI-3D | 1.0 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
spliceAI | SmallVariant | SpliceAI | 1.3 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
(*) : available in local directory
  • Display all available GnomAD GRCh38 version
dotnet DataManager.dll list -r GRCh38 -d /path/to/folder -s gnomad

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

Using default credentials file: /Users/bkalbuaji/.ilmnAnnotations/credentials.json
======================================================================================================================================================
Data Source | Annotation Type | Description | Version
======================================================================================================================================================
gnomad | Gene | gnomAD_gene_scores | 4.1 (*)
| | gnomAD_gene_scores | 4.0
| | gnomAD_gene_scores | 2.1
| LowComplexityRegions | gnomAD_LCR | 2.1 (*)
| SmallVariant | gnomAD | 4.1 (*)
| | gnomAD | 4.1_trimmed
| | gnomAD | 4.0
| | gnomAD | 3.1.2
| | gnomAD | 3.1
| | gnomAD | 3.0
| | gnomAD | 2.1
| | gnomAD | 2.0.1
| StructuralVariant | gnomAD_SV | 4.1 (*)
| | gnomAD-cnv | 4.1 (*)
| | gnomAD_SV | 4.0
| | gnomAD_SV | 2.1
------------------------------------------------------------------------------------------------------------------------------------------------------
(*) : available in local directory

Create config

The second command is make-config. This command will generate a default config file that will be used to define which annotation data and which version to download and to use during annotation. The generated config file is basically a template which you can edit manually in case you want to download it with some specific data source and version.

dotnet DataManager.dll make-config

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

USAGE: dotnet DataManager.dll make-config [options]
Create a downloader config file

OPTIONS:
--credentials-file, -l <VALUE>
(Optional) Path to credential file. The default file location is ~/.ilmnAnnotations/credentials. json.
--ref, -r <VALUE> Genome assembly reference (GRCh38/GRCh37/all)
--dir, -d <VALUE> (Optional) Path to the parent directory that
contains all data downloaded using Downloader.
If not set, version information will be
obtained from server.
--outDir, -o <VALUE> Path to the output directory.(Optional) Default
value is ~/.ilmnAnnotations.
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

There are several parameters for make-config command.

  • Param --credentials-file or -l is an optional parameter. This param is the path to your credential file.
  • Param --ref or -r is a required parameter. This is to filter data based on genome assembly reference. The value is either GRCh37 or GRCh38 or all to generate config for both assembly.
  • Param --dir or -d is an optional parameter. This is to generate a config file from your existing directory. It will read the content of your directory and generate config file. It is useful to generate config file from existing directory when you previously use old Downloader to download.
  • Param --outDir or -o is an optional parameter. This param is the path where the config file will be written to. If not specified, it will write to ~/.ilmnAnnotation. The file name would be {assembly}_annotation_config.json.

Below is example of the generated config file.

{
"Ensembl": {
"GeneModels": "110"
},
"RefSeq": {
"GeneModels": "GCF_000001405.40-RS_2023_03"
},
"GenomeAssembly": {
"GenomeAssembly": "GRCh38.p12"
},
"clinvar": {
"SmallVariant": "20240502",
"StructuralVariant": "20240502"
},
...
}

Download file

The third command available is download. This command will download the database files. Files that already exist in the output folder will be skipped (if the file size match).

dotnet DataManager.dll download
---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

USAGE: dotnet DataManager.dll download [options]
Download file

OPTIONS:
--credentials-file, -l <VALUE>
Path to credential file. The default file
location is ~/.ilmnAnnotations/credentials.json
--ref, -r <VALUE> Assembly reference (GRCh38/GRCh37)
--versions-config <VALUE>
Download file config. By default, it will use
file ~/.ilmnAnnotation/[assembly]_annotation_
config.json
--dir, -d <VALUE> Download file directory
--thread, -t <VALUE> Number of concurrent download
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

There are several parameters for the download command.

  • Param --credentials-file or -l is an optional parameter. This param is the path to your credential file.
  • Param --ref or -r is a required parameter. This is to filter data based on genome assembly reference. The value is either GRCh37 or GRCh38
  • Param --versions-config is an optional parameter. This is the file config that contains database and version that you want to download. If not specified, it will use ~/.ilmnAnnotation/GRChXX_annotation_config. json
  • Param --dir or -d is a required pqrameter. It specified the directory that will be used to store all downloaded files.
  • Param --thread or -t is an optional parameter. It is the number of thread that will be used to download file simultaneously. By default, the value is 4.

Below is an example where you download new release of reference file and Ensembl and RefSeq cache. The downloader will first detect that you have an existing reference and cache files with different version. The downloader then will delete those files and download the new files.

dotnet DataManager.dll download \
-r GRCh37 \
--credentials-file ~/.ilmnAnnotations/credentials.json \
--dir /target/directory \
--versions-config ~/.ilmnAnnotations/GRCh37_annotation_config.json

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

Listing annotation files in local directory...
Requesting remote file information to be downloaded...
Remote file list received!
Syncing local files with requested files
Requesting remote file information to be downloaded...
Start downloading files.
Downloading file GeneSymbols.ndb...
Downloading file ClinVar_preview_20240301.nsi...
Downloading file OMIM_20240110.ega...
============= Downloading ==============
ClinVar_preview_20240301.nsi : [Downloaded #################] 100%
ClinVar_preview_20240301.nsi : [Downloaded #################] 100%
OMIM_20240110.ega : [Downloaded #################] 100%
======== Download Completed =========
Submitting download usage...
Submitting download usage finished!
Obtaining Data License...


---------------------------------------------------------------------------
Download Summary
---------------------------------------------------------------------------
Download success:
Total size: 4.6 MB
Downloaded files:
- index_GeneSymbols.ndb
- GeneSymbols.ndb
- OMIM_20240110.ega
- ClinVar_preview_20240301.nsi
Data license saved: ~/.ilmnAnnotations/premium.lic

Note that adding additional sources so versions config file will only download those additional files.

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

Listing annotation files in local directory...
Requesting remote file information to be downloaded...
Remote file list received!
Syncing local files with requested files
Requesting remote file information to be downloaded...
Start downloading files.
Downloading file ClinVar_preview_20240301.nsa...
============= Downloading ==============
ClinVar_preview_20240301.nsa (with index): [##################################################] 100%
======== Download Completed =========
Submitting download usage...
Submitting download usage finished!
Obtaining Data License...


---------------------------------------------------------------------------
Download Summary
---------------------------------------------------------------------------
Download success:
Total size: 102.74 MB
Downloaded files:
- index_ClinVar_preview_20240301.nsa
- ClinVar_preview_20240301.nsa
Data license saved: ~/.ilmnAnnotations/premium.lic

Version validate

The fourth command is version-validate. This command is used to check whether the directory provided has all data source file and version that is defined in the version config.

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.25.0
---------------------------------------------------------------------------

USAGE: dotnet DataManager.dll version-validate --ref <reference sequence file path> --cache <cache directory> --sd <supplementary data directory path> --data-config <config json file path>
version-validate

OPTIONS:
--cache, -c <directory>
input cache directory
--ref, -r <path> input compressed reference sequence path
--sd <VALUE> Supplementary Annotation Data directory path
--versions-config <VALUE>
Data versions config json file path
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

There are several parameters that you can pass:

  • Param --versions-config is a required param to the version config file path that you want to check against your file.
  • Param --cache or -c is an optional param to the cache directory path that you want to use. This will be the same path that you pass when running Illumina Connected Annotation to annotate.
  • Param --ref or -r is a required param to the genome assembly reference file path that you want to use. This will be the same path that you pass when running Illumina Connected Annotation to annotate.
  • Param --sd is an optional param to the supplementary annotation directory path that you want to use. This will be the same path that you pass when running Illumina Connected Annotation to annotate.

This command will return successfully all data sources and matching versions that are defined in the version config exist.

info

The default path for credential file and version config is located in ~/.ilmnAnnotation which is the home folder of the user that run the command. In case you are using docker to run DataManager utility, make sure the credential file is mounted in the docker container. By default, docker container is run as root, so the default directory path the DataManager will use would be /root/.ilmnAnnotation. You can mount your local .ilmnAnnotation to this folder by adding parameter:

-v /local/path/.ilmnAnnotation:/root/.ilmnAnnotation