Skip to main content
Version: 3.24 (unreleased)

Data Manager

Overview

Data Manager is a utility software from Illumina Connected Annotation software suite for user to easily pick and choose prebuilt annotation databases compatible with Illumina Connected Annotation. Previously, we provided the Downloader software that the user can run to download the latest annotation database automatically. With the new Data Manager, users can choose the exact version of an annotation database they want.

caution

Downloader.dll utility will no longer be provided and instead DataManager.dll should be used.

Commands

List

The first command is list. This command is to list all available data annotation and version.

dotnet DownloadManager.dll list

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

USAGE: dotnet DataManager.dll list [options]
List available data

OPTIONS:
--credentials-file, -l <VALUE>
Path to credential file. The default file location is ~/.ilmnAnnotations/credentials.json
--ref, -r <VALUE> Assembly reference (GRCh38/GRCh37)
--dir, -d <VALUE> Local directory
--sources, -s <VALUE> data source to show
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

There are several parameters that you can pass.

  • Param --ref or -r is a required parameter. This is to filter data based on genome assembly reference. The value is either GRCh37 or GRCh38
  • Param --dir or -d is an optional parameter. This param is the path to your local data directory. If you provide this param, the list will mark file that you have already downloaded.
  • Param --credentials-file or -l is an optional parameter. This param is the path to your credential file.
  • Param --sources or -s is a filter based on data source. Without this param, the list will display all data source but only display the latest version available. If you filter using this param, you will see all available version.

Below are some example of the output:

dotnet DownloadManager.dll list -r GRCh38 -d /path/to/folder

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

Using default credentials file: ~/.ilmnAnnotations/credentials.json
======================================================================================================================================================
Data Source | Annotation Type | Description | Version
======================================================================================================================================================
DANN | Score | DANN | 20200205
------------------------------------------------------------------------------------------------------------------------------------------------------
DECIPHER | StructuralVariant | DECIPHER | 201509
------------------------------------------------------------------------------------------------------------------------------------------------------
Ensembl | GeneModels | Ensembl | 110
------------------------------------------------------------------------------------------------------------------------------------------------------
FusionCatcher | GeneFusion | FusionCatcher | 1.33
------------------------------------------------------------------------------------------------------------------------------------------------------
GME | SmallVariant | GME | 20160618
------------------------------------------------------------------------------------------------------------------------------------------------------
GenomeAssembly | GenomeAssembly | GenomeAssembly | GRCh37.p13
------------------------------------------------------------------------------------------------------------------------------------------------------
Gerp | Score | Gerp | 20110522
------------------------------------------------------------------------------------------------------------------------------------------------------
HGNC | GeneModels | HGNC | 20240603
------------------------------------------------------------------------------------------------------------------------------------------------------
MultiZ100Way | Protein | MultiZ100Way | 20150427
------------------------------------------------------------------------------------------------------------------------------------------------------
PrimateAI | SmallVariant | PrimateAI | 0.2
------------------------------------------------------------------------------------------------------------------------------------------------------
REVEL | SmallVariant | REVEL | 20200205
------------------------------------------------------------------------------------------------------------------------------------------------------
RefSeq | GeneModels | RefSeq | 105.20220307
------------------------------------------------------------------------------------------------------------------------------------------------------
TOPMed | SmallVariant | TOPMed | freeze_5
------------------------------------------------------------------------------------------------------------------------------------------------------
clingen | Gene | ClinGen disease validity curations | 20240910
| | ClinGen Dosage Sensitivity Map | 20240910
| StructuralVariant | ClinGen Dosage Sensitivity Map | 20240910
------------------------------------------------------------------------------------------------------------------------------------------------------
clingen (legacy) | StructuralVariant | ClinGen | 20160414
------------------------------------------------------------------------------------------------------------------------------------------------------
clinvar | SmallVariant | ClinVar | 20240902
| StructuralVariant | ClinVar | 20240902
------------------------------------------------------------------------------------------------------------------------------------------------------
clinvar-preview | SmallVariant | ClinVarPreview | 20240902 (*)
| StructuralVariant | ClinVarPreview | 20240902 (*)
------------------------------------------------------------------------------------------------------------------------------------------------------
cosmic | Gene | Cosmic Cancer Gene Census | 99
| GeneFusion | COSMIC gene fusions | 99
| SmallVariant | COSMIC | 99
------------------------------------------------------------------------------------------------------------------------------------------------------
dbSNP | SmallVariant | dbSNP | 156
------------------------------------------------------------------------------------------------------------------------------------------------------
globalAllele | SmallVariant | dbSNP | 151
------------------------------------------------------------------------------------------------------------------------------------------------------
gnomad | Gene | gnomAD_gene_scores | 4.1
| LowComplexityRegions | gnomAD_LCR | 2.1
| SmallVariant | gnomAD | 2.1
| StructuralVariant | gnomAD_SV | 2.1
------------------------------------------------------------------------------------------------------------------------------------------------------
gnomad-exome | SmallVariant | gnomAD_exome | 2.1
------------------------------------------------------------------------------------------------------------------------------------------------------
mitomap | SmallVariant | MITOMAP | 20200819
| StructuralVariant | MITOMAP_SV | 20200819
------------------------------------------------------------------------------------------------------------------------------------------------------
omim | Gene | OMIM | 20240910
------------------------------------------------------------------------------------------------------------------------------------------------------
oneKg | RefMinor | 1000 Genomes Project | Phase 3 v5a
| SmallVariant | 1000 Genomes Project | Phase 3 v5a
| StructuralVariant | 1000 Genomes Project (SV) | Phase 3 v5a
------------------------------------------------------------------------------------------------------------------------------------------------------
phylopScore | ConservationScore | phyloP | hg19
------------------------------------------------------------------------------------------------------------------------------------------------------
primateAI-3D | SmallVariant | PrimateAI-3D | 1.0
------------------------------------------------------------------------------------------------------------------------------------------------------
spliceAI | SmallVariant | SpliceAI | 1.3
------------------------------------------------------------------------------------------------------------------------------------------------------
(*) : available in local directory

dotnet DownloadManager.dll list -r GRCh38 -d /path/to/folder -s clinvar

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

Using default credentials file: ~/.ilmnAnnotations/credentials.json
======================================================================================================================================================
Data Source | Annotation Type | Description | Version
======================================================================================================================================================
clinvar | SmallVariant | ClinVar | 20240902 (*)
| | ClinVar | 20240902_trimmed
| | ClinVar | 20240730
| | ClinVar | 20240630
| | ClinVar | 20240603
| | ClinVar | 20240502
| | ClinVar | 20231230
| | ClinVar | 20231203
| | ClinVar | 20231028
| | ClinVar | 20230930
| | ClinVar | 20230903
| | ClinVar | 20230822
| | ClinVar | 20230608
| | ClinVar | 20230509
| | ClinVar | 20230301
| | ClinVar | 20230210
| | ClinVar | 20220901
| | ClinVar | 20220505
| | ClinVar | 20211202
| | ClinVar | 20210805
| | ClinVar | 20210603
| | ClinVar | 20210401
| | ClinVar | 20200903
| | ClinVar | 20200302
| | ClinVar | 20200102
| | ClinVar | 20191001
| | ClinVar | 20190808
| | ClinVar | 20190204
| | ClinVar | 20180129
| StructuralVariant | ClinVar | 20240902 (*)
| | ClinVar | 20240730
| | ClinVar | 20240630
| | ClinVar | 20240603
| | ClinVar | 20240502
| | ClinVar | 20231230
| | ClinVar | 20231203
| | ClinVar | 20231028
| | ClinVar | 20230930
| | ClinVar | 20230903
| | ClinVar | 20230822
| | ClinVar | 20230608
| | ClinVar | 20230509
| | ClinVar | 20230301
| | ClinVar | 20230210
| | ClinVar | 20220901
| | ClinVar | 20220505
------------------------------------------------------------------------------------------------------------------------------------------------------
(*) : available in local directory

Create config

The second command is make-config. This command will generate a config file that will be used to define which annotation data and which version to download. The generated config file is basically a template which you can edit manually. It will contain latest version of all data source. In case you want to exclude some data source or require a different version, you can edit it manually.

dotnet DownloadManager.dll make-config

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

USAGE: dotnet DataManager.dll make-config [options]
Create a downloader config file

OPTIONS:
--credentials-file, -l <VALUE>
(Optional) Path to credential file. The default file location is ~/.ilmnAnnotations/credentials. json.
--ref, -r <VALUE> Genome assembly reference (GRCh38/GRCh37/all)
--dir, -d <VALUE> (Optional) Path to the parent directory that
contains all data downloaded using Downloader.
If not set, version information will be
obtained from server.
--outDir, -o <VALUE> Path to the output directory.(Optional) Default
value is ~/.ilmnAnnotations.
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

There are several parameters for make-config command.

  • Param --credentials-file or -l is an optional parameter. This param is the path to your credential file.
  • Param --ref or -r is a required parameter. This is to filter data based on genome assembly reference. The value is either GRCh37 or GRCh38
  • Param --out or -o is an optional parameter. This param is the path where the config file will be written to. If not specified, it will write to ~/.ilmnAnnotation/{assembly}_annotation_config.json.

Below is example of the generated config file.

{
"Ensembl": {
"GeneModels": "110"
},
"RefSeq": {
"GeneModels": "GCF_000001405.40-RS_2023_03"
},
"GenomeAssembly": {
"GenomeAssembly": "GRCh38.p12"
},
"clinvar": {
"SmallVariant": "20240502",
"StructuralVariant": "20240502"
},
...
}

Download file

The third command available is download. This command will download the database files. Files that already exist in the output folder will be skipped (if the md5checksum match). You may think of this more like a sync command than a download or copy. It will also auto update if you have older version of the data by deleting the old version and downloading the new version.

dotnet DownloadManager.dll download
---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

USAGE: dotnet DataManager.dll download [options]
Download file

OPTIONS:
--credentials-file, -l <VALUE>
Path to credential file. The default file location is ~/.ilmnAnnotations/credentials.json
--ref, -r <VALUE> Assembly reference (GRCh38/GRCh37)
--versions-config <VALUE>
Download file config. By default, it will use file ~/.ilmnAnnotation/[assembly]_annotation_config.json
--dir, -d <VALUE> Download file directory
--thread, -t <VALUE> Number of concurrent download
--help, -h displays the help menu
--version, -v displays the version

##### Supported Annotation Sources #####
Basic Tier: DECIPHER, GME, GERP, DANN, REVEL, ClinGen, gnomAD, phyloP, TOPMed, DGV, 1000 Genomes, CliinVar, dbSNP, FusionCatcher, MITOMAP, MultiZ100Way

Professional Tier: PrimateAI(GRCh37), PrimateAI-3D(GRCh38), SpliceAI, COSMIC, OMIM.

##### Contact #####
Professional content licensing, feedback and technical support: annotation_support@illumina.com.

There are several parameters for the download command.

  • Param --credentials-file or -l is an optional parameter. This param is the path to your credential file.
  • Param --ref or -r is a required parameter. This is to filter data based on genome assembly reference. The value is either GRCh37 or GRCh38
  • Param --versions-config is an optional parameter. This is the file config that contains database and version that you want to download. If not specified, it will use ~/.ilmnAnnotation/GRChXX_annotation_config. json
  • Param --dir or -d is a required pqrameter. It specified the directory that will be used to store all downloaded files.
  • Param --thread or -t is an optional parameter. It is the number of thread that will be used to download file simultaneously. By default, the value is 4.

Below is an example where you download new release of reference file and Ensembl and RefSeq cache. Tge downloader will first detect that you have an existing reference and cache files with different version. The downloader then will delete those files and download the new files.

dotnet DownloadManager.dll download \
-r GRCh37 \
--credentials-file ~/.ilmnAnnotations/credentials.json \
--dir /target/directory \
--versions-config ~/.ilmnAnnotations/GRCh37_annotation_config.json

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

Listing annotation files in local directory...
Requesting remote file information to be downloaded...
Remote file list received!
Syncing local files with requested files
Requesting remote file information to be downloaded...
Start downloading files.
Downloading file GeneSymbols.ndb...
Downloading file ClinVar_preview_20240301.nsi...
Downloading file OMIM_20240110.ega...
============= Downloading ==============
ClinVar_preview_20240301.nsi : [Downloaded #################] 100%
ClinVar_preview_20240301.nsi : [Downloaded #################] 100%
OMIM_20240110.ega : [Downloaded #################] 100%
======== Download Completed =========
Submitting download usage...
Submitting download usage finished!
Obtaining Data License...


---------------------------------------------------------------------------
Download Summary
---------------------------------------------------------------------------
Download success:
Total size: 4.6 MB
Downloaded files:
- index_GeneSymbols.ndb
- GeneSymbols.ndb
- OMIM_20240110.ega
- ClinVar_preview_20240301.nsi
Data license saved: ~/.ilmnAnnotations/premium.lic

Note that adding additional sources so versions_config.json will only download those additional files.

---------------------------------------------------------------------------
DataManager (c) 2024 Illumina, Inc.
3.24.0
---------------------------------------------------------------------------

Listing annotation files in local directory...
Requesting remote file information to be downloaded...
Remote file list received!
Syncing local files with requested files
Requesting remote file information to be downloaded...
Start downloading files.
Downloading file ClinVar_preview_20240301.nsa...
============= Downloading ==============
ClinVar_preview_20240301.nsa (with index): [##################################################] 100%
======== Download Completed =========
Submitting download usage...
Submitting download usage finished!
Obtaining Data License...


---------------------------------------------------------------------------
Download Summary
---------------------------------------------------------------------------
Download success:
Total size: 102.74 MB
Downloaded files:
- index_ClinVar_preview_20240301.nsa
- ClinVar_preview_20240301.nsa
Data license saved: ~/.ilmnAnnotations/premium.lic