Version: 3.24 (unreleased)

FusionCatcher

Overview

FusionCatcher is a well-known tool that searches for novel and known somatic fusion genes, translocations, and chimeras in RNA-seq data. FusionCatcher itself is not part of Illumina Connected Annotations, but we include a subset of its genomic databases.

Publication

Daniel Nicorici, Mihaela Şatalan, Henrik Edgren, Sara Kangaspeska, Astrid Murumägi, Olli Kallioniemi, Sami Virtanen, Olavi Kilkku. (2014) FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv 011650

Supported Data Sources

Oncogenes

The following data sources are aggregated and used to populate the isOncogene field in the gene JSON object:

| Description | Reference | Data | FusionCatcher filename |
|---|---|---|---|
| Bushman | | bushmanlab.org | cancer_genes.txt |
| ONGENE | JGG | bioinfo-minzhao.org | oncogenes_more.txt |
| UniProt tumor genes | NAR | uniprot.org | tumor_genes.txt |

Germline

| Illumina Connected Annotations label | Reference | Data | FusionCatcher filename |
|---|---|---|---|
| 1000 Genomes Project | PLOS ONE | | 1000genomes.txt |
| Healthy (strong support) | | | banned.txt |
| Illumina Body Map 2.0 | | EBI | bodymap2.txt |
| CACG | Genomics | | cacg.txt |
| ConjoinG | PLOS ONE | | conjoing.txt |
| Healthy prefrontal cortex | BMC Medical Genomics | NCBI GEO | cortex.txt |
| Duplicated Genes Database | PLOS ONE | genouest.org | dgd.txt |
| GTEx healthy tissues | | gtexportal.org | gtex.txt |
| Healthy | | | healthy.txt |
| Human Protein Atlas | MCP | EBI | hpa.txt |
| Babiceanu non-cancer tissues | NAR | NAR | non-cancer_tissues.txt |
| non-tumor cell lines | | | non-tumor_cells.txt |
| TumorFusions normal | NAR | NAR | tcga-normal.txt |

Somatic

| Illumina Connected Annotations label | Reference | Data | FusionCatcher filename |
|---|---|---|---|
| Alaei-Mahabadi 18 cancers | PNAS | | 18cancers.txt |
| DepMap CCLE | | depmap.org | ccle.txt |
| CCLE Klijn | Nature Biotechnology | Nature Biotechnology | ccle2.txt |
| CCLE Vellichirammal | Molecular Therapy Nucleic Acids | | ccle3.txt |
| Cancer Genome Project | | COSMIC | cgp.txt |
| ChimerKB 4.0 | NAR | kobic.re.kr | chimerdb4kb.txt |
| ChimerPub 4.0 | NAR | kobic.re.kr | chimerdb4pub.txt |
| ChimerSeq 4.0 | NAR | kobic.re.kr | chimerdb4seq.txt |
| COSMIC | NAR | COSMIC | cosmic.txt |
| Bao gliomas | Genome Research | | gliomas.txt |
| Known | | | known.txt |
| Mitelman DB | ISB-CGC | Google Cloud | mitelman.txt |
| TCGA oesophageal carcinomas | Nature | | oesophagus.txt |
| Bailey pancreatic cancers | Nature | Nature | pancreases.txt |
| PCAWG | Cell | ICGC | pcawg.txt |
| Robinson prostate cancers | Cell | Cell | prostate_cancer.txt |
| TCGA | | cancer.gov | tcga.txt |
| TumorFusions tumor | NAR | NAR | tcga-cancer.txt |
| TCGA Gao | Cell | Cell | tcga2.txt |
| TCGA Vellichirammal | Molecular Therapy Nucleic Acids | | tcga3.txt |
| TICdb | BMC Genomics | unav.edu | ticdb.txt |
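
To illustrate how the labels in the Germline and Somatic tables could end up in the JSON output, the sketch below maps a few FusionCatcher filenames to their labels and collects the labels for one gene pair. The dictionaries, the `sources_for_pair` helper, and the in-memory `pairs_by_file` structure are hypothetical and do not describe the actual Illumina Connected Annotations implementation. The sketch also assumes that a pair is matched in the order in which it is stored, which this page does not specify.

```python
# Hypothetical lookup tables; filenames and labels are copied from the tables above.
GERMLINE_LABELS = {
    "1000genomes.txt": "1000 Genomes Project",
    "gtex.txt": "GTEx healthy tissues",
    "hpa.txt": "Human Protein Atlas",
}
SOMATIC_LABELS = {
    "cosmic.txt": "COSMIC",
    "oesophagus.txt": "TCGA oesophageal carcinomas",
    "ticdb.txt": "TICdb",
}


def sources_for_pair(pair, pairs_by_file, labels):
    """Return the sorted labels of every file whose gene pairs include `pair`."""
    return sorted(
        label
        for filename, label in labels.items()
        if pair in pairs_by_file.get(filename, set())
    )


# Toy data: the pair is the first line of the 1000genomes.txt example on this
# page; its membership in cosmic.txt is invented purely for illustration.
pair = ("ENSG00000006210", "ENSG00000102962")
pairs_by_file = {
    "1000genomes.txt": {pair},
    "cosmic.txt": {pair},
    "ticdb.txt": set(),
}

germline_sources = sources_for_pair(pair, pairs_by_file, GERMLINE_LABELS)
somatic_sources = sources_for_pair(pair, pairs_by_file, SOMATIC_LABELS)
print(germline_sources, somatic_sources)
```

Keeping the filename-to-label mapping in plain dictionaries makes it easy to check against the tables when a new FusionCatcher release adds or renames a file.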

Gene Pair TSV File

Most of the data files in FusionCatcher are two-column TSV files containing the Ensembl gene IDs that are paired together.

Example

Here are the first few lines of the 1000genomes.txt file:

ENSG00000006210 ENSG00000102962
ENSG00000006652 ENSG00000181016
ENSG00000014138 ENSG00000149798
ENSG00000026297 ENSG00000071242
ENSG00000035499 ENSG00000155959
ENSG00000055211 ENSG00000131013
ENSG00000055332 ENSG00000179915
ENSG00000062485 ENSG00000257727
ENSG00000065978 ENSG00000166501
ENSG00000066044 ENSG00000104980

Parsing

In Illumina Connected Annotations, we import a gene pair only if both Ensembl gene IDs are recognized in either our GRCh37 or GRCh38 cache files.
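
The filtering rule above can be sketched as follows. This is a minimal illustration, not the actual importer: `parse_gene_pairs` and `known_gene_ids` are hypothetical names, and `known_gene_ids` is assumed to be the union of the Ensembl gene IDs found in the GRCh37 and GRCh38 caches.

```python
import io


def parse_gene_pairs(lines, known_gene_ids):
    """Yield (first, second) pairs whose Ensembl gene IDs are both known."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        first, second = fields
        if first in known_gene_ids and second in known_gene_ids:
            yield first, second


# Hypothetical stand-in for the gene IDs found in the GRCh37/GRCh38 caches.
known_gene_ids = {"ENSG00000006210", "ENSG00000102962", "ENSG00000006652"}

# The first two lines of the 1000genomes.txt example above.
sample = io.StringIO(
    "ENSG00000006210\tENSG00000102962\n"
    "ENSG00000006652\tENSG00000181016\n"
)
pairs = list(parse_gene_pairs(sample, known_gene_ids))
print(pairs)  # [('ENSG00000006210', 'ENSG00000102962')]
```

The second pair is dropped because only one of its two gene IDs is in the toy `known_gene_ids` set.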

Gene TSV File

Some of the data files are single-column files containing Ensembl gene IDs. This format is mainly used for the oncogene data sources.

Example

Here are the first few lines of the oncogenes_more.txt file:

ENSG00000000938
ENSG00000003402
ENSG00000005469
ENSG00000005884
ENSG00000006128
ENSG00000006453
ENSG00000006468
ENSG00000007350
ENSG00000008294
ENSG00000008952

Parsing

Known Issues

FusionCatcher also creates custom Ensembl gene IDs (e.g. ENSG09000000002) to handle genes that are missing from Ensembl. Illumina Connected Annotations ignores these entries since we only include gene IDs that are currently recognized by Illumina Connected Annotations.

I suspect that these were originally RefSeq genes. If so, we could support them directly in Illumina Connected Annotations in the future.
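
A minimal sketch of how single-column files might be read and aggregated while skipping unrecognized IDs such as ENSG09000000002. The helper `parse_gene_ids`, the `known_gene_ids` set, and the toy file contents are hypothetical; only the custom ID and the first two oncogenes_more.txt IDs come from this page.

```python
def parse_gene_ids(lines, known_gene_ids):
    """Return the recognized Ensembl gene IDs from a single-column file."""
    kept = set()
    for line in lines:
        gene_id = line.strip()
        if gene_id and gene_id in known_gene_ids:
            kept.add(gene_id)
    return kept


# Hypothetical stand-in for the IDs recognized by Illumina Connected Annotations.
known_gene_ids = {"ENSG00000000938", "ENSG00000003402"}

# Toy file contents; the cancer_genes.txt lines are invented for illustration.
oncogenes_more = ["ENSG00000000938\n", "ENSG09000000002\n"]
cancer_genes = ["ENSG00000003402\n", "ENSG00000000938\n"]

# Aggregate the oncogene sources into one set that could drive isOncogene.
oncogene_ids = parse_gene_ids(oncogenes_more, known_gene_ids) | parse_gene_ids(
    cancer_genes, known_gene_ids
)
print(sorted(oncogene_ids))
```

No special case is needed for the custom FusionCatcher IDs: they are absent from the recognized set, so the membership check drops them.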

Download URL

https://sourceforge.net/projects/fusioncatcher/files/data

JSON Output

"fusionCatcher":[
   {
      "genes":{
         "first":{
            "hgnc":"ETV6",
            "isOncogene":true
         },
         "second":{
            "hgnc":"RUNX1"
         },
         "isParalogPair":true,
         "isPseudogenePair":true,
         "isReadthrough":true
      },
      "germlineSources":[
         "1000 Genomes Project"
      ],
      "somaticSources":[
         "COSMIC",
         "TCGA oesophageal carcinomas"
      ]
   }
]

| Field | Type | Notes |
|---|---|---|
| genes | genes object | 5' gene & 3' gene |
| germlineSources | string array | matches in known germline data sources |
| somaticSources | string array | matches in known somatic data sources |

genes

| Field | Type | Notes |
|---|---|---|
| first | gene object | 5' gene |
| second | gene object | 3' gene |
| isParalogPair | bool | true when the two genes are paralogs of each other |
| isPseudogenePair | bool | true when the two genes are pseudogenes of each other |
| isReadthrough | bool | true when this fusion gene is a readthrough event (both genes are on the same strand and there are no genes between them) |

gene

| Field | Type | Notes |
|---|---|---|
| hgnc | string | gene symbol. e.g. MSH6 |
| isOncogene | bool | true when this gene is an oncogene |
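
For readers consuming this output, the following sketch parses the example entry with Python's standard `json` module and summarizes it. The `summarize` helper and the hyphenated fusion name are illustrative choices, not part of Illumina Connected Annotations. The sketch assumes that boolean fields and source arrays may be absent when they do not apply, as `isOncogene` is for RUNX1 in the example.

```python
import json

# The example entry from above, wrapped in braces so it parses as a JSON document.
record = json.loads("""
{
  "fusionCatcher": [
    {
      "genes": {
        "first": {"hgnc": "ETV6", "isOncogene": true},
        "second": {"hgnc": "RUNX1"},
        "isParalogPair": true,
        "isPseudogenePair": true,
        "isReadthrough": true
      },
      "germlineSources": ["1000 Genomes Project"],
      "somaticSources": ["COSMIC", "TCGA oesophageal carcinomas"]
    }
  ]
}
""")

PAIR_FLAGS = ("isParalogPair", "isPseudogenePair", "isReadthrough")


def summarize(entry):
    """Return a small summary dict for one fusionCatcher entry."""
    genes = entry["genes"]
    first, second = genes["first"], genes["second"]
    return {
        # 5' gene first, 3' gene second
        "fusion": f'{first["hgnc"]}-{second["hgnc"]}',
        # Assumption: a missing isOncogene field means "not an oncogene".
        "oncogenes": [g["hgnc"] for g in (first, second) if g.get("isOncogene")],
        "flags": [name for name in PAIR_FLAGS if genes.get(name)],
        "germline": entry.get("germlineSources", []),
        "somatic": entry.get("somaticSources", []),
    }


summary = summarize(record["fusionCatcher"][0])
print(summary["fusion"], summary["oncogenes"], summary["somatic"])
```

Using `dict.get` for the optional fields keeps the helper from raising `KeyError` on entries that omit them.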