A Deep Learning Approach to the Screening of Oncogenic Gene Fusions in Humans

Gene fusions play an important role in cancer development. Predicting the probability that a fusion transcript gives rise to cancer is a challenging and still largely unexplored research problem. To date, the approaches available in the literature explain the oncogenic potential of gene fusions through protein domain analysis, which is cancer-specific and not easy to adapt to newly available information. In our work, we take the raw protein sequences as input and propose the use of deep learning, specifically Convolutional Neural Networks, to infer an oncogenicity probability score for gene fusion transcripts and to group them into a number of categories (e.g., oncogenic/not oncogenic). This methodology is inherently flexible: unlike previous approaches, it can be re-trained with very little effort on newly available data (for example, from a different cancer). In experiments on a large dataset of pre-annotated gene fusions, our method predicts the oncogenic potential of gene fusion transcripts with an accuracy of about 72%, which rises to 86% when only the instances classified with a high confidence level are considered.
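To make the two ingredients above concrete (raw protein sequences as network input, and accuracy restricted to high-confidence predictions), the following is a minimal Python sketch. It is not the authors' implementation: the one-hot encoding, the fixed input length, the 0.5 decision boundary, and the helper names `one_hot_encode` and `selective_accuracy` are all assumptions chosen for illustration.

```python
# Illustrative sketch only; the encoding scheme, threshold rule, and
# function names are assumptions, not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(sequence, length):
    """Encode a protein sequence as a `length` x 20 matrix of 0/1 values.

    Sequences longer than `length` are truncated, shorter ones are padded
    with all-zero rows. Unknown residues (e.g. 'X') also map to zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(length)]
    for pos, residue in enumerate(sequence[:length].upper()):
        if residue in index:
            matrix[pos][index[residue]] = 1
    return matrix


def selective_accuracy(scores, labels, margin):
    """Accuracy over predictions whose score lies at least `margin` from 0.5.

    `scores` are predicted probabilities of the oncogenic class and
    `labels` are 0/1 ground-truth annotations. Returns a pair
    (accuracy, coverage), where coverage is the fraction of instances
    retained. Accuracy is None when no instance passes the margin.
    """
    kept = [(s, y) for s, y in zip(scores, labels) if abs(s - 0.5) >= margin]
    if not kept:
        return None, 0.0
    correct = sum((s >= 0.5) == bool(y) for s, y in kept)
    return correct / len(kept), len(kept) / len(scores)


if __name__ == "__main__":
    encoded = one_hot_encode("ACX", 4)
    print(len(encoded), len(encoded[0]))
    # Hypothetical scores and labels, purely for demonstration.
    print(selective_accuracy([0.9, 0.6, 0.1, 0.45], [1, 0, 0, 1], 0.3))
```

A matrix of this shape is the kind of input a one-dimensional convolutional network could consume. Raising `margin` trades coverage for accuracy, which is the effect described by the reported rise from about 72% to 86% on high-confidence instances.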
