DNA Sequences Classification with Deep Learning: A Survey

Deep learning (DL) methods have achieved strong results on a wide variety of problems in many fields, especially in the area of big data. With the arrival of the big data era in bioinformatics, DL techniques make it possible to classify DNA sequences with accurate and scalable prediction. The strength of DL methods comes from advances in both hardware and software: the processing power of graphics processing units (GPUs) on the hardware side, and new learning and inference algorithms on the software side, which together reduce the main difficulties that the training process used to face. In this work, we start from earlier classification methods, such as alignment methods, and point out the problems encountered when using them. We then present deep learning, from artificial neural networks to hyperparameter tuning, along with the most recent state-of-the-art DL architectures used in DNA classification. The paper ends with limitations and suggestions.
