Classification Experiments of DNA Sequences by Using a Deep Neural Network and Chaos Game Representation

Analysis and classification of sequences is one of the key research areas in bioinformatics. The basic tool for sequence analysis is alignment, but alignment-free techniques can also be used. Frequency Chaos Game Representation (FCGR) is one such technique: it builds an image that characterizes the sequence. This paper describes the first experiment in using a deep neural network to classify DNA sequences represented as images by means of FCGR.
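As a rough illustration of how such an image can be built, the sketch below counts the k-mers of a sequence into a 2^k x 2^k grid. Each nucleotide is assigned to a corner of the unit square, and the pixel for a k-mer follows from the usual CGR update (move halfway to the corner of each successive base). Under that rule, the last base of the k-mer ends up contributing the most significant bit of each pixel coordinate. The corner assignment (A, C, G, T), the function name `fcgr`, the skipping of k-mers that contain ambiguous bases such as N, and the optional normalization to frequencies are all assumptions made for this example, not details taken from the paper.

```python
from typing import List

# Assumed corner convention, as (x, y) bits; published conventions differ.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int, normalize: bool = False) -> List[List[float]]:
    """Hypothetical helper: return a 2^k x 2^k grid of k-mer counts.

    grid[y][x] holds the count (or relative frequency, if normalize=True)
    of the k-mer whose CGR point falls in pixel (x, y).
    """
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing N or other non-ACGT symbols (assumption).
        if any(base not in CORNERS for base in kmer):
            continue
        x = y = 0
        # Base at position `depth` sets bit `depth`, so the last base
        # becomes the most significant bit, as in the iterated CGR map.
        for depth, base in enumerate(kmer):
            bx, by = CORNERS[base]
            x |= bx << depth
            y |= by << depth
        grid[y][x] += 1
        total += 1
    if normalize and total:
        grid = [[count / total for count in row] for row in grid]
    return grid
```

For example, `fcgr("ACGTACGT", 2, normalize=True)` yields a 4 x 4 matrix of dinucleotide frequencies. With a larger k, the resulting matrix can be treated as a single-channel image and passed to an image classifier such as a convolutional network.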
