Analysis of DNA Barcode Sequences Using Neural Gas and Spectral Representation

In this paper we present an application of the neural gas network to the classification of DNA barcode sequences. The proposed method is based on the identification of distinctive words extracted from the spectral representation of DNA sequences. In particular, we compute "signatures" that characterize DNA sequences at different taxonomic levels. To demonstrate the efficacy of the proposed method, we tested it on 10 real barcode datasets belonging to different animal species, provided by the online resource Barcode of Life Database (BOLD).
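The two ingredients named above can be illustrated with a minimal sketch. It assumes that the spectral representation is a normalized k-mer frequency vector and that the neural gas follows the standard rank-based update rule; the function names, the word length `k`, and all learning parameters are illustrative assumptions, not the settings used in this work. The extraction of distinctive words and signatures is not shown.

```python
import math
import random
from itertools import product


def kmer_spectrum(seq, k=3):
    """Normalized k-mer frequency vector of a DNA sequence (assumed spectral form)."""
    # Fixed lexicographic ordering of all 4**k words over the alphabet ACGT.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:  # skip words containing ambiguous bases such as N
            counts[index[word]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def neural_gas(data, n_units=2, epochs=30, eps=(0.5, 0.01), lam=(1.0, 0.1), seed=0):
    """Fit prototype vectors with the rank-based neural gas update.

    eps and lam are (initial, final) values of the step size and the
    neighbourhood range; both decay exponentially. All values are illustrative.
    """
    rng = random.Random(seed)
    # Initialize prototypes as copies of randomly chosen data points.
    units = [list(v) for v in rng.sample(data, n_units)]
    t_max = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / t_max
            step = eps[0] * (eps[1] / eps[0]) ** frac
            width = lam[0] * (lam[1] / lam[0]) ** frac
            # Rank all prototypes by distance to the current input.
            ranked = sorted(units, key=lambda w: math.dist(w, x))
            for rank, w in enumerate(ranked):
                h = step * math.exp(-rank / width)
                for j in range(len(w)):
                    w[j] += h * (x[j] - w[j])  # move prototype toward x
            t += 1
    return units


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from BOLD.
    seqs = ["ACGTACGTACGT", "ACGTACGAACGT", "TTTTGGGGTTTT", "TTTTGGGCTTTT"]
    spectra = [kmer_spectrum(s, k=2) for s in seqs]
    prototypes = neural_gas(spectra, n_units=2)
    for s, v in zip(seqs, spectra):
        nearest = min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], v))
        print(s, "-> prototype", nearest)
```

Because every prototype is updated on each input, with a weight that decays with its distance rank, the result depends less on initialization than plain k-means. A classifier in the spirit of the abstract would additionally need a taxonomic label attached to each prototype, which this sketch leaves out.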
