DNA Barcode Classification Using General Regression Neural Network with Different Distance Models

The "cytochrome c oxidase subunit 1" (COI) gene is one of the so-called DNA barcode genes and is used to identify species. Even with DNA barcoding, identification can be difficult when the biological samples are degraded. In these difficult cases, a spectral representation of the sequences combined with the General Regression Neural Network (GRNN) can still give useful results. The GRNN bases its output on the distance between the memorized example sequences and the unknown input sequence, all represented as vectors in a spectral vector space. In this paper we analyse the effectiveness of different distance models in the GRNN implementation and compare the results obtained when classifying full-length sequences and degraded samples.
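
As a rough illustration of the idea, the sketch below classifies a sequence with a GRNN-style kernel vote whose distance function can be swapped. It is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the spectral representation is assumed to be a normalized k-mer frequency vector, the distance models are assumed to be Minkowski distances of order p (p = 2 Euclidean, p = 1 Manhattan, p < 1 fractional), and the names `kmer_spectrum`, `minkowski`, `grnn_classify`, the smoothing parameter `sigma` and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_spectrum(seq, k=3):
    """Normalized k-mer frequency vector over the alphabet ACGT (assumed representation)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted; guard against empty input.
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def minkowski(u, v, p=2.0):
    """Minkowski distance of order p; p < 1 gives a fractional distance."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def grnn_classify(train_vectors, train_labels, x, distance=minkowski, sigma=0.1):
    """Pick the label with the largest sum of Gaussian kernel weights.

    Each memorized example contributes exp(-D^2 / (2 * sigma^2)), where D is
    its distance to the input x. The GRNN output divides by the sum of all
    weights; that normalization is omitted here because it does not change
    which label has the largest score.
    """
    scores = {}
    for vec, label in zip(train_vectors, train_labels):
        d = distance(vec, x)
        weight = math.exp(-(d * d) / (2.0 * sigma * sigma))
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# Toy data, invented for illustration only (not real COI sequences).
train_seqs = ["ACGTACGTACGTACGT", "GGGCCCGGGCCCGGGC"]
train_labels = ["species_a", "species_b"]
train_vectors = [kmer_spectrum(s) for s in train_seqs]

# A shorter fragment stands in for a degraded sample.
query = kmer_spectrum("ACGTACGTAC")
predicted = grnn_classify(train_vectors, train_labels, query)
```

Changing the distance model only requires passing a different callable, for example `distance=lambda u, v: minkowski(u, v, p=0.5)`. Because the spectrum is a frequency vector, a full-length sequence and a short fragment are mapped into the same vector space and can be compared directly.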
