Recognition of Herpes Viruses on the Basis of a New Metric for Protein Sequences

This paper addresses the problem of automated recognition of human herpes viruses based on the analysis of their protein sequences. To compare proteins, we use a new dissimilarity measure based on finding an optimal sequence alignment. In previous work, we proved that this way of comparing sequences yields a measure that has the properties of a metric. These properties make the proposed measure more convenient and effective to use in further analysis than traditional similarity measures, such as Needleman-Wunsch alignment. The results of herpes virus recognition show that the metric properties improve classification quality. In addition, we present an updated computational scheme for the proposed metric that speeds up the comparison of proteins.
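To make the idea of an alignment-based measure with metric properties concrete, the sketch below uses the classical unit-cost edit distance as a stand-in. It is not the metric proposed in this paper: the function names, the cost parameters, and the toy sequences are assumptions chosen purely for illustration. The code computes the minimum cost of a global alignment by dynamic programming and then checks the metric axioms on a small sample by brute force.

```python
from itertools import product


def alignment_distance(a: str, b: str, sub_cost: int = 1, gap_cost: int = 1) -> int:
    """Minimum total cost of a global alignment of a and b.

    With the default unit costs this is the classical edit distance,
    used here only as an illustrative stand-in for the paper's metric.
    """
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap_cost]
        for j, cb in enumerate(b, start=1):
            match = prev[j - 1] + (0 if ca == cb else sub_cost)
            delete = prev[j] + gap_cost      # gap inserted in b
            insert = curr[j - 1] + gap_cost  # gap inserted in a
            curr.append(min(match, delete, insert))
        prev = curr
    return prev[-1]


def is_metric_on(samples, dist) -> bool:
    """Brute-force check of the metric axioms on a finite sample."""
    for x, y in product(samples, repeat=2):
        d = dist(x, y)
        # Non-negativity, identity of indiscernibles, symmetry.
        if d < 0 or (d == 0) != (x == y) or d != dist(y, x):
            return False
    # Triangle inequality over all triples.
    return all(
        dist(x, z) <= dist(x, y) + dist(y, z)
        for x, y, z in product(samples, repeat=3)
    )


# Hypothetical toy "protein" fragments, not real herpes virus sequences.
toy_sequences = ["MKT", "MKV", "MAKT", "KT", ""]
metric_ok = is_metric_on(toy_sequences, alignment_distance)
```

A check on a finite sample cannot prove that a measure is a metric; it can only fail to find a counterexample. A proof like the one referred to above is what guarantees the property for all sequences.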
