A Novel Method for Protein Function Prediction Based on Sequence Numerical Features

Compared with costly and time-consuming biological experiments, computational approaches to protein function prediction are easier and more cost-efficient. In this work, we propose two sequence-derived descriptors: a feature vector of numerical features extracted from the sequence according to its hydrophobicity, polarity, and charge properties, and a function possibility of the sequence. The feature vector and the function possibility are then used to predict protein function with the k-nearest neighbors (KNN) algorithm. Because it incorporates both local and global sequence information, our method avoids some problems of methods based on sequence similarity. Our experimental results show that the method is more efficient.
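The general idea of turning a sequence into property-based numerical features and classifying it with KNN can be illustrated with a minimal sketch. This is not the paper's actual feature construction: the three-class amino acid groupings, the use of simple class fractions as features, the function names, and the toy sequences and labels are all assumptions made for illustration, and the "function possibility" term is not reproduced because the abstract does not define it.

```python
from collections import Counter
import math

# Assumed three-class groupings per property (illustrative only;
# the grouping used in the paper may differ).
GROUPS = {
    "hydrophobicity": ("RKEDQN", "GASTPHY", "CLVIMFW"),
    "polarity": ("LIFWCMVY", "PATGS", "HQRKNED"),
    "charge": ("KR", "ANCQGHILMFPSTWYV", "DE"),
}


def composition_features(seq):
    """Return 9 numbers: for each property, the fraction of residues in each class."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    features = []
    for classes in GROUPS.values():
        for members in classes:
            count = sum(seq.count(aa) for aa in members)
            features.append(count / n)
    return features


def knn_predict(train, query_seq, k=3):
    """Majority vote among the k training sequences nearest to the query.

    train is a list of (sequence, label) pairs; distance is Euclidean
    in the 9-dimensional feature space.
    """
    q = composition_features(query_seq)
    ranked = sorted(
        train,
        key=lambda item: math.dist(q, composition_features(item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: labels "A" and "B" stand in for functional classes.
toy_train = [
    ("KKRRKK", "A"),
    ("KRKRKR", "A"),
    ("LLVVII", "B"),
    ("LIVLIV", "B"),
]
print(knn_predict(toy_train, "KKKR", k=3))
```

Whole-sequence class fractions capture only global composition. A feature set closer to the abstract's description would also need local information, for example the same fractions computed over sequence segments.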
