Protein functional class prediction using global encoding of amino acid sequence.

A key goal of the post-genomic era is to determine protein functions. In this paper, we propose a global encoding (GE) method for protein sequences that describes the global information of an amino acid sequence, and we then assign each protein a functional class using a machine learning method, the nearest neighbor algorithm (NNA). We predicted the functions of 1818 Saccharomyces cerevisiae proteins drawn from the set used in Vazquez's global optimization method (GOM); eight proteins from that set were excluded because they can no longer be retrieved from the database or because their sequences are too short. In some cases, the accuracy of our approach is better than that of GOM. The experimental results show that the new method is effective at predicting the functional class of unknown proteins.
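The pipeline described above (encode each sequence as a fixed-length vector, then assign the class of the closest annotated protein) can be illustrated with a minimal sketch. The abstract does not define the GE features, so the sketch substitutes plain amino acid composition as a stand-in encoding; the function names, the Euclidean distance, the toy sequences, and the class labels are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_composition(sequence):
    """Stand-in encoding (NOT the paper's GE): fraction of each standard residue."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_class(query, training):
    """Return the label of the training sequence closest to the query.

    training is a list of (sequence, label) pairs.
    """
    q = encode_composition(query)
    best_label, best_dist = None, float("inf")
    for seq, label in training:
        d = euclidean(q, encode_composition(seq))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy data: hypothetical sequences and labels, for illustration only.
training = [
    ("KKKKRRRRKKRR", "class_A"),
    ("DDDDEEEEDDEE", "class_B"),
]
print(nearest_neighbor_class("KRKRKRKK", training))  # prints class_A
```

Replacing `encode_composition` with an implementation of the paper's encoding would leave the nearest neighbor step unchanged, since that step only needs fixed-length numeric vectors.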
