Discrimination of thermophilic and mesophilic proteins via pattern recognition methods

Four pattern recognition methods, namely principal component analysis (PCA), stepwise regression (SR), partial least-squares regression (PLSR), and a backpropagation neural network, were used to discriminate between thermophilic and mesophilic proteins, and four models were built to classify the two kinds of proteins. With the exception of PCA, the prediction accuracy of the methods was encouraging. The average fitting accuracy of the four methods was 92%, 96%, 95%, and 98%, respectively, and the average prediction reliability was 60%, 67.5%, 72.5%, and 72.5%, respectively. The best prediction reliability was 75% for thermophilic proteins and 85% for mesophilic proteins.
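
The abstract reports results both overall and per protein class. As a minimal sketch of how such rates could be computed from a model's outputs, the following assumes that each reported figure is the fraction of correctly classified proteins; that reading is an assumption, since the abstract does not define "fitting accuracy" or "prediction reliability". The function names, the labels "T" and "M", and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return, for each true label, the fraction of its samples predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(true_labels, predicted_labels):
    """Return the fraction of all samples predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical toy data: "T" = thermophilic, "M" = mesophilic.
true_labels = ["T", "T", "M", "M"]
predicted_labels = ["T", "M", "M", "M"]

print(per_class_accuracy(true_labels, predicted_labels))
print(overall_accuracy(true_labels, predicted_labels))
```

Running the same functions once on the data used to fit a model and once on held-out proteins would give two sets of figures comparable in kind to the two sets reported above.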
