PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence

Sequence-derived structural and physicochemical features have frequently been used to develop statistical learning models for predicting proteins and peptides with different structural, functional and interaction profiles. PROFEAT (Protein Features) is a web server for computing commonly used structural and physicochemical features of proteins and peptides from their amino acid sequences. It computes six feature groups comprising ten features, which together include 51 descriptors and 1447 descriptor values. The computed features include amino acid composition, dipeptide composition, normalized Moreau–Broto autocorrelation, Moran autocorrelation, Geary autocorrelation, sequence-order-coupling number, quasi-sequence-order descriptors, and the composition, transition and distribution of various structural and physicochemical properties. In addition, the server can compute the autocorrelation descriptors listed above from user-defined properties. Our computational algorithms have been extensively tested, and the computed protein features have been used in a number of published studies for predicting the functional classes of proteins, protein–protein interactions and MHC-binding peptides. PROFEAT is accessible at
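
The descriptor definitions are not spelled out in this abstract. The following is a minimal Python sketch of three of the listed feature types: amino acid composition, dipeptide composition, and the normalized Moreau–Broto, Moran and Geary autocorrelations. It assumes the formulations commonly used for these descriptors: compositions as simple fractions, property values standardized over the 20 amino acids, and lags up to an assumed maximum of 30. The function names, the `max_lag` default and the toy property scale are hypothetical illustrations, not PROFEAT's code or interface. A user-defined property is passed in as a plain residue-to-value dictionary.

```python
from itertools import product
from statistics import mean, pstdev

# The 20 standard amino acids; other letters (X, B, Z, U, ...) are not handled in this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> dict[str, float]:
    """Fraction of the sequence made up of each of the 20 residue types."""
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Fraction of the N-1 adjacent residue pairs for each of the 400 ordered dipeptides."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1  # raises KeyError for non-standard residues
    total = len(seq) - 1
    return {dp: c / total for dp, c in counts.items()}


def standardize(scale: dict[str, float]) -> dict[str, float]:
    """Rescale a per-residue property to zero mean and unit population variance over the 20 amino acids."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {aa: (scale[aa] - mu) / sd for aa in AMINO_ACIDS}


def autocorrelations(seq: str, scale: dict[str, float],
                     max_lag: int = 30) -> list[tuple[int, float, float, float]]:
    """Return (lag, normalized Moreau-Broto, Moran, Geary) for lags 1..min(max_lag, N-1)."""
    z = standardize(scale)
    p = [z[aa] for aa in seq]
    n = len(p)
    p_bar = mean(p)  # mean property value along this particular sequence
    ss = sum((x - p_bar) ** 2 for x in p)
    if ss == 0:
        raise ValueError("Moran and Geary are undefined when every residue has the same property value")
    results = []
    for d in range(1, min(max_lag, n - 1) + 1):
        pairs = list(zip(p, p[d:]))  # the N-d pairs (P_i, P_{i+d})
        m = len(pairs)
        moreau_broto = sum(a * b for a, b in pairs) / m
        moran = (sum((a - p_bar) * (b - p_bar) for a, b in pairs) / m) / (ss / n)
        geary = (n - 1) / (2 * m) * sum((a - b) ** 2 for a, b in pairs) / ss
        results.append((d, moreau_broto, moran, geary))
    return results


# Hypothetical placeholder scale, NOT a published property: values 0..19 in one-letter order.
toy_scale = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
for lag, mb, moran, geary in autocorrelations("ACACAC", toy_scale, max_lag=2):
    print(lag, round(mb, 3), round(moran, 3), round(geary, 3))
```

Moran and Geary values are measured against the sequence's own mean, so they would be unaffected by a linear rescaling of the property even without the standardization step. The normalized Moreau–Broto value relies on that step to be comparable across property scales.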
