Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.

Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection frame, this paper presents a computational system to predict the PPIs (protein-protein interactions) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poor-performed and/or redundant features, resulting in 103 remaining features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall accurate prediction rate of 76.18%, evaluated by 10-fold cross-validation test, which is 1.46% higher than using the initial 114 features and is 6.51% higher than the 20 features, coded by amino acid compositions. The PPIs predictor, developed for this research, is available for public use at http://chemdata.shu.edu.cn/ppi.

[1]  K. Chou A novel approach to predicting protein structural classes in a (20–1)‐D amino acid composition space , 1995, Proteins.

[2]  Chris H. Q. Ding,et al.  Minimum redundancy feature selection from microarray gene expression data , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[3]  Mei Liu,et al.  Prediction of protein-protein interactions using random decision forest framework , 2005, Bioinform..

[4]  Hong Jiang,et al.  RBF : , 2007 .

[5]  K. Helin,et al.  Heterodimerization of the transcription factors E2F-1 and DP-1 leads to cooperative trans-activation. , 1993, Genes & development.

[6]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[7]  Miss A.O. Penney (b) , 1974, The New Yale Book of Quotations.

[8]  D. Covell,et al.  A role for surface hydrophobicity in protein‐protein recognition , 1994, Protein science : a publication of the Protein Society.

[9]  N. Dyson,et al.  DNA-binding and trans-activation properties of Drosophila E2F and DP proteins. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[12]  Martin Vingron,et al.  IntAct: an open source molecular interaction database , 2004, Nucleic Acids Res..

[13]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[14]  Yu-Dong Cai,et al.  Predicting protease types by hybridizing gene ontology and pseudo amino acid composition , 2006, Proteins.

[15]  James R. Knight,et al.  A Protein Interaction Map of Drosophila melanogaster , 2003, Science.

[16]  Anton J. Enright,et al.  Protein interaction maps for complete genomes based on gene fusion events , 1999, Nature.

[17]  Kuo-Chen Chou,et al.  Predicting membrane protein type by functional domain composition and pseudo-amino acid composition. , 2006, Journal of theoretical biology.

[18]  David A. Gough,et al.  Whole-proteome interaction mining , 2003, Bioinform..

[19]  B. Rost,et al.  Analysing six types of protein-protein interfaces. , 2003, Journal of molecular biology.

[20]  K. Chou,et al.  Predicting protein-protein interactions from sequences in a hybridization space. , 2006, Journal of proteome research.

[21]  M. Vidal,et al.  RBF, a novel RB-related gene that regulates E2F activity and interacts with cyclin E in Drosophila. , 1996, Genes & development.

[22]  M. Tyers,et al.  From genomics to proteomics , 2003, Nature.

[23]  Albert Chan,et al.  PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs , 2006, BMC Bioinformatics.

[24]  Kuo-Chen Chou,et al.  Predicting networking couples for metabolic pathways of Arabidopsis , 2006 .

[25]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[26]  E. Sprinzak,et al.  Correlated sequence-signatures as markers of protein-protein interaction. , 2001, Journal of molecular biology.

[27]  J. Davies,et al.  Molecular Biology of the Cell , 1983, Bristol Medico-Chirurgical Journal.

[28]  P. Langley Selection of Relevant Features in Machine Learning , 1994 .

[29]  K Nishikawa,et al.  The folding type of a protein is relevant to the amino acid composition. , 1986, Journal of biochemistry.

[30]  Kuo-Chen Chou,et al.  Prediction of protease types in a hybridization space. , 2006, Biochemical and biophysical research communications.

[31]  M. Gerstein,et al.  A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data , 2003, Science.

[32]  C. Zhang,et al.  A joint prediction of the folding types of 1490 human proteins from their genetic codons. , 1993, Journal of theoretical biology.

[33]  C. Zhang,et al.  Predicting protein folding types by distance functions that make allowances for amino acid interactions. , 1994, The Journal of biological chemistry.

[34]  Christian von Mering,et al.  STRING: known and predicted protein–protein associations, integrated and transferred across organisms , 2004, Nucleic Acids Res..

[35]  Jean-Loup Faulon,et al.  Predicting protein-protein interactions using signature products , 2005, Bioinform..

[36]  H. Wolfson,et al.  Shape complementarity at protein–protein interfaces , 1994, Biopolymers.

[37]  William Stafford Noble,et al.  Kernel methods for predicting protein-protein interactions , 2005, ISMB.

[38]  P. Argos,et al.  Cavities and packing at protein interfaces , 1994, Protein science : a publication of the Protein Society.

[39]  Y. Zhang,et al.  IntAct—open source resource for molecular interaction data , 2006, Nucleic Acids Res..

[40]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[41]  Guo-Ping Zhou,et al.  An Intriguing Controversy over Protein Structural Class Prediction , 1998, Journal of protein chemistry.

[42]  David A. Gough,et al.  Predicting protein-protein interactions from primary structure , 2001, Bioinform..

[43]  R Kahavi,et al.  Wrapper for feature subset selection , 1997 .