Explore protein-protein interaction network involved in glucosinolate biosynthesis

Proteins are a primary component of organisms and take part in almost all biological processes, such as metabolism and neurological regulation. Proteins generally interact with one another while performing their biological roles in vivo. Exploring the protein-protein interactions (PPIs) of a specific biological process can therefore provide valuable information for research in the relevant field. In this paper, we focus on the set of proteins that participate in glucosinolate biosynthesis and build four decision tree models to predict the PPIs involved in this pathway. Domain-domain interaction (DDI) information is used to construct the feature vectors, and the interacting or non-interacting relationship between two proteins is represented by a pair of symmetrical feature vectors. The four domain-based decision tree models are trained on samples with positive-to-negative ratios of 1:1, 1:2, 1:3, and 1:4, respectively. Five-fold cross-validation and an independent external test are used to identify the best-performing model. The effectiveness of the proposed method is demonstrated by its higher specificity and sensitivity and by the high attribute usage observed while training the decision trees. We use the intersection of the two best prediction results to validate and explore PPIs among the proteins participating in glucosinolate biosynthesis, and finally draw a comprehensive PPI network from the prediction results.
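The data preparation described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy proteins, domain identifiers, DDI set, and the exact vector layout (domains of the first protein, domains of the second protein, one DDI flag) are all assumptions made for illustration. It shows one plausible way to encode a protein pair as two symmetrical feature vectors, to draw negatives at a 1:k positive-to-negative ratio, and to compute sensitivity and specificity.

```python
import random
from itertools import product

# Hypothetical toy annotations (not from the paper): protein -> domain identifiers.
PROTEIN_DOMAINS = {
    "P1": {"D1", "D2"},
    "P2": {"D3"},
    "P3": {"D2", "D4"},
    "P4": {"D5"},
}
# Hypothetical known domain-domain interactions, stored as unordered pairs.
KNOWN_DDIS = {frozenset(("D1", "D3")), frozenset(("D2", "D4"))}
# Fixed domain order so that every vector has the same layout.
DOMAIN_ORDER = sorted({d for ds in PROTEIN_DOMAINS.values() for d in ds})


def pair_vector(first, second):
    """Encode an ordered pair as [domains of first | domains of second | DDI flag]."""
    a, b = PROTEIN_DOMAINS[first], PROTEIN_DOMAINS[second]
    has_ddi = any(frozenset((x, y)) in KNOWN_DDIS for x, y in product(a, b))
    return ([int(d in a) for d in DOMAIN_ORDER]
            + [int(d in b) for d in DOMAIN_ORDER]
            + [int(has_ddi)])


def symmetric_vectors(p, q):
    """Return both orderings, so (p, q) and (q, p) carry the same label."""
    return pair_vector(p, q), pair_vector(q, p)


def sample_with_ratio(positives, negatives, k, seed=0):
    """Keep all positives and draw k negatives per positive (a 1:k ratio).

    Raises ValueError if fewer than k * len(positives) negatives are available.
    """
    rng = random.Random(seed)
    return list(positives), rng.sample(list(negatives), k * len(positives))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    forward, reverse = symmetric_vectors("P1", "P2")
    print(forward)
    print(reverse)
    pos, neg = sample_with_ratio(["pos_a", "pos_b"], [f"neg_{i}" for i in range(10)], k=3)
    print(len(pos), len(neg))
```

In this sketch, the two vectors differ only in which protein's domain block comes first, so a classifier trained on both does not depend on the order in which a pair is listed. Calling `sample_with_ratio` with k = 1, 2, 3, 4 would yield the four training sets, each of which could then be passed to any decision tree learner.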
