Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data

Protein-protein interactions (PPIs) play a vital role in the biological processes underlying cellular functions and disease pathways. Experimental methods for identifying PPIs require tremendous effort, and their results are often compromised by a large number of false positives. Herein, we demonstrate the use of a new Genetic Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In this case study, a dataset of 135 cancer-related PPI complexes was used to construct a generic PPI prediction model with good prediction accuracy and generalization ability. A high correlation coefficient (CC) magnitude of 0.893, together with low root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively, was achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied to a dataset of diabetes complexes, where it yielded significantly low CC values. Thus, the GP model developed here serves a dual purpose: (a) as a predictor of the binding energy of cancer-related PPI complexes, and (b) as a classifier for discriminating cancer-related PPI complexes from those of other diseases.
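The three statistics quoted above (CC, RMSE, and MAPE) have standard definitions, and the sketch below shows one way they could be computed for a set of measured and predicted binding energies. It is an illustration only, not the authors' code: the function names and the sample values are hypothetical, and MAPE is returned here as a fraction, which is an assumption, since the abstract does not say whether 0.239 is a fraction or a percentage.

```python
import math


def correlation_coefficient(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target variable."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (assumed convention).

    Every measured value must be non-zero.
    """
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # Hypothetical values for illustration; these are not data from the study.
    measured = [100.0, 200.0, 300.0, 400.0]
    predicted = [110.0, 190.0, 330.0, 360.0]
    print("CC  :", correlation_coefficient(measured, predicted))
    print("RMSE:", rmse(measured, predicted))
    print("MAPE:", mape(measured, predicted))
```

Evaluating the same three functions separately on the training set, the test set, and an out-of-domain set (such as the diabetes complexes) is one way to compare prediction accuracy with discriminatory behaviour.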
