Probability Weighted Ensemble Transfer Learning for Predicting Interactions between HIV-1 and Human Proteins

Reconstructing host-pathogen protein interaction networks is of great significance for revealing the underlying microbial pathogenesis. However, current experimentally derived networks are generally small and should be augmented by computational methods for less biased biological inference. From the perspective of computational modelling, data scarcity, data unavailability and negative data sampling are the three major problems in host-pathogen protein interaction network reconstruction. In this work, we address these three concerns and propose a probability weighted ensemble transfer learning model (PWEN-TLM) for HIV-human protein interaction prediction, in which a support vector machine (SVM) is adopted as the individual classifier of the ensemble. In the model, data scarcity and data unavailability are tackled by homolog knowledge transfer. The importance of homolog knowledge is measured by the ROC-AUC metric of the individual classifiers, whose outputs are probability weighted to yield the final decision. In addition, we validate the assumption that homolog knowledge alone is sufficient to train a satisfactory model for host-pathogen protein interaction prediction. The model is thus more robust against data unavailability and imposes less demanding data constraints. As regards negative data construction, experiments show that exclusiveness of subcellular co-localized proteins is unbiased and more reliable than random sampling. Finally, we analyse the predictions that overlap between our model and existing models, and apply the model to recognize novel host-pathogen PPIs for further biological research.
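The weighting step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes that each individual classifier's weight is its validation ROC-AUC normalized to sum to one, and that the final score is the weighted average of the classifiers' probability outputs. The function names `roc_auc` and `auc_weighted_probability` are hypothetical, and the SVM training itself is omitted; the inputs stand in for probability outputs of already trained classifiers.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weighted_probability(
    probs_per_classifier: Sequence[Sequence[float]],
    aucs: Sequence[float],
) -> List[float]:
    """Combine per-classifier probabilities with AUC-proportional weights.

    Assumption (not stated in the abstract): weight_k = auc_k / sum(aucs).
    """
    total = sum(aucs)
    weights = [a / total for a in aucs]
    n_samples = len(probs_per_classifier[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, probs_per_classifier))
        for i in range(n_samples)
    ]


# Toy validation data: two classifiers scored on the same four labelled pairs.
val_labels = [1, 1, 0, 0]
val_scores_a = [0.9, 0.8, 0.3, 0.1]  # hypothetical classifier A
val_scores_b = [0.6, 0.4, 0.5, 0.2]  # hypothetical classifier B

auc_a = roc_auc(val_labels, val_scores_a)
auc_b = roc_auc(val_labels, val_scores_b)

# Probabilities the two classifiers assign to one unseen protein pair.
combined = auc_weighted_probability([[0.8], [0.1]], [auc_a, auc_b])
print(auc_a, auc_b, combined)
```

In this toy setting the classifier with the higher validation ROC-AUC contributes more to the final score, which is the intended effect of weighting by the importance of each source of homolog knowledge.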
