Predicting protein-protein interactions between human and hepatitis C virus via an ensemble learning method.

An estimated 170 million people, approximately 3% of the world population, are chronically infected with the hepatitis C virus (HCV), and more than 350,000 deaths caused by HCV are reported annually. HCV, like many other viruses, causes disease in humans by altering protein-protein interactions (PPIs) within host cells. Experimental approaches for detecting host-virus PPIs have many inherent limitations, so computational approaches to predicting these interactions are of significant importance. While many methods have been developed over the last decade to predict intra-species PPIs, predictions of inter-species PPIs such as human-HCV PPIs are rare. In this study, we developed an ensemble learning method to predict PPIs between human and HCV proteins. Our model utilises four well-established, diverse learners as base classifiers: random forest (RF), Naïve Bayes (NB), support vector machine (SVM) and multilayer perceptron (MLP). In addition, an MLP was used as a meta-learner that combines the base learners' predictions into the final prediction. To encode human and HCV proteins as feature vectors, we used six different descriptors: amino acid composition (AAC), pseudo amino acid composition (PAC), evolutionary information, network centrality measures, tissue information and post-translational modification information. To assess the predictive power of the proposed method, we assembled a benchmark dataset composed of confident positive and negative PPIs. In a 10-fold cross-validation experiment, our prediction method achieved an accuracy of 83% and a specificity of 94%. Furthermore, on an independent test set the proposed method achieved an accuracy of 84% and a specificity of 92%. When compared with the existing method, our method showed better performance. These results indicate that our method is suitable for PPI prediction in a host-pathogen context.
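
As an illustration of the simplest of the six descriptors, the following minimal sketch computes the amino acid composition of a protein (the fraction of each of the 20 standard residues) and builds a feature vector for a human-HCV protein pair. The function names, the toy sequences, and the choice to represent a pair by concatenating the two composition vectors are assumptions made for illustration; the abstract does not specify how the descriptors of the two proteins are combined.

```python
# Minimal sketch of amino acid composition (AAC) encoding.
# Assumption: a protein pair is represented by concatenating the two
# 20-dimensional AAC vectors; this is illustrative, not the authors' code.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        # No standard residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_pair(human_sequence, viral_sequence):
    """Concatenate the AAC vectors of a human and a viral protein (40 values)."""
    return amino_acid_composition(human_sequence) + amino_acid_composition(viral_sequence)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real human or HCV proteins.
    features = encode_pair("MKTAYIAKQR", "MSTNPKPQRK")
    print(len(features))
```

Each protein contributes a vector whose entries sum to 1, so the encoding does not depend on sequence length; the remaining descriptors would be appended to the same vector in a comparable way.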
