HIME: Mining and Ensembling Heterogeneous Information for Protein-Protein Interactions Prediction

Research on protein-protein interaction (PPI) data paves the way towards understanding the mechanisms of infectious diseases; however, improving the prediction performance for inter-species PPIs remains a challenge. Because a single type of sequence data, such as amino acid composition, may be insufficient for high-quality prediction of protein interactions, we investigate a broader range of heterogeneous information derived from sequence data. This paper proposes a novel framework for PPI prediction based on a Heterogeneous Information Mining and Ensembling (HIME) process that learns effectively from interaction data. In particular, the proposed approach combines an ensemble process with a substantial set of features to improve performance on the PPI prediction task. The framework is validated on real protein interaction datasets. Extensive experiments show that HIME outperforms all existing methods reported in the literature so far.
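The general idea of combining several sequence-derived feature views through an ensemble can be illustrated with a minimal sketch. This is not the HIME implementation: the abstract does not specify the features or the learners, so the two views (amino acid composition and a coarse physicochemical grouping), the nearest-centroid scorer, the score-averaging rule, and the toy sequences below are all assumptions chosen only to keep the example self-contained.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical coarse grouping of residues, for illustration only.
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "GSTYNQP",
    "positive": "KRH",
    "negative": "DE",
}

def aa_composition(seq):
    """View 1: fraction of each of the 20 standard amino acids."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts[a] / n for a in AMINO_ACIDS]

def group_composition(seq):
    """View 2: fraction of residues falling in each coarse group."""
    n = max(len(seq), 1)
    return [sum(seq.count(a) for a in members) / n for members in GROUPS.values()]

def pair_features(view, seq_a, seq_b):
    """Represent a protein pair by concatenating one view of each protein."""
    return view(seq_a) + view(seq_b)

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

class CentroidScorer:
    """Toy base learner: scores a pair by its relative distance to class centroids."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, x):
        d0 = sq_dist(x, self.centroids[0])
        d1 = sq_dist(x, self.centroids[1])
        total = d0 + d1
        # Close to the positive centroid -> score near 1.
        return d0 / total if total > 0 else 0.5

class ViewEnsemble:
    """Trains one base learner per feature view and averages their scores."""

    def __init__(self, views):
        self.views = views

    def fit(self, pairs, labels):
        self.models = [
            CentroidScorer().fit([pair_features(v, a, b) for a, b in pairs], labels)
            for v in self.views
        ]
        return self

    def predict_score(self, seq_a, seq_b):
        scores = [
            m.score(pair_features(v, seq_a, seq_b))
            for v, m in zip(self.views, self.models)
        ]
        return sum(scores) / len(scores)

# Made-up toy sequences; label 1 = interacting, 0 = non-interacting.
pairs = [
    ("KKRRKKRR", "DDEEDDEE"),
    ("KRKRKRKK", "DEDEDEDD"),
    ("AVLIAVLI", "GSTGSTGS"),
    ("AAVVLLII", "GGSSTTGG"),
]
labels = [1, 1, 0, 0]

model = ViewEnsemble([aa_composition, group_composition]).fit(pairs, labels)
print(model.predict_score("KKKKRRRR", "EEEEDDDD"))
```

In practice each view would feed a stronger learner, and the per-view scores could be combined by weighted voting or stacking rather than a plain average; the sketch only shows how heterogeneous views of the same sequence pair can be kept separate until the ensembling step.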
