Inference of protein-protein interaction networks from multiple heterogeneous data

Protein-protein interaction (PPI) prediction is a central task in achieving a better understanding of cellular and intracellular processes. Because high-throughput experimental methods are expensive and time-consuming, and are known to suffer from incompleteness and noise, many computational methods have been developed, with varying degrees of success. However, inferring PPI networks from multiple heterogeneous data sources remains a great challenge. In this work, we developed a novel method based on Approximate Bayesian Computation with modified Differential Evolution sampling (ABC-DEP) and the Regularized Laplacian (RL) kernel. The method enables inference of PPI networks from topological properties and multiple heterogeneous features, including gene expression and Pfam domain profiles, in the form of weighted kernels. The optimal weights are obtained by ABC-DEP, and the fused kernel built from these weights serves as input to RL to infer missing or new edges in the PPI network. Detailed comparisons with control methods show that the accuracy of PPI prediction, measured by AUC, increases by up to 23% compared with a baseline that does not use optimal weights. The method can provide insights into the relations between PPIs and the various feature kernels, and it demonstrates a strong capability for predicting far-away interactions that cannot be well detected by the traditional RL method.
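The kernel fusion and RL step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The weights here are fixed by hand rather than obtained by ABC-DEP, the toy matrices `A` and `F`, the parameter `alpha`, and the function names are all hypothetical, and treating the fused kernel as a weighted adjacency matrix is an assumption. The RL kernel is taken in its standard form, (I + alpha * L)^-1 with graph Laplacian L = D - W.

```python
import numpy as np

def fuse_kernels(kernels, weights):
    """Weighted sum of same-shaped symmetric kernel matrices."""
    return sum(w * np.asarray(K, dtype=float) for w, K in zip(weights, kernels))

def regularized_laplacian_kernel(W, alpha=0.5):
    """Standard RL kernel (I + alpha * L)^-1, with L = D - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian of the weighted graph
    return np.linalg.inv(np.eye(W.shape[0]) + alpha * L)

# Toy known PPI network on 4 proteins: a path 0-1-2-3 (hypothetical data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Hypothetical feature kernel (e.g. a similarity derived from domain profiles).
F = np.array([[0.0, 0.2, 0.1, 0.9],
              [0.2, 0.0, 0.3, 0.1],
              [0.1, 0.3, 0.0, 0.2],
              [0.9, 0.1, 0.2, 0.0]])

# Hand-picked weights; in the paper these would come from ABC-DEP.
fused = fuse_kernels([A, F], weights=[0.7, 0.3])
K = regularized_laplacian_kernel(fused, alpha=0.5)

# Rank protein pairs that are not yet edges in A by their RL kernel score.
n = A.shape[0]
candidates = sorted(
    ((K[i, j], i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] == 0),
    reverse=True,
)
for score, i, j in candidates:
    print(f"pair ({i}, {j}): score {score:.4f}")
```

Higher-scoring pairs are the more plausible candidate interactions under this sketch; changing the weights changes how much the feature kernel can promote pairs that are far apart in the known network.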
