Protein-protein interaction network inference from multiple kernels with optimization based on random walk by linear programming

Reconstruction of protein-protein interaction (PPI) networks is a central task in systems biology. Inference from multiple heterogeneous data sources is a promising computational approach to de novo PPI prediction because it leverages both complementary information and the partial network structure. However, learning weights for heterogeneous data sources quickly remains a challenge. In this work, we developed a method that infers de novo PPIs by combining multiple data sources represented as kernels and by obtaining optimal weights from a random walk over the existing partial network. The method uses the Baker algorithm and the training data to construct a transition matrix that constrains how a random walk traverses the partial network. Multiple heterogeneous features of the proteins in the network, including gene expression and Pfam domain profiles, are then combined into a weighted kernel. This kernel provides a new “adjacency matrix” for the whole network, but it is required to agree with the transition matrix on the training subnetwork. The requirement is met by adjusting the weights to minimize the element-wise difference between the transition matrix and the weighted kernel, and the minimization problem is solved by linear programming. The weighted kernel is then transformed into a regularized Laplacian (RL) kernel, which is used to infer missing or new edges in the PPI network. Results on synthetic data and on real data from yeast show that the accuracy of PPI prediction, measured by AUC, increases by up to 19% compared with a control method that does not use optimal weights. Moreover, the weights learned by our method, Weight Optimization by Linear Programming (WOLP), are highly consistent with those learned by sampling, and they provide insight into the relations between PPIs and the various feature kernels, thereby improving PPI prediction.
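The weight-fitting and kernel-transformation steps can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the element-wise difference is measured as a sum of absolute values, that the weights are nonnegative, and that the RL kernel takes the common form (I + αL)^-1 with L = D - A. The target matrix here is synthetic and only stands in for the transition matrix; the Baker-algorithm step is not reproduced. The names `fit_kernel_weights`, `regularized_laplacian_kernel`, and `alpha` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_kernel_weights(target, kernels):
    """L1-fit nonnegative weights w so that sum_k w[k] * kernels[k] ~ target.

    LP variables are [w_1..w_m, t_1..t_p]; each t_i bounds the absolute
    residual of one matrix entry, and the objective is sum(t).
    """
    m = len(kernels)
    y = target.ravel()
    X = np.column_stack([K.ravel() for K in kernels])  # shape (p, m)
    p = y.size
    c = np.concatenate([np.zeros(m), np.ones(p)])
    eye = np.eye(p)
    # X w - t <= y  and  -X w - t <= -y  together encode |y - X w| <= t
    A_ub = np.block([[X, -eye], [-X, -eye]])
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (m + p))
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m]


def regularized_laplacian_kernel(adjacency, alpha=0.1):
    """Assumed RL form: (I + alpha * L)^-1 with L = D - A, A symmetrized."""
    A = (adjacency + adjacency.T) / 2.0
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(A.shape[0]) + alpha * L)


# Synthetic demo: two random symmetric "feature kernels" on 6 proteins,
# with the first 4 proteins playing the role of the training subnetwork.
rng = np.random.default_rng(0)
n, n_train = 6, 4
kernels = []
for _ in range(2):
    B = rng.random((n, n))
    kernels.append((B + B.T) / 2.0)

train = slice(0, n_train)
# Stand-in for the transition matrix (assumption: a known mix of the kernels).
target = 0.7 * kernels[0][train, train] + 0.3 * kernels[1][train, train]

weights = fit_kernel_weights(target, [K[train, train] for K in kernels])

# Weighted kernel over the whole network, used as a new adjacency matrix.
combined = sum(w * K for w, K in zip(weights, kernels))
np.fill_diagonal(combined, 0.0)  # drop self-loops before building L
rl = regularized_laplacian_kernel(combined, alpha=0.1)
print(np.round(weights, 3))
```

In this toy setting the target is an exact mixture of the kernels, so the linear program should recover the mixing weights. With a real transition matrix the fit would only be approximate, and entries of `rl` for protein pairs outside the training subnetwork would serve as scores for candidate edges.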
