A Hybrid Rule-Induction/Likelihood-Ratio Based Approach for Predicting Protein-Protein Interactions

We propose a new hybrid data mining method for predicting protein-protein interactions that combines likelihood ratios with rule induction algorithms. In essence, the method uses a rule induction algorithm to discover rules representing partitions of the data; the discovered rules are then interpreted as "bins", which are used to compute likelihood ratios. The method is applied to the prediction of protein-protein interactions in the Saccharomyces cerevisiae genome, using predictive genomic features in an integrated scheme. The results show that the new hybrid method outperforms a pure likelihood-ratio-based approach.
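The rules-as-bins idea can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper. The feature names (`coexpr`, `shared_function`), the hand-written ordered rule list standing in for the output of a rule induction algorithm, the toy data, and the add-one smoothing are all assumptions made for illustration. Each protein pair falls into the bin of the first rule that covers it, and the likelihood ratio of a bin is estimated as P(bin | interacting) / P(bin | non-interacting).

```python
from collections import Counter

# Hypothetical ordered rule list, standing in for the output of a rule
# induction algorithm. Each entry is (bin name, predicate); first match wins.
RULES = [
    ("high_coexpr_and_shared_function",
     lambda x: x["coexpr"] >= 0.5 and x["shared_function"]),
    ("high_coexpr", lambda x: x["coexpr"] >= 0.5),
    ("default", lambda x: True),  # catch-all so every pair lands in a bin
]


def assign_bin(example, rules=RULES):
    """Return the name of the first rule that covers the example."""
    for name, covers in rules:
        if covers(example):
            return name
    raise ValueError("rule list must end with a default rule")


def likelihood_ratios(examples, labels, rules=RULES, pseudocount=1.0):
    """Estimate LR(bin) = P(bin | positive) / P(bin | negative) per rule.

    The pseudocount (add-one smoothing, an assumption of this sketch)
    keeps the ratio finite when a bin holds no negative examples.
    """
    pos, neg = Counter(), Counter()
    for x, is_interacting in zip(examples, labels):
        (pos if is_interacting else neg)[assign_bin(x, rules)] += 1
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    k = len(rules)
    return {
        name: ((pos[name] + pseudocount) / (n_pos + pseudocount * k))
        / ((neg[name] + pseudocount) / (n_neg + pseudocount * k))
        for name, _ in rules
    }


# Toy protein pairs: (co-expression score, shares a functional annotation).
raw = [(0.9, True), (0.8, True), (0.7, False), (0.2, False),      # interacting
       (0.6, False), (0.1, False), (0.3, True), (0.2, False),
       (0.1, False), (0.4, False)]                                # non-interacting
examples = [{"coexpr": c, "shared_function": s} for c, s in raw]
labels = [True] * 4 + [False] * 6

lr = likelihood_ratios(examples, labels)
for name, value in lr.items():
    print(f"{name}: LR = {value:.3f}")
```

A bin with a likelihood ratio above 1 raises the odds that a pair falling into it interacts, and a ratio below 1 lowers them. In this sketch a new pair would be scored by looking up the ratio of the bin returned by `assign_bin`.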
