Ranking Relations using Analogies in Biological and Information Networks

Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects S = {A(1) : B(1), A(2) : B(2), …, A(N) : B(N)}, measures how well other pairs A : B fit in with the set S. Our work addresses the following question: is the relation between objects A and B analogous to the relations found in S? Such questions are particularly relevant in information retrieval, where an investigator might want to search for pairs of objects analogous to those in a query set of interest. Objects can be related in many ways, which makes measuring analogies challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of these relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application to discovering functional interactions between pairs of proteins is discussed in detail, where we show that our approach can work in practice even when only a small set of protein pairs is provided.
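As a rough illustration of the kind of ranking described above, the following sketch scores a candidate pair A : B by how much more probable a link becomes once the query set S has been observed. It is not the authors' implementation. Several choices are assumptions made only for illustration: the pair representation (`pair_features`, an element-wise product of object features), the logistic link model with a Gaussian prior, the Laplace approximation to the posterior, and all function names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_features(a, b):
    # Hypothetical pair representation: element-wise product of object features.
    return a * b

def laplace_posterior(Phi, prior_mean, prior_cov, n_iter=50):
    """Gaussian (Laplace) approximation to the posterior over the weights,
    assuming every row of Phi is a linked pair from the query set S."""
    prior_prec = np.linalg.inv(prior_cov)
    theta = prior_mean.copy()
    for _ in range(n_iter):
        p = sigmoid(Phi @ theta)
        # Gradient of the log posterior when all labels are 1 (linked).
        grad = Phi.T @ (1.0 - p) - prior_prec @ (theta - prior_mean)
        # Negative Hessian of the log posterior.
        neg_hess = (Phi.T * (p * (1.0 - p))) @ Phi + prior_prec
        theta = theta + np.linalg.solve(neg_hess, grad)  # Newton step
    p = sigmoid(Phi @ theta)
    neg_hess = (Phi.T * (p * (1.0 - p))) @ Phi + prior_prec
    return theta, np.linalg.inv(neg_hess)

def log_predictive(phi, mean, cov):
    # Standard approximation to E[sigmoid(theta . phi)] for theta ~ N(mean, cov).
    mu = phi @ mean
    var = phi @ cov @ phi
    return np.log(sigmoid(mu / np.sqrt(1.0 + np.pi * var / 8.0)))

def analogy_score(phi, post, prior):
    # Log ratio: link probability given S versus link probability a priori.
    return log_predictive(phi, *post) - log_predictive(phi, *prior)

# Toy query set S of three linked pairs (object feature vectors are made up).
S = [(np.array([1.0, 0.0]), np.array([1.0, 0.0])),
     (np.array([2.0, 0.0]), np.array([0.5, 1.0])),
     (np.array([1.0, 1.0]), np.array([1.0, 0.0]))]
Phi = np.array([pair_features(a, b) for a, b in S])

prior = (np.zeros(2), np.eye(2))
post = laplace_posterior(Phi, *prior)

# One candidate resembling the pairs in S, one pointing the opposite way.
score_good = analogy_score(pair_features(np.array([1.0, 0.0]), np.array([1.0, 1.0])), post, prior)
score_bad = analogy_score(pair_features(np.array([-1.0, 0.0]), np.array([1.0, 0.0])), post, prior)
print(score_good, score_bad)
```

Sorting candidate pairs by `analogy_score` in decreasing order gives a ranking. In this toy setup, a pair whose features resemble those in S receives a positive score and an opposed pair a negative one.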
