Evaluating Joint Modeling of Yeast Biology Literature and Protein-Protein Interaction Networks

Block-LDA is a topic modeling approach to perform data fusion between entity-annotated text documents and graphs with entity-entity links. We evaluate Block-LDA in the yeast biology domain by jointly modeling PubMed® articles and yeast protein-protein interaction networks. The topic coherence of the emergent topics and the ability of the model to retrieve relevant scientific articles and proteins related to the topic are compared to that of a text-only approach that does not make use of the protein-protein interaction matrix. Evaluation of the results by biologists show that the joint modeling results in better topic coherence and improves retrieval performance in the task of identifying top related papers and proteins.

[1]  Samuel Kaski,et al.  A Block Model Suitable for Sparse Graphs , 2009 .

[2]  William W. Cohen,et al.  Block-LDA: Jointly Modeling Entity-Annotated Text and Entity-Entity Links , 2014, Handbook of Mixed Membership Models and Their Applications.

[3]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[4]  William W. Cohen,et al.  Block-LDA: Jointly Modeling Entity-Annotated Text and Entity-Entity Links , 2014, Handbook of Mixed Membership Models and Their Applications.

[5]  Ramesh Nallapati,et al.  Joint latent topic models for text and citations , 2008, KDD.

[6]  Dmitrij Frishman,et al.  MIPS: analysis and annotation of proteins from whole genomes in 2005 , 2005, Nucleic Acids Res..

[7]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[8]  Kara Dolinski,et al.  Saccharomyces genome database: Underlying principles and organisation , 2004, Briefings Bioinform..

[9]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.