A simple infinite topic mixture for rich graphs and relational data

We propose a simple component, or “topic”, model for relational data, that is, for heterogeneous collections of co-occurrences between categorical variables. Graphs are a special case: collections of dyadic co-occurrences (edges) over a set of vertices. The model is especially suitable for finding global components in collections of massively heterogeneous data, where encoding all the relations into a more sophisticated model becomes cumbersome, and for quick-and-dirty modeling of graphs enriched with, e.g., link properties or nodal attributes. Here the model is estimated with collapsed Gibbs sampling, which allows sparse data structures and good memory efficiency on large data sets; other inference methods should be straightforward to implement. We demonstrate the model on several medium-sized data sets (scientific citation data, MovieLens ratings, protein interactions), with brief comparisons to a full relational model and to other approaches.
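The sketch below illustrates collapsed Gibbs sampling for one plausible reading of such a model. It assumes the following: each co-occurrence is a tuple of categorical values (for a graph, an edge `(u, v)`); the whole tuple is drawn from a single component; each component has an independent categorical distribution per tuple position with a symmetric Dirichlet(`beta`) prior; and the "infinite" mixture is a Dirichlet process with concentration `alpha`. The function `collapsed_gibbs` and its parameters are hypothetical. This is not the authors' implementation, and their priors or sampler details may differ. Because the component parameters are integrated out, the sampler state is just the assignments `z` plus count tables. These tables are kept in dictionaries that only hold values actually observed, which is the kind of sparse structure the abstract alludes to.

```python
import random
from collections import defaultdict


def collapsed_gibbs(data, sizes, alpha=1.0, beta=0.1, iters=50, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for a Dirichlet-process
    mixture in which every tuple comes from one component, and each tuple
    position has its own categorical distribution with a Dirichlet(beta) prior.

    data  : list of tuples, e.g. edges (u, v) or enriched edges (u, v, label)
    sizes : number of possible categories at each tuple position
    Returns one component label per tuple.
    """
    rng = random.Random(seed)
    n_vars = len(sizes)
    z = [0] * len(data)                   # start with all tuples in component 0
    comp_n = defaultdict(int)             # n_k: number of tuples in component k
    # counts[k][d][x]: how often value x occurs at position d in component k
    # (sparse: only values seen in that component get an entry)
    counts = defaultdict(lambda: [defaultdict(int) for _ in range(n_vars)])
    for obs in data:
        comp_n[0] += 1
        for d, x in enumerate(obs):
            counts[0][d][x] += 1
    next_k = 1                            # label to use for a brand-new component

    for _ in range(iters):
        for i, obs in enumerate(data):
            # Remove tuple i from its current component.
            k_old = z[i]
            comp_n[k_old] -= 1
            for d, x in enumerate(obs):
                counts[k_old][d][x] -= 1
            if comp_n[k_old] == 0:        # drop empty components
                del comp_n[k_old]
                del counts[k_old]

            # Weight of each existing component: n_k times the product of
            # Dirichlet-categorical predictive probabilities per position.
            ks = list(comp_n.keys())
            weights = []
            for k in ks:
                w = comp_n[k]
                for d, x in enumerate(obs):
                    w *= (counts[k][d][x] + beta) / (comp_n[k] + sizes[d] * beta)
                weights.append(w)
            # Weight of a new component: alpha times the prior predictive,
            # which is 1 / sizes[d] at each position under a symmetric prior.
            w_new = alpha
            for d in range(n_vars):
                w_new /= sizes[d]
            ks.append(next_k)
            weights.append(w_new)

            # Sample a component and add tuple i back to it.
            k_new = rng.choices(ks, weights=weights)[0]
            if k_new == next_k:
                next_k += 1
            z[i] = k_new
            comp_n[k_new] += 1
            for d, x in enumerate(obs):
                counts[k_new][d][x] += 1
    return z
```

For example, `collapsed_gibbs([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)] * 5, sizes=[6, 6])` returns one component label per edge of a small graph on six vertices. An enriched graph fits the same interface by appending attributes to each tuple, e.g. `(u, v, link_type)` with a third entry in `sizes`. Multiplying raw probabilities can underflow when tuples are long or vocabularies are large, so a practical sampler would work in log space.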
