Discovering Relevance-Dependent Bicluster Structure from Relational Data

In this paper, we propose a statistical model for relevance-dependent biclustering of relational data. The proposed model factorizes relational data into a bicluster structure with two features: (1) each object in a cluster has a relevance value, which indicates how strongly the object relates to the cluster, and (2) every cluster is related to at least one dense block. These features simplify the task of understanding the meaning of each cluster because only a few highly relevant objects need to be inspected. We introduce the Relevance-Dependent Bernoulli Distribution (R-BD) as a prior for relevance-dependent binary matrices and propose the novel Relevance-Dependent Infinite Biclustering (R-IB) model, which automatically estimates the number of clusters. Posterior inference can be performed efficiently using a collapsed Gibbs sampler because the parameters of the R-IB model can be fully marginalized out. Experimental results show that the R-IB model extracts more essential bicluster structure with better computational efficiency than conventional models. We further observe that the biclustering results obtained by R-IB facilitate interpretation of the meaning of each cluster.
