A Block Coclustering Model for Pattern Discovering in Users' Preference Data

This paper provides a principled probabilistic co-clustering framework for missing value prediction and pattern discovery in users’ preference data. We extend the original dyadic formulation of the Block Mixture Model (BMM) to take explicit user preferences into account. BMM simultaneously identifies user communities and item categories: each user is modeled as a mixture over user communities, which is computed by taking into account the user’s preferences on similar items. Dually, item categories are detected by considering the preferences expressed by like-minded users. This recursive formulation highlights the mutual relationships between items and users, which are then used to uncover the hidden block structure of the data. We next show how to characterize and summarize each block cluster by exploiting additional metadata and by analyzing the underlying topic distribution, demonstrating the effectiveness of the approach in pattern discovery tasks.
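The alternating idea described above can be illustrated with a minimal sketch. It is not the BMM of this paper: it swaps the probabilistic mixture and its EM-style inference for hard assignments and squared error, and it predicts a missing rating as the mean of the block it falls in. The function name `block_cocluster`, its parameters, and the toy rating matrix are all assumptions made for illustration.

```python
import numpy as np

def block_cocluster(R, mask, n_user_groups, n_item_groups, n_iters=20, seed=0):
    """Hard-assignment block co-clustering of a partially observed rating matrix.

    Simplified stand-in for a block mixture model (hypothetical helper).
    R    : (n_users, n_items) float ratings; entries where mask is False are ignored
    mask : boolean array of the same shape, True where a rating was observed
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    R0 = np.where(mask, R, 0.0)                     # zero out unobserved entries
    z = rng.integers(n_user_groups, size=n_users)   # user -> community
    w = rng.integers(n_item_groups, size=n_items)   # item -> category
    global_mean = R[mask].mean()

    def block_means(z, w):
        Z = np.eye(n_user_groups)[z]                # one-hot user assignments
        W = np.eye(n_item_groups)[w]                # one-hot item assignments
        sums = Z.T @ R0 @ W
        counts = Z.T @ mask.astype(float) @ W
        # blocks with no observed rating fall back to the global mean
        return np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)

    for _ in range(n_iters):
        # Users are reassigned given the current item categories ...
        mu = block_means(z, w)
        pred = mu[:, w]                             # (n_user_groups, n_items)
        cost_u = (mask[:, None, :] * (R0[:, None, :] - pred[None]) ** 2).sum(axis=2)
        z = cost_u.argmin(axis=1)
        # ... and, dually, items are reassigned given the user communities.
        mu = block_means(z, w)
        pred = mu[z, :]                             # (n_users, n_item_groups)
        cost_i = (mask[:, :, None] * (R0[:, :, None] - pred[:, None, :]) ** 2).sum(axis=0)
        w = cost_i.argmin(axis=1)

    return z, w, block_means(z, w)

# Toy data (made up): two user groups with opposite tastes, two ratings hidden.
R = np.array([[5, 5, 5, 1, 1, 1]] * 3 + [[1, 1, 1, 5, 5, 5]] * 3, dtype=float)
mask = np.ones_like(R, dtype=bool)
mask[0, 1] = False
mask[4, 3] = False

z, w, mu = block_cocluster(R, mask, n_user_groups=2, n_item_groups=2)
pred = mu[z][:, w]   # predicted rating for every (user, item) pair
```

Because the result depends on the random initialization, the sketch may settle in a poor local optimum; running it with several seeds and keeping the partition with the lowest error on observed ratings is a common remedy.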
