Guess Who Rated This Movie: Identifying Users Through Subspace Clustering

In online recommender systems, multiple users often share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to improvements in personalized recommendations, but it also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering to carry out the identification task. We show that a significant fraction of such accounts can be identified reliably, and illustrate potential uses for personalized recommendation.
