Relational learning via collective matrix factorization

Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among those entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations encode users' ratings of movies, movies' genres, and actors' roles in movies. A common prediction technique given one pairwise relation, for example a #users × #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors whenever an entity participates in multiple relations. Since each relation can have a different value type and error distribution, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to handle large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. It can handle any pairwise relational schema and a wide variety of error models. We demonstrate its efficiency, as well as the benefit of sharing parameters among relations.
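To make the idea of sharing factors across relations concrete, the sketch below jointly factors two synthetic matrices that share the "movies" entity: a users-by-movies matrix X ≈ U Vᵀ and a movies-by-genres matrix Y ≈ V Zᵀ, with V shared between the two factorizations. It is a minimal illustration only: it uses the identity link and squared loss (one particular Bregman divergence) and plain alternating ridge-regression updates rather than the paper's per-row Newton step or stochastic optimization, and all names (X, Y, U, V, Z, rank, lam) are illustrative assumptions, not the paper's notation.

```python
# Minimal collective matrix factorization sketch: two relations sharing the
# "movies" entity, squared loss, alternating ridge-regression updates.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_genres, rank, lam = 50, 40, 8, 5, 0.1

# Synthetic observed relations (stand-ins for ratings and genre indicators).
X = rng.normal(size=(n_users, n_movies))   # users x movies
Y = rng.normal(size=(n_movies, n_genres))  # movies x genres

# Latent factors; V is shared because "movies" participates in both relations.
U = rng.normal(scale=0.1, size=(n_users, rank))
V = rng.normal(scale=0.1, size=(n_movies, rank))
Z = rng.normal(scale=0.1, size=(n_genres, rank))

def ridge_solve(A, B, lam):
    """Return W minimizing ||B - A W^T||_F^2 + lam ||W||_F^2."""
    G = A.T @ A + lam * np.eye(A.shape[1])
    return np.linalg.solve(G, A.T @ B).T

for it in range(50):
    # With V fixed, U and Z are each an ordinary ridge regression.
    U = ridge_solve(V, X.T, lam)   # fit X   ~ U V^T
    Z = ridge_solve(V, Y, lam)     # fit Y   ~ V Z^T
    # The shared factor V is fit against both relations at once by stacking
    # their rows: [X; Y^T] ~ [U; Z] V^T.
    A = np.vstack([U, Z])          # (n_users + n_genres) x rank
    B = np.vstack([X, Y.T])        # (n_users + n_genres) x n_movies
    V = ridge_solve(A, B, lam)
    loss = np.linalg.norm(X - U @ V.T) ** 2 + np.linalg.norm(Y - V @ Z.T) ** 2

print(f"final squared reconstruction error: {loss:.2f}")
```

Because V appears in both reconstruction terms, information in Y (e.g. genres) shapes the movie factors used to predict X (e.g. ratings); swapping in other Bregman divergences and link functions would follow the same alternating structure, with the closed-form ridge step replaced by a per-row Newton update.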
