Expectation-Maximization for Learning Determinantal Point Processes

A determinantal point process (DPP) is a probabilistic model of set diversity that is compactly parameterized by a positive semi-definite kernel matrix. To fit a DPP to a given task, we would like to learn the entries of its kernel matrix by maximizing the log-likelihood of the available data. However, the log-likelihood is non-convex in the entries of the kernel matrix, and this learning problem is conjectured to be NP-hard [1]. Previous work has therefore focused on more restricted, convex learning settings: learning only a single weight for each row of the kernel matrix [2], or learning weights for a linear combination of DPPs with fixed kernel matrices [3]. In this work we propose a novel algorithm for learning the full kernel matrix. By changing the kernel parameterization from matrix entries to eigenvalues and eigenvectors, and then lower-bounding the likelihood in the manner of expectation-maximization algorithms, we obtain an effective optimization procedure. We test our method on a real-world product recommendation task and achieve relative gains of up to 16.5% in test log-likelihood compared to the naive approach of maximizing likelihood by projected gradient ascent on the entries of the kernel matrix.
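To make the objects in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the standard L-ensemble form of a DPP, P(Y) = det(L_Y) / det(L + I), and shows the data log-likelihood, the eigenvalue/eigenvector parameterization of the kernel, and the projection onto the positive semi-definite cone that a projected gradient baseline would apply after each step. The function names (`dpp_log_likelihood`, `project_psd`, `kernel_from_eigen`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def dpp_log_likelihood(L, subsets):
    """Mean log-likelihood of observed subsets under an L-ensemble DPP.

    Assumes P(Y) = det(L_Y) / det(L + I), with det of the empty
    submatrix taken to be 1.
    """
    n = L.shape[0]
    # Log of the normalizer det(L + I).
    _, log_norm = np.linalg.slogdet(L + np.eye(n))
    total = 0.0
    for Y in subsets:
        idx = list(Y)
        if idx:
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign <= 0:
                # Subset has zero (or numerically invalid) probability.
                return -np.inf
        else:
            logdet = 0.0
        total += logdet - log_norm
    return total / len(subsets)


def kernel_from_eigen(eigvals, eigvecs):
    """Rebuild a symmetric kernel from eigenvalues and orthonormal eigenvectors."""
    return (eigvecs * eigvals) @ eigvecs.T


def project_psd(M):
    """Project a square matrix onto the PSD cone (Frobenius norm).

    Symmetrizes M, then clips negative eigenvalues to zero.
    """
    sym = (M + M.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    return kernel_from_eigen(np.clip(eigvals, 0.0, None), eigvecs)
```

In this sketch the kernel is fully determined by `eigvals` and `eigvecs`, which is the change of variables the abstract describes; the lower bound and the update rules of the proposed method are not reproduced here.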

[1] Alan Edelman, et al. The Geometry of Algorithms with Orthogonality Constraints, 1998, SIAM J. Matrix Anal. Appl.

[2] Boris Polyak, et al. Constrained minimization methods, 1966.

[3] Ben Taskar, et al. Learning the Parameters of Determinantal Point Process Kernels, 2014, ICML.

[4] Y. Peres, et al. Determinantal Processes and Independence, 2005, math/0503110.

[5] Byungkon Kang, et al. Fast Determinantal Point Process Sampling with Application to Clustering, 2013, NIPS.

[6] Jérôme Malick, et al. Projection methods for conic feasibility problems: applications to polynomial sum-of-squares decompositions, 2011, Optim. Methods Softw.

[7] Andreas Krause, et al. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies, 2008, J. Mach. Learn. Res.

[8] Ben Taskar, et al. Approximate Inference in Continuous Determinantal Point Processes, 2013, ArXiv.

[9] Ben Taskar, et al. k-DPPs: Fixed-Size Determinantal Point Processes, 2011, ICML.

[10] M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.

[11] Alexei Borodin, et al. Determinantal point processes, 2009, 0911.1153.

[12] Tim Roughgarden, et al. Revenue submodularity, 2009, EC '09.

[13] Ben Taskar, et al. Discovering Diverse and Salient Threads in Document Collections, 2012, EMNLP.

[14] Andreas Krause, et al. Near-optimal Nonmyopic Value of Information in Graphical Models, 2005, UAI.

[15] A. James. Distributions of Matrix Variates and Latent Roots Derived from Normal Samples, 1964.

[16] Zoubin Ghahramani, et al. Determinantal Clustering Processes - A Nonparametric Bayesian Approach to Kernel Based Semi-Supervised Clustering, 2013, UAI.

[17] Ben Taskar, et al. Near-Optimal MAP Inference for Determinantal Point Processes, 2012, NIPS.

[18] Zoubin Ghahramani, et al. Determinantal clustering process - a nonparametric Bayesian approach to kernel based semi-supervised clustering, 2013, UAI 2013.

[19] Kaare Brandt Petersen, et al. The Matrix Cookbook, 2006.

[20] Ben Taskar, et al. Approximate Inference in Continuous Determinantal Processes, 2013, NIPS.

[21] O. Macchi. The coincidence approach to stochastic point processes, 1975, Advances in Applied Probability.

[22] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[23] Jasper Snoek, et al. A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data, 2013, NIPS.

[24] Hui Lin, et al. Learning Mixtures of Submodular Shells with Application to Document Summarization, 2012, UAI.

[25] Ben Taskar, et al. Learning Determinantal Point Processes, 2011, UAI.

[26] Michael R. Harwell, et al. Computing Elementary Symmetric Functions and Their Derivatives: A Didactic, 1996.

[27] Ben Taskar, et al. Structured Determinantal Point Processes, 2010, NIPS.

[28] Alex Kulesza, et al. Markov Determinantal Point Processes, 2012, UAI.

[29] Ryan P. Adams, et al. Priors for Diversity in Generative Latent Variable Models, 2012, NIPS.

[30] Ben Taskar, et al. Determinantal Point Processes for Machine Learning, 2012, Found. Trends Mach. Learn.

[31] Ben Taskar, et al. Nystrom Approximation for Large-Scale Determinantal Processes, 2013, AISTATS.