Expectation-Maximization for Learning Determinantal Point Processes

A determinantal point process (DPP) is a probabilistic model of set diversity compactly parameterized by a positive semi-definite kernel matrix. To fit a DPP to a given task, we would like to learn the entries of its kernel matrix by maximizing the log-likelihood of the available data. However, log-likelihood is non-convex in the entries of the kernel matrix, and this learning problem is conjectured to be NP-hard [1]. Thus, previous work has instead focused on more restricted convex learning settings: learning only a single weight for each row of the kernel matrix [2], or learning weights for a linear combination of DPPs with fixed kernel matrices [3]. In this work we propose a novel algorithm for learning the full kernel matrix. By changing the kernel parameterization from matrix entries to eigenvalues and eigenvectors, and then lower-bounding the likelihood in the manner of expectation-maximization algorithms, we obtain an effective optimization procedure. We test our method on a real-world product recommendation task, and achieve relative gains of up to 16.5% in test log-likelihood compared to the naive approach of maximizing likelihood by projected gradient ascent on the entries of the kernel matrix.
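The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard L-ensemble form of a DPP, where a subset Y has probability det(L_Y) / det(L + I); the paper's own parameterization and update rules may differ. The function names `dpp_log_likelihood`, `project_psd`, and `kernel_from_eigen` are hypothetical, chosen only for this illustration. The sketch shows the log-likelihood being maximized, the projection step that a projected-gradient baseline would need to keep the kernel positive semi-definite, and the reconstruction of a kernel from eigenvalues and eigenvectors. It does not implement the EM procedure itself.

```python
import numpy as np


def dpp_log_likelihood(L, subsets):
    """Sum over observed subsets Y of log det(L_Y) - log det(L + I).

    Assumes the L-ensemble form of a DPP (an assumption of this sketch).
    """
    n = L.shape[0]
    _, logdet_norm = np.linalg.slogdet(L + np.eye(n))
    total = 0.0
    for Y in subsets:
        idx = np.asarray(Y, dtype=int)
        if idx.size == 0:
            logdet_Y = 0.0  # the determinant of an empty matrix is 1
        else:
            sign, logdet_Y = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign <= 0:
                return -np.inf  # subset has zero probability under L
        total += logdet_Y - logdet_norm
    return total


def project_psd(M):
    """Project a matrix onto the PSD cone by clipping negative eigenvalues."""
    S = (M + M.T) / 2.0  # symmetrize first
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None)) @ V.T


def kernel_from_eigen(eigvals, eigvecs):
    """Rebuild a kernel from eigenvalues and orthonormal eigenvector columns."""
    return (eigvecs * eigvals) @ eigvecs.T
```

Working with eigenvalues and eigenvectors directly means positive semi-definiteness reduces to keeping the eigenvalues non-negative, whereas an update on raw matrix entries generally needs an explicit projection such as `project_psd` after each step.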

Ben Taskar | Alex Kulesza | Emily B. Fox | Jennifer Gillenwater

[1] Alan Edelman, et al. The Geometry of Algorithms with Orthogonality Constraints, 1998, SIAM J. Matrix Anal. Appl.

[2] Boris Polyak, et al. Constrained minimization methods, 1966.

[3] Ben Taskar, et al. Learning the Parameters of Determinantal Point Process Kernels, 2014, ICML.

[4] Y. Peres, et al. Determinantal Processes and Independence, 2005, math/0503110.

[5] Byungkon Kang, et al. Fast Determinantal Point Process Sampling with Application to Clustering, 2013, NIPS.

[6] Jérôme Malick, et al. Projection methods for conic feasibility problems: applications to polynomial sum-of-squares decompositions, 2011, Optim. Methods Softw.

[7] Andreas Krause, et al. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies, 2008, J. Mach. Learn. Res.

[8] Ben Taskar, et al. Approximate Inference in Continuous Determinantal Point Processes, 2013, ArXiv.

[9] Ben Taskar, et al. k-DPPs: Fixed-Size Determinantal Point Processes, 2011, ICML.

[10] M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.

[11] Alexei Borodin, et al. Determinantal point processes, 2009, 0911.1153.

[12] Tim Roughgarden, et al. Revenue submodularity, 2009, EC '09.

[13] Ben Taskar, et al. Discovering Diverse and Salient Threads in Document Collections, 2012, EMNLP.

[14] Andreas Krause, et al. Near-optimal Nonmyopic Value of Information in Graphical Models, 2005, UAI.

[15] A. James. Distributions of Matrix Variates and Latent Roots Derived from Normal Samples, 1964.

[16] Zoubin Ghahramani, et al. Determinantal Clustering Processes - A Nonparametric Bayesian Approach to Kernel Based Semi-Supervised Clustering, 2013, UAI.

[17] Ben Taskar, et al. Near-Optimal MAP Inference for Determinantal Point Processes, 2012, NIPS.

[18] Zoubin Ghahramani, et al. Determinantal clustering process - a nonparametric Bayesian approach to kernel based semi-supervised clustering, 2013, UAI 2013.

[19] Kaare Brandt Petersen, et al. The Matrix Cookbook, 2006.

[20] Ben Taskar, et al. Approximate Inference in Continuous Determinantal Processes, 2013, NIPS.

[21] O. Macchi. The coincidence approach to stochastic point processes, 1975, Advances in Applied Probability.

[22] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[23] Jasper Snoek, et al. A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data, 2013, NIPS.

[24] Hui Lin, et al. Learning Mixtures of Submodular Shells with Application to Document Summarization, 2012, UAI.

[25] Ben Taskar, et al. Learning Determinantal Point Processes, 2011, UAI.

[26] Michael R. Harwell, et al. Computing Elementary Symmetric Functions and Their Derivatives: A Didactic, 1996.

[27] Ben Taskar, et al. Structured Determinantal Point Processes, 2010, NIPS.

[28] Alex Kulesza, et al. Markov Determinantal Point Processes, 2012, UAI.

[29] Ryan P. Adams, et al. Priors for Diversity in Generative Latent Variable Models, 2012, NIPS.

[30] Ben Taskar, et al. Determinantal Point Processes for Machine Learning, 2012, Found. Trends Mach. Learn.

[31] Ben Taskar, et al. Nyström Approximation for Large-Scale Determinantal Processes, 2013, AISTATS.