Paper Information - Learning Determinantal Point Processes

Learning Determinantal Point Processes

The increasing availability of both interesting data and processing capacity has led to widespread interest in machine learning techniques that deal with complex, structured output spaces in fields like image processing, computational biology, and natural language processing. By making multiple interrelated decisions at once, these methods can achieve far better performance than is possible when treating each decision in isolation. However, accounting for the complexity of the output space is also a significant computational burden that must be balanced against the modeling advantages. Graphical models, for example, offer efficient approximations when considering only local, positive interactions. The popularity of graphical models attests to the fact that these restrictions can be a good fit in some cases, but there are also many other interesting tasks for which we need new models with new assumptions. In this thesis we show how determinantal point processes (DPPs) can be used as probabilistic models for binary structured problems characterized by global, negative interactions. Samples from a DPP correspond to subsets of a fixed ground set, for instance, the documents in a corpus or possible locations of objects in an image, and their defining characteristic is a tendency to be diverse. Thus, DPPs can be used to choose diverse sets of high-quality search results, to build informative summaries by selecting diverse sentences from documents, or to model non-overlapping human poses in images or video. DPPs arise in quantum physics and random matrix theory from a number of interesting theoretical constructions, but we show how they can also be used to model real-world data; we develop new extensions, algorithms, and theoretical results that make modeling and learning with DPPs efficient and practical. Throughout, we demonstrate experimentally that the techniques we introduce allow DPPs to be used for performing real-world tasks like document summarization, multiple human pose estimation, search diversification, and the threading of large document collections.
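
To make the "tendency to be diverse" concrete, here is a minimal sketch of how a DPP assigns probabilities to subsets. It assumes the standard L-ensemble form, P(Y) = det(L_Y) / det(L + I), which the abstract itself does not state. The 3-item kernel `L` and the helper names `det`, `submatrix`, and `dpp_prob` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting.

    The determinant of the empty (0 x 0) matrix is taken to be 1.
    """
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(L, items):
    """Restrict the kernel L to the rows and columns indexed by `items`."""
    return [[L[i][j] for j in items] for i in items]


def dpp_prob(L, subset):
    """Assumed L-ensemble probability: det(L_Y) / det(L + I)."""
    n = len(L)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return det(submatrix(L, list(subset))) / det(L_plus_I)


# Hypothetical kernel: items 0 and 1 are near-duplicates, item 2 is different.
L = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

p_similar = dpp_prob(L, (0, 1))  # pair of near-duplicates
p_diverse = dpp_prob(L, (0, 2))  # pair of dissimilar items

# Probabilities over all 2^3 subsets of the ground set should sum to 1.
total = sum(dpp_prob(L, s)
            for k in range(len(L) + 1)
            for s in combinations(range(len(L)), k))

print(p_similar, p_diverse, total)
```

Under this assumed form, the determinant shrinks as the selected items become more similar, so the near-duplicate pair receives a lower probability than the dissimilar pair. The brute-force sum over subsets is only feasible for tiny ground sets and is included purely as a sanity check on the normalizer.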

Ben Taskar | Alex Kulesza
