Paper information - Learning and Inference via Maximum Inner Product Search

A large class of commonly used probabilistic models known as log-linear models is defined up to a normalization constant. Typical learning algorithms for such models require solving a sequence of probabilistic inference queries. These inferences are typically intractable, and are a major bottleneck for learning models with large output spaces. In this paper, we provide a new approach for amortizing the cost of a sequence of related inference queries, such as the ones arising during learning. Our technique relies on a surprising connection with algorithms developed in the past two decades for similarity search in large databases. Our approach achieves improved running times with provable approximation guarantees. We show that it performs well both on synthetic data and neural language models with large output spaces.
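
To make the abstract's terms concrete, here is a minimal sketch, not the paper's algorithm. It assumes the standard definition of a log-linear model, p(x) proportional to exp(theta . phi(x)). Under that definition the most probable output is the one whose feature vector has the largest inner product with theta, which is a maximum inner product search, while the normalization constant needs a sum over every output. Both are computed by brute force here. The function names, the feature vectors, and the parameter values are all hypothetical, chosen only for illustration.

```python
import math


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_partition(theta, features):
    """Exact log Z = log sum_x exp(theta . phi(x)), by brute force.

    The cost is linear in the number of outputs, which is the bottleneck
    the abstract refers to when the output space is large.
    """
    scores = [inner(theta, phi) for phi in features]
    m = max(scores)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in scores))


def mips(theta, features):
    """Brute-force maximum inner product search.

    Returns the index of the output maximizing theta . phi(x), which is
    also the mode of the log-linear model.
    """
    return max(range(len(features)), key=lambda i: inner(theta, features[i]))


def probabilities(theta, features):
    """Normalized probabilities p(x) = exp(theta . phi(x) - log Z)."""
    log_z = log_partition(theta, features)
    return [math.exp(inner(theta, phi) - log_z) for phi in features]


# Hypothetical toy instance: four outputs with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
theta = [0.5, 2.0]

probs = probabilities(theta, features)
best = mips(theta, features)
```

In this toy instance `best` indexes the output with the highest probability. A practical system would replace the linear scan in `mips` with an approximate similarity-search index; that substitution is an assumption about usage, not something this sketch implements.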

Stephen Mussmann | Stefano Ermon
