Canopy: Fast Sampling with Cover Trees

Hierarchical Bayesian models often capture distributions over a very large number of distinct atoms. The need for such models arises when organizing huge amounts of unlabeled data, for instance, features extracted with deep convnets that can be exploited to organize abundant unlabeled images. Inference for hierarchical Bayesian models at this scale is nontrivial and has typically been addressed with approximate methods. In this work, we propose Canopy, a sampler based on cover trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space. In other words, the algorithm is as fast as search over a hierarchical data structure. We provide theory for Canopy and demonstrate its effectiveness on both synthetic and real datasets consisting of over 100 million images.
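To make the abstract's claim concrete, below is a minimal sketch of the kind of bound-based exact sampling that a spatial hierarchy enables. This is not the paper's algorithm: it replaces the cover tree with a single flat level of cells, and the names `Group`, `group_bound`, `sample_atom`, the Gaussian weight, and `sigma` are all illustrative assumptions. The point it demonstrates is that a per-cell upper bound on atom weights permits exact draws by rejection while touching only group-level summaries.

```python
# Illustrative sketch only: exact sampling of an atom with probability
# proportional to a Gaussian weight, using per-group upper bounds in
# place of a full cover tree. Names and grouping are assumptions, not
# the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

class Group:
    """A coarse cell: a set of atoms with a center and covering radius."""
    def __init__(self, atoms):
        self.atoms = atoms                           # (n_g, d) atom locations
        self.center = atoms.mean(axis=0)
        self.radius = np.max(np.linalg.norm(atoms - self.center, axis=1))

def weight(x, mu, sigma=1.0):
    """Unnormalized Gaussian responsibility of atom mu for query x."""
    return np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))

def group_bound(x, g, sigma=1.0):
    """Upper bound on any atom weight in g, via the triangle inequality:
    every atom in g is at distance >= d(x, center) - radius from x."""
    d = max(0.0, np.linalg.norm(x - g.center) - g.radius)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def sample_atom(x, groups, sigma=1.0):
    """Exact sample from p(k) proportional to weight(x, atom_k)."""
    masses = np.array([len(g.atoms) * group_bound(x, g, sigma) for g in groups])
    probs = masses / masses.sum()
    while True:
        g = groups[rng.choice(len(groups), p=probs)]  # group ~ bound mass
        k = rng.integers(len(g.atoms))                # atom uniform in group
        if rng.random() * group_bound(x, g, sigma) <= weight(x, g.atoms[k], sigma):
            return g.atoms[k]                         # accepted: exact draw

# Toy usage: 10k atoms, crudely grouped by first coordinate so cells are
# somewhat spatially coherent (a cover tree would give provably tight cells).
atoms = rng.normal(size=(10_000, 2))
atoms = atoms[np.argsort(atoms[:, 0])]
groups = [Group(a) for a in np.array_split(atoms, 100)]
x = np.array([2.0, 0.5])
print(sample_atom(x, groups))
```

Exactness of the sketch follows because an atom k in group g is proposed with probability (n_g * b_g / B) * (1 / n_g) = b_g / B and accepted with probability w_k / b_g, so accepted draws are distributed proportionally to w_k. A real cover tree tightens the bounds b_g so that the expected number of rejections, and hence the runtime, stays small.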
