Approximate Bregman Near Neighbors in Sublinear Time: Beyond the Triangle Inequality

Bregman divergences are important distance measures used extensively in data-driven applications such as computer vision, text mining, and speech processing, and are a key focus of interest in machine learning. Answering nearest-neighbor (NN) queries under these measures is very important in these applications and has been the subject of extensive study, but is problematic because these distance measures lack metric properties like symmetry and the triangle inequality. In this paper, we present the first provably approximate nearest-neighbor (ANN) algorithms for a broad sub-class of Bregman divergences under some assumptions. Specifically, we examine Bregman divergences that decompose along each dimension, and our bounds also depend on restricting the size of the allowed domain. We obtain bounds both for the regular asymmetric Bregman divergences and for their symmetrized versions. To do so, we develop two geometric properties vital to our analysis: a reverse triangle inequality (RTI) and a relaxed triangle inequality called μ-defectiveness, where μ is a domain-dependent value. Bregman divergences satisfy the RTI but not μ-defectiveness; however, we show that the square root of a Bregman divergence does satisfy μ-defectiveness. This allows us to use both properties in an efficient search data structure that follows the general two-stage paradigm, a ring-tree decomposition followed by a quadtree search, used in previous near-neighbor algorithms for Euclidean space and spaces of bounded doubling dimension. Our first algorithm resolves a d-dimensional (1 + ε)-ANN query in time sublinear in n using O(n log^{d-1} n) space, and holds for generic μ-defective distance measures satisfying an RTI. Our second algorithm is tailored specifically to Bregman divergences and uses a further structural parameter: the maximum ratio of second derivatives over each dimension of the allowed domain (c_0). This allows us to locate a (1 + ε)-ANN in O(log n) time and O(n) space, with a further (c_0)^d factor in the big-O for the query time.
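To make the central definitions concrete: a decomposable Bregman divergence has the form D_φ(x, y) = Σ_i [φ_i(x_i) − φ_i(y_i) − φ_i'(y_i)(x_i − y_i)] for a strictly convex generator φ, and μ-defectiveness is the relaxed triangle inequality |D(x, z) − D(y, z)| ≤ μ · D(x, y) for all x, y, z in the domain. The Python sketch below is purely illustrative and is not code from the paper: it instantiates the generalized Kullback-Leibler divergence and empirically estimates the smallest μ for which the square root of the divergence satisfies this inequality on a bounded domain. The generator, the domain [0.1, 1]^d, and the random sampling scheme are assumptions chosen for demonstration.

    import numpy as np

    # Illustrative sketch (not from the paper): estimate the mu-defectiveness
    # constant of the square root of a decomposable Bregman divergence on a
    # bounded domain by sampling random triples of points.
    rng = np.random.default_rng(0)

    def gen_kl(x, y):
        # Decomposable Bregman divergence with generator phi(x) = sum_i x_i log x_i
        # (the generalized Kullback-Leibler divergence).
        return np.sum(x * np.log(x / y) - x + y)

    def sqrt_kl(x, y):
        # Square root of the divergence; the paper shows this satisfies
        # mu-defectiveness on a suitably bounded domain.
        return np.sqrt(gen_kl(x, y))

    d, trials = 4, 10_000   # dimension and sample size (assumed)
    lo, hi = 0.1, 1.0       # bounded domain [lo, hi]^d (assumed)

    mu_estimate = 0.0
    for _ in range(trials):
        x, y, z = (rng.uniform(lo, hi, d) for _ in range(3))
        # mu-defectiveness: |D(x, z) - D(y, z)| <= mu * D(x, y) for all triples;
        # track the largest ratio observed over the random samples.
        gap = abs(sqrt_kl(x, z) - sqrt_kl(y, z))
        mu_estimate = max(mu_estimate, gap / sqrt_kl(x, y))

    print(f"empirical mu over {trials} triples in [{lo}, {hi}]^{d}: {mu_estimate:.3f}")

Consistent with the abstract, the estimate is domain-dependent: for this generator φ''(t) = 1/t, so widening [lo, hi] increases the ratio of second derivatives c_0 = hi/lo and drives the estimated μ up.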
