Entropic Affinities: Properties and Efficient Numerical Computation

Gaussian affinities are commonly used in graph-based methods such as spectral clustering or nonlinear embedding. Hinton and Roweis (2003) introduced a way to set the scale individually for each point so that it has a distribution over neighbors with a desired perplexity, or effective number of neighbors. This produces high-quality affinities that adapt locally to the data, but they are harder to compute. We study the mathematical properties of these "entropic affinities" and show that they implicitly define a continuously differentiable function in the input space, for which we give bounds. We then devise a fast algorithm to compute the widths and affinities, based on robustified, quickly convergent root-finding methods combined with a tree- or density-based initialization scheme that exploits the slowly varying behavior of this function. The resulting algorithm is nearly optimal, and much more accurate and faster than the existing bisection-based approach, particularly on large datasets, as we show with image and text data.
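
For concreteness, the sketch below illustrates the basic computation the abstract refers to: for each data point, find the Gaussian precision beta_i = 1/(2*sigma_i^2) whose conditional distribution over neighbors has entropy equal to the log of the target perplexity. This is a minimal, illustrative version that solves each point's root-finding problem independently with SciPy's Brent solver and a simple doubling bracket; it does not implement the paper's robustified iterations or its tree- or density-based initialization, and the function names (entropic_affinities, entropy_gap) are ours, not the paper's.

# Minimal sketch: per-point Gaussian precisions matched to a target perplexity.
# Assumes 1 < perplexity < n-1 so that a root exists for every point.
import numpy as np
from scipy.optimize import brentq
from scipy.spatial.distance import cdist

def entropic_affinities(X, perplexity=30.0):
    """Return a row-stochastic affinity matrix P and per-point precisions beta."""
    n = X.shape[0]
    D = cdist(X, X, metric="sqeuclidean")   # pairwise squared distances
    np.fill_diagonal(D, np.inf)             # exclude self-affinity
    log_K = np.log(perplexity)              # target entropy in nats
    P = np.zeros((n, n))
    beta = np.zeros(n)

    def entropy_gap(b, d):
        # H(p) - log(perplexity) for p_j proportional to exp(-b * d_j);
        # the root in b gives the desired per-point scale.
        w = np.exp(-b * (d - d.min()))      # shift by the min distance for stability
        p = w / w.sum()
        H = -np.sum(p * np.log(p + 1e-300)) # tiny constant guards log(0); 0*log(0) -> 0
        return H - log_K

    for i in range(n):
        d = D[i]
        # The entropy decreases monotonically in b, so bracket the root by doubling.
        lo, hi = 1e-10, 1.0
        while entropy_gap(hi, d) > 0:
            hi *= 2.0
        beta[i] = brentq(entropy_gap, lo, hi, args=(d,))
        w = np.exp(-beta[i] * (d - d.min()))
        P[i] = w / w.sum()
    return P, beta

Because the per-point problems are independent and the underlying width function varies slowly over the input space, a good initial bracket or a warm start from a nearby point's solution (which is what the paper's initialization scheme provides) removes most of the iterations that a naive bisection from scratch would need.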

[1] R. P. Brent. Algorithms for Minimization Without Derivatives. Prentice-Hall, 1973.

[2] C. J. F. Ridders, et al. A new algorithm for computing a single root of a real continuous function, 1979.

[3] J. Traub. Iterative Methods for the Solution of Equations, 1982.

[4] W. Gander. On Halley's Iteration Method, 1985.

[5] T. R. Scavo, et al. On the Geometry of Halley's Method, 1995.

[6] S. A. Nene, et al. Columbia Object Image Library (COIL-100), 1996.

[7] A. Melman. Classroom Note: Geometry and Convergence of Euler's and Halley's Methods. SIAM Review, 1997.

[8] S. T. Roweis, et al. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000.

[9] M. I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm. NIPS, 2001.

[10] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing, 1999.

[11] G. E. Hinton and S. T. Roweis. Stochastic Neighbor Embedding. NIPS, 2002.

[12] M. J. Er, et al. Face recognition with radial basis function (RBF) neural networks. IEEE Trans. Neural Networks, 2002.

[13] J. B. Tenenbaum, et al. Global Versus Local Methods in Nonlinear Dimensionality Reduction. NIPS, 2002.

[14] B. Schölkopf, et al. Learning with Local and Global Consistency. NIPS, 2003.

[15] M. Belkin, et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 2003.

[16] S. Sheather. Density Estimation, 2004.

[17] P. Perona, et al. Self-Tuning Spectral Clustering. NIPS, 2004.

[18] R. Duraiswami, et al. The improved fast Gauss transform with applications to machine learning, 2005.

[19] M. Hazelton, et al. Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation, 2005.

[20] M. Belkin, et al. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. J. Mach. Learn. Res., 2006.

[21] M. Á. Carreira-Perpiñán, et al. Fast nonparametric clustering with Gaussian blurring mean-shift. ICML, 2006.

[22] H. Frigui, et al. Fuzzy relational kernel clustering with local scaling parameter learning. IEEE International Workshop on Machine Learning for Signal Processing, 2010.

[23] M. Á. Carreira-Perpiñán, et al. The Elastic Embedding Algorithm for Dimensionality Reduction. ICML, 2010.

[24] V. Raykar, et al. Fast Computation of Kernel Estimators, 2010.

[25] M. Á. Carreira-Perpiñán, et al. Fast Training of Nonlinear Embedding Algorithms. ICML, 2012.