Nearly Optimal Classification for Semimetrics

We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric but need not satisfy the triangle inequality. For metric spaces, the doubling dimension essentially characterizes both the runtime and the sample complexity of classification algorithms; we show that this is not the case for semimetrics. Instead, we define the {\em density dimension} and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We present nearly optimal sample compression algorithms and use them to obtain generalization guarantees, including fast rates. The latter hold for general sample compression schemes and may be of independent interest.
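To make the distinction between a metric and a semimetric concrete, here is a minimal sketch that is not taken from the paper. The function names and the choice of squared distance are illustrative assumptions. Squared Euclidean distance on the real line is non-negative and symmetric, yet it violates the triangle inequality: d(0, 2) = 4 > d(0, 1) + d(1, 2) = 2. The helpers below check the two semimetric axioms named above on a finite point set and list every ordered triple that breaks the triangle inequality.

```python
from itertools import permutations


def sq_dist(x: float, y: float) -> float:
    """Squared Euclidean distance on the real line (illustrative semimetric)."""
    return (x - y) ** 2


def is_semimetric(points, d, tol=1e-12):
    """Check only the axioms stated above, non-negativity and symmetry,
    on every pair of points in a finite set."""
    for x in points:
        for y in points:
            if d(x, y) < -tol or abs(d(x, y) - d(y, x)) > tol:
                return False
    return True


def triangle_violations(points, d, tol=1e-12):
    """Return ordered triples (x, y, z) of distinct points with
    d(x, z) > d(x, y) + d(y, z), i.e. triangle-inequality failures."""
    return [
        (x, y, z)
        for x, y, z in permutations(points, 3)
        if d(x, z) > d(x, y) + d(y, z) + tol
    ]


pts = [0.0, 1.0, 2.0]
print(is_semimetric(pts, sq_dist))        # True
print(triangle_violations(pts, sq_dist))  # [(0.0, 1.0, 2.0), (2.0, 1.0, 0.0)]
```

Replacing `sq_dist` with the ordinary metric `abs(x - y)` yields no violations on the same points. This illustrates why metric-space tools such as the doubling dimension, which rely on the triangle inequality, need not carry over to semimetrics.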
