Distance-Based Classification with Lipschitz Functions

The goal of this article is to develop a framework for large margin classification in metric spaces. We seek a generalization of linear decision functions to metric spaces, together with a corresponding notion of margin, such that the decision function separates the training points with a large margin. It turns out that when Lipschitz functions are used as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of the margin. In order to construct a clean mathematical setup, we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. Our approach leads to a general large margin algorithm for classification in metric spaces. To analyze this algorithm, we first prove a representer theorem, which states that there exists a solution that can be expressed as a linear combination of distances to sets of training points. We then analyze the Rademacher complexity of several Lipschitz function classes. The generality of the Lipschitz approach is reflected in the fact that several well-known algorithms arise as special cases of the Lipschitz algorithm, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.
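
As a concrete illustration of the distance-based form promised by the representer theorem, the Python sketch below builds the decision function f(x) = d(x, S-) - d(x, S+), where S+ and S- denote the sets of positive and negative training points. Each distance-to-set map is 1-Lipschitz, so f is Lipschitz, and sign(f) reproduces the 1-nearest-neighbor rule, one of the special cases named above. This is a minimal sketch under those assumptions, not the paper's optimization algorithm; the names lipschitz_decision_function and metric are illustrative choices, not code from the paper.

    import numpy as np

    def lipschitz_decision_function(X_train, y_train, metric):
        # Decision function f(x) = d(x, S_minus) - d(x, S_plus).
        # Each term x -> d(x, S) is 1-Lipschitz, so f is 2-Lipschitz;
        # sign(f(x)) coincides with the 1-nearest-neighbor rule.
        S_plus = [x for x, y in zip(X_train, y_train) if y == +1]
        S_minus = [x for x, y in zip(X_train, y_train) if y == -1]

        def f(x):
            d_plus = min(metric(x, s) for s in S_plus)    # d(x, S_+)
            d_minus = min(metric(x, s) for s in S_minus)  # d(x, S_-)
            return d_minus - d_plus                       # > 0 iff x is closer to S_+

        return f

    # Toy usage in the metric space (R^2, Euclidean distance):
    euclidean = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    X = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
    y = [+1, +1, -1, -1]
    f = lipschitz_decision_function(X, y, euclidean)
    print(f((1.0, 0.5)))   # positive: classified as +1
    print(f((2.5, 0.5)))   # negative: classified as -1

Because only distances enter the computation, the same sketch works unchanged for any metric (edit distances, graph distances, and so on), which is exactly the generality the Lipschitz framework is after.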
