Estimating the Support of a High-Dimensional Distribution

Suppose you are given a data set drawn from an underlying probability distribution P and you want to estimate a simple subset S of input space such that the probability that a test point drawn from P lies outside S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by estimating a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we solve by sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
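
The ingredients named above (a kernel expansion for f, a quadratic program for the coefficients, and optimization over pairs of patterns) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not stated in the abstract: the Gaussian kernel, the parameter names `nu` and `gamma`, the dual form (minimize ½ Σ αᵢαⱼ k(xᵢ, xⱼ) subject to 0 ≤ αᵢ ≤ 1/(νn) and Σ αᵢ = 1), and the plain sweep over all pairs, which stands in for whatever pair-selection heuristic a practical solver would use.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_support(X, nu=0.2, gamma=0.5, sweeps=30):
    """Pairwise coordinate descent on the assumed dual problem.

    Returns the expansion coefficients alpha and the offset rho.
    """
    n = len(X)
    C = 1.0 / (nu * n)                      # upper bound on each coefficient
    K = [[rbf(xi, xj, gamma) for xj in X] for xi in X]
    alpha = [1.0 / n] * n                   # feasible start: sums to 1, each <= C
    # out[i] = sum_k alpha_k k(x_i, x_k), kept up to date incrementally
    out = [sum(alpha[k] * K[i][k] for k in range(n)) for i in range(n)]

    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]   # curvature along the pair
                if eta <= 1e-12:
                    continue
                s = alpha[i] + alpha[j]     # preserved so that sum(alpha) stays 1
                new_j = alpha[j] + (out[i] - out[j]) / eta
                # clip to the box constraints of both coefficients
                new_j = min(max(new_j, max(0.0, s - C)), min(C, s))
                d_j = new_j - alpha[j]
                if d_j == 0.0:
                    continue
                alpha[j] = new_j
                alpha[i] = s - new_j
                for k in range(n):
                    out[k] += d_j * (K[k][j] - K[k][i])

    # Offset: average output over coefficients strictly inside the box.
    eps = 1e-8
    free = [out[i] for i in range(n) if eps < alpha[i] < C - eps]
    if free:
        rho = sum(free) / len(free)
    else:
        # crude fallback when no coefficient lies strictly inside the box
        rho = max(o for o, a in zip(out, alpha) if a > eps)
    return alpha, rho


def decision(x, X, alpha, rho, gamma=0.5):
    """f(x) = sum_i alpha_i k(x_i, x) - rho; positive values are treated as inside S."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X) if a > 0.0) - rho


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(30)]
    alpha, rho = fit_support(X, nu=0.2, gamma=0.5)
    print("nonzero coefficients:", sum(1 for a in alpha if a > 1e-8), "of", len(X))
    print("f at a distant point:", decision((20.0, 20.0), X, alpha, rho))
```

Under these assumptions, the bound 1/(νn) together with the sum constraint limits how many coefficients can sit at the upper bound to at most νn, which is how the sketch ties the parameter `nu` to the fraction of training points allowed to fall outside the estimated region.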
