Unsupervised One-Class Learning for Automatic Outlier Removal

Outliers are pervasive in many computer vision and pattern recognition problems. Automatically eliminating outliers scattering among practical data collections becomes increasingly important, especially for Internet inspired vision applications. In this paper, we propose a novel one-class learning approach which is robust to contamination of input training data and able to discover the outliers that corrupt one class of data source. Our approach works under a fully unsupervised manner, differing from traditional one-class learning supervised by known positive labels. By design, our approach optimizes a kernel-based max-margin objective which jointly learns a large margin one-class classifier and a soft label assignment for inliers and outliers. An alternating optimization algorithm is then designed to iteratively refine the classifier and the labeling, achieving a provably convergent solution in only a few iterations. Extensive experiments conducted on four image datasets in the presence of artificial and real-world outliers demonstrate that the proposed approach is considerably superior to the state-of-the-arts in obliterating outliers from contaminated one class of images, exhibiting strong robustness at a high outlier proportion up to 60%.

[1]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[2]  Koby Crammer,et al.  A needle in a haystack: local one-class optimization , 2004, ICML.

[3]  Thomas S. Huang,et al.  One-class SVM for learning in image retrieval , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[4]  Joydeep Ghosh,et al.  Robust one-class clustering using hybrid global and local search , 2005, ICML.

[5]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[6]  Michael J. Black,et al.  A Framework for Robust Subspace Learning , 2003, International Journal of Computer Vision.

[7]  Wei Liu,et al.  Robust and Scalable Graph-Based Semisupervised Learning , 2012, Proceedings of the IEEE.

[8]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[9]  Frédéric Jurie,et al.  Improving web image search results using query-relative classifiers , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Wei Liu,et al.  Noise resistant graph ranking for improved web image search , 2011, CVPR 2011.

[11]  Chandan Srivastava,et al.  Support Vector Data Description , 2011 .

[12]  Shie Mannor,et al.  Outlier-Robust PCA: The High-Dimensional Case , 2013, IEEE Transactions on Information Theory.

[13]  Guillermo Sapiro,et al.  See all by looking at a few: Sparse modeling for finding representative objects , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  W. Gander,et al.  A Constrained Eigenvalue Problem , 1989 .

[15]  Yuesheng Xu,et al.  The Bedrosian identity for the Hilbert transform of product functions , 2006 .

[16]  Clayton D. Scott,et al.  Robust kernel density estimation , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[17]  Jinbo Bi,et al.  Support Vector Classification with Input Data Uncertainty , 2004, NIPS.

[18]  Fernando De la Torre,et al.  Robust Kernel Principal Component Analysis , 2008, NIPS.

[19]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  E. Parzen On Estimation of a Probability Density Function and Mode , 1962 .

[21]  Jiawei Han,et al.  PEBL: Web page classification without negative examples , 2004, IEEE Transactions on Knowledge and Data Engineering.

[22]  Malik Yousef,et al.  One-Class SVMs for Document Classification , 2002, J. Mach. Learn. Res..

[23]  Monalisa Singh,et al.  Web Image Re-Ranking using Query-Specific Semantic Signatures , 2016 .

[24]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[25]  Philip S. Yu,et al.  One-Class-Based Uncertain Data Stream Learning , 2011, SDM.

[26]  Pietro Perona,et al.  Learning Object Categories From Internet Image Searches , 2010, Proceedings of the IEEE.

[27]  Bing Liu,et al.  Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression , 2003, ICML.

[28]  Zhi-Hua Zhou,et al.  Isolation Forest , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[29]  Francesca Bovolo,et al.  Semisupervised One-Class Support Vector Machines for Classification of Remote Sensing Data , 2010, IEEE Transactions on Geoscience and Remote Sensing.

[30]  Koby Crammer,et al.  Robust Support Vector Machine Training via Convex Outlier Ablation , 2006, AAAI.

[31]  Hans-Peter Kriegel,et al.  Angle-based outlier detection in high-dimensional data , 2008, KDD.

[32]  Xiaogang Wang,et al.  Web Image Re-Ranking UsingQuery-Specific Semantic Signatures , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Hans-Peter Kriegel,et al.  A survey on unsupervised outlier detection in high‐dimensional numerical data , 2012, Stat. Anal. Data Min..

[34]  Krishnakumar Balasubramanian,et al.  Unsupervised Supervised Learning II: Margin-Based Classification without Labels , 2011, AISTATS.

[35]  Philip S. Yu Editorial: State of the Transactions , 2004, IEEE Trans. Knowl. Data Eng..