Novelty generating machine

Novelty detection is one of the primary tasks in data mining and machine learning. The task is to differentiate unseen outliers from normal patterns. Although novelty detection has been studied for many years and has found a wide range of applications, identifying outliers remains very challenging because outliers are absent or scarce. We observe several characteristics of outliers and normal patterns. First, normal patterns are usually grouped together and form clusters in high-density regions of the data. Second, outliers are very different from the normal patterns and therefore lie far away from them. Third, the number of outliers is very small compared with the size of the dataset. Based on these observations, we can expect the decision boundary between outliers and normal patterns to lie in low-density regions of the data, which is referred to as the cluster assumption. The resulting optimization problem takes the form of a mixed integer program. We then present a cutting plane algorithm, together with multiple kernel learning techniques, to solve its convex relaxation. Moreover, we exploit the scarcity of outliers to find a violating solution in the cutting plane algorithm.
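
The three observations above can be illustrated with a small sketch. It is not the method proposed here (no mixed integer program, cutting plane step, or kernel learning is involved). It is a simple k-nearest-neighbour distance score used only as a stand-in for "low density", with a cap on the number of flagged points standing in for the scarcity of outliers. The function names, the toy data, and the parameters `k` and `max_outliers` are all assumptions made for illustration.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour.

    A large score suggests the point sits in a low-density region.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k=2, max_outliers=1):
    """Return indices of the highest-scoring points, at most max_outliers.

    The cap encodes the assumption that outliers are scarce.
    """
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:max_outliers])


# Toy data (hypothetical): four clustered normal points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_distance_scores(points, k=2)
flagged = flag_outliers(points, k=2, max_outliers=1)
```

On this toy data the distant point receives the largest score, so it is the single point flagged when `max_outliers=1`.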
