Improving Active Learning for One-Class Classification Using Dimensionality Reduction

This work aims to improve the performance of active learning techniques for one-class classification (OCC) via dimensionality reduction (DR) and pre-filtering of the unlabelled input data. In practice, the input data of OCC problems is high-dimensional and often contains many redundant negative examples. DR is therefore typically an important pre-processing step for handling the high dimensionality. The redundancy, however, has not previously been addressed. In this work, we propose a framework that exploits the detected DR basis functions of the instance space to filter out most of the redundant data. Instances are removed or retained by an adaptive thresholding operator according to their distance to the identified DR basis functions. The resulting reduction in the dimensionality, redundancy and size of the instance space significantly reduces the computational complexity of the active learning process for OCC. For the retained instances, the distance to the identified DR basis functions is also used to select the initial training batch, as well as the additional instances at each iteration of the active training algorithm, more efficiently. This is done by ensuring that the labelled data always contains a nearly uniform representation along the different DR basis functions of the instance space. Experimental results show that applying the DR and pre-filtering steps improves the performance of active learning for OCC.
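The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: PCA (computed via SVD) stands in for the DR method, the "distance to a basis function" is taken as the Euclidean distance from an instance to the line spanned by a principal direction, the adaptive threshold is a per-direction quantile of those distances, and instances at or below the threshold are the ones retained. The abstract does not specify any of these details. All function names (`pca_basis`, `distances_to_basis`, `prefilter`, `balanced_batch`) are hypothetical.

```python
import numpy as np

def pca_basis(X, n_components):
    """Return the data mean and the top principal directions (unit-norm rows).
    Assumption: PCA stands in for the unspecified DR method."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def distances_to_basis(X, mean, basis):
    """Distance from each instance to the line spanned by each basis vector."""
    Xc = X - mean
    proj = Xc @ basis.T                                # (n, k) projection lengths
    sq_norm = np.sum(Xc ** 2, axis=1, keepdims=True)   # (n, 1) squared norms
    # Pythagoras: residual^2 = |x|^2 - projection^2 (clipped against round-off).
    return np.sqrt(np.maximum(sq_norm - proj ** 2, 0.0))

def prefilter(dist, quantile=0.5):
    """Assign each instance to its nearest basis direction and keep those whose
    distance is at or below a per-direction quantile (assumed threshold rule)."""
    nearest = dist.argmin(axis=1)
    d_min = dist[np.arange(len(dist)), nearest]
    keep = np.zeros(len(dist), dtype=bool)
    for j in range(dist.shape[1]):
        members = nearest == j
        if members.any():
            threshold = np.quantile(d_min[members], quantile)
            keep |= members & (d_min <= threshold)
    return keep, nearest

def balanced_batch(nearest, keep, n_groups, batch_size, rng):
    """Pick retained instances round-robin across basis directions so the
    batch is nearly uniformly spread over them."""
    groups = [rng.permutation(np.flatnonzero(keep & (nearest == j)))
              for j in range(n_groups)]
    batch, depth = [], 0
    while len(batch) < batch_size and any(depth < len(g) for g in groups):
        for g in groups:
            if depth < len(g) and len(batch) < batch_size:
                batch.append(int(g[depth]))
        depth += 1
    return batch

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
mean, basis = pca_basis(X, n_components=3)
dist = distances_to_basis(X, mean, basis)
keep, nearest = prefilter(dist, quantile=0.5)
batch = balanced_batch(nearest, keep, n_groups=3, batch_size=6, rng=rng)
```

In an active learning loop, `balanced_batch` would be called again at each iteration on the retained instances that are still unlabelled, so that the labelled set stays roughly balanced across the basis directions.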
