Unsupervised Active Learning Based on Hierarchical Graph-Theoretic Clustering

Most existing active learning approaches are supervised. Supervised active learning has three main problems: it deals inefficiently with the semantic gap between the distribution of samples in the feature space and their labels, it cannot readily select new samples from categories that have not yet appeared in the training samples, and it adapts poorly to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. The framework combines two promising graph-theoretic clustering algorithms, dominant-set clustering and spectral clustering, in a hierarchical fashion. Its advantages include ease of implementation, a flexible architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification demonstrate that the framework effectively reduces the workload of manual classification while maintaining high accuracy of automatic classification. Overall, the framework outperforms support-vector-machine-based supervised active learning, particularly in dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.
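
As a rough illustration of one of the two building blocks named above, the following minimal sketch extracts dominant sets from a pairwise affinity matrix using discrete replicator dynamics, the standard iteration associated with dominant-set clustering. It is not the framework described in the abstract: the function names `dominant_set` and `peel_dominant_sets`, the tolerance and cutoff values, and the toy affinity matrix are all assumptions made for illustration, and the hierarchical combination with spectral clustering is not shown because the abstract does not specify it.

```python
import numpy as np


def dominant_set(A, tol=1e-8, max_iter=1000, cutoff=1e-5):
    """Extract one dominant set from a symmetric, non-negative affinity
    matrix A with zero diagonal, using discrete replicator dynamics.

    Returns (member_indices, weights). All parameter defaults are
    illustrative assumptions, not values taken from the paper.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)          # start at the barycenter of the simplex
    for _ in range(max_iter):
        Ax = A @ x
        denom = x @ Ax               # current cohesiveness x^T A x
        if denom <= 0:               # no affinity left to exploit; stop
            break
        x_new = x * Ax / denom       # replicator update, stays on the simplex
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    # Members are the indices whose weight did not vanish.
    return np.flatnonzero(x > cutoff), x


def peel_dominant_sets(A):
    """Partition all samples by repeatedly extracting a dominant set
    and removing its members from the graph."""
    remaining = np.arange(A.shape[0])
    clusters = []
    while remaining.size > 0:
        sub = A[np.ix_(remaining, remaining)]
        members, _ = dominant_set(sub)
        if members.size == 0:        # safety net to guarantee termination
            members = np.arange(remaining.size)
        clusters.append(remaining[members].tolist())
        remaining = np.delete(remaining, members)
    return clusters


# Toy affinity matrix (hypothetical data): samples 0-2 are tightly linked,
# samples 3-4 are moderately linked, and cross-group affinity is weak.
A = np.full((5, 5), 0.05)
A[:3, :3] = 0.9
A[3:, 3:] = 0.5
np.fill_diagonal(A, 0.0)

clusters = peel_dominant_sets(A)
print(clusters)
```

In an active learning setting, one plausible use of such clusters is to ask the annotator to label a representative of each cluster rather than every sample, which is the kind of workload reduction the abstract reports; how the framework actually selects samples is not stated here.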