Active learning for object classification: from exploration to exploitation

Classifying large datasets without any a priori information poses a problem in numerous tasks. Especially in industrial environments, we often encounter diverse measurement devices and sensors that produce huge amounts of data, but we still rely on a human expert to give the data a meaningful interpretation. Because the amount of data that must be classified manually plays a critical role, we need to reduce the number of learning episodes involving human interaction as much as possible. In addition, for real-world applications it is fundamental to converge in a stable manner to a solution that is close to the optimal one. We present a new self-controlled exploration/exploitation strategy for selecting the data points to be labeled by a domain expert, where the potential of each data point is computed from a combination of its representativeness and the uncertainty of the classifier. A new Prototype Based Active Learning (PBAC) algorithm for classification is introduced. We compare the results to other active learning approaches on several benchmark datasets.
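The abstract does not state the exact formulas, so the following is only a minimal sketch of the general idea of scoring each unlabeled point by a weighted mix of representativeness and classifier uncertainty. It assumes, as hypothetical stand-ins, a Gaussian-kernel density as the representativeness measure, a margin-based uncertainty over predicted class probabilities, and a single weight w that is shifted from exploration (w = 0, favor dense, representative regions) toward exploitation (w = 1, favor points near the decision boundary). The function names `density`, `margin_uncertainty` and `select_query` are illustrative and are not the PBAC algorithm itself.

```python
import math

def density(points, sigma=1.0):
    """Hypothetical representativeness proxy: sum of Gaussian kernel
    similarities to all other points, normalized to [0, 1]."""
    dens = []
    for i, p in enumerate(points):
        s = 0.0
        for j, q in enumerate(points):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(p, q))
                s += math.exp(-d2 / (2 * sigma ** 2))
        dens.append(s)
    m = max(dens) or 1.0  # avoid division by zero for a single point
    return [d / m for d in dens]

def margin_uncertainty(probs):
    """Hypothetical uncertainty: 1 minus the gap between the two
    largest class probabilities (1.0 = maximally uncertain)."""
    top = sorted(probs, reverse=True)
    second = top[1] if len(top) > 1 else 0.0
    return 1.0 - (top[0] - second)

def select_query(points, class_probs, labeled, w, sigma=1.0):
    """Return the index of the unlabeled point that maximizes
    (1 - w) * representativeness + w * uncertainty."""
    rep = density(points, sigma)
    best, best_score = None, -math.inf
    for i in range(len(points)):
        if i in labeled:
            continue  # only unlabeled points are candidates for the expert
        score = (1 - w) * rep[i] + w * margin_uncertainty(class_probs[i])
        if score > best_score:
            best, best_score = i, score
    return best

# Three points in a dense cluster the classifier is confident about,
# plus one isolated point it is unsure about.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
probs = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
print(select_query(pts, probs, set(), w=0.0))  # exploration: densest point, 0
print(select_query(pts, probs, set(), w=1.0))  # exploitation: most uncertain, 3
```

How w should be controlled over the learning episodes is the part the abstract calls "self-controlled"; here it is simply passed in by the caller, which is a simplifying assumption.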
