EGAL: Exploration Guided Active Learning for TCBR

The task of building labelled case bases can be approached using active learning (AL), a process that facilitates the labelling of large collections of examples with minimal manual labelling effort. The main challenge in designing AL systems is developing a selection strategy that chooses the most informative examples to label manually. Typical selection strategies use exploitation techniques, which attempt to refine uncertain areas of the decision space based on the output of a classifier. Other approaches balance exploitation with exploration, selecting examples from dense and interesting regions of the domain space. In this paper we present a simple but effective exploration-only selection strategy for AL in the textual domain. Our approach is inherently case-based, using only nearest-neighbour-based density and diversity measures. We show that its performance is comparable to that of the more computationally expensive exploitation-based approaches and that it offers the opportunity to be classifier independent.
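
To make the idea of an exploration-only strategy concrete, the following is a minimal sketch of a selection step driven purely by nearest-neighbour density and diversity. It is not the paper's exact formulation: the cosine similarity, the density definition (sum of similarities to neighbours above a threshold), the diversity definition (distance to the nearest labelled example), the thresholds `alpha` and `beta`, and the function names are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def density(i, vectors, alpha):
    """Assumed density: sum of similarities from example i to every
    other example that is at least `alpha` similar to it."""
    total = 0.0
    for j, v in enumerate(vectors):
        if j != i:
            s = cosine(vectors[i], v)
            if s >= alpha:
                total += s
    return total


def diversity(i, vectors, labelled):
    """Assumed diversity: 1 - similarity to the nearest labelled example."""
    if not labelled:
        return 1.0
    return 1.0 - max(cosine(vectors[i], vectors[j]) for j in labelled)


def select_next(vectors, labelled, alpha=0.5, beta=0.3):
    """Return the index of the densest unlabelled example whose diversity
    is at least `beta`, or None if every example is already labelled.
    No classifier output is consulted at any point."""
    pool = [i for i in range(len(vectors)) if i not in labelled]
    if not pool:
        return None
    candidates = [i for i in pool if diversity(i, vectors, labelled) >= beta]
    if not candidates:
        candidates = pool  # hypothetical fallback: relax the diversity filter
    return max(candidates, key=lambda i: density(i, vectors, alpha))


# Toy pool: three examples near one axis, three near the other.
vectors = [[1, 0], [0.9, 0.1], [0.8, 0.2], [0, 1], [0.1, 0.9], [0.2, 0.8]]
labelled = {0}
chosen = select_next(vectors, labelled)
print(chosen)
```

In this toy pool the examples close to the already labelled one are filtered out by the diversity threshold, and the selection falls on the densest example in the unexplored cluster. Because the step relies only on similarities between examples, any classifier (or none) could be trained on the resulting labels.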
