Paper information - Initial training data selection for active learning

Initial training data selection for active learning

A crucial issue in many classification applications is how to obtain the best possible classifier from a limited amount of labeled training data. Active learning addresses this issue by selecting the most informative data for training. In this work, we argue that the performance of active learning can be improved by carefully selecting the initial training samples. To support this argument, we propose three initial training data selection mechanisms based on fuzzy clustering: center-based selection, border-based selection, and hybrid selection. Center-based selection picks the samples with a high degree of membership in each cluster as initial training data. Border-based selection picks the samples around the border between clusters. Hybrid selection combines center-based and border-based selection. Their effects are studied empirically on a set of UCI data sets. Experimental results indicate that, compared with randomly selecting initial training samples, hybrid selection can effectively enhance the performance of active learning.
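The three selection mechanisms can be illustrated with a minimal sketch. The abstract does not give the exact selection criteria, so everything below is an assumption made for illustration: a plain fuzzy c-means routine written in NumPy, "center" samples taken as those with the highest membership in each cluster, "border" samples taken as those whose two largest memberships are closest, and the hypothetical parameters `per_cluster` and `n_border`. It is not the authors' implementation.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers, U), U has shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # each row is a membership distribution
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)             # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def center_based(U, per_cluster):
    """Assumed criterion: samples with the highest membership in each cluster."""
    picks = [np.argsort(U[:, k])[::-1][:per_cluster] for k in range(U.shape[1])]
    return np.unique(np.concatenate(picks))

def border_based(U, n_border):
    """Assumed criterion: samples whose two largest memberships are closest."""
    top2 = np.sort(U, axis=1)[:, -2:]
    gap = top2[:, 1] - top2[:, 0]
    return np.argsort(gap)[:n_border]

def hybrid(U, per_cluster, n_border):
    """Union of the center-based and border-based selections."""
    return np.union1d(center_based(U, per_cluster), border_based(U, n_border))

# Synthetic two-cluster data, used only to exercise the functions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2)
center = center_based(U, per_cluster=3)
border = border_based(U, n_border=4)
initial = hybrid(U, per_cluster=3, n_border=4)  # indices to label first
```

In an active learning loop, the indices in `initial` would be sent for labeling before the first classifier is trained, in place of a random draw.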
