Pool-Based Active Learning with Query Construction

Active learning is an important approach to the data-scarcity problem in machine learning, and most active learning research is pool-based. However, pool-based active learning is sensitive to the size of the pool, which slows the improvement of the classifier's performance. We propose a novel pool-based active learning method with query construction. In each iteration, the training process first selects a representative instance from the predefined pool. Starting from that instance, it then applies a hill-climbing algorithm to construct the instance to be labeled, namely the one that best represents the original unlabeled set. As a result, each queried instance is more representative than any instance in the pool. We compare the new method with the original pool-based method and with a state-of-the-art active learning method that constructs queries directly. The new method makes the classifier's prediction error rate drop faster and improves the performance of the active learning classifier.
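
The two-step query construction described above can be illustrated with a minimal sketch. The abstract does not specify the representativeness measure or the hill-climbing moves. This sketch therefore makes several assumptions. Features are continuous. Representativeness is the mean RBF similarity of a point to the unlabeled set. The climber nudges one feature at a time by a fixed step and keeps a move only if the score strictly increases. All function names and parameters (`representativeness`, `hill_climb`, `step`, `gamma`) are hypothetical and are not taken from the paper. Labeling the query and retraining the classifier are left out.

```python
import math
from typing import List, Sequence

Vector = List[float]


def rbf_similarity(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    # Assumed similarity measure: Gaussian (RBF) kernel on squared Euclidean distance.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * d2)


def representativeness(x: Sequence[float], unlabeled: Sequence[Sequence[float]],
                       gamma: float = 1.0) -> float:
    # Hypothetical score: mean similarity of x to every instance in the unlabeled set.
    return sum(rbf_similarity(x, u, gamma) for u in unlabeled) / len(unlabeled)


def most_representative(pool, unlabeled, gamma: float = 1.0) -> Vector:
    # Step 1: pick the pool instance with the highest representativeness score.
    return list(max(pool, key=lambda x: representativeness(x, unlabeled, gamma)))


def hill_climb(start, unlabeled, step: float = 0.1, max_iters: int = 200,
               gamma: float = 1.0) -> Vector:
    # Step 2: greedy coordinate hill climbing. Nudge one feature by +/- step
    # and keep the move only if the representativeness score strictly increases.
    current = list(start)
    best = representativeness(current, unlabeled, gamma)
    for _ in range(max_iters):
        improved = False
        for i in range(len(current)):
            for delta in (step, -step):
                candidate = list(current)
                candidate[i] += delta
                score = representativeness(candidate, unlabeled, gamma)
                if score > best:
                    current, best = candidate, score
                    improved = True
        if not improved:
            break  # no single-feature move helps: local optimum at this step size
    return current


def construct_query(pool, unlabeled, step: float = 0.1, gamma: float = 1.0) -> Vector:
    # The constructed instance is the one that would be sent to the oracle for a label.
    start = most_representative(pool, unlabeled, gamma)
    return hill_climb(start, unlabeled, step=step, gamma=gamma)


if __name__ == "__main__":
    unlabeled = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pool = [[0.0, 0.0], [1.0, 1.0]]
    print(construct_query(pool, unlabeled))
```

In the toy usage, the pool contains only two corners of the unit square. Under the assumed RBF score, the climber moves from the best corner to the centre of the square, (0.5, 0.5), which is not in the pool. The climb starts from the best pool instance and accepts only improving moves. As a result, the constructed query always scores at least as high as every pool instance under this measure.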
