Asking Generalized Queries to Ambiguous Oracle

Asking generalized queries (queries in which some features are treated as don't-care) has recently been proposed and studied in active learning. Because each generalized query is equivalent to a set of specific queries, the oracle's answer usually carries more information and can therefore speed up learning. However, since the answer to a generalized query may be uncertain, previous studies often assume that the oracle can provide accurate probabilistic answers. This assumption is often too stringent in real-world situations. In this paper, we make the more realistic assumption that the oracle can provide only non-probabilistic, ambiguous answers, similar to the setting in multiple-instance learning: a generalized query is labeled positive if at least one of its corresponding specific queries is positive, and negative otherwise. We propose an algorithm that constructs generalized queries and improves the learning model with such ambiguous answers in active learning. An empirical study shows that the proposed algorithm can significantly speed up the learning process and outperform active learning with either specific queries or inaccurately answered generalized queries.
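
The ambiguous labeling rule can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows how a generalized query expands into specific queries and how an ambiguous oracle would label it. The names `DONT_CARE`, `expand`, and `ambiguous_answer`, the discrete feature domains, and the toy target concept are all assumptions made for illustration.

```python
from itertools import product

DONT_CARE = None  # hypothetical marker for a don't-care feature


def expand(query, domains):
    """Enumerate the specific queries covered by a generalized query.

    Each don't-care position ranges over its whole (assumed discrete) domain;
    every other position keeps its fixed value.
    """
    choices = [domains[i] if v is DONT_CARE else [v] for i, v in enumerate(query)]
    return list(product(*choices))


def ambiguous_answer(query, domains, true_label):
    """Label the query positive iff at least one covered specific query is positive."""
    return any(true_label(x) for x in expand(query, domains))


# Toy setup (assumed): three binary features, target is positive iff x0 == 1 and x1 == 1.
domains = [[0, 1], [0, 1], [0, 1]]
target = lambda x: x[0] == 1 and x[1] == 1

q_pos = (1, DONT_CARE, 0)          # covers (1, 0, 0) and (1, 1, 0)
q_neg = (0, DONT_CARE, DONT_CARE)  # covers four examples, none positive

print(ambiguous_answer(q_pos, domains, target))
print(ambiguous_answer(q_neg, domains, target))
```

In this sketch a positive answer only says that some covered example is positive, not which one, whereas a negative answer labels every covered example negative. This asymmetry is the same one found in multiple-instance learning.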
