On Discarding, Caching, and Recalling Samples in Active Learning

We address the challenges of active learning under scarce informational resources in non-stationary environments. In real-world settings, data that has been labeled and integrated into a predictive model may become invalid over time. However, such data can become informative again when the context switches, and these changes may indicate unmodeled cyclic or other temporal dynamics. We explore principles for discarding, caching, and recalling labeled data points in active learning based on computations of the value of information. We review key concepts and study the value of the methods through investigations of predictive performance and data-acquisition costs on simulated and real-world data sets.
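
To make the idea of value-of-information-driven sample management concrete, the following is a minimal, hypothetical sketch. It uses a leave-one-out validation log-loss difference as a stand-in for the value of a labeled sample, a logistic-regression learner, and ad hoc thresholds; these choices (voi_of_sample, triage, keep_thresh, cache_thresh) are illustrative assumptions, not the formulation developed in the paper.

```python
# Hypothetical sketch: triaging labeled samples into keep / cache / discard
# sets using a simple value-of-information proxy. Assumes every class remains
# represented after removing a single sample.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def voi_of_sample(i, X_train, y_train, X_val, y_val):
    """VOI proxy for sample i: reduction in validation log-loss obtained
    by training with the sample rather than without it."""
    full = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    mask = np.arange(len(y_train)) != i
    ablated = LogisticRegression(max_iter=1000).fit(X_train[mask], y_train[mask])
    return (log_loss(y_val, ablated.predict_proba(X_val)) -
            log_loss(y_val, full.predict_proba(X_val)))


def triage(X_train, y_train, X_val, y_val, keep_thresh=0.0, cache_thresh=-0.01):
    """Partition labeled samples by their estimated value.

    keep:    currently informative; stays in the working training set
    cache:   marginal now; shelved so it can be recalled after a context switch
    discard: contributes less than its retention cost
    """
    keep, cache, discard = [], [], []
    for i in range(len(y_train)):
        v = voi_of_sample(i, X_train, y_train, X_val, y_val)
        if v >= keep_thresh:
            keep.append(i)
        elif v >= cache_thresh:
            cache.append(i)
        else:
            discard.append(i)
    return keep, cache, discard
```

In a non-stationary setting, re-running this triage after a suspected context switch would move cached samples back into the working set when their estimated value rises again, which is the intuition behind recalling rather than permanently discarding data.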
