We introduce AL2, a pool-based active learning approach that learns how to inform the active-set selection. The framework is classifier-independent, amenable to different performance targets, and applicable to both binary and multinomial classification in batch-mode active learning. Here we consider a special instantiation, ALsubmodular, in which the choice of learning structure leads to a submodular objective function, thereby admitting an efficient greedy algorithm with an optimality guarantee of 1 − 1/e. Statistically significant improvements over the state of the art are demonstrated for two supervised learning methods, benchmark (UCI) datasets, and the motivating sustainability application of land-cover prediction in the Arctic.

1 Motivation and Related Work

Sustainability research is inherently a predictive science and can be crucially informed by accurate models of, e.g., species distributions, land use, and climate change [6, 8]. Consider a predictive model for land cover in the Arctic that relates ecological covariates to vegetation type. Such a model enables projections of the possible effects of climate scenarios by predicting the future composition of the land cover under drift of the ecological covariates [15]. The predictive accuracy and uncertainty estimates of the model are crucial and depend not only on the model complexity and inherent assumptions but also on the amount and quality of the training data. On one hand, ecological and environmental features such as biomass are readily available from remote sensing data sources. On the other hand, collecting information on the actual vegetation cover in different parts of the Arctic is an expensive and time-consuming task performed by surveys over areas of large spatial extent. Hence, land-cover survey planning has to be done carefully, in a targeted way, and with certain constraints in mind. This leads to experimental design and active learning (AL); for a comprehensive review, see Settles [18].

In pool-based active learning, one starts with a small training dataset L of labeled samples and a large pool U of unlabeled samples. On each iteration the active learner selects one or more samples from U, which are then labeled by an oracle (e.g., a human annotator) and added to the training dataset. The learner then retrains the predictive model and selects more samples for labeling. The goal of active learning is to achieve good performance of the predictive model with as few labeled samples as possible.

Most active learning research has focused on sequential active learning, in which one greedily selects the single most informative unlabeled sample from U according to some utility measure. The most commonly used utility measures fall within the family of uncertainty sampling methods, such as least-confident sampling [3], margin sampling [17], and entropy sampling [21]. Another family of sequential active learning approaches is based on the query-by-committee (QBC) algorithm [20], where selection is driven by the disagreement among committee classifiers about the label of an unlabeled sample. A key limitation of sequential active learning is the need to retrain after every single query, which can be time-consuming and in many applications is not even feasible due to limited resources and expertise.
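As a concrete illustration of these utility measures, the minimal sketch below computes the least-confident, margin, and entropy criteria from a matrix of predicted class probabilities. The function names and the scikit-learn-style predict_proba convention are assumptions made for exposition, not notation from the paper.

```python
import numpy as np

def least_confident(probs):
    """1 - max_y P(y|x): high when even the most likely label is uncertain."""
    return 1.0 - probs.max(axis=1)

def margin_utility(probs):
    """Negated gap between the two most probable labels (small gap = high utility)."""
    top2 = np.partition(probs, -2, axis=1)[:, -2:]  # two largest probabilities per row
    return -(top2[:, 1] - top2[:, 0])

def entropy_utility(probs):
    """Shannon entropy of the predictive distribution over the classes."""
    return -(probs * np.log(np.clip(probs, 1e-12, None))).sum(axis=1)
```

Here `probs` is an (n_samples, n_classes) array, e.g. the output of a classifier's predict_proba; all three utilities agree for binary problems but differ in the multinomial case.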
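The generic pool-based loop described above can likewise be sketched in a few lines. This is a hedged illustration of the standard sequential protocol with a pluggable utility measure, not the AL2 algorithm itself; the classifier interface, the oracle callback, and the budget parameter are all assumptions.

```python
import numpy as np

def sequential_active_learning(clf, X_lab, y_lab, X_pool, oracle, utility, budget):
    """Generic pool-based loop: query one sample per iteration, retraining each time.

    clf     : any classifier exposing fit / predict_proba (scikit-learn style)
    oracle  : callable mapping a pool index to its true label (e.g. a human annotator)
    utility : one of the uncertainty measures sketched above
    """
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(range(len(X_pool)))
    for _ in range(budget):
        clf.fit(np.asarray(X_lab), np.asarray(y_lab))
        probs = clf.predict_proba(np.asarray([X_pool[i] for i in pool]))
        query = pool[int(np.argmax(utility(probs)))]
        X_lab.append(X_pool[query])   # move the queried sample, together with
        y_lab.append(oracle(query))   # its oracle-provided label, from U into L
        pool.remove(query)
    clf.fit(np.asarray(X_lab), np.asarray(y_lab))
    return clf
```

The retrain-per-query structure of this loop is exactly the limitation noted above, and it is what batch-mode methods avoid by selecting several samples before retraining.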
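Finally, the 1 − 1/e guarantee mentioned in the abstract is the classical bound for greedy maximization of a monotone submodular set function under a cardinality constraint [21]. The sketch below shows that generic greedy procedure; the facility-location objective is only a stand-in for illustration, since the actual ALsubmodular objective is not reproduced in this excerpt.

```python
import numpy as np

def greedy_submodular(f, ground_set, k):
    """Greedy maximization of a monotone submodular f under |S| <= k.

    For such f, the greedy batch is within a factor 1 - 1/e of optimal [21].
    """
    S, remaining = [], set(ground_set)
    for _ in range(min(k, len(remaining))):
        # pick the element with the largest marginal gain f(S + e) - f(S)
        best = max(remaining, key=lambda e: f(S + [e]) - f(S))
        S.append(best)
        remaining.remove(best)
    return S

def facility_location(sim):
    """Stand-in objective (an assumption, not the paper's ALsubmodular objective):
    facility-location coverage over a pairwise similarity matrix, which is
    monotone and submodular, so the 1 - 1/e bound applies."""
    def f(S):
        return float(sim[:, S].max(axis=1).sum()) if S else 0.0
    return f
```

A batch of k queries over an n-sample pool would then be chosen as greedy_submodular(facility_location(sim), range(n), k), where sim is, e.g., an RBF kernel matrix over the pool.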
References

[1] Leo Breiman, et al. Bagging Predictors. Machine Learning, 1996.
[2] Thomas G. Dietterich. Machine Learning in Ecosystem Informatics and Sustainability. IJCAI, 2009.
[3] Mark Craven, et al. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. EMNLP, 2008.
[4] Yi Zhang, et al. Incorporating Diversity and Density in Active Learning for Relevance Feedback. ECIR, 2007.
[5] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, 2000.
[6] Richard G. Pearson, et al. Arctic greening under future climate change predicted using machine learning. 2011.
[7] Theodoros Damoulas, et al. Pattern Recognition. Encyclopedia of Information Systems, 1998.
[8] Russell Greiner, et al. Optimistic Active-Learning Using Mutual Information. IJCAI, 2007.
[9] Rong Jin, et al. Batch mode active learning and its application to medical image classification. ICML, 2006.
[10] Burr Settles, et al. Active Learning Literature Survey. 2009.
[11] Sang Joon Kim, et al. A Mathematical Theory of Communication. 2006.
[12] H. Sebastian Seung, et al. Query by committee. COLT '92, 1992.
[13] Andrew McCallum, et al. Reducing Labeling Effort for Structured Prediction Tasks. AAAI, 2005.
[14] Klaus Brinker, et al. Incorporating Diversity in Active Learning with Support Vector Machines. ICML, 2003.
[15] Stefan Wrobel, et al. Multi-class Ensemble-Based Active Learning. ECML, 2006.
[16] Yuhong Guo, et al. Active Instance Sampling via Matrix Partition. NIPS, 2010.
[17] Robert E. Schapire, et al. The Boosting Approach to Machine Learning: An Overview. 2003.
[18] Stefan Wrobel, et al. Active Hidden Markov Models for Information Extraction. IDA, 2001.
[19] C. Gomes. Computational Sustainability: Computational methods for a sustainable environment, economy, and society. 2009.
[20] Dale Schuurmans, et al. Discriminative Batch Mode Active Learning. NIPS, 2007.
[21] M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I. Math. Program., 1978.