Active Learning with Interpretable Predictor

Active learning is a method for constructing a useful prediction model with the minimum number of annotations, i.e., labels of the response variable. It is widely used as a modern experimental design method, particularly for problems with high annotation cost. When annotation is expensive, for example in large-scale or long-term experiments such as agricultural field trials, an appropriate reason for the choice of the next experimental setting is required. In conventional active learning, all that is known is that the selected samples are expected to improve the prediction accuracy of the model; this is not a satisfactory justification of the next experimental setting when the experiment must be approved. In this paper, we propose a novel active learning algorithm that uses two models: a model that predicts the response variable and a model that predicts the decrease in test loss. A new sample is selected with the model that predicts the decrease in test loss, and by employing for this role a model that can evaluate variable importance, e.g., a random forest, a reason for each selection can be provided. We applied the proposed method to multiple datasets and showed that its prediction performance is comparable to that of existing methods while its computational time is shorter. In addition, we demonstrated that suitable reasons for the selected samples can be provided during the active learning process.
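To make the two-model procedure concrete, the following Python sketch (illustrative only, not the authors' implementation) pairs a random-forest predictor of the response with a second random forest that scores unlabeled candidates by an estimated decrease in test loss. The training target used here for the second model, namely the observed drop in held-out loss after each acquisition, and the use of a labeled test set as feedback are simplifying assumptions made solely to obtain a runnable example; the names `predictor`, `loss_model`, and the synthetic data are hypothetical.

```python
# Minimal sketch of a two-model active-learning loop. The target of the
# "loss-decrease" model below is an assumption; the abstract does not
# specify how that model is trained.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic regression data standing in for an expensive-to-label experiment.
X, y = make_regression(n_samples=600, n_features=8, noise=5.0, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

labeled = [int(i) for i in rng.choice(len(X_pool), size=20, replace=False)]
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

predictor = RandomForestRegressor(n_estimators=200, random_state=0)   # predicts the response
loss_model = RandomForestRegressor(n_estimators=200, random_state=0)  # predicts loss decrease

history_X, history_dloss = [], []  # (candidate features, observed test-loss drop)
predictor.fit(X_pool[labeled], y_pool[labeled])
prev_loss = mean_squared_error(y_test, predictor.predict(X_test))

for step in range(30):
    if len(history_X) >= 5:
        # Rank unlabeled candidates by the predicted decrease in test loss.
        loss_model.fit(np.vstack(history_X), np.array(history_dloss))
        scores = loss_model.predict(X_pool[unlabeled])
        pick = unlabeled[int(np.argmax(scores))]
        # Variable importances give a human-readable reason for this pick.
        top_features = np.argsort(loss_model.feature_importances_)[::-1][:3]
        print(f"step {step}: picked sample {pick}, most influential features {top_features}")
    else:
        pick = int(rng.choice(unlabeled))  # warm-up: random selection

    # "Annotate" the chosen sample and record the realized loss decrease
    # (here measured on a labeled test set, a simplification for illustration).
    labeled.append(pick)
    unlabeled.remove(pick)
    predictor.fit(X_pool[labeled], y_pool[labeled])
    new_loss = mean_squared_error(y_test, predictor.predict(X_test))
    history_X.append(X_pool[pick])
    history_dloss.append(prev_loss - new_loss)
    prev_loss = new_loss
```

In this sketch, the feature importances of the loss-decrease model are what supply a reason for each selection, mirroring the interpretability property emphasized in the abstract.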
