Active Learning for Multi-Class Logistic Regression

Which of the many proposed methods for active learning can we expect to perform well when learning logistic regression classifiers? In this article, we evaluate different approaches in order to identify suitable practices. Among our contributions, we test several explicit objective functions for active learning: an empirical evaluation that has been lacking in the literature until now. We develop a theoretical framework, motivated by work in optimal experimental design, for applying different loss functions. Empirical investigations demonstrate the benefits of our variance reduction method, which achieves attractive classification accuracy and matches or beats random sampling in all evaluations. Among the alternative heuristic approaches, we identify margin sampling as a method with promising performance and little computational overhead.
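As a concrete illustration of the margin sampling heuristic mentioned above, the sketch below selects the unlabeled pool example whose two most probable classes under a multi-class logistic regression model are closest in probability. The weight matrix `W`, the helper names, and the absence of a bias term are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def margin_sample(W, X_pool):
    """Return the index of the pool point with the smallest margin
    between its two most probable classes.

    W      : (k, d) weight matrix of a k-class logistic regression model
    X_pool : (n, d) unlabeled candidate pool
    """
    P = softmax(X_pool @ W.T)            # (n, k) class probabilities
    top2 = np.sort(P, axis=1)[:, -2:]    # two largest probabilities per row
    margins = top2[:, 1] - top2[:, 0]    # small margin = uncertain prediction
    return int(np.argmin(margins))
```

The queried point is then labeled by the oracle and added to the training set before the model is refit, which is what keeps the per-query cost low compared with objective-based methods such as variance reduction.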
