Active learning for penalized logistic regression via sequential experimental design

Penalized logistic regression is a useful classification method: it provides class probability estimates and also mitigates overfitting. Traditionally, supervised classifier learning has required large amounts of labeled data. Thanks to technical innovation, large amounts of unlabeled data are now easy to collect, whereas labeling remains expensive and difficult. Active learning aims to select the most informative subjects for labeling so as to reduce the number of labeling requests. Recently, active learning based on experimental design techniques has attracted considerable attention. The typical criteria attempt to reduce the generalization error of a model by minimizing either its estimation variance or its estimation bias, but they fail to account for both components simultaneously. In this article, we introduce a new active learning algorithm for penalized logistic regression. The most informative subjects are selected as those yielding the smallest mean squared estimation error, a quantity that combines variance and squared bias. This criterion, integrated with the idea of sequential design, guides the selection of each new subject in our algorithm. Extensive experiments on real-world data sets demonstrate the effectiveness and efficiency of the proposed method compared to several state-of-the-art active learning alternatives.
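The abstract does not state the selection criterion in closed form, so the following is only a minimal sketch of one way such a rule could look, not the authors' algorithm. It assumes a ridge penalty of the form (lam / 2) * ||beta||^2, the standard first-order approximations to the variance and bias of the ridge logistic estimator, and the current estimate plugged in for the unknown true coefficients. The function names (`fit_ridge_logistic`, `approx_mse`, `select_next`) and the parameter `lam` are hypothetical.

```python
import numpy as np


def fit_ridge_logistic(X, y, lam, n_iter=50):
    """Newton iterations for logistic loss plus (lam / 2) * ||beta||^2."""
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + lam * np.eye(d)
        gradient = X.T @ (y - p) - lam * beta
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def approx_mse(info, beta, lam):
    """Approximate MSE of the ridge estimator: trace(variance) + ||bias||^2.

    Assumption: first-order approximations, with `beta` (the current
    estimate) standing in for the unknown true coefficient vector.
    """
    d = len(beta)
    a_inv = np.linalg.inv(info + lam * np.eye(d))
    variance = a_inv @ info @ a_inv
    bias = -lam * (a_inv @ beta)
    return float(np.trace(variance) + bias @ bias)


def select_next(X_lab, y_lab, X_pool, lam):
    """Return the index of the pool point with the smallest approximate MSE."""
    beta = fit_ridge_logistic(X_lab, y_lab, lam)
    p_lab = 1.0 / (1.0 + np.exp(-X_lab @ beta))
    info = X_lab.T @ (X_lab * (p_lab * (1.0 - p_lab))[:, None])
    p_pool = 1.0 / (1.0 + np.exp(-X_pool @ beta))
    w_pool = p_pool * (1.0 - p_pool)
    # Information matrix after hypothetically adding each candidate in turn.
    scores = np.array([
        approx_mse(info + w * np.outer(x, x), beta, lam)
        for x, w in zip(X_pool, w_pool)
    ])
    return int(np.argmin(scores)), scores
```

In a sequential design loop, the selected subject would be labeled, moved from the pool to the labeled set, and the model refitted before the next call to `select_next`.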
