Active Learning and Basis Selection for Kernel-Based Linear Models: A Bayesian Perspective

We develop an active learning algorithm for kernel-based linear regression and classification. The proposed greedy algorithm employs a minimum-entropy criterion derived using a Bayesian interpretation of ridge regression. We assume access to a matrix, $\boldsymbol{\Psi} \in \mathbb{R}^{N \times N}$, for which the $(i,j)$th element is defined by the kernel function $K(\boldsymbol{\psi}_i, \boldsymbol{\psi}_j) \in \mathbb{R}$, with the observed data $\boldsymbol{\psi}_i \in \mathbb{R}^d$. We seek a model, $\mathcal{M}: \boldsymbol{\psi}_i \rightarrow y_i$, where $y_i$ is a real-valued response or integer-valued label to which we do not have access a priori. To achieve this goal, a submatrix, $\boldsymbol{\Psi}_{I_l, I_b} \in \mathbb{R}^{n \times m}$, is sought that corresponds to the intersection of $n$ rows and $m$ columns of $\boldsymbol{\Psi}$, indexed by the sets $I_l$ and $I_b$, respectively. Typically $m \ll N$ and $n \ll N$. We have two objectives: (i) determine the $m$ columns of $\boldsymbol{\Psi}$, indexed by the set $I_b$, that are most informative for building a linear model, $\mathcal{M}: [1\ \boldsymbol{\psi}_{i,I_b}]^T \rightarrow y_i$, without any knowledge of $\{y_i\}_{i=1}^N$, and (ii) using active learning, sequentially determine which subset of $n$ elements of $\{y_i\}_{i=1}^N$ should be acquired; both stopping values, $|I_b| = m$ and $|I_l| = n$, are also to be inferred from the data. These steps are taken with the goal of minimizing the uncertainty about the model parameters, $\mathbf{x}$, as measured by the differential entropy of their posterior distribution. The parameter vector $\mathbf{x} \in \mathbb{R}^m$, as well as the model bias $\beta \in \mathbb{R}$, is then learned from the resulting problem, $\mathbf{y}_{I_l} = \boldsymbol{\Psi}_{I_l, I_b}\,\mathbf{x} + \beta\mathbf{1} + \boldsymbol{\epsilon}$. The remaining $N - n$ responses/labels not included in $\mathbf{y}_{I_l}$ can be inferred by applying $\mathbf{x}$ to the remaining $N - n$ rows of $\boldsymbol{\Psi}_{:, I_b}$. We show experimental results for several regression and classification problems, and compare against other active learning methods.
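The entropy-driven selection step is small enough to sketch. Below is a minimal, self-contained Python illustration (not the paper's exact algorithm): under a Gaussian (ridge) prior on $\mathbf{x}$ and Gaussian noise, the posterior covariance of $\mathbf{x}$ does not depend on the labels, so rows of $\boldsymbol{\Psi}_{:,I_b}$ can be chosen greedily to minimize the posterior differential entropy before any $y_i$ is acquired. The kernel width `gamma`, prior variance `alpha`, noise variance `sigma2`, and the fixed basis index set `basis_idx` are illustrative assumptions, and the bias is simply folded into the weight vector here.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # K(psi_i, psi_j) = exp(-gamma * ||psi_i - psi_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def posterior_entropy(Phi_rows, alpha=1.0, sigma2=0.1):
    # Bayesian ridge regression: prior x ~ N(0, alpha*I), noise ~ N(0, sigma2*I).
    # Posterior covariance: Sigma = (Phi^T Phi / sigma2 + I / alpha)^(-1);
    # differential entropy: H = 0.5 * log det(2*pi*e * Sigma).
    m = Phi_rows.shape[1]
    precision = Phi_rows.T @ Phi_rows / sigma2 + np.eye(m) / alpha
    _, logdet_precision = np.linalg.slogdet(precision)
    return 0.5 * (m * np.log(2.0 * np.pi * np.e) - logdet_precision)

def greedy_active_selection(Phi, n_select):
    # Greedily pick the rows (samples whose labels to acquire) that most
    # reduce the posterior entropy of x; no labels y are needed for this.
    N = Phi.shape[0]
    selected, remaining = [], list(range(N))
    for _ in range(n_select):
        best_i, best_h = None, np.inf
        for i in remaining:
            h = posterior_entropy(Phi[selected + [i], :])
            if h < best_h:
                best_i, best_h = i, h
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Toy usage: 50 points in R^3, a fixed 10-column basis I_b, select n = 5 rows.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
K = rbf_kernel(data, data)
basis_idx = list(range(10))                           # I_b (assumed fixed here)
Phi = np.hstack([np.ones((50, 1)), K[:, basis_idx]])  # rows are [1, psi_{i, I_b}]
print("rows to label (I_l):", greedy_active_selection(Phi, 5))
```

Because the Gaussian posterior covariance is label-independent, minimizing entropy reduces to maximizing the log-determinant of the posterior precision, which is why the greedy search above never touches $y$; the paper's criterion additionally drives the choice of $I_b$ and the stopping sizes $m$ and $n$, which this sketch takes as given.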
