Active learning based on minimization of the expected path-length of random walks on the learned manifold structure

Abstract

Active learning algorithms aim to select important samples to label for subsequent machine learning tasks. Many such algorithms exploit the reproducing kernel Hilbert space (RKHS) induced by a Gaussian radial basis function (RBF) kernel and leverage the geometrical structure of the data for query-sample selection. Parameters for the kernel function and the k-nearest-neighbor graph must be set properly beforehand, so, as a tool for exploring the structure of data, active learning algorithms that tune those parameters automatically are desirable. In this paper, locally linear embedding (LLE) with convex constraints on the neighbor weights is used to learn the geometrical structure of the data in the RKHS induced by a Gaussian RBF kernel. Automatic tuning of the kernel parameter rests on the assumption that the geometrical structure of the data in the RKHS is sparse and local. With a Markov matrix constructed from the learned LLE weight matrix, the total expected path length of random walks from all samples to the selected samples is proposed as the criterion for query-sample selection. A greedy algorithm with a guaranteed solution bound is developed to select query samples, and a two-phase scheme is proposed to scale the algorithm to larger data sets. Experimental results on data sets ranging from hundreds to tens of thousands of samples demonstrate the feasibility of the proposed approach.
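The selection criterion described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact method: it treats the selected samples as absorbing states of the Markov chain and solves the standard absorbing-chain linear system (I - P_uu) t = 1 for the expected hitting times of the unselected samples, then greedily adds the candidate that most reduces the total. The row-normalized Gaussian-affinity transition matrix used in the test below is a simplified stand-in for the paper's convex-constrained LLE weight matrix.

```python
import numpy as np

def markov_from_weights(W):
    """Row-normalize a nonnegative weight matrix into a transition matrix."""
    return W / W.sum(axis=1, keepdims=True)

def total_expected_path_length(P, selected):
    """Total expected number of steps for random walks started at every
    unselected sample to first reach the selected (absorbing) set.

    Solves the absorbing-chain system (I - P_uu) t = 1, where P_uu is the
    sub-matrix of transitions among unselected samples.
    """
    n = P.shape[0]
    u = np.setdiff1d(np.arange(n), selected)
    if len(u) == 0:
        return 0.0
    Puu = P[np.ix_(u, u)]
    t = np.linalg.solve(np.eye(len(u)) - Puu, np.ones(len(u)))
    return float(t.sum())

def greedy_select(P, budget):
    """Greedily pick `budget` query samples minimizing the criterion."""
    selected = []
    for _ in range(budget):
        candidates = [i for i in range(P.shape[0]) if i not in selected]
        best = min(candidates,
                   key=lambda i: total_expected_path_length(P, selected + [i]))
        selected.append(best)
    return selected
```

Adding any sample to the selected set removes a term from the sum and can only shorten the remaining hitting times, so the criterion decreases monotonically as the query set grows, which is the intuition behind the greedy algorithm's solution bound mentioned in the abstract.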
