Manifold optimal experimental design via dependence maximization for active learning

Naturally occurring data are growing rapidly in volume, which makes it challenging to assign the high-quality labels needed to learn a good model. It is therefore critical to select only the most informative data points for labeling, a task addressed by active learning. We study this problem for a regression model from the perspective of optimal experimental design (OED). Several OED-based methods have been developed, but the relation between the data points and their predictions has not been fully explored. Motivated by this gap, we employ the Hilbert-Schmidt independence criterion (HSIC) to maximize the dependence between the samples and their estimates from a global view. The result is a novel active learning method named manifold optimal experimental design via dependence maximization (MODM). Specifically, the points having maximum dependence with their predictions are selected for labeling. In addition, MODM uses the graph Laplacian to preserve the local geometric structure of the data, so that the most informative data points can be better selected. We adopt a sequential strategy to optimize the objective function. The effectiveness of the proposed algorithm is verified experimentally on content-based image retrieval.
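
As a rough illustration of the dependence measure mentioned above, the following sketch computes the standard biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, between two samples. It is not the MODM objective itself: the RBF kernel, the `gamma` value, the helper names `rbf_kernel` and `hsic`, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative distances


def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Synthetic data: y_dep depends on x, y_ind is drawn independently of x.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dep = np.sin(x) + 0.1 * rng.normal(size=(200, 1))
y_ind = rng.normal(size=(200, 1))

hsic_dep = hsic(rbf_kernel(x), rbf_kernel(y_dep))
hsic_ind = hsic(rbf_kernel(x), rbf_kernel(y_ind))
print(f"HSIC(x, dependent y)   = {hsic_dep:.5f}")
print(f"HSIC(x, independent y) = {hsic_ind:.5f}")
```

A larger value indicates stronger statistical dependence between the two samples, which is the quantity a dependence-maximization criterion would try to increase between selected points and their predictions.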
