Manifold Adaptive Experimental Design for Text Categorization

In many information processing tasks, labels are expensive to obtain, whereas unlabeled data points are abundant. To reduce the cost of collecting labels, it is crucial to predict which unlabeled examples are the most informative, i.e., which would improve the classifier the most if they were labeled. Many active learning techniques have been proposed for text categorization, such as SVMActive and Transductive Experimental Design. However, most previous approaches try to discover the discriminant structure of the data space, whereas the geometrical structure is not well respected. In this paper, we propose a novel active learning algorithm that operates in a manifold adaptive kernel space. The manifold structure is incorporated into the kernel space by using the graph Laplacian, so that the manifold adaptive kernel space reflects the underlying geometry of the data. By minimizing the expected error with respect to the optimal classifier, we can select the most representative and discriminative data points for labeling. Experimental results on text categorization demonstrate the effectiveness of the proposed approach.
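The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. The base kernel is deformed with a graph Laplacian using the standard form K~ = K - K (I + M K)^{-1} M K with M = lam * L. The points to label are then chosen with a greedy, sequential variant of transductive experimental design. The RBF base kernel, the binary k-nearest-neighbor graph, the function names, and the parameters `gamma`, `k`, `lam`, and `mu` are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Base kernel (an assumed choice): Gaussian RBF on the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def knn_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - W of a symmetrized binary k-NN graph."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)              # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]       # k nearest neighbors per point
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def manifold_adaptive_kernel(K, L, lam=1.0):
    """Deform K with the graph Laplacian: K - K (I + M K)^{-1} M K, M = lam * L."""
    n = K.shape[0]
    M = lam * L
    return K - K @ np.linalg.solve(np.eye(n) + M @ K, M @ K)

def greedy_design(K, m, mu=0.1):
    """Greedily pick m points; after each pick, deflate K by the chosen column."""
    K = K.copy()
    n = K.shape[0]
    chosen = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(m):
        # Score each candidate by its squared column norm over (k(x, x) + mu).
        scores = np.sum(K ** 2, axis=0) / (np.diag(K) + mu)
        scores[chosen] = -np.inf              # never pick the same point twice
        i = int(np.argmax(scores))
        selected.append(i)
        chosen[i] = True
        K = K - np.outer(K[:, i], K[:, i]) / (K[i, i] + mu)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))              # toy data standing in for documents
    K_tilde = manifold_adaptive_kernel(rbf_kernel(X), knn_laplacian(X), lam=0.5)
    print(greedy_design(K_tilde, m=5))        # indices of points to label
```

Setting `lam=0` recovers the base kernel, so the same selection routine can be run with and without the manifold term to compare which points are chosen.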
