Optimally Extracting Discriminative Disjunctive Features for Dimensionality Reduction

Dimensionality reduction is a popular approach to tackling the large, redundant feature spaces that arise in most practical problems, either by selecting a subset of the features or by projecting them onto a lower-dimensional space. Most such approaches suffer from the drawback that the dimensionality-reduction objective and the classifier-training objective are decoupled. Recently, there have been efforts to address the two tasks jointly by minimising an upper bound on a single objective function, but these methods are all parametric in the sense that the number of reduced dimensions must be supplied as an input to the system. Here we propose an integrated, non-parametric learning approach to supervised dimensionality reduction that explores the search space of all possible disjunctions of features and discovers a sparse subset of (interpretable) disjunctions minimising a regularised loss function. To discover good disjunctive features, we employ algorithms from hierarchical kernel learning, which simultaneously achieve efficient feature selection and optimal classifier training in a maximum-margin framework, and we demonstrate the effectiveness of our approach on benchmark datasets.
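
The paper's actual method relies on hierarchical kernel learning to search the exponential space of disjunctions without enumerating it. As a rough illustration of the underlying idea only, the sketch below enumerates pairwise disjunctions of binarised features explicitly and substitutes an L1-penalised linear SVM for the HKL solver, so that the sparse regulariser, rather than a preset dimension count, determines how many disjunctive features survive. The dataset, the pairwise restriction, and the parameter values are all illustrative assumptions, not the paper's construction.

```python
import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC  # L1-penalised max-margin classifier (stand-in for HKL)

# Hypothetical toy setup: binarise features so that logical OR is well defined.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_bin = (X > np.median(X, axis=0)).astype(int)

# Enumerate disjunctive (OR) features over all feature pairs.
# (HKL would explore deeper disjunctions implicitly, without this blow-up.)
pairs = list(itertools.combinations(range(X_bin.shape[1]), 2))
X_disj = np.column_stack([X_bin[:, i] | X_bin[:, j] for i, j in pairs])
X_all = np.hstack([X_bin, X_disj])

# The L1 penalty drives most disjunction weights to zero, so the number of
# retained features is decided by the regularisation strength C, not fixed
# in advance -- the non-parametric aspect the abstract refers to.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1)
clf.fit(X_all, y)

selected = np.flatnonzero(clf.coef_.ravel())
print(f"{len(selected)} of {X_all.shape[1]} candidate features retained")
```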
