Datum-Wise Classification: A Sequential Approach to Sparsity

We propose a novel classification technique that selects an appropriate representation for each datapoint, in contrast to the usual approach of selecting a single representation for the whole dataset. This datum-wise representation is found by minimizing a sparsity-inducing empirical risk, which is a relaxation of the standard L0-regularized risk. The classification problem is modeled as a sequential decision process that chooses, for each datapoint, which features to use before classifying. Datum-Wise Classification extends naturally to multi-class tasks, and we describe a specific case in which inference has the same complexity as a traditional linear classifier while still using a variable number of features. We compare our classifier to classical L1-regularized linear models (L1-SVM and LARS) on a set of common binary and multi-class datasets and show that, for an equal average number of features used, our method can achieve better performance.
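The sequential view described above can be illustrated with a minimal sketch. It is not the authors' implementation: the action encoding, the names `run_episode`, `datum_wise_risk` and `toy_policy`, and the exact form of the penalty (0/1 error plus `lam` times the number of features acquired for that datapoint) are assumptions made for illustration. The policy is hand-written here, whereas the method learns it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical action encoding (an assumption, not from the paper):
#   ("feature", j)  -> acquire feature j of the current datapoint
#   ("classify", c) -> stop and predict class c
Action = Tuple[str, int]
Policy = Callable[[Dict[int, float]], Action]


def run_episode(x: Sequence[float], policy: Policy) -> Tuple[int, List[int]]:
    """Acquire features one at a time until the policy decides to classify.

    Returns the predicted class and the sorted indices of the features used.
    """
    acquired: Dict[int, float] = {}
    # At most len(x) acquisitions plus one classification step.
    for _ in range(len(x) + 1):
        kind, value = policy(acquired)
        if kind == "classify":
            return value, sorted(acquired)
        acquired[value] = x[value]
    raise RuntimeError("policy never chose a classification action")


def datum_wise_risk(X: Sequence[Sequence[float]], y: Sequence[int],
                    policy: Policy, lam: float) -> float:
    """Average of (0/1 error + lam * number of features used) over datapoints."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        pred, used = run_episode(x_i, policy)
        total += float(pred != y_i) + lam * len(used)
    return total / len(X)


def toy_policy(acquired: Dict[int, float]) -> Action:
    """Hand-written stand-in for a learned policy (illustration only)."""
    if 0 not in acquired:
        return ("feature", 0)
    # Confident cases stop after a single feature.
    if abs(acquired[0]) >= 1.0:
        return ("classify", 1 if acquired[0] > 0 else 0)
    # Ambiguous cases pay for a second feature before deciding.
    if 1 not in acquired:
        return ("feature", 1)
    return ("classify", 1 if acquired[0] + acquired[1] > 0 else 0)
```

With this toy policy, a datapoint such as `[2.0, 0.0]` is classified after reading only feature 0, while `[0.5, -1.0]` triggers a second acquisition. The number of features used therefore varies per datapoint, and `datum_wise_risk` trades that count against classification error through `lam`.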
