Joint Active Feature Acquisition and Classification with Variable-Size Set Encoding

We consider the problem of active feature acquisition, where the goal is to sequentially select a subset of features that maximizes prediction performance in the most cost-effective way at test time. We formulate active feature acquisition as a joint learning problem in which the classifier (environment) and an RL agent, which at each step decides either to `stop and predict' or to `acquire a new feature', are trained together in a cost-sensitive manner. We also introduce a novel encoding scheme that represents the acquired subset of features with an order-invariant set encoding at the feature level, which significantly reduces the search space for the agent. We evaluate our model on a carefully designed synthetic dataset for active feature acquisition as well as several medical datasets. Our framework exhibits a meaningful feature acquisition process for diagnosis that agrees with human knowledge, and outperforms all baselines in both prediction performance and feature acquisition cost.
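To make the set-encoding idea concrete, the sketch below shows one minimal, hypothetical way to encode the acquired (feature index, value) pairs in an order-invariant manner (embed each acquired feature's index, combine it with its value through a shared network, and sum-pool), with a policy head over the remaining-feature actions plus a `stop and predict' action and a classifier head on the same representation. The module names, dimensions, and the particular sum-pooling aggregation are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Order-invariant encoding of acquired (feature index, value) pairs.

    Minimal sketch: embed each acquired feature's index, concatenate its
    observed value, pass through a shared MLP, and sum-pool so the result
    does not depend on the order in which features were acquired.
    """
    def __init__(self, n_features, emb_dim=16, hid_dim=64):
        super().__init__()
        self.idx_emb = nn.Embedding(n_features, emb_dim)
        self.phi = nn.Sequential(nn.Linear(emb_dim + 1, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, hid_dim))

    def forward(self, idx, val):
        # idx: (batch, k) acquired feature indices; val: (batch, k) their values
        h = self.phi(torch.cat([self.idx_emb(idx), val.unsqueeze(-1)], dim=-1))
        return h.sum(dim=1)  # permutation-invariant state representation

class AgentAndClassifier(nn.Module):
    """Shared set encoding feeding both the acquisition policy and the classifier."""
    def __init__(self, n_features, n_classes, hid_dim=64):
        super().__init__()
        self.encoder = SetEncoder(n_features, hid_dim=hid_dim)
        # actions: acquire one of the n_features, or the final `stop and predict'
        self.policy = nn.Linear(hid_dim, n_features + 1)
        self.classifier = nn.Linear(hid_dim, n_classes)

    def forward(self, idx, val):
        state = self.encoder(idx, val)
        return self.policy(state), self.classifier(state)
```

In a full system one would typically mask the policy logits of already-acquired features and query the classifier head only once the agent selects the stop action; those details are omitted here.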
