Sequential Learning without Feedback

In many security and healthcare systems, a sequence of features/sensors/tests is used for detection and diagnosis. Each test outputs a prediction of the latent state and carries inherent costs. Our objective is to {\it learn} strategies for selecting tests that optimize both accuracy and cost. Unfortunately, it is often impossible to acquire in-situ ground-truth annotations, and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of the stochastic partial monitoring problem with an {\it unusual} reward structure: even noisy annotations are unavailable. Unsurprisingly, no learner can achieve sublinear regret without further assumptions. To this end, we propose the notion of weak dominance, a condition on the joint probability distribution of test outputs and the latent state which says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well. We empirically verify that weak dominance holds on real datasets and prove that it is a maximal condition for achieving sublinear regret. We reduce USS to a special case of the multi-armed bandit problem with side information and develop polynomial-time algorithms that achieve sublinear regret.
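The weak-dominance intuition above can be illustrated with a small simulation. This is a minimal sketch, not the paper's algorithm: it assumes a hypothetical cascade of three tests with independently chosen accuracies (0.7, 0.8, 0.9) and simply estimates, from samples, how often a later test is correct given that an earlier one was, which is the conditional structure weak dominance constrains (the paper's formal condition also involves test costs).

```python
import random

random.seed(0)

def simulate(n=20000, accs=(0.7, 0.8, 0.9)):
    """Simulate latent binary states and a cascade of tests whose
    accuracies increase along the sequence (noise independent per test)."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)  # latent state (never revealed to a learner)
        outs = [y if random.random() < a else 1 - y for a in accs]
        data.append((y, outs))
    return data

def cond_acc(data, i, j):
    """Estimate P(test j correct | test i correct) from the samples."""
    correct_i = [(y, outs) for y, outs in data if outs[i] == y]
    return sum(outs[j] == y for y, outs in correct_i) / len(correct_i)

data = simulate()
for i in (0, 1):
    print(f"P(test {i+1} correct | test {i} correct) = {cond_acc(data, i, i+1):.2f}")
```

In this synthetic setting the conditional accuracies match the marginal accuracies of the later tests, so whenever an early test is right, a later test is right with high probability; the learner can then use observed disagreements between tests, rather than the unobservable ground truth, as a proxy signal.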
