Sequential Learning without Feedback

In many security and healthcare systems, a sequence of features/sensors/tests is used for detection and diagnosis. Each test outputs a prediction of the latent state and carries inherent costs. Our objective is to {\it learn} strategies for selecting tests that optimize both accuracy and cost. Unfortunately, it is often impossible to acquire in-situ ground-truth annotations, and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of the stochastic partial monitoring problem with an {\it unusual} reward structure: even noisy annotations are unavailable. Unsurprisingly, no learner can achieve sublinear regret without further assumptions. To this end, we propose the notion of weak dominance, a condition on the joint probability distribution of test outputs and the latent state which says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well. We empirically verify that weak dominance holds on real datasets and prove that it is a maximal condition for achieving sublinear regret. We reduce USS to a special case of the multi-armed bandit problem with side information and develop polynomial-time algorithms that achieve sublinear regret.
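The weak-dominance intuition above can be illustrated with a small simulation. This is a minimal sketch, not the paper's algorithm: it assumes a hypothetical cascade of three tests with independently chosen accuracies (0.7, 0.8, 0.9) and simply estimates, from samples, how often a later test is correct given that an earlier one was, which is the conditional structure weak dominance constrains (the paper's formal condition also involves test costs).

```python
import random

random.seed(0)

def simulate(n=20000, accs=(0.7, 0.8, 0.9)):
    """Simulate latent binary states and a cascade of tests whose
    accuracies increase along the sequence (noise independent per test)."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)  # latent state (never revealed to a learner)
        outs = [y if random.random() < a else 1 - y for a in accs]
        data.append((y, outs))
    return data

def cond_acc(data, i, j):
    """Estimate P(test j correct | test i correct) from the samples."""
    correct_i = [(y, outs) for y, outs in data if outs[i] == y]
    return sum(outs[j] == y for y, outs in correct_i) / len(correct_i)

data = simulate()
for i in (0, 1):
    print(f"P(test {i+1} correct | test {i} correct) = {cond_acc(data, i, i+1):.2f}")
```

In this synthetic setting the conditional accuracies match the marginal accuracies of the later tests, so whenever an early test is right, a later test is right with high probability; the learner can then use observed disagreements between tests, rather than the unobservable ground truth, as a proxy signal.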
