Asymptotic Convergence in Online Learning with Unbounded Delays

We study the problem of predicting the results of computations that are too expensive to run by observing the results of smaller computations. We model this as an online learning problem with delayed feedback in which the delay is unbounded, and we study it mainly in a stochastic setting. We show that in this setting consistency is not possible in general, and that optimal forecasters need not have average regret tending to zero. Nevertheless, it is possible to give algorithms that converge asymptotically to Bayes-optimal predictions by evaluating forecasters on specific sparse, independent subsequences of their predictions. We give such an algorithm, show that it converges asymptotically to good behavior, and give very weak bounds on how long convergence takes. Finally, we relate our results back to the problem of predicting large computations in a deterministic setting.
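The idea of judging forecasters only on a sparse subsequence of rounds whose outcomes have already arrived can be illustrated with a toy sketch. This is not the algorithm from the paper. The powers-of-two subsequence, the log-loss scoring, the `n * n` delay model, and every name below (`sparse_indices`, `log_loss`, `select_forecaster`, the constant forecasters) are hypothetical choices made only for illustration.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical forecaster type: maps a round index n to a predicted P(x_n = 1).
Forecaster = Callable[[int], float]


def sparse_indices(limit: int) -> List[int]:
    """Illustrative sparse subsequence 1, 2, 4, 8, ... of rounds strictly below `limit`."""
    idx, n = [], 1
    while n < limit:
        idx.append(n)
        n *= 2
    return idx


def log_loss(p: float, outcome: int) -> float:
    """Log loss of predicting P(x = 1) = p when the revealed outcome is `outcome`."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
    return -math.log(p if outcome == 1 else 1 - p)


def select_forecaster(forecasters: Dict[str, Forecaster],
                      revealed: Dict[int, int], now: int) -> str:
    """Return the forecaster with the lowest average log loss, scored only on
    rounds that lie on the sparse subsequence AND whose outcome is already revealed."""
    rounds = [n for n in sparse_indices(now) if n in revealed]
    if not rounds:
        return next(iter(forecasters))  # no usable feedback yet: fall back to the first one

    def avg_loss(f: Forecaster) -> float:
        return sum(log_loss(f(n), revealed[n]) for n in rounds) / len(rounds)

    return min(forecasters, key=lambda name: avg_loss(forecasters[name]))


if __name__ == "__main__":
    random.seed(0)
    truth = {n: int(random.random() < 0.7) for n in range(1, 5001)}
    now = 5000
    # Hypothetical unbounded-delay model: the outcome of round n becomes visible at time n * n.
    revealed = {n: x for n, x in truth.items() if n * n <= now}
    experts = {"p=0.5": lambda n: 0.5, "p=0.7": lambda n: 0.7, "p=0.9": lambda n: 0.9}
    print(select_forecaster(experts, revealed, now))
```

Because only revealed rounds on the sparse subsequence are scored, no forecaster is judged on rounds whose feedback is still pending, however long the delay. The price is that very few rounds contribute to each score, which hints at why convergence guarantees in this setting can be weak.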