Adversarial laws of large numbers and optimal regret in online classification

Laws of large numbers guarantee that, given a large enough sample from some population, the measure of any fixed sub-population is well estimated by its frequency in the sample. We study laws of large numbers for sampling processes that interact with, and may affect, the environment they act upon. Specifically, we consider the sequential sampling model proposed by Ben-Eliezer and Yogev (2020) and characterize the classes that admit a uniform law of large numbers in this model: these are exactly the classes that are online learnable. Our characterization may be interpreted as an online analogue of the equivalence between learnability and uniform convergence in statistical (PAC) learning. The sample-complexity bounds we obtain are tight for many parameter regimes, and as an application we determine the optimal regret bounds in online learning, stated in terms of Littlestone's dimension, thus resolving the main open question from Ben-David, Pál, and Shalev-Shwartz (2009), which was also posed by Rakhlin, Sridharan, and Tewari (2015).

*Department of Mathematics, Princeton University, Princeton, New Jersey, USA, and Schools of Mathematics and Computer Science, Tel Aviv University, Tel Aviv, Israel. Research supported in part by NSF grant DMS-1855464, BSF grant 2018267, and the Simons Foundation. Email: nalon@math.princeton.edu.

†Center for Mathematical Sciences and Applications, Harvard University, Cambridge, Massachusetts, USA. Research partially conducted while the author was at the Weizmann Institute of Science, supported in part by a grant from the Israel Science Foundation (no. 950/15). Email: omribene@cmsa.fas.harvard.edu.

‡Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. Email: dagan@mit.edu.

§Department of Mathematics, Technion, Israel. Email: smoran@technion.ac.il. Research supported in part by the Israel Science Foundation (grant no. 1225/20), by an Azrieli Faculty Fellowship, and by a grant from the United States-Israel Binational Science Foundation (BSF).

¶Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel. Supported in part by grants from the Israel Science Foundation (no. 950/15 and 2686/20) and by the Simons Foundation Collaboration on the Theory of Algorithmic Fairness. Incumbent of the Judith Kleeman Professorial Chair. Email: moni.naor@weizmann.ac.il.

||Department of Computer Science, Boston University, and Department of Computer Science, Tel Aviv University. Email: eylony@gmail.com. Research supported in part by ISF grants 484/18 and 1789/19, Len Blavatnik and the Blavatnik Foundation, and the Blavatnik Interdisciplinary Cyber Research Center at Tel Aviv University.

arXiv:2101.09054v1 [cs.LG] 22 Jan 2021
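The regret bounds discussed in the abstract concern expert-style online learners such as the classical Weighted Majority algorithm [12]. As an illustrative sketch only (not the paper's algorithm; the toy experts, stream, and learning rate below are invented for the example), a multiplicative-weights learner against a fixed expert pool looks like this:

```python
import math

def multiplicative_weights(expert_preds, outcomes, eta):
    """Weighted-majority / multiplicative-weights over a finite expert pool.

    expert_preds[t][i] is expert i's binary prediction at round t,
    outcomes[t] is the true label at round t.
    Returns (learner_mistakes, best_expert_mistakes).
    """
    n = len(expert_preds[0])
    weights = [1.0] * n
    learner_mistakes = 0
    expert_mistakes = [0] * n
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # Predict by weighted-majority vote of the experts.
        vote_for_one = sum(w for w, p in zip(weights, preds) if p == 1)
        guess = 1 if vote_for_one >= total / 2 else 0
        if guess != y:
            learner_mistakes += 1
        # Multiplicatively penalize every expert that erred this round.
        for i, p in enumerate(preds):
            if p != y:
                expert_mistakes[i] += 1
                weights[i] *= math.exp(-eta)
    return learner_mistakes, min(expert_mistakes)

# Toy stream: the outcome alternates 0,1,0,1,...; expert 0 is perfect,
# expert 1 always predicts 0, expert 2 always predicts 1.
T = 100
outcomes = [t % 2 for t in range(T)]
expert_preds = [[t % 2, 0, 1] for t in range(T)]
learner, best = multiplicative_weights(expert_preds, outcomes, eta=1.0)
# With a perfect expert in the pool, the learner here makes 0 mistakes.
```

The learner's excess mistakes over the best expert (its regret) stay small because the weights of bad experts decay geometrically; the abstract's result pins down the optimal rate of this kind of guarantee in terms of Littlestone's dimension of the comparator class.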

[1]  Ambuj Tewari,et al.  Online learning via sequential complexities , 2010, J. Mach. Learn. Res..

[2]  Vitaly Feldman,et al.  The Everlasting Database: Statistical Validity at a Fair Price , 2018, NeurIPS.

[3]  Karthik Sridharan,et al.  Statistical Learning and Sequential Prediction , 2014 .

[4]  Norbert Sauer,et al.  On the Density of Families of Sets , 1972, J. Comb. Theory A.

[5]  Ohad Shamir,et al.  Learnability, Stability and Uniform Convergence , 2010, J. Mach. Learn. Res..

[6]  Ambuj Tewari,et al.  Sequential complexities and uniform martingale laws of large numbers , 2015 .

[7]  Tim Roughgarden,et al.  Smoothed Analysis of Online and Differentially Private Learning , 2020, NeurIPS.

[8]  Richard M. Dudley,et al.  Sample Functions of the Gaussian Process , 1973 .

[9]  James Hannan,et al.  Approximation to Bayes Risk in Repeated Play , 1958 .

[10]  Haim Kaplan,et al.  Separating Adaptive Streaming from Oblivious Streaming , 2021, ArXiv.

[11]  Prateek Mittal,et al.  DARTS: Deceiving Autonomous Cars with Toxic Signs , 2018, ArXiv.

[12]  Manfred K. Warmuth,et al.  The Weighted Majority Algorithm , 1994, Inf. Comput..

[13]  Noga Alon,et al.  Transversal numbers for hypergraphs arising in geometry , 2002, Adv. Appl. Math..

[14]  R. Dudley Central Limit Theorems for Empirical Measures , 1978 .

[15]  David P. Woodruff,et al.  Reusable low-error compressive sampling schemes through privacy , 2012, 2012 IEEE Statistical Signal Processing Workshop (SSP).

[16]  Odalric-Ambrym Maillard,et al.  Concentration inequalities for sampling without replacement , 2013, 1309.4029.

[17]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[18]  David P. Woodruff,et al.  Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators , 2020, 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS).

[19]  Ambuj Tewari,et al.  Online Learning: Random Averages, Combinatorial Parameters, and Learnability , 2010, NIPS.

[20]  R. Dudley Universal Donsker Classes and Metric Entropy , 1987 .

[21]  David P. Woodruff,et al.  A Framework for Adversarially Robust Streaming Algorithms , 2020, SIGMOD Rec..

[22]  Toniann Pitassi,et al.  The reusable holdout: Preserving validity in adaptive data analysis , 2015, Science.

[23]  Shai Shalev-Shwartz,et al.  Understanding Machine Learning: From Theory to Algorithms , 2014, Cambridge University Press.

[24]  M. Talagrand Sharper Bounds for Gaussian and Empirical Processes , 1994 .

[25]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[26]  D. Blackwell Controlled Random Walks , 2010 .

[27]  Jirí Matousek,et al.  Tight upper bounds for the discrepancy of half-spaces , 1995, Discret. Comput. Geom..

[28]  Moni Naor,et al.  Sketching in adversarial environments , 2008, STOC.

[29]  R. Dudley A course on empirical processes , 1984 .

[30]  V. Climenhaga Markov chains and mixing times , 2013 .

[31]  Atri Rudra,et al.  Recovering simple signals , 2012, 2012 Information Theory and Applications Workshop.

[32]  Noga Alon,et al.  The space complexity of approximating the frequency moments , 1996, STOC '96.

[33]  V. Peña A General Class of Exponential Inequalities for Martingales and Ratios , 1999 .

[34]  J. Matousek,et al.  Geometric Discrepancy: An Illustrated Guide , 2009 .

[35]  Jirí Matousek,et al.  Discrepancy and approximations for bounded VC-dimension , 1993, Comb..

[36]  H. Robbins Asymptotically Subminimax Solutions of Compound Statistical Decision Problems , 1985 .

[37]  N. Littlestone Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[38]  Eylon Yogev,et al.  The Adversarial Robustness of Sampling , 2019, IACR Cryptol. ePrint Arch..

[39]  Richard Peng,et al.  Graph Sparsification, Spectral Sketches, and Faster Resistance Computation, via Short Cycle Decompositions , 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[40]  David Haussler,et al.  Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..

[41]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[42]  David P. Woodruff,et al.  How robust are linear sketches to adaptive inputs? , 2012, STOC '13.

[43]  S. Chatterjee Concentration inequalities with exchangeable pairs (Ph.D. thesis) , 2005, math/0507526.

[44]  Vladimir Vapnik,et al.  On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities , 1971 .

[45]  Haim Kaplan,et al.  Adversarially Robust Streaming Algorithms via Differential Privacy , 2020, NeurIPS.

[46]  Yeshwanth Cherapanamjeri,et al.  On Adaptive Distance Estimation , 2020, NeurIPS.

[47]  Karthik Sridharan,et al.  On Martingale Extensions of Vapnik–Chervonenkis Theory with Applications to Online Learning , 2015 .

[48]  D. Blackwell An analog of the minimax theorem for vector payoffs. , 1956 .

[49]  Moni Naor,et al.  Bloom Filters in Adversarial Environments , 2015, CRYPTO.

[50]  Alexander Rakhlin,et al.  Majorizing Measures, Sequential Complexities, and Online Learning , 2021, COLT.

[51]  Shai Ben-David,et al.  Agnostic Online Learning , 2009, COLT.

[52]  Gábor Lugosi,et al.  Introduction to Statistical Learning Theory , 2004, Advanced Lectures on Machine Learning.

[53]  D. Freedman On Tail Probabilities for Martingales , 1975 .