Estimating means of bounded random variables by betting

This paper derives confidence intervals (CIs) and time-uniform confidence sequences (CSs) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds, which can be seen as a generalization (and improvement) of the celebrated Chernoff method. At its heart is a new class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds adapt to the unknown variance, and empirically they vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations by Howard et al. [23, 43]. In short, we establish a new state of the art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
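The betting idea behind these martingales can be made concrete with a short sketch. Assume independent observations in [0, 1] with common mean μ. For every candidate mean m on a grid, the code tracks the wealth of two gamblers: one bets that observations land above m, the other that they land below it. When m equals μ, the weighted sum of their wealth is a nonnegative martingale starting at 1. By Ville's inequality it therefore ever reaches 1/α with probability at most α, so the candidates that are never excluded form a confidence sequence.

Several pieces are assumptions of this illustration rather than the paper's exact construction: the name `betting_cs`, the bet-size rule, the hedge weight `theta`, the cap `c`, and the finite grid. The sketch also covers only sampling with replacement.

```python
import numpy as np


def betting_cs(x, alpha=0.05, theta=0.5, c=0.5, grid_size=1000):
    """Running (time-uniform) confidence sequence for the mean of [0, 1] data.

    Hypothetical sketch: for each candidate mean m, one gambler bets the mean is
    above m (wealth K_up) and one bets it is below m (wealth K_down). If m is the
    true mean, theta*K_up + (1 - theta)*K_down is a nonnegative martingale
    starting at 1, so by Ville's inequality it reaches 1/alpha w.p. <= alpha.
    Returns arrays (lower, upper), one interval per time step.
    """
    x = np.asarray(x, dtype=float)
    if np.any((x < 0) | (x > 1)):
        raise ValueError("observations must lie in [0, 1]")
    n = len(x)
    t = np.arange(1, n + 1)

    # Regularized running mean/variance; sigma2_prev[i] uses only x[:i],
    # so every bet below is predictable (chosen before seeing x[i]).
    mu_hat = (0.5 + np.cumsum(x)) / (t + 1)
    sigma2_hat = (0.25 + np.cumsum((x - mu_hat) ** 2)) / (t + 1)
    sigma2_prev = np.concatenate(([0.25], sigma2_hat[:-1]))

    # Assumed plug-in bet size: larger bets when the estimated variance is small.
    lam = np.sqrt(2 * np.log(2 / alpha) / (sigma2_prev * t * np.log(1 + t)))
    lam = np.minimum(lam, c)

    m = np.linspace(0, 1, grid_size + 1)[1:-1]  # candidate means in (0, 1)
    step = m[1] - m[0]
    # Per-m caps keep every wealth factor >= 1 - c > 0 (wealth stays positive).
    lam_up = np.minimum(lam[:, None], c / m)
    lam_down = np.minimum(lam[:, None], c / (1 - m))
    dev = x[:, None] - m
    log_k_up = np.cumsum(np.log1p(lam_up * dev), axis=0)
    log_k_down = np.cumsum(np.log1p(-lam_down * dev), axis=0)

    # max(theta*K_up, (1-theta)*K_down) is bounded by the martingale above, and
    # the set of m it keeps is an interval (K_up falls and K_down rises in m).
    log_k = np.maximum(np.log(theta) + log_k_up, np.log1p(-theta) + log_k_down)
    # Once excluded, m stays excluded (running intersection of the sets).
    excluded = np.logical_or.accumulate(log_k >= np.log(1 / alpha), axis=0)

    lower = np.full(n, np.nan)
    upper = np.full(n, np.nan)
    for i in range(n):
        kept = m[~excluded[i]]
        if kept.size:
            # Pad by one grid step so means lying between grid points are covered.
            lower[i] = max(kept.min() - step, 0.0)
            upper[i] = min(kept.max() + step, 1.0)
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.beta(2, 5, size=1000)  # Beta(2, 5) has mean 2/7
    lo, hi = betting_cs(x)
    print(f"after {len(x)} draws: [{lo[-1]:.3f}, {hi[-1]:.3f}]")
```

Each bet is chosen using only past observations. Low-variance samples therefore produce larger bets and narrower intervals, a simple instance of the variance adaptivity described in the abstract.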

[1] Philip B. Stark. Sets of Half-Average Nulls Generate Risk-Limiting Audits: SHANGRLA. 2020, Financial Cryptography Workshops.

[2] Lawrence K. Saul et al. Large Deviation Methods for Approximate Probabilistic Inference. 1998, UAI.

[3] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. 1963.

[4] T. Cover. Universal Portfolios. 1996.

[5] Xiaotong Shen et al. Empirical Likelihood. 2002.

[6] Vladimir Vovk et al. Testing Randomness Online. 2019, Statistical Science.

[7] Xiequan Fan et al. Exponential inequalities for martingales with applications. 2013, arXiv:1311.6273.

[8] Wouter M. Koolen et al. Admissible anytime-valid sequential inference must rely on nonnegative martingales. 2020, arXiv:2009.03167.

[9] Kei Takeuchi et al. Sequential Optimizing Strategy in Multi-dimensional Bounded Forecasting Games. 2009, Mathematical Engineering Technical Reports, arXiv:0911.3933.

[10] A. Barron et al. Estimation of mixture models. 1999.

[11] Csaba Szepesvári et al. Empirical Bernstein stopping. 2008, ICML '08.

[12] Peter Hall et al. Methodology and algorithms of empirical likelihood. 1990.

[13] R. Khan et al. Sequential Tests of Statistical Hypotheses. 1972.

[14] H. Robbins et al. Confidence sequences for mean, variance, and median. 1967, Proceedings of the National Academy of Sciences of the United States of America.

[15] Karthik Sridharan et al. On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities. 2015, COLT.

[16] H. Robbins et al. Probability distributions related to the law of the iterated logarithm. 1969, Proceedings of the National Academy of Sciences of the United States of America.

[17] Aaditya Ramdas et al. Sequential estimation of quantiles with applications to A/B testing and best-arm identification. 2019, Bernoulli.

[18] T. Lai et al. Self-Normalized Processes: Limit Theory and Statistical Applications. 2001.

[19] Jorma Rissanen et al. Minimum Description Length Principle. 2010, Encyclopedia of Machine Learning.

[20] Aaditya Ramdas et al. Confidence sequences for sampling without replacement. 2020, NeurIPS.

[21] G. Bennett. Probability Inequalities for the Sum of Independent Random Variables. 1962.

[22] T. Lai. On Confidence Sequences. 1976.

[23] Jon D. McAuliffe et al. Time-uniform Chernoff bounds via nonnegative supermartingales. 2018, Probability Surveys.

[24] Philip S. Thomas et al. A New Confidence Interval for the Mean of a Bounded Random Variable. 2019, arXiv.

[25] T. W. Anderson. Confidence Limits for the Expected Value of an Arbitrary Bounded Random Variable with a Continuous Distribution Function. 1969.

[26] Michael M. McKerns et al. Building a Framework for Predictive Science. 2012, SciPy.

[27] Jorma Rissanen et al. Stochastic Complexity in Statistical Inquiry. 1989, World Scientific Series in Computer Science.

[28] Csaba Szepesvári et al. Improved Algorithms for Linear Stochastic Bandits. 2011, NIPS.

[29] Tatiana Tommasi et al. Training Deep Networks without Learning Rates Through Coin Betting. 2017, NIPS.

[30] T. Lai et al. Pseudo-maximization and self-normalized processes. 2007, arXiv:0709.2233.

[31] G. Shafer. The Language of Betting as a Strategy for Statistical and Scientific Communication. 2019, arXiv:1903.06991.

[32] H. Robbins et al. Iterated logarithm inequalities. 1967, Proceedings of the National Academy of Sciences of the United States of America.

[33] Gábor Lugosi et al. Concentration Inequalities: A Nonasymptotic Theory of Independence. 2013.

[34] H. Hendriks. Test Martingales for bounded random variables. 2018, arXiv:2109.08923.

[35] J. Andel. Sequential Analysis. 2022, The SAGE Encyclopedia of Research Design.

[36] Wouter M. Koolen et al. Testing exchangeability: Fork-convexity, supermartingales and e-processes. 2021, Int. J. Approx. Reason.

[37] Sang Joon Kim et al. A Mathematical Theory of Communication. 2006.

[38] John L. Kelly et al. A new interpretation of information rate. 1956, IRE Trans. Inf. Theory.

[39] Odalric-Ambrym Maillard et al. Concentration inequalities for sampling without replacement. 2013, arXiv:1309.4029.

[40] Gábor Lugosi et al. Prediction, learning, and games. 2006.

[41] V. Bentkus et al. On domination of tail probabilities of (super)martingales: Explicit bounds. 2006.

[42] H. Robbins et al. A Class of Stopping Rules for Testing Parametric Hypotheses. 1985.

[43] Jasjeet S. Sekhon et al. Time-uniform, nonparametric, nonasymptotic confidence sequences. 2020, The Annals of Statistics.

[44] Sheldon M. Ross et al. Stochastic Processes. 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.

[45] Larry Wasserman et al. Universal inference. 2019, Proceedings of the National Academy of Sciences.

[46] Jorma Rissanen et al. Universal coding, information, prediction, and estimation. 1984, IEEE Trans. Inf. Theory.

[47] Jean-Luc Ville. Étude critique de la notion de collectif. 1939.

[48] Csaba Szepesvári et al. Tuning Bandit Algorithms in Stochastic Environments. 2007, ALT.

[49] G. Shafer et al. Test Martingales, Bayes Factors and p-Values. 2009, arXiv:0912.4269.

[50] Francesco Orabona et al. Improved Strongly Adaptive Online Learning using Coin Betting. 2016, AISTATS.

[51] H. Robbins et al. Boundary Crossing Probabilities for the Wiener Process and Sample Sums. 1970.

[52] H. Robbins et al. Inequalities for the sequence of sample means. 1967, Proceedings of the National Academy of Sciences of the United States of America.

[53] Francesco Orabona et al. Coin Betting and Parameter-Free Online Learning. 2016, NIPS.

[54] Raphail E. Krichevsky et al. The performance of universal encoding. 1981, IEEE Trans. Inf. Theory.

[55] A. P. Dawid et al. Present position and potential developments: some personal views. 1984.

[56] V. Bentkus. On Hoeffding’s inequalities. 2004, arXiv:math/0410159.

[57] Stefano Ermon et al. Adaptive Concentration Inequalities for Sequential Decision Problems. 2016, NIPS.

[58] L. M. M.-T. Theory of Probability. 1929, Nature.

[59] R. Serfling. Probability Inequalities for the Sum in Sampling without Replacement. 1974.

[60] Wouter M. Koolen et al. Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals. 2018, J. Mach. Learn. Res.

[61] Per Martin-Löf et al. The Definition of Random Sequences. 1966, Inf. Control.

[62] Philip S. Thomas et al. Towards Practical Mean Bounds for Small Samples. 2021, ICML.

[63] Wouter M. Koolen et al. Safe Testing. 2019, 2020 Information Theory and Applications Workshop (ITA).

[64] G. Shafer et al. Probability and Finance: It's Only a Game! 2001.

[65] Ralph Roskies et al. Bridges: a uniquely flexible HPC resource for new communities and data analytics. 2015, XSEDE.

[66] Massimiliano Pontil et al. Empirical Bernstein Bounds and Sample-Variance Penalization. 2009, COLT.

[67] Akshay Balsubramani. Sharp Finite-Time Iterated-Logarithm Martingale Concentration. 2014.

[68] Francesco Orabona et al. Black-Box Reductions for Parameter-free Online Learning in Banach Spaces. 2018, COLT.

[69] Francesco Orabona et al. Parameter-free Online Convex Optimization with Sub-Exponential Noise. 2019, COLT.

[70] Arun Kumar Kuchibhotla et al. Near-Optimal Confidence Sequences for Bounded Random Variables. 2021, ICML.

[71] Vladimir Vovk et al. Game-Theoretic Foundations for Probability and Finance. 2019, Wiley Series in Probability and Statistics.

[72] H. Robbins et al. The Expected Sample Size of Some Tests of Power One. 1974.

[73] H. Robbins. Statistical Methods Related to the Law of the Iterated Logarithm. 1970.