A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

The advent of data science has spurred interest in estimating properties of distributions over large alphabets. Fundamental symmetric properties such as support size, support coverage, entropy, and proximity to uniformity have received the most attention, with each property estimated by a different technique and often analyzed with intricate tools. We prove that for all these properties, a single, simple plug-in estimator, profile maximum likelihood (PML), performs as well as the best specialized techniques. This raises the possibility that PML may optimally estimate many other symmetric properties.
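To make the plug-in idea concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes the standard definition of a profile (the multiset of symbol multiplicities in the sample) and, purely for illustration, restricts the search to distributions on two symbols evaluated on a coarse grid; the function names and the `grid` parameter are hypothetical. The brute-force enumeration is only feasible for very short samples.

```python
from collections import Counter
from itertools import product
from math import log


def profile(sample):
    """Profile of a sample: the sorted multiplicities of its distinct symbols."""
    return tuple(sorted(Counter(sample).values()))


def profile_probability(p, target, n):
    """Brute force: total probability, under p, of all length-n sequences
    whose profile equals target. Exponential in n, for tiny examples only."""
    total = 0.0
    for seq in product(range(len(p)), repeat=n):
        if profile(seq) == target:
            prob = 1.0
            for symbol in seq:
                prob *= p[symbol]
            total += prob
    return total


def pml_two_symbols(sample, grid=100):
    """Illustrative assumption: search only distributions (q, 1 - q) on a grid
    and return the one that maximizes the probability of the observed profile."""
    target = profile(sample)
    n = len(sample)
    candidates = [(i / grid, 1 - i / grid) for i in range(grid // 2 + 1)]
    return max(candidates, key=lambda p: profile_probability(p, target, n))


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * log(x) for x in p if x > 0)


sample = "aab"                      # profile: one symbol seen once, one seen twice
p_hat = pml_two_symbols(sample)     # restricted PML-style estimate
plug_in_entropy = entropy(p_hat)    # plug the estimate into the property
```

The last line is the plug-in step: the property is evaluated on the estimated distribution rather than estimated by a property-specific method. A real PML computation would search over all distributions and support sizes, which this sketch does not attempt.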
