Non-Bayesian Parametric Missing-Mass Estimation

We consider the classical problem of missing-mass estimation: estimating the total probability of the elements that do not appear in a given sample. Missing-mass estimation has applications in machine learning, statistics, language processing, ecology, sensor networks, and other fields. The naive, constrained maximum likelihood (CML) estimator is inappropriate for this problem, since it tends to overestimate the probability of the observed elements. Similarly, the conventional constrained Cramér-Rao bound (CCRB), a lower bound on the mean-squared error (MSE) of unbiased estimators, does not provide a relevant performance bound for this problem. In this paper, we introduce a frequentist, non-Bayesian parametric model of the missing-mass estimation problem. We define the concept of missing-mass unbiasedness by using Lehmann's unbiasedness definition, and, based on this notion, derive a non-Bayesian CCRB-type lower bound on the missing-mass MSE (mmMSE), named the missing-mass CCRB (mmCCRB). The missing-mass unbiasedness and the proposed mmCCRB can be used to evaluate the performance of existing estimators. Based on the new mmCCRB, we propose an iterative missing-mass Fisher-scoring method that improves existing estimators. Finally, we demonstrate via numerical simulations that the proposed mmCCRB is a valid and informative lower bound on the mmMSE of state-of-the-art estimators for this problem: the CML, Good-Turing, and Laplace estimators. We also show that the performance of the Laplace estimator is improved by the proposed missing-mass Fisher-scoring method.
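
For intuition, the sketch below contrasts the three estimators discussed above on a toy sample. It is not the paper's code; it assumes the simplest i.i.d. setting with a known finite alphabet. The CML estimator assigns each observed symbol its empirical frequency and therefore estimates the missing mass as zero, the Good-Turing estimator uses the proportion of singletons, and the Laplace (add-one) estimator spreads one pseudo-count over every symbol of the alphabet.

    from collections import Counter

    def missing_mass_estimates(sample, alphabet_size):
        # Illustrative missing-mass estimates for an i.i.d. sample drawn from a
        # finite alphabet of known size (a sketch, not the paper's parametric setup).
        n = len(sample)
        counts = Counter(sample)
        n1 = sum(1 for c in counts.values() if c == 1)  # number of singletons
        unseen = alphabet_size - len(counts)            # symbols never observed

        cml = 0.0                               # CML: observed symbols get n_x / n, so unseen mass is 0
        good_turing = n1 / n                    # Good-Turing: missing mass ~ (#singletons) / n
        laplace = unseen / (n + alphabet_size)  # Laplace: each symbol gets (n_x + 1) / (n + k)
        return cml, good_turing, laplace

    # Toy usage: alphabet {0, ..., 9}, a sample of size 7 that observes only 4 symbols.
    print(missing_mass_estimates([0, 1, 1, 2, 3, 3, 3], alphabet_size=10))

In this toy run the Good-Turing and Laplace estimates are roughly 0.29 and 0.35, respectively, while the CML estimate is identically zero, which illustrates why the CML estimator is unsuitable for missing-mass estimation.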
