Non-Bayesian Parametric Missing-Mass Estimation

We consider the classical problem of missing-mass estimation: estimating the total probability of the elements that do not appear in a sample. This problem has applications in machine learning, statistics, language processing, ecology, and sensor networks, among other fields. The naive constrained maximum likelihood (CML) estimator is inappropriate for this problem because it tends to overestimate the probability of the observed elements. Similarly, the constrained Cramér-Rao bound (CCRB), which is a lower bound on the mean-squared error (MSE) of unbiased estimators of the entire probability mass function (pmf) vector, does not provide a relevant bound for missing-mass estimation. In this paper, we introduce a non-Bayesian parametric model for missing-mass estimation. We define missing-mass unbiasedness by using the Lehmann unbiasedness definition. Based on this unbiasedness, we derive a non-Bayesian CCRB-type lower bound on the missing-mass MSE (mmMSE), named the missing-mass CCRB (mmCCRB). The proposed mmCCRB can be used for system design and for the performance evaluation of existing estimators. Moreover, based on the mmCCRB, we propose an iterative missing-mass Fisher-scoring method for improving existing estimators. Finally, we demonstrate via numerical simulations that the biased mmCCRB is a valid and informative lower bound on the mmMSE of state-of-the-art estimators for this problem: the CML, asymptotic profile maximum likelihood (aPML), Good-Turing, and Laplace estimators. We also show that the mmMSE and the missing-mass bias of the Laplace estimator are reduced by using the new missing-mass Fisher-scoring method.
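
To make the quantity being estimated concrete, the following minimal Python sketch computes the true missing mass of a sample and three simple estimates of it. The estimators are written in their common textbook forms (empirical frequencies for the CML, the singleton ratio for Good-Turing, add-one smoothing for Laplace), which are assumed here and may differ in detail from the definitions used in the paper. All function names are hypothetical, and the sketch does not implement the mmCCRB or the Fisher-scoring method.

```python
from collections import Counter


def true_missing_mass(pmf, sample):
    """Total probability of the elements of the alphabet absent from the sample."""
    seen = set(sample)
    return sum(p for element, p in enumerate(pmf) if element not in seen)


def cml_missing_mass(sample):
    """Empirical-frequency (assumed CML) estimate.

    The empirical pmf puts all of its mass on observed elements,
    so the estimated missing mass is always zero.
    """
    return 0.0


def good_turing_missing_mass(sample):
    """Textbook Good-Turing estimate: fraction of the sample made up of singletons."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


def laplace_missing_mass(sample, alphabet_size):
    """Add-one (Laplace) estimate, assuming the alphabet size is known.

    Each element gets probability (count + 1) / (n + alphabet_size),
    so each unseen element contributes 1 / (n + alphabet_size).
    """
    unseen = alphabet_size - len(set(sample))
    return unseen / (len(sample) + alphabet_size)


if __name__ == "__main__":
    # Toy example: uniform pmf over 6 elements, 7 observations,
    # elements 4 and 5 never appear.
    pmf = [1 / 6] * 6
    sample = [0, 0, 1, 2, 2, 2, 3]
    print("true       :", true_missing_mass(pmf, sample))       # 2/6
    print("CML        :", cml_missing_mass(sample))             # 0.0
    print("Good-Turing:", good_turing_missing_mass(sample))     # 2/7
    print("Laplace    :", laplace_missing_mass(sample, len(pmf)))  # 2/13
```

In this toy example the empirical-frequency estimate is zero even though a third of the probability mass is unseen, which illustrates why an estimator that overestimates the observed elements is unsuitable for this problem.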
