A general framework for maximizing likelihood under incomplete data

Abstract: Maximum likelihood is a standard approach to computing a probability distribution that best fits a given dataset. However, when datasets are incomplete or contain imprecise data, a major issue is how to properly define the likelihood function to be maximized. This paper highlights the fact that several likelihood functions can be considered, depending on the purpose to be addressed: whether or not the behavior of the imperfect measurement process causing incompleteness should be included in the model, and which assumptions we can make, or what knowledge we have, about this measurement process. Various approaches, which differ in the choice of the likelihood function and/or in the attitude of the analyst toward imprecise information, are compared and discussed on examples, and some light is shed on the nature of the corresponding solutions.
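
As a rough illustration of why the choice of likelihood function matters, the sketch below fits a Bernoulli parameter to data in which some outcomes were not observed. It is not taken from the paper: the function names, the two criteria compared (discarding the unobserved outcomes versus optimistically completing them so that the likelihood is as large as possible), and the counts in the usage note are all assumptions chosen for illustration.

```python
import math


def bernoulli_loglik(p, ones, zeros):
    """Log-likelihood of `ones` successes and `zeros` failures when P(X=1) = p."""
    def term(count, prob):
        if count == 0:
            return 0.0
        return count * math.log(prob) if prob > 0 else float("-inf")
    return term(ones, p) + term(zeros, 1.0 - p)


def fit_ignoring_missing(n1, n0):
    """Hypothetical criterion 1: maximize the likelihood of the precise
    observations only, treating each unobserved outcome as the sure event
    {0, 1}. The maximizer is the usual relative frequency."""
    return n1 / (n1 + n0)


def fit_optimistic_completion(n1, n0, n_missing):
    """Hypothetical criterion 2: jointly choose p and a completion of the
    unobserved outcomes (k of them set to 1) that maximize the likelihood
    of the completed sample."""
    n = n1 + n0 + n_missing
    best = None
    for k in range(n_missing + 1):
        p = (n1 + k) / n  # maximum likelihood estimate for this completion
        ll = bernoulli_loglik(p, n1 + k, n0 + n_missing - k)
        if best is None or ll > best[0]:
            best = (ll, p, k)
    return best[1], best[2]
```

With made-up counts of 6 observed ones, 2 observed zeros, and 4 unobserved outcomes, `fit_ignoring_missing(6, 2)` gives 0.75, whereas `fit_optimistic_completion(6, 2, 4)` sets all four unobserved outcomes to 1 and gives 10/12. The two estimates differ even though the data are the same, which is the kind of discrepancy the abstract refers to.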
