Partial Identifiability in Discrete Data With Measurement Error

When data contain measurement error, it is necessary to make assumptions relating the observed, erroneous data to the unobserved true phenomena of interest. These assumptions should be justifiable on substantive grounds, but they are often motivated by mathematical convenience, for the sake of exactly identifying the target of inference. We adopt the view that it is preferable to present bounds under justifiable assumptions than to pursue exact identification under dubious ones. To that end, we demonstrate how a broad class of modeling assumptions involving discrete variables, including common measurement error and conditional independence assumptions, can be expressed as linear constraints on the parameters of the model. We then use linear programming techniques to produce sharp bounds for factual and counterfactual distributions under measurement error in such models. We additionally propose a procedure for obtaining outer bounds for non-linear models. Our method yields sharp bounds in a number of important settings, such as the instrumental variable scenario with measurement error, for which no bounds were previously known.
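To make the linear-programming idea concrete, here is a minimal sketch of the simplest possible case. It is not the paper's implementation. A binary true variable X is observed only through a binary proxy Y. We assume, purely for illustration, that the misclassification rate P(X ≠ Y) is at most a known `eps`. This assumption is linear in the joint probabilities p(x, y), and so is the requirement that the model reproduce the observed P(Y = 1). Minimizing and maximizing the target P(X = 1) subject to these constraints with SciPy's `linprog` then bounds the unobserved marginal. The function name `bound_true_prevalence` and the error assumption are hypothetical.

```python
from scipy.optimize import linprog


def bound_true_prevalence(q, eps):
    """Bound P(X = 1) given observed q = P(Y = 1) and P(X != Y) <= eps.

    Decision variables are the joint probabilities of (X, Y), ordered as
    [p(0,0), p(0,1), p(1,0), p(1,1)]. The misclassification bound is an
    illustrative (hypothetical) measurement error assumption.
    """
    # Equality constraints: the model must reproduce the observed law of Y.
    A_eq = [
        [1, 0, 1, 0],  # p(0,0) + p(1,0) = P(Y = 0)
        [0, 1, 0, 1],  # p(0,1) + p(1,1) = P(Y = 1)
    ]
    b_eq = [1 - q, q]
    # Inequality constraint: P(X != Y) = p(0,1) + p(1,0) <= eps.
    A_ub = [[0, 1, 1, 0]]
    b_ub = [eps]
    bounds = [(0, 1)] * 4

    # Target functional P(X = 1) = p(1,0) + p(1,1) is linear in the variables.
    target = [0, 0, 1, 1]
    lower = linprog(target, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                    bounds=bounds, method="highs")
    # Maximize by minimizing the negated objective.
    upper = linprog([-c for c in target], A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                    b_eq=b_eq, bounds=bounds, method="highs")
    if not (lower.success and upper.success):
        raise ValueError("constraints are infeasible for these inputs")
    return lower.fun, -upper.fun


lo, hi = bound_true_prevalence(q=0.3, eps=0.1)
print(f"P(X = 1) in [{lo:.3f}, {hi:.3f}]")  # P(X = 1) in [0.200, 0.400]
```

In this toy model the LP bounds match the closed form [max(0, q − eps), min(1, q + eps)]. The value of the LP formulation is that further linear constraints on the joint distribution can be added as extra rows of `A_eq` or `A_ub`, without re-deriving the bounds by hand.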
