What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems

We consider basic conceptual questions concerning the relationship between statistical estimation and causal inference. First, we show how to translate causal inference problems into an abstract statistical formalism without requiring any structure beyond an arbitrarily-indexed family of probability models. The formalism is simple but can accommodate a variety of causal modelling frameworks, including structural causal models as well as models expressed in terms of, for example, differential equations; our focus, however, is primarily on the structural/graphical causal modelling literature. Second, we consider the extent to which causal and statistical concerns can be cleanly separated, examining the fundamental question: 'What can be estimated from data?'. We call this the problem of estimability. We approach it by analysing, within our abstract statistical formalism, a standard formal definition of 'can be estimated' commonly adopted in the causal inference literature -- identifiability. We use elementary category theory to show that identifiability implies the existence of a Fisher-consistent estimator, but also show that this estimator may be discontinuous, and thus unstable, in general. This difficulty arises because the causal inference problem is, in general, an ill-posed inverse problem. An inverse problem is well-posed when three conditions are satisfied: existence, uniqueness, and stability of solutions. Here identifiability corresponds to the question of uniqueness; in contrast, we take estimability to mean satisfaction of all three conditions, i.e. well-posedness. Lack of stability means that a naive translation of a causally identifiable quantity into an achievable statistical estimation target may fail. Our article is primarily expository and aimed at unifying ideas from multiple fields, though we provide new constructions and proofs.
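For orientation, the notions invoked above can be sketched in generic notation (a minimal sketch of the standard textbook definitions, not necessarily the exact formulations used in the article):

Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a family of probability models indexed by an arbitrary set $\Theta$, and let $\psi : \Theta \to \Psi$ be a target quantity, for instance a causal effect. The target $\psi$ is identifiable if it is determined by the observed-data distribution,
$$P_{\theta_1} = P_{\theta_2} \;\Longrightarrow\; \psi(\theta_1) = \psi(\theta_2),$$
equivalently, if there exists a map $T : \mathcal{P} \to \Psi$ with $T(P_\theta) = \psi(\theta)$ for every $\theta \in \Theta$; such a $T$ is a Fisher-consistent estimator. Viewing the recovery of $\psi(\theta)$ from $P_\theta$ as an inverse problem, the problem is well-posed in Hadamard's sense if a solution exists, is unique (identifiability), and depends continuously on the data distribution, i.e. $T$ is continuous with respect to a chosen topology on $\mathcal{P}$ (stability). Estimability, in the sense of this article, requires all three conditions.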
