Estimation of Semiparametric Models in the Presence of Endogeneity and Sample Selection

We analyze a semiparametric model for data that suffer from two problems: sample selection, where some of the variables are observed for only part of the sample, with a probability that depends on a selection equation, and endogeneity, where a covariate is correlated with the disturbance term. The introduction of nonparametric functions in the model permits great flexibility in the way covariates affect response variables. We present an efficient Bayesian method for the analysis of such models that allows us to consider general systems of outcome variables and endogenous regressors that are continuous, binary, censored, or ordered. Estimation is by Markov chain Monte Carlo (MCMC) methods. The algorithm we propose does not require simulation of the outcomes that are missing due to the selection mechanism, which reduces the computational load and improves the mixing of the MCMC chain. The approach is applied to a model of women’s labor force participation and log-wage determination. Data and computer code used in this article are available online.
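The sample selection problem described above can be illustrated with a small simulation. The sketch below is not the Bayesian MCMC method of the article; it only shows, under assumed values, why ignoring the selection mechanism distorts a naive estimate. The linear outcome equation, the selection rule `x + u_sel > 0`, the error correlation `rho = 0.8`, and all function and variable names are hypothetical choices made for illustration.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, beta=1.0, rho=0.8, seed=0):
    """Draw outcomes whose error is correlated with the selection error.

    All parameter values here are assumptions for illustration only.
    """
    rng = random.Random(seed)
    xs, ys, selected = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u_sel = rng.gauss(0.0, 1.0)
        # Outcome error has correlation rho with the selection error.
        u_out = rho * u_sel + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(beta * x + u_out)
        # The outcome is observed only when the selection index is positive.
        selected.append(x + u_sel > 0.0)
    return xs, ys, selected


xs, ys, selected = simulate()

# Slope using every draw (infeasible in practice: unselected outcomes are missing).
full_slope = ols_slope(xs, ys)

# Slope using only the selected subsample, ignoring the selection mechanism.
sel_x = [x for x, s in zip(xs, selected) if s]
sel_y = [y for y, s in zip(ys, selected) if s]
sel_slope = ols_slope(sel_x, sel_y)

print(f"slope on full sample:     {full_slope:.3f}")
print(f"slope on selected sample: {sel_slope:.3f}")
```

In this setup the full-sample slope stays close to the assumed true value of 1, while the selected-sample slope falls noticeably below it, because the selection rule depends on both the covariate and an error that is correlated with the outcome error.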
