Bernoulli Regression Models: Revisiting the Specification of Statistical Models with Binary Dependent Variables

The latent variable and generalized linear modelling approaches do not provide a systematic framework for modelling discrete choice observational data. An alternative, the probabilistic reduction (PR) approach, provides a systematic way to specify such models that can yield reliable statistical and substantive inferences. The purpose of this paper is to re-examine the underlying probabilistic foundations of conditional statistical models with binary dependent variables using the PR approach. This leads to the development of the Bernoulli Regression Model, a family of statistical models that includes the binary logistic regression model. The paper provides an explicit presentation of the probabilistic model assumptions, guidance on model specification and estimation, and an empirical application.
