Robustness against separation and outliers in logistic regression

The logistic regression model is commonly used to describe the effect of one or several explanatory variables on a binary response variable. Its parameters are not identifiable when there is separation in the space of the explanatory variables, that is, when a hyperplane splits the observations with response 0 from those with response 1. In that case, existing fitting techniques either fail to converge or return a wrong answer. To remedy this, a slightly more general model is proposed in which the observed response is strongly related to, but not equal to, the unobservable true response. This model is called the hidden logistic regression model because the unobservable true responses are comparable to a hidden layer in a feedforward neural net. The maximum estimated likelihood estimator is proposed for this model. It is robust against separation, always exists, and is easy to compute. Outlier-robust estimation is also studied in this setting, yielding the weighted maximum estimated likelihood estimator.
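The effect of treating the observed response as a noisy version of a hidden true response can be illustrated with a small sketch. It is not the authors' implementation: the misclassification probabilities `delta0` and `delta1`, their values, the pseudo-response construction, and the helper `fit_logistic_fractional` are all assumptions made for illustration. The idea shown is that replacing the 0/1 responses with values strictly inside (0, 1) keeps the fitted coefficients finite even on perfectly separated data, where the ordinary maximum likelihood fit would diverge.

```python
import numpy as np

def fit_logistic_fractional(X, y_tilde, n_iter=100, tol=1e-10):
    """Newton-Raphson fit of a logistic model to responses in (0, 1).

    Hypothetical helper for illustration; adds an intercept column.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))          # fitted probabilities
        grad = X1.T @ (y_tilde - p)                    # score vector
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Perfectly separated toy data: every x < 0 has y = 0, every x > 0 has y = 1.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Assumed probabilities of observing y = 1 when the hidden true response
# is 0 (delta0) or 1 (delta1). Illustrative values only.
delta0, delta1 = 0.01, 0.99

# Pseudo-responses strictly between 0 and 1.
y_tilde = (1 - y) * delta0 + y * delta1

beta = fit_logistic_fractional(x[:, None], y_tilde)
print("intercept, slope:", beta)
```

Because the pseudo-responses never reach 0 or 1, the objective has a finite maximizer whenever the design matrix has full rank, which is why the Newton iteration settles on a finite slope here.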
