Robit Regression: A Simple Robust Alternative to Logistic and Probit Regression

Logistic and probit regression models are commonly used in practice to analyze binary response data, but the maximum likelihood estimators of these models are not robust to outliers. This paper considers a robit regression model, which replaces the normal distribution in the probit regression model with a t-distribution with a known or unknown number of degrees of freedom. It is shown that (i) the maximum likelihood estimators of the robit model with a known number of degrees of freedom are robust; (ii) the robit link with about seven degrees of freedom provides an excellent approximation to the logistic link; and (iii) the robit link with a large number of degrees of freedom approximates the probit link. The maximum likelihood estimates can be obtained using efficient EM-type algorithms. These algorithms also provide information that can be used to identify outliers, to which the maximum likelihood estimates of the logistic and probit regression coefficients would be sensitive. The EM algorithms for robit regression are easily modified to obtain efficient Data Augmentation (DA) algorithms for Bayesian inference with the robit regression model. The DA algorithms for the robit regression model are much simpler to implement than the existing Gibbs sampler for the logistic regression model. A numerical example illustrates the methodology.
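
Claims (ii) and (iii) can be checked numerically. The following is a minimal sketch, not the paper's own code: the helper names (`t_pdf`, `t_cdf`), the Simpson's-rule integration, the evaluation grid, and the choice of 1000 degrees of freedom to stand in for "large" are all illustrative assumptions. The abstract does not state the scale used to align the robit link with the logistic link, so the sketch assumes a simple variance-matching scale.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution (log-gamma form avoids overflow)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=400):
    """Student t CDF via Simpson's rule on [0, |t|], using symmetry about 0."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

grid = [i / 10 for i in range(-60, 61)]

# Assumption: match the variance of the scaled t_7 (7/5 * scale^2)
# to the variance of the standard logistic (pi^2 / 3).
scale = math.sqrt((math.pi ** 2 / 3) / (7 / 5))

# Largest absolute gap between the robit(7) and logistic inverse links.
gap_logit = max(abs(t_cdf(x / scale, 7) - logistic_cdf(x)) for x in grid)

# Largest absolute gap between the robit(1000) and probit inverse links.
gap_probit = max(abs(t_cdf(x, 1000) - normal_cdf(x)) for x in grid)

print(f"max |robit(7) - logistic|   = {gap_logit:.5f}")
print(f"max |robit(1000) - probit|  = {gap_probit:.5f}")
```

Both gaps should come out small relative to the probability scale, which is consistent with the approximations described above. A different scale choice would change the first gap somewhat.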
