Maximizing proportions of correct classifications in binary logistic regression

Abstract In this paper, we give simple mathematical results that yield all cut-off points maximizing the overall proportion of correct classifications in any binary classification method (and, in particular, in binary logistic regression). We also give results that yield all cut-off points maximizing a weighted combination of specificity and sensitivity. Finally, we discuss measures of association between predicted probabilities and observed responses, in particular the calculation of the overall percentages of concordant, discordant, and tied pairs of input observations with different responses. We note that SAS and Minitab sometimes calculate these quantities incorrectly. The concepts and methods of the paper are illustrated by a hypothetical example of school retention data.
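To make the two quantities in the abstract concrete, the following is a minimal brute-force sketch, not the paper's own results. It assumes the classification rule "predict 1 when the predicted probability is at least the cut-off" and uses the usual definition of a concordant pair (the observation with response 1 has the higher predicted probability). The function names `best_cutoffs` and `pair_percentages` and the toy data are hypothetical.

```python
def best_cutoffs(probs, labels):
    """Return (best_accuracy, intervals); every cut-off c with lo < c <= hi
    for some (lo, hi) in intervals attains best_accuracy.
    Assumed rule: predict 1 when p >= c."""
    values = sorted(set(probs))
    # Classifications can only change when c crosses a distinct predicted value.
    edges = [float("-inf")] + values + [float("inf")]
    best, intervals = -1, []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Any c in (lo, hi] predicts 1 exactly for observations with p >= hi.
        correct = sum((p >= hi) == bool(y) for p, y in zip(probs, labels))
        if correct > best:
            best, intervals = correct, [(lo, hi)]
        elif correct == best:
            intervals.append((lo, hi))
    return best / len(probs), intervals


def pair_percentages(probs, labels):
    """Percentages of concordant, discordant, and tied pairs among all
    pairs of observations with different responses."""
    ones = [p for p, y in zip(probs, labels) if y == 1]
    zeros = [p for p, y in zip(probs, labels) if y == 0]
    total = len(ones) * len(zeros)
    concordant = sum(p1 > p0 for p1 in ones for p0 in zeros)
    discordant = sum(p1 < p0 for p1 in ones for p0 in zeros)
    tied = total - concordant - discordant
    return tuple(100 * k / total for k in (concordant, discordant, tied))


# Hypothetical toy data: predicted probabilities and observed 0/1 responses.
probs = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(best_cutoffs(probs, labels))      # (0.75, [(0.1, 0.35), (0.4, 0.8)])
print(pair_percentages(probs, labels))  # (75.0, 25.0, 0.0)
```

The search is exhaustive over the intervals between distinct predicted probabilities, so it returns every maximizing cut-off under the assumed rule. It compares probabilities exactly, so ties are detected only when two predicted values are identical as floating-point numbers.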
