Comparison of logistic regression and linear discriminant analysis

Two of the most widely used statistical methods for analyzing categorical outcome variables are linear discriminant analysis and logistic regression. While both are appropriate for the development of linear classification models, linear discriminant analysis makes more assumptions about the underlying data. Hence, it is assumed that logistic regression is the more flexible and more robust method in case of violations of these assumptions. In this paper we consider the problem of choosing between the two methods, and set some guidelines for proper choice. The comparison between the methods is based on several measures of predictive accuracy. The performance of the methods is studied by simulations. We start with an example where all the assumptions of the linear discriminant analysis are satisfied and observe the impact of changes regarding the sample size, covariance matrix, Mahalanobis distance and direction of distance between group means. Next, we compare the robustness of the methods towards categorisation and non-normality of explanatory variables in a closely controlled way. We show that the results of LDA and LR are close whenever the normality assumptions are not too badly violated, and set some guidelines for recognizing these situations. We discuss the inappropriateness of LDA in all other cases.