The efficiency of logistic regression compared to normal discriminant analysis under class-conditional classification noise

In many real-world classification problems, class-conditional classification noise (CCC-Noise) degrades the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations with a common covariance matrix. We compare the asymptotic distributions of the misclassification error rates of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR degrades less under CCC-Noise than NDA. Under CCC-Noise, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of these two procedures. In particular, when this distance is small, LR tends to be more tolerant of CCC-Noise than NDA.
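
The sketch below is a minimal simulation of the setting the abstract describes, not the paper's analytical derivation or its experimental setup: two multivariate normal populations with a common (here, identity) covariance, training labels flipped with class-conditional probabilities, and the resulting misclassification error of LR and NDA (fitted as linear discriminant analysis) compared on clean test data. The dimension, the means, the flip probabilities flip_0 and flip_1, and the grid of Mahalanobis distances are illustrative assumptions.

```python
# Minimal sketch: compare LR and NDA (LDA) trained on labels corrupted by
# class-conditional classification noise (CCC-Noise). All numeric settings
# below are illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)


def simulate(n_train=500, n_test=20000, delta=1.0, flip_0=0.1, flip_1=0.2):
    """Two d-variate normal populations with common identity covariance,
    separated by Mahalanobis distance `delta` along the first coordinate.
    Training labels are flipped with class-conditional probabilities
    flip_0 (true class 0) and flip_1 (true class 1)."""
    d = 5
    mu0, mu1 = np.zeros(d), np.r_[delta, np.zeros(d - 1)]

    def draw(n):
        y = rng.integers(0, 2, size=n)  # true class labels
        x = rng.standard_normal((n, d)) + np.where(y[:, None] == 1, mu1, mu0)
        return x, y

    x_tr, y_tr = draw(n_train)
    x_te, y_te = draw(n_test)

    # CCC-Noise: the flip probability depends on the true class of each point.
    p_flip = np.where(y_tr == 1, flip_1, flip_0)
    y_noisy = np.where(rng.random(n_train) < p_flip, 1 - y_tr, y_tr)

    lr = LogisticRegression().fit(x_tr, y_noisy)
    nda = LinearDiscriminantAnalysis().fit(x_tr, y_noisy)

    # Misclassification error evaluated against the clean test labels.
    return {
        "LR": np.mean(lr.predict(x_te) != y_te),
        "NDA": np.mean(nda.predict(x_te) != y_te),
    }


if __name__ == "__main__":
    for delta in (1.0, 2.0, 3.0):
        errs = simulate(delta=delta)
        print(f"Mahalanobis distance {delta}: LR={errs['LR']:.3f}, NDA={errs['NDA']:.3f}")
```

Varying `delta` in this sketch loosely mirrors the role the abstract assigns to the Mahalanobis distance between the populations; the paper itself studies this through the asymptotic distribution of the error rates rather than by simulation.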
