The efficiency of logistic regression compared to normal discriminant analysis under class-conditional classification noise

In many real-world classification problems, class-conditional classification noise (CCC-Noise) degrades the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations with a common covariance matrix. We compare the asymptotic distributions of the misclassification error rates of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR degrades less under CCC-Noise than NDA. Under CCC-Noise, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of these two procedures. In particular, when this distance is small, LR tends to be more tolerant of CCC-Noise than NDA.
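
The sketch below is a minimal simulation of the setting the abstract describes, not the paper's analytical derivation or its experimental setup: two multivariate normal populations with a common (here, identity) covariance, training labels flipped with class-conditional probabilities, and the resulting misclassification error of LR and NDA (fitted as linear discriminant analysis) compared on clean test data. The dimension, the means, the flip probabilities flip_0 and flip_1, and the grid of Mahalanobis distances are illustrative assumptions.

```python
# Minimal sketch: compare LR and NDA (LDA) trained on labels corrupted by
# class-conditional classification noise (CCC-Noise). All numeric settings
# below are illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)


def simulate(n_train=500, n_test=20000, delta=1.0, flip_0=0.1, flip_1=0.2):
    """Two d-variate normal populations with common identity covariance,
    separated by Mahalanobis distance `delta` along the first coordinate.
    Training labels are flipped with class-conditional probabilities
    flip_0 (true class 0) and flip_1 (true class 1)."""
    d = 5
    mu0, mu1 = np.zeros(d), np.r_[delta, np.zeros(d - 1)]

    def draw(n):
        y = rng.integers(0, 2, size=n)  # true class labels
        x = rng.standard_normal((n, d)) + np.where(y[:, None] == 1, mu1, mu0)
        return x, y

    x_tr, y_tr = draw(n_train)
    x_te, y_te = draw(n_test)

    # CCC-Noise: the flip probability depends on the true class of each point.
    p_flip = np.where(y_tr == 1, flip_1, flip_0)
    y_noisy = np.where(rng.random(n_train) < p_flip, 1 - y_tr, y_tr)

    lr = LogisticRegression().fit(x_tr, y_noisy)
    nda = LinearDiscriminantAnalysis().fit(x_tr, y_noisy)

    # Misclassification error evaluated against the clean test labels.
    return {
        "LR": np.mean(lr.predict(x_te) != y_te),
        "NDA": np.mean(nda.predict(x_te) != y_te),
    }


if __name__ == "__main__":
    for delta in (1.0, 2.0, 3.0):
        errs = simulate(delta=delta)
        print(f"Mahalanobis distance {delta}: LR={errs['LR']:.3f}, NDA={errs['NDA']:.3f}")
```

Varying `delta` in this sketch loosely mirrors the role the abstract assigns to the Mahalanobis distance between the populations; the paper itself studies this through the asymptotic distribution of the error rates rather than by simulation.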
