Label‐noise resistant logistic regression for functional data classification with an application to Alzheimer's disease study

Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, which carry a risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), CC thickness can be used as a functional covariate in the AD classification problem to support diagnosis. However, mislabeled observations degrade classification performance. Motivated by AD-CC association studies, we propose a logistic regression method for functional data classification that is robust to misdiagnosis, that is, to label noise. Specifically, our model is constructed by adding individual intercepts to the functional logistic regression model. This approach indicates which observations are possibly mislabeled and also leads to a robust and efficient classifier. An efficient fitting procedure based on the MM algorithm provides simple closed-form update formulas. We test our method on synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy controls based on CC measurements from MRI.
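
To make the idea of individual intercepts concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the penalty or the update formulas, so the sketch assumes an L1 penalty on the individual intercepts, the standard 1/4 curvature bound on the logistic loss as the MM majorizer, and a functional covariate reduced to a few basis scores. All names (`fit_shifted_logistic`, `lam`, `gamma`) are hypothetical.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def objective(X, y, beta, gamma, lam):
    # Logistic loss with labels in {-1, +1} plus an assumed L1 penalty on gamma.
    eta = X @ beta + gamma
    return np.sum(np.logaddexp(0.0, -y * eta)) + lam * np.sum(np.abs(gamma))

def fit_shifted_logistic(X, y, lam, n_iter=200):
    """MM-style fit of a logistic model with one intercept shift per observation."""
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(n)
    X_pinv = np.linalg.pinv(X)  # reused by every least-squares update
    history = [objective(X, y, beta, gamma, lam)]
    for _ in range(n_iter):
        eta = X @ beta + gamma
        # Quadratic majorizer (curvature bound 1/4) gives a working response z.
        z = eta + 4.0 * y * sigmoid(-y * eta)
        # Closed-form update for beta: least squares on z - gamma.
        beta = X_pinv @ (z - gamma)
        # Closed-form update for gamma: soft-thresholding of the residual.
        r = z - X @ beta
        gamma = np.sign(r) * np.maximum(np.abs(r) - 4.0 * lam, 0.0)
        history.append(objective(X, y, beta, gamma, lam))
    return beta, gamma, np.array(history)

# Synthetic functional data (an assumption for illustration only):
# curves observed on a grid, reduced to scores on two Fourier basis functions.
rng = np.random.default_rng(0)
n, m = 200, 50
t = np.linspace(0.0, 1.0, m)
basis = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
coef = rng.normal(size=(n, 2))
curves = coef @ basis.T + 0.1 * rng.normal(size=(n, m))
scores = curves @ basis / m
X = np.column_stack([np.ones(n), scores])   # intercept plus basis scores
y = np.where(coef[:, 0] - coef[:, 1] > 0, 1.0, -1.0)
flipped = rng.choice(n, size=20, replace=False)
y[flipped] *= -1.0                           # inject label noise

beta, gamma, history = fit_shifted_logistic(X, y, lam=0.5)
suspects = np.flatnonzero(gamma)             # observations with a nonzero shift
```

In this sketch, observations whose estimated shift `gamma[i]` is nonzero are the ones flagged as possibly mislabeled, and larger values of `lam` flag fewer observations. Because each update minimizes a majorizing surrogate, the penalized objective recorded in `history` should not increase from one iteration to the next.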
