Class Imbalance in the Prediction of Dementia from Neuropsychological Data

Class imbalance is common in medical diagnosis, because disease cases are often outnumbered by cases without the disease. When the imbalance is severe, learning algorithms fail to retrieve the rarer classes and common assessment metrics become uninformative. In this work, class imbalance is studied on neuropsychological data, with the aim of differentiating Alzheimer’s Disease (AD) from Mild Cognitive Impairment (MCI) and predicting the conversion from MCI to AD. The effect of the imbalance on four learning algorithms (decision trees, naive Bayes, tree-augmented naive Bayes and support vector machines) is examined through the application of bagging, Bayes risk minimization and MetaCost. Plain decision trees were always outperformed, indicating susceptibility to the imbalance. The naive Bayes classifier was robust but suffered from a bias that was corrected through risk minimization. This strategy outperformed all other combinations of classifiers and meta-learning/ensemble methods. The tree-augmented naive Bayes classifier also benefited from an adjustment of the decision threshold. On the nearly balanced datasets it was improved by bagging, suggesting that the tree structure was too strong an assumption for the attribute dependencies. Support vector machines were robust: their plain version achieved good results and was never outperformed.
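
As an illustration of the Bayes risk minimization step mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a classifier that outputs class posteriors and a user-supplied cost matrix; the function name `min_risk_predict`, the posterior values and the 4:1 cost ratio are all hypothetical choices made for the example.

```python
def min_risk_predict(posteriors, cost):
    """Pick, for each example, the class with the lowest expected cost.

    posteriors: list of rows, each row holding P(class i | x) for every class i.
    cost: cost[i][j] is the (assumed) cost of predicting class j when the
          true class is i.
    """
    predictions = []
    for p in posteriors:
        n_classes = len(p)
        # Expected cost (risk) of predicting class j under posterior p.
        risks = [
            sum(p[i] * cost[i][j] for i in range(n_classes))
            for j in range(n_classes)
        ]
        predictions.append(min(range(n_classes), key=risks.__getitem__))
    return predictions


# Hypothetical posteriors: [P(majority class), P(minority class)].
posteriors = [[0.9, 0.1], [0.7, 0.3], [0.4, 0.6]]

# Zero-one loss reproduces the usual "most probable class" rule.
zero_one = [[0, 1], [1, 0]]
# Assumed asymmetric costs: missing a minority case costs 4x a false alarm.
asymmetric = [[0, 1], [4, 0]]

plain = min_risk_predict(posteriors, zero_one)
adjusted = min_risk_predict(posteriors, asymmetric)
print(plain, adjusted)
```

With two classes, the asymmetric costs in this sketch are equivalent to lowering the decision threshold on the minority-class posterior from 0.5 to 1/(1+4) = 0.2, which is why the second example switches to the minority class.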
