Optimizing specificity under perfect sensitivity for medical data classification

One of the main purposes of a computer-aided diagnosis (CAD) system is to reduce the workload of radiologists in identifying potential diseases. However, such a system becomes unreliable and useless if it produces even a small number of false negatives, since misclassifying an unhealthy patient as healthy can delay treatment, which can lead to fatal outcomes. Designing a CAD system that reduces the workload of radiologists while avoiding any false negatives is a very challenging problem. To tackle this problem, we propose a two-stage framework and a novel evaluation criterion, namely optimal specificity under perfect sensitivity (OSPS). We argue that for medical data classification, this criterion is more suitable than conventional measures such as accuracy, F-score, or area under the ROC curve. We further propose two learning strategies to improve OSPS. The first is aimed specifically at multi-instance learning tasks and disregards the misclassified negative instances of positive patients. The second tries to improve OSPS by imposing more restrictive constraints on negatives.
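The abstract does not give a formula for OSPS, so the following is only a minimal sketch of one natural reading: given real-valued classifier scores, choose the highest decision threshold that still flags every positive case, then report the specificity at that threshold. The function name `osps`, the score convention (higher means more likely diseased), and the label encoding (1 for positive, 0 for negative) are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def osps(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Specificity at the strictest threshold that keeps sensitivity at 100%.

    Assumed conventions (not from the paper): higher score = more likely
    positive; labels are 1 (positive) or 0 (negative); a case is predicted
    positive when score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    # Any threshold above the lowest positive score would miss that positive,
    # so the lowest positive score is the largest admissible threshold.
    threshold = min(positives)

    # Negatives scoring strictly below the threshold are correctly rejected.
    true_negatives = sum(1 for s in negatives if s < threshold)
    return true_negatives / len(negatives)


# Two positives (0.9, 0.4) and three negatives (0.3, 0.1, 0.5):
# the threshold is 0.4, so the negative scored 0.5 is a false positive.
example = osps([0.9, 0.4, 0.3, 0.1, 0.5], [1, 1, 0, 0, 0])
```

Under this reading, a single low-scoring positive case can pull the threshold down and sharply reduce the score, which is consistent with the abstract's emphasis on avoiding any false negative.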
