Paper information - Optimizing specificity under perfect sensitivity for medical data classification

Optimizing specificity under perfect sensitivity for medical data classification

One of the main purposes of a computer-aided diagnosis (CAD) system is to reduce the workload of radiologists in identifying potential diseases. However, such a system can become unreliable and useless if it produces even a small number of false negatives, since misclassifying an unhealthy patient as healthy can delay treatment, which can lead to fatal outcomes. Designing a CAD system that reduces the workload of radiologists while avoiding any false negatives is a very challenging problem. To tackle this problem, we propose a two-stage framework and a novel evaluation criterion, namely optimal specificity under perfect sensitivity (OSPS). We argue that for medical data classification, this criterion is more suitable than conventional measures such as accuracy, F-score, or area under the ROC curve. We further propose two learning strategies to improve OSPS. The first targets multi-instance learning tasks by disregarding the misclassified negative instances of positive patients. The second tries to improve OSPS by imposing stricter constraints on negatives.
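The abstract names the OSPS criterion without giving its formula. As a rough illustration only, the sketch below assumes that each patient receives a single real-valued score, that a patient is flagged as unhealthy when the score is at or above a threshold, and that OSPS is the specificity obtained at the highest threshold that still flags every positive patient. The function name `osps` and this thresholding convention are assumptions made for the example, not the authors' implementation.

```python
from typing import Sequence


def osps(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Specificity at the highest threshold that keeps sensitivity at 1.

    Assumed convention (not taken from the paper): label 1 = unhealthy,
    label 0 = healthy, and a patient is predicted unhealthy when
    score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative patient")

    # The lowest-scoring positive fixes the threshold: any higher threshold
    # would miss that patient and produce a false negative.
    threshold = min(positives)

    # Healthy patients scoring strictly below the threshold are correctly
    # cleared; the rest still need a radiologist's attention.
    true_negatives = sum(1 for s in negatives if s < threshold)
    return true_negatives / len(negatives)


# Hypothetical scores for two unhealthy and four healthy patients.
example_scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.1]
example_labels = [1, 1, 0, 0, 0, 0]
example_value = osps(example_scores, example_labels)
```

Under these assumptions the value reads as the fraction of healthy patients the system could clear without missing any unhealthy one, which is the workload reduction the abstract is concerned with.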
