Minimization of annotation work: diagnosis of mammographic masses via active learning

An effective prediction system for mammographic diagnosis requires that every mammographic image be annotated. Manual annotation is time-consuming and laborious, which makes it a major obstacle for researchers. In this article, we propose a novel active learning algorithm that addresses this problem by minimizing labeling costs while preserving prediction performance. Unlike existing active learning methods, which target general problems, the proposed method is designed specifically for mammographic images. Through modified discriminant functions and improved sample query criteria, it exploits the pairing of mammographic images and selects the most valuable images from both the mediolateral and craniocaudal views. Mammographic diagnosis is not only a classification task but also an ordinal regression task, in which an ordinal variable (the malignancy risk of lesions) is predicted. Extending active learning to ordinal regression, which has no precedent in existing studies, requires multiple sample query criteria to be considered simultaneously. We formulate this as a criteria integration problem and present an algorithm based on self-adaptive weighted rank aggregation to solve it. The efficacy of the proposed method was demonstrated on thousands of mammographic images from the Digital Database for Screening Mammography. The labeling costs of obtaining optimal performance fell to 33.8 percent of the original cost for the classification task and to 19.8 percent for the ordinal regression task. Compared with other state-of-the-art active learning algorithms, the proposed method recorded 1228 wins, 369 ties and 47 losses on the classification task, and 1933 wins, 258 ties and 185 losses on the ordinal regression task.
By taking the particularities of mammographic images into account, the proposed active learning method can substantially reduce manual annotation work without sacrificing the performance of the prediction system for mammographic diagnosis.
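
The idea of integrating several sample query criteria through weighted rank aggregation can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule or how the weights adapt, so everything below is an assumption made for illustration: the criterion names, the weighted mean-rank (Borda-style) consensus, and the reweighting by agreement with the consensus (inverse Spearman footrule distance) are hypothetical stand-ins, not the authors' algorithm.

```python
from typing import Dict, List


def ranks_from_scores(scores: List[float]) -> List[int]:
    """Return each sample's rank; rank 0 is the highest-scoring sample."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order):
        ranks[idx] = position
    return ranks


def aggregate(criteria: Dict[str, List[float]], n_iter: int = 10) -> List[int]:
    """Combine per-criterion scores into one query order (best sample first).

    Assumed scheme (not taken from the paper): a weighted mean-rank consensus,
    with each criterion's weight re-estimated from how closely its ranking
    agrees with the current consensus.
    """
    names = list(criteria)
    ranks = {k: ranks_from_scores(v) for k, v in criteria.items()}
    n = len(ranks[names[0]])
    weights = {k: 1.0 / len(names) for k in names}  # start from uniform weights
    consensus = list(range(n))
    for _ in range(n_iter):
        # Weighted mean rank per sample; a lower value means more valuable.
        mean_rank = [sum(weights[k] * ranks[k][i] for k in names) for i in range(n)]
        consensus = ranks_from_scores([-r for r in mean_rank])
        # Criteria that agree with the consensus (small footrule distance)
        # receive larger weights in the next iteration.
        raw = {
            k: 1.0 / (1.0 + sum(abs(ranks[k][i] - consensus[i]) for i in range(n)))
            for k in names
        }
        total = sum(raw.values())
        weights = {k: raw[k] / total for k in names}
    return sorted(range(n), key=lambda i: consensus[i])


# Hypothetical scores for four unlabeled images under three criteria.
example = {
    "uncertainty": [0.9, 0.1, 0.5, 0.3],
    "diversity": [0.8, 0.2, 0.6, 0.1],
    "representativeness": [0.7, 0.3, 0.4, 0.2],
}
query_order = aggregate(example)
```

In an active learning loop, the first few indices of `query_order` would be the images sent to the annotator in the next round. Working on ranks rather than raw scores keeps criteria with different scales comparable, which is one plausible motivation for a rank-based integration.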
