Computerized breast cancer analysis system using three stage semi-supervised learning method

BACKGROUND AND OBJECTIVE: Training a well-performing computer-aided detection (CAD) system usually requires a large amount of labeled medical image data. However, data labeling is time consuming, and ethical and logistical problems may add further complications. Incorporating unlabeled data into a CAD system is therefore a feasible way to overcome these obstacles.

METHODS: In this study we developed a three-stage semi-supervised learning (SSL) scheme that combines a small amount of labeled data with a larger amount of unlabeled data. The scheme modifies our existing CAD system in three stages: data weighing, feature selection, and a newly proposed dividing co-training data labeling algorithm. Global density asymmetry features were added to the feature pool to reduce the false positive rate. Area under the curve (AUC) and accuracy were computed using 10-fold cross-validation to evaluate the performance of the CAD system. The image dataset includes mammograms from 400 women who underwent routine screening examinations; each pair of mammograms contains either two cranio-caudal (CC) or two mediolateral-oblique (MLO) views, one of the right breast and one of the left. From these mammograms 512 regions were extracted and used in this study; 90 of them were treated as labeled and the rest as unlabeled.

RESULTS: With the proposed scheme, the highest AUC observed was 0.841, obtained using the 90 labeled regions together with all the unlabeled regions. This was 7.4% higher than the AUC obtained using labeled data only. As the amount of labeled data increased, the AUC difference between using mixed data and using labeled data only peaked when the amount of labeled data was around 60.

CONCLUSIONS: This study demonstrated that the proposed three-stage semi-supervised learning scheme can improve CAD performance by incorporating unlabeled data. Using unlabeled data is promising in computerized cancer research and may have a significant impact on future CAD system applications.
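
To illustrate the general idea of labeling unlabeled data with co-training, here is a minimal Python sketch of the generic technique: two classifiers, each trained on a different feature view, take turns pseudo-labeling the unlabeled samples they are most confident about. It is not the dividing co-training algorithm proposed in the study, whose details the abstract does not give. The nearest-centroid classifier, the two feature views, the synthetic data, and the `rounds` and `per_round` parameters are all assumptions made for illustration.

```python
def centroid(rows):
    """Mean vector of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class NearestCentroid:
    """Tiny two-class classifier (illustrative stand-in, not the study's)."""

    def fit(self, X, y):
        # Assumes the labeled set contains at least one sample of each class.
        self.c0 = centroid([x for x, t in zip(X, y) if t == 0])
        self.c1 = centroid([x for x, t in zip(X, y) if t == 1])
        return self

    def score(self, x):
        # Positive means closer to class 1; magnitude serves as confidence.
        return dist(x, self.c0) - dist(x, self.c1)


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """Generic co-training. labels[i] is 0, 1, or None (unlabeled).

    In each round, a classifier trained on one view pseudo-labels its
    `per_round` most confident unlabeled samples; those labels are then
    available to the classifier trained on the other view.
    """
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            known = [i for i, t in enumerate(labels) if t is not None]
            unknown = [i for i, t in enumerate(labels) if t is None]
            if not unknown:
                return labels
            clf = NearestCentroid().fit(
                [view[i] for i in known], [labels[i] for i in known]
            )
            ranked = sorted(
                unknown, key=lambda i: abs(clf.score(view[i])), reverse=True
            )
            for i in ranked[:per_round]:
                labels[i] = 1 if clf.score(view[i]) > 0 else 0
    return labels


def demo():
    """Synthetic, well-separated two-class data with two feature views."""
    truth = [0] * 10 + [1] * 10
    view_a = [[0.1 * k, -0.1 * k] for k in range(10)] + [
        [5 + 0.1 * k, 5 - 0.1 * k] for k in range(10)
    ]
    view_b = [[0.05 * k] for k in range(10)] + [
        [3 + 0.05 * k] for k in range(10)
    ]
    labels = [None] * 20
    for i in (0, 1, 10, 11):  # only four samples start out labeled
        labels[i] = truth[i]
    return truth, co_train(view_a, view_b, labels, rounds=10, per_round=2)


if __name__ == "__main__":
    truth, completed = demo()
    print("pseudo-labels match ground truth:", completed == truth)
```

On real data the pseudo-labels can be wrong, so a practical scheme would normally apply a confidence threshold and judge the benefit by cross-validated AUC, as the study does.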
