Breast Cancer Diagnosis From Biopsy Images Using Generic Features and SVMs

Abstract: A fully automatic method for breast cancer diagnosis based on microscopic biopsy images is presented. The method achieves high recognition rates by applying multi-class support vector machines to generic feature vectors that are based on level-set statistics of the images. We also consider the problem of classification with rejection and show preliminary results that point to its potential benefits.

Index Terms: breast cancer, biopsy, automatic diagnosis, SVM, image processing, feature extraction

I. INTRODUCTION

Recent years have witnessed a large increase of interest in automated and semi-automated breast cancer diagnosis [1], [2], [3], [4], [5], [6], [7], [8], [9]; see [10] for a recent review. In this paper we present an automatic classification method for breast cancer diagnosis based on microscopic biopsy images. In particular, we consider the problem of classifying a tissue specimen as either healthy, tumor in situ, or invasive carcinoma.

We propose a fully automatic classification method, using generic features and state-of-the-art statistical learning algorithms and methodologies. We experiment with a dataset that was previously used in [11], where a highly specialized morphology-based feature generation process was used. We show here that simple generic features lead to a slightly lower error rate (6.6% vs. 8.0%).

A desirable functionality of automated or semi-automated medical diagnosis systems is the option of 'decision with rejection', whereby the system generates decisions only when its confidence exceeds some prescribed threshold and transfers the decision on cases with lower confidence to a human expert. In this work we also examine a simple method for decision with rejection and demonstrate its viability.

Figure 1 presents three sample images of healthy tissue, tumor in situ and invasive carcinoma. Note that the images are not clear-cut, and contain much information of no diagnostic significance. These images were the individual samples in our work; they were not subdivided or cropped in any way. For more information on the dataset, see Section IV.
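To make the decision-with-rejection option concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes that a multi-class classifier already yields one real-valued score per class, and it uses the gap between the two highest scores as the confidence measure. That confidence measure, the function names, and the class ordering are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Class order is an assumption made for this illustration only.
CLASSES = ("healthy", "tumor in situ", "invasive carcinoma")


def classify_with_rejection(scores: Sequence[float],
                            threshold: float) -> Optional[str]:
    """Return the top-scoring class, or None to defer to a human expert.

    Confidence is taken to be the gap between the best and second-best
    class scores (a hypothetical choice, not taken from the paper).
    """
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    confidence = scores[best] - scores[runner_up]
    if confidence < threshold:
        return None  # low confidence: transfer the case to a human expert
    return CLASSES[best]


def error_and_rejection_rate(all_scores: Sequence[Sequence[float]],
                             labels: Sequence[str],
                             threshold: float) -> Tuple[float, float]:
    """Error rate on the accepted cases and the fraction of rejected cases."""
    decisions: List[Optional[str]] = [
        classify_with_rejection(s, threshold) for s in all_scores
    ]
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    errors = sum(1 for d, y in accepted if d != y)
    error_rate = errors / len(accepted) if accepted else 0.0
    rejection_rate = (len(decisions) - len(accepted)) / len(decisions)
    return error_rate, rejection_rate
```

Raising the threshold in such a scheme sends more cases to the expert, and the error rate on the remaining automatic decisions is then measured separately from the rejection rate.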

[1] Thomas G. Dietterich et al. Solving Multiclass Learning Problems via Error-Correcting Output Codes, 1994, J. Artif. Intell. Res.

[2] Hai-Shan Wu et al. Self-Affinity and Lacunarity of Chromatin Texture in Benign and Malignant Breast Epithelial Cell Nuclei, 1998.

[3] J. Gil et al. Iterative thresholding for segmentation of cells from noisy images, 2000, Journal of microscopy.

[4] William Nick Street et al. Breast Cancer Diagnosis and Prognosis Via Linear Programming, 1995, Oper. Res.

[5] Heung-Kook Choi et al. Multi-Resolution Wavelet-Transformed Image Analysis of Histological Sections of Breast Carcinomas, 2005, Cellular oncology: the official journal of the International Society for Cellular Oncology.

[6] Monica Morrow et al. Changes in breast cancer therapy because of pathology second opinions, 2002, Annals of Surgical Oncology.

[7] Hans A. Kestler et al. Classification of spatial textures in benign and cancerous glandular tissues by stereology and stochastic geometry using artificial neural networks, 2000, Journal of microscopy.

[8] Geula Klorin et al. Ploidy and nuclear area as a predictive factor of histologic grade in primary breast cancer, 2003, Analytical and quantitative cytology and histology.

[9] A D Ramsay et al. Errors in histopathology reporting: detection and avoidance, 1999, Histopathology.

[10] Haishan Wu et al. Image analysis and morphometry in the diagnosis of breast cancer, 2002, Microscopy research and technique.

[11] Constantinos S. Pattichis et al. Computer-aided detection of breast cancer nuclei, 1997, IEEE Transactions on Information Technology in Biomedicine.

[12] P H Bartels et al. COMPUTERIZED SCENE SEGMENTATION FOR THE DISCRIMINATION OF ARCHITECTURAL FEATURES IN DUCTAL PROLIFERATIVE LESIONS OF THE BREAST, 1997, The Journal of pathology.

[13] Ran El-Yaniv et al. Size-density spectra and their application to image classification, 2007, Pattern Recognit.

[14] Kimerly A Powell et al. Automated cell nuclear segmentation in color images of hematoxylin and eosin-stained breast biopsy, 2003, Analytical and quantitative cytology and histology.

[15] Yoram Singer et al. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers, 2000, J. Mach. Learn. Res.

[16] A. Llombart‐Bosch et al. Ductal Carcinoma In Situ of the Breast: A Comparative Analysis of Histology, Nuclear Area, Ploidy, and Neovascularization Provides Differentiation Between Low‐ and High‐Grade Tumors, 2002, The breast journal.

[17] David G. Stork et al. Pattern classification, 2nd Edition, 2000.

[18] Nicolás García-Pedrajas et al. Improving multiclass pattern recognition by the combination of two strategies, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] Glenn Fung et al. Data selection for support vector machine classifiers, 2000, KDD '00.

[20] Terrence J. Sejnowski et al. Parallel Networks that Learn to Pronounce English Text, 1987, Complex Syst.

[21] David G. Stork et al. Pattern Classification (2nd ed.), 1999.

[22] Nello Cristianini et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[23] Ron Meir et al. A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound, 2004, NIPS.

[24] B. Yener et al. Automated cancer diagnosis based on histopathological images: a systematic survey, 2005.

[25] Jason Weston et al. Gene Selection for Cancer Classification using Support Vector Machines, 2002, Machine Learning.

[26] Bernhard Schölkopf et al. Use of the Zero-Norm with Linear Models and Kernel Methods, 2003, J. Mach. Learn. Res.