The Feature Selection Problem in Computer-Assisted Cytology

Abstract. Modern cancer diagnostics relies heavily on cytological examinations. Unfortunately, visual inspection of cytological preparations under the microscope is tedious and time-consuming, and intra- and inter-observer variations in cytological diagnosis are substantial. Automatic image analysis and machine learning methods can make cytological diagnostics easier and more objective. Computerized systems usually preprocess cytological images, detect and segment nuclei, extract and select features, and finally classify the sample. Although many computerized methods and systems have been proposed for cytology, they are still not used routinely because their accuracy needs to improve. This contribution focuses on computerized breast cancer classification. The task is to classify cellular samples obtained by fine-needle biopsy as either benign or malignant. For this purpose, we compare 5 methods of nuclei segmentation and detection, 4 methods of feature selection, and 4 methods of classification. Nuclei detection and segmentation methods are compared with respect to recall and the F1 score based on the Jaccard index. Feature selection and classification methods are compared with respect to classification accuracy. The main contribution of our study, however, is to determine which features of nuclei reliably indicate the type of cancer. We also check whether the quality of nuclei segmentation and detection significantly affects the accuracy of cancer classification. On the test set, the average accuracy of cancer classification is around 76%. Spearman's correlation and the chi-square test select significantly better features than the forward feature selection method.
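As an illustration of the Spearman-based feature selection mentioned in the abstract, the following minimal sketch ranks features by the absolute value of their Spearman correlation with the benign/malignant label. It is not the authors' implementation: the function names, the feature names, the 0/1 label coding, and the choice to keep the top-k features are all assumptions made for this example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant feature is treated as uninformative
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_features(features, labels, top_k):
    """Return the top_k feature names ordered by |Spearman rho| with the labels.

    features: dict mapping a (hypothetical) feature name to per-sample values.
    labels:   per-sample class, assumed coded as 0 = benign, 1 = malignant.
    """
    scored = [(abs(spearman(vals, labels)), name) for name, vals in features.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))  # strongest first, name breaks ties
    return [name for _, name in scored[:top_k]]
```

A filter of this kind scores each feature independently of the classifier, whereas forward selection repeatedly retrains a classifier while adding features one at a time.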
