Mining hidden data to predict patient prognosis: texture feature extraction and machine learning in mammography

The UK has a national breast cancer screening programme, and images are routinely collected from a number of screening sites, representing a wealth of invaluable data that is currently under-used. Radiologists evaluate screening images manually and recall suspicious cases for further investigation, such as biopsy. Histological testing of biopsy samples confirms whether the tumour is malignant and establishes other diagnostic and prognostic characteristics, such as disease grade. Machine learning is becoming increasingly popular for clinical image classification problems because it can discover patterns in data that would otherwise remain invisible. This is particularly true of features derived from medical images; however, clinical datasets are often relatively small. A texture feature extraction toolkit has been developed to mine a wide range of features from medical images such as mammograms. This study analysed a dataset of 1,366 radiologist-marked, biopsy-proven malignant lesions obtained from the OPTIMAM Medical Image Database (OMI-DB). Exploratory data analysis methods were used to better understand the extracted features. Machine learning techniques, including Classification and Regression Trees (CART), ensemble methods (e.g. random forests) and logistic regression, were applied to the data to predict the disease grade of the analysed lesions. Prediction scores of up to 83% were achieved, and the sensitivity and specificity of the trained models are discussed to place the results in a clinical context. The results show promise for predicting prognostic indicators from the extracted texture features, which could enable care to be prioritised for patients at greatest risk.
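The abstract does not say which texture features the toolkit extracts or exactly how the classifiers were scored. As an illustration only, the sketch below assumes grey-level co-occurrence matrix (GLCM) statistics, one common family of texture descriptors, and shows how sensitivity and specificity follow from a model's predictions if grade prediction is framed as a binary task (a hypothetical framing, for example higher grade versus lower grade). The helpers `glcm`, `texture_features` and `sensitivity_specificity` are hypothetical and are not part of the authors' toolkit.

```python
from typing import Dict, List, Sequence, Tuple


def glcm(image: Sequence[Sequence[int]], dx: int = 1, dy: int = 0,
         levels: int = 8) -> List[List[float]]:
    """Symmetric, normalised grey-level co-occurrence matrix for offset (dx, dy).

    Hypothetical helper: GLCM features are an assumed stand-in for the
    toolkit's features. `image` must already be quantised to 0..levels-1.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                # Count each neighbouring pair in both directions (symmetric GLCM).
                counts[i][j] += 1
                counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return counts  # image too small for this offset
    return [[v / total for v in row] for row in counts]


def texture_features(p: List[List[float]]) -> Dict[str, float]:
    """Three Haralick-style statistics computed from a normalised GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        # Large when neighbouring grey levels differ strongly.
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        # Large when a few grey-level pairs dominate (uniform texture).
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        # Large when co-occurrences cluster near the diagonal.
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in cells),
    }


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int],
                            positive: int = 1) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Toy 4x4 patches: a flat region and a checkerboard, horizontal offset.
    flat = [[3] * 4 for _ in range(4)]
    checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
    print(texture_features(glcm(flat, levels=4)))
    print(texture_features(glcm(checker, levels=2)))
    print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0]))
```

In a full pipeline, feature vectors like these would be passed to classifiers such as scikit-learn's `DecisionTreeClassifier` (an optimised CART variant), `RandomForestClassifier` or `LogisticRegression`, and sensitivity and specificity would be computed on held-out predictions rather than on training data. Reporting both alongside a single prediction score shows whether a model trades missed high-grade lesions for fewer false alarms, which matters when the goal is to prioritise patients at greatest risk.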