A decision support system for mammography reports interpretation

Purpose: Mammography plays a key role in the diagnosis of breast cancer; however, decision-making based on mammography reports remains challenging. This paper addresses these challenges and proposes a Clinical Decision Support System (CDSS) that uses data mining methods to help clinicians interpret mammography reports.

Methods: A total of 2441 mammography reports were collected from Imam Khomeini Hospital from March 21, 2018, to March 20, 2019. First, the reports were analyzed and program code was developed to transform them into a dataset. Then, the weight of every feature in the dataset was calculated. Random Forest, Naïve Bayes, K-nearest neighbor (K-NN), and Deep Learning classifiers were applied to the dataset to build models capable of predicting the need for referral to biopsy. The models were then evaluated using cross-validation, measuring Area Under the Curve (AUC), accuracy, sensitivity, and specificity.

Results: The mammography type (diagnostic or screening) and the mass and calcification features mentioned in the reports were the most important features for decision-making. The K-NN model was the most accurate and specific classifier, with accuracy and specificity of 84.06% and 84.72%, respectively. The Random Forest classifier had the best sensitivity and AUC, at 87.74% and 0.905, respectively.

Conclusions: Data mining approaches proved to be a helpful tool for making the final decision on whether patients should be referred to biopsy based on mammography reports. The developed CDSS may be especially helpful for less experienced radiologists.
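The evaluation indices named in the Methods (accuracy, sensitivity, specificity, and AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all hypothetical, chosen only to show how each index is computed for a binary "refer to biopsy" (1) versus "no referral" (0) prediction. AUC is computed here through its rank interpretation, as the fraction of positive-negative pairs in which the positive case receives the higher score.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = refer to biopsy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity and specificity; assumes both classes occur in y_true."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true referrals that were caught
        "specificity": tn / (tn + fp),  # share of non-referrals correctly left alone
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = evaluate(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

In a cross-validation setting, these indices would typically be computed on each held-out fold and then averaged; the sketch shows only the per-fold calculation.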
