Machine Learning Performance Metrics and Diagnostic Context in Radiology

In this pilot study, data gathered from interviews with radiology specialists is combined with an assessment of how machine learning metrics are used in studies of radiological work. The article argues that the situated context of use should be an important consideration in the design of machine learning applications in radiology. It shows that radiologists see their professional practice as drawing on a wider range of expert knowledge than many existing studies of machine learning in radiology allow for. The article presents a case study drawn from radiology practice at a major Danish hospital and discusses a widely cited study of machine learning in radiological work. It relates the metrics that machine learning researchers currently consider appropriate to professional radiologists' understanding of their diagnostic work. This comparison helps identify gaps in understanding between the two communities and suggests how they might be addressed.
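The abstract refers to the metrics used by machine learning researchers without naming them. As a minimal illustrative sketch, the following computes several metrics that are commonly reported for binary diagnostic classifiers (accuracy, sensitivity, specificity, precision, F1 and Youden's index) from a confusion matrix. The choice of metrics, the `ConfusionMatrix` type and the `diagnostic_metrics` helper are assumptions made for illustration; they are not taken from the study, and the function assumes that no denominator is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for binary outcome counts, e.g. finding present/absent."""
    tp: int  # true positives: finding present, model says present
    fp: int  # false positives: finding absent, model says present
    tn: int  # true negatives: finding absent, model says absent
    fn: int  # false negatives: finding present, model says absent


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Illustrative helper returning common metrics; assumes non-zero denominators."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    sensitivity = cm.tp / (cm.tp + cm.fn)  # recall, true positive rate
    specificity = cm.tn / (cm.tn + cm.fp)  # true negative rate
    precision = cm.tp / (cm.tp + cm.fp)    # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "youden_j": sensitivity + specificity - 1,  # Youden's index J
    }


if __name__ == "__main__":
    # Invented counts for illustration only; not data from the study.
    metrics = diagnostic_metrics(ConfusionMatrix(tp=40, fp=10, tn=930, fn=20))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these invented counts, accuracy is 0.97 while sensitivity is only about 0.67, because negative cases dominate. This is one familiar reason why a single summary metric can be hard to interpret without knowing how and where a model would be used.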
