The need to approximate the use-case in clinical machine learning

Abstract: The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map those data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is vital to reliably quantify their prediction accuracy. Cross-validation (CV) is the standard approach, in which the accuracy of such algorithms is evaluated on a part of the data that the algorithm has not seen during training. However, for this procedure to be meaningful, the relationship between the training and the validation set should mimic the relationship between the training set and the data expected in clinical use. Here we compared two popular CV methods: record-wise and subject-wise. While the subject-wise method mirrors the clinically relevant use-case scenario of diagnosis in newly recruited subjects, the record-wise strategy has no such interpretation. Using both a publicly available dataset and a simulation, we found that record-wise CV often massively overestimates the prediction accuracy of the algorithms. We also conducted a systematic review of the relevant literature, and found that this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move towards an era of machine learning-based diagnosis and treatment, evaluating the accuracy of these algorithms with proper methods is crucial, as inaccurate results can mislead both clinicians and data scientists.
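The difference between the two splitting strategies can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy data (10 subjects with 20 records each), and the choice of 5 folds are assumptions made purely for illustration. The sketch counts, for each validation fold, how many subjects also contribute records to the corresponding training set.

```python
import random


def record_wise_folds(subject_ids, k, seed=0):
    """Shuffle individual records into k folds, ignoring subject identity."""
    idx = list(range(len(subject_ids)))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def subject_wise_folds(subject_ids, k, seed=0):
    """Shuffle subjects into k groups; all records of a subject share one fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for i, s in enumerate(subject_ids):
        folds[fold_of[s]].append(i)
    return folds


def shared_subjects(subject_ids, folds):
    """Per validation fold, count subjects that also appear in the training set."""
    counts = []
    for test in folds:
        test_set = set(test)
        test_subj = {subject_ids[i] for i in test}
        train_subj = {s for i, s in enumerate(subject_ids) if i not in test_set}
        counts.append(len(test_subj & train_subj))
    return counts


# Hypothetical toy data: 10 subjects with 20 records each.
subject_ids = [s for s in range(10) for _ in range(20)]
rw_folds = record_wise_folds(subject_ids, k=5)
sw_folds = subject_wise_folds(subject_ids, k=5)
rw = shared_subjects(subject_ids, rw_folds)
sw = shared_subjects(subject_ids, sw_folds)
print("record-wise, subjects shared with training:", rw)
print("subject-wise, subjects shared with training:", sw)
```

In the subject-wise split no validation subject appears in training, which matches the scenario of diagnosing newly recruited subjects. In the record-wise split, validation records typically come from subjects the model has already seen, so a classifier can score well by recognizing the subject without learning anything that transfers to new patients.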
