Data augmentation and evolutionary algorithms to improve the prediction of blood glucose levels in scarcity of training data

Patients with Type 1 Diabetes Mellitus are waiting for the arrival of the Artificial Pancreas. Artificial Pancreas systems will control patients' blood glucose, improving their quality of life and reducing the risks they face daily. At the core of the Artificial Pancreas, an algorithm will forecast future glucose levels and estimate insulin bolus sizes. Grammatical Evolution has proven to be a suitable algorithm for predicting glucose levels. Nevertheless, one of the main obstacles that researchers have encountered when training Grammatical Evolution models is the lack of sufficient data. As in many other fields of medicine, collecting data from real patients is very complex; moreover, a patient's response can vary widely because of many personal factors, which can be viewed as different scenarios. In this paper, we propose both a classification system for scenario selection and a data augmentation algorithm that generates synthetic glucose time series from real data. Our experimental results show that, when training data are scarce, Grammatical Evolution models produce more accurate and robust predictions when scenario selection and data augmentation are used.
