A Practical Application of Data Mining Methods to Build Predictive Models for Autism Spectrum Disorder Based on Biosensor Data From Janssen Autism Knowledge Engine (JAKE®)

Abstract

The Janssen Autism Knowledge Engine (JAKE®) collects a large number of features from five biosensors across a range of tasks. Applying data mining methods to these data may provide an objective way to discriminate between participants with autism spectrum disorder (ASD) and typically developing (TD) participants. Data from a prospective observational study using JAKE were used to build models, with repeated cross-validation, to identify biosensor features that contribute to diagnosis. The modeling sample comprised TD participants and ASD participants classified as “moderate” or “severe” on the basis of their Social Responsiveness Scale (SRS) total scores. Four models (partial least squares, random forest, elastic net, and C5.0) were used to build diagnostic classifiers on the training set, and the fitted models were then evaluated on the test set. Performance on the training set, measured by the area under the receiver operating characteristic (ROC) curve, was moderate (AUC = 0.61–0.72). Performance on the test set, measured by the kappa statistic, ranged from 0.40 to 0.46 across the four models. Data mining methods applied to biosensor data can lead to models that discriminate ASD from TD participants, and this approach may prove useful in developing new diagnostic tests for ASD.
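The abstract reports two different performance measures: ROC AUC on the training set, with models built using repeated cross-validation, and the kappa statistic on the held-out test set. The sketch below makes both measures, and one form of repeated cross-validation, concrete. It is a minimal standard-library Python illustration under stated assumptions and is not the study's code. Labels are assumed to be coded 1 = ASD and 0 = TD. The helper names (`roc_auc`, `cohens_kappa`, `repeated_stratified_folds`) are hypothetical. The kappa is assumed to be Cohen's kappa, the folds are assumed to be stratified by class, and the toy scores are invented.

```python
import random
from collections import Counter


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation (1 = ASD, 0 = TD, assumed coding).

    AUC is the fraction of (positive, negative) pairs in which the positive
    case scores higher; tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined when both sequences hold the same single class
    return (observed - expected) / (1.0 - expected)


def repeated_stratified_folds(labels, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        slot = 0
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal each class round-robin so every fold keeps roughly the
            # overall class ratio and fold sizes differ by at most one.
            for i in idx:
                folds[slot % k].append(i)
                slot += 1
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


if __name__ == "__main__":
    # Made-up toy values, not study data: 1 = ASD, 0 = TD.
    y = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.7, 0.3, 0.5, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print("AUC:", roc_auc(y, scores))          # 8 of 9 pairs ranked correctly
    print("kappa:", cohens_kappa(y, preds))    # 4/6 observed vs 0.5 expected by chance
    print("CV splits:", len(list(repeated_stratified_folds(y, k=3, repeats=2))))
```

In a full pipeline, each classifier would presumably be refit on every training split and scored on the matching held-out fold, and the AUC would then be averaged across folds and repeats. The toy scores here merely stand in for the outputs of the four models named in the abstract. Kappa is useful on the test set because it discounts the agreement a classifier would reach by chance given the class proportions.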
