Predicting Language Difficulties in Middle Childhood From Early Developmental Milestones: A Comparison of Traditional Regression and Machine Learning Techniques.

Purpose: The current study aimed to compare traditional logistic regression models with machine learning algorithms to investigate the predictive ability of (a) communication performance at 3 years old on language outcomes at 10 years old and (b) broader developmental skills (motor, social, and adaptive) at 3 years old on language outcomes at 10 years old.

Method: Participants (N = 1,322) were drawn from the Western Australian Pregnancy Cohort (Raine) Study (Straker et al., 2017). A general developmental screener, the Infant Monitoring Questionnaire (Squires, Bricker, & Potter, 1990), was completed by caregivers at the 3-year follow-up. Language ability at 10 years old was assessed using the Clinical Evaluation of Language Fundamentals-Third Edition (Semel, Wiig, & Secord, 1995). Logistic regression models and interpretable machine learning algorithms were used to assess predictive abilities of early developmental milestones for later language outcomes.

Results: Overall, the findings showed that prediction accuracies were comparable between logistic regression and machine learning models using communication-only performance as well as performance on communication and broader developmental domains to predict language performance at 10 years old. Decision trees are incorporated to visually present these findings but must be interpreted with caution because of the poor accuracy of the models overall.

Conclusions: The current study provides preliminary evidence that machine learning algorithms provide equivalent predictive accuracy to traditional methods. Furthermore, the inclusion of broader developmental skills did not improve predictive capability. Assessment of language at more than 1 time point is necessary to ensure children whose language delays emerge later are identified and supported.

Supplemental Material: https://doi.org/10.23641/asha.6879719
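The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline, data, or models: the data are synthetic, the single "screening score" predictor and all function names (`make_data`, `fit_logistic`, `fit_stump`, `cross_val_accuracy`) are hypothetical, and a one-split decision tree (a "stump") stands in for the interpretable tree models. It only shows how two classifiers might be scored on the same cross-validation folds so their accuracies can be compared.

```python
import math
import random


def make_data(n=400, seed=0):
    """Synthetic data (assumption): one screening score and a binary outcome per child."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        # Lower scores raise the probability of a later difficulty (label 1).
        p = 1.0 / (1.0 + math.exp(1.5 + 1.0 * score))
        data.append((score, 1 if rng.random() < p else 0))
    return data


def fit_logistic(train, lr=0.1, epochs=300):
    """Fit slope w and intercept b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in train:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    # Classify as 1 when the predicted probability is at least 0.5.
    return lambda x: 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0


def fit_stump(train):
    """One-split decision tree: choose the threshold with the fewest training errors."""
    best = None
    for t in sorted({x for x, _ in train}):
        for below in (0, 1):
            errors = sum((below if x <= t else 1 - below) != y for x, y in train)
            if best is None or errors < best[0]:
                best = (errors, t, below)
    _, t, below = best
    return lambda x: below if x <= t else 1 - below


def cross_val_accuracy(data, fit, k=5):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    folds = [data[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = fit(train)
        accs.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(accs) / k


data = make_data()
acc_logistic = cross_val_accuracy(data, fit_logistic)
acc_stump = cross_val_accuracy(data, fit_stump)
print(f"logistic regression: {acc_logistic:.3f}")
print(f"decision stump:      {acc_stump:.3f}")
```

Because the positive class is the minority in this synthetic sample, raw accuracy can look acceptable even when few affected children are identified; a fuller comparison would also report sensitivity and specificity.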

[1]  L. Straker,et al.  Cohort Profile: The Western Australian Pregnancy Cohort (Raine) Study – Generation 2 , 2017 .

[2]  L. Ungar,et al.  MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine , 2016, Scientific Reports.

[3]  K. Mcmahon,et al.  Predicting receptive vocabulary change from childhood to adulthood: A birth cohort study. , 2016, Journal of communication disorders.

[4]  Trisha Greenhalgh,et al.  CATALISE: A Multinational and Multidisciplinary Delphi Consensus Study. Identifying Language Impairments in Children , 2016, PloS one.

[5]  Charles Hulme,et al.  Language profiles and literacy outcomes of children with resolving, emerging, or persisting language impairments , 2015, Journal of child psychology and psychiatry, and allied disciplines.

[6]  E. Bavin,et al.  Levers for Language Growth: Characteristics and Predictors of Language Trajectories between 4 and 7 Years , 2015, PloS one.

[7]  K. Lohr,et al.  Screening for Speech and Language Delay in Children 5 Years Old and Younger: A Systematic Review , 2015, Pediatrics.

[8]  Fiona J. Duff,et al.  Early prediction of language and literacy problems: is 18 months too early? , 2015, PeerJ.

[9]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[10]  Olatz Arbelaitz,et al.  Coverage-based resampling: Building robust consolidated decision trees , 2015, Knowl. Based Syst..

[11]  A. Pickles,et al.  Predicting the rate of language development from early motor skills in at-risk infants who develop autism spectrum disorder. , 2015, Research in autism spectrum disorders.

[12]  Yixin Chen,et al.  Learning accurate and interpretable models based on regularized random forests regression , 2014, BMC Systems Biology.

[13]  R. Plomin,et al.  Illusory Recovery: Are Recovered Children With Early Language Delay at Continuing Elevated Risk? , 2014, American journal of speech-language pathology.

[14]  F. Pons,et al.  Trajectories of language delay from age 3 to 5: persistence, recovery and late onset. , 2014, International journal of language and communication disorders.

[15]  H. Asadi,et al.  Machine Learning for Outcome Prediction of Acute Ischemic Stroke Post Intra-Arterial Therapy , 2014, PloS one.

[16]  J. Iverson,et al.  Fine motor skill predicts expressive language in infant siblings of children with autism. , 2013, Developmental science.

[17]  S. Zubrick,et al.  Risk Factors for Children's Receptive Vocabulary Development from Four to Eight Years in the Longitudinal Study of Australian Children , 2013, PloS one.

[18]  Carol A. Miller,et al.  Late talking, typical talking, and weak language skills at middle childhood. , 2013, Learning and individual differences.

[19]  S. Parsons,et al.  The relationship between gender, receptive vocabulary, and literacy from school entry through to adulthood , 2013, International journal of speech-language pathology.

[20]  R. O’Kearney,et al.  Emotional and behavioural outcomes later in childhood and adolescence for children with specific language impairments: meta-analyses of controlled prospective studies. , 2013, Journal of child psychology and psychiatry, and allied disciplines.

[21]  Xuehui Meng,et al.  Comparison of three data mining models for predicting diabetes or prediabetes by risk factors , 2013, The Kaohsiung journal of medical sciences.

[22]  A. N. Bhat,et al.  Relation between early motor delay and later communication delay in infants at risk for autism , 2012, Infant Behavior and Development.

[23]  Charles DiMaggio,et al.  Long-term Differences in Language and Cognitive Function After Childhood Exposure to Anesthesia , 2012, Pediatrics.

[24]  Kevin Leyton-Brown,et al.  Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms , 2012, KDD.

[25]  C. Anandan,et al.  Predicting Language Change Between 3 and 5 Years and Its Implications for Early Identification , 2012, Pediatrics.

[26]  J. Carlin,et al.  Profiles of language development in pre-school children: a longitudinal latent class analysis of data from the Early Language in Victoria Study. , 2012, Child: care, health and development.

[27]  Francisco Herrera,et al.  Special issue on "New Trends in Data Mining" NTDM , 2012, Knowl. Based Syst..

[28]  R. Paul,et al.  Characterizing and predicting outcomes of communication delays in infants and toddlers: implications for clinical practice. , 2011, Language, speech, and hearing services in schools.

[29]  A. Hofman,et al.  Examining continuity of early expressive vocabulary development: the generation R study. , 2011, Journal of speech, language, and hearing research : JSLHR.

[30]  M. Prior,et al.  Predicting Language Outcomes at 4 Years of Age: Findings From Early Language in Victoria Study , 2010, Pediatrics.

[31]  Veera Boonjing,et al.  Comparing performances of logistic regression, decision trees, and neural networks for classifying heart disease patients , 2010, 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM).

[32]  S. Mcleod,et al.  Risk and protective factors associated with speech and language impairment in a nationally representative sample of 4- to 5-year-old children. , 2010, Journal of speech, language, and hearing research : JSLHR.

[33]  S. Parsons,et al.  Modeling developmental language difficulties from school entry into adulthood: literacy, mental health, and employment outcomes. , 2009, Journal of speech, language, and hearing research : JSLHR.

[34]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[35]  S. Mcleod,et al.  Epidemiology of speech and language impairment in a nationally representative sample of 4- to 5-year-old children. , 2009, Journal of speech, language, and hearing research : JSLHR.

[36]  Haibo He,et al.  Learning from Imbalanced Data , 2009, IEEE Transactions on Knowledge and Data Engineering.

[37]  Leslie Rescorla,et al.  Age 17 language and reading outcomes in late-talking toddlers: support for a dimensional perspective on language delay. , 2009, Journal of speech, language, and hearing research : JSLHR.

[38]  M. Prior,et al.  The Early Language in Victoria Study (ELVS): A prospective, longitudinal study of communication skills and expressive vocabulary development at 8, 12 and 24 months , 2009 .

[39]  Leigh M. Smith,et al.  The role of early fine and gross motor development on later motor and cognitive ability. , 2008, Human movement science.

[40]  S. Zubrick,et al.  Language outcomes of 7-year-old children with or without a history of late language emergence at 24 months. , 2008, Journal of speech, language, and hearing research : JSLHR.

[41]  David W. Slegers,et al.  Late language emergence at 24 months: an epidemiological study of prevalence, predictors, and covariates. , 2007, Journal of speech, language, and hearing research : JSLHR.

[42]  Craig Newschaffer,et al.  Predictors of Language Acquisition in Preschool Children with Autism Spectrum Disorders , 2007, Journal of autism and developmental disorders.

[43]  Olatz Arbelaitz,et al.  Combining multiple class distribution modified subsamples in a single tree , 2007, Pattern Recognit. Lett..

[44]  M. Prior,et al.  Growth of infant communication between 8 and 12 months: A population study , 2006, Journal of paediatrics and child health.

[45]  P. Magnus,et al.  Cohort profile: the Norwegian Mother and Child Cohort Study (MoBa). , 2006, International journal of epidemiology.

[46]  Ricardo Santiago-Mozos,et al.  Using data mining to explore complex clinical decisions: A study of hospitalization after a suicide attempt. , 2006, The Journal of clinical psychiatry.

[47]  David S. Wishart,et al.  Applications of Machine Learning in Cancer Prediction and Prognosis , 2006, Cancer informatics.

[48]  Lior Rokach,et al.  Top-down induction of decision trees classifiers - a survey , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[49]  H. Rockette,et al.  Concurrent and predictive validity of parent reports of child language at ages 2 and 3 years. , 2005, Child development.

[50]  M. Rowe,et al.  Measuring productive vocabulary of toddlers in low-income families: concurrent and predictive validity of three sources of data , 2004, Journal of Child Language.

[51]  Margaret J. Briggs-Gowan,et al.  Language delay in a community cohort of young children. , 2003, Journal of the American Academy of Child and Adolescent Psychiatry.

[52]  Robert Plomin,et al.  Outcomes of early language delay: I. Predicting persistent and transient language difficulties at 3 and 4 years. , 2003, Journal of speech, language, and hearing research : JSLHR.

[53]  J Bruce Tomblin,et al.  A longitudinal investigation of reading outcomes in children with language impairments. , 2002, Journal of speech, language, and hearing research : JSLHR.

[54]  A. Carter,et al.  The social-emotional development of "late-talking" toddlers. , 2002, Journal of the American Academy of Child and Adolescent Psychiatry.

[55]  Igor Kononenko,et al.  Machine learning for medical diagnosis: history, state of the art and perspective , 2001, Artif. Intell. Medicine.

[56]  E. Hill,et al.  Non-specific nature of specific language impairment: a review of the literature with regard to concomitant motor impairments. , 2001, International journal of language & communication disorders.

[57]  J. Law,et al.  Prevalence and natural history of primary speech and language delay: findings from a systematic review of the literature. , 2000, International journal of language & communication disorders.

[58]  D G Altman,et al.  What do we mean by validating a prognostic model? , 2000, Statistics in medicine.

[59]  T. Gallagher Interrelationships among Children's Language, Behavior, and Emotional Problems , 1999 .

[60]  J. Tomblin,et al.  Prevalence of specific language impairment in kindergarten children. , 1997, Journal of speech, language, and hearing research : JSLHR.

[61]  J. Squires,et al.  Revision of a parent-completed development screening tool: Ages and Stages Questionnaires. , 1997, Journal of pediatric psychology.

[62]  David H. Wolpert,et al.  No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..

[63]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[64]  Philip Archer,et al.  The Denver II: a major revision and restandardization of the Denver Developmental Screening Test. , 1992, Pediatrics.

[65]  Jane Squires,et al.  The Effectiveness of Parental Screening of At-Risk Infants , 1989 .

[66]  J. Squires,et al.  The validity, reliability, and cost of a parent-completed questionnaire system to evaluate at-risk infants. , 1988, Journal of pediatric psychology.

[67]  B. Efron,et al.  A Leisurely Look at the Bootstrap, the Jackknife, and , 1983 .

[68]  Lars Kotthoff,et al.  Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA , 2017, J. Mach. Learn. Res..

[69]  L. Aarø,et al.  Co-occurring development of early childhood communication and motor skills: results from a population-based longitudinal study. , 2014, Child: care, health and development.

[70]  Neeraj Bhargava,et al.  Decision Tree Analysis on J48 Algorithm for Data Mining , 2013 .

[71]  J. Hilbe Logistic Regression Models , 2009 .

[72]  Xiaohua Hu,et al.  A Data Mining Approach for Retailing Bank Customer Attrition Analysis , 2004, Applied Intelligence.