Ensemble statistical and subspace clustering model for analysis of autism spectrum disorder phenotypes

Heterogeneity in Autism Spectrum Disorder (ASD) is complex, spanning variability in behavioral phenotype as well as in clinical, physiologic, and pathologic parameters. The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) diagnoses ASD using a two-dimensional model based on social communication deficits and on fixated interests and repetitive behaviors. Sorting out this heterogeneity is crucial for the study of etiology, diagnosis, treatment, and prognosis. In this paper, we present an ensemble model for analyzing ASD phenotypes that combines several machine learning techniques with a k-dimensional subspace clustering algorithm. The ensemble also incorporates statistical methods at several stages of the analysis. We apply the model to a sample of 208 probands drawn from patients at the Missouri site of the Simons Simplex Collection. The results provide evidence that helps elucidate the phenotypic complexity within ASD. The model can be extended to other disorders that exhibit substantial heterogeneity.
