Genotype Combinations Linked to Phenotype Subgroups in Autism Spectrum Disorders

This paper investigates a computational model that uses machine learning techniques to systematically compare phenotype data with genotype data (single nucleotide polymorphisms, SNPs) in order to identify discriminant genotype markers associated with phenotypic subgroups. The proposed discriminant SNP identifier model is empirically evaluated on an Autism Spectrum Disorder (ASD) simplex sample. Six phenotype markers were selected to cluster the sample in a hexagonal lattice format, yielding five multidimensional subgroups based on the extremities of the phenotype markers. The SNP selection model combines random subspace selection of SNPs with feature selection algorithms to determine which set of SNPs is discriminant among these five subgroups. This yielded a set of SNPs that attained a mean ROC performance of 95% using a Support Vector Machine prediction model. A biological analysis of these SNPs and their associated genes across the subgroups is presented to examine their potential clinical significance.
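
To make the idea of random subspace selection combined with feature selection concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the scoring criterion or the selection rule, so the between-group to within-group variance ratio (`f_ratio`), the relative cutoff (`rel_threshold`), the synthetic genotypes, and all function names are hypothetical stand-ins. The sketch stops before the Support Vector Machine step.

```python
import random
from collections import Counter


def f_ratio(values, labels):
    """Between-group / within-group sum of squares for one SNP column.

    Assumed stand-in for the (unspecified) feature selection criterion.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = sum(values) / len(values)
    between = within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        between += len(members) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in members)
    return between / (within + 1e-9)  # small constant avoids division by zero


def random_subspace_selection(X, y, n_rounds=300, subspace_size=10,
                              rel_threshold=0.2, seed=0):
    """Return {snp_index: selection frequency} over random SNP subspaces."""
    rng = random.Random(seed)
    n_snps = len(X[0])
    # A univariate score can be computed once; a multivariate selector
    # would instead be run inside the loop on each subspace.
    scores = [f_ratio([row[j] for row in X], y) for j in range(n_snps)]
    drawn, kept = Counter(), Counter()
    for _ in range(n_rounds):
        subspace = rng.sample(range(n_snps), subspace_size)
        best = max(scores[j] for j in subspace)
        drawn.update(subspace)
        # Hypothetical rule: keep SNPs scoring near the best in this subspace.
        kept.update(j for j in subspace if scores[j] >= rel_threshold * best)
    return {j: kept[j] / drawn[j] for j in drawn}


def make_toy_genotypes(n_groups=5, n_per_group=30, n_snps=40,
                       informative=(3, 17), seed=1):
    """Synthetic 0/1/2 genotypes; only `informative` SNPs track the subgroup."""
    rng = random.Random(seed)
    X, y = [], []
    for group in range(n_groups):
        for _ in range(n_per_group):
            row = [rng.randrange(3) for _ in range(n_snps)]
            for j in informative:
                if rng.random() < 0.9:  # 90% of the time follow the subgroup
                    row[j] = group % 3
            X.append(row)
            y.append(group)
    return X, y


X, y = make_toy_genotypes()
freq = random_subspace_selection(X, y)
top = sorted(freq, key=freq.get, reverse=True)[:2]
print("Most frequently selected SNP indices:", sorted(top))
```

Ranking SNPs by how often they survive selection across many random subspaces is one common way to aggregate such runs; the retained SNPs would then serve as input features for a classifier such as an SVM.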
