Predicting autism spectrum disorder from associative genetic markers of phenotypic groups using machine learning

Machine learning is a discipline of artificial intelligence geared towards the development of a range of critical applications. Owing to its high precision, it is widely used to extract hidden patterns and valuable insights from complex data. Data collected from real-world environments may contain irrelevant information, and such noise degrades model performance. Gene expression is an important source of the genetic information of a species, and gene expression patterns reveal significant relationships between genes associated with several diseases. However, irregular molecular interactions and reactions that occur during transcription slightly perturb gene expression levels, which hampers the identification of disease biomarkers. To address this problem, a novel gene selection strategy is proposed to identify candidate gene biomarkers from genomic data. The strategy combines the signal-to-noise ratio with a logistic sigmoid function, Hilbert–Schmidt Independence Criterion Lasso (HSIC Lasso), and a regularized genetic algorithm to find the optimal features. The proposed system is tested on a microarray gene expression dataset of autism spectrum disorder (ASD) obtained from the Gene Expression Omnibus (GEO) repository. FAM104B, CCNDBP1, H1F0, and ZER1 are identified as candidate biomarkers of ASD. The performance of the proposed model is evaluated systematically with widely used machine learning algorithms. The proposed methodology improved the ASD prediction rate, attaining an accuracy of 97.62% and outperforming existing methods. The system could also serve as a valuable tool to help medical practitioners diagnose ASD accurately.
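
The first filtering stage can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the standard two-class signal-to-noise ratio, SNR_j = (μ_j(ASD) − μ_j(control)) / (σ_j(ASD) + σ_j(control)), passed through the logistic sigmoid 1 / (1 + e^(−|SNR_j|)) so that every gene receives a bounded relevance score. Taking the absolute value (so up- and down-regulated genes rank alike), using the population standard deviation, and the helper names `snr_sigmoid_scores` and `select_top_genes` are illustrative assumptions, not the authors' definitions.

```python
import math
from statistics import mean, pstdev


def snr_sigmoid_scores(X, y):
    """Score each gene (column of X) for a two-class problem.

    X is a list of samples, each a list of expression values; y holds
    class labels, 1 for cases (e.g. ASD) and 0 for controls.
    Assumed score: logistic sigmoid of the absolute signal-to-noise ratio.
    """
    scores = []
    for j in range(len(X[0])):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        controls = [row[j] for row, label in zip(X, y) if label == 0]
        # Signal-to-noise ratio: mean difference over summed standard deviations;
        # the small epsilon avoids division by zero for constant genes.
        spread = pstdev(cases) + pstdev(controls) + 1e-12
        snr = (mean(cases) - mean(controls)) / spread
        # The sigmoid maps |SNR| into [0.5, 1), giving a bounded relevance score.
        scores.append(1.0 / (1.0 + math.exp(-abs(snr))))
    return scores


def select_top_genes(scores, k):
    """Return the column indices of the k highest-scoring genes."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: gene 0 separates cases from controls; genes 1 and 2 barely differ.
    X = [[5.0, 1.0, 3.0], [5.2, 1.1, 2.9], [1.0, 1.0, 3.1], [1.2, 0.9, 3.0]]
    y = [1, 1, 0, 0]
    scores = snr_sigmoid_scores(X, y)
    print(select_top_genes(scores, 1))  # prints [0]
```

Genes whose score clears a chosen cut-off, or the top k genes, would then be handed to the next selection stage; the sigmoid only rescales the ranking, so its main role is to keep scores on a fixed scale when they are combined with other criteria.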

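The final, wrapper stage of such a pipeline can be sketched as a small genetic algorithm over binary gene masks. This is a sketch under stated assumptions, not the authors' algorithm: the abstract does not say how the genetic algorithm is regularized, so "regularized" is taken here to mean a fitness of classification accuracy minus a penalty `lam` proportional to the fraction of genes kept, and leave-one-out nearest-centroid accuracy stands in for the machine learning classifiers used in the paper. Tournament selection, uniform crossover, bit-flip mutation, elitism, every parameter value, and the function names `loo_centroid_accuracy` and `regularized_ga` are hypothetical choices.

```python
import random


def loo_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier that only
    looks at the genes switched on (bit == 1) in `mask`."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without the held-out sample i.
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            if rows:
                centroids[label] = [sum(row[j] for row in rows) / len(rows) for j in idx]
        pred = min(
            centroids,
            key=lambda lab: sum((X[i][j] - c) ** 2 for j, c in zip(idx, centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)


def regularized_ga(X, y, pop_size=20, generations=30, lam=0.1, p_mut=0.05, seed=0):
    """Search binary gene masks maximising accuracy - lam * (fraction of genes kept)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            acc = loo_centroid_accuracy(X, y, mask)
            cache[key] = acc - lam * sum(mask) / n_genes  # size penalty acts as the regularizer
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            # Uniform crossover followed by bit-flip mutation.
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy data: gene 0 is informative; genes 1 and 2 are constant across samples.
    X = [[5.0, 1.0, 2.0], [5.3, 1.0, 2.0], [4.8, 1.0, 2.0],
         [1.0, 1.0, 2.0], [1.2, 1.0, 2.0], [0.9, 1.0, 2.0]]
    y = [1, 1, 1, 0, 0, 0]
    mask, score = regularized_ga(X, y)
    print(mask, round(score, 3))
```

Raising `lam` pushes the search towards smaller gene panels at some cost in accuracy. Because every fitness evaluation retrains the classifier, a search of this kind is normally run only over the genes that survive the cheaper filter stages.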